Modules

Each of the following modules implements one of the key functions in the Multi-Dendrix pipeline. Each can be run as a standalone program, or imported and used as part of a larger pipeline. The Multi-Dendrix package has the following structure.

Multi-Dendrix

This is the main module that implements the integer linear program (ILP) described in the Multi-Dendrix paper. It includes the functions for loading mutation data and interfacing with CPLEX to calculate the optimal collection of gene sets for given parameters.

Requirements:
ILP(mutation_data[, t, k_min, k_max, alpha, ...]) Implementation of Multi-Dendrix ILP. Sets up ILP, uses CPLEX Python to solve it, and parses the results.
A(patient2mutations, gene, patient) Mutation matrix accessor function (see ILP()).
W(mutation2patients, gene_set, alpha) Calculates the weight W of a set of genes, defined as the weighted
load_mutation_data(file_loc[, ...]) Loads the mutation data in the given file.
load_mutation_data_w_cutoff(file_loc[, ...]) Loads the mutation data in the given file, restricting to genes with a given mutation frequency.
white_and_blacklisting([patient_wlst, ...]) Reconciles the different white- and blacklists provided as input into Multi-Dendrix.
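To illustrate the kind of data these functions operate on, the sketch below stores mutation data as two dictionaries and computes a coverage-based weight for a gene set. It is not the package's code: the toy data is made up, in_matrix and dendrix_weight are hypothetical names, and the weight shown is the unweighted Dendrix form (coverage minus coverage overlap). The alpha weighting applied by W() is not reproduced here.

```python
# Toy mutation data (made up): each gene maps to the set of patients
# in which it is mutated.
mutation2patients = {
    "TP53": {"p1", "p2", "p3"},
    "KRAS": {"p4", "p5"},
    "EGFR": {"p5", "p6"},
}

# Inverse mapping: each patient maps to the set of genes mutated in it.
patient2mutations = {}
for gene, patients in mutation2patients.items():
    for patient in patients:
        patient2mutations.setdefault(patient, set()).add(gene)


def in_matrix(patient2mutations, gene, patient):
    """Hypothetical accessor: 1 if `gene` is mutated in `patient`, else 0."""
    return 1 if gene in patient2mutations.get(patient, set()) else 0


def dendrix_weight(mutation2patients, gene_set):
    """Unweighted Dendrix weight: coverage minus coverage overlap.

    Coverage is the number of patients with a mutation in at least one gene
    of the set; coverage overlap is the number of mutations beyond the first
    in each covered patient.
    """
    covered = set()
    total_mutations = 0
    for gene in gene_set:
        covered |= mutation2patients[gene]
        total_mutations += len(mutation2patients[gene])
    overlap = total_mutations - len(covered)
    return len(covered) - overlap
```

In the toy data, patient p5 carries mutations in both KRAS and EGFR, so the set of all three genes covers six patients with an overlap of one.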

(Sub)type-specific mutations

(multi_dendrix.subtypes)

This module analyzes mutation data for genes (or mutation classes) that are targeted in certain (sub)types more than others. It uses Fisher’s exact test to perform the statistical test (described in multi_dendrix.subtypes.subtype_specificity()).

Requirements:
subtype_analysis(mutation_data, ...[, threshold]) Performs analysis for subtype-specific genes or mutation classes in given mutation data.
subtype_specificity(gene, patient2ty, ...) Performs a statistical test on a gene for each given (sub)type.
ty_contingency_table(ty, ty2mutations, tys, ...) Constructs the contingency table used by Fisher’s exact test for subtype-specific mutation analysis.
keep_significant(gene2specificity, threshold) Removes all associations in the input dictionary that are not significant below a given threshold.
load_patient2ty_file(patient2ty_file) Loads a file mapping patient IDs to their respective (sub)types.
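The idea behind the subtype test can be sketched with a 2x2 contingency table (mutated or not, in the subtype or not) and a one-sided Fisher's exact test. This is a minimal illustration, not the package's implementation: the function names and arguments are hypothetical, the test is one-sided (enrichment only), and the module itself may build the table or choose the alternative differently.

```python
from math import comb


def contingency_table(subtype, gene, patient2ty, mutation2patients):
    """Return (a, b, c, d) for one gene and one subtype.

    a: mutated, in subtype       b: mutated, other subtypes
    c: not mutated, in subtype   d: not mutated, other subtypes
    """
    mutated = mutation2patients[gene]
    a = b = c = d = 0
    for patient, ty in patient2ty.items():
        in_type = ty == subtype
        is_mutated = patient in mutated
        if is_mutated and in_type:
            a += 1
        elif is_mutated:
            b += 1
        elif in_type:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null with fixed margins."""
    mutated_total = a + b   # all mutated patients
    subtype_total = a + c   # all patients in the subtype
    n = a + b + c + d
    upper = min(mutated_total, subtype_total)
    tail = sum(
        comb(mutated_total, x) * comb(n - mutated_total, subtype_total - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(n, subtype_total)
```

For example, if a gene is mutated in all three patients of one subtype and in none of three patients of another, the table is (3, 0, 0, 3) and the one-sided p-value is 1/20.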

Permute mutation data

(multi_dendrix.permute.mutation_data)

This module includes functions for permuting mutation data. Permuted mutation matrices are used to calculate the statistical significance of collections identified by Multi-Dendrix. Mutation data is first represented as a bipartite graph, where edges represent the mutation of a particular gene (or mutation class) in a particular patient. The method for permuting the data is described in permute_mutation_data.permute_mutation_data(). Note that the mutation data provided as input to this module should be restricted to only the genes and patients used by the Multi-Dendrix algorithm after processing.

Requirements:
permute_mutation_data(G, genes, patients[, Q]) Permutes the given mutation data stored in bipartite graph G=(V, E) by performing | E | * Q edge swaps.
construct_mutation_graph(mutation2patients, ...) Converts mutation data stored as dictionaries into a bipartite NetworkX graph.
bipartite_double_edge_swap(G, genes, patients) A modified version of the double_edge_swap function in NetworkX to preserve the bipartite structure of the graph.
graph_to_mutation_data(H, genes, patients) Converts a bipartite NetworkX graph representing mutations in genes in different patients into the mutation data format used by Multi-Dendrix.
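The edge-swap idea can be sketched without NetworkX by treating the bipartite graph as a set of (gene, patient) edges. The code below is an illustration under stated assumptions rather than the package's implementation: bipartite_edge_swaps, seed, and max_tries are hypothetical names, and the retry limit is an arbitrary safeguard. Each accepted swap exchanges the patients of two edges, so every gene keeps its number of mutations and every patient keeps its number of mutated genes.

```python
import random


def bipartite_edge_swaps(edges, Q=1, seed=0, max_tries=None):
    """Attempt |E| * Q double edge swaps on a set of (gene, patient) edges.

    A swap turns (g1, p1), (g2, p2) into (g1, p2), (g2, p1). It is rejected
    if it would create a duplicate edge, which keeps the graph simple and
    bipartite and preserves all node degrees.
    """
    rng = random.Random(seed)
    edge_list = sorted(edges)
    edge_set = set(edge_list)
    num_swaps = len(edge_list) * Q
    if max_tries is None:
        max_tries = 100 * num_swaps  # arbitrary cap so the loop always ends
    swaps = tries = 0
    while swaps < num_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (g1, p1), (g2, p2) = edge_list[i], edge_list[j]
        if g1 == g2 or p1 == p2:
            continue  # swapping would change nothing
        if (g1, p2) in edge_set or (g2, p1) in edge_set:
            continue  # swapping would create a duplicate edge
        edge_set -= {(g1, p1), (g2, p2)}
        edge_set |= {(g1, p2), (g2, p1)}
        edge_list[i], edge_list[j] = (g1, p2), (g2, p1)
        swaps += 1
    return edge_set
```

Because genes always stay in the first position of each pair and patients in the second, no gene-gene or patient-patient edge can appear.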

Matrix permutation test

(multi_dendrix.evaluate.matrix)

This module contains the functions for performing the matrix permutation (statistical significance) test. The matrix permutation test uses an empirical distribution of mutation data (generated by the permute mutation data module above) to evaluate the statistical significance of a collection of gene sets identified by Multi-Dendrix. Note that the matrix permutation test does not filter genes or patients (e.g. with the multi_dendrix.white_and_blacklisting() function). The mutation data provided to the permute_mutation_data.permute_mutation_data() function should already be restricted to the genes / patients used as input to Multi-Dendrix.

Requirements:
matrix_permutation_test(W_prime, ...) Computes the statistical significance of a collection found by Multi-Dendrix using an empirical distribution of mutation data.
load_w_prime(collection_file) Loads the weights and sums them from a Multi-Dendrix output file.
load_permuted_matrices(input_dir) Loads all files as mutation data in a directory of permuted mutation data.
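The final step of a permutation test of this kind reduces to comparing the observed weight with the weights obtained on the permuted matrices. A minimal sketch follows, assuming the p-value is the fraction of permuted weights that are at least as large as the observed one; the function name is hypothetical, and the package may use a different comparison or add a pseudocount.

```python
def empirical_p_value(observed, permuted_values):
    """Fraction of permuted values that are >= the observed value."""
    if not permuted_values:
        raise ValueError("at least one permuted value is required")
    at_least_as_large = sum(1 for v in permuted_values if v >= observed)
    return at_least_as_large / len(permuted_values)
```

With N permuted matrices, the smallest nonzero p-value this estimate can report is 1/N, so the number of permutations bounds the resolution of the test.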

Permute protein-protein interaction network

(multi_dendrix.permute.ppi_network)

This module permutes a protein-protein interaction network while retaining its degree distribution. Permuted PPI networks are used to empirically determine the enrichment of collections (and individual gene sets) for protein interactions (as described in the Direct Interactions Test below).

Requirements:
permute_network(G, Q) Permutes the given graph G=(V, E) by performing | E | * Q edge swaps.
load_network(network_file) Wrapper for the NetworkX read_edgelist function.

Direct interactions test

(multi_dendrix.evaluate.network)

This module implements the Direct Interactions Test as described in the Multi-Dendrix paper. For a given collection of gene sets, the Direct Interactions Test assesses its enrichment for interactions within gene sets in iRefIndex compared to an empirical distribution of permuted iRefIndex networks. Optionally, the average pairwise distance can be used as the test statistic instead of the number of interactions.

Requirements:
evaluate_collection(collection, G, Hs[, ...]) Given a collection of gene sets, the original network, and a set of permuted networks, calculates the empirical p-value using the direct interactions statistic (or average pairwise distance statistic).
direct_interactions_test(collection, G, Hs) Performs the direct interactions test on the collection of gene sets.
direct_interactions_stat(network, collection) Calculates the difference of the normalized number of interactions among genes within the same gene set and genes within different gene sets.
count_interactions(network, pairs) Given a PPI and a list of gene pairs, returns the number of gene pairs that interact.
interact(network, g1, g2) Returns true if g1 interacts with g2 in PPI.
eval_gene_sets_by_interactions(collection, G, Hs) Evaluate each of the t individual gene sets by comparing the number of interactions within a gene set in the original PPI network G to the permuted networks Hs.
num_interactions_in_gene_set(network, gset) Counts the number of interactions among genes in the given set.
avg_pair_dist_test(collection, G, Hs) Performs the average pairwise distance test on the collection of gene sets.
avg_pair_dist_ratio(network, collection) Calculates the ratio of the average pairwise distance among genes within the same gene set and genes within different gene sets.
eval_gene_sets_by_dist(collection, G, Hs) Evaluate each of the t individual gene sets by comparing the average pairwise distance within a gene set in the original PPI network G to the permuted networks Hs.
dist(network, g1, g2) Returns the length of the shortest path between g1 and g2 in PPI. If no path exists, returns 1e100.
sum_dist(network, pairs) Given a PPI and a list of gene pairs, returns the sum of the shortest paths between each pair.
avg_pair_dist_of_gene_set(network, gset) Calculates the average pairwise distance among genes in the given set.
remove_name_annotation(genes) Removes annotation of genes or mutation classes for CNAs.
load_collection(collection_file) Extracts the gene sets from a collection file output by Multi-Dendrix.
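The direct interactions statistic can be illustrated on a toy network stored as a set of unordered gene pairs. This sketch is not the package's code: the function names are hypothetical, gene sets are assumed to be disjoint, and "normalized" is taken to mean dividing each interaction count by the number of candidate pairs, which is an assumption about the normalization.

```python
from itertools import combinations


def pair_interacts(network, g1, g2):
    """True if the unordered pair (g1, g2) is an edge of the toy network."""
    return frozenset((g1, g2)) in network


def interactions_stat_sketch(network, collection):
    """Fraction of interacting pairs within gene sets minus the fraction
    of interacting pairs between different gene sets."""
    # Map each gene to the index of its gene set (assumes disjoint sets).
    gene2set = {gene: i for i, gset in enumerate(collection) for gene in gset}
    within, between = [], []
    for g1, g2 in combinations(sorted(gene2set), 2):
        same_set = gene2set[g1] == gene2set[g2]
        (within if same_set else between).append((g1, g2))

    def fraction(pairs):
        if not pairs:
            return 0.0
        return sum(pair_interacts(network, a, b) for a, b in pairs) / len(pairs)

    return fraction(within) - fraction(between)
```

A positive value means genes interact more often inside gene sets than across them. Comparing the value on the original network with the values on permuted networks gives the empirical p-value.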

Core modules

(multi_dendrix.core_modules)

This module identifies the “core modules” from a set of Multi-Dendrix runs. In other words, given multiple collections of gene sets identified by Multi-Dendrix, this module outputs sets of genes that appear together in more than S (default: S = 1) parameter settings.

Requirements:
extract(collections, stability_threshold) Extracts the core modules from a set of collections output by Multi-Dendrix.
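One way to realize the description above is to count, for every pair of genes, the number of collections in which the pair shares a gene set, keep the pairs whose count exceeds the threshold, and report the connected components of the resulting graph. The sketch below follows that reading. The function name is hypothetical, and the use of connected components is an assumption, not a statement about how extract() works.

```python
from collections import Counter
from itertools import combinations


def core_modules_sketch(collections, stability_threshold=1):
    """Group genes that share a gene set in more than
    `stability_threshold` collections."""
    # Count each gene pair at most once per collection.
    pair_counts = Counter()
    for collection in collections:
        pairs = set()
        for gene_set in collection:
            pairs.update(combinations(sorted(set(gene_set)), 2))
        pair_counts.update(pairs)

    # Keep only the stable pairs as edges of an undirected graph.
    adjacency = {}
    for (g1, g2), count in pair_counts.items():
        if count > stability_threshold:
            adjacency.setdefault(g1, set()).add(g2)
            adjacency.setdefault(g2, set()).add(g1)

    # Connected components via depth-first search.
    modules, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adjacency[gene] - component)
        seen |= component
        modules.append(sorted(component))
    return modules
```

Genes that never pair stably with any other gene are left out of the result, since they have no edges in the thresholded graph.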

Output functions

(multi_dendrix.output)

This module contains many of the output functions used as part of the Multi-Dendrix pipeline. Its purpose is to convert Multi-Dendrix results into text and HTML output. In the future, I hope to add functions for rendering mutation matrices as SVGs.

Requirements: