Getting started

Running Multi-Dendrix

Included in the Multi-Dendrix package is a script called run_multi_dendrix.sh. This script runs the full Multi-Dendrix Pipeline on the GBM(2008) mutation data provided in examples/mutation_data. To get started, I suggest copying this script and adapting the copy to your own mutation data and parameters. For additional example scripts, see Example uses of the Multi-Dendrix pipeline, which also includes output from users of Multi-Dendrix.

Improving runtime

There are a few bottlenecks when running Multi-Dendrix that can be alleviated with some planning, in particular by doing some of the computation in advance.

Run Multi-Dendrix on a multi-core machine

Because Multi-Dendrix uses IBM’s CPLEX, it can take advantage of however many cores your machine has. Thus, for larger jobs, running on a machine with multiple cores should decrease the runtime, ideally in proportion to the number of cores.

Set the parameters of Multi-Dendrix carefully

In general, larger values of t, the number of gene sets, will cause Multi-Dendrix to take longer to complete than larger values of kmax, the maximum gene set size. I would start by running with kmax = 3 and t = 2 just to get a sense of how quickly CPLEX can solve the ILP. If CPLEX is slow for large values of t and kmax, try setting kmin, the minimum gene set size, close to kmax to reduce the space of constraints.

The parameters delta and lambda, which set the number of overlaps allowed between gene sets and limit the number of gene sets a gene can be a member of, are relatively untested. Using them increases the number of constraints rapidly and will therefore result in much slower runtimes.

Permute data in advance

First and foremost, generate the permuted PPI networks and permuted mutation data before running the Multi-Dendrix Pipeline. Permuting the provided iRefIndex PPI network once can take up to 20 minutes, so trying to achieve statistical significance to P < 0.01 by permuting 100 networks sequentially can be a real hassle. Instead, permute the networks and mutation data first, and then provide the directories to the Multi-Dendrix Pipeline using the --permuted_networks_dir and --permuted_matrices_dir options.
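To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It is not part of Multi-Dendrix: the function name and the n_workers parameter are hypothetical, and it assumes every permutation costs the same and that the permutations are independent of one another. It uses the 20-minute worst case quoted above.

```python
import math


def permutation_wall_time_minutes(n_permutations, minutes_per_permutation, n_workers=1):
    """Estimate wall-clock minutes for a batch of independent permutations.

    Assumes every permutation takes the same time and that n_workers
    permutations can run side by side (hypothetical helper, not part of
    Multi-Dendrix).
    """
    if n_permutations < 0 or n_workers < 1:
        raise ValueError("need n_permutations >= 0 and n_workers >= 1")
    # Each round runs up to n_workers permutations at once.
    rounds = math.ceil(n_permutations / n_workers)
    return rounds * minutes_per_permutation


# 100 networks at up to 20 minutes each, one after another.
sequential = permutation_wall_time_minutes(100, 20)
print(f"sequential: {sequential / 60:.1f} hours")   # sequential: 33.3 hours

# The same work spread over 10 workers, if you permute in advance.
parallel = permutation_wall_time_minutes(100, 20, n_workers=10)
print(f"10 workers: {parallel / 60:.1f} hours")     # 10 workers: 3.3 hours
```

Because the permuted files can be reused across runs, this cost is paid once rather than every time the pipeline is launched with new parameters.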

Use a compute cluster

If you have access to a compute cluster, it will be immensely useful for performing the statistical significance (matrix permutation) and direct interactions tests, because the permutation tests repeat the exact same computation on permuted versions of the mutation data / PPI network hundreds (if not thousands) of times. I don’t include any scripts for harnessing the power of a cluster, but if you’re interested and need help, contact me and I may be able to provide assistance.
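Since the permutations are independent, one simple way to spread them over a cluster is to give each job its own contiguous range of permutation indices. The Python sketch below shows only that bookkeeping. It is scheduler-agnostic and not part of Multi-Dendrix: split_into_batches and its arguments are hypothetical names, and how each job actually produces its permutations is left to you.

```python
from typing import List


def split_into_batches(n_permutations: int, n_jobs: int) -> List[range]:
    """Divide permutation indices 0..n_permutations-1 into n_jobs contiguous batches.

    Batch sizes differ by at most one. Hypothetical helper for illustration.
    """
    if n_permutations < 0 or n_jobs < 1:
        raise ValueError("need n_permutations >= 0 and n_jobs >= 1")
    base, extra = divmod(n_permutations, n_jobs)
    batches = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs take one additional permutation each.
        size = base + (1 if job < extra else 0)
        batches.append(range(start, start + size))
        start += size
    return batches


# Example: 1000 permutations over 8 cluster jobs (125 permutations each).
for job_id, batch in enumerate(split_into_batches(1000, 8)):
    print(f"job {job_id}: permutations {batch.start} to {batch.stop - 1}")
```

Using the permutation index as the random seed, or as part of the output file name, is one way to keep the jobs from overwriting each other's results. All jobs can then write into the same directory.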