.. _getting_started:


***************
Getting started
***************

.. _installing-docdir:

Running Multi-Dendrix
=============================
The Multi-Dendrix package includes a script called *run_multi_dendrix.sh* that runs the full :doc:`pipeline` on the GBM(2008) mutation data provided in ``examples/mutation_data``. To get started, I would copy this script and edit the copy to use your own mutation data and parameters. For additional example scripts, see :doc:`examples`, which also includes output from users of Multi-Dendrix.

Improving runtime
=============================
There are a few bottlenecks when running Multi-Dendrix that can be alleviated by doing some computation in advance.

Run Multi-Dendrix on a multi-core machine
******************************************
Because Multi-Dendrix uses IBM's CPLEX, it can take advantage of however many cores your machine has. Thus, for larger jobs, running on a machine with multiple cores will generally decrease the runtime, although the speedup is not guaranteed to scale linearly with the number of cores.

Set the parameters of Multi-Dendrix carefully
*********************************************
In general, larger values of the number *t* of gene sets will cause Multi-Dendrix to take longer to complete than larger values of the maximum gene set size *kmax*. I would start by running with *kmax* = 3 and *t* = 2 to get a sense of how quickly CPLEX can solve the integer linear program (ILP). If CPLEX is slow for large values of *t* and *kmax*, try setting *kmin* close to *kmax* to reduce the space of constraints.
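
The advice to start small can be turned into a simple timing probe. The following is a minimal sketch, not part of Multi-Dendrix: ``probe_runtimes`` and the ``solve`` callable are hypothetical names, and ``solve(t, kmax)`` stands in for whatever command or function you use to launch one Multi-Dendrix run. It tries settings from small to large and stops once a single run exceeds a time budget.

.. code-block:: python

    import time

    def probe_runtimes(solve, settings, budget_seconds):
        """Time solve(t, kmax) on each setting in order.

        Stops after the first run that takes longer than budget_seconds.
        `solve` is a hypothetical placeholder for one Multi-Dendrix run.
        """
        timings = []
        for t, kmax in settings:
            start = time.perf_counter()
            solve(t, kmax)
            elapsed = time.perf_counter() - start
            timings.append((t, kmax, elapsed))
            if elapsed > budget_seconds:
                break  # larger settings are likely to be slower still
        return timings

    # Begin with t = 2 and kmax = 3, then grow the problem gradually.
    settings = [(2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]

Calling ``probe_runtimes(my_run, settings, budget_seconds=600)`` with your own ``my_run`` function returns one ``(t, kmax, seconds)`` tuple per completed run, which gives a rough picture of how runtime grows before you commit to a large job.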

The parameters *delta* (the number of overlaps allowed between gene sets) and *lambda* (the number of gene sets a gene can be a member of) are relatively untested. Using them rapidly increases the number of constraints and therefore results in much slower runtimes.

Permute data in advance
************************
The most important step is to generate the permuted PPI networks and permuted mutation data before running the :doc:`pipeline`. Permuting the provided iRefIndex PPI network *once* can take up to 20 minutes, so trying to achieve statistical significance to *P* < 0.01 by permuting 100 networks sequentially is very slow. Instead, permute the networks and mutation data first, and then provide the directories to the :doc:`pipeline` using the ``--permuted_networks_dir`` and ``--permuted_matrices_dir`` options.
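
To see why about 100 permutations are needed for *P* < 0.01, and what that costs when run sequentially, the arithmetic can be sketched as follows. This assumes the common empirical *P*-value convention of adding one to both the numerator and the denominator, which may differ from what Multi-Dendrix computes; the function name and the scores are illustrative only.

.. code-block:: python

    def empirical_p_value(observed, permuted_scores):
        """Share of permuted scores >= observed, with an add-one correction.

        The add-one convention is an assumption made for this sketch.
        """
        at_least = sum(1 for s in permuted_scores if s >= observed)
        return (at_least + 1) / (len(permuted_scores) + 1)

    # Best case: no permuted score reaches the observed score.
    num_permutations = 100
    best_p = empirical_p_value(10.0, [1.0] * num_permutations)
    print(best_p < 0.01)  # True, since 1/101 is just under 0.01

    # Cost of permuting the networks one after another.
    minutes_per_network = 20  # upper bound quoted above
    sequential_hours = num_permutations * minutes_per_network / 60
    print(round(sequential_hours, 1))  # 33.3

Under these assumptions, a sequential run would spend more than a day on network permutation alone, which is why generating the permuted data once and reusing it pays off.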

Use a compute cluster
*********************
If you have access to a compute cluster, it will be immensely useful for performing the statistical significance (matrix permutation) and direct interactions tests. That is because the permutation tests repeat exactly the same computation on permuted versions of the mutation data / PPI network hundreds (if not thousands) of times. I don't include any scripts for harnessing the power of a cluster, but if you're interested and need help, contact me and I may be able to provide assistance.
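
Because each permutation is independent of the others, the work divides cleanly across cluster jobs. The following is a minimal sketch of one way to assign permutation indices to jobs. ``split_permutations`` is a hypothetical helper, not part of Multi-Dendrix, and how each batch is actually submitted depends on your cluster's scheduler.

.. code-block:: python

    def split_permutations(num_permutations, num_jobs):
        """Divide permutation indices 0..num_permutations-1 into num_jobs batches.

        Batch sizes differ by at most one. Hypothetical helper for illustration.
        """
        base, extra = divmod(num_permutations, num_jobs)
        batches = []
        start = 0
        for job in range(num_jobs):
            # The first `extra` jobs each take one additional permutation.
            size = base + (1 if job < extra else 0)
            batches.append(range(start, start + size))
            start += size
        return batches

    # Example: 1000 permutations spread over 8 jobs.
    batches = split_permutations(1000, 8)
    for job_id, batch in enumerate(batches):
        print(job_id, batch.start, batch.stop)

Each job would then permute and test only the indices in its own batch, writing its results to a shared directory so they can be combined afterwards.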