Manual Installation¶
If you wish to install HATCHet
directly from this repository, the steps are a bit more involved.
Note that the complexity of manual installation is largely because the compute-cn
step (determination of
allele-specific copy numbers) of the HATCHet pipeline uses custom-written C++11 code that uses the
Gurobi optimizer. If you do not have a valid Gurobi license (though it is
easily available for users in academia), then the C++ parts of
HATCHet do not necessarily need to be compiled, and you can read the
Compiling HATCHet without the built-in Gurobi optimizer section of this page.
Compiling HATCHet with the built-in Gurobi optimizer¶
The core optimization module of HATCHet is written in C++11 and thus requires a modern C++ compiler (GCC >= 4.8.1, or Clang).
As long as you have a recent version of GCC or Clang installed, setuptools
should automatically be able to download a
recent version of cmake
and compile the Hatchet code into a working package.
The installation process can be broken down into the following steps:
Get Gurobi (v9.0.2)
The coordinate-method applied by HATCHet is based on several integer linear programming (ILP) formulations. Gurobi is a commercial ILP solver with two licensing options: (1) a single-host license where the license is tied to a single computer and (2) a network license for use in a compute cluster (using a license server in the cluster). Both options are freely and easily available for users in academia. Download Gurobi for your specific platform.
Set GUROBI_HOME environment variable
$ export GUROBI_HOME=/path/to/gurobi902
Set
GUROBI_HOME
to where you download Gurobi. HereXXX
is the 3-digit version of gurobi.Build Gurobi
$ cd "${GUROBI_HOME}" $ cd linux64/src/build/ $ make $ cp libgurobi_c++.a ../../lib
Substitute
mac64
forlinux64
if using the Mac OSX platform.Create a new venv/conda environment for Hatchet
Hatchet
is a Python 3 package. Unless you want to compile/install it in your default Python 3 environment, you will want to create either a new Conda environment for Python 3 and activate it:conda create --name hatchet python=3.8 conda activate hatchet
or use
virtualenv
throughpip
:python3 -m pip virtualenv env source env/bin/activate
Install basic packages
It is highly recommended that you upgrade your
pip
andsetuptools
versions to the latest, using:pip install -U pip pip install -U setuptools
Build and install HATCHet
Execute the following commands from the root of HATCHet’s repository.
$ pip install .
NOTE: If you experience a failure of compilation with an error message like:
_undefined reference to symbol 'pthread_create@@GLIBC_2.2.5'_.
you may need to set
CXXFLAGS
to-pthread
before invoking the command:$ CXXFLAGS=-pthread pip install .
When the compilation process fails or when the environment has special requirements, you may have to manually specify the required paths to Gurobi by following the detailed intructions.
Install required utilities
For reading BAM files, read counting, allele counting, and SNP calling, you need to install SAMtools, BCFtools, and tabix as well as mosdepth. If you want to perform reference-based phasing, you must also install shapeit, picard, and bgzip. The easiest way to install these is via
conda
, as all are available from thebioconda
channel (exceptshapeit
which is available from channeldranew
).
Compiling HATCHet without the built-in Gurobi optimizer¶
If you wish to use an alternate ILP optimizer, then you do not need a C++ compiler.
In this case, set the HATCHET_BUILD_NOEXT
environment variable to 1
(using export HATCHET_BUILD_NOEXT=1
),
set the environment variable HATCHET_COMPUTE_CN_SOLVER
to a Pyomo-supported solver (see the
Using a different Pyomo-supported solver section of the README for more details)
and proceed directly to step (4) above.