Complete demo of the entire HATCHet pipeline
: ex: set ft=markdown ;:<<'```shell' #
The following HATCHet demo is a guided example of the complete HATCHet pipeline, starting from an exemplary dataset of publicly available tumor and matched-normal BAM files. From this directory, simply run this file through BASH as a standard script to execute the complete demo. The demo is commented throughout, so it can also be read as a guided walkthrough of a complete execution.
Requirements and setup
The demo requires that HATCHet has been successfully installed in the current Python environment.
Please make sure that you can successfully run the required dependencies samtools, bcftools, tabix, and mosdepth.
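A quick way to confirm this before starting is a small PATH check. The `check_deps` function below is a hypothetical helper sketched for this purpose, not part of HATCHet:

```shell
# Hypothetical helper: check that each required external tool is on the PATH.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "ERROR: required tool '$tool' not found in PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_deps samtools bcftools tabix mosdepth || echo "Install the missing tools before running the demo."
```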
The demo downloads all of the required files and terminates in under 20 minutes on a machine satisfying the minimum requirements.
We make sure that the working directory is the directory containing the demo, and we remove any previous results.
cd $( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )
rm -rf rdr/ baf/ snps/ bb/ bbc/ plots/ results/ summary/
:<<'```shell' # Ignore this line
We also instruct the demo to terminate in case of errors and to print a trace of the execution via the following commands:
set -e
set -o xtrace
PS4='[\t]'
:<<'```shell' # Ignore this line
Downloading the data
The demo automatically downloads the required tumor and matched-normal BAM files into the data folder.
# Creating data folder
mkdir -p data
# Downloading matched-normal BAM file
echo "Downloading matched-normal BAM file from Zenodo, please be patient as downloading time may vary."
curl -L 'https://zenodo.org/record/4046906/files/normal.bam?download=1' > data/normal.bam
curl -L 'https://zenodo.org/record/4046906/files/normal.bam.bai?download=1' > data/normal.bam.bai
# Downloading BAM file of tumor sample 1
echo "Downloading BAM file of tumor sample 1 from Zenodo, please be patient as downloading time may vary."
curl -L 'https://zenodo.org/record/4046906/files/bulk_03clone1_06clone0_01normal.sorted.bam?download=1' > data/bulk_03clone1_06clone0_01normal.sorted.bam
curl -L 'https://zenodo.org/record/4046906/files/bulk_03clone1_06clone0_01normal.sorted.bam.bai?download=1' > data/bulk_03clone1_06clone0_01normal.sorted.bam.bai
# Downloading BAM file of tumor sample 2
echo "Downloading BAM file of tumor sample 2 from Zenodo, please be patient as downloading time may vary."
curl -L 'https://zenodo.org/record/4046906/files/bulk_08clone1_Noneclone0_02normal.sorted.bam?download=1' > data/bulk_08clone1_Noneclone0_02normal.sorted.bam
curl -L 'https://zenodo.org/record/4046906/files/bulk_08clone1_Noneclone0_02normal.sorted.bam.bai?download=1' > data/bulk_08clone1_Noneclone0_02normal.sorted.bam.bai
# Downloading BAM file of tumor sample 3
echo "Downloading BAM file of tumor sample 3 from Zenodo, please be patient as downloading time may vary."
curl -L 'https://zenodo.org/record/4046906/files/bulk_Noneclone1_09clone0_01normal.sorted.bam?download=1' > data/bulk_Noneclone1_09clone0_01normal.sorted.bam
curl -L 'https://zenodo.org/record/4046906/files/bulk_Noneclone1_09clone0_01normal.sorted.bam.bai?download=1' > data/bulk_Noneclone1_09clone0_01normal.sorted.bam.bai
:<<'```shell' # Ignore this line
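Before continuing, it can be worth verifying that every download completed. The `verify_files` function below is a hypothetical helper that simply checks that each file exists and is non-empty; for a stronger check, `samtools quickcheck` also validates the BAM format itself:

```shell
# Hypothetical helper: fail if any given file is missing or empty.
verify_files() {
    for f in "$@"; do
        [ -s "$f" ] || { echo "ERROR: $f is missing or empty" >&2; return 1; }
    done
}

verify_files data/*.bam data/*.bam.bai || echo "Re-run the failed downloads before continuing."
```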
Next, the corresponding reference genome is downloaded and unpacked:
echo "Downloading human reference genome, please be patient as downloading time may vary."
curl -L https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz | gzip -d > data/hg19.fa
samtools faidx data/hg19.fa
samtools dict data/hg19.fa > data/hg19.dict
:<<'```shell' # Ignore this line
Configuring the HATCHet execution
We follow the template of the HATCHet script.
We specify the correct path to the reference genome, the output folder, and the other required flags:
echo '[run]' > hatchet.ini
echo 'genotype_snps=True' >> hatchet.ini
echo 'count_alleles=True' >> hatchet.ini
echo 'count_reads=True' >> hatchet.ini
echo 'combine_counts=True' >> hatchet.ini
echo 'cluster_bins=True' >> hatchet.ini
echo 'plot_bins=True' >> hatchet.ini
echo 'compute_cn=True' >> hatchet.ini
echo 'plot_cn=True' >> hatchet.ini
echo 'reference=data/hg19.fa' >> hatchet.ini
echo 'output=output/' >> hatchet.ini
# Count the available processors (/proc/cpuinfo is Linux-only; fall back to sysctl on macOS)
j=$(grep -c ^processor /proc/cpuinfo 2>/dev/null || sysctl -n hw.ncpu)
echo "processes=${j}" >> hatchet.ini
echo 'chromosomes="chr22"' >> hatchet.ini
:<<'```shell' # Ignore this line
We specify the path to the matched-normal BAM file
echo 'normal=data/normal.bam' >> hatchet.ini
:<<'```shell' # Ignore this line
We specify the list of paths to the tumor BAM files and corresponding names
echo 'bams=data/bulk_03clone1_06clone0_01normal.sorted.bam data/bulk_08clone1_Noneclone0_02normal.sorted.bam data/bulk_Noneclone1_09clone0_01normal.sorted.bam' >> hatchet.ini
echo 'samples=TumorSample1 TumorSample2 TumorSample3' >> hatchet.ini
:<<'```shell' # Ignore this line
We specify the min/max coverage for the genotype_snps step
echo '[genotype_snps]' >> hatchet.ini
echo 'mincov=8' >> hatchet.ini
echo 'maxcov=300' >> hatchet.ini
:<<'```shell' # Ignore this line
We specify the reference genome version and the chr notation
echo 'reference_version=hg19' >> hatchet.ini
echo 'chr_notation=True' >> hatchet.ini
:<<'```shell' # Ignore this line
We specify mincov/maxcov for the count_alleles step
echo '[count_alleles]' >> hatchet.ini
echo 'mincov=8' >> hatchet.ini
echo 'maxcov=300' >> hatchet.ini
:<<'```shell' # Ignore this line
We specify the minimum number of total and SNP-covering reads in each bin for the combine_counts step
echo '[combine_counts]' >> hatchet.ini
echo 'msr=3000' >> hatchet.ini
echo 'mtr=5000' >> hatchet.ini
:<<'```shell' # Ignore this line
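For reference, the sequence of echo commands above assembles a hatchet.ini equivalent to the following (the processes value depends on the detected core count; 8 is shown here only as an example):

```ini
[run]
genotype_snps=True
count_alleles=True
count_reads=True
combine_counts=True
cluster_bins=True
plot_bins=True
compute_cn=True
plot_cn=True
reference=data/hg19.fa
output=output/
processes=8
chromosomes="chr22"
normal=data/normal.bam
bams=data/bulk_03clone1_06clone0_01normal.sorted.bam data/bulk_08clone1_Noneclone0_02normal.sorted.bam data/bulk_Noneclone1_09clone0_01normal.sorted.bam
samples=TumorSample1 TumorSample2 TumorSample3

[genotype_snps]
mincov=8
maxcov=300
reference_version=hg19
chr_notation=True

[count_alleles]
mincov=8
maxcov=300

[combine_counts]
msr=3000
mtr=5000
```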
Running HATCHet
python -m hatchet run hatchet.ini
exit $?