# Demo complete for the entire HATCHet pipeline

: ex: set ft=markdown ;:<<'```shell' #

The following HATCHet demo represents a guided example of the complete HATCHet pipeline starting from an exemplary dataset of tumour and matched normal
[BAM files](https://doi.org/10.5281/zenodo.4046906) publicly available. From this directory, simply run this file through BASH as a standard script to run
the complete demo. The demo can also be considered as a guided example of a complete execution and is correspondingly commented.

## Requirements and set up

The demo requires that HATCHet has been succesfully installed in the current python environment.
Please make sure that you can succesfully run the required dependencies `samtools`, `bcftools`, `tabix`, and `mosdepth`.
The demo includes the downloading of all the required files and will terminate in <20 minutes on machine with minimum requirements satisfied.

We guarantee that the running directory in the same directory of the demo and we remove previous results.

```shell
cd $( cd "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )
rm -rf rdr/ baf/ snps/ bb/ bbc/ plots/ results/ summary/
:<<'```shell' # Ignore this line
```

We also ask the demo to terminate in case of errors and to print a trace of the execution by the following commands
```shell
set -e
set -o xtrace
PS4='[\t]'
:<<'```shell' # Ignore this line
```

## Downloading of data

The demo automatically downloads the required tumor and matched-normal BAM files in `data` folder.

```shell
# Creating data folder
mkdir -p data

# Downloading matched-normal BAM file
echo "Downloading matched-normal BAM file from Zenodo, please be patient as downloading time may vary."
curl -L 'https://zenodo.org/record/4046906/files/normal.bam?download=1' > data/normal.bam
curl -L 'https://zenodo.org/record/4046906/files/normal.bam.bai?download=1' > data/normal.bam.bai

# Downloading BAM file of tumor sample 1
echo "Downloading BAM file of tumor sample 1 from Zenodo, please be patient as downloading time may vary."
curl -L 'https://zenodo.org/record/4046906/files/bulk_03clone1_06clone0_01normal.sorted.bam?download=1' > data/bulk_03clone1_06clone0_01normal.sorted.bam
curl -L 'https://zenodo.org/record/4046906/files/bulk_03clone1_06clone0_01normal.sorted.bam.bai?download=1' > data/bulk_03clone1_06clone0_01normal.sorted.bam.bai

# Downloading BAM file of tumor sample 2
echo "Downloading BAM file of tumor sample 2 from Zenodo, please be patient as downloading time may vary."
curl -L 'https://zenodo.org/record/4046906/files/bulk_08clone1_Noneclone0_02normal.sorted.bam?download=1' > data/bulk_08clone1_Noneclone0_02normal.sorted.bam
curl -L 'https://zenodo.org/record/4046906/files/bulk_08clone1_Noneclone0_02normal.sorted.bam.bai?download=1' > data/bulk_08clone1_Noneclone0_02normal.sorted.bam.bai

# Downloading BAM file of tumor sample 3
echo "Downloading BAM file of tumor sample 3 from Zenodo, please be patient as downloading time may vary."
curl -L 'https://zenodo.org/record/4046906/files/bulk_Noneclone1_09clone0_01normal.sorted.bam?download=1' > data/bulk_Noneclone1_09clone0_01normal.sorted.bam
curl -L 'https://zenodo.org/record/4046906/files/bulk_Noneclone1_09clone0_01normal.sorted.bam.bai?download=1' > data/bulk_Noneclone1_09clone0_01normal.sorted.bam.bai
:<<'```shell' # Ignore this line
```

Next the corresponding reference genome is downloaded and unpacked

```shell
echo "Downloading human reference genome, please be patient as downloading time may vary."
curl -L https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz | gzip -d > data/hg19.fa
samtools faidx data/hg19.fa
samtools dict data/hg19.fa > data/hg19.dict
:<<'```shell' # Ignore this line
```

## Configuring the HATCHet's execution

We follow the template of the HATCHet's [script](../../doc/doc_fullpipeline.md#fullpipelineandtutorial).

1. We specify the correct path to the reference genome and the output folder, and other required flags
```shell
echo '[run]' > hatchet.ini
echo 'genotype_snps=True' >> hatchet.ini
echo 'count_alleles=True' >> hatchet.ini
echo 'count_reads=True' >> hatchet.ini
echo 'combine_counts=True' >> hatchet.ini
echo 'cluster_bins=True' >> hatchet.ini
echo 'plot_bins=True' >> hatchet.ini
echo 'compute_cn=True' >> hatchet.ini
echo 'plot_cn=True' >> hatchet.ini
echo 'reference=data/hg19.fa' >> hatchet.ini
echo 'output=output/' >> hatchet.ini
j=$(grep -c ^processor /proc/cpuinfo)
processes="processes=${j}"
echo $processes >> hatchet.ini
echo 'chromosomes="chr22"' >> hatchet.ini

:<<'```shell' # Ignore this line
```

2. We specify the path to the matched-normal BAM files
```shell
echo 'normal=data/normal.bam' >> hatchet.ini
:<<'```shell' # Ignore this line
```

3. We specify the list of paths to the tumor BAM files and corresponding names
```shell
echo 'bams=data/bulk_03clone1_06clone0_01normal.sorted.bam data/bulk_08clone1_Noneclone0_02normal.sorted.bam data/bulk_Noneclone1_09clone0_01normal.sorted.bam' >> hatchet.ini
echo 'samples=TumorSample1 TumorSample2 TumorSample3' >> hatchet.ini
:<<'```shell' # Ignore this line
```

4. We specify the min/max coverage for the genotpe_snps step
```shell
echo '[genotype_snps]' >> hatchet.ini
echo 'mincov=8' >> hatchet.ini
echo 'maxcov=300' >> hatchet.ini
:<<'```shell' # Ignore this line
```

5. We specify the reference genome and chr notation
```shell
echo 'reference_version=hg19' >> hatchet.ini
echo 'chr_notation=True' >> hatchet.ini
:<<'```shell' # Ignore this line
```

6. We specify mincov/maxcov for the count_alleles step
```shell
echo '[count_alleles]' >> hatchet.ini
echo 'mincov=8' >> hatchet.ini
echo 'maxcov=300' >> hatchet.ini
:<<'```shell' # Ignore this line
```

7. We specify the minimum number of total and SNP-covering reads in each bin for the combine_counts step
```shell
echo '[combine_counts]' >> hatchet.ini
echo 'msr=3000' >> hatchet.ini
echo 'mtr=5000' >> hatchet.ini
:<<'```shell' # Ignore this line
```

## Running HATCHet

```shell
python -m hatchet run hatchet.ini
exit $?
```