# genotype-snps

Given the normal BAM file, this step of HATCHet identifies heterozygous germline SNP positions. The user can restrict candidate positions to a given list (e.g., dbSNP) using the `-R, --snps` argument.

## Input

genotype-snps takes in input a sorted and index BAM file from a matched-normal sample and an indexed human reference genome (preferrably the same version as was used for alignment).

| Name | Description | Usage |
|------|-------------|-------|
| `-N`, `--normal` | A sorted-indexed BAM file | The matched normal sample |
| `-r`, `--reference` | A FASTA file | The human reference genome used for germline variant calling |

## Output

genotype-snps produces a tab-separated VCF file for each chromosome which contains a list of the genomic positions that have been identified as germline heterozygous SNPs in the matched-normal sample.

| Name | Description | Format |
|------|-------------|--------|
| `-l`, `--outputsnps` | the output file for the list of identified heterozygous germline SNPs | `#CHR POS` |

## Main parameters

| Name | Description | Usage | Default |
|------|-------------|-------|---------|
| `-R`, `--snps` | VCF files | Optional list of candidate SNP positions to consider | None* |
| `-st`, `--samtools` | Path to `bin` directory of SAMtools | The path to this direcoty needs to be specified when it is not included in `$PATH` | Path is expected in the enviroment variable `$PATH` |
| `-bt`, `--bcftools` | Path to `bin` directory of BCFtools | The path to this direcoty needs to be specified when it is not included in `$PATH` | Path is expected in the enviroment variable `$PATH` |
| `-c`, `--mincov` | Minimum coverage | Minimum number of reads that have to cover a variant to be called, the value can be increased when considering a dataset with high depth (>60x) | 0 |
| `-C`, `--maxcov` | Maximum coverage | Maximum number of reads that have to cover a variant to be called, the typically suggested value should be twice higher than expected coverage to avoid sequencing and mapping artifacts | 1000 |
| `-j`, `--processes` | Number of parallel jobs | Parallel jobs are used to consider the chromosomes in different samples on parallel. The higher the number the better the running time | 2 |

*When run as a standalone module, the no SNP list is used by default. When run as part of `hatchet run` and no `--snps` argument is supplied, the correct version of dbSNP is automatically downloaded and used to call SNPs.

## Optional parameters

| Name | Description | Usage | Default |
|------|-------------|-------|---------|
| `-v`, `--verbose`  | Verbose logging flag | When enabled, count-alleles outputs a verbose log of the executiong | Not used |
| `-q`, `--readquality` | Threshold for phred-score quality of sequencing reads | The value can be either decreased (e.g. 10) or increased (e.g. 30) to adjust the filtering of sequencing reads | 0 |
| `-Q`, `--basequality` | Threshold for phred-score quality of sequenced nucleotide bases | The value can be either decreased (e.g. 10) or increased (e.g. 30) to adjust the filtering of sequenced nucleotide bases | 11 |
| `-E`,`--newbaq` | Flag to enable `newbaq` veafute of SAMtools | When selected, the user asks SAMtools to recompute alignment of reads on the fly during SNP calling | Not used |