# genotype-snps

Given the normal BAM file, this step of HATCHet identifies heterozygous germline SNP positions. The user can restrict candidate positions to a given list (e.g., dbSNP) using the `-R, --snps` argument.

## Input

genotype-snps takes as input a sorted and indexed BAM file from a matched-normal sample and an indexed human reference genome (preferably the same version that was used for alignment).

| Name | Description | Usage |
|------|-------------|-------|
| `-N`, `--normal` | A sorted-indexed BAM file | The matched normal sample |
| `-r`, `--reference` | A FASTA file | The human reference genome used for germline variant calling |

## Output

genotype-snps produces a tab-separated VCF file for each chromosome, which contains the list of genomic positions that have been identified as germline heterozygous SNPs in the matched-normal sample.

| Name | Description | Format |
|------|-------------|--------|
| `-l`, `--outputsnps` | The output file for the list of identified heterozygous germline SNPs | `#CHR POS` |

## Main parameters

| Name | Description | Usage | Default |
|------|-------------|-------|---------|
| `-R`, `--snps` | VCF files | Optional list of candidate SNP positions to consider | None* |
| `-st`, `--samtools` | Path to `bin` directory of SAMtools | The path to this directory needs to be specified when it is not included in `$PATH` | Path is expected in the environment variable `$PATH` |
| `-bt`, `--bcftools` | Path to `bin` directory of BCFtools | The path to this directory needs to be specified when it is not included in `$PATH` | Path is expected in the environment variable `$PATH` |
| `-c`, `--mincov` | Minimum coverage | Minimum number of reads that must cover a variant for it to be called; the value can be increased when considering a dataset with high depth (>60x) | 0 |
| `-C`, `--maxcov` | Maximum coverage | Maximum number of reads that may cover a variant for it to be called; the typically suggested value is twice the expected coverage, to avoid sequencing and mapping artifacts | 1000 |
| `-j`, `--processes` | Number of parallel jobs | Parallel jobs are used to process the chromosomes of different samples in parallel. The higher the number, the shorter the running time | 2 |

*When run as a standalone module, no SNP list is used by default. When run as part of `hatchet run` and no `--snps` argument is supplied, the correct version of dbSNP is automatically downloaded and used to call SNPs.

## Optional parameters

| Name | Description | Usage | Default |
|------|-------------|-------|---------|
| `-v`, `--verbose` | Verbose logging flag | When enabled, genotype-snps outputs a verbose log of the execution | Not used |
| `-q`, `--readquality` | Threshold for phred-score quality of sequencing reads | The value can be either decreased (e.g. 10) or increased (e.g. 30) to adjust the filtering of sequencing reads | 0 |
| `-Q`, `--basequality` | Threshold for phred-score quality of sequenced nucleotide bases | The value can be either decreased (e.g. 10) or increased (e.g. 30) to adjust the filtering of sequenced nucleotide bases | 11 |
| `-E`, `--newbaq` | Flag to enable `newbaq` feature of SAMtools | When selected, the user asks SAMtools to recompute alignment of reads on the fly during SNP calling | Not used |
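
To show how the documented flags fit together, here is a minimal Python sketch that assembles an argument list for this step. It assumes the module is invoked as a `hatchet genotype-snps` subcommand (by analogy with `hatchet run`; check this against your installation). The helper name `build_genotype_snps_args` and the file names are hypothetical, and the sketch only builds the argument list without running anything. It applies the suggestion above of setting `--maxcov` to twice the expected coverage, falling back to the documented default of 1000.

```python
from typing import List, Optional


def build_genotype_snps_args(
    normal_bam: str,
    reference: str,
    output: str,
    snps: Optional[str] = None,
    expected_coverage: Optional[int] = None,
    mincov: int = 0,
    processes: int = 2,
) -> List[str]:
    """Assemble an argument list from the flags documented on this page."""
    # Assumption: the step is exposed as a subcommand of the `hatchet` entry point.
    args = ["hatchet", "genotype-snps",
            "-N", normal_bam,   # sorted and indexed matched-normal BAM
            "-r", reference,    # indexed reference FASTA
            "-l", output]       # output for the identified SNPs
    if snps is not None:
        # Optional list of candidate SNP positions (e.g. dbSNP)
        args += ["-R", snps]
    # Suggested value: twice the expected coverage; otherwise the documented default.
    maxcov = 1000 if expected_coverage is None else 2 * expected_coverage
    args += ["-c", str(mincov), "-C", str(maxcov), "-j", str(processes)]
    return args


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_genotype_snps_args(
        "normal.bam", "reference.fa", "snps/", expected_coverage=30)))
```

The resulting list can be passed to `subprocess.run` once the paths point at real files. Keeping the arguments as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.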