genotype-snps
Given the normal BAM file, this step of HATCHet identifies heterozygous germline SNP positions. The user can restrict candidate positions to a given list (e.g., dbSNP) using the -R, --snps argument.
Input
genotype-snps takes as input a sorted and indexed BAM file from a matched-normal sample and an indexed human reference genome (preferably the same version that was used for alignment).
| Name | Description | Usage |
|---|---|---|
| -N, --normal | A sorted-indexed BAM file | The matched normal sample |
| -r, --reference | A FASTA file | The human reference genome used for germline variant calling |
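
As a rough illustration of how the two inputs fit together, the following Python sketch assembles an argument list for this step. It assumes the step is launched as `hatchet genotype-snps` (an assumption, by analogy with the `hatchet run` command mentioned on this page); `build_genotype_snps_args` is a hypothetical helper that is not part of HATCHet, and the file names are placeholders.

```python
from typing import List, Optional


def build_genotype_snps_args(
    normal_bam: str,
    reference_fasta: str,
    snps: Optional[str] = None,
    processes: Optional[int] = None,
) -> List[str]:
    """Assemble a command line for genotype-snps (hypothetical helper).

    The "hatchet genotype-snps" prefix is an assumption about how the step
    is launched; the flags themselves are the ones documented on this page.
    """
    args = ["hatchet", "genotype-snps", "-N", normal_bam, "-r", reference_fasta]
    if snps is not None:
        # Optional list of candidate SNP positions (e.g., dbSNP).
        args += ["-R", snps]
    if processes is not None:
        # Number of parallel jobs.
        args += ["-j", str(processes)]
    return args


if __name__ == "__main__":
    # Placeholder paths; the command is printed, not executed.
    print(" ".join(build_genotype_snps_args("normal.bam", "reference.fa", processes=4)))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later passed to `subprocess.run`.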
Output
For each chromosome, genotype-snps produces a tab-separated VCF file listing the genomic positions that have been identified as germline heterozygous SNPs in the matched-normal sample.
| Name | Description | Format |
|---|---|---|
| -l, --outputsnps | The output file for the list of identified heterozygous germline SNPs | #CHR POS |
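
To make the `#CHR POS` layout concrete, here is a minimal Python sketch that collects chromosome and position pairs from tab-separated lines. It only assumes what is stated above (tab-separated columns, with the chromosome first and the position second, and header lines starting with `#`); `read_snp_positions` is a hypothetical helper and the example rows are made up.

```python
from typing import Iterable, List, Tuple


def read_snp_positions(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Collect (chromosome, position) pairs from tab-separated lines.

    Lines starting with '#' (headers) and blank lines are skipped. Only the
    first two columns are used, so any further VCF columns are ignored.
    """
    positions = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        positions.append((fields[0], int(fields[1])))
    return positions


if __name__ == "__main__":
    # Made-up rows for illustration only.
    example = ["#CHR\tPOS\n", "chr1\t10583\n", "chr1\t13273\n"]
    print(read_snp_positions(example))
```

Because the function accepts any iterable of lines, it can be applied directly to an open file handle for each per-chromosome output file.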
Main parameters
| Name | Description | Usage | Default |
|---|---|---|---|
| -R, --snps | VCF files | Optional list of candidate SNP positions to consider | None* |
| -st, --samtools | Path to bin directory of SAMtools | The path to this directory needs to be specified when it is not included in $PATH | Path is expected in the environment variable $PATH |
| -bt, --bcftools | Path to bin directory of BCFtools | The path to this directory needs to be specified when it is not included in $PATH | Path is expected in the environment variable $PATH |
| -c, --mincov | Minimum coverage | Minimum number of reads that must cover a variant for it to be called; the value can be increased for datasets with high depth (>60x) | 0 |
| -C, --maxcov | Maximum coverage | Maximum number of reads that may cover a variant for it to be called; the suggested value is twice the expected coverage, to avoid sequencing and mapping artifacts | 1000 |
| -j, --processes | Number of parallel jobs | Parallel jobs are used to process the chromosomes of different samples in parallel. The higher the number, the shorter the running time | 2 |
*When run as a standalone module, no SNP list is used by default. When run as part of hatchet run and no --snps argument is supplied, the correct version of dbSNP is automatically downloaded and used to call SNPs.
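
The guidance for -C, --maxcov (about twice the expected coverage) can be turned into a small calculation. The sketch below is only an illustration of that rule of thumb; `suggest_maxcov` is a hypothetical helper, and rounding up to a whole number of reads is an assumption of this example rather than documented behavior.

```python
import math


def suggest_maxcov(expected_coverage: float) -> int:
    """Return twice the expected coverage, rounded up to a whole read count.

    Hypothetical helper illustrating the rule of thumb for -C, --maxcov.
    """
    if expected_coverage <= 0:
        raise ValueError("expected_coverage must be positive")
    return math.ceil(2 * expected_coverage)


if __name__ == "__main__":
    # For a sample sequenced at roughly 30x, this suggests -C 60.
    print(suggest_maxcov(30))
```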
Optional parameters
| Name | Description | Usage | Default |
|---|---|---|---|
| -v, --verbose | Verbose logging flag | When enabled, genotype-snps outputs a verbose log of the execution | Not used |
| -q, --readquality | Threshold for phred-score quality of sequencing reads | The value can be either decreased (e.g. 10) or increased (e.g. 30) to adjust the filtering of sequencing reads | 0 |
| -Q, --basequality | Threshold for phred-score quality of sequenced nucleotide bases | The value can be either decreased (e.g. 10) or increased (e.g. 30) to adjust the filtering of sequenced nucleotide bases | 11 |
| -E, --newbaq | Flag to enable newbaq feature of SAMtools | When selected, the user asks SAMtools to recompute alignment of reads on the fly during SNP calling | Not used |
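
When choosing values for -q and -Q, it can help to recall what a phred-scaled threshold means: a score Q corresponds to an error probability of 10^(-Q/10). The Python sketch below converts a threshold into that probability; `phred_to_error_probability` is a hypothetical helper for illustration and is not part of HATCHet or SAMtools.

```python
def phred_to_error_probability(quality: float) -> float:
    """Convert a phred-scaled quality score Q into an error probability.

    Uses the standard definition Q = -10 * log10(p), i.e. p = 10 ** (-Q / 10).
    """
    if quality < 0:
        raise ValueError("quality must be non-negative")
    return 10 ** (-quality / 10)


if __name__ == "__main__":
    # Compare the example thresholds from the table: 10, 11 (default for -Q), 30.
    for q in (10, 11, 30):
        print(q, phred_to_error_probability(q))
```

Raising a threshold from 10 to 30 therefore tightens the accepted error probability from 1 in 10 to 1 in 1000.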