genotype-snps
Given the normal BAM file, this step of HATCHet identifies heterozygous germline SNP positions. The user can restrict candidate positions to a given list (e.g., dbSNP) using the -R, --snps argument.
Input
genotype-snps takes as input a sorted and indexed BAM file from a matched-normal sample and an indexed human reference genome (preferably the same version that was used for alignment).
Name | Description | Usage |
---|---|---|
-N, --normal | A sorted and indexed BAM file | The matched-normal sample |
-r, --reference | A FASTA file | The human reference genome used for germline variant calling |
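As a rough illustration of the inputs above, the following Python sketch assembles an argument list from the options documented on this page and reports index files that appear to be missing. The `hatchet genotype-snps` entry point, the helper names, and the index naming conventions (`.bai`, `.fai`) are assumptions made for illustration; they are not confirmed by this page.

```python
from pathlib import Path


def find_missing_indexes(normal_bam, reference):
    """Return the index files that appear to be missing (assumed naming)."""
    bam = Path(normal_bam)
    missing = []
    # Assumption: a BAM index is named either sample.bam.bai or sample.bai.
    if not (Path(str(normal_bam) + ".bai").exists() or bam.with_suffix(".bai").exists()):
        missing.append(str(normal_bam) + ".bai")
    # Assumption: a FASTA index is named reference.fa.fai.
    if not Path(str(reference) + ".fai").exists():
        missing.append(str(reference) + ".fai")
    return missing


def build_genotype_snps_command(normal_bam, reference, output, snps=None):
    """Assemble an argument list using the options documented on this page."""
    # Assumption: the step is exposed on the command line as `hatchet genotype-snps`.
    cmd = [
        "hatchet", "genotype-snps",
        "-N", str(normal_bam),
        "-r", str(reference),
        "-l", str(output),
    ]
    # The candidate SNP list is optional, so -R is only added when given.
    if snps is not None:
        cmd += ["-R", str(snps)]
    return cmd


# Hypothetical file names, used only to show the resulting argument list.
print(build_genotype_snps_command("normal.bam", "hg19.fa", "snps"))
```

Building the command as a list (rather than a single string) keeps paths with spaces intact if the list is later passed to `subprocess.run`.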
Output
genotype-snps produces a tab-separated VCF file for each chromosome, listing the genomic positions that have been identified as heterozygous germline SNPs in the matched-normal sample.
Name | Description | Format |
---|---|---|
-l, --outputsnps | Output file for the list of identified heterozygous germline SNPs | #CHR POS |
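To make the #CHR POS format concrete, here is a minimal Python sketch that reads such a listing into per-chromosome positions. It assumes whitespace-separated columns with the chromosome first and the position second, and treats lines starting with `#` as headers; any further columns are ignored. The function name `read_snp_positions` and the example content are hypothetical.

```python
import io
from collections import defaultdict


def read_snp_positions(handle):
    """Group SNP positions by chromosome from a '#CHR POS'-style listing."""
    positions = defaultdict(list)
    for line in handle:
        line = line.strip()
        # Skip blank lines and header/comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos = fields[0], int(fields[1])
        positions[chrom].append(pos)
    return dict(positions)


# Made-up example content, not real data.
example = io.StringIO("#CHR\tPOS\nchr1\t10583\nchr1\t13116\nchr2\t10180\n")
snps = read_snp_positions(example)
print(snps)
```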
Main parameters
Name | Description | Usage | Default |
---|---|---|---|
-R, --snps | VCF files | Optional list of candidate SNP positions to consider | None* |
-st, --samtools | Path to bin directory of SAMtools | The path to this directory needs to be specified when it is not included in $PATH | Path is expected in the environment variable $PATH |
-bt, --bcftools | Path to bin directory of BCFtools | The path to this directory needs to be specified when it is not included in $PATH | Path is expected in the environment variable $PATH |
-c, --mincov | Minimum coverage | Minimum number of reads that must cover a variant for it to be called. The value can be increased for datasets with high depth (>60x) | 0 |
-C, --maxcov | Maximum coverage | Maximum number of reads that may cover a variant for it to be called. The suggested value is typically twice the expected coverage, to avoid sequencing and mapping artifacts | 1000 |
-j, --processes | Number of parallel jobs | Parallel jobs are used to process the chromosomes of different samples in parallel. The higher the number, the shorter the running time | 2 |
*When run as a standalone module, no SNP list is used by default. When run as part of hatchet run and no --snps argument is supplied, the correct version of dbSNP is automatically downloaded and used to call SNPs.
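The coverage options can be pictured as a simple depth filter. The Python sketch below only illustrates the rule of thumb stated in the table (a maximum of about twice the expected coverage). The helper names and the example depths are hypothetical, and treating both bounds as inclusive is an assumption; the actual filtering happens inside HATCHet.

```python
def suggest_maxcov(expected_coverage):
    """Rule of thumb from the table: about twice the expected coverage."""
    return 2 * expected_coverage


def within_coverage(depth, mincov=0, maxcov=1000):
    """Depth filter; defaults mirror the table (assumed inclusive bounds)."""
    return mincov <= depth <= maxcov


# Made-up read depths at three candidate positions.
depths = {"chr1:10583": 35, "chr1:13116": 190, "chr2:10180": 4}

# For a normal sample sequenced at roughly 40x, the suggestion is 80.
maxcov = suggest_maxcov(40)
kept = [site for site, depth in depths.items()
        if within_coverage(depth, mincov=8, maxcov=maxcov)]
print(kept)
```

Here the position with depth 190 is dropped as a likely mapping artifact and the one with depth 4 is dropped for insufficient support, leaving only the position covered by 35 reads.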
Optional parameters
Name | Description | Usage | Default |
---|---|---|---|
-v, --verbose | Verbose logging flag | When enabled, genotype-snps outputs a verbose log of the execution | Not used |
-q, --readquality | Threshold for phred-score quality of sequencing reads | The value can be either decreased (e.g. 10) or increased (e.g. 30) to adjust the filtering of sequencing reads | 0 |
-Q, --basequality | Threshold for phred-score quality of sequenced nucleotide bases | The value can be either decreased (e.g. 10) or increased (e.g. 30) to adjust the filtering of sequenced nucleotide bases | 11 |
-E, --newbaq | Flag to enable the newbaq feature of SAMtools | When selected, the user asks SAMtools to recompute alignment of reads on the fly during SNP calling | Not used |
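For orientation on the quality thresholds, a Phred score Q corresponds to an error probability of 10^(-Q/10). The short Python sketch below applies this standard conversion to the threshold values mentioned in the table; it is independent of HATCHet itself, and the function name is hypothetical.

```python
def phred_to_error_probability(q):
    """Standard Phred conversion: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Threshold values mentioned in the table above.
for q in (10, 11, 30):
    print(f"Q{q}: error probability ~ {phred_to_error_probability(q):.4f}")
```

Raising a threshold from 10 to 30 therefore moves the accepted error rate from about 1 in 10 to about 1 in 1000, at the cost of discarding more reads or bases.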