combine-counts-fw

NOTE: This function (formerly called comBBo) uses the legacy fixed-width binning described in the HATCHet paper. We recommend using count-reads and combine-counts instead, which apply an adaptive binning scheme to ensure that each genomic bin has comparable BAF signal.

This step combines the read counts and the allele counts for the identified germline SNPs to compute the read-depth ratio (RDR) and B-allele frequency (BAF) of every genomic bin.
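
As a rough illustration of the two quantities, the sketch below computes an RDR and a BAF for a single bin. It is a minimal sketch under stated assumptions, not the tool's implementation: the function names are hypothetical, the RDR is assumed to be the ratio of tumor to normal read counts after each is divided by its sample's total read count, and the BAF is assumed to pool the minor-allele counts of the SNPs in the bin (the default behavior described for `--mode`).

```python
def read_depth_ratio(tumor_count, normal_count, tumor_total, normal_total):
    """Ratio of tumor to normal read counts in one bin.

    Assumption: each count is first divided by the total read count of
    its sample so that samples sequenced at different depths are comparable.
    """
    if normal_count <= 0 or tumor_total <= 0 or normal_total <= 0:
        raise ValueError("normal_count and both totals must be positive")
    return (tumor_count / tumor_total) / (normal_count / normal_total)


def pooled_baf(snp_counts):
    """Pool per-SNP allele counts in one bin into a single BAF.

    snp_counts is an iterable of (ref_count, alt_count) pairs.
    Assumption: for each SNP the smaller count is treated as the minor
    allele, and the pooled BAF is minor / (minor + major).
    """
    minor_total = 0
    major_total = 0
    for ref_count, alt_count in snp_counts:
        minor_total += min(ref_count, alt_count)
        major_total += max(ref_count, alt_count)
    total = minor_total + major_total
    if total == 0:
        raise ValueError("no reads cover the SNPs in this bin")
    return minor_total, major_total, minor_total / total


# Hypothetical bin: 200 tumor reads, 100 normal reads, equal library sizes.
rdr = read_depth_ratio(200, 100, 1000, 1000)
minor, major, baf = pooled_baf([(10, 30), (25, 15)])
```

Pooling by minor count always yields a BAF of at most 0.5, which is why the parameters below describe shifts from 0.5.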

Input

combine-counts-fw takes as input three tab-separated files:

A file of read counts for genomic bins obtained from the matched-normal sample, specified by the flag -c, --normalbins. The tab-separated file has the following fields:

Field	Description
`SAMPLE`	Name of the matched-normal sample
`CHR`	Name of a chromosome
`START`	Starting genomic position of a genomic bin in `CHR`
`END`	Ending genomic position of a genomic bin in `CHR`
`COUNT`	Count of sequencing reads in the corresponding bin

A file of read counts for genomic bins obtained from all the tumor samples, specified by the flag -C, --tumorbins. The tab-separated file has the following fields:

Field	Description
`SAMPLE`	Name of a tumor sample
`CHR`	Name of a chromosome
`START`	Starting genomic position of a genomic bin in `CHR`
`END`	Ending genomic position of a genomic bin in `CHR`
`COUNT`	Count of sequencing reads in the corresponding bin

A file of allele counts for heterozygous germline SNPs obtained from all the tumor samples, specified by the flag -B, --tumorbafs. The tab-separated file has the following fields:

Field	Description
`SAMPLE`	Name of a tumor sample
`CHR`	Name of a chromosome
`POS`	Genomic position of a heterozygous germline SNP in `CHR`
`REF_COUNT`	Count of reads covering `POS` with the reference allele
`ALT_COUNT`	Count of reads covering `POS` with the alternate allele
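
The three input layouts can be read with a few lines of Python. The following is a minimal sketch, assuming the files carry no header row (lines starting with `#` are skipped in case one is present); the helper name `parse_rows` and the sample values are hypothetical and only illustrate the field order listed in the tables above.

```python
import csv
import io

# Field orders taken from the tables above.
BIN_FIELDS = ("SAMPLE", "CHR", "START", "END", "COUNT")
BAF_FIELDS = ("SAMPLE", "CHR", "POS", "REF_COUNT", "ALT_COUNT")


def parse_rows(handle, fields, int_fields):
    """Read tab-separated rows into dicts keyed by the given field names."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and any comment/header line (assumption).
        if not record or record[0].startswith("#"):
            continue
        if len(record) != len(fields):
            raise ValueError(f"expected {len(fields)} columns, got {len(record)}")
        row = dict(zip(fields, record))
        for name in int_fields:
            row[name] = int(row[name])
        rows.append(row)
    return rows


# Made-up example content standing in for real files.
bins_text = "normal\tchr1\t0\t50000\t1200\nnormal\tchr1\t50000\t100000\t1100\n"
bafs_text = "tumor1\tchr1\t12345\t18\t22\n"

bins = parse_rows(io.StringIO(bins_text), BIN_FIELDS, ("START", "END", "COUNT"))
bafs = parse_rows(io.StringIO(bafs_text), BAF_FIELDS, ("POS", "REF_COUNT", "ALT_COUNT"))
```

For files on disk, the same function can be given a handle from `open(path, newline="")` in place of the `io.StringIO` objects.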

Output

combine-counts-fw produces a tab-separated file with the following fields.

Field	Description
`CHR`	Name of a chromosome
`START`	Starting genomic position of a genomic bin in `CHR`
`END`	Ending genomic position of a genomic bin in `CHR`
`SAMPLE`	Name of a tumor sample
`RD`	RDR of the bin in `SAMPLE`
`#SNPS`	Number of SNPs present in the bin in `SAMPLE`
`COV`	Average coverage in the bin in `SAMPLE`
`ALPHA`	Alpha parameter related to the binomial model of BAF for the bin in `SAMPLE`, typically total number of reads from A allele
`BETA`	Beta parameter related to the binomial model of BAF for the bin in `SAMPLE`, typically total number of reads from B allele
`BAF`	BAF of the bin in `SAMPLE`
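
For downstream scripting, one output row can be mapped to a typed record. This is a minimal sketch, assuming the ten columns appear in the order of the table above; the class `BinRecord` and the function `parse_output_line` are hypothetical names, the sample row is made up, and `ALPHA` and `BETA` are read as floats because the table only says they are typically read totals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinRecord:
    chrom: str
    start: int
    end: int
    sample: str
    rd: float      # RD column (read-depth ratio)
    n_snps: int    # #SNPS column
    cov: float     # COV column
    alpha: float   # ALPHA column
    beta: float    # BETA column
    baf: float     # BAF column


def parse_output_line(line):
    """Convert one tab-separated output row into a BinRecord."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 10:
        raise ValueError(f"expected 10 columns, got {len(parts)}")
    chrom, start, end, sample, rd, n_snps, cov, alpha, beta, baf = parts
    return BinRecord(
        chrom, int(start), int(end), sample,
        float(rd), int(n_snps), float(cov),
        float(alpha), float(beta), float(baf),
    )


# Made-up row used only to exercise the parser.
record = parse_output_line("chr1\t0\t50000\ttumor1\t1.02\t12\t35.5\t210\t240\t0.4667\n")
```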

Main parameters

combine-counts-fw has one main parameter. Its default value works for most datasets, but it can be tuned to accommodate the features of special datasets.

Name	Description	Usage	Default
`-d`, `--diploidbaf`	Maximum expected shift from 0.5 for BAF of diploid or tetraploid clusters	The maximum shift is used to identify all the potential clusters with base states (1, 1) or (2, 2). The value depends on the variance in the data (related to noise and coverage); generally, higher variance requires a higher shift. Information provided by plot-bins can help to decide this value in special datasets.	0.08 (other typically suggested values are 0.1-0.11 for higher variance and 0.06 for low variance)
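
To make the meaning of the threshold concrete, the check below flags a BAF value as compatible with a balanced state when it lies within the maximum shift of 0.5. It is a minimal sketch of the comparison only: the function name is hypothetical, and how the tool applies the threshold to clusters or bins internally is not specified here.

```python
def within_diploid_shift(baf, max_shift=0.08):
    """Return True if baf is within max_shift of 0.5.

    max_shift plays the role of the value passed with -d / --diploidbaf;
    0.08 is the documented default.
    """
    if not 0.0 <= baf <= 1.0:
        raise ValueError("baf must be between 0 and 1")
    return abs(0.5 - baf) <= max_shift


# Made-up BAF values for three bins.
flags = [within_diploid_shift(b) for b in (0.45, 0.40, 0.30)]
# With a larger shift for higher-variance data, 0.40 is also accepted.
noisy_flag = within_diploid_shift(0.40, max_shift=0.11)
```

Values that sit exactly on the threshold are sensitive to floating-point rounding, so a real comparison may want a small tolerance.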

Optional parameters

Name	Description	Usage	Default
`-v`, `--verbose`	Verbose logging flag	When enabled, combine-counts-fw outputs a verbose log of the execution	Not used
`-r`, `--disablebar`	Disabling progress-bar flag	When enabled, the output progress bar is disabled	Not used
`-b`, `--normalbafs`	File of allele counts for SNPs in matched-normal sample	When provided, combine-counts-fw attempts to correct the estimated BAF using the variance in the matched-normal sample.	Not used (deprecated)
`-d`, `--diploidbaf`	Maximum expected shift from 0.5 for BAF of diploid or tetraploid clusters	The maximum shift is used to identify potential bins with base states (1, 1) or (2, 2) whose BAF needs to be corrected. The value depends on the variance in the data (related to noise and coverage); generally, higher variance requires a higher shift. Information provided by plot-bins can help to decide this value in special datasets.	0.08 (other typically suggested values are 0.1-0.11 for higher variance and 0.06 for low variance)
`-t`, `--totalcounts`	File of total read counts	When provided, the total read counts are used to normalize the read counts from the corresponding sample	Not used (deprecated)
`-m`, `--mode`	Mode used to estimate BAFs	Different modes are provided to combine the allele counts of SNPs	The counts from the allele with minor count are combined (deprecated)