hatchet.utils package

Submodules

hatchet.utils.ArgParsing module

hatchet.utils.ArgParsing.extractChromosomes(samtools, normal, tumors, reference=None)
Parameters:

samtools: path to samtools executable

normal: tuple of (path to normal BAM file, string name)

tumors: list of tuples (path to BAM file, string name)

reference: path to FASTA file
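A hypothetical invocation (all paths and sample names below are placeholders; the return value is expected to be the list of chromosome names shared by the BAMs):

    from hatchet.utils.ArgParsing import extractChromosomes

    chromosomes = extractChromosomes(
        'samtools',                                            # path to samtools
        ('normal.bam', 'Normal'),                              # normal BAM and name
        [('tumor1.bam', 'TumorA'), ('tumor2.bam', 'TumorB')],  # tumor BAMs
        reference='hg19.fa',                                   # optional FASTA
    )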

hatchet.utils.ArgParsing.getSQNames(samtools, bamfile)
hatchet.utils.ArgParsing.parseRegions(region_file, chromosomes)
hatchet.utils.ArgParsing.parse_cluster_bins_args(args=None)

Parse command line arguments.

hatchet.utils.ArgParsing.parse_cluster_bins_gmm_args(args=None)
hatchet.utils.ArgParsing.parse_combine_counts_args(args=None)
hatchet.utils.ArgParsing.parse_combine_counts_fw_args(args=None)
hatchet.utils.ArgParsing.parse_count_alleles_arguments(args=None)
hatchet.utils.ArgParsing.parse_count_reads_args(args=None)
hatchet.utils.ArgParsing.parse_count_reads_fw_arguments(args=None)
hatchet.utils.ArgParsing.parse_download_panel_arguments(args=None)
hatchet.utils.ArgParsing.parse_genotype_snps_arguments(args=None)
hatchet.utils.ArgParsing.parse_phase_snps_arguments(args=None)
hatchet.utils.ArgParsing.parse_plot_bins_1d2d_args(args=None)

Parse command line arguments for the auxiliary cluster plotting command (1D and 2D plots with matching colors and optional labeled centers).

hatchet.utils.ArgParsing.parse_plot_bins_args(args=None)
hatchet.utils.ArgParsing.parse_plot_cn_1d2d_args(args=None)

Parse command line arguments for the auxiliary plotting command (1D and 2D plots with labeled copy states).

hatchet.utils.BAMBinning module

class hatchet.utils.BAMBinning.Binner(task_queue, result_queue, progress_bar, samtools, q, size, regions, verbose)

Bases: Process

Attributes:
authkey
daemon

Return whether process is a daemon

exitcode

Return exit code of process or None if it has yet to stop

ident

Return identifier (PID) of process or None if it has yet to start

name
pid

Return identifier (PID) of process or None if it has yet to start

sentinel

Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.

Methods

close()

Close the Process object.

is_alive()

Return whether process is alive

join([timeout])

Wait until child process terminates

kill()

Terminate process; sends SIGKILL signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

start()

Start child process

terminate()

Terminate process; sends SIGTERM signal or uses TerminateProcess()

binChr

binChr(bamfile, samplename, chromosome)
run()

Method to be run in sub-process; can be overridden in sub-class

hatchet.utils.BAMBinning.bin(samtools, samples, chromosomes, num_workers, q, size, regions, verbose=False)

hatchet.utils.CoordinateFinding module

hatchet.utils.CoordinateFinding.binChr(bamfile, sample, seq, size, start=0, end=0, least=-1)
hatchet.utils.CoordinateFinding.extractChr(ref)
hatchet.utils.CoordinateFinding.findEnd(bamfile, seq, least=0)
hatchet.utils.CoordinateFinding.findStart(bamfile, seq, least=0)

hatchet.utils.ProgressBar module

class hatchet.utils.ProgressBar.ProgressBar(total, length, counter=0, verbose=False, decimals=1, fill='█', lock=None, prefix='Progress:', suffix='Complete')

Bases: object

Methods

progress

progressLock

progressNoLock

progress(advance=True, msg='')
progressLock(advance=True, msg='')
progressNoLock(advance=True, msg='')

hatchet.utils.Supporting module

hatchet.utils.Supporting.argmax(d)
hatchet.utils.Supporting.argmin(d)
class hatchet.utils.Supporting.bcolors

Bases: object

BBLUE = '\x1b[96m'
BOLD = '\x1b[1m'
ENDC = '\x1b[0m'
FAIL = '\x1b[91m'
HEADER = '\x1b[95m'
OKBLUE = '\x1b[94m'
OKGREEN = '\x1b[92m'
UNDERLINE = '\x1b[4m'
WARNING = '\x1b[93m'
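For example, a message can be wrapped in one of these codes and reset with ENDC:

    from hatchet.utils.Supporting import bcolors

    print(f'{bcolors.WARNING}Low coverage detected{bcolors.ENDC}')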
hatchet.utils.Supporting.checksum(filepath)
hatchet.utils.Supporting.close(msg)
hatchet.utils.Supporting.digits(string)
hatchet.utils.Supporting.download(url, dirpath, overwrite=False, extract=True, sentinel_file=None, chunk_size=8192)
hatchet.utils.Supporting.ensure(pred, msg, exception_class=<class 'ValueError'>)
hatchet.utils.Supporting.error(msg, raise_exception=False, exception_class=<class 'ValueError'>)
hatchet.utils.Supporting.log(msg, level=None, lock=None, raise_exception=False, exception_class=<class 'ValueError'>)
hatchet.utils.Supporting.logArgs(args, width=40)
hatchet.utils.Supporting.naturalOrder(text)
hatchet.utils.Supporting.numericOrder(text)
hatchet.utils.Supporting.run(commands, stdouterr_filepath=None, check_return_codes=True, error_msg=None, stdouterr_filepath_autoremove=True, **kwargs)
hatchet.utils.Supporting.to_tuple(s, n=2, typ=<class 'int'>, error_message=None)
hatchet.utils.Supporting.url_exists(path)
hatchet.utils.Supporting.which(program)

hatchet.utils.TotalCounting module

class hatchet.utils.TotalCounting.TotalCounter(task_queue, result_queue, progress_bar, samtools, q, verbose)

Bases: Process

Attributes:
authkey
daemon

Return whether process is a daemon

exitcode

Return exit code of process or None if it has yet to stop

ident

Return identifier (PID) of process or None if it has yet to start

name
pid

Return identifier (PID) of process or None if it has yet to start

sentinel

Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.

Methods

close()

Close the Process object.

is_alive()

Return whether process is alive

join([timeout])

Wait until child process terminates

kill()

Terminate process; sends SIGKILL signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

start()

Start child process

terminate()

Terminate process; sends SIGTERM signal or uses TerminateProcess()

binChr

binChr(bamfile, samplename, chromosome)
run()

Method to be run in sub-process; can be overridden in sub-class

hatchet.utils.TotalCounting.tcount(samtools, samples, chromosomes, num_workers, q, verbose=False)

hatchet.utils.check module

hatchet.utils.check.main(hatchet_cmds=None)
hatchet.utils.check.suppress_stdout()

hatchet.utils.check_solver module

hatchet.utils.check_solver.main(args=None)

hatchet.utils.cluster_bins module

class hatchet.utils.cluster_bins.DiagGHMM(n_components=1, covariance_type='diag', min_covar=0.001, startprob_prior=1.0, transmat_prior=1.0, means_prior=0, means_weight=0, covars_prior=0.01, covars_weight=1, algorithm='viterbi', random_state=None, n_iter=10, tol=0.01, verbose=False, params='stmc', init_params='stmc', implementation='log')

Bases: GaussianHMM

Parameters:
n_components : int

Number of states.

covariance_type : {“spherical”, “diag”, “full”, “tied”}, optional

The type of covariance parameters to use:

  • “spherical” — each state uses a single variance value that applies to all features.

  • “diag” — each state uses a diagonal covariance matrix (default).

  • “full” — each state uses a full (i.e. unrestricted) covariance matrix.

  • “tied” — all states use the same full covariance matrix.

min_covar : float, optional

Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.

startprob_prior : array, shape (n_components, ), optional

Parameters of the Dirichlet prior distribution for startprob_.

transmat_prior : array, shape (n_components, n_components), optional

Parameters of the Dirichlet prior distribution for each row of the transition probabilities transmat_.

means_prior, means_weight : array, shape (n_components, ), optional

Mean and precision of the Normal prior distribution for means_.

covars_prior, covars_weight : array, shape (n_components, ), optional

Parameters of the prior distribution for the covariance matrix covars_.

If covariance_type is “spherical” or “diag” the prior is the inverse gamma distribution, otherwise — the inverse Wishart distribution.

algorithm : {“viterbi”, “map”}, optional

Decoder algorithm.

  • “viterbi”: finds the most likely sequence of states, given all emissions.

  • “map” (also known as smoothing or forward-backward): finds the sequence of the individual most-likely states, given all emissions.

random_state: RandomState or an int seed, optional

A random number generator instance.

n_iter : int, optional

Maximum number of iterations to perform.

tol : float, optional

Convergence threshold. EM will stop if the gain in log-likelihood is below this value.

verbose : bool, optional

Whether per-iteration convergence reports are printed to sys.stderr. Convergence can also be diagnosed using the monitor_ attribute.

params, init_params : string, optional

The parameters that get updated during (params) or initialized before (init_params) the training. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars. Defaults to all parameters.

implementation : string, optional

Determines if the forward-backward algorithm is implemented with logarithms (“log”), or using scaling (“scaling”). The default is to use logarithms for backward compatibility.

Attributes:
covars_

Return covars as a full matrix.

Methods

aic(X[, lengths])

Akaike information criterion for the current model on the input X.

bic(X[, lengths])

Bayesian information criterion for the current model on the input X.

decode(X[, lengths, algorithm])

Find most likely state sequence corresponding to X.

fit(X[, lengths])

Estimate model parameters.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

get_stationary_distribution()

Compute the stationary distribution of states.

predict(X[, lengths])

Find most likely state sequence corresponding to X.

predict_proba(X[, lengths])

Compute the posterior probability for each state in the model.

sample([n_samples, random_state, currstate])

Generate random samples from the model.

score(X[, lengths])

Compute the log probability under the model.

score_samples(X[, lengths])

Compute the log probability under the model and compute posteriors.

set_fit_request(*[, lengths])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_predict_proba_request(*[, lengths])

Request metadata passed to the predict_proba method.

set_predict_request(*[, lengths])

Request metadata passed to the predict method.

set_score_request(*[, lengths])

Request metadata passed to the score method.

form_transition_matrix

form_transition_matrix(diag)
set_fit_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for lengths parameter in fit.

Returns:
self : object

The updated object.
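A minimal sketch of this routing mechanism, assuming scikit-learn >= 1.3:

    import sklearn
    from hatchet.utils.cluster_bins import DiagGHMM

    # Opt in to metadata routing, then request that `lengths` be routed to fit.
    sklearn.set_config(enable_metadata_routing=True)
    model = DiagGHMM(n_components=5).set_fit_request(lengths=True)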

set_predict_proba_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM

Request metadata passed to the predict_proba method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for lengths parameter in predict_proba.

Returns:
self : object

The updated object.

set_predict_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for lengths parameter in predict.

Returns:
self : object

The updated object.

set_score_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for lengths parameter in score.

Returns:
self : object

The updated object.

hatchet.utils.cluster_bins.form_seg(bbc, balanced_threshold)
hatchet.utils.cluster_bins.hmm_model_select(tracks, minK=20, maxK=50, tau=1e-05, tmat='diag', decode_alg='viterbi', covar='diag', state_selection='bic', restarts=10)
hatchet.utils.cluster_bins.main(args=None)
hatchet.utils.cluster_bins.make_transmat(diag, K)
hatchet.utils.cluster_bins.read_bb(bbfile, subset=None, allow_gaps=False)

Constructs arrays to represent the bins in each chromosome or chromosome arm. If the bbfile was binned around chromosome arms, uses chromosome arms; otherwise, uses chromosomes.

Returns:

botht: list of np.ndarrays of size (n_bins, n_tracks), where n_tracks = n_samples * 2; each array contains two tracks per sample for a single chromosome or arm.

bb: table read from the input bbfile.

sample_labels: order in which samples are represented in each array in botht.

chr_labels: order in which chromosomes or arms are represented in botht.
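A hypothetical use of these return values (‘bulk.bb’ is a placeholder path):

    from hatchet.utils.cluster_bins import read_bb

    botht, bb, sample_labels, chr_labels = read_bb('bulk.bb')
    first = botht[0]  # (n_bins, n_tracks) array for chr_labels[0]
    assert first.shape[1] == 2 * len(sample_labels)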

hatchet.utils.cluster_bins.reindex(labels)

Given a list of labels, reindex them as integers from 1 to n_labels, ordered in nonincreasing order of prevalence.
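A sketch of the described behavior (not the actual implementation): the most prevalent label maps to 1, the next to 2, and so on.

    from collections import Counter

    def reindex_sketch(labels):
        # Rank labels by prevalence, most common first, then renumber from 1.
        order = [lab for lab, _ in Counter(labels).most_common()]
        mapping = {lab: i + 1 for i, lab in enumerate(order)}
        return [mapping[lab] for lab in labels]

    reindex_sketch(['x', 'y', 'y', 'z', 'y'])  # [2, 1, 1, 3, 1]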

hatchet.utils.cluster_bins_gmm module

hatchet.utils.cluster_bins_gmm.cluster(points, clouds=None, concentration_prior=None, K=100, restarts=10, seed=0)

Clusters a set of data points lying in an arbitrary number of clusters.

Arguments:

points (list of lists of floats): list of data points to be clustered.

clouds (list of lists of floats, same second dimension as points): bootstrapped bins for clustering.

concentration_prior (float): tuning parameter for clustering, must be between 0 and 1. Used to determine the concentration of points in clusters; higher favors more clusters, lower favors fewer clusters.

K (int): maximum number of clusters to infer.

restarts (int): number of initializations to try for the GMM.

seed (int): random number generator seed for the GMM.

Returns:

mus (list of lists of floats): list of cluster means.

sigmas (list of 2D lists of floats): list of cluster covariances.

clusterAssignments (list of ints): the assignment of each interval to a cluster, where an entry j at index i means the ith interval has been assigned to the jth meta-interval.

numPoints (list of ints): number of points assigned to each cluster.

numClusters (int): the number of clusters.
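A hypothetical call with four two-dimensional points (all values are illustrative):

    from hatchet.utils.cluster_bins_gmm import cluster

    points = [[1.0, 0.5], [1.1, 0.48], [0.5, 0.2], [0.52, 0.21]]
    mus, sigmas, clusterAssignments, numPoints, numClusters = cluster(
        points, K=10, restarts=5, seed=0
    )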

hatchet.utils.cluster_bins_gmm.generateClouds(points, density, seed, sdeven=0.02, sdodd=0.02)
hatchet.utils.cluster_bins_gmm.getPoints(data, samples)
hatchet.utils.cluster_bins_gmm.main(args=None)
hatchet.utils.cluster_bins_gmm.minSegmentBins(sbb, nbins, rd, nsnps, cov, clusters, samples)
hatchet.utils.cluster_bins_gmm.readBB(bbfile)
hatchet.utils.cluster_bins_gmm.refineClustering(combo, assign, assignidx, samples, rdtol, baftol)
hatchet.utils.cluster_bins_gmm.reindex(labels)

Given a list of labels, reindex them as integers from 1 to n_labels, ordered in nonincreasing order of prevalence.

hatchet.utils.cluster_bins_gmm.roundAlphasBetas(baf, alpha, beta)
hatchet.utils.cluster_bins_gmm.scaleBAF(segments, samples, diploidbaf)
hatchet.utils.cluster_bins_gmm.segmentBins(bb, clusters, samples)
hatchet.utils.cluster_bins_gmm.splitBAF(baf, scale)

hatchet.utils.combine_counts module

hatchet.utils.combine_counts.EM(totals_in, alts_in, start, tol=1e-06)

Adapted from chisel/Combiner.py

hatchet.utils.combine_counts.adaptive_bins_arm(snp_thresholds, total_counts, snp_positions, snp_counts, min_snp_reads=2000, min_total_reads=5000)

Compute adaptive bins for a single chromosome arm.

Parameters:

snp_thresholds: length <n> array of 1-based genomic positions of candidate bin thresholds

total_counts: <n> x <2d> np.ndarray

entry [i, 2j] contains the number of reads starting in [snp_thresholds[i], snp_thresholds[i + 1]) in sample j (only the first n - 1 positions are populated)

entry [i, 2j + 1] contains the number of reads covering position snp_thresholds[i] in sample j

snp_positions: length <m> list of 1-based genomic positions of SNPs

NOTE: this function requires that m = n - 1 for convenience of programming (could be relaxed in a different implementation)

snp_counts: <m> x <d> np.ndarray containing the number of overlapping reads at each of the <m> = <n - 1> SNP positions in <d> samples

min_snp_reads: the minimum number of SNP-covering reads required in each bin and each sample

min_total_reads: the minimum number of total reads required in each bin and each sample
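A sketch of the expected input shapes for d = 2 samples and n = 4 candidate thresholds (all values are placeholders):

    import numpy as np

    n, d = 4, 2
    snp_thresholds = np.array([1_000, 50_000, 100_000, 150_000])  # length n
    total_counts = np.zeros((n, 2 * d), dtype=int)  # [i, 2j] starts; [i, 2j + 1] coverage
    snp_positions = [10_000, 60_000, 120_000]       # length m = n - 1
    snp_counts = np.zeros((n - 1, d), dtype=int)    # SNP-covering reads per sample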

hatchet.utils.combine_counts.apply_EM(totals_in, alts_in)
hatchet.utils.combine_counts.backtrack(bp)
hatchet.utils.combine_counts.binom_prop_test(alt1, ref1, flip1, alt2, ref2, flip2, alpha=0.1)

Returns True if there is sufficient evidence that SNPs 1 and 2 should not be merged, False otherwise.
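The specific test is not documented here; as an illustration of the idea only, a Fisher's exact test on the flip-oriented counts would look like the following stand-in (not the actual implementation):

    from scipy.stats import fisher_exact

    def should_split(alt1, ref1, flip1, alt2, ref2, flip2, alpha=0.1):
        # Orient each SNP's counts by its flip flag, then test whether the
        # two allele proportions differ significantly.
        a1, r1 = (ref1, alt1) if flip1 else (alt1, ref1)
        a2, r2 = (ref2, alt2) if flip2 else (alt2, ref2)
        _, pvalue = fisher_exact([[a1, r1], [a2, r2]])
        return pvalue < alpha  # True: evidence the SNPs should not be merged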

hatchet.utils.combine_counts.block_segment(df, blocksize, max_snps_per_block)

Given a pandas dataframe containing read counts for a contiguous segment of SNPs, collects SNPs into phase blocks of size at most <blocksize>, each containing at most <max_snps_per_block> SNPs.

hatchet.utils.combine_counts.collapse_blocks(df, blocks, singletons, orphans, ch)
hatchet.utils.combine_counts.compute_baf_task_multi(bin_snps, blocksize, max_snps_per_block, test_alpha)

Estimates the BAF for the bin containing exactly the SNPs in <bin_snps>, a dataframe with at least ALT and REF columns containing read counts. <blocksize>, <max_snps_per_block>, and <test_alpha> are used only for constructing phase blocks.

hatchet.utils.combine_counts.compute_baf_task_single(bin_snps, blocksize, max_snps_per_block, test_alpha)

Estimates the BAF for the bin containing exactly the SNPs in <bin_snps>, a dataframe with at least ALT and REF columns containing read counts. <blocksize>, <max_snps_per_block>, and <test_alpha> are used only for constructing phase blocks.
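A hypothetical bin with three SNPs (counts are illustrative; the real function may require additional columns beyond ALT and REF, e.g. positions):

    import pandas as pd
    from hatchet.utils.combine_counts import compute_baf_task_single

    bin_snps = pd.DataFrame({'ALT': [12, 9, 15], 'REF': [10, 14, 11]})
    baf = compute_baf_task_single(bin_snps, blocksize=50_000,
                                  max_snps_per_block=10, test_alpha=0.1)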

hatchet.utils.combine_counts.compute_baf_wrapper(bin_snps, blocksize, max_snps_per_block, test_alpha, multisample)
hatchet.utils.combine_counts.consecutive(data, stepsize=1)
hatchet.utils.combine_counts.correct_haplotypes(orig_bafs, min_prop_switch=0.01, n_segments=20, min_switch_density=0.1, min_mean_baf=0.48, minmax_al_imb=0.02)
hatchet.utils.combine_counts.get_chr_end(stem, chromosome)
hatchet.utils.combine_counts.main(args=None)
hatchet.utils.combine_counts.merge_data(bins, dfs, bafs, sample_names, chromosome)

Merge bin data (starts, ends, total counts, RDRs) with SNP data and BAF data for each bin.

Parameters:

bins: output from a call to adaptive_bins_arm

dfs: (only for troubleshooting) list of pandas DataFrames, each containing the SNP information for the corresponding bin

bafs: the ith element is the output from compute_baf_task(dfs[i])

Produces a BB file with a few additional columns.

hatchet.utils.combine_counts.merge_phasing(_, all_phase_data)

Merge phasing results across all samples: if a pair of SNPs is split in any sample, they won’t be merged in any sample.

hatchet.utils.combine_counts.multisample_em(alts, refs, start, tol=1e-05)
hatchet.utils.combine_counts.phase_blocks_sequential(df, blocksize=50000.0, max_snps_per_block=10, alpha=0)
hatchet.utils.combine_counts.read_snps(baf_file, ch, all_names, phasefile=None)

Read and validate SNP data for this patient (TSV table output from HATCHet deBAF.py).

hatchet.utils.combine_counts.run_chromosome(baffile, all_names, chromosome, outfile, centromere_start, centromere_end, min_snp_reads, min_total_reads, arraystem, xy, multisample, phasefile, blocksize, max_snps_per_block, test_alpha)

Perform adaptive binning and infer BAFs to produce a HATCHet BB file for a single chromosome.

hatchet.utils.combine_counts.run_chromosome_wrapper(param)
hatchet.utils.combine_counts.segmented_piecewise(X, pieces=2)

hatchet.utils.combine_counts_fw module

hatchet.utils.combine_counts_fw.blocking(L, sample, phase, blocksize)
hatchet.utils.combine_counts_fw.combine(normalbins, tumorbins, tumorbafs, diploidbaf, totalcounts, chromosomes, samples, normal, gamma, verbose=False, disable=False, phase=None, block=None)
hatchet.utils.combine_counts_fw.computeBAFs(partition, samples, diploidbaf, phase=None, block=0)
hatchet.utils.combine_counts_fw.main(args=None)
hatchet.utils.combine_counts_fw.readBAFs(tumor)
hatchet.utils.combine_counts_fw.readBINs(normalbins, tumorbins)
hatchet.utils.combine_counts_fw.readPhase(f)
hatchet.utils.combine_counts_fw.readTotalCounts(filename, samples, normal)
hatchet.utils.combine_counts_fw.splitBAF(baf, scale)

hatchet.utils.commands module

hatchet.utils.config module

class hatchet.utils.config.Config(name, filenames)

Bases: object

Methods

init_from_files

read

sections

init_from_files(filenames)
read(filename)
sections()
class hatchet.utils.config.ConfigSection(config, section_proxy)

Bases: object

A thin wrapper over a ConfigParser’s SectionProxy object that tries to infer the types of values and makes them available as attributes. Currently int/float/str are supported.

Methods

items

items()
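A sketch of the kind of value-type inference described above (not the actual implementation):

    def infer_type(raw: str):
        # Try int, then float; fall back to the raw string.
        for cast in (int, float):
            try:
                return cast(raw)
            except ValueError:
                pass
        return raw

    infer_type('42'), infer_type('0.5'), infer_type('hg19')  # (42, 0.5, 'hg19')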

hatchet.utils.count_alleles module

class hatchet.utils.count_alleles.AlleleCounter(bcftools, reference, q, Q, mincov, dp, E, snplist, verbose, outdir)

Bases: Worker

Methods

countAlleles

run

work

countAlleles(bamfile, samplename, chromosome)
work(bamfile, samplename, chromosome)
hatchet.utils.count_alleles.checkShift(countA, countB, maxshift)
hatchet.utils.count_alleles.counting(bcftools, reference, samples, chromosomes, num_workers, snplist, q, Q, mincov, dp, E, verbose, outdir)
hatchet.utils.count_alleles.isHet(countA, countB, gamma)
hatchet.utils.count_alleles.main(args=None)
hatchet.utils.count_alleles.selectHetSNPs(counts, gamma, maxshift)

hatchet.utils.count_reads module

hatchet.utils.count_reads.check_array_files(darray, chrs)
hatchet.utils.count_reads.check_counts_files(dcounts, chrs, all_names)
hatchet.utils.count_reads.count_chromosome(ch, outdir, samtools, bam, sample_name, readquality, compression_level=6)
hatchet.utils.count_reads.count_chromosome_wrapper(param)
hatchet.utils.count_reads.expected_arrays(darray, chrs)
hatchet.utils.count_reads.expected_counts_files(dcounts, chrs, all_names)
hatchet.utils.count_reads.form_counts_array(starts_files, perpos_files, thresholds, chromosome, tabix, chunksize=100000.0)

NOTE: Assumes that starts_files[i] corresponds to the same sample as perpos_files[i]

Parameters:

starts_files: list of <sample>.<chromosome>.starts.gz files, each containing a list of start positions

perpos_files: list of <sample>.per-base.bed.gz files containing per-position coverage from mosdepth

thresholds: list of potential bin start positions (thresholds between SNPs)

chromosome: chromosome to extract read counts for

Returns: <n> x <2d> np.ndarray

entry [i, 2j] contains the number of reads starting in (starts[i], starts[i + 1]) in sample j

entry [i, 2j + 1] contains the number of reads covering position starts[i] in sample j
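Indexing the returned array for threshold i and sample j follows the layout above (the array below is a placeholder):

    import numpy as np

    n, d = 10, 2                              # hypothetical sizes
    counts = np.zeros((n, 2 * d), dtype=int)  # stands in for the returned array
    i, j = 5, 0
    reads_starting = counts[i, 2 * j]         # reads starting in (starts[i], starts[i + 1])
    reads_covering = counts[i, 2 * j + 1]     # reads covering position starts[i]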

hatchet.utils.count_reads.get_chr_end(stem, all_names, chromosome)
hatchet.utils.count_reads.main(args=None)
hatchet.utils.count_reads.mosdepth_wrapper(params)
hatchet.utils.count_reads.read_snps(baf_file, ch, all_names)

Read and validate SNP data for this patient (TSV table output from HATCHet deBAF.py).

hatchet.utils.count_reads.run_chromosome(outdir, all_names, chromosome, centromere_start, centromere_end, baf_file, tabix)

Construct arrays that contain all counts needed to perform adaptive binning for a single chromosome (across all samples).

hatchet.utils.count_reads.run_chromosome_wrapper(param)
hatchet.utils.count_reads.run_mosdepth(outdir, sample_name, bam, threads, mosdepth, readquality)

hatchet.utils.count_reads_fw module

hatchet.utils.count_reads_fw.knownRegions(refdict, chromosomes)
hatchet.utils.count_reads_fw.logArgs(args, width)
hatchet.utils.count_reads_fw.main(args=None)

hatchet.utils.download_panel module

hatchet.utils.download_panel.dwnld_chains(dirpath)
hatchet.utils.download_panel.dwnld_refpanel_genome(path)
hatchet.utils.download_panel.main(args=None)
hatchet.utils.download_panel.mk_rename_file(path)

hatchet.utils.genotype_snps module

class hatchet.utils.genotype_snps.Caller(task_queue, result_queue, progress_bar, bcftools, reference, q, Q, mincov, dp, E, outdir, snplist, verbose)

Bases: Process

Attributes:
authkey
daemon

Return whether process is a daemon

exitcode

Return exit code of process or None if it has yet to stop

ident

Return identifier (PID) of process or None if it has yet to start

name
pid

Return identifier (PID) of process or None if it has yet to start

sentinel

Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.

Methods

close()

Close the Process object.

is_alive()

Return whether process is alive

join([timeout])

Wait until child process terminates

kill()

Terminate process; sends SIGKILL signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

start()

Start child process

terminate()

Terminate process; sends SIGTERM signal or uses TerminateProcess()

callSNPs

callSNPs(bamfile, samplename, chromosome)
run()

Method to be run in sub-process; can be overridden in sub-class

hatchet.utils.genotype_snps.call(bcftools, reference, samples, chromosomes, num_workers, q, Q, mincov, dp, E, outdir, snplist=None, verbose=False)
hatchet.utils.genotype_snps.main(args=None)

hatchet.utils.multiprocessing module

class hatchet.utils.multiprocessing.TaskHandler(worker, task_queue, result_queue, progress_bar)

Bases: Process

Attributes:
authkey
daemon

Return whether process is a daemon

exitcode

Return exit code of process or None if it has yet to stop

ident

Return identifier (PID) of process or None if it has yet to start

name
pid

Return identifier (PID) of process or None if it has yet to start

sentinel

Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.

Methods

close()

Close the Process object.

is_alive()

Return whether process is alive

join([timeout])

Wait until child process terminates

kill()

Terminate process; sends SIGKILL signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

start()

Start child process

terminate()

Terminate process; sends SIGTERM signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

class hatchet.utils.multiprocessing.Worker

Bases: object

Methods

run

work

run(work, n_instances=None, show_progress=True)
work(*args)

hatchet.utils.phase_snps module

class hatchet.utils.phase_snps.Phaser(panel, outdir, hg19, ref, chains, rename, refvers, chrnot, verbose, bcftools, shapeit, picard, bgzip)

Bases: Worker

Methods

biallelic

change_chr

index

liftover

run

run_shapeit

stage_vcfs

work

biallelic(infile, chromosome)
change_chr(infile, chromosome, outname, rename)
index(infile, chromosome)
liftover(infile, chromosome, outname, chain, refgen, ch)
run_shapeit(infile, chromosome)
stage_vcfs(infile, chromosome)
work(*args)
hatchet.utils.phase_snps.cleanup(outdir)
hatchet.utils.phase_snps.concat(vcfs, outdir, bcftools)
hatchet.utils.phase_snps.main(args=None)
hatchet.utils.phase_snps.print_log(path, chromosomes)

hatchet.utils.plot_bins module

hatchet.utils.plot_bins.addchr(pos)
hatchet.utils.plot_bins.argmax(d)
hatchet.utils.plot_bins.argmin(d)
hatchet.utils.plot_bins.baf(bbc, args, out)
hatchet.utils.plot_bins.bb(bbc, clusters, args, out)
hatchet.utils.plot_bins.clubaf(bbc, clusters, args, out)
hatchet.utils.plot_bins.clurdr(bbc, clusters, args, out)
hatchet.utils.plot_bins.clus(seg, args, out)
hatchet.utils.plot_bins.cluster_bins(bbc, clusters, args, out, clust_order, pal)
hatchet.utils.plot_bins.coordinates(args, g=None)
hatchet.utils.plot_bins.debug(msg)
hatchet.utils.plot_bins.error(msg)
hatchet.utils.plot_bins.info(msg)
hatchet.utils.plot_bins.isfloat(value)
hatchet.utils.plot_bins.join(bbc, clusters, resolution)
hatchet.utils.plot_bins.log(msg)
hatchet.utils.plot_bins.main(args=None)
hatchet.utils.plot_bins.rdr(bbc, args, out)
hatchet.utils.plot_bins.readBBC(inp)
hatchet.utils.plot_bins.readSEG(inp)
hatchet.utils.plot_bins.select(bbc, clusters, args)
hatchet.utils.plot_bins.sortchr(x)
hatchet.utils.plot_bins.warning(msg)

hatchet.utils.plot_bins_1d2d module

hatchet.utils.plot_bins_1d2d.main(args=None)
hatchet.utils.plot_bins_1d2d.plot_1d(bbc, baf_lim=None, rdr_lim=None, display=False, outdir=None, alpha=1, show_centromeres=False)
hatchet.utils.plot_bins_1d2d.plot_2d(bbc, seg=None, show_centers=False, xlim=None, ylim=None, figsize=(4, 4), display=True, outdir=None, alpha=1)

For each sample, plot the mBAF and RDR of each bin colored by cluster. Colors will match clusters in corresponding 1D plots.

hatchet.utils.plot_bins_1d2d.plot_track(bb, chr_ends, chr2centro, yval='RD', ylabel=None, display=True, ylim=None, alpha=1, color_field=None, title=None, show_centromeres=False)

NOTE: this function assumes that (1) bb contains data for a single sample, and (2) chromosomes are specified using “chr” notation.

hatchet.utils.plot_cn module

hatchet.utils.plot_cn.addchr(g, pos, color=None)
hatchet.utils.plot_cn.addchrplt(pos)
hatchet.utils.plot_cn.allelicprofiles(tumor, clones, props, args, out)
hatchet.utils.plot_cn.allelicproportions(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.argmax(d)
hatchet.utils.plot_cn.argmin(d)
hatchet.utils.plot_cn.cndistance(u, v)
hatchet.utils.plot_cn.cnproportions(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.debug(msg)
hatchet.utils.plot_cn.error(msg)
hatchet.utils.plot_cn.gridmixtures(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.gridprofiles(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.gridprofilesreduced(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.info(msg)
hatchet.utils.plot_cn.intergridfullprofiles(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.intergridreducedprofiles(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.intergridsamplesclusters(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.intergridsubclonality(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.interjoin(tumor, clones, resolution)
hatchet.utils.plot_cn.interreduction(proj, base)
hatchet.utils.plot_cn.isfloat(value)
hatchet.utils.plot_cn.join(tumor, clones, resolution)
hatchet.utils.plot_cn.log(msg)
hatchet.utils.plot_cn.main(args=None)
hatchet.utils.plot_cn.multiple(tumor, clones, props, base, args)
hatchet.utils.plot_cn.parsing_arguments(args=None)

Parse command line arguments.

hatchet.utils.plot_cn.pp(tumor, clones, props, args)
hatchet.utils.plot_cn.profiles(tumor, clones, props, args, out)
hatchet.utils.plot_cn.readUCN(inputs, patnames)
hatchet.utils.plot_cn.reduction(proj, base)
hatchet.utils.plot_cn.segmenting(tumor, clones, props)
hatchet.utils.plot_cn.similarity(u, v)
hatchet.utils.plot_cn.similaritysample(u, v)
hatchet.utils.plot_cn.single(tumor, clones, props, base, args)
hatchet.utils.plot_cn.sortchr(x)
hatchet.utils.plot_cn.subclonal(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.warning(msg)

hatchet.utils.plot_cn_1d2d module

hatchet.utils.plot_cn_1d2d.cn2evs(cns, props)
hatchet.utils.plot_cn_1d2d.cn2total(s)
hatchet.utils.plot_cn_1d2d.cn2totals(x)
hatchet.utils.plot_cn_1d2d.compute_gamma(bbc)
hatchet.utils.plot_cn_1d2d.generate_1D2D_plots(bbc, fcn_lim=None, baf_lim=None, title=None, show_centromeres=False, by_sample=False, outdir=None, resample_balanced=False)
hatchet.utils.plot_cn_1d2d.limits_valid(lim)
hatchet.utils.plot_cn_1d2d.main(args=None)
hatchet.utils.plot_cn_1d2d.plot_clusters(bbc, mapping, figsize=(4, 4), fname=None, dpi=300, xlim=None, ylim=None, save_samples=False, save_prefix=None, coloring='original')
hatchet.utils.plot_cn_1d2d.plot_genome(big_bbc, mapping, chr_ends, chr2centro, chromosomes=None, dpi=400, figsize=(8, 5), fname=None, show_centromeres=False, fcn_ylim=None, baf_ylim=None, save_samples=False, save_prefix=None)
hatchet.utils.plot_cn_1d2d.recompose_state(l)

Read copy-number state vector from list of allele-specific values
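A sketch of one plausible recomposition, pairing consecutive allele-specific values into (A, B) states (an assumption about the list layout, not the actual implementation):

    def recompose_state_sketch(values):
        # Pair consecutive entries: [1, 1, 2, 0] -> [(1, 1), (2, 0)].
        return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]

    recompose_state_sketch([1, 1, 2, 0])  # [(1, 1), (2, 0)]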

hatchet.utils.plot_cn_1d2d.reindex(labels)

Given a list of labels, reindex them as integers from 1 to n_labels; labels are ordered in nonincreasing order of prevalence.

hatchet.utils.plot_cn_1d2d.str2cn(x)
hatchet.utils.plot_cn_1d2d.str2state(cn)

hatchet.utils.rd_gccorrect module

hatchet.utils.rd_gccorrect.rd_gccorrect(bb, ref_genome)

Correct GC bias in read depth data for each sample.

Parameters:

bb: DataFrame containing read depth data, including columns ‘#CHR’, ‘START’, ‘END’, ‘RD’, ‘SAMPLE’

ref_genome: file path to the reference genome in FASTA format

Returns: DataFrame with corrected read depth data
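A hypothetical invocation, assuming the column layout above (‘reference.fa’ is a placeholder and must point to a real FASTA for the call to succeed):

    import pandas as pd
    from hatchet.utils.rd_gccorrect import rd_gccorrect

    bb = pd.DataFrame({
        '#CHR': ['chr1', 'chr1'], 'START': [0, 100_000], 'END': [100_000, 200_000],
        'RD': [1.02, 0.97], 'SAMPLE': ['tumor1', 'tumor1'],
    })
    corrected = rd_gccorrect(bb, 'reference.fa')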

hatchet.utils.run module

hatchet.utils.run.main(args=None)

Module contents