hatchet.utils package¶
Subpackages¶
- hatchet.utils.solve package
- Submodules
- hatchet.utils.solve.cd module
- hatchet.utils.solve.ilp_subset module
ILPSubset
ILPSubset.M
ILPSubset.base
ILPSubset.build_random_u()
ILPSubset.build_symmetry_breaking()
ILPSubset.create_model()
ILPSubset.first_hot_start()
ILPSubset.fix_c()
ILPSubset.fix_given_cn()
ILPSubset.fix_u()
ILPSubset.hot_start()
ILPSubset.optimized_cA
ILPSubset.optimized_cB
ILPSubset.optimized_u
ILPSubset.run()
ILPSubset.symmCoeff()
ILPSubsetSplit
- hatchet.utils.solve.utils module
- Module contents
Submodules¶
hatchet.utils.ArgParsing module¶
- hatchet.utils.ArgParsing.extractChromosomes(samtools, normal, tumors, reference=None)¶
- Parameters:
samtools: path to samtools executable
normal: tuple of (path to normal BAM file, string name)
tumors: list of tuples (path to BAM file, string name)
reference: path to FASTA file
- hatchet.utils.ArgParsing.getSQNames(samtools, bamfile)¶
- hatchet.utils.ArgParsing.parseRegions(region_file, chromosomes)¶
- hatchet.utils.ArgParsing.parse_cluster_bins_args(args=None)¶
Parse command line arguments.
- hatchet.utils.ArgParsing.parse_cluster_bins_gmm_args(args=None)¶
- hatchet.utils.ArgParsing.parse_combine_counts_args(args=None)¶
- hatchet.utils.ArgParsing.parse_combine_counts_fw_args(args=None)¶
- hatchet.utils.ArgParsing.parse_count_alleles_arguments(args=None)¶
- hatchet.utils.ArgParsing.parse_count_reads_args(args=None)¶
- hatchet.utils.ArgParsing.parse_count_reads_fw_arguments(args=None)¶
- hatchet.utils.ArgParsing.parse_download_panel_arguments(args=None)¶
- hatchet.utils.ArgParsing.parse_genotype_snps_arguments(args=None)¶
- hatchet.utils.ArgParsing.parse_phase_snps_arguments(args=None)¶
- hatchet.utils.ArgParsing.parse_plot_bins_1d2d_args(args=None)¶
Parse command line arguments for auxiliary cluster plotting command (1D and 2D plot with matching colors and optional labeled centers)
- hatchet.utils.ArgParsing.parse_plot_bins_args(args=None)¶
- hatchet.utils.ArgParsing.parse_plot_cn_1d2d_args(args=None)¶
Parse command line arguments for auxiliary plotting command (1D and 2D plot with labeled copy states)
hatchet.utils.BAMBinning module¶
- class hatchet.utils.BAMBinning.Binner(task_queue, result_queue, progress_bar, samtools, q, size, regions, verbose)¶
Bases: Process
- Attributes:
authkey
daemon: Return whether process is a daemon
exitcode: Return exit code of process or None if it has yet to stop
ident: Return identifier (PID) of process or None if it has yet to start
name
pid: Return identifier (PID) of process or None if it has yet to start
sentinel: Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.
Methods
close(): Close the Process object.
is_alive(): Return whether process is alive
join([timeout]): Wait until child process terminates
kill(): Terminate process; sends SIGKILL signal or uses TerminateProcess()
run(): Method to be run in sub-process; can be overridden in sub-class
start(): Start child process
terminate(): Terminate process; sends SIGTERM signal or uses TerminateProcess()
binChr
- binChr(bamfile, samplename, chromosome)¶
- run()¶
Method to be run in sub-process; can be overridden in sub-class
- hatchet.utils.BAMBinning.bin(samtools, samples, chromosomes, num_workers, q, size, regions, verbose=False)¶
hatchet.utils.CoordinateFinding module¶
- hatchet.utils.CoordinateFinding.binChr(bamfile, sample, seq, size, start=0, end=0, least=-1)¶
- hatchet.utils.CoordinateFinding.extractChr(ref)¶
- hatchet.utils.CoordinateFinding.findEnd(bamfile, seq, least=0)¶
- hatchet.utils.CoordinateFinding.findStart(bamfile, seq, least=0)¶
hatchet.utils.ProgressBar module¶
- class hatchet.utils.ProgressBar.ProgressBar(total, length, counter=0, verbose=False, decimals=1, fill='█', lock=None, prefix='Progress:', suffix='Complete')¶
Bases: object
Methods
progress
progressLock
progressNoLock
- progress(advance=True, msg='')¶
- progressLock(advance=True, msg='')¶
- progressNoLock(advance=True, msg='')¶
hatchet.utils.Supporting module¶
- hatchet.utils.Supporting.argmax(d)¶
- hatchet.utils.Supporting.argmin(d)¶
- class hatchet.utils.Supporting.bcolors¶
Bases: object
- BBLUE = '\x1b[96m'¶
- BOLD = '\x1b[1m'¶
- ENDC = '\x1b[0m'¶
- FAIL = '\x1b[91m'¶
- HEADER = '\x1b[95m'¶
- OKBLUE = '\x1b[94m'¶
- OKGREEN = '\x1b[92m'¶
- UNDERLINE = '\x1b[4m'¶
- WARNING = '\x1b[93m'¶
- hatchet.utils.Supporting.checksum(filepath)¶
- hatchet.utils.Supporting.close(msg)¶
- hatchet.utils.Supporting.digits(string)¶
- hatchet.utils.Supporting.download(url, dirpath, overwrite=False, extract=True, sentinel_file=None, chunk_size=8192)¶
- hatchet.utils.Supporting.ensure(pred, msg, exception_class=<class 'ValueError'>)¶
- hatchet.utils.Supporting.error(msg, raise_exception=False, exception_class=<class 'ValueError'>)¶
- hatchet.utils.Supporting.log(msg, level=None, lock=None, raise_exception=False, exception_class=<class 'ValueError'>)¶
- hatchet.utils.Supporting.logArgs(args, width=40)¶
- hatchet.utils.Supporting.naturalOrder(text)¶
- hatchet.utils.Supporting.numericOrder(text)¶
- hatchet.utils.Supporting.run(commands, stdouterr_filepath=None, check_return_codes=True, error_msg=None, stdouterr_filepath_autoremove=True, **kwargs)¶
- hatchet.utils.Supporting.to_tuple(s, n=2, typ=<class 'int'>, error_message=None)¶
- hatchet.utils.Supporting.url_exists(path)¶
- hatchet.utils.Supporting.which(program)¶
hatchet.utils.TotalCounting module¶
- class hatchet.utils.TotalCounting.TotalCounter(task_queue, result_queue, progress_bar, samtools, q, verbose)¶
Bases: Process
- Attributes:
authkey
daemon: Return whether process is a daemon
exitcode: Return exit code of process or None if it has yet to stop
ident: Return identifier (PID) of process or None if it has yet to start
name
pid: Return identifier (PID) of process or None if it has yet to start
sentinel: Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.
Methods
close(): Close the Process object.
is_alive(): Return whether process is alive
join([timeout]): Wait until child process terminates
kill(): Terminate process; sends SIGKILL signal or uses TerminateProcess()
run(): Method to be run in sub-process; can be overridden in sub-class
start(): Start child process
terminate(): Terminate process; sends SIGTERM signal or uses TerminateProcess()
binChr
- binChr(bamfile, samplename, chromosome)¶
- run()¶
Method to be run in sub-process; can be overridden in sub-class
- hatchet.utils.TotalCounting.tcount(samtools, samples, chromosomes, num_workers, q, verbose=False)¶
hatchet.utils.check module¶
- hatchet.utils.check.main(hatchet_cmds=None)¶
- hatchet.utils.check.suppress_stdout()¶
hatchet.utils.check_solver module¶
- hatchet.utils.check_solver.main(args=None)¶
hatchet.utils.cluster_bins module¶
- class hatchet.utils.cluster_bins.DiagGHMM(n_components=1, covariance_type='diag', min_covar=0.001, startprob_prior=1.0, transmat_prior=1.0, means_prior=0, means_weight=0, covars_prior=0.01, covars_weight=1, algorithm='viterbi', random_state=None, n_iter=10, tol=0.01, verbose=False, params='stmc', init_params='stmc', implementation='log')¶
Bases: GaussianHMM
- Parameters:
- n_components : int
Number of states.
- covariance_type : {“spherical”, “diag”, “full”, “tied”}, optional
The type of covariance parameters to use:
“spherical”: each state uses a single variance value that applies to all features.
“diag”: each state uses a diagonal covariance matrix (default).
“full”: each state uses a full (i.e. unrestricted) covariance matrix.
“tied”: all states use the same full covariance matrix.
- min_covar : float, optional
Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.
- startprob_prior : array, shape (n_components, ), optional
Parameters of the Dirichlet prior distribution for startprob_.
- transmat_prior : array, shape (n_components, n_components), optional
Parameters of the Dirichlet prior distribution for each row of the transition probabilities transmat_.
- means_prior, means_weight : array, shape (n_components, ), optional
Mean and precision of the Normal prior distribution for means_.
- covars_prior, covars_weight : array, shape (n_components, ), optional
Parameters of the prior distribution for the covariance matrix covars_. If covariance_type is “spherical” or “diag” the prior is the inverse gamma distribution, otherwise the inverse Wishart distribution.
- algorithm : {“viterbi”, “map”}, optional
Decoder algorithm.
“viterbi”: finds the most likely sequence of states, given all emissions.
“map” (also known as smoothing or forward-backward): finds the sequence of the individual most-likely states, given all emissions.
- random_state : RandomState or an int seed, optional
A random number generator instance.
- n_iter : int, optional
Maximum number of iterations to perform.
- tol : float, optional
Convergence threshold. EM will stop if the gain in log-likelihood is below this value.
- verbose : bool, optional
Whether per-iteration convergence reports are printed to sys.stderr. Convergence can also be diagnosed using the monitor_ attribute.
- params, init_params : string, optional
The parameters that get updated during (params) or initialized before (init_params) the training. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars. Defaults to all parameters.
- implementation : string, optional
Determines if the forward-backward algorithm is implemented with logarithms (“log”), or using scaling (“scaling”). The default is to use logarithms for backwards compatibility.
- Attributes:
covars_: Return covars as a full matrix.
Methods
aic(X[, lengths]): Akaike information criterion for the current model on the input X.
bic(X[, lengths]): Bayesian information criterion for the current model on the input X.
decode(X[, lengths, algorithm]): Find most likely state sequence corresponding to X.
fit(X[, lengths]): Estimate model parameters.
get_metadata_routing(): Get metadata routing of this object.
get_params([deep]): Get parameters for this estimator.
get_stationary_distribution(): Compute the stationary distribution of states.
predict(X[, lengths]): Find most likely state sequence corresponding to X.
predict_proba(X[, lengths]): Compute the posterior probability for each state in the model.
sample([n_samples, random_state, currstate]): Generate random samples from the model.
score(X[, lengths]): Compute the log probability under the model.
score_samples(X[, lengths]): Compute the log probability under the model and compute posteriors.
set_fit_request(*[, lengths]): Request metadata passed to the fit method.
set_params(**params): Set the parameters of this estimator.
set_predict_proba_request(*[, lengths]): Request metadata passed to the predict_proba method.
set_predict_request(*[, lengths]): Request metadata passed to the predict method.
set_score_request(*[, lengths]): Request metadata passed to the score method.
form_transition_matrix
- form_transition_matrix(diag)¶
- set_fit_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for lengths parameter in fit.
- Returns:
- self : object
The updated object.
- set_predict_proba_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM¶
Request metadata passed to the predict_proba method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for lengths parameter in predict_proba.
- Returns:
- self : object
The updated object.
- set_predict_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM¶
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for lengths parameter in predict.
- Returns:
- self : object
The updated object.
- set_score_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for lengths parameter in score.
- Returns:
- self : object
The updated object.
- hatchet.utils.cluster_bins.form_seg(bbc, balanced_threshold)¶
- hatchet.utils.cluster_bins.hmm_model_select(tracks, minK=20, maxK=50, tau=1e-05, tmat='diag', decode_alg='viterbi', covar='diag', state_selection='bic', restarts=10)¶
- hatchet.utils.cluster_bins.main(args=None)¶
- hatchet.utils.cluster_bins.make_transmat(diag, K)¶
- hatchet.utils.cluster_bins.read_bb(bbfile, subset=None, allow_gaps=False)¶
Constructs arrays to represent the bins in each chromosome or arm. If bbfile was binned around chromosome arms, then uses chromosome arms. Otherwise, uses chromosomes.
- Returns:
botht: list of np.ndarrays of size (n_bins, n_tracks), where n_tracks = n_samples * 2; each array contains 1 track per sample for a single chromosome arm
bb: table read from input bbfile
sample_labels: order in which samples are represented in each array in botht
chr_labels: order in which chromosomes or arms are represented in botht
- hatchet.utils.cluster_bins.reindex(labels)¶
Given a list of labels, reindex them as integers from 1 to n_labels. Also orders them in nonincreasing order of prevalence.
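The relabeling described above can be illustrated with a minimal sketch. The helper name `reindex_by_prevalence` and the tie-breaking rule (first appearance wins) are assumptions made for illustration and are not taken from the HATCHet source.

```python
from collections import Counter


def reindex_by_prevalence(labels):
    """Map labels to 1..n_labels, most frequent label first (hypothetical helper)."""
    counts = Counter(labels)
    # Sort by descending count. Counter keeps insertion order and sorted() is
    # stable, so ties are broken by first appearance (an assumption).
    ordered = sorted(counts, key=lambda lab: -counts[lab])
    mapping = {lab: i + 1 for i, lab in enumerate(ordered)}
    return [mapping[lab] for lab in labels]


example = reindex_by_prevalence(["b", "a", "a", "c", "a", "b"])
print(example)
```

Here "a" is the most prevalent label and becomes 1, "b" becomes 2, and "c" becomes 3.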
hatchet.utils.cluster_bins_gmm module¶
- hatchet.utils.cluster_bins_gmm.cluster(points, clouds=None, concentration_prior=None, K=100, restarts=10, seed=0)¶
Clusters a set of data points lying in an arbitrary number of clusters.
Arguments:
data (list of lists of floats): list of data points to be clustered.
clouds (list of lists of floats, same second dimension as data): bootstrapped bins for clustering.
sampleName (string): The name of the input sample.
concentration_prior (float): Tuning parameter for clustering, must be between 0 and 1. Used to determine concentration of points in clusters: higher favors more clusters, lower favors fewer clusters.
K (int): maximum number of clusters to infer.
restarts (int): number of initializations to try for GMM.
seed (int): random number generator seed for GMM.
- Returns:
mus (list of lists of floats): List of cluster means.
sigmas (list of 2D lists of floats): List of cluster covariances.
clusterAssignments (list of ints): The assignment of each interval to a cluster, where an entry j at index i means the ith interval has been assigned to the jth meta-interval.
numPoints (list of ints): Number of points assigned to each cluster.
numClusters (int): The number of clusters.
- hatchet.utils.cluster_bins_gmm.generateClouds(points, density, seed, sdeven=0.02, sdodd=0.02)¶
- hatchet.utils.cluster_bins_gmm.getPoints(data, samples)¶
- hatchet.utils.cluster_bins_gmm.main(args=None)¶
- hatchet.utils.cluster_bins_gmm.minSegmentBins(sbb, nbins, rd, nsnps, cov, clusters, samples)¶
- hatchet.utils.cluster_bins_gmm.readBB(bbfile)¶
- hatchet.utils.cluster_bins_gmm.refineClustering(combo, assign, assignidx, samples, rdtol, baftol)¶
- hatchet.utils.cluster_bins_gmm.reindex(labels)¶
Given a list of labels, reindex them as integers from 1 to n_labels. Also orders them in nonincreasing order of prevalence.
- hatchet.utils.cluster_bins_gmm.roundAlphasBetas(baf, alpha, beta)¶
- hatchet.utils.cluster_bins_gmm.scaleBAF(segments, samples, diploidbaf)¶
- hatchet.utils.cluster_bins_gmm.segmentBins(bb, clusters, samples)¶
- hatchet.utils.cluster_bins_gmm.splitBAF(baf, scale)¶
hatchet.utils.combine_counts module¶
- hatchet.utils.combine_counts.EM(totals_in, alts_in, start, tol=1e-06)¶
Adapted from chisel/Combiner.py
- hatchet.utils.combine_counts.adaptive_bins_arm(snp_thresholds, total_counts, snp_positions, snp_counts, min_snp_reads=2000, min_total_reads=5000)¶
Compute adaptive bins for a single chromosome arm.
Parameters:
- snp_thresholds: length <n> array of 1-based genomic positions of candidate bin thresholds
- total_counts: <n> x <2d> np.ndarray
entry [i, 2j] contains the number of reads starting in [snp_thresholds[i], snp_thresholds[i + 1]) in sample j (only the first n-1 positions are populated)
entry [i, 2j + 1] contains the number of reads covering position snp_thresholds[i] in sample j
- snp_positions: length <m> list of 1-based genomic positions of SNPs
NOTE: this function requires that m = n-1 for convenience of programming (could be relaxed in a different implementation)
- snp_counts: <m> x <d> np.ndarray containing the number of overlapping reads at each of the <n - 1> SNP positions in <d> samples
- min_snp_reads: the minimum number of SNP-covering reads required in each bin and each sample
- min_total_reads: the minimum number of total reads required in each bin and each sample
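As an illustration of the two minimum-read constraints, the sketch below greedily merges consecutive candidate intervals until every sample meets both thresholds. It is a simplified stand-in and not the algorithm used by adaptive_bins_arm: the helper name `greedy_adaptive_bins`, the plain per-interval input layout, and the rule that leftover intervals join the last bin are all assumptions.

```python
def greedy_adaptive_bins(total_reads, snp_reads,
                         min_total_reads=5000, min_snp_reads=2000):
    """Return inclusive (first, last) interval index pairs (hypothetical helper).

    total_reads[i][j] and snp_reads[i][j] hold the reads observed in
    candidate interval i for sample j.
    """
    if not total_reads:
        return []
    n_samples = len(total_reads[0])
    bins = []
    acc_total = [0] * n_samples
    acc_snp = [0] * n_samples
    first = 0
    for i, (tot_row, snp_row) in enumerate(zip(total_reads, snp_reads)):
        acc_total = [a + t for a, t in zip(acc_total, tot_row)]
        acc_snp = [a + s for a, s in zip(acc_snp, snp_row)]
        # Close the bin only when every sample satisfies both minima.
        if min(acc_total) >= min_total_reads and min(acc_snp) >= min_snp_reads:
            bins.append((first, i))
            first = i + 1
            acc_total = [0] * n_samples
            acc_snp = [0] * n_samples
    # Assumption: leftover intervals that never reach the minima are
    # appended to the last bin (or form a single bin if none exists).
    if first < len(total_reads):
        if bins:
            bins[-1] = (bins[-1][0], len(total_reads) - 1)
        else:
            bins.append((first, len(total_reads) - 1))
    return bins


total = [[3000, 2600], [3000, 2600], [6000, 5500], [1000, 1000]]
snps = [[1200, 1000], [1200, 1100], [2500, 2100], [300, 300]]
bins = greedy_adaptive_bins(total, snps)
print(bins)
```

With two samples, the first two intervals are merged because neither meets the thresholds alone, and the final low-coverage interval is absorbed into the preceding bin.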
- hatchet.utils.combine_counts.apply_EM(totals_in, alts_in)¶
- hatchet.utils.combine_counts.backtrack(bp)¶
- hatchet.utils.combine_counts.binom_prop_test(alt1, ref1, flip1, alt2, ref2, flip2, alpha=0.1)¶
Returns True if there is sufficient evidence that SNPs 1 and 2 should not be merged, False otherwise.
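One way to picture such a decision is a pooled two-proportion z-test, sketched below with the standard library only. This is not confirmed to be the test HATCHet applies. The helper name `two_proportion_differs` and the reading of the flip flags (a set flag swaps the ALT and REF roles of that SNP) are assumptions.

```python
import math


def two_proportion_differs(alt1, ref1, flip1, alt2, ref2, flip2, alpha=0.1):
    """True if the two allele proportions differ significantly (hypothetical helper)."""
    # Assumption: a set flip flag swaps the ALT/REF roles of that SNP.
    x1, n1 = (ref1 if flip1 else alt1), alt1 + ref1
    x2, n2 = (ref2 if flip2 else alt2), alt2 + ref2
    if n1 == 0 or n2 == 0:
        return False  # no reads, so no evidence against merging
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False  # both proportions are identically 0 or 1
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_value < alpha


similar = two_proportion_differs(50, 50, False, 52, 48, False)
different = two_proportion_differs(90, 10, False, 10, 90, False)
flipped = two_proportion_differs(90, 10, False, 10, 90, True)
```

In this sketch a True result plays the role of "do not merge", matching the return convention described above.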
- hatchet.utils.combine_counts.block_segment(df, blocksize, max_snps_per_block)¶
Given a pandas dataframe containing read counts for a contiguous segment of SNPs, collects SNPs into phase blocks of size at most <blocksize> containing at most <max_snps_per_block> SNPs each.
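The two block limits can be sketched with a greedy pass over sorted SNP positions. The sketch works on a plain list instead of a pandas dataframe, and it assumes that block size means the genomic span between the first and last SNP of a block; the helper name `greedy_blocks` is hypothetical and the real function may group SNPs differently.

```python
def greedy_blocks(positions, blocksize, max_snps_per_block):
    """Group sorted SNP positions into consecutive blocks (hypothetical helper)."""
    blocks = []
    current = []
    for pos in positions:
        # Assumption: block size is the span from the block's first SNP.
        too_wide = bool(current) and pos - current[0] > blocksize
        too_many = len(current) >= max_snps_per_block
        if too_wide or too_many:
            blocks.append(current)
            current = []
        current.append(pos)
    if current:
        blocks.append(current)
    return blocks


blocks = greedy_blocks(
    [100, 200, 300, 60000, 60100, 60200, 60300],
    blocksize=50000,
    max_snps_per_block=3,
)
print(blocks)
```

The first block closes because the next SNP lies too far away, and the second closes because it already holds the maximum number of SNPs.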
- hatchet.utils.combine_counts.collapse_blocks(df, blocks, singletons, orphans, ch)¶
- hatchet.utils.combine_counts.compute_baf_task_multi(bin_snps, blocksize, max_snps_per_block, test_alpha)¶
Estimates the BAF for the bin containing exactly <bin_snps> SNPs. <bin_snps> is a dataframe with at least ALT and REF columns containing read counts. <blocksize>, <max_snps_per_block>, and <test_alpha> are used only for constructing phase blocks.
- hatchet.utils.combine_counts.compute_baf_task_single(bin_snps, blocksize, max_snps_per_block, test_alpha)¶
Estimates the BAF for the bin containing exactly <bin_snps> SNPs. <bin_snps> is a dataframe with at least ALT and REF columns containing read counts. <blocksize>, <max_snps_per_block>, and <test_alpha> are used only for constructing phase blocks.
- hatchet.utils.combine_counts.compute_baf_wrapper(bin_snps, blocksize, max_snps_per_block, test_alpha, multisample)¶
- hatchet.utils.combine_counts.consecutive(data, stepsize=1)¶
- hatchet.utils.combine_counts.correct_haplotypes(orig_bafs, min_prop_switch=0.01, n_segments=20, min_switch_density=0.1, min_mean_baf=0.48, minmax_al_imb=0.02)¶
- hatchet.utils.combine_counts.get_chr_end(stem, chromosome)¶
- hatchet.utils.combine_counts.main(args=None)¶
- hatchet.utils.combine_counts.merge_data(bins, dfs, bafs, sample_names, chromosome)¶
Merge bins data (starts, ends, total counts, RDRs) with SNP data and BAF data for each bin.
Parameters:
bins: output from call to adaptive_bins_arm
dfs: (only for troubleshooting) pandas DataFrame, each containing the SNP information for the corresponding bin
bafs: the ith element is the output from compute_baf_task(dfs[i])
Produces a BB file with a few additional columns.
- hatchet.utils.combine_counts.merge_phasing(_, all_phase_data)¶
Merge phasing results across all samples: if a pair of SNPs is split in any sample, the pair is kept split in the merged result.
- hatchet.utils.combine_counts.multisample_em(alts, refs, start, tol=1e-05)¶
- hatchet.utils.combine_counts.phase_blocks_sequential(df, blocksize=50000.0, max_snps_per_block=10, alpha=0)¶
- hatchet.utils.combine_counts.read_snps(baf_file, ch, all_names, phasefile=None)¶
Read and validate SNP data for this patient (TSV table output from HATCHet deBAF.py).
- hatchet.utils.combine_counts.run_chromosome(baffile, all_names, chromosome, outfile, centromere_start, centromere_end, min_snp_reads, min_total_reads, arraystem, xy, multisample, phasefile, blocksize, max_snps_per_block, test_alpha)¶
Perform adaptive binning and infer BAFs to produce a HATCHet BB file for a single chromosome.
- hatchet.utils.combine_counts.run_chromosome_wrapper(param)¶
- hatchet.utils.combine_counts.segmented_piecewise(X, pieces=2)¶
hatchet.utils.combine_counts_fw module¶
- hatchet.utils.combine_counts_fw.blocking(L, sample, phase, blocksize)¶
- hatchet.utils.combine_counts_fw.combine(normalbins, tumorbins, tumorbafs, diploidbaf, totalcounts, chromosomes, samples, normal, gamma, verbose=False, disable=False, phase=None, block=None)¶
- hatchet.utils.combine_counts_fw.computeBAFs(partition, samples, diploidbaf, phase=None, block=0)¶
- hatchet.utils.combine_counts_fw.main(args=None)¶
- hatchet.utils.combine_counts_fw.readBAFs(tumor)¶
- hatchet.utils.combine_counts_fw.readBINs(normalbins, tumorbins)¶
- hatchet.utils.combine_counts_fw.readPhase(f)¶
- hatchet.utils.combine_counts_fw.readTotalCounts(filename, samples, normal)¶
- hatchet.utils.combine_counts_fw.splitBAF(baf, scale)¶
hatchet.utils.commands module¶
hatchet.utils.config module¶
hatchet.utils.count_alleles module¶
- class hatchet.utils.count_alleles.AlleleCounter(bcftools, reference, q, Q, mincov, dp, E, snplist, verbose, outdir)¶
Bases: Worker
Methods
countAlleles
run
work
- countAlleles(bamfile, samplename, chromosome)¶
- work(bamfile, samplename, chromosome)¶
- hatchet.utils.count_alleles.checkShift(countA, countB, maxshift)¶
- hatchet.utils.count_alleles.counting(bcftools, reference, samples, chromosomes, num_workers, snplist, q, Q, mincov, dp, E, verbose, outdir)¶
- hatchet.utils.count_alleles.isHet(countA, countB, gamma)¶
- hatchet.utils.count_alleles.main(args=None)¶
- hatchet.utils.count_alleles.selectHetSNPs(counts, gamma, maxshift)¶
hatchet.utils.count_reads module¶
- hatchet.utils.count_reads.check_array_files(darray, chrs)¶
- hatchet.utils.count_reads.check_counts_files(dcounts, chrs, all_names)¶
- hatchet.utils.count_reads.count_chromosome(ch, outdir, samtools, bam, sample_name, readquality, compression_level=6)¶
- hatchet.utils.count_reads.count_chromosome_wrapper(param)¶
- hatchet.utils.count_reads.expected_arrays(darray, chrs)¶
- hatchet.utils.count_reads.expected_counts_files(dcounts, chrs, all_names)¶
- hatchet.utils.count_reads.form_counts_array(starts_files, perpos_files, thresholds, chromosome, tabix, chunksize=100000.0)¶
NOTE: Assumes that starts_files[i] corresponds to the same sample as perpos_files[i]
Parameters:
starts_files: list of <sample>.<chromosome>.starts.gz files each containing a list of start positions
perpos_files: list of <sample>.per-base.bed.gz files containing per-position coverage from mosdepth
thresholds: list of potential bin start positions (thresholds between SNPs)
chromosome: chromosome to extract read counts for
- Returns: <n> x <2d> np.ndarray
entry [i, 2j] contains the number of reads starting in (starts[i], starts[i + 1]) in sample j
entry [i, 2j + 1] contains the number of reads covering position starts[i] in sample j
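The interleaved column layout of the returned array can be illustrated with a small pure-Python sketch. It builds the array from in-memory reads instead of the starts and mosdepth files, uses nested lists instead of np.ndarray, and treats the start interval as half-open, as in the adaptive_bins_arm description; these choices and the helper name `counts_array` are assumptions.

```python
def counts_array(reads_per_sample, thresholds):
    """Build an n x 2d nested list of counts (hypothetical helper).

    reads_per_sample[j] is a list of 1-based inclusive (start, end) reads
    for sample j; thresholds is the sorted list of candidate positions.
    """
    n, d = len(thresholds), len(reads_per_sample)
    arr = [[0] * (2 * d) for _ in range(n)]
    for j, reads in enumerate(reads_per_sample):
        for start, end in reads:
            for i, t in enumerate(thresholds):
                # Column 2j: reads starting in [thresholds[i], thresholds[i + 1]).
                if i + 1 < n and t <= start < thresholds[i + 1]:
                    arr[i][2 * j] += 1
                # Column 2j + 1: reads covering position thresholds[i].
                if start <= t <= end:
                    arr[i][2 * j + 1] += 1
    return arr


thresholds = [10, 20, 30]
reads = [[(8, 12), (15, 25), (22, 35)], [(10, 31)]]
arr = counts_array(reads, thresholds)
print(arr)
```

Even columns hold start counts per interval and odd columns hold coverage at each threshold, so sample j always occupies columns 2j and 2j + 1. The last row has no following threshold, which is why its start-count columns stay at zero.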
- hatchet.utils.count_reads.get_chr_end(stem, all_names, chromosome)¶
- hatchet.utils.count_reads.main(args=None)¶
- hatchet.utils.count_reads.mosdepth_wrapper(params)¶
- hatchet.utils.count_reads.read_snps(baf_file, ch, all_names)¶
Read and validate SNP data for this patient (TSV table output from HATCHet deBAF.py).
- hatchet.utils.count_reads.run_chromosome(outdir, all_names, chromosome, centromere_start, centromere_end, baf_file, tabix)¶
Construct arrays that contain all counts needed to perform adaptive binning for a single chromosome (across all samples).
- hatchet.utils.count_reads.run_chromosome_wrapper(param)¶
- hatchet.utils.count_reads.run_mosdepth(outdir, sample_name, bam, threads, mosdepth, readquality)¶
hatchet.utils.count_reads_fw module¶
- hatchet.utils.count_reads_fw.knownRegions(refdict, chromosomes)¶
- hatchet.utils.count_reads_fw.logArgs(args, width)¶
- hatchet.utils.count_reads_fw.main(args=None)¶
hatchet.utils.download_panel module¶
- hatchet.utils.download_panel.dwnld_chains(dirpath)¶
- hatchet.utils.download_panel.dwnld_refpanel_genome(path)¶
- hatchet.utils.download_panel.main(args=None)¶
- hatchet.utils.download_panel.mk_rename_file(path)¶
hatchet.utils.genotype_snps module¶
- class hatchet.utils.genotype_snps.Caller(task_queue, result_queue, progress_bar, bcftools, reference, q, Q, mincov, dp, E, outdir, snplist, verbose)¶
Bases: Process
- Attributes:
authkey
daemon: Return whether process is a daemon
exitcode: Return exit code of process or None if it has yet to stop
ident: Return identifier (PID) of process or None if it has yet to start
name
pid: Return identifier (PID) of process or None if it has yet to start
sentinel: Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.
Methods
close(): Close the Process object.
is_alive(): Return whether process is alive
join([timeout]): Wait until child process terminates
kill(): Terminate process; sends SIGKILL signal or uses TerminateProcess()
run(): Method to be run in sub-process; can be overridden in sub-class
start(): Start child process
terminate(): Terminate process; sends SIGTERM signal or uses TerminateProcess()
callSNPs
- callSNPs(bamfile, samplename, chromosome)¶
- run()¶
Method to be run in sub-process; can be overridden in sub-class
- hatchet.utils.genotype_snps.call(bcftools, reference, samples, chromosomes, num_workers, q, Q, mincov, dp, E, outdir, snplist=None, verbose=False)¶
- hatchet.utils.genotype_snps.main(args=None)¶
hatchet.utils.multiprocessing module¶
- class hatchet.utils.multiprocessing.TaskHandler(worker, task_queue, result_queue, progress_bar)¶
Bases: Process
- Attributes:
authkey
daemon: Return whether process is a daemon
exitcode: Return exit code of process or None if it has yet to stop
ident: Return identifier (PID) of process or None if it has yet to start
name
pid: Return identifier (PID) of process or None if it has yet to start
sentinel: Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.
Methods
close(): Close the Process object.
is_alive(): Return whether process is alive
join([timeout]): Wait until child process terminates
kill(): Terminate process; sends SIGKILL signal or uses TerminateProcess()
run(): Method to be run in sub-process; can be overridden in sub-class
start(): Start child process
terminate(): Terminate process; sends SIGTERM signal or uses TerminateProcess()
- run()¶
Method to be run in sub-process; can be overridden in sub-class
hatchet.utils.phase_snps module¶
- class hatchet.utils.phase_snps.Phaser(panel, outdir, hg19, ref, chains, rename, refvers, chrnot, verbose, bcftools, shapeit, picard, bgzip)¶
Bases: Worker
Methods
biallelic
change_chr
index
liftover
run
run_shapeit
stage_vcfs
work
- biallelic(infile, chromosome)¶
- change_chr(infile, chromosome, outname, rename)¶
- index(infile, chromosome)¶
- liftover(infile, chromosome, outname, chain, refgen, ch)¶
- run_shapeit(infile, chromosome)¶
- stage_vcfs(infile, chromosome)¶
- work(*args)¶
- hatchet.utils.phase_snps.cleanup(outdir)¶
- hatchet.utils.phase_snps.concat(vcfs, outdir, bcftools)¶
- hatchet.utils.phase_snps.main(args=None)¶
- hatchet.utils.phase_snps.print_log(path, chromosomes)¶
hatchet.utils.plot_bins module¶
- hatchet.utils.plot_bins.addchr(pos)¶
- hatchet.utils.plot_bins.argmax(d)¶
- hatchet.utils.plot_bins.argmin(d)¶
- hatchet.utils.plot_bins.baf(bbc, args, out)¶
- hatchet.utils.plot_bins.bb(bbc, clusters, args, out)¶
- hatchet.utils.plot_bins.clubaf(bbc, clusters, args, out)¶
- hatchet.utils.plot_bins.clurdr(bbc, clusters, args, out)¶
- hatchet.utils.plot_bins.clus(seg, args, out)¶
- hatchet.utils.plot_bins.cluster_bins(bbc, clusters, args, out, clust_order, pal)¶
- hatchet.utils.plot_bins.coordinates(args, g=None)¶
- hatchet.utils.plot_bins.debug(msg)¶
- hatchet.utils.plot_bins.error(msg)¶
- hatchet.utils.plot_bins.info(msg)¶
- hatchet.utils.plot_bins.isfloat(value)¶
- hatchet.utils.plot_bins.join(bbc, clusters, resolution)¶
- hatchet.utils.plot_bins.log(msg)¶
- hatchet.utils.plot_bins.main(args=None)¶
- hatchet.utils.plot_bins.rdr(bbc, args, out)¶
- hatchet.utils.plot_bins.readBBC(inp)¶
- hatchet.utils.plot_bins.readSEG(inp)¶
- hatchet.utils.plot_bins.select(bbc, clusters, args)¶
- hatchet.utils.plot_bins.sortchr(x)¶
- hatchet.utils.plot_bins.warning(msg)¶
hatchet.utils.plot_bins_1d2d module¶
- hatchet.utils.plot_bins_1d2d.main(args=None)¶
- hatchet.utils.plot_bins_1d2d.plot_1d(bbc, baf_lim=None, rdr_lim=None, display=False, outdir=None, alpha=1, show_centromeres=False)¶
- hatchet.utils.plot_bins_1d2d.plot_2d(bbc, seg=None, show_centers=False, xlim=None, ylim=None, figsize=(4, 4), display=True, outdir=None, alpha=1)¶
For each sample, plot the mBAF and RDR of each bin colored by cluster. Colors will match clusters in corresponding 1D plots.
- hatchet.utils.plot_bins_1d2d.plot_track(bb, chr_ends, chr2centro, yval='RD', ylabel=None, display=True, ylim=None, alpha=1, color_field=None, title=None, show_centromeres=False)¶
NOTE: this function assumes that (1) bb contains data for a single sample and (2) chromosomes are specified using “chr” notation.
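The two assumptions in the note above can be checked before plotting. The following is only a sketch: `check_track_input` is a hypothetical helper, not part of hatchet, and it represents rows as plain dictionaries rather than a DataFrame. The column names `#CHR` and `SAMPLE` are borrowed from the `rd_gccorrect` description and are assumed, not confirmed, to apply to `bb` here.

```python
def check_track_input(rows, chr_col="#CHR", sample_col="SAMPLE"):
    """Return a list of problems found in rows; an empty list means the input looks fine.

    Hypothetical helper: rows is a list of dicts, one per bin.
    """
    problems = []

    # Assumption 1: data for exactly one sample.
    samples = {row[sample_col] for row in rows}
    if len(samples) != 1:
        problems.append(f"expected exactly one sample, found {len(samples)}")

    # Assumption 2: chromosome names use "chr" notation (e.g. "chr1", not "1").
    bad = sorted(
        {str(row[chr_col]) for row in rows if not str(row[chr_col]).startswith("chr")}
    )
    if bad:
        problems.append("chromosomes without 'chr' prefix: " + ", ".join(bad))

    return problems


good = [
    {"#CHR": "chr1", "SAMPLE": "tumor1", "RD": 1.02},
    {"#CHR": "chr2", "SAMPLE": "tumor1", "RD": 0.97},
]
mixed = [
    {"#CHR": "1", "SAMPLE": "tumor1", "RD": 1.02},
    {"#CHR": "chr2", "SAMPLE": "tumor2", "RD": 0.97},
]
print(check_track_input(good))
print(check_track_input(mixed))
```

With a DataFrame, the same two checks would reduce to counting unique values in the sample column and testing the chromosome column for the prefix.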
hatchet.utils.plot_cn module¶
- hatchet.utils.plot_cn.addchr(g, pos, color=None)¶
- hatchet.utils.plot_cn.addchrplt(pos)¶
- hatchet.utils.plot_cn.allelicprofiles(tumor, clones, props, args, out)¶
- hatchet.utils.plot_cn.allelicproportions(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.argmax(d)¶
- hatchet.utils.plot_cn.argmin(d)¶
- hatchet.utils.plot_cn.cndistance(u, v)¶
- hatchet.utils.plot_cn.cnproportions(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.debug(msg)¶
- hatchet.utils.plot_cn.error(msg)¶
- hatchet.utils.plot_cn.gridmixtures(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.gridprofiles(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.gridprofilesreduced(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.info(msg)¶
- hatchet.utils.plot_cn.intergridfullprofiles(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.intergridreducedprofiles(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.intergridsamplesclusters(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.intergridsubclonality(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.interjoin(tumor, clones, resolution)¶
- hatchet.utils.plot_cn.interreduction(proj, base)¶
- hatchet.utils.plot_cn.isfloat(value)¶
- hatchet.utils.plot_cn.join(tumor, clones, resolution)¶
- hatchet.utils.plot_cn.log(msg)¶
- hatchet.utils.plot_cn.main(args=None)¶
- hatchet.utils.plot_cn.multiple(tumor, clones, props, base, args)¶
- hatchet.utils.plot_cn.parsing_arguments(args=None)¶
Parse command-line arguments.
- hatchet.utils.plot_cn.pp(tumor, clones, props, args)¶
- hatchet.utils.plot_cn.profiles(tumor, clones, props, args, out)¶
- hatchet.utils.plot_cn.readUCN(inputs, patnames)¶
- hatchet.utils.plot_cn.reduction(proj, base)¶
- hatchet.utils.plot_cn.segmenting(tumor, clones, props)¶
- hatchet.utils.plot_cn.similarity(u, v)¶
- hatchet.utils.plot_cn.similaritysample(u, v)¶
- hatchet.utils.plot_cn.single(tumor, clones, props, base, args)¶
- hatchet.utils.plot_cn.sortchr(x)¶
- hatchet.utils.plot_cn.subclonal(tumor, base, clones, props, args, out)¶
- hatchet.utils.plot_cn.warning(msg)¶
hatchet.utils.plot_cn_1d2d module¶
- hatchet.utils.plot_cn_1d2d.cn2evs(cns, props)¶
- hatchet.utils.plot_cn_1d2d.cn2total(s)¶
- hatchet.utils.plot_cn_1d2d.cn2totals(x)¶
- hatchet.utils.plot_cn_1d2d.compute_gamma(bbc)¶
- hatchet.utils.plot_cn_1d2d.generate_1D2D_plots(bbc, fcn_lim=None, baf_lim=None, title=None, show_centromeres=False, by_sample=False, outdir=None, resample_balanced=False)¶
- hatchet.utils.plot_cn_1d2d.limits_valid(lim)¶
- hatchet.utils.plot_cn_1d2d.main(args=None)¶
- hatchet.utils.plot_cn_1d2d.plot_clusters(bbc, mapping, figsize=(4, 4), fname=None, dpi=300, xlim=None, ylim=None, save_samples=False, save_prefix=None, coloring='original')¶
- hatchet.utils.plot_cn_1d2d.plot_genome(big_bbc, mapping, chr_ends, chr2centro, chromosomes=None, dpi=400, figsize=(8, 5), fname=None, show_centromeres=False, fcn_ylim=None, baf_ylim=None, save_samples=False, save_prefix=None)¶
- hatchet.utils.plot_cn_1d2d.recompose_state(l)¶
Read copy-number state vector from list of allele-specific values
- hatchet.utils.plot_cn_1d2d.reindex(labels)¶
Given a list of labels, reindex them as integers from 1 to n_labels. Labels are in nonincreasing order of prevalence.
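A minimal sketch of this kind of reindexing follows. It is an illustration, not the library's implementation: `reindex_by_prevalence` is a hypothetical name, the reading that the most frequent label receives 1 is inferred from the description above, and the tie-breaking rule (first appearance) is an assumption.

```python
from collections import Counter


def reindex_by_prevalence(labels):
    """Map arbitrary labels to integers 1..n_labels, most frequent label first."""
    counts = Counter(labels)

    # Record where each label first appears, used only to break ties (assumption).
    first_seen = {}
    for i, lab in enumerate(labels):
        first_seen.setdefault(lab, i)

    # Sort labels by descending count, then by first appearance.
    ordered = sorted(counts, key=lambda lab: (-counts[lab], first_seen[lab]))
    mapping = {lab: rank for rank, lab in enumerate(ordered, start=1)}
    return [mapping[lab] for lab in labels]


print(reindex_by_prevalence([7, 3, 3, 9, 3, 7]))  # [2, 1, 1, 3, 1, 2]
```

Here label 3 occurs three times and becomes 1, label 7 occurs twice and becomes 2, and label 9 occurs once and becomes 3.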
- hatchet.utils.plot_cn_1d2d.str2cn(x)¶
- hatchet.utils.plot_cn_1d2d.str2state(cn)¶
hatchet.utils.rd_gccorrect module¶
- hatchet.utils.rd_gccorrect.rd_gccorrect(bb, ref_genome)¶
Correct GC bias in read depth data for each sample.
Parameters:
bb: DataFrame containing read depth data, including the columns ‘#CHR’, ‘START’, ‘END’, ‘RD’, and ‘SAMPLE’
ref_genome: path to the reference genome in FASTA format
Return type: DataFrame with corrected read depth data
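To illustrate what GC-bias correction means in general, the sketch below uses a common stratified-median approach: bins are grouped by GC fraction, and each read depth is rescaled by the ratio of the overall median to the median of its GC stratum. This is not necessarily the method `rd_gccorrect` uses. The helpers `gc_fraction` and `gc_correct` and the parameter `n_strata` are hypothetical, and the sketch works on plain lists for a single sample rather than on a DataFrame.

```python
from collections import defaultdict
from statistics import median


def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases; None if the sequence has no A/C/G/T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else None


def gc_correct(rd, gc, n_strata=20):
    """Rescale each read depth by (overall median / median of its GC stratum).

    Assumes every value in gc lies in [0, 1]; bins where gc_fraction
    returned None should be dropped by the caller beforehand.
    """
    def stratum(frac):
        return min(int(frac * n_strata), n_strata - 1)

    # Collect read depths per GC stratum.
    by_stratum = defaultdict(list)
    for depth, frac in zip(rd, gc):
        by_stratum[stratum(frac)].append(depth)

    stratum_median = {k: median(v) for k, v in by_stratum.items()}
    overall = median(rd)

    corrected = []
    for depth, frac in zip(rd, gc):
        m = stratum_median[stratum(frac)]
        # Leave the value unchanged if the stratum median is not positive.
        corrected.append(depth * overall / m if m > 0 else depth)
    return corrected


rd = [10, 12, 20, 24]
gc = [0.32, 0.34, 0.62, 0.64]
corrected = gc_correct(rd, gc, n_strata=10)
print(corrected)
```

In this toy input the high-GC bins have twice the depth of the low-GC bins; after correction, the two strata share the same median, so the GC-dependent offset is removed while the variation within each stratum is preserved.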
hatchet.utils.run module¶
- hatchet.utils.run.main(args=None)¶