hatchet.utils package

Submodules

hatchet.utils.ArgParsing module

hatchet.utils.ArgParsing.extractChromosomes(samtools, normal, tumors, reference=None)
Parameters:

samtools: path to samtools executable

normal: tuple of (path to normal BAM file, string name)

tumors: list of tuples (path to BAM file, string name)

reference: path to FASTA file
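A hypothetical invocation (all paths and sample names below are placeholders; the return value is expected to be the list of chromosome names shared by the BAMs):

    from hatchet.utils.ArgParsing import extractChromosomes

    chromosomes = extractChromosomes(
        'samtools',                                            # path to samtools
        ('normal.bam', 'Normal'),                              # normal BAM and name
        [('tumor1.bam', 'TumorA'), ('tumor2.bam', 'TumorB')],  # tumor BAMs
        reference='hg19.fa',                                   # optional FASTA
    )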

hatchet.utils.ArgParsing.getSQNames(samtools, bamfile)
hatchet.utils.ArgParsing.parseRegions(region_file, chromosomes)
hatchet.utils.ArgParsing.parse_cluster_bins_args(args=None)

Parse command line arguments.

hatchet.utils.ArgParsing.parse_cluster_bins_gmm_args(args=None)
hatchet.utils.ArgParsing.parse_combine_counts_args(args=None)
hatchet.utils.ArgParsing.parse_combine_counts_fw_args(args=None)
hatchet.utils.ArgParsing.parse_count_alleles_arguments(args=None)
hatchet.utils.ArgParsing.parse_count_reads_args(args=None)
hatchet.utils.ArgParsing.parse_count_reads_fw_arguments(args=None)
hatchet.utils.ArgParsing.parse_download_panel_arguments(args=None)
hatchet.utils.ArgParsing.parse_genotype_snps_arguments(args=None)
hatchet.utils.ArgParsing.parse_phase_snps_arguments(args=None)
hatchet.utils.ArgParsing.parse_plot_bins_1d2d_args(args=None)

Parse command line arguments for the auxiliary cluster plotting command (1D and 2D plots with matching colors and optional labeled centers).

hatchet.utils.ArgParsing.parse_plot_bins_args(args=None)
hatchet.utils.ArgParsing.parse_plot_cn_1d2d_args(args=None)

Parse command line arguments for the auxiliary plotting command (1D and 2D plots with labeled copy states).

hatchet.utils.BAMBinning module

class hatchet.utils.BAMBinning.Binner(task_queue, result_queue, progress_bar, samtools, q, size, regions, verbose)

Bases: Process

Attributes:
authkey
daemon

Return whether process is a daemon

exitcode

Return exit code of process or None if it has yet to stop

ident

Return identifier (PID) of process or None if it has yet to start

name
pid

Return identifier (PID) of process or None if it has yet to start

sentinel

Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.

Methods

close()

Close the Process object.

is_alive()

Return whether process is alive

join([timeout])

Wait until child process terminates

kill()

Terminate process; sends SIGKILL signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

start()

Start child process

terminate()

Terminate process; sends SIGTERM signal or uses TerminateProcess()

binChr

binChr(bamfile, samplename, chromosome)
run()

Method to be run in sub-process; can be overridden in sub-class

hatchet.utils.BAMBinning.bin(samtools, samples, chromosomes, num_workers, q, size, regions, verbose=False)

hatchet.utils.CoordinateFinding module

hatchet.utils.CoordinateFinding.binChr(bamfile, sample, seq, size, start=0, end=0, least=-1)
hatchet.utils.CoordinateFinding.extractChr(ref)
hatchet.utils.CoordinateFinding.findEnd(bamfile, seq, least=0)
hatchet.utils.CoordinateFinding.findStart(bamfile, seq, least=0)

hatchet.utils.ProgressBar module

class hatchet.utils.ProgressBar.ProgressBar(total, length, counter=0, verbose=False, decimals=1, fill='█', lock=None, prefix='Progress:', suffix='Complete')

Bases: object

Methods

progress

progressLock

progressNoLock

progress(advance=True, msg='')
progressLock(advance=True, msg='')
progressNoLock(advance=True, msg='')

hatchet.utils.Supporting module

hatchet.utils.Supporting.argmax(d)
hatchet.utils.Supporting.argmin(d)
class hatchet.utils.Supporting.bcolors

Bases: object

BBLUE = '\x1b[96m'
BOLD = '\x1b[1m'
ENDC = '\x1b[0m'
FAIL = '\x1b[91m'
HEADER = '\x1b[95m'
OKBLUE = '\x1b[94m'
OKGREEN = '\x1b[92m'
UNDERLINE = '\x1b[4m'
WARNING = '\x1b[93m'
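For example, a message can be wrapped in one of these codes and reset with ENDC:

    from hatchet.utils.Supporting import bcolors

    print(f'{bcolors.WARNING}Low coverage detected{bcolors.ENDC}')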
hatchet.utils.Supporting.checksum(filepath)
hatchet.utils.Supporting.close(msg)
hatchet.utils.Supporting.digits(string)
hatchet.utils.Supporting.download(url, dirpath, overwrite=False, extract=True, sentinel_file=None, chunk_size=8192)
hatchet.utils.Supporting.ensure(pred, msg, exception_class=<class 'ValueError'>)
hatchet.utils.Supporting.error(msg, raise_exception=False, exception_class=<class 'ValueError'>)
hatchet.utils.Supporting.log(msg, level=None, lock=None, raise_exception=False, exception_class=<class 'ValueError'>)
hatchet.utils.Supporting.logArgs(args, width=40)
hatchet.utils.Supporting.naturalOrder(text)
hatchet.utils.Supporting.numericOrder(text)
hatchet.utils.Supporting.run(commands, stdouterr_filepath=None, check_return_codes=True, error_msg=None, stdouterr_filepath_autoremove=True, **kwargs)
hatchet.utils.Supporting.to_tuple(s, n=2, typ=<class 'int'>, error_message=None)
hatchet.utils.Supporting.url_exists(path)
hatchet.utils.Supporting.which(program)

hatchet.utils.TotalCounting module

class hatchet.utils.TotalCounting.TotalCounter(task_queue, result_queue, progress_bar, samtools, q, verbose)

Bases: Process

Attributes:
authkey
daemon

Return whether process is a daemon

exitcode

Return exit code of process or None if it has yet to stop

ident

Return identifier (PID) of process or None if it has yet to start

name
pid

Return identifier (PID) of process or None if it has yet to start

sentinel

Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.

Methods

close()

Close the Process object.

is_alive()

Return whether process is alive

join([timeout])

Wait until child process terminates

kill()

Terminate process; sends SIGKILL signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

start()

Start child process

terminate()

Terminate process; sends SIGTERM signal or uses TerminateProcess()

binChr

binChr(bamfile, samplename, chromosome)
run()

Method to be run in sub-process; can be overridden in sub-class

hatchet.utils.TotalCounting.tcount(samtools, samples, chromosomes, num_workers, q, verbose=False)

hatchet.utils.check module

hatchet.utils.check.main(hatchet_cmds=None)
hatchet.utils.check.suppress_stdout()

hatchet.utils.check_solver module

hatchet.utils.check_solver.main(args=None)

hatchet.utils.cluster_bins module

class hatchet.utils.cluster_bins.DiagGHMM(n_components=1, covariance_type='diag', min_covar=0.001, startprob_prior=1.0, transmat_prior=1.0, means_prior=0, means_weight=0, covars_prior=0.01, covars_weight=1, algorithm='viterbi', random_state=None, n_iter=10, tol=0.01, verbose=False, params='stmc', init_params='stmc', implementation='log')

Bases: GaussianHMM

Parameters:
n_components : int

Number of states.

covariance_type : {“spherical”, “diag”, “full”, “tied”}, optional

The type of covariance parameters to use:

  • “spherical” — each state uses a single variance value that applies to all features.

  • “diag” — each state uses a diagonal covariance matrix (default).

  • “full” — each state uses a full (i.e. unrestricted) covariance matrix.

  • “tied” — all states use the same full covariance matrix.

min_covar : float, optional

Floor on the diagonal of the covariance matrix to prevent overfitting. Defaults to 1e-3.

startprob_prior : array, shape (n_components, ), optional

Parameters of the Dirichlet prior distribution for startprob_.

transmat_prior : array, shape (n_components, n_components), optional

Parameters of the Dirichlet prior distribution for each row of the transition probabilities transmat_.

means_prior, means_weight : array, shape (n_components, ), optional

Mean and precision of the Normal prior distribution for means_.

covars_prior, covars_weight : array, shape (n_components, ), optional

Parameters of the prior distribution for the covariance matrix covars_.

If covariance_type is “spherical” or “diag” the prior is the inverse gamma distribution, otherwise — the inverse Wishart distribution.

algorithm : {“viterbi”, “map”}, optional

Decoder algorithm.

  • “viterbi”: finds the most likely sequence of states, given all emissions.

  • “map” (also known as smoothing or forward-backward): finds the sequence of the individual most-likely states, given all emissions.

random_state: RandomState or an int seed, optional

A random number generator instance.

n_iter : int, optional

Maximum number of iterations to perform.

tol : float, optional

Convergence threshold. EM will stop if the gain in log-likelihood is below this value.

verbose : bool, optional

Whether per-iteration convergence reports are printed to sys.stderr. Convergence can also be diagnosed using the monitor_ attribute.

params, init_params : string, optional

The parameters that get updated during (params) or initialized before (init_params) the training. Can contain any combination of ‘s’ for startprob, ‘t’ for transmat, ‘m’ for means, and ‘c’ for covars. Defaults to all parameters.

implementation : string, optional

Determines if the forward-backward algorithm is implemented with logarithms (“log”), or using scaling (“scaling”). The default is to use logarithms for backward compatibility.

Attributes:
covars_

Return covars as a full matrix.

Methods

aic(X[, lengths])

Akaike information criterion for the current model on the input X.

bic(X[, lengths])

Bayesian information criterion for the current model on the input X.

decode(X[, lengths, algorithm])

Find most likely state sequence corresponding to X.

fit(X[, lengths])

Estimate model parameters.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

get_stationary_distribution()

Compute the stationary distribution of states.

predict(X[, lengths])

Find most likely state sequence corresponding to X.

predict_proba(X[, lengths])

Compute the posterior probability for each state in the model.

sample([n_samples, random_state, currstate])

Generate random samples from the model.

score(X[, lengths])

Compute the log probability under the model.

score_samples(X[, lengths])

Compute the log probability under the model and compute posteriors.

set_fit_request(*[, lengths])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_predict_proba_request(*[, lengths])

Request metadata passed to the predict_proba method.

set_predict_request(*[, lengths])

Request metadata passed to the predict method.

set_score_request(*[, lengths])

Request metadata passed to the score method.

form_transition_matrix

form_transition_matrix(diag)
set_fit_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for lengths parameter in fit.

Returns:
self : object

The updated object.
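A minimal sketch of this routing mechanism, assuming scikit-learn >= 1.3:

    import sklearn
    from hatchet.utils.cluster_bins import DiagGHMM

    # Opt in to metadata routing, then request that `lengths` be routed to fit.
    sklearn.set_config(enable_metadata_routing=True)
    model = DiagGHMM(n_components=5).set_fit_request(lengths=True)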

set_predict_proba_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM

Request metadata passed to the predict_proba method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for lengths parameter in predict_proba.

Returns:
self : object

The updated object.

set_predict_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for lengths parameter in predict.

Returns:
self : object

The updated object.

set_score_request(*, lengths: bool | None | str = '$UNCHANGED$') → DiagGHMM

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
lengths : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for lengths parameter in score.

Returns:
self : object

The updated object.

hatchet.utils.cluster_bins.form_seg(bbc, balanced_threshold)
hatchet.utils.cluster_bins.hmm_model_select(tracks, minK=20, maxK=50, tau=1e-05, tmat='diag', decode_alg='viterbi', covar='diag', state_selection='bic', restarts=10)
hatchet.utils.cluster_bins.main(args=None)
hatchet.utils.cluster_bins.make_transmat(diag, K)
hatchet.utils.cluster_bins.read_bb(bbfile, subset=None, allow_gaps=False)

Constructs arrays to represent the bins in each chromosome or chromosome arm. If the bbfile was binned around chromosome arms, uses chromosome arms; otherwise, uses chromosomes.

Returns:

botht: list of np.ndarrays of size (n_bins, n_tracks), where n_tracks = n_samples * 2; each array contains two tracks per sample for a single chromosome or arm.

bb: table read from the input bbfile.

sample_labels: order in which samples are represented in each array in botht.

chr_labels: order in which chromosomes or arms are represented in botht.
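A hypothetical use of these return values (‘bulk.bb’ is a placeholder path):

    from hatchet.utils.cluster_bins import read_bb

    botht, bb, sample_labels, chr_labels = read_bb('bulk.bb')
    first = botht[0]  # (n_bins, n_tracks) array for chr_labels[0]
    assert first.shape[1] == 2 * len(sample_labels)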

hatchet.utils.cluster_bins.reindex(labels)

Given a list of labels, reindex them as integers from 1 to n_labels, ordered in nonincreasing order of prevalence.
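A sketch of the described behavior (not the actual implementation): the most prevalent label maps to 1, the next to 2, and so on.

    from collections import Counter

    def reindex_sketch(labels):
        # Rank labels by prevalence, most common first, then renumber from 1.
        order = [lab for lab, _ in Counter(labels).most_common()]
        mapping = {lab: i + 1 for i, lab in enumerate(order)}
        return [mapping[lab] for lab in labels]

    reindex_sketch(['x', 'y', 'y', 'z', 'y'])  # [2, 1, 1, 3, 1]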

hatchet.utils.cluster_bins_gmm module

hatchet.utils.cluster_bins_gmm.cluster(points, clouds=None, concentration_prior=None, K=100, restarts=10, seed=0)

Clusters a set of data points lying in an arbitrary number of clusters.

Arguments:

points (list of lists of floats): list of data points to be clustered.

clouds (list of lists of floats, same second dimension as points): bootstrapped bins for clustering.

concentration_prior (float): tuning parameter for clustering, must be between 0 and 1. Used to determine the concentration of points in clusters; higher favors more clusters, lower favors fewer clusters.

K (int): maximum number of clusters to infer.

restarts (int): number of initializations to try for the GMM.

seed (int): random number generator seed for the GMM.

Returns:

mus (list of lists of floats): list of cluster means.

sigmas (list of 2D lists of floats): list of cluster covariances.

clusterAssignments (list of ints): the assignment of each interval to a cluster, where an entry j at index i means the ith interval has been assigned to the jth meta-interval.

numPoints (list of ints): number of points assigned to each cluster.

numClusters (int): the number of clusters.
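A hypothetical call with four two-dimensional points (all values are illustrative):

    from hatchet.utils.cluster_bins_gmm import cluster

    points = [[1.0, 0.5], [1.1, 0.48], [0.5, 0.2], [0.52, 0.21]]
    mus, sigmas, clusterAssignments, numPoints, numClusters = cluster(
        points, K=10, restarts=5, seed=0
    )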

hatchet.utils.cluster_bins_gmm.generateClouds(points, density, seed, sdeven=0.02, sdodd=0.02)
hatchet.utils.cluster_bins_gmm.getPoints(data, samples)
hatchet.utils.cluster_bins_gmm.main(args=None)
hatchet.utils.cluster_bins_gmm.minSegmentBins(sbb, nbins, rd, nsnps, cov, clusters, samples)
hatchet.utils.cluster_bins_gmm.readBB(bbfile)
hatchet.utils.cluster_bins_gmm.refineClustering(combo, assign, assignidx, samples, rdtol, baftol)
hatchet.utils.cluster_bins_gmm.reindex(labels)

Given a list of labels, reindex them as integers from 1 to n_labels, ordered in nonincreasing order of prevalence.

hatchet.utils.cluster_bins_gmm.roundAlphasBetas(baf, alpha, beta)
hatchet.utils.cluster_bins_gmm.scaleBAF(segments, samples, diploidbaf)
hatchet.utils.cluster_bins_gmm.segmentBins(bb, clusters, samples)
hatchet.utils.cluster_bins_gmm.splitBAF(baf, scale)

hatchet.utils.combine_counts module

hatchet.utils.combine_counts.EM(totals_in, alts_in, start, tol=1e-06)

Adapted from chisel/Combiner.py

hatchet.utils.combine_counts.adaptive_bins_arm(snp_thresholds, total_counts, snp_positions, snp_counts, min_snp_reads=2000, min_total_reads=5000)

Compute adaptive bins for a single chromosome arm.

Parameters:

snp_thresholds: length <n> array of 1-based genomic positions of candidate bin thresholds

total_counts: <n> x <2d> np.ndarray

entry [i, 2j] contains the number of reads starting in [snp_thresholds[i], snp_thresholds[i + 1]) in sample j (only the first n - 1 positions are populated)

entry [i, 2j + 1] contains the number of reads covering position snp_thresholds[i] in sample j

snp_positions: length <m> list of 1-based genomic positions of SNPs

NOTE: this function requires that m = n - 1 for convenience of programming (could be relaxed in a different implementation)

snp_counts: <m> x <d> np.ndarray containing the number of overlapping reads at each of the <m> = <n - 1> SNP positions in <d> samples

min_snp_reads: the minimum number of SNP-covering reads required in each bin and each sample

min_total_reads: the minimum number of total reads required in each bin and each sample
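A sketch of the expected input shapes for d = 2 samples and n = 4 candidate thresholds (all values are placeholders):

    import numpy as np

    n, d = 4, 2
    snp_thresholds = np.array([1_000, 50_000, 100_000, 150_000])  # length n
    total_counts = np.zeros((n, 2 * d), dtype=int)  # [i, 2j] starts; [i, 2j + 1] coverage
    snp_positions = [10_000, 60_000, 120_000]       # length m = n - 1
    snp_counts = np.zeros((n - 1, d), dtype=int)    # SNP-covering reads per sample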

hatchet.utils.combine_counts.apply_EM(totals_in, alts_in)
hatchet.utils.combine_counts.backtrack(bp)
hatchet.utils.combine_counts.binom_prop_test(alt1, ref1, flip1, alt2, ref2, flip2, alpha=0.1)

Returns True if there is sufficient evidence that SNPs 1 and 2 should not be merged, False otherwise.
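The specific test is not documented here; as an illustration of the idea only, a Fisher's exact test on the flip-oriented counts would look like the following stand-in (not the actual implementation):

    from scipy.stats import fisher_exact

    def should_split(alt1, ref1, flip1, alt2, ref2, flip2, alpha=0.1):
        # Orient each SNP's counts by its flip flag, then test whether the
        # two allele proportions differ significantly.
        a1, r1 = (ref1, alt1) if flip1 else (alt1, ref1)
        a2, r2 = (ref2, alt2) if flip2 else (alt2, ref2)
        _, pvalue = fisher_exact([[a1, r1], [a2, r2]])
        return pvalue < alpha  # True: evidence the SNPs should not be merged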

hatchet.utils.combine_counts.block_segment(df, blocksize, max_snps_per_block)

Given a pandas dataframe containing read counts for a contiguous segment of SNPs, collects SNPs into phase blocks of size at most <blocksize>, each containing at most <max_snps_per_block> SNPs.

hatchet.utils.combine_counts.collapse_blocks(df, blocks, singletons, orphans, ch)
hatchet.utils.combine_counts.compute_baf_task_multi(bin_snps, blocksize, max_snps_per_block, test_alpha)

Estimates the BAF for the bin containing exactly the SNPs in <bin_snps>, a dataframe with at least ALT and REF columns containing read counts. <blocksize>, <max_snps_per_block>, and <test_alpha> are used only for constructing phase blocks.

hatchet.utils.combine_counts.compute_baf_task_single(bin_snps, blocksize, max_snps_per_block, test_alpha)

Estimates the BAF for the bin containing exactly the SNPs in <bin_snps>, a dataframe with at least ALT and REF columns containing read counts. <blocksize>, <max_snps_per_block>, and <test_alpha> are used only for constructing phase blocks.
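A hypothetical bin with three SNPs (counts are illustrative; the real function may require additional columns beyond ALT and REF, e.g. positions):

    import pandas as pd
    from hatchet.utils.combine_counts import compute_baf_task_single

    bin_snps = pd.DataFrame({'ALT': [12, 9, 15], 'REF': [10, 14, 11]})
    baf = compute_baf_task_single(bin_snps, blocksize=50_000,
                                  max_snps_per_block=10, test_alpha=0.1)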

hatchet.utils.combine_counts.compute_baf_wrapper(bin_snps, blocksize, max_snps_per_block, test_alpha, multisample)
hatchet.utils.combine_counts.consecutive(data, stepsize=1)
hatchet.utils.combine_counts.correct_haplotypes(orig_bafs, min_prop_switch=0.01, n_segments=20, min_switch_density=0.1, min_mean_baf=0.48, minmax_al_imb=0.02)
hatchet.utils.combine_counts.get_chr_end(stem, chromosome)
hatchet.utils.combine_counts.main(args=None)
hatchet.utils.combine_counts.merge_data(bins, dfs, bafs, sample_names, chromosome)

Merge bin data (starts, ends, total counts, RDRs) with SNP data and BAF data for each bin.

Parameters:

bins: output from a call to adaptive_bins_arm

dfs: (only for troubleshooting) list of pandas DataFrames, each containing the SNP information for the corresponding bin

bafs: the ith element is the output from compute_baf_task(dfs[i])

Produces a BB file with a few additional columns.

hatchet.utils.combine_counts.merge_phasing(_, all_phase_data)

Merge phasing results across all samples: if a pair of SNPs is split in any sample, they won’t be merged in any sample.

hatchet.utils.combine_counts.multisample_em(alts, refs, start, tol=1e-05)
hatchet.utils.combine_counts.phase_blocks_sequential(df, blocksize=50000.0, max_snps_per_block=10, alpha=0)
hatchet.utils.combine_counts.read_snps(baf_file, ch, all_names, phasefile=None)

Read and validate SNP data for this patient (TSV table output from HATCHet deBAF.py).

hatchet.utils.combine_counts.run_chromosome(baffile, all_names, chromosome, outfile, centromere_start, centromere_end, min_snp_reads, min_total_reads, arraystem, xy, multisample, phasefile, blocksize, max_snps_per_block, test_alpha)

Perform adaptive binning and infer BAFs to produce a HATCHet BB file for a single chromosome.

hatchet.utils.combine_counts.run_chromosome_wrapper(param)
hatchet.utils.combine_counts.segmented_piecewise(X, pieces=2)

hatchet.utils.combine_counts_fw module

hatchet.utils.combine_counts_fw.blocking(L, sample, phase, blocksize)
hatchet.utils.combine_counts_fw.combine(normalbins, tumorbins, tumorbafs, diploidbaf, totalcounts, chromosomes, samples, normal, gamma, verbose=False, disable=False, phase=None, block=None)
hatchet.utils.combine_counts_fw.computeBAFs(partition, samples, diploidbaf, phase=None, block=0)
hatchet.utils.combine_counts_fw.main(args=None)
hatchet.utils.combine_counts_fw.readBAFs(tumor)
hatchet.utils.combine_counts_fw.readBINs(normalbins, tumorbins)
hatchet.utils.combine_counts_fw.readPhase(f)
hatchet.utils.combine_counts_fw.readTotalCounts(filename, samples, normal)
hatchet.utils.combine_counts_fw.splitBAF(baf, scale)

hatchet.utils.commands module

hatchet.utils.config module

class hatchet.utils.config.Config(name, filenames)

Bases: object

Methods

init_from_files

read

sections

init_from_files(filenames)
read(filename)
sections()
class hatchet.utils.config.ConfigSection(config, section_proxy)

Bases: object

A thin wrapper over a ConfigParser’s SectionProxy object that tries to infer the types of values and makes them available as attributes. Currently int/float/str are supported.

Methods

items

items()
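A sketch of the kind of value-type inference described above (not the actual implementation):

    def infer_type(raw: str):
        # Try int, then float; fall back to the raw string.
        for cast in (int, float):
            try:
                return cast(raw)
            except ValueError:
                pass
        return raw

    infer_type('42'), infer_type('0.5'), infer_type('hg19')  # (42, 0.5, 'hg19')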

hatchet.utils.count_alleles module

class hatchet.utils.count_alleles.AlleleCounter(bcftools, reference, q, Q, mincov, dp, E, snplist, verbose, outdir)

Bases: Worker

Methods

countAlleles

run

work

countAlleles(bamfile, samplename, chromosome)
work(bamfile, samplename, chromosome)
hatchet.utils.count_alleles.checkShift(countA, countB, maxshift)
hatchet.utils.count_alleles.counting(bcftools, reference, samples, chromosomes, num_workers, snplist, q, Q, mincov, dp, E, verbose, outdir)
hatchet.utils.count_alleles.isHet(countA, countB, gamma)
hatchet.utils.count_alleles.main(args=None)
hatchet.utils.count_alleles.selectHetSNPs(counts, gamma, maxshift)

hatchet.utils.count_reads module

hatchet.utils.count_reads.check_array_files(darray, chrs)
hatchet.utils.count_reads.check_counts_files(dcounts, chrs, all_names)
hatchet.utils.count_reads.count_chromosome(ch, outdir, samtools, bam, sample_name, readquality, compression_level=6)
hatchet.utils.count_reads.count_chromosome_wrapper(param)
hatchet.utils.count_reads.expected_arrays(darray, chrs)
hatchet.utils.count_reads.expected_counts_files(dcounts, chrs, all_names)
hatchet.utils.count_reads.form_counts_array(starts_files, perpos_files, thresholds, chromosome, tabix, chunksize=100000.0)

NOTE: Assumes that starts_files[i] corresponds to the same sample as perpos_files[i]

Parameters:

starts_files: list of <sample>.<chromosome>.starts.gz files, each containing a list of start positions

perpos_files: list of <sample>.per-base.bed.gz files containing per-position coverage from mosdepth

thresholds: list of potential bin start positions (thresholds between SNPs)

chromosome: chromosome to extract read counts for

Returns: <n> x <2d> np.ndarray

entry [i, 2j] contains the number of reads starting in (starts[i], starts[i + 1]) in sample j

entry [i, 2j + 1] contains the number of reads covering position starts[i] in sample j
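Indexing the returned array for threshold i and sample j follows the layout above (the array below is a placeholder):

    import numpy as np

    n, d = 10, 2                              # hypothetical sizes
    counts = np.zeros((n, 2 * d), dtype=int)  # stands in for the returned array
    i, j = 5, 0
    reads_starting = counts[i, 2 * j]         # reads starting in (starts[i], starts[i + 1])
    reads_covering = counts[i, 2 * j + 1]     # reads covering position starts[i]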

hatchet.utils.count_reads.get_chr_end(stem, all_names, chromosome)
hatchet.utils.count_reads.main(args=None)
hatchet.utils.count_reads.mosdepth_wrapper(params)
hatchet.utils.count_reads.read_snps(baf_file, ch, all_names)

Read and validate SNP data for this patient (TSV table output from HATCHet deBAF.py).

hatchet.utils.count_reads.run_chromosome(outdir, all_names, chromosome, centromere_start, centromere_end, baf_file, tabix)

Construct arrays that contain all counts needed to perform adaptive binning for a single chromosome (across all samples).

hatchet.utils.count_reads.run_chromosome_wrapper(param)
hatchet.utils.count_reads.run_mosdepth(outdir, sample_name, bam, threads, mosdepth, readquality)

hatchet.utils.count_reads_fw module

hatchet.utils.count_reads_fw.knownRegions(refdict, chromosomes)
hatchet.utils.count_reads_fw.logArgs(args, width)
hatchet.utils.count_reads_fw.main(args=None)

hatchet.utils.download_panel module

hatchet.utils.download_panel.dwnld_chains(dirpath)
hatchet.utils.download_panel.dwnld_refpanel_genome(path)
hatchet.utils.download_panel.main(args=None)
hatchet.utils.download_panel.mk_rename_file(path)

hatchet.utils.genotype_snps module

class hatchet.utils.genotype_snps.Caller(task_queue, result_queue, progress_bar, bcftools, reference, q, Q, mincov, dp, E, outdir, snplist, verbose)

Bases: Process

Attributes:
authkey
daemon

Return whether process is a daemon

exitcode

Return exit code of process or None if it has yet to stop

ident

Return identifier (PID) of process or None if it has yet to start

name
pid

Return identifier (PID) of process or None if it has yet to start

sentinel

Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.

Methods

close()

Close the Process object.

is_alive()

Return whether process is alive

join([timeout])

Wait until child process terminates

kill()

Terminate process; sends SIGKILL signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

start()

Start child process

terminate()

Terminate process; sends SIGTERM signal or uses TerminateProcess()

callSNPs

callSNPs(bamfile, samplename, chromosome)
run()

Method to be run in sub-process; can be overridden in sub-class

hatchet.utils.genotype_snps.call(bcftools, reference, samples, chromosomes, num_workers, q, Q, mincov, dp, E, outdir, snplist=None, verbose=False)
hatchet.utils.genotype_snps.main(args=None)

hatchet.utils.multiprocessing module

class hatchet.utils.multiprocessing.TaskHandler(worker, task_queue, result_queue, progress_bar)

Bases: Process

Attributes:
authkey
daemon

Return whether process is a daemon

exitcode

Return exit code of process or None if it has yet to stop

ident

Return identifier (PID) of process or None if it has yet to start

name
pid

Return identifier (PID) of process or None if it has yet to start

sentinel

Return a file descriptor (Unix) or handle (Windows) suitable for waiting for process termination.

Methods

close()

Close the Process object.

is_alive()

Return whether process is alive

join([timeout])

Wait until child process terminates

kill()

Terminate process; sends SIGKILL signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

start()

Start child process

terminate()

Terminate process; sends SIGTERM signal or uses TerminateProcess()

run()

Method to be run in sub-process; can be overridden in sub-class

class hatchet.utils.multiprocessing.Worker

Bases: object

Methods

run

work

run(work, n_instances=None, show_progress=True)
work(*args)

hatchet.utils.phase_snps module

class hatchet.utils.phase_snps.Phaser(panel, outdir, hg19, ref, chains, rename, refvers, chrnot, verbose, bcftools, shapeit, picard, bgzip)

Bases: Worker

Methods

biallelic

change_chr

index

liftover

run

run_shapeit

stage_vcfs

work

biallelic(infile, chromosome)
change_chr(infile, chromosome, outname, rename)
index(infile, chromosome)
liftover(infile, chromosome, outname, chain, refgen, ch)
run_shapeit(infile, chromosome)
stage_vcfs(infile, chromosome)
work(*args)
hatchet.utils.phase_snps.cleanup(outdir)
hatchet.utils.phase_snps.concat(vcfs, outdir, bcftools)
hatchet.utils.phase_snps.main(args=None)
hatchet.utils.phase_snps.print_log(path, chromosomes)

hatchet.utils.plot_bins module

hatchet.utils.plot_bins.addchr(pos)
hatchet.utils.plot_bins.argmax(d)
hatchet.utils.plot_bins.argmin(d)
hatchet.utils.plot_bins.baf(bbc, args, out)
hatchet.utils.plot_bins.bb(bbc, clusters, args, out)
hatchet.utils.plot_bins.clubaf(bbc, clusters, args, out)
hatchet.utils.plot_bins.clurdr(bbc, clusters, args, out)
hatchet.utils.plot_bins.clus(seg, args, out)
hatchet.utils.plot_bins.cluster_bins(bbc, clusters, args, out, clust_order, pal)
hatchet.utils.plot_bins.coordinates(args, g=None)
hatchet.utils.plot_bins.debug(msg)
hatchet.utils.plot_bins.error(msg)
hatchet.utils.plot_bins.info(msg)
hatchet.utils.plot_bins.isfloat(value)
hatchet.utils.plot_bins.join(bbc, clusters, resolution)
hatchet.utils.plot_bins.log(msg)
hatchet.utils.plot_bins.main(args=None)
hatchet.utils.plot_bins.rdr(bbc, args, out)
hatchet.utils.plot_bins.readBBC(inp)
hatchet.utils.plot_bins.readSEG(inp)
hatchet.utils.plot_bins.select(bbc, clusters, args)
hatchet.utils.plot_bins.sortchr(x)
hatchet.utils.plot_bins.warning(msg)

hatchet.utils.plot_bins_1d2d module

hatchet.utils.plot_bins_1d2d.main(args=None)
hatchet.utils.plot_bins_1d2d.plot_1d(bbc, baf_lim=None, rdr_lim=None, display=False, outdir=None, alpha=1, show_centromeres=False)
hatchet.utils.plot_bins_1d2d.plot_2d(bbc, seg=None, show_centers=False, xlim=None, ylim=None, figsize=(4, 4), display=True, outdir=None, alpha=1)

For each sample, plot the mBAF and RDR of each bin colored by cluster. Colors will match clusters in corresponding 1D plots.

hatchet.utils.plot_bins_1d2d.plot_track(bb, chr_ends, chr2centro, yval='RD', ylabel=None, display=True, ylim=None, alpha=1, color_field=None, title=None, show_centromeres=False)

NOTE: this function assumes that (1) bb contains data for a single sample, and (2) chromosomes are specified using “chr” notation.

hatchet.utils.plot_cn module

hatchet.utils.plot_cn.addchr(g, pos, color=None)
hatchet.utils.plot_cn.addchrplt(pos)
hatchet.utils.plot_cn.allelicprofiles(tumor, clones, props, args, out)
hatchet.utils.plot_cn.allelicproportions(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.argmax(d)
hatchet.utils.plot_cn.argmin(d)
hatchet.utils.plot_cn.cndistance(u, v)
hatchet.utils.plot_cn.cnproportions(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.debug(msg)
hatchet.utils.plot_cn.error(msg)
hatchet.utils.plot_cn.gridmixtures(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.gridprofiles(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.gridprofilesreduced(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.info(msg)
hatchet.utils.plot_cn.intergridfullprofiles(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.intergridreducedprofiles(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.intergridsamplesclusters(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.intergridsubclonality(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.interjoin(tumor, clones, resolution)
hatchet.utils.plot_cn.interreduction(proj, base)
hatchet.utils.plot_cn.isfloat(value)
hatchet.utils.plot_cn.join(tumor, clones, resolution)
hatchet.utils.plot_cn.log(msg)
hatchet.utils.plot_cn.main(args=None)
hatchet.utils.plot_cn.multiple(tumor, clones, props, base, args)
hatchet.utils.plot_cn.parsing_arguments(args=None)

Parse command line arguments.

hatchet.utils.plot_cn.pp(tumor, clones, props, args)
hatchet.utils.plot_cn.profiles(tumor, clones, props, args, out)
hatchet.utils.plot_cn.readUCN(inputs, patnames)
hatchet.utils.plot_cn.reduction(proj, base)
hatchet.utils.plot_cn.segmenting(tumor, clones, props)
hatchet.utils.plot_cn.similarity(u, v)
hatchet.utils.plot_cn.similaritysample(u, v)
hatchet.utils.plot_cn.single(tumor, clones, props, base, args)
hatchet.utils.plot_cn.sortchr(x)
hatchet.utils.plot_cn.subclonal(tumor, base, clones, props, args, out)
hatchet.utils.plot_cn.warning(msg)

hatchet.utils.plot_cn_1d2d module

hatchet.utils.plot_cn_1d2d.cn2evs(cns, props)
hatchet.utils.plot_cn_1d2d.cn2total(s)
hatchet.utils.plot_cn_1d2d.cn2totals(x)
hatchet.utils.plot_cn_1d2d.compute_gamma(bbc)
hatchet.utils.plot_cn_1d2d.generate_1D2D_plots(bbc, fcn_lim=None, baf_lim=None, title=None, show_centromeres=False, by_sample=False, outdir=None, resample_balanced=False)
hatchet.utils.plot_cn_1d2d.limits_valid(lim)
hatchet.utils.plot_cn_1d2d.main(args=None)
hatchet.utils.plot_cn_1d2d.plot_clusters(bbc, mapping, figsize=(4, 4), fname=None, dpi=300, xlim=None, ylim=None, save_samples=False, save_prefix=None, coloring='original')
hatchet.utils.plot_cn_1d2d.plot_genome(big_bbc, mapping, chr_ends, chr2centro, chromosomes=None, dpi=400, figsize=(8, 5), fname=None, show_centromeres=False, fcn_ylim=None, baf_ylim=None, save_samples=False, save_prefix=None)
hatchet.utils.plot_cn_1d2d.recompose_state(l)

Read copy-number state vector from list of allele-specific values
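A sketch of one plausible recomposition, pairing consecutive allele-specific values into (A, B) states (an assumption about the list layout, not the actual implementation):

    def recompose_state_sketch(values):
        # Pair consecutive entries: [1, 1, 2, 0] -> [(1, 1), (2, 0)].
        return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]

    recompose_state_sketch([1, 1, 2, 0])  # [(1, 1), (2, 0)]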

hatchet.utils.plot_cn_1d2d.reindex(labels)

Given a list of labels, reindex them as integers from 1 to n_labels; labels are ordered in nonincreasing order of prevalence.

hatchet.utils.plot_cn_1d2d.str2cn(x)
hatchet.utils.plot_cn_1d2d.str2state(cn)

hatchet.utils.rd_gccorrect module

hatchet.utils.rd_gccorrect.rd_gccorrect(bb, ref_genome)

Correct GC bias in read depth data for each sample.

Parameters:

bb: DataFrame containing read depth data, including columns ‘#CHR’, ‘START’, ‘END’, ‘RD’, ‘SAMPLE’

ref_genome: file path to the reference genome in FASTA format

Returns: DataFrame with corrected read depth data
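A hypothetical invocation, assuming the column layout above (‘reference.fa’ is a placeholder and must point to a real FASTA for the call to succeed):

    import pandas as pd
    from hatchet.utils.rd_gccorrect import rd_gccorrect

    bb = pd.DataFrame({
        '#CHR': ['chr1', 'chr1'], 'START': [0, 100_000], 'END': [100_000, 200_000],
        'RD': [1.02, 0.97], 'SAMPLE': ['tumor1', 'tumor1'],
    })
    corrected = rd_gccorrect(bb, 'reference.fa')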

hatchet.utils.run module

hatchet.utils.run.main(args=None)

Module contents