paste3.paste.pairwise_align

paste3.paste.pairwise_align(a_slice, b_slice, overlap_fraction=None, exp_dissim_matrix=None, alpha=0.1, exp_dissim_metric='kl', pi_init=None, a_spots_weight=None, b_spots_weight=None, norm=False, numItermax=200, use_gpu=True, maxIter=1000, optimizeTheta=True, eps=0.0001, do_histology=False)

Returns a mapping $\Pi = [\pi_{ij}]$ between spots in one slice and spots in another slice while preserving gene expression and spatial distances of mapped spots, where $\pi_{ij}$ describes the probability that a spot $i$ in the first slice is aligned to a spot $j$ in the second slice.

Given slices $(X, D, g)$ and $(X', D', g')$ containing $n$ and $n'$ spots, respectively, over the same $p$ genes, an expression cost function $c$, and a parameter $0 \leq \alpha \leq 1$, this function finds a mapping $\Pi \in \Gamma(g, g')$ that minimizes the following transport cost:

$$F(\Pi; X, D, X', D', c, \alpha) = (1 - \alpha) \sum_{i,j} c(x_i, x'_j)\, \pi_{ij} + \alpha \sum_{i,j,k,l} (d_{ik} - d'_{jl})^2\, \pi_{ij} \pi_{kl} \tag{1}$$

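subject to the regularity constraint that $\pi$ has to be a probabilistic coupling between $g$ and $g'$:

$$\pi \in \mathcal{F}(g, g') = \{\pi \in \mathbb{R}^{n \times n'} \mid \pi \geq 0,\ \pi 1_{n'} = g,\ \pi^T 1_n = g'\} \tag{2}$$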

Where:

  • $X$ and $X'$ represent the gene expression data for each slice,

  • $D$ and $D'$ represent the spatial distance matrices for each slice,

  • $c$ is a cost function applied to expression differences,

  • $\alpha$ is a parameter that balances expression and spatial distance preservation in the mapping, and

  • $g$ and $g'$ represent probability distributions over the spots in the first and second slice, respectively.
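As an illustration of Equation (1), the following minimal NumPy sketch evaluates the transport cost for a given coupling. It is not the library's implementation: the function name `transport_cost` and the toy matrices are hypothetical, and the four-index array it builds is only practical for very small slices.

```python
import numpy as np


def transport_cost(pi, C, D, D_prime, alpha):
    """Evaluate Equation (1) for a given coupling (illustrative helper).

    pi      : coupling matrix, shape (n, n')
    C       : expression costs, C[i, j] = c(x_i, x'_j), shape (n, n')
    D       : spatial distances within the first slice, shape (n, n)
    D_prime : spatial distances within the second slice, shape (n', n')
    alpha   : weight between the expression and spatial terms, in [0, 1]
    """
    # Expression term: sum_{i,j} c(x_i, x'_j) * pi_ij
    expression_term = np.sum(C * pi)

    # Spatial term: sum_{i,j,k,l} (d_ik - d'_jl)^2 * pi_ij * pi_kl.
    # diff_sq[i, j, k, l] = (D[i, k] - D_prime[j, l]) ** 2; this needs
    # O(n^2 * n'^2) memory, so it is for illustration only.
    diff_sq = (D[:, None, :, None] - D_prime[None, :, None, :]) ** 2
    spatial_term = np.einsum("ijkl,ij,kl->", diff_sq, pi, pi)

    return (1 - alpha) * expression_term + alpha * spatial_term


# Toy data (made up for this example): two spots per slice.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
D = np.array([[0.0, 1.0], [1.0, 0.0]])
D_prime = np.array([[0.0, 2.0], [2.0, 0.0]])
pi = np.array([[0.5, 0.0], [0.0, 0.5]])  # maps spot i to spot i

cost = transport_cost(pi, C, D, D_prime, alpha=0.1)
```

With `alpha=0` only the expression term contributes, and with `alpha=1` only the spatial term does, which matches the description of the `alpha` parameter.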

Note

When a value for overlap_fraction is provided, this function solves the partial pairwise slice alignment problem by minimizing the same objective function as Equation (1), but with a different set of constraints that allow for unmapped spots. Given a parameter $s \in [0, 1]$ describing the fraction of mass to transport between $g$ and $g'$, we define a set $\mathcal{P}(g, g', s)$ of $s$-partial couplings between distributions $g$ and $g'$ as:

$$\mathcal{P}(g, g', s) = \{\pi \in \mathbb{R}^{n \times n'} \mid \pi \geq 0,\ \pi 1_{n'} \leq g,\ \pi^T 1_n \leq g',\ 1_n^T \pi 1_{n'} = s\} \tag{3}$$

Where:

  • $s \in [0, 1]$ is the overlap fraction between the two slices to align. (The constraint $1_n^T \pi 1_{n'} = s$ ensures that only a fraction $s$ of the probability mass is transported.)
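The constraints of the partial problem can be checked numerically. The sketch below tests whether a matrix is an s-partial coupling in the sense of Equation (3); the helper name `is_partial_coupling` and the tolerance argument `tol` are assumptions made for this example and are not part of paste3.

```python
import numpy as np


def is_partial_coupling(pi, g, g_prime, s, tol=1e-9):
    """Return True if pi satisfies the constraints of Equation (3)."""
    pi = np.asarray(pi, dtype=float)
    g = np.asarray(g, dtype=float)
    g_prime = np.asarray(g_prime, dtype=float)

    nonnegative = np.all(pi >= -tol)
    # Row sums may not exceed the mass available in the first slice.
    rows_ok = np.all(pi.sum(axis=1) <= g + tol)
    # Column sums may not exceed the mass available in the second slice.
    cols_ok = np.all(pi.sum(axis=0) <= g_prime + tol)
    # Exactly a fraction s of the total mass is transported.
    mass_ok = abs(pi.sum() - s) <= tol

    return bool(nonnegative and rows_ok and cols_ok and mass_ok)


# Toy example: uniform weights over two spots per slice.
g = np.array([0.5, 0.5])
g_prime = np.array([0.5, 0.5])

# Only the first spot of each slice is mapped, so half of the mass moves.
half_mapped = np.array([[0.5, 0.0], [0.0, 0.0]])
```

Here `is_partial_coupling(half_mapped, g, g_prime, s=0.5)` holds, while the same matrix fails the check for `s=1.0` because the second spot of each slice is left unmapped.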

Parameters:
  • a_slice (AnnData) -- AnnData object containing data for the first slice.

  • b_slice (AnnData) -- AnnData object containing data for the second slice.

  • overlap_fraction (float, optional) -- Fraction of overlap between the two slices; must be between 0 and 1. If None, full alignment is performed.

  • exp_dissim_matrix (np.ndarray, optional) -- Precomputed expression dissimilarity matrix between two slices. If None, it will be computed.

  • alpha (float, default=0.1) -- Regularization parameter balancing transcriptional dissimilarity and spatial distance among aligned spots. Setting alpha = 0 uses only transcriptional information, while alpha = 1 uses only spatial coordinates.

  • exp_dissim_metric (str, default="kl") -- Metric used to compute the expression dissimilarity, with the following options:

      - 'kl' for Kullback-Leibler divergence between slices,
      - 'euc' for Euclidean distance,
      - 'gkl' for generalized Kullback-Leibler divergence,
      - 'selection_kl' for a selection-based KL approach,
      - 'pca' for Principal Component Analysis,
      - 'glmpca' for Generalized Linear Model PCA.

  • pi_init (np.ndarray, optional) -- Initial transport plan. If None, it will be computed.

  • a_spots_weight (np.ndarray, optional) -- Weight distribution for the spots in the first slice. If None, uniform weights are used.

  • b_spots_weight (np.ndarray, optional) -- Weight distribution for the spots in the second slice. If None, uniform weights are used.

  • norm (bool, default=False) -- If True, normalizes spatial distances.

  • numItermax (int, default=200) -- Maximum number of iterations for the optimization.

  • use_gpu (bool, default=True) -- Whether to use the GPU for computations. If True but no GPU is available, the computation falls back to the CPU.

  • maxIter (int, default=1000) -- Maximum number of iterations for the dissimilarity calculation.

  • optimizeTheta (bool, default=True) -- Whether to optimize theta during dissimilarity calculation.

  • eps (float, default=1e-4) -- Tolerance level for convergence.

  • do_histology (bool, default=False) -- If True, incorporates RGB dissimilarity from histology data.

Returns:

  • pi (np.ndarray) -- Optimal transport plan for aligning the two slices.

  • info (Optional[int]) -- Information on the optimization process (if return_obj is True), else None.

Return type:

Tuple[np.ndarray, Optional[int]]
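A usage sketch based on the signature and return type documented above. The wrapper `align_and_match` and the helper `best_matches` are hypothetical names introduced for this example. The code assumes the returned plan can be converted with `np.asarray`, and that `a_slice` and `b_slice` are AnnData objects prepared by the caller.

```python
import numpy as np


def best_matches(pi):
    """For each spot in the first slice, return the index of the spot in the
    second slice that receives the most mass, or -1 if the spot is unmapped
    (its row carries no mass, as can happen in partial alignment)."""
    pi = np.asarray(pi, dtype=float)
    matches = pi.argmax(axis=1)
    matches[pi.sum(axis=1) == 0] = -1
    return matches


def align_and_match(a_slice, b_slice, overlap_fraction=None, alpha=0.1):
    """Align two slices and derive a hard spot-to-spot assignment."""
    # Imported lazily so this sketch can be loaded without paste3 installed.
    from paste3.paste import pairwise_align

    # Per the documented return type, the call yields a (pi, info) tuple.
    pi, _info = pairwise_align(
        a_slice,
        b_slice,
        overlap_fraction=overlap_fraction,
        alpha=alpha,
        use_gpu=False,  # run on the CPU for this example
    )
    pi = np.asarray(pi)
    return pi, best_matches(pi)
```

Passing `overlap_fraction=None` requests full alignment, while a value such as `0.7` requests partial alignment in which some spots may be left unmapped.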