paste3.paste.pairwise_align

paste3.paste.pairwise_align(a_slice, b_slice, overlap_fraction=None, exp_dissim_matrix=None, alpha=0.1, exp_dissim_metric='kl', pi_init=None, a_spots_weight=None, b_spots_weight=None, norm=False, numItermax=200, use_gpu=True, maxIter=1000, optimizeTheta=True, eps=0.0001, do_histology=False)

Returns a mapping \(\Pi = [\pi_{ij}]\) between spots in one slice and spots in another slice while preserving gene expression and spatial distances of mapped spots, where \(\pi_{ij}\) describes the probability that spot \(i\) in the first slice is aligned to spot \(j\) in the second slice.

Given slices \((X, D, g)\) and \((X', D', g')\) containing \(n\) and \(n'\) spots, respectively, over the same \(p\) genes, an expression cost function \(c\), and a parameter \(0 \leq \alpha \leq 1\), this function finds a mapping \(\Pi \in \mathcal{F}(g, g')\) that minimizes the following transport cost:

\[F(\Pi; X, D, X', D', c, \alpha) = (1 - \alpha) \sum_{i,j} c(x_i, x'_j) \pi_{ij} + \alpha \sum_{i,j,k,l} (d_{ik} - d'_{jl})^2 \pi_{ij} \pi_{kl}, \tag{1}\]

subject to the regularity constraint that \(\Pi\) has to be a probabilistic coupling between \(g\) and \(g'\):

\[\Pi \in \mathcal{F}(g, g') = \left\{ \Pi \in \mathbb{R}^{n \times n'} \mid \Pi \geq 0, \Pi 1_{n'} = g, \Pi^T 1_n = g' \right\}. \tag{2}\]

Where:

  • \(X\) and \(X'\) represent the gene expression data for each slice,

  • \(D\) and \(D'\) represent the spatial distance matrices for each slice,

  • \(c\) is a cost function applied to expression differences,

  • \(\alpha\) is a parameter that balances expression and spatial distance preservation in the mapping, and

  • \(g\) and \(g'\) represent probability distributions over the spots in the slices with expression data \(X\) and \(X'\), respectively.
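To make Equation (1) concrete, the following is a minimal NumPy sketch that evaluates the transport cost for a given coupling. It is not the solver used by this function; the name transport_cost and the toy inputs are hypothetical, and the expression cost is assumed to be supplied as a precomputed matrix C with C[i, j] = c(x_i, x'_j). The quadruple sum is expanded algebraically so that no 4-D array is needed.

```python
import numpy as np


def transport_cost(pi, C, D_a, D_b, alpha):
    """Evaluate the objective of Equation (1) for a given coupling (illustrative only).

    pi    : (n, n') coupling matrix
    C     : (n, n') expression cost matrix, C[i, j] = c(x_i, x'_j)
    D_a   : (n, n) spatial distances within the first slice
    D_b   : (n', n') spatial distances within the second slice
    alpha : weight of the spatial term, between 0 and 1
    """
    expression_term = np.sum(C * pi)

    # sum_{i,j,k,l} (d_ik - d'_jl)^2 pi_ij pi_kl, expanded into three
    # matrix expressions using the row and column sums of pi.
    p = pi.sum(axis=1)
    q = pi.sum(axis=0)
    spatial_term = (
        p @ (D_a**2) @ p
        + q @ (D_b**2) @ q
        - 2.0 * np.sum((D_a @ pi @ D_b.T) * pi)
    )
    return (1.0 - alpha) * expression_term + alpha * spatial_term


# Toy check: aligning a 3-spot slice to an identical copy of itself.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C = 1.0 - np.eye(3)   # zero expression cost only for matching spots
pi = np.eye(3) / 3.0  # each spot mapped to its own copy, uniform weights
cost_identical = transport_cost(pi, C, D, D, alpha=0.1)
```

For this toy input both terms vanish up to floating-point rounding, since matched spots have zero expression cost and identical pairwise distances.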

Note

When a value for overlap_fraction is provided, this function solves the partial pairwise slice alignment problem: it minimizes the same objective as Equation (1), but under constraints that allow spots to remain unmapped. Given a parameter \(s \in [0, 1]\) describing the fraction of mass to transport between \(g\) and \(g'\), the set \(\mathcal{P}(g, g', s)\) of \(s\)-partial couplings between distributions \(g\) and \(g'\) is defined as:

\[\mathcal{P}(g, g', s) = \left\{ \Pi \in \mathbb{R}^{n \times n'} \mid \Pi \geq 0, \Pi 1_{n'} \leq g, \Pi^T 1_n \leq g', 1_n^T \Pi 1_{n'} = s \right\}. \tag{3}\]

Where:

  • \(s \in [0, 1]\) is the fraction of overlap between the two slices to align. The constraint \(1_n^T \Pi 1_{n'} = s\) ensures that only a fraction \(s\) of the probability mass is transported.
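The constraints in Equation (3) can be checked numerically for a candidate plan. The sketch below is illustrative only: the helper is_partial_coupling and its tolerance argument are hypothetical and not part of this package.

```python
import numpy as np


def is_partial_coupling(pi, g_a, g_b, s, tol=1e-8):
    """Return True if pi satisfies the constraints of Equation (3) within tol."""
    pi = np.asarray(pi, dtype=float)
    nonnegative = np.all(pi >= -tol)
    rows_ok = np.all(pi.sum(axis=1) <= np.asarray(g_a) + tol)  # pi 1 <= g
    cols_ok = np.all(pi.sum(axis=0) <= np.asarray(g_b) + tol)  # pi^T 1 <= g'
    mass_ok = abs(pi.sum() - s) <= tol                         # total mass equals s
    return bool(nonnegative and rows_ok and cols_ok and mass_ok)


# Two spots per slice with uniform weights; only the first pair is mapped.
g = np.array([0.5, 0.5])
half_plan = np.array([[0.5, 0.0],
                      [0.0, 0.0]])

half_ok = is_partial_coupling(half_plan, g, g, s=0.5)  # half the mass is moved
full_ok = is_partial_coupling(half_plan, g, g, s=1.0)  # not a full coupling
```

Here the second spot of each slice stays unmapped, which the inequality constraints allow; the same plan fails the check for s = 1 because only half of the mass is transported.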

Parameters:
  • a_slice (AnnData) -- AnnData object containing data for the first slice.

  • b_slice (AnnData) -- AnnData object containing data for the second slice.

  • overlap_fraction (float, optional) -- Fraction of overlap between the two slices; must be between 0 and 1. If None, full alignment is performed.

  • exp_dissim_matrix (np.ndarray, optional) -- Precomputed expression dissimilarity matrix between two slices. If None, it will be computed.

  • alpha (float, default=0.1) -- Regularization parameter balancing transcriptional dissimilarity and spatial distance among aligned spots. Setting alpha = 0 uses only transcriptional information, while alpha = 1 uses only spatial coordinates.

  • exp_dissim_metric (str, default="kl") -- Metric used to compute the expression dissimilarity. Options:

      - 'kl' for Kullback-Leibler divergence between slices,
      - 'euc' for Euclidean distance,
      - 'gkl' for generalized Kullback-Leibler divergence,
      - 'selection_kl' for a selection-based KL approach,
      - 'pca' for Principal Component Analysis,
      - 'glmpca' for Generalized Linear Model PCA.

  • pi_init (np.ndarray, optional) -- Initial transport plan. If None, it will be computed.

  • a_spots_weight (np.ndarray, optional) -- Weight distribution for the spots in the first slice. If None, uniform weights are used.

  • b_spots_weight (np.ndarray, optional) -- Weight distribution for the spots in the second slice. If None, uniform weights are used.

  • norm (bool, default=False) -- If True, normalizes spatial distances.

  • numItermax (int, default=200) -- Maximum number of iterations for the optimization.

  • use_gpu (bool, default=True) -- Whether to use a GPU for computations. If True but no GPU is available, the computation falls back to the CPU.

  • maxIter (int, default=1000) -- Maximum number of iterations for the dissimilarity calculation.

  • optimizeTheta (bool, default=True) -- Whether to optimize theta during dissimilarity calculation.

  • eps (float, default=1e-4) -- Tolerance level for convergence.

  • do_histology (bool, default=False) -- If True, incorporates RGB dissimilarity from histology data.
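As an illustration of the optional array inputs, the sketch below builds uniform spot weights and a Euclidean expression dissimilarity matrix with NumPy. The expression matrices are synthetic, and the (n, n') shape of the dissimilarity matrix is an assumption inferred from the cost term in Equation (1), not something stated above. The call itself is shown only as a comment because it needs two AnnData slices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression counts (hypothetical data): 5 and 6 spots over the same 4 genes.
X_a = rng.poisson(5.0, size=(5, 4)).astype(float)
X_b = rng.poisson(5.0, size=(6, 4)).astype(float)

# Uniform weights, matching what the parameter descriptions say is used for None.
a_weights = np.full(X_a.shape[0], 1.0 / X_a.shape[0])
b_weights = np.full(X_b.shape[0], 1.0 / X_b.shape[0])

# Pairwise Euclidean distance between expression profiles,
# assumed to be laid out as (spots in a, spots in b).
diff = X_a[:, None, :] - X_b[None, :, :]
exp_dissim = np.sqrt((diff**2).sum(axis=-1))

# With AnnData objects a_slice and b_slice holding these spots, the inputs
# would be passed by keyword (not executed here):
# pi, info = pairwise_align(
#     a_slice, b_slice,
#     exp_dissim_matrix=exp_dissim,
#     a_spots_weight=a_weights,
#     b_spots_weight=b_weights,
#     alpha=0.1,
# )
```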

Returns:

  • pi : np.ndarray Optimal transport plan for aligning the two slices.

  • info : Optional[int] Information on the optimization process (if return_obj is True), else None.

Return type:

Tuple[np.ndarray, Optional[int]]
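One way to read the returned plan is sketched below with NumPy. The matrix is a hand-written stand-in for an actual result, and taking the row-wise argmax is just one simple convention for turning probabilities into hard assignments; it is not prescribed by this function.

```python
import numpy as np

# Hypothetical plan for 3 spots in the first slice and 4 in the second.
pi = np.array([
    [0.30, 0.02, 0.00, 0.01],
    [0.01, 0.25, 0.05, 0.02],
    [0.00, 0.03, 0.01, 0.30],
])

# Most probable partner in the second slice for each spot of the first slice.
best_match = pi.argmax(axis=1)

# Mass assigned to each spot of the first slice; rows with zero mass
# correspond to unmapped spots in a partial alignment.
row_mass = pi.sum(axis=1)

# Conditional probability of partner j given spot i (zero rows stay zero).
conditional = np.divide(
    pi, row_mass[:, None], out=np.zeros_like(pi), where=row_mass[:, None] > 0
)

# Total transported mass: 1 for a full alignment, s for a partial one.
total_mass = pi.sum()
```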