paste3.paste.pairwise_align
- paste3.paste.pairwise_align(a_slice, b_slice, overlap_fraction=None, exp_dissim_matrix=None, alpha=0.1, exp_dissim_metric='kl', pi_init=None, a_spots_weight=None, b_spots_weight=None, norm=False, numItermax=200, use_gpu=True, maxIter=1000, optimizeTheta=True, eps=0.0001, do_histology=False)
Returns a mapping \(\Pi = [\pi_{ij}]\) between the spots of one slice and the spots of another slice that preserves both the gene expression and the spatial distances of mapped spots, where \(\pi_{ij}\) describes the probability that spot \(i\) in the first slice is aligned to spot \(j\) in the second slice.
Given slices \((X, D, g)\) and \((X', D', g')\) containing \(n\) and \(n'\) spots, respectively, over the same \(p\) genes, an expression cost function \(c\), and a parameter \(0 \leq \alpha \leq 1\), this function finds a mapping \(\Pi \in \mathcal{F}(g, g')\) that minimizes the following transport cost:
\[F(\Pi; X, D, X', D', c, \alpha) = (1 - \alpha) \sum_{i,j} c(x_i, x'_j) \pi_{ij} + \alpha \sum_{i,j,k,l} (d_{ik} - d'_{jl})^2 \pi_{ij} \pi_{kl}, \tag{1}\]
subject to the regularity constraint that \(\Pi\) must be a probabilistic coupling between \(g\) and \(g'\):
\[\Pi \in \mathcal{F}(g, g') = \left\{ \Pi \in \mathbb{R}^{n \times n'} \mid \Pi \geq 0, \Pi 1_{n'} = g, \Pi^T 1_n = g' \right\}. \tag{2}\]
Where:
\(X \in \mathbb{R}^{n \times p}\) and \(X' \in \mathbb{R}^{n' \times p}\) are the gene expression matrices of the two slices, with rows \(x_i\) and \(x'_j\),
\(D \in \mathbb{R}^{n \times n}\) and \(D' \in \mathbb{R}^{n' \times n'}\) are the spatial distance matrices of the two slices, with entries \(d_{ik}\) and \(d'_{jl}\),
\(c\) is a cost function measuring the dissimilarity between expression profiles \(x_i\) and \(x'_j\),
\(\alpha\) balances the preservation of gene expression against the preservation of spatial distances in the mapping, and
\(g\) and \(g'\) are probability distributions over the spots of the first and second slice, respectively.
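As a sanity check on Equation (1), the NumPy sketch below evaluates the objective for a given coupling by brute force over all index quadruples \((i, j, k, l)\). The helper `transport_cost` is hypothetical, written only for this illustration and not part of paste3, and the toy slices are random. The coupling used is the product \(g g'^T\), which always satisfies the marginal constraints of Equation (2).

```python
import numpy as np

def transport_cost(pi, M, D_a, D_b, alpha):
    """Evaluate Eq. (1) for a coupling pi of shape (n, n').

    M[i, j] holds c(x_i, x'_j); D_a (n x n) and D_b (n' x n') play the
    roles of the spatial distance matrices D and D'.
    """
    # Expression term: sum_{i,j} c(x_i, x'_j) * pi_ij
    expression_term = np.sum(M * pi)
    # Spatial term: sum_{i,j,k,l} (d_ik - d'_jl)^2 * pi_ij * pi_kl.
    # diff[i, j, k, l] = d_ik - d'_jl; this 4-D array has (n n')^2 entries,
    # so the brute-force form is only suitable for toy sizes.
    diff = D_a[:, None, :, None] - D_b[None, :, None, :]
    spatial_term = np.einsum("ijkl,ij,kl->", diff ** 2, pi, pi)
    return (1 - alpha) * expression_term + alpha * spatial_term

# Toy slices: 4 and 5 spots with random 2-D coordinates and a random cost matrix.
rng = np.random.default_rng(0)
coords_a, coords_b = rng.random((4, 2)), rng.random((5, 2))
D_a = np.linalg.norm(coords_a[:, None, :] - coords_a[None, :, :], axis=-1)
D_b = np.linalg.norm(coords_b[:, None, :] - coords_b[None, :, :], axis=-1)
M = rng.random((4, 5))

# Uniform spot distributions g and g', and their product coupling g g'^T,
# which satisfies both marginal constraints of Eq. (2).
g, g_b = np.full(4, 1 / 4), np.full(5, 1 / 5)
pi = np.outer(g, g_b)
assert np.allclose(pi.sum(axis=1), g) and np.allclose(pi.sum(axis=0), g_b)

print(transport_cost(pi, M, D_a, D_b, alpha=0.1))
```

The product coupling is only a feasible point of \(\mathcal{F}(g, g')\), not the minimizer; the alignment looks for the coupling in that set with the lowest cost. Setting alpha = 0 reduces the objective to the linear expression term alone.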
Note
When a value for overlap_fraction is provided, this function solves the \(\textit{partial pairwise slice alignment problem}\) by minimizing the same objective function as Equation (1), but under a different set of constraints that allow some spots to remain unmapped. Given a parameter \(s \in [0, 1]\) describing the fraction of mass to transport between \(g\) and \(g'\), we define the set \(\mathcal{P}(g, g', s)\) of \(s\)-\(\textit{partial}\) couplings between distributions \(g\) and \(g'\) as:
\[\mathcal{P}(g, g', s) = \left\{ \Pi \in \mathbb{R}^{n \times n'} \mid \Pi \geq 0, \Pi 1_{n'} \leq g, \Pi^T 1_n \leq g', 1_n^T \Pi 1_{n'} = s \right\}. \tag{3}\]
Where:
\(s \in [0, 1]\) is the fraction of overlap between the two slices to align, given by overlap_fraction. The constraint \(1_n^T \Pi 1_{n'} = s\) ensures that only a total probability mass of \(s\) is transported.
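The partial constraints in Equation (3) can be checked directly. The helper `is_partial_coupling` below is hypothetical (not part of paste3); it tests membership in \(\mathcal{P}(g, g', s)\) on a toy plan in which one spot of each slice stays unmapped.

```python
import numpy as np

def is_partial_coupling(pi, g, g_b, s, tol=1e-9):
    """Return True if pi lies in P(g, g', s) from Eq. (3), up to tol."""
    pi = np.asarray(pi, dtype=float)
    return bool(
        np.all(pi >= -tol)                        # Pi >= 0
        and np.all(pi.sum(axis=1) <= g + tol)     # Pi 1_{n'} <= g
        and np.all(pi.sum(axis=0) <= g_b + tol)   # Pi^T 1_n <= g'
        and abs(pi.sum() - s) <= tol              # 1_n^T Pi 1_{n'} = s
    )

# Two 2-spot slices with uniform weights; transport half of the mass (s = 0.5).
g = np.array([0.5, 0.5])
g_b = np.array([0.5, 0.5])
pi = np.array([[0.5, 0.0],
               [0.0, 0.0]])  # spot 0 of each slice is mapped; spot 1 is left unmapped

print(is_partial_coupling(pi, g, g_b, s=0.5))                # True
print(is_partial_coupling(np.outer(g, g_b), g, g_b, s=0.5))  # False: total mass is 1, not 0.5
```

With \(s = 1\) the inequalities are forced to hold with equality (because \(g\) and \(g'\) each sum to 1), so \(\mathcal{P}(g, g', 1)\) coincides with the full coupling set of Equation (2).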
- Parameters:
a_slice (AnnData) -- AnnData object containing data for the first slice.
b_slice (AnnData) -- AnnData object containing data for the second slice.
overlap_fraction (float, optional) -- Fraction of overlap between the two slices, must be between 0 and 1. If None, full alignment is performed.
exp_dissim_matrix (np.ndarray, optional) -- Precomputed \(n \times n'\) expression dissimilarity matrix between the two slices (the values \(c(x_i, x'_j)\) in Equation (1)). If None, it will be computed.
alpha (float, default=0.1) -- Regularization parameter balancing transcriptional dissimilarity and spatial distance among aligned spots. Setting alpha = 0 uses only transcriptional information, while alpha = 1 uses only spatial coordinates.
exp_dissim_metric (str, default="kl") -- Metric used to compute the expression dissimilarity, with the following options:
- 'kl' for Kullback-Leibler divergence between slices,
- 'euc' for Euclidean distance,
- 'gkl' for generalized Kullback-Leibler divergence,
- 'selection_kl' for a selection-based KL approach,
- 'pca' for Principal Component Analysis,
- 'glmpca' for Generalized Linear Model PCA.
pi_init (np.ndarray, optional) -- Initial transport plan. If None, it will be computed.
a_spots_weight (np.ndarray, optional) -- Weight distribution for the spots in the first slice. If None, uniform weights are used.
b_spots_weight (np.ndarray, optional) -- Weight distribution for the spots in the second slice. If None, uniform weights are used.
norm (bool, default=False) -- If True, normalizes spatial distances.
numItermax (int, default=200) -- Maximum number of iterations for the optimization.
use_gpu (bool, default=True) -- Whether to use a GPU for computations. If True but no GPU is available, computation falls back to the CPU.
maxIter (int, default=1000) -- Maximum number of iterations for the dissimilarity calculation.
optimizeTheta (bool, default=True) -- Whether to optimize theta during dissimilarity calculation.
eps (float, default=1e-4) -- Tolerance level for convergence.
do_histology (bool, default=False) -- If True, incorporates RGB dissimilarity from histology data.
- Returns:
pi : np.ndarray -- Optimal transport plan \(\Pi\) of shape \((n, n')\) for aligning the two slices.
info : Optional[int] -- Information on the optimization process (if return_obj is True), else None.
- Return type:
Tuple[np.ndarray, Optional[int]]
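Because \(\pi_{ij}\) is the probability that spot \(i\) of the first slice is aligned to spot \(j\) of the second, one simple way to read the returned plan is to pick, for each spot of the first slice, the partner that receives the most mass. The helper `best_matches` below is hypothetical and not part of paste3; treating all-zero rows as unmapped reflects the partial setting of Equation (3).

```python
import numpy as np

def best_matches(pi, tol=0.0):
    """For each spot i of the first slice, return the index j of the second-slice
    spot receiving the most mass from i, and the share of row i's mass sent there.

    Rows whose total mass is <= tol (e.g. spots left unmapped by a partial
    alignment) get index -1 and share 0.
    """
    pi = np.asarray(pi, dtype=float)
    row_mass = pi.sum(axis=1)
    mapped = row_mass > tol
    idx = np.where(mapped, pi.argmax(axis=1), -1)
    share = np.zeros_like(row_mass)
    share[mapped] = pi[mapped, idx[mapped]] / row_mass[mapped]
    return idx, share

# Toy 3 x 3 plan in which the middle spot of the first slice is unmapped.
pi = np.array([[0.40, 0.10, 0.00],
               [0.00, 0.00, 0.00],
               [0.05, 0.05, 0.40]])
idx, share = best_matches(pi)
print(idx.tolist(), share.tolist())  # [0, -1, 2] [0.8, 0.0, 0.8]
```

A hard argmax discards how spread out each row is; keeping share alongside the index, or the full normalized row pi[i] / pi[i].sum(), preserves that uncertainty.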
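For intuition about the 'kl' and 'euc' options of exp_dissim_metric, the sketch below builds an \(n \times n'\) dissimilarity matrix of the kind exp_dissim_matrix is described as holding. It follows the textbook definitions rather than paste3's source: the helpers `kl_dissimilarity` and `euclidean_dissimilarity` are hypothetical, and the row normalization and the 0.01 pseudocount are assumptions, so paste3's exact preprocessing may differ.

```python
import numpy as np

def kl_dissimilarity(X_a, X_b, pseudocount=0.01):
    """Return M with M[i, j] = KL(p_i || q_j).

    p_i and q_j are the expression profiles of spot i (first slice) and spot j
    (second slice), each shifted by a pseudocount and normalized to sum to 1.
    The pseudocount keeps log() finite for zero counts (an assumption here).
    """
    P = X_a + pseudocount
    Q = X_b + pseudocount
    P = P / P.sum(axis=1, keepdims=True)
    Q = Q / Q.sum(axis=1, keepdims=True)
    # KL(p || q) = sum_k p_k log p_k - sum_k p_k log q_k
    self_term = np.sum(P * np.log(P), axis=1, keepdims=True)  # shape (n, 1)
    cross_term = P @ np.log(Q).T                              # shape (n, n')
    return self_term - cross_term

def euclidean_dissimilarity(X_a, X_b):
    """Return M with M[i, j] = ||x_i - x'_j||_2."""
    diff = X_a[:, None, :] - X_b[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Toy count matrices: 3 spots and 2 spots over the same 4 genes.
rng = np.random.default_rng(0)
X_a = rng.poisson(5.0, size=(3, 4)).astype(float)
X_b = rng.poisson(5.0, size=(2, 4)).astype(float)
print(kl_dissimilarity(X_a, X_b).shape)         # (3, 2)
print(euclidean_dissimilarity(X_a, X_b).shape)  # (3, 2)
```

Unlike the Euclidean distance, the KL divergence is not symmetric, so the orientation matters: rows index spots of the first slice and columns index spots of the second, matching \(c(x_i, x'_j)\) in Equation (1).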