paste3.paste.center_align

paste3.paste.center_align(initial_slice, slices, slice_weights=None, alpha=0.1, n_components=15, threshold=0.001, max_iter=10, exp_dissim_metric='kl', norm=False, random_seed=None, pi_inits=None, spots_weights=None, use_gpu=True, fast=False, pbar=None)

Infers a "center" slice consisting of a low-rank expression matrix \(X = WH\) and a collection \(\pi\) of mappings from the spots of the center slice to the spots of each input slice.

Given slices \((X^{(1)}, D^{(1)}, g^{(1)}), \dots, (X^{(t)}, D^{(t)}, g^{(t)})\) containing \(n_1, \dots, n_t\) spots, respectively, over the same \(p\) genes, a spot distance matrix \(D \in \mathbb{R}^{n \times n}_{+}\) and a distribution \(g\) over the \(n\) spots of the center slice, an expression cost function \(c\), a distribution \(\lambda \in \mathbb{R}^t_{+}\) over the slices, and parameters \(0 \leq \alpha \leq 1\) and \(m \in \mathbb{N}\), find an expression matrix \(X = WH\), where \(W \in \mathbb{R}^{p \times m}_{+}\) and \(H \in \mathbb{R}^{m \times n}_{+}\), and mappings \(\Pi^{(q)} \in \Gamma(g, g^{(q)})\) for each slice \(q = 1, \dots, t\) that minimize the following objective:

\[ \begin{aligned} R(W, H, \Pi^{(1)}, \dots, \Pi^{(t)}) &= \sum_q \lambda_q F(\Pi^{(q)}; WH, D, X^{(q)}, D^{(q)}, c, \alpha)\\ &= \sum_q \lambda_q \left[(1 - \alpha) \sum_{i,j} c\left((WH)_{\cdot,i}, x^{(q)}_j\right) \pi^{(q)}_{ij} + \alpha \sum_{i,j,k,l} (d_{ik} - d^{(q)}_{jl})^2 \pi^{(q)}_{ij} \pi^{(q)}_{kl} \right]. \end{aligned} \]

Where:

  • \(X^{(q)} = [x_{ij}] \in \mathbb{N}^{p \times n_q}\) is a \(p\) genes by \(n_q\) spots transcript count matrix for the \(q^{th}\) slice,

  • \(D^{(q)}\), where \(d^{(q)}_{ij} = \parallel z_{\cdot i} - z_{\cdot j} \parallel\) is the spatial distance between spots \(i\) and \(j\), is the pairwise spot distance matrix for the \(q^{th}\) slice,

  • \(c: \mathbb{R}^{p}_{+} \times \mathbb{R}^{p}_{+} \to \mathbb{R}_{+}\) is a function that measures a nonnegative cost between the expression profiles of two spots over all genes,

  • \(\alpha\) is a parameter balancing expression and spatial distance preservation,

  • \(W\) and \(H\) form the low-rank approximation of the center slice's expression matrix, and

  • \(\lambda_q\) weighs each slice \(q\) in the objective.
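The objective above can be evaluated directly on small inputs. The following is a minimal NumPy sketch, not the library's implementation: the function names `slice_objective` and `center_objective` are hypothetical, and plain Euclidean distance is assumed as a stand-in for the cost function \(c\). The four-index spatial term is materialized explicitly, so this is only practical for a handful of spots.

```python
import numpy as np


def slice_objective(W, H, D, X_q, D_q, pi_q, alpha=0.1):
    """Evaluate F for one slice q (hypothetical helper, Euclidean cost assumed).

    W: (p, m), H: (m, n)   low-rank factors of the center slice
    D: (n, n)              spot distances of the center slice
    X_q: (p, n_q)          expression matrix of slice q
    D_q: (n_q, n_q)        spot distances of slice q
    pi_q: (n, n_q)         mapping from center spots (rows) to slice spots (columns)
    """
    center = W @ H  # (p, n) expression matrix of the center slice
    # M[i, j] = || center[:, i] - X_q[:, j] ||, the assumed expression cost c
    M = np.linalg.norm(center[:, :, None] - X_q[:, None, :], axis=0)
    expression_term = np.sum(M * pi_q)
    # diff[i, j, k, l] = d_ik - d^(q)_jl
    diff = D[:, None, :, None] - D_q[None, :, None, :]
    spatial_term = np.einsum("ijkl,ij,kl->", diff**2, pi_q, pi_q)
    return (1 - alpha) * expression_term + alpha * spatial_term


def center_objective(W, H, D, slices, pis, lambdas, alpha=0.1):
    """Evaluate R as the lambda-weighted sum of F over all slices.

    slices is a list of (X_q, D_q) pairs, pis a list of mappings.
    """
    return sum(
        lam * slice_objective(W, H, D, X_q, D_q, pi_q, alpha)
        for lam, (X_q, D_q), pi_q in zip(lambdas, slices, pis)
    )
```

With `alpha=0` only the expression term contributes, and with `alpha=1` only the spatial term does, which matches the role of \(\alpha\) described in the list above.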

Parameters:
  • initial_slice (AnnData) -- An AnnData object representing the slice used as the reference for alignment.

  • slices (List[AnnData]) -- A list of AnnData objects that represent different slices to be aligned with the initial slice.

  • slice_weights (List[float], optional) -- Weights for each slice in the alignment process. If None, all slices are treated equally.

  • alpha (float, default=0.1) -- Regularization parameter balancing transcriptional dissimilarity and spatial distance among aligned spots. Setting alpha = 0 uses only transcriptional information, while alpha = 1 uses only spatial coordinates.

  • n_components (int, default=15) -- Number of components to use for the NMF.

  • threshold (float, default=0.001) -- Convergence threshold for the optimization process. The process stops when the change in loss is below this threshold.

  • max_iter (int, default=10) -- Maximum number of iterations for the optimization process.

  • exp_dissim_metric (str, default="kl") -- The metric used to compute expression dissimilarity. Options are "euclidean" and "kl" (Kullback-Leibler divergence).

  • norm (bool, default=False) -- If True, normalizes spatial distances.

  • random_seed (Optional[int], default=None) -- Random seed for reproducibility.

  • pi_inits (Optional[List[np.ndarray]], default=None) -- Initial transport plans for each slice. If None, they are computed.

  • spots_weights (List[float], optional) -- Weights for individual spots in each slice. If None, a uniform distribution is used.

  • use_gpu (bool, default=True) -- Whether to use the GPU for computations. If True but no GPU is available, computation falls back to the CPU.

  • fast (bool, default=False) -- Whether to use the fast (untested) torch nmf library.

  • pbar (Any, default=None) -- Progress bar (tqdm or derived) for tracking the optimization process. Any object with an update method is accepted.
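The interplay of threshold and max_iter, and the uniform defaults used when weights are None, can be illustrated with a short sketch. This is an assumption-laden illustration rather than the library's code: `uniform_distribution`, `iterate_until_converged` and the `step` callable are hypothetical names, and the stopping rule shown (absolute change in loss between consecutive iterations) is one plausible reading of the parameter descriptions above.

```python
import numpy as np


def uniform_distribution(n):
    """Uniform weights over n items (assumed default when weights are None)."""
    return np.full(n, 1.0 / n)


def iterate_until_converged(step, threshold=0.001, max_iter=10):
    """Run step() until the loss changes by less than threshold or max_iter is hit.

    step is a hypothetical callable that performs one update and returns the
    current loss. Returns (iterations_run, last_loss).
    """
    previous = None
    loss = None
    for iteration in range(1, max_iter + 1):
        loss = step()
        # Stop once two consecutive losses differ by less than the threshold.
        if previous is not None and abs(previous - loss) < threshold:
            return iteration, loss
        previous = loss
    return max_iter, loss
```

Under this reading, a small max_iter such as the default of 10 caps the run even when the loss is still changing by more than the threshold.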

Return type:

tuple[AnnData, list[ndarray]]

Returns:

  • Tuple[AnnData, List[np.ndarray]] -- A tuple containing:

    • center_slice (AnnData) -- The inferred center slice after optimization, with full and low-dimensional representations (feature_matrix, coeff_matrix) of the gene expression matrix.

    • pis (List[np.ndarray]) -- List of optimal transport plans, one per input slice, each mapping the spots of the center slice (rows) to the spots of that input slice (columns).
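Each returned mapping is a matrix with center-slice spots on the rows and input-slice spots on the columns, so it can be inspected with ordinary array operations. The sketch below is illustrative only: the helper names `best_matches` and `project_coordinates` are hypothetical, and the inputs are hand-made arrays rather than the output of an actual alignment.

```python
import numpy as np


def best_matches(pi):
    """For each center spot (row), index of the slice spot (column) with most mass."""
    return np.argmax(pi, axis=1)


def project_coordinates(pi, slice_coords):
    """Place each center spot at the pi-weighted mean of the slice spot coordinates.

    pi: (n, n_q) mapping, slice_coords: (n_q, 2) spatial coordinates of the slice.
    Rows of pi with zero total mass would divide by zero and are not handled here.
    """
    row_mass = pi.sum(axis=1, keepdims=True)
    return (pi / row_mass) @ slice_coords


# Toy mapping between 2 center spots and 2 slice spots.
pi = np.array([[0.4, 0.1],
               [0.05, 0.45]])
slice_coords = np.array([[0.0, 0.0],
                         [10.0, 0.0]])

matches = best_matches(pi)
projected = project_coordinates(pi, slice_coords)
```

Taking the row-wise argmax gives a hard assignment, while the weighted projection keeps the soft nature of the mapping; which is preferable depends on the downstream analysis.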