[1]:
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import scanpy as sc
import seaborn as sns
import squidpy as sq
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/dask/dataframe/__init__.py:31: FutureWarning: The legacy Dask DataFrame implementation is deprecated and will be removed in a future version. Set the configuration option `dataframe.query-planning` to `True` or None to enable the new Dask Dataframe implementation and silence this warning.
  warnings.warn(
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/numba/core/decorators.py:246: RuntimeWarning: nopython is set for njit and is ignored
  warnings.warn('nopython is set for njit and is ignored', RuntimeWarning)

Install PASTE2 python package

You can install the paste2 package at https://pypi.org/project/paste2/. We import paste2 as follows:

[2]:
from paste3 import paste, visualization

Read in Spatial Transcriptomics slices as AnnData objects

We provide four example ST slices from DLPFC patient 3, cropped to form partially overlapping subslices (See Figure 3A of our paper). Each slice is stored in an AnnData object.

[3]:
sliceA_filename = "../../../tests/data/input/151673.h5ad"
sliceB_filename = "../../../tests/data/input/151674.h5ad"
sliceC_filename = "../../../tests/data/input/151675.h5ad"
sliceD_filename = "../../../tests/data/input/151676.h5ad"
sliceA = sc.read_h5ad(sliceA_filename)
sliceB = sc.read_h5ad(sliceB_filename)
sliceC = sc.read_h5ad(sliceC_filename)
sliceD = sc.read_h5ad(sliceD_filename)

Each AnnData object consists of a gene expression matrx and spatial coordinate matrix. The gene expression matrix is stored in the .X field. The spatial coordiante matrix is stored in the .obsm['spatial'] field.

[4]:
sliceA.X
[4]:
<Compressed Sparse Row sparse matrix of dtype 'float32'
        with 5798687 stored elements and shape (2929, 11381)>
[5]:
sliceA.obsm["spatial"]
[5]:
array([[ 5769,  2807],
       [ 4068,  9505],
       [ 3393,  7583],
       ...,
       [ 4631,  7831],
       [ 5571, 11193],
       [ 6317,  3291]])

The rows of the AnnData objects are spots. The columns are genes.

[6]:
sliceA.obs
[6]:
in_tissue array_row array_col imagerow imagecol sum_umi sum_gene subject position replicate ... layer_guess_reordered_short n_genes_by_counts log1p_n_genes_by_counts total_counts log1p_total_counts pct_counts_in_top_50_genes pct_counts_in_top_100_genes pct_counts_in_top_200_genes pct_counts_in_top_500_genes n_counts
AAACAATCTACTAGCA-1.3 1 3 43 126.327637 259.630972 1667 1150 Br8100 0 1 ... L1 1150 7.048386 1667.0 7.419381 22.975405 30.173965 42.171566 61.007798 1628.0
AAACACCAATAACTGC-1.8 1 59 19 427.767792 183.078314 3769 1960 Br8100 0 1 ... WM 1960 7.581210 3769.0 8.234830 25.975060 33.138764 42.531175 59.697533 3704.0
AAACAGCTTTCAGAAG-1.7 1 43 9 341.269139 152.700275 4278 2264 Br8100 0 1 ... L5 2264 7.725330 4278.0 8.361475 23.796166 30.201029 38.826554 54.698457 4217.0
AAACAGGGTCTATATT-1.8 1 47 13 362.916304 164.941500 4004 2178 Br8100 0 1 ... L6 2178 7.686621 4004.0 8.295299 24.600400 30.669331 39.185814 55.344655 3922.0
AAACAGTGTTCCTGGG-1.5 1 73 43 503.780395 256.930702 2376 1432 Br8100 0 1 ... WM 1432 7.267525 2376.0 7.773594 26.136364 33.501684 43.308081 60.774411 2339.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TTGTTTCACATCCAGG-1.8 1 58 42 422.862301 254.410450 4324 2170 Br8100 0 1 ... WM 2170 7.682943 4324.0 8.372168 22.941721 30.388529 40.703053 57.770583 4266.0
TTGTTTCATTAGTCTA-1.8 1 60 30 433.393354 217.146722 2761 1560 Br8100 0 1 ... WM 1560 7.353082 2761.0 7.923710 26.439696 34.190511 43.824701 61.608113 2715.0
TTGTTTCCATACAACT-1.8 1 45 27 352.430255 208.415849 2322 1343 Br8100 0 1 ... L6 1343 7.203406 2322.0 7.750615 31.093885 37.941430 47.243755 63.695090 2300.0
TTGTTTGTATTACACG-1.4 1 73 41 503.735391 250.720081 2331 1420 Br8100 0 1 ... WM 1420 7.259116 2331.0 7.754481 25.139425 32.947233 42.728443 60.531961 2286.0
TTGTTTGTGTAAATTC-1.8 1 7 51 148.109816 284.293439 6281 2927 Br8100 0 1 ... L2 2927 7.982075 6281.0 8.745443 24.231810 30.154434 38.242318 53.017036 6184.0

2929 rows × 24 columns

[7]:
sliceA.var
[7]:
gene_ids feature_types genome n_cells_by_counts mean_counts log1p_mean_counts pct_dropout_by_counts total_counts log1p_total_counts n_counts
NOC2L ENSG00000188976 Gene Expression GRCh38 690 0.223963 0.202094 81.038747 815.0 6.704414 815.0
KLHL17 ENSG00000187961 Gene Expression GRCh38 163 0.045342 0.044344 95.520747 165.0 5.111988 165.0
HES4 ENSG00000188290 Gene Expression GRCh38 875 0.303105 0.264750 75.954933 1103.0 7.006695 1103.0
ISG15 ENSG00000187608 Gene Expression GRCh38 968 0.318494 0.276490 73.399286 1159.0 7.056175 1159.0
AGRN ENSG00000188157 Gene Expression GRCh38 1038 0.351470 0.301193 71.475680 1279.0 7.154615 1279.0
... ... ... ... ... ... ... ... ... ... ...
MT-ND4 ENSG00000198886 Gene Expression GRCh38 3639 85.596596 4.461260 0.000000 311486.0 12.649113 311486.0
MT-ND5 ENSG00000198786 Gene Expression GRCh38 3593 9.799670 2.379516 1.264084 35661.0 10.481841 35661.0
MT-ND6 ENSG00000198695 Gene Expression GRCh38 1966 0.860126 0.620644 45.974169 3130.0 8.049108 3130.0
MT-CYB ENSG00000198727 Gene Expression GRCh38 3639 49.170376 3.915425 0.000000 178931.0 12.094761 178931.0
AC007325.4 ENSG00000278817 Gene Expression GRCh38 577 0.183567 0.168533 84.143996 668.0 6.505784 668.0

11381 rows × 10 columns

We can visualize the slices using squidpy. In this case, the .obs["layer_guess_reordered"] field stores the layer annotation of each slice, so we use this field to color each spot.

[8]:
sq.pl.spatial_scatter(
    sliceA, frameon=False, shape=None, color="layer_guess_reordered", figsize=(10, 10)
)
sq.pl.spatial_scatter(
    sliceB, frameon=False, shape=None, color="layer_guess_reordered", figsize=(10, 10)
)
sq.pl.spatial_scatter(
    sliceC, frameon=False, shape=None, color="layer_guess_reordered", figsize=(10, 10)
)
sq.pl.spatial_scatter(
    sliceD, frameon=False, shape=None, color="layer_guess_reordered", figsize=(10, 10)
)
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/squidpy/pl/_spatial_utils.py:976: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/squidpy/pl/_spatial_utils.py:976: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/squidpy/pl/_spatial_utils.py:976: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/squidpy/pl/_spatial_utils.py:976: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
../_images/notebooks_paste2_tutorial_12_2.png
../_images/notebooks_paste2_tutorial_12_3.png
../_images/notebooks_paste2_tutorial_12_4.png
../_images/notebooks_paste2_tutorial_12_5.png

Compute partial pairwise alignment using PASTE2

Give a pair of partially overlapping slices, we can use PASTE2.partial_pairwise_align( ) to find an alignment matrix. To call the function, you need to input the AnnData objects of the two slices, as well as a parameter s, which indicates the overlap percentage of the two slices. In this tutorial, each pair of cropped subslices overlap at 70% of the areas, so we set overlap_fraction=0.7. For your own datasets you should visualize the slices and manually determine the approxiamte overlap percentage (this parameter does not have to be very accurate).

Now we compute an alignment matrix between each pair of slices in our example dataset.

In the calls to ``paste3.pairwise_align`` below, we’re using maxIter=20 here to specify a maximum of 20 iterations for pairwise_align. This is only to allow this demo to run in a resonable amount of time. In a real alignment scenario, you should not include this argument.

[9]:
pi_AB, _ = paste.pairwise_align(sliceA, sliceB, overlap_fraction=0.7, maxIter=20)
(INFO) (paste.py) (16-Nov-24 17:52:10) GPU is not available, resorting to torch CPU.
[10]:
pi_BC, _ = paste.pairwise_align(sliceB, sliceC, overlap_fraction=0.7, maxIter=20)
(INFO) (paste.py) (16-Nov-24 17:54:02) GPU is not available, resorting to torch CPU.
[11]:
pi_CD, _ = paste.pairwise_align(sliceC, sliceD, overlap_fraction=0.7, maxIter=20)
(INFO) (paste.py) (16-Nov-24 17:55:49) GPU is not available, resorting to torch CPU.

Let’s check the shape of each alignment matrix. For aligning a slice with n1 spots and a slice with n2 spots, the alignment matrix should be of shape (n1 * n2)

[12]:
print(pi_AB.shape)
print(pi_BC.shape)
print(pi_CD.shape)
torch.Size([2929, 2877])
torch.Size([2877, 2873])
torch.Size([2873, 2701])

There are other optional parameters to PASTE2.partial_pairwise_align() as well. You can checkout the original function signature in the souce code with documentation.

Let’s visualize the alignment between sliceA and sliceB:

[13]:
def largest_indices(ary, n):
    """Returns the n largest indices from a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)


def plot2D_samples_mat(xs, xt, G, alpha=0.2, top=1000, weight_alpha=False, **kwargs):
    if ("color" not in kwargs) and ("c" not in kwargs):
        kwargs["color"] = "k"
    mx = G.max().item()
    #     idx = np.where(G/mx>=thr)
    idx = largest_indices(G.cpu().numpy(), top)
    for i in range(len(idx[0])):
        plt.plot(
            [xs[idx[0][i], 0], xt[idx[1][i], 0]],
            [xs[idx[0][i], 1], xt[idx[1][i], 1]],
            alpha=alpha * (1 - weight_alpha)
            + (weight_alpha * G[idx[0][i], idx[1][i]].item() / mx),
            c="k",
        )


def plot_slice_pairwise_alignment(
    slice1, slice2, pi, alpha=0.05, top=1000, weight_alpha=False
):
    coordinates1, coordinates2 = slice1.obsm["spatial"], slice2.obsm["spatial"]
    offset = (coordinates1[:, 0].max() - coordinates2[:, 0].min()) * 1.1
    temp = np.zeros(coordinates2.shape)
    temp[:, 0] = offset
    plt.figure(figsize=(20, 10))
    plot2D_samples_mat(
        coordinates1,
        coordinates2 + temp,
        pi,
        c="k",
        alpha=alpha,
        top=top,
        weight_alpha=weight_alpha,
    )
    plt.scatter(
        coordinates1[:, 0],
        coordinates1[:, 1],
        linewidth=0,
        s=100,
        marker=".",
        color=list(
            slice1.obs["layer_guess_reordered"].map(
                dict(
                    zip(
                        slice1.obs["layer_guess_reordered"].cat.categories,
                        slice1.uns["layer_guess_reordered_colors"],
                        strict=False,
                    )
                )
            )
        ),
    )
    plt.scatter(
        coordinates2[:, 0] + offset,
        coordinates2[:, 1],
        linewidth=0,
        s=100,
        marker=".",
        color=list(
            slice2.obs["layer_guess_reordered"].map(
                dict(
                    zip(
                        slice2.obs["layer_guess_reordered"].cat.categories,
                        slice2.uns["layer_guess_reordered_colors"],
                        strict=False,
                    )
                )
            )
        ),
    )
    plt.gca().invert_yaxis()
    plt.axis("off")
    plt.show()


plot_slice_pairwise_alignment(sliceA, sliceB, pi_AB)
../_images/notebooks_paste2_tutorial_20_0.png

Project all slices onto the same coordiante system according to the alignment

Once the alignment matrix between each pair of adjacent slices in a sequence of consecutive slices are computed, we can use this information to project all slices onto the same 2D coordinate system. 3D reconstruction can be done by assiging a z-coordiante to each slice after the projection.

Specifically, we use visualization.partial_stack_slices_pairwise( ):

[14]:
pis = [pi_AB, pi_BC, pi_CD]
slices = [sliceA, sliceB, sliceC, sliceD]

new_slices, _, _ = visualization.stack_slices_pairwise(slices, pis, is_partial=True)

Now let’s plot the coordinates of all slices after the projection:

[15]:
layer_to_color_map = {f"Layer{i + 1}": sns.color_palette()[i] for i in range(6)}
layer_to_color_map["WM"] = sns.color_palette()[6]


def plot_slices_overlap(slices, layer_to_color_map=layer_to_color_map):
    plt.figure(figsize=(10, 10))
    for i in range(len(slices)):
        adata = slices[i]
        colors = list(
            adata.obs["layer_guess_reordered"].astype("str").map(layer_to_color_map)
        )
        plt.scatter(
            adata.obsm["spatial"][:, 0],
            adata.obsm["spatial"][:, 1],
            linewidth=0,
            s=100,
            marker=".",
            color=colors,
        )
    plt.legend(
        handles=[
            mpatches.Patch(
                color=layer_to_color_map[
                    adata.obs["layer_guess_reordered"].cat.categories[i]
                ],
                label=adata.obs["layer_guess_reordered"].cat.categories[i],
            )
            for i in range(len(adata.obs["layer_guess_reordered"].cat.categories))
        ],
        fontsize=10,
        title="Cortex layer",
        title_fontsize=15,
        bbox_to_anchor=(1, 1),
    )
    plt.gca().invert_yaxis()
    plt.axis("off")
    plt.show()


plot_slices_overlap(new_slices)
../_images/notebooks_paste2_tutorial_24_0.png

Or just the first two, which reproduces Figure 3C of the paper:

[16]:
plot_slices_overlap(new_slices[:2])
../_images/notebooks_paste2_tutorial_26_0.png