[1]:
# This cell is to allow automatic notebook generation for docs
# You may want to comment this out if you have paste3 installed
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd().parent.parent.parent / "src"))
Using the PASTE2 algorithm
This notebook highlights the creation of slices (AnnData objects), the usage of the pairwise_align and center_align functions of paste3, and the stacking and plotting functionalities.
This notebook primarily highlights how you would use the ``paste3`` package in ``PASTE2`` (i.e. partial alignment) mode, when adjacent slices do not fully overlap in space or have different cell type compositions.
[2]:
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import scanpy as sc
import seaborn as sns
from paste3 import paste, visualization
Read in Spatial Transcriptomics slices as AnnData objects
We provide four example ST slices from DLPFC patient 3, cropped to form partially overlapping subslices (See Figure 3A of our paper). Each slice is stored in an AnnData object.
[3]:
sliceA_filename = "../../../tests/data/151673.h5ad"
sliceB_filename = "../../../tests/data/151674.h5ad"
sliceC_filename = "../../../tests/data/151675.h5ad"
sliceD_filename = "../../../tests/data/151676.h5ad"
sliceA = sc.read_h5ad(sliceA_filename)
sliceB = sc.read_h5ad(sliceB_filename)
sliceC = sc.read_h5ad(sliceC_filename)
sliceD = sc.read_h5ad(sliceD_filename)
Each AnnData object consists of a gene expression matrix and a spatial coordinate matrix. The gene expression matrix is stored in the .X field. The spatial coordinate matrix is stored in the .obsm['spatial'] field.
[4]:
sliceA.X
[4]:
<Compressed Sparse Row sparse matrix of dtype 'float32'
with 5798687 stored elements and shape (2929, 11381)>
[5]:
sliceA.obsm["spatial"]
[5]:
array([[ 5769, 2807],
[ 4068, 9505],
[ 3393, 7583],
...,
[ 4631, 7831],
[ 5571, 11193],
[ 6317, 3291]])
The rows of the AnnData objects are spots. The columns are genes.
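As a quick check using standard AnnData attributes (nothing paste3-specific), we can confirm this orientation:
# n_obs = number of spots (rows), n_vars = number of genes (columns)
print(sliceA.n_obs, sliceA.n_vars)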
[6]:
sliceA.obs
[6]:
 | in_tissue | array_row | array_col | imagerow | imagecol | sum_umi | sum_gene | subject | position | replicate | ... | layer_guess_reordered_short | n_genes_by_counts | log1p_n_genes_by_counts | total_counts | log1p_total_counts | pct_counts_in_top_50_genes | pct_counts_in_top_100_genes | pct_counts_in_top_200_genes | pct_counts_in_top_500_genes | n_counts |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AAACAATCTACTAGCA-1.3 | 1 | 3 | 43 | 126.327637 | 259.630972 | 1667 | 1150 | Br8100 | 0 | 1 | ... | L1 | 1150 | 7.048386 | 1667.0 | 7.419381 | 22.975405 | 30.173965 | 42.171566 | 61.007798 | 1628.0 |
AAACACCAATAACTGC-1.8 | 1 | 59 | 19 | 427.767792 | 183.078314 | 3769 | 1960 | Br8100 | 0 | 1 | ... | WM | 1960 | 7.581210 | 3769.0 | 8.234830 | 25.975060 | 33.138764 | 42.531175 | 59.697533 | 3704.0 |
AAACAGCTTTCAGAAG-1.7 | 1 | 43 | 9 | 341.269139 | 152.700275 | 4278 | 2264 | Br8100 | 0 | 1 | ... | L5 | 2264 | 7.725330 | 4278.0 | 8.361475 | 23.796166 | 30.201029 | 38.826554 | 54.698457 | 4217.0 |
AAACAGGGTCTATATT-1.8 | 1 | 47 | 13 | 362.916304 | 164.941500 | 4004 | 2178 | Br8100 | 0 | 1 | ... | L6 | 2178 | 7.686621 | 4004.0 | 8.295299 | 24.600400 | 30.669331 | 39.185814 | 55.344655 | 3922.0 |
AAACAGTGTTCCTGGG-1.5 | 1 | 73 | 43 | 503.780395 | 256.930702 | 2376 | 1432 | Br8100 | 0 | 1 | ... | WM | 1432 | 7.267525 | 2376.0 | 7.773594 | 26.136364 | 33.501684 | 43.308081 | 60.774411 | 2339.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
TTGTTTCACATCCAGG-1.8 | 1 | 58 | 42 | 422.862301 | 254.410450 | 4324 | 2170 | Br8100 | 0 | 1 | ... | WM | 2170 | 7.682943 | 4324.0 | 8.372168 | 22.941721 | 30.388529 | 40.703053 | 57.770583 | 4266.0 |
TTGTTTCATTAGTCTA-1.8 | 1 | 60 | 30 | 433.393354 | 217.146722 | 2761 | 1560 | Br8100 | 0 | 1 | ... | WM | 1560 | 7.353082 | 2761.0 | 7.923710 | 26.439696 | 34.190511 | 43.824701 | 61.608113 | 2715.0 |
TTGTTTCCATACAACT-1.8 | 1 | 45 | 27 | 352.430255 | 208.415849 | 2322 | 1343 | Br8100 | 0 | 1 | ... | L6 | 1343 | 7.203406 | 2322.0 | 7.750615 | 31.093885 | 37.941430 | 47.243755 | 63.695090 | 2300.0 |
TTGTTTGTATTACACG-1.4 | 1 | 73 | 41 | 503.735391 | 250.720081 | 2331 | 1420 | Br8100 | 0 | 1 | ... | WM | 1420 | 7.259116 | 2331.0 | 7.754481 | 25.139425 | 32.947233 | 42.728443 | 60.531961 | 2286.0 |
TTGTTTGTGTAAATTC-1.8 | 1 | 7 | 51 | 148.109816 | 284.293439 | 6281 | 2927 | Br8100 | 0 | 1 | ... | L2 | 2927 | 7.982075 | 6281.0 | 8.745443 | 24.231810 | 30.154434 | 38.242318 | 53.017036 | 6184.0 |
2929 rows × 24 columns
[7]:
sliceA.var
[7]:
 | gene_ids | feature_types | genome | n_cells_by_counts | mean_counts | log1p_mean_counts | pct_dropout_by_counts | total_counts | log1p_total_counts | n_counts |
---|---|---|---|---|---|---|---|---|---|---|
NOC2L | ENSG00000188976 | Gene Expression | GRCh38 | 690 | 0.223963 | 0.202094 | 81.038747 | 815.0 | 6.704414 | 815.0 |
KLHL17 | ENSG00000187961 | Gene Expression | GRCh38 | 163 | 0.045342 | 0.044344 | 95.520747 | 165.0 | 5.111988 | 165.0 |
HES4 | ENSG00000188290 | Gene Expression | GRCh38 | 875 | 0.303105 | 0.264750 | 75.954933 | 1103.0 | 7.006695 | 1103.0 |
ISG15 | ENSG00000187608 | Gene Expression | GRCh38 | 968 | 0.318494 | 0.276490 | 73.399286 | 1159.0 | 7.056175 | 1159.0 |
AGRN | ENSG00000188157 | Gene Expression | GRCh38 | 1038 | 0.351470 | 0.301193 | 71.475680 | 1279.0 | 7.154615 | 1279.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
MT-ND4 | ENSG00000198886 | Gene Expression | GRCh38 | 3639 | 85.596596 | 4.461260 | 0.000000 | 311486.0 | 12.649113 | 311486.0 |
MT-ND5 | ENSG00000198786 | Gene Expression | GRCh38 | 3593 | 9.799670 | 2.379516 | 1.264084 | 35661.0 | 10.481841 | 35661.0 |
MT-ND6 | ENSG00000198695 | Gene Expression | GRCh38 | 1966 | 0.860126 | 0.620644 | 45.974169 | 3130.0 | 8.049108 | 3130.0 |
MT-CYB | ENSG00000198727 | Gene Expression | GRCh38 | 3639 | 49.170376 | 3.915425 | 0.000000 | 178931.0 | 12.094761 | 178931.0 |
AC007325.4 | ENSG00000278817 | Gene Expression | GRCh38 | 577 | 0.183567 | 0.168533 | 84.143996 | 668.0 | 6.505784 | 668.0 |
11381 rows × 10 columns
We can visualize the slices using scanpy. In this case, the .obs["layer_guess_reordered"] field stores the layer annotation of each slice, so we use this field to color each spot.
[8]:
sc.pl.spatial(sliceA, color="layer_guess_reordered", spot_size=125, frameon=False)
sc.pl.spatial(sliceB, color="layer_guess_reordered", spot_size=125, frameon=False)
sc.pl.spatial(sliceC, color="layer_guess_reordered", spot_size=125, frameon=False)
sc.pl.spatial(sliceD, color="layer_guess_reordered", spot_size=125, frameon=False)
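If you want to look at the annotation values behind these colors directly, a standard pandas operation on the .obs dataframe (not a paste3 function) counts the spots per layer:
# Number of spots assigned to each cortical layer / white matter in sliceA
print(sliceA.obs["layer_guess_reordered"].value_counts())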
Compute partial pairwise alignment using PASTE2
Given a pair of partially overlapping slices, we can use pairwise_align()
with the overlap_fraction argument to find an alignment matrix. To call the function, you pass in the AnnData objects of the two slices, as well as overlap_fraction, which indicates the fraction of the two slices that overlaps. In this tutorial, each pair of cropped subslices overlaps in roughly 70% of its area, so we set overlap_fraction=0.7
. For your own datasets, you should visualize the slices and manually determine the approximate overlap
fraction (this parameter does not have to be very accurate); a quick visual check like the sketch below is usually enough.
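A minimal version of such a check (plain matplotlib on the spot coordinates, not part of the paste3 API; the horizontal offset is an arbitrary value for display) could look like this:
# Sketch: draw the spot coordinates of two slices next to each other to eyeball their shared region
coordsA, coordsB = sliceA.obsm["spatial"], sliceB.obsm["spatial"]
shift = coordsA[:, 0].max() - coordsB[:, 0].min() + 1000  # arbitrary horizontal offset
plt.figure(figsize=(12, 6))
plt.scatter(coordsA[:, 0], coordsA[:, 1], s=5, label="sliceA")
plt.scatter(coordsB[:, 0] + shift, coordsB[:, 1], s=5, label="sliceB")
plt.gca().invert_yaxis()
plt.axis("off")
plt.legend()
plt.show()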
Now we compute an alignment matrix between each pair of slices in our example dataset.
In the calls to ``pairwise_align`` below, we use maxIter=20 to cap the optimization at a maximum of 20 iterations. This is only to allow this demo to run in a reasonable amount of time; in a real alignment scenario, you should not include this argument.
[9]:
pi_AB, _ = paste.pairwise_align(sliceA, sliceB, overlap_fraction=0.7, maxIter=20)
(INFO) (paste.py) (10-Jan-25 17:08:25) GPU is not available, resorting to torch CPU.
[10]:
pi_BC, _ = paste.pairwise_align(sliceB, sliceC, overlap_fraction=0.7, maxIter=20)
(INFO) (paste.py) (10-Jan-25 17:10:17) GPU is not available, resorting to torch CPU.
[11]:
pi_CD, _ = paste.pairwise_align(sliceC, sliceD, overlap_fraction=0.7, maxIter=20)
(INFO) (paste.py) (10-Jan-25 17:12:04) GPU is not available, resorting to torch CPU.
Let’s check the shape of each alignment matrix. For an alignment between a slice with n1 spots and a slice with n2 spots, the alignment matrix should be of shape (n1, n2).
[12]:
print(pi_AB.shape)
print(pi_BC.shape)
print(pi_CD.shape)
torch.Size([2929, 2877])
torch.Size([2877, 2873])
torch.Size([2873, 2701])
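Each entry of a coupling matrix is the amount of mass transported between a spot of the first slice and a spot of the second slice. As a hedged sketch of how you might inspect a coupling downstream (generic torch/numpy code, not a paste3 helper), here is one way to find, for each spot of sliceA, the sliceB spot it is most strongly aligned to:
# Sketch: for each sliceA spot, find the sliceB spot with the largest coupling weight
pi = pi_AB.cpu().numpy()
best_match = pi.argmax(axis=1)  # column index of the strongest partner for each row
row_mass = pi.sum(axis=1)  # total mass per sliceA spot; near-zero rows are weakly matched under the partial alignment
print(sliceB.obs_names[best_match[:5]].tolist())  # barcodes of the best partners for the first five sliceA spots
print((row_mass > 0).sum(), "of", pi.shape[0], "sliceA spots carry some mass")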
There are other optional parameters to pairwise_align() as well. You can check out the full function signature and its documentation in the source code.
Let’s visualize the alignment between sliceA and sliceB:
[13]:
def largest_indices(ary, n):
"""Returns the n largest indices from a numpy array."""
flat = ary.flatten()
indices = np.argpartition(flat, -n)[-n:]
indices = indices[np.argsort(-flat[indices])]
return np.unravel_index(indices, ary.shape)
def plot2D_samples_mat(xs, xt, G, alpha=0.2, top=1000, weight_alpha=False, **kwargs):
    """Draw lines between the `top` highest-weight spot pairs of the coupling matrix G."""
if ("color" not in kwargs) and ("c" not in kwargs):
kwargs["color"] = "k"
    mx = G.max().item()
    # keep only the `top` strongest entries of the coupling for plotting
    idx = largest_indices(G.cpu().numpy(), top)
for i in range(len(idx[0])):
plt.plot(
[xs[idx[0][i], 0], xt[idx[1][i], 0]],
[xs[idx[0][i], 1], xt[idx[1][i], 1]],
alpha=alpha * (1 - weight_alpha)
+ (weight_alpha * G[idx[0][i], idx[1][i]].item() / mx),
c="k",
)
def plot_slice_pairwise_alignment(
slice1, slice2, pi, alpha=0.05, top=1000, weight_alpha=False
):
    """Plot two slices side by side and draw the strongest alignment lines of pi between them."""
coordinates1, coordinates2 = slice1.obsm["spatial"], slice2.obsm["spatial"]
offset = (coordinates1[:, 0].max() - coordinates2[:, 0].min()) * 1.1
temp = np.zeros(coordinates2.shape)
temp[:, 0] = offset
plt.figure(figsize=(20, 10))
plot2D_samples_mat(
coordinates1,
coordinates2 + temp,
pi,
c="k",
alpha=alpha,
top=top,
weight_alpha=weight_alpha,
)
plt.scatter(
coordinates1[:, 0],
coordinates1[:, 1],
linewidth=0,
s=100,
marker=".",
color=list(
slice1.obs["layer_guess_reordered"].map(
dict(
zip(
slice1.obs["layer_guess_reordered"].cat.categories,
slice1.uns["layer_guess_reordered_colors"],
strict=False,
)
)
)
),
)
plt.scatter(
coordinates2[:, 0] + offset,
coordinates2[:, 1],
linewidth=0,
s=100,
marker=".",
color=list(
slice2.obs["layer_guess_reordered"].map(
dict(
zip(
slice2.obs["layer_guess_reordered"].cat.categories,
slice2.uns["layer_guess_reordered_colors"],
strict=False,
)
)
)
),
)
plt.gca().invert_yaxis()
plt.axis("off")
plt.show()
plot_slice_pairwise_alignment(sliceA, sliceB, pi_AB)
Project all slices onto the same coordinate system according to the alignment
Once the alignment matrices between all pairs of adjacent slices in a sequence of consecutive slices are computed, we can use this information to project all slices onto the same 2D coordinate system. A 3D reconstruction can then be obtained by assigning a z-coordinate to each slice after the projection (see the sketch after the next cell).
Specifically, we use visualization.stack_slices_pairwise() with is_partial=True:
[14]:
pis = [pi_AB, pi_BC, pi_CD]
slices = [sliceA, sliceB, sliceC, sliceD]
new_slices, _, _ = visualization.stack_slices_pairwise(slices, pis, is_partial=True)
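Since the paragraph above mentions 3D reconstruction, here is a minimal sketch (not a paste3 function; the z-spacing is an arbitrary value chosen for illustration) of assigning a z-coordinate to each projected slice to obtain 3D coordinates:
# Sketch: build (x, y, z) coordinates by giving each projected slice its own z value
z_spacing = 100  # arbitrary distance between consecutive slices, in the units of the spatial coordinates
coords_3d = np.vstack(
    [
        np.column_stack([s.obsm["spatial"], np.full(s.n_obs, i * z_spacing)])
        for i, s in enumerate(new_slices)
    ]
)
print(coords_3d.shape)  # (total number of spots across all slices, 3)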
Now let’s plot the coordinates of all slices after the projection:
[15]:
layer_to_color_map = {f"Layer{i + 1}": sns.color_palette()[i] for i in range(6)}
layer_to_color_map["WM"] = sns.color_palette()[6]
def plot_slices_overlap(slices, layer_to_color_map=layer_to_color_map):
    """Plot the projected slices on top of each other in a single coordinate system, colored by layer."""
plt.figure(figsize=(10, 10))
for i in range(len(slices)):
adata = slices[i]
colors = list(
adata.obs["layer_guess_reordered"].astype("str").map(layer_to_color_map)
)
plt.scatter(
adata.obsm["spatial"][:, 0],
adata.obsm["spatial"][:, 1],
linewidth=0,
s=100,
marker=".",
color=colors,
)
plt.legend(
handles=[
mpatches.Patch(
color=layer_to_color_map[
adata.obs["layer_guess_reordered"].cat.categories[i]
],
label=adata.obs["layer_guess_reordered"].cat.categories[i],
)
for i in range(len(adata.obs["layer_guess_reordered"].cat.categories))
],
fontsize=10,
title="Cortex layer",
title_fontsize=15,
bbox_to_anchor=(1, 1),
)
plt.gca().invert_yaxis()
plt.axis("off")
plt.show()
plot_slices_overlap(new_slices)
Or just the first two, which reproduces Figure 3C of the paper:
[16]:
plot_slices_overlap(new_slices[:2])