Exercise 7

Differential analysis with multi-sample

In this exercise, we extend our analysis to multi-sample, multi-condition spatial transcriptomics data. With multiple samples per condition (e.g., 2 colorectal carcinoma (CRC) and 2 normal adjacent tissues), we can ask not only:

“Is a gene spatially variable within a tissue?”

but also:

“Does its spatial pattern change between conditions?”

Genes showing such changes are called differential spatial patterns (DSP) genes. Using the DESpace package, we will learn how to identify such genes, that may be spatially variable in one condition but not in another, or it could be spatially variable in both conditions, but with different spatial patterns.

Learning Objectives

By the end of this exercise, you will be able to:

Understand the concept of DSP
Perform global DSP tests across multiple samples
Perform individual-cluster tests to find cluster-specific DSP genes
Interpret the DSP patterns

Libraries

library(HDF5Array) 
library(SingleCellExperiment)
library(SpatialExperiment)
library(ggspavis)
library(patchwork)
library(DESpace)

Load preprocessed data

We will work with the same VisiumHD dataset from the human colon cancer study recently published. Here we use a subsetted dataset containing 4 slides from three patients (P1CRC, P5CRC, P3NAT, and P5NAT), representing either colorectal carcinoma (CRC) or normal adjacent tissues (NAT), all with pre-computed clusters.

spe <- loadHDF5SummarizedExperiment(dir="data/Human_Colon_Cancer_P1/", prefix="02.3_spe_tmp")

# Remove mitochondrial genes
gn <- rowData(spe)$Symbol
mt <- grepl("^MT-", gn, ignore.case = TRUE)
table(mt)

mt
FALSE  TRUE 
18034    11

spe <- spe[!mt, ]

# Extract sample labels
sample_id <- colData(spe)[["sample_id"]]

# Create a sample indicator matrix
# Each column represents a sample; entries are 1 if the spot belongs to that sample
sample_mat <- model.matrix(~ sample_id - 1)
 
# Total counts of each gene in each sample
gene_counts <- counts(spe)
counts_per_sample <- gene_counts %*% sample_mat

# Number of non-zero spots per gene per sample
nonzero_per_sample <- (gene_counts > 0) %*% sample_mat 

# Keep only genes that pass the filter in all samples
sel_matrix <- t(counts_per_sample >= 50 & nonzero_per_sample >= 20)
sel <- colMeans(sel_matrix) == 1
spe <- spe[sel, ]

# Normalization
spe <- scuttle::logNormCounts(spe)
spe

class: SpatialExperiment 
dim: 12114 56120 
metadata(0):
assays(2): counts logcounts
rownames(12114): SAMD11 NOC2L ... TMLHE VAMP7
rowData names(3): ID Symbol Type
colnames(56120): s_016um_00145_00029-1.Normal_P5
  s_016um_00165_00109-1.Normal_P5 ... s_016um_00122_00096-1.Normal_P3
  s_016um_00127_00062-1.Normal_P3
colData names(6): row col ... sample_id sizeFactor
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
imgData names(4): sample_id image_id data scaleFactor

colData(spe) includes:

sample_id: sample ID
condition: CRC vs. Normal tissue
cluster: spatial domain labels

head(colData(spe), 3)

DataFrame with 3 rows and 6 columns
                                      row       col  cluster condition
                                <integer> <integer> <factor>  <factor>
s_016um_00145_00029-1.Normal_P5       145        29       3     Normal
s_016um_00165_00109-1.Normal_P5       165       109       9     Normal
s_016um_00188_00060-1.Normal_P5       188        60       10    Normal
                                sample_id sizeFactor
                                 <factor>  <numeric>
s_016um_00145_00029-1.Normal_P5 Normal_P5   2.413912
s_016um_00165_00109-1.Normal_P5 Normal_P5   3.020864
s_016um_00188_00060-1.Normal_P5 Normal_P5   0.742408

table(spe$condition)


Cancer Normal 
 29064  27056

Visualize clusters for each sample.

samples <- sort(unique(spe$sample_id))
lapply(samples, \(.)
       plotCoords(spe[, spe$sample_id == .],
                  annotate="cluster",
                  in_tissue=NULL,
                  point_size = 0.1) +
         ggtitle(.)) |>
  wrap_plots(nrow = 2) &
  scale_color_manual(values=unname(pals::trubetskoy())) &
  theme_void() &
  theme(legend.position = "none")

Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

DESpace

Global test

To answer the question: does the expression of a gene change between conditions, but change differently for some domains compared to others?

DESpace::dsp_test() first aggregates spot- or cell-level counts into pseudobulk counts for each (sample, domain) combination, then tests whether the interaction term (condition x domain) in the model is different from zero.

Take few minutes to run. While it is running, take a look at the package vignette.

res_edgeR <- dsp_test(spe,
                      cluster_col = "cluster",
                      sample_col = "sample_id",
                      condition_col = "condition",
                      verbose = TRUE)

Using 'dsp_test' for spatial variable pattern genes detection.

Filter low quality clusters:

Cluster levels to keep: 0, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 14, 16, 17

Design model: row names represent sample names, followed by underscores and cluster names.

            (Intercept) conditionNormal cluster_id10 cluster_id11 cluster_id13
Cancer_P1_0           1               0            0            0            0
Cancer_P5_0           1               0            0            0            0
            cluster_id14 cluster_id16 cluster_id17 cluster_id2 cluster_id3
Cancer_P1_0            0            0            0           0           0
Cancer_P5_0            0            0            0           0           0
            cluster_id4 cluster_id5 cluster_id6 cluster_id8 cluster_id9
Cancer_P1_0           0           0           0           0           0
Cancer_P5_0           0           0           0           0           0
            conditionNormal:cluster_id10 conditionNormal:cluster_id11
Cancer_P1_0                            0                            0
Cancer_P5_0                            0                            0
            conditionNormal:cluster_id13 conditionNormal:cluster_id14
Cancer_P1_0                            0                            0
Cancer_P5_0                            0                            0
            conditionNormal:cluster_id16 conditionNormal:cluster_id17
Cancer_P1_0                            0                            0
Cancer_P5_0                            0                            0
            conditionNormal:cluster_id2 conditionNormal:cluster_id3
Cancer_P1_0                           0                           0
Cancer_P5_0                           0                           0
            conditionNormal:cluster_id4 conditionNormal:cluster_id5
Cancer_P1_0                           0                           0
Cancer_P5_0                           0                           0
            conditionNormal:cluster_id6 conditionNormal:cluster_id8
Cancer_P1_0                           0                           0
Cancer_P5_0                           0                           0
            conditionNormal:cluster_id9
Cancer_P1_0                           0
Cancer_P5_0                           0

Exercise 1

Check out the results:

Extract gene-level results.
Count how many DSP genes are significant at 5% FDR.
Take the top 3 DSP genes and visualize their spatial expression across all samples using ggspavis::plotCoords().
How would you describe the spatial expression patterns of these genes? To help with interpretation, try looking up information about these genes (e.g., UniProt or the Human Protein Atlas) to get clues about which cell types or tissue domains they may highlight.

Answer

# Extract gene-level results
dsp_global <- res_edgeR$gene_results
head(dsp_global, 3)

        gene_id logFC.conditionNormal.cluster_id10
SPIB       SPIB                          3.3689574
SLITRK6 SLITRK6                          3.6484273
UBE2C     UBE2C                          0.2558715
        logFC.conditionNormal.cluster_id11 logFC.conditionNormal.cluster_id13
SPIB                             -6.016658                         -0.3735427
SLITRK6                          -5.108105                          0.5387260
UBE2C                             2.261431                         -2.4736932
        logFC.conditionNormal.cluster_id14 logFC.conditionNormal.cluster_id16
SPIB                              6.123880                         -5.8597462
SLITRK6                          -6.574244                          0.5313051
UBE2C                             9.600026                         -4.9771886
        logFC.conditionNormal.cluster_id17 logFC.conditionNormal.cluster_id2
SPIB                             0.4521503                         -1.268746
SLITRK6                          7.0643114                          3.071577
UBE2C                            4.0880357                          1.087541
        logFC.conditionNormal.cluster_id3 logFC.conditionNormal.cluster_id4
SPIB                             1.132356                          1.998761
SLITRK6                          2.661016                          1.201677
UBE2C                            3.072440                          3.234711
        logFC.conditionNormal.cluster_id5 logFC.conditionNormal.cluster_id6
SPIB                             5.175949                        -0.5615506
SLITRK6                          3.260155                         2.2817234
UBE2C                            1.428419                         4.5269257
        logFC.conditionNormal.cluster_id8 logFC.conditionNormal.cluster_id9
SPIB                            3.1408068                          2.290420
SLITRK6                         5.4390033                         -1.110038
UBE2C                           0.2980652                          4.314139
          logCPM        F       PValue         FDR
SPIB    5.873793 6.298689 4.064371e-07 0.004154844
SLITRK6 5.185589 5.909145 8.247322e-07 0.004154844
UBE2C   4.884226 5.953292 1.028936e-06 0.004154844

# Count significant DSP genes (at 5% FDR significance level)
table(dsp_global$FDR <= 0.05)


FALSE  TRUE 
12102    12

# Top 3 DSP genes
gs <- rownames(dsp_global)[seq_len(3)]
gs

[1] "SPIB"    "SLITRK6" "UBE2C"

plots <- lapply(samples, \(s) {
  # Subset to sample s
  spe_j <- spe[, spe$sample_id == s]
  
  # Plot cluster 
  p_cluster <- plotCoords(
    spe_j,
    annotate = "cluster",
    in_tissue = NULL,
    point_size = 0.05,
    legend_position = "none") +
    ggtitle(s) +
    scale_color_manual(values = unname(pals::trubetskoy()))
  
  # Plots for each DSP gene
  p_genes <- lapply(gs, \(g) {
    plotCoords(
      spe_j,
      annotate = g,
      point_size = 0.05,
      in_tissue = NULL,
      assay_name = "logcounts",
      feature_names = "Symbol"
    ) +
      scale_color_gradientn(colors = pals::brewer.reds(n=12))
  })
  
  # combine the cluster plot and 3 gene plots
  c(p_cluster, p_genes)
})

Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

plots_flat <- unlist(plots, recursive = FALSE)
wrap_plots(plots_flat, ncol = 4)

Individual domain test

The global test tells us which genes show different spatial patterns globally, but it does not tell us which clusters drive the differences.

To identify the key spatial domain where expression changes across conditions, we use DESpace::individual_dsp(), which performs domain-specific DSP tests.

# Focus on one cluster to decrease runtime in this exercise
spe$c8 <- factor(ifelse(spe$cluster == "8", 1, 0))
dsp_clu <- individual_dsp(spe,
                      cluster_col = "c8",
                      sample_col = "sample_id",
                      condition_col = "condition",
                      filter_gene = FALSE,
                      test = "LRT")

Conducting tests for layer '0' against all other layers.

Design model: row names represent sample names, followed by underscores and cluster names.

                (Intercept) conditionNormal cluster_id0
Cancer_P1_Other           1               0           0
Cancer_P5_Other           1               0           0
                conditionNormal:cluster_id0
Cancer_P1_Other                           0
Cancer_P5_Other                           0

Conducting tests for layer '1' against all other layers.

Design model: row names represent sample names, followed by underscores and cluster names.

                (Intercept) conditionNormal cluster_id1
Cancer_P1_Other           1               0           0
Cancer_P5_Other           1               0           0
                conditionNormal:cluster_id1
Cancer_P1_Other                           0
Cancer_P5_Other                           0

Returning results

Exercise 2

Extract the 4th top DSP gene.
Visualize its expression across samples via ggspavis::plotCoords() or DESpace::FeaturePlot() and interpret the result.

Answer

# Select cluster id (because after binarization cluster 8 = "1")
id_clu <- "1"

# DSP results for cluster 8
head(dsp_clu[[id_clu]], 5)

          gene_id      logFC   logCPM       LR       PValue          FDR
PTGER3     PTGER3 -11.169531 3.342487 29.29157 6.226599e-08 0.0007542903
OGN           OGN  -5.401620 5.400259 27.78484 1.355842e-07 0.0008212335
SIGLEC10 SIGLEC10   2.580773 5.693757 25.35995 4.756912e-07 0.0019208410
COL12A1   COL12A1  -2.520161 7.053918 24.58019 7.128088e-07 0.0021587414
PPP1R1B   PPP1R1B  -3.449097 7.789589 23.62477 1.170709e-06 0.0028363945

# dsp_clu[["1"]] stores DSP genes for cluster 8  (after binarization)
# dsp_clu[["0"]] stores DSP genes for all other regions (not useful here)
# If full clusters are used originally, dsp_clu[["8"]] would store cluster 8 results

# Select the 4th top DSP genes
(gs <- rownames(dsp_clu[[id_clu]])[4])

[1] "COL12A1"

# Rotate slices
colData(spe) <- colData(spe) |> cbind(spatialCoords(spe))
spe$row <- spe$pxl_col_in_fullres
spe$col <- -spe$pxl_row_in_fullres

# Visualization
plots <- lapply(samples, \(.) {
  # Subset sample
  spe_j <- spe[, colData(spe)$sample_id == .]
  # FeaturePlot for cluster 8
  plot <- FeaturePlot(spe_j, 
                      feature = gs, 
                      coordinates = c("pxl_row_in_fullres", "pxl_col_in_fullres"),
                      cluster_col = "c8", # binary cluster column
                      cluster = "1",      # plot on cluster 8
                      platform = "VisiumHD",
                      sf_dim = 400,
                      diverging = TRUE,
                      point_size = 0.05,
                      linewidth = 0.6) +
    theme(legend.position = "right") +
    labs(color = "") + ggtitle(.) 
  
  return(plot)
})

Coordinate system already present.
ℹ Adding new coordinate system, which will replace the existing one.
Coordinate system already present.
ℹ Adding new coordinate system, which will replace the existing one.
Coordinate system already present.
ℹ Adding new coordinate system, which will replace the existing one.
Coordinate system already present.
ℹ Adding new coordinate system, which will replace the existing one.

wrap_plots(plots, ncol = 2) + plot_layout(guides = 'collect') + ggtitle(gs)

In CRC samples, COL12A1 shows higher expression in cluster 8 and covers a larger region. In normal samples, its expression is weaker and more limited.

Cluster 8 might correspond to a stromal region that becomes expanded or remodeled in CRC. individual_dsp() helps identify which spatial domain shows condition-specific changes in gene expression.

Bonus question

Try running the individual domain test using the default setting test = "QLF. How do the results change? (Hint: check the number of significant genes.) Why might this happen?

Clear your environment:

rm(list = ls())
gc()
.rs.restartR()

Important

Key Takeaways:

DSP genes capture changes in where a gene is expressed across conditions, not only how much.
dsp_test() identifies global DSP genes across all clusters and samples.
individual_dsp() pinpoints which clusters (spatial domains) drive these differences.