Trajectory analysis

Material

Lecture Alex Russel Lederer:

Lecture Tania Wyss and Rachel Marcone:

Exercises

Load the following packages:

Trajectory analysis using Slingshot

This part uses the Deng dataset

First, download the dataset from github within your Terminal tab as on Day 1:

Type the following commands within the Terminal tab:

cd course_data/
wget https://github.com/hemberg-lab/nrg-paper-figures/raw/master/deng-reads.rds

Then, within R, import the rds file. the ‘Deng’ dataset is an object of class SingleCellExperiment.

deng_SCE <- readRDS("course_data/deng-reads.rds")

Perform the first steps of the analysis. The deng_SCE object contains cells that were isolated at different stages of mouse embryogenesis, from the zygote stage to the late blastula.

The levels of the cell type are in alphabetical order. We now change the level order for plotting in developmental order:

deng_SCE$cell_type2 <- factor(deng_SCE$cell_type2,
                              levels = c("zy",
                                         "early2cell",
                                         "mid2cell",
                                         "late2cell",
                                         "4cell",
                                         "8cell",
                                         "16cell",
                                         "earlyblast",
                                         "midblast",
                                         "lateblast"))

We can run a PCA directly on the object of class SingleCellExperiment with the function runPCA:

deng_SCE <- scater::runPCA(deng_SCE, ncomponents = 50)

Use the reducedDim function to access the PCA and store the results.

pca <- SingleCellExperiment::reducedDim(deng_SCE, "PCA")

Describe how the PCA is stored in a matrix. Why does it have this structure?

head(pca)
              PC1       PC2       PC3         PC4        PC5         PC6
16cell   24.79868 -62.20826  8.035201 -2.07391816  2.1297390 -14.0930954
16cell.1 28.77121 -50.35974 13.607012  0.08664449  0.9454185  -3.5987880
16cell.2 26.67901 -61.03875  7.239352 -6.60967794 -1.0775002 -11.8876579
16cell.3 29.01151 -56.03620  6.433057  2.85332708 -4.2885083   0.1488504
16cell.4 26.38026 -58.09265  4.671850  7.99519397  9.8077416  -2.0570042
16cell.5 24.90566 -60.77897  5.632497 -3.80156587  9.8835527 -11.9028394
                PC7        PC8       PC9      PC10      PC11        PC12
16cell   -2.4645020 -1.6350660  7.202260  9.862212 10.660702  -0.6401721
16cell.1 -2.1726663  3.3481641  8.967394  6.664942 14.493227 -11.7471565
16cell.2  7.9007309 -0.3368756  6.032645  5.295515 15.384993  -4.2930696
16cell.3  4.3727592  1.1582470  1.520145 -8.789699 19.386866   0.4999047
16cell.4  0.6031572  3.6743278  5.793753 10.823787  7.613724  -4.7288640
16cell.5  4.3269009 -3.8968881 11.805221  9.798854 11.016137 -19.1535086
               PC13      PC14       PC15       PC16       PC17      PC18
16cell     5.716841  6.544614  -6.652210  -3.458346  -4.499013 11.360753
16cell.1 -13.284708 -4.206404  -8.721043  -7.926277  -0.703508  5.418131
16cell.2  -9.633173  1.672498  -9.609001  -9.302794 -10.219743  5.763834
16cell.3 -14.177687 -8.509097  -6.978210  10.771078  -6.188808 -6.504081
16cell.4  -3.106382 -4.078414 -10.739979 -12.032452  -6.239499 -2.331292
16cell.5  -9.544362 -2.255400  -8.614958  -2.832196  -1.798584 -2.321082
                PC19      PC20       PC21       PC22      PC23       PC24
16cell     2.2617345  2.456274 -11.227414  1.7122827  8.418641  -4.254968
16cell.1 -11.8613891 -4.069530  -9.320831  0.5802347 11.878096   6.412425
16cell.2  -3.3460356 -4.165813  -2.031473 -2.1106373  1.762218   1.135134
16cell.3  -0.6042649 -6.008176  -9.982856  9.4888653 -2.822138 -12.871921
16cell.4   3.9402029  0.298227 -10.773722 -0.6374236 -4.730329  -4.670391
16cell.5  -2.0280791 -5.050525   3.252243 -7.1527175  9.923140   1.791511
              PC25      PC26       PC27      PC28        PC29       PC30
16cell   -4.049629  4.133374 -0.6235391  3.381254 13.94917609  -8.217824
16cell.1 -8.052083  8.334263 -0.5815629 -4.592214  1.32417854   5.266909
16cell.2 -2.326133  3.775858 -2.3388745  6.947394  0.08121559  -2.942813
16cell.3 -5.860750  1.869659  7.0402429 -5.092207 -2.53575943 -18.529304
16cell.4 -4.291113 13.005331  3.2802102 -4.606226 -3.52531994  -3.599833
16cell.5  4.708265  5.717693  1.1023767 -9.761377 -4.57312078 -12.138646
              PC31      PC32       PC33      PC34       PC35       PC36
16cell    6.897320  5.675943 -8.6076039 -3.713348 -0.9099737  4.7467546
16cell.1  4.538307 -9.166969  9.4525575 -8.848231 -2.0782319  7.4318993
16cell.2 -3.082470  2.207176 -0.5365986 -3.895378  7.4493361  0.7465149
16cell.3 -1.680117  3.839556 13.3156066 -6.257479 -4.1112596  0.2780589
16cell.4 13.314741  1.453554 -0.1334034  2.941487 -0.8162660 -2.9940693
16cell.5  4.608498 12.180530 -5.8667454  6.645273  1.0224859  0.8960299
              PC37       PC38       PC39       PC40       PC41       PC42
16cell   -9.063470  5.2765051 -1.1758453  -9.474215  -3.559391  4.7781174
16cell.1 -6.217009 -1.0216459 -0.5798035 -21.705585   3.570104 -2.3279923
16cell.2 -6.227582 -3.0863112 -8.6153521   1.401230  -2.266017 -0.8150665
16cell.3 -8.411600 -3.7169411  0.7050601  -2.959623   3.123082 -1.0916370
16cell.4  2.871774  4.2664023  7.4894594   8.207422  -4.223035  1.4763577
16cell.5 10.169730 -0.3923632  9.3346900  -8.114487 -11.186021  4.5635674
                PC43      PC44      PC45      PC46        PC47      PC48
16cell    7.92280920  8.558202 -7.058962 -3.058209  -0.5723868  4.674968
16cell.1 -5.60067538 -8.717056  6.480960 -8.554813 -13.1868734  3.397925
16cell.2 -5.25328813  5.803788 -2.726822 -1.241769   7.4824411 -4.088434
16cell.3  0.05135525  2.181424 -2.404780 -8.691230   8.9700020 -3.713560
16cell.4 -1.55019721 -4.946841 -0.520753  3.068227  10.7801144  5.167676
16cell.5  9.98211744  8.759947  3.727758  9.064882  -1.7524459 -3.306604
              PC49      PC50
16cell   -2.936203  3.598746
16cell.1 -3.420297 -3.489052
16cell.2  4.446149 -0.235609
16cell.3 -5.179723  9.719219
16cell.4 -1.077690  3.049430
16cell.5  5.019110  0.925972

Add PCA data to the deng_SCE object.

deng_SCE$PC1 <- pca[, 1]
deng_SCE$PC2 <- pca[, 2]

Plot PC biplot with cells colored by cell_type2. colData(deng_SCE) accesses the cell metadata DataFrame object for deng_SCE. Look at Figure 1A of the paper as a comparison to your PC biplot.

ggplot(as.data.frame(colData(deng_SCE)), aes(x = PC1, y = PC2, color = cell_type2)) +
  geom_point(size=2, shape=20) +
  theme_classic() +
  xlab("PC1") + ylab("PC2") + ggtitle("PC biplot")

PCA is a simple approach and can be good to compare to more complex algorithms designed to capture differentiation processes. As a simple measure of pseudotime we can use the coordinates of PC1. Plot PC1 vs cell_type2.

deng_SCE$pseudotime_PC1 <- rank(deng_SCE$PC1)  # rank cells by their PC1 score

Create a jitter plot

ggplot(as.data.frame(colData(deng_SCE)), aes(x = pseudotime_PC1, y = cell_type2,
                                             colour = cell_type2)) +
  ggbeeswarm::geom_quasirandom(groupOnX = FALSE) +
  theme_classic() +
  xlab("PC1") + ylab("Timepoint") +
  ggtitle("Cells ordered by first principal component")
Orientation inferred to be along y-axis; override with
`position_quasirandom(orientation = 'x')`

Read the Slingshot documentation (?slingshot::slingshot) and then run Slingshot below.

sce <- slingshot::slingshot(deng_SCE, reducedDim = 'PCA')
No cluster labels provided. Continuing with one cluster.
Exercise

Given your understanding of the algorithm and the documentation, what is one major set of parameters we omitted here when running Slingshot?

We didn’t set the parameter clusterLabels

Here is a custom function to plot the PCA based on a slingshot object. Run it in the console to add it to your global environment:

PCAplot_slingshot <- function(sce, draw_lines = TRUE, variable = NULL, legend = FALSE, ...){
  # set palette for factorial variables
  palf <- colorRampPalette(RColorBrewer::brewer.pal(8, "Set2"))
  # set palette for numeric variables
  paln <- colorRampPalette(RColorBrewer::brewer.pal(9, "Blues"))
  # extract pca from SingleCellExperiment object
  pca <- SingleCellExperiment::reducedDims(sce)$PCA
  
  if(is.null(variable)){
    col <- "black"
  }
  if(is.character(variable)){
    variable <- as.factor(variable)
  }
  if(is.factor(variable)){
    colpal <- palf(length(levels(variable)))
    colors <- colpal[variable]
  }
  if(is.numeric(variable)){
    colpal <- paln(50)
    colors <- colpal[cut(variable,breaks=50)]
  }
  
  # draw the plot
  plot(pca, bg = colors, pch = 21)
  # draw lines
  if(draw_lines){
    lines(slingshot::SlingshotDataSet(sce), lwd = 2, ... )
  }
  # add legend
  if(legend & is.factor(variable)){
    legend("bottomright", pt.bg = colpal,legend = levels(variable),pch=21)
    
  }
}

Have a look at the PCA with the slingshot pseudotime line:

PCAplot_slingshot(sce, variable = sce$slingPseudotime_1, draw_lines = TRUE)

Also have a look at pseudotime versus cell type:

ggplot(as.data.frame(colData(deng_SCE)), aes(x = sce$slingPseudotime_1,
                                             y = cell_type2,
                                             colour = cell_type2)) +
  ggbeeswarm::geom_quasirandom(groupOnX = FALSE) +
  theme_classic() +
  xlab("Slingshot pseudotime") + ylab("Timepoint") +
  ggtitle("Cells ordered by Slingshot pseudotime")
Orientation inferred to be along y-axis; override with
`position_quasirandom(orientation = 'x')`

This already looks pretty good. Let’s see whether we can improve it. First we generate clusters by using Seurat:

gcdata <- Seurat::CreateSeuratObject(counts = SingleCellExperiment::counts(deng_SCE),
                                     project = "slingshot")

gcdata <- Seurat::NormalizeData(object = gcdata,
                                normalization.method = "LogNormalize",
                                scale.factor = 10000)

gcdata <- Seurat::FindVariableFeatures(object = gcdata,
                                       mean.function = ExpMean,
                                       dispersion.function = LogVMR)

gcdata <- Seurat::ScaleData(object = gcdata,
                            do.center = T,
                            do.scale = F)

gcdata <- Seurat::RunPCA(object = gcdata,
                         pc.genes = gcdata@var.genes)

gcdata <- Seurat::FindNeighbors(gcdata,
                                reduction = "pca",
                                dims = 1:5)

# clustering with resolution of 0.6
gcdata <- Seurat::FindClusters(object = gcdata,
                               resolution = 0.6)

Now we can add these clusters to the slingshot function:

deng_SCE$Seurat_clusters <- as.character(Idents(gcdata))  # go from factor to character

sce <- slingshot::slingshot(deng_SCE,
                            clusterLabels = 'Seurat_clusters',
                            reducedDim = 'PCA',
                            start.clus = "2")

Check how the slingshot object has evolved

class: SlingshotDataSet 

 Samples Dimensions
     268         50

lineages: 2 
Lineage1: 2  4  0  5  3  
Lineage2: 2  4  1  

curves: 2 
Curve1: Length: 425.94  Samples: 234.63
Curve2: Length: 341 Samples: 132.38

Plot PC1 versus PC2 colored by slingshot pseudotime:

PCAplot_slingshot(sce, variable = sce$slingPseudotime_2)

Plot Slingshot pseudotime vs cell stage.

ggplot(data.frame(cell_type2 = deng_SCE$cell_type2,
                  slingPseudotime_1 = sce$slingPseudotime_1),
       aes(x = slingPseudotime_1, y = cell_type2,
           colour = cell_type2)) +
  ggbeeswarm::geom_quasirandom(groupOnX = FALSE) +
  theme_classic() +
  xlab("Slingshot pseudotime") + ylab("Timepoint") +
  ggtitle("Cells ordered by Slingshot pseudotime")
Orientation inferred to be along y-axis; override with
`position_quasirandom(orientation = 'x')`
Warning: Removed 29 rows containing missing values (`position_quasirandom()`).

ggplot(data.frame(cell_type2 = deng_SCE$cell_type2,
                  slingPseudotime_2 = sce$slingPseudotime_2),
       aes(x = slingPseudotime_2, y = cell_type2,
           colour = cell_type2)) +
  ggbeeswarm::geom_quasirandom(groupOnX = FALSE) +
  theme_classic() +
  xlab("Slingshot pseudotime") + ylab("Timepoint") +
  ggtitle("Cells ordered by Slingshot pseudotime")
Orientation inferred to be along y-axis; override with
`position_quasirandom(orientation = 'x')`
Warning: Removed 134 rows containing missing values (`position_quasirandom()`).

Particularly the later stages, separation seems to improve. Since we have included the Seurat clustering, we can plot the PCA, with colors according to these clusters:

PCAplot_slingshot(sce,
                  variable = deng_SCE$Seurat_clusters,
                  type = 'lineages',
                  col = 'black',
                  legend = TRUE)

PCAplot_slingshot(sce,
                  variable = deng_SCE$cell_type2,
                  type = 'lineages',
                  col = 'black',
                  legend = TRUE)

Exercise

Instead of providing an initial cluster, think of an end cluster that would fit this trajectory analysis and perform the slingshot analysis. Does slingshot find the initial cluster corresponding to the biological correct situation?

sce <- slingshot::slingshot(deng_SCE,
                            clusterLabels = 'Seurat_clusters',
                            reducedDim = 'PCA',
                            end.clus = c("0", "3", "5")) ## check which would be the best according to bio

Clear your environment:

rm(list = ls())
gc()
.rs.restartR()

Trajectory analysis with monocle3

This part showcases how you can use monocle3 to perform a trajectory analysis. First load the seu dataset:

seu <- readRDS("seu_day2-4.rds")

Load the required package into your environment:

library(monocle3)

Generate a monocle3 object (with class cell_data_set) from our Seurat object:

# create gene metadata data.frame
feature_names <- as.data.frame(rownames(seu))
rownames(feature_names) <- rownames(seu)
colnames(feature_names) <- "gene_short_name"

# initiate monocle object from seurat count table 
seu_monocl <- monocle3::new_cell_data_set(Seurat::GetAssayData(seu,
                                                               layer = "counts"),
                                          cell_metadata = seu@meta.data,
                                          gene_metadata = feature_names)

We pre-process the newly created object. What does it involve? Check:

?preprocess_cds

Preprocess the dataset:

seu_monocl <- monocle3::preprocess_cds(seu_monocl)

And check out the elbow plot:

monocle3::plot_pc_variance_explained(seu_monocl)

Perform UMAP using the implementation in the monocle3 package and its default parameters:

seu_monocl <- monocle3::reduce_dimension(seu_monocl, reduction_method = "UMAP")

Plot the monocle3 UMAP coloring cells according to the cluster ID, marker gene or annotation that were stored in the Seurat object:

monocle3::plot_cells(seu_monocl, 
                     color_cells_by = "RNA_snn_res.0.3", 
                     cell_size = 1, 
                     show_trajectory_graph = FALSE)

monocle3::plot_cells(seu_monocl, genes = "CD79A", 
                     cell_size = 1,
                     show_trajectory_graph = FALSE,
                     scale_to_range = FALSE)

monocle3::plot_cells(seu_monocl, 
                     color_cells_by = "SingleR_annot", 
                     cell_size = 1, 
                     show_trajectory_graph = FALSE)

Cluster cells using monocle3’s clustering function:

seu_monocl <- monocle3::cluster_cells(seu_monocl, resolution=0.00025)
monocle3::plot_cells(seu_monocl, label_cell_groups = F)
No trajectory to plot. Has learn_graph() been called yet?

learn graph (i.e. identify trajectory) using monocle3 UMAP and clustering:

seu_monocl <- monocle3::learn_graph(seu_monocl)
monocle3::plot_cells(seu_monocl)

Exercise

Find the CD34+ B-cell cluster in the monocle UMAP. This cluster has a high expressession of CD79A and expresses CD34.

monocle3::plot_cells(seu_monocl, genes = c("CD79A", "CD34"),
                     show_trajectory_graph = FALSE, 
                     cell_size = 0.7, group_label_size = 4)

Cluster 11 has both a high expression of CD79A and CD34.

Select the “initial” cells in the B-cell cluster to calculate pseudotime. The initial cells in this case are the CD34+ B-cells we have just identified. A pop up window will open and you need to click on the “initial” cells (one node per trajectory), then click “Done”.

seu_monocl <- monocle3::order_cells(seu_monocl)
monocle3::plot_cells(seu_monocl,
                     color_cells_by = "pseudotime",
                     label_cell_groups=F,
                     label_leaves=F,
                     label_branch_points=FALSE,
                     graph_label_size=1.5, cell_size = 1)

In order to find genes which expression is affected by pseudtime, we first have to isolate the B-cell cluster. Therefore, extract all cells in the B-cell cluster with the interactive choose_cells function:

seuB <- choose_cells(seu_monocl)

Check whether you have selected the right cells:

plot_cells(seuB, show_trajectory_graph = FALSE, cell_size = 1,
           color_cells_by = "pseudotime")
Cells aren't colored in a way that allows them to be grouped.

Now we can use the cells in this trajectory to test which genes are affected by the trajectory:

pr_test <- graph_test(seuB, 
                      cores=4, 
                      neighbor_graph = "principal_graph")
# order by test statistic
pr_test <- pr_test[order(pr_test$morans_test_statistic, 
                         decreasing = TRUE),]
status p_value morans_test_statistic morans_I gene_short_name q_value
HBB OK 0 101.29233 0.8472319 HBB 0
HBA2 OK 0 100.44956 0.8405222 HBA2 0
PTMA OK 0 91.60811 0.7662323 PTMA 0
HBA1 OK 0 91.58533 0.7663894 HBA1 0
STMN1 OK 0 86.86541 0.7263461 STMN1 0
TOP2A OK 0 84.45458 0.7067213 TOP2A 0
View(pr_test)

There are some interesting genes in there, for example related to cell cycling (MKI67, CKS2), related to B-cell development (CD34, MS4A1) and immunoglobulins (IGLL1 and IGLL5). We can plot those in the UMAP:

goi <- c("CD34", "MS4A1", "IGLL1", "IGLL5", 
         "MKI67", "CKS2")
plot_cells(seuB, label_cell_groups=FALSE, genes = goi,
           show_trajectory_graph=FALSE, cell_size = 1)

But also against pseudotime:

seuB@colData$monocle_cluster <- clusters(seuB)

plot_genes_in_pseudotime(subset(seuB, 
                                rowData(seuB)$gene_short_name %in% goi),
                         min_expr=0.5, color_cells_by = "monocle_cluster")