This chapter uses the
gbm dataset you have created earlier today:
gbm <- readRDS("gbm_day2_part1.rds")
And load the following packages:
In the last exercise we saw that probably clustering at a resolution of 0.2 gave the most sensible results. Let’s therefore set the default identity of each cell based on this clustering:
gbm <- Seurat::SetIdent(gbm, value = gbm$RNA_snn_res.0.2)
From now on, grouping (e.g. for plotting) is done by the active identity (set at
@active.ident) by default.
Based on the UMAP we have generated, we can visualize expression for a gene in each cluster:
Based on expression of sets of genes you can do a manual cell type annotation. If you know the marker genes for some cell types, you can check whether they are up-regulated in one or the other cluster. Here we have some marker genes for two different cell types:
immune_genes<-c("GZMA", "CD3E", "CD3D") microglia_genes<-c("CCL4", "CCL3", "P2RY12", "C1QB", "CSF1ER", "CY3CR1")
Let’s have a look at the expression of the three immune genes:
Seurat::FeaturePlot(gbm, immune_genes, ncol=2)
These cells are almost all in cluster 6. Which becomes clearer when looking at the violin plot:
Seurat::VlnPlot(gbm, features = immune_genes, ncol = 2)
Exercise: Have a look at the microglia genes as well. Which cluster contains probably microglial cells?
Seurat::FeaturePlot(gbm, microglia_genes, ncol=2)
Corresponding mainly to cluster 1 (and cluster 4):
Seurat::VlnPlot(gbm, features = microglia_genes, ncol = 2)
We can also automate this with the function
AddModuleScore. For each cell, an expression score for a group of genes is calcuated:
gbm <- Seurat::AddModuleScore(gbm, features = list(immune_genes), name = "immune_genes")
Exercise: After running
AddModuleScore, a column was added to
A. What is the name of that column? What kind of data is in there?
B. Generate a UMAP with color accoding to this column and a violinplot grouped by cluster. Is this according to what we saw in the previous exercise?
A. The new column is called
immune_genes1. It contains the module score for each cell (which is basically the average expression of the set of genes).
B. You can plot the UMAP with
Which indeed shows these genes are mainly expressed in cluster 6:
Cell type annotation using
To do a fully automated annoation, we need a reference dataset of primary cells. Any reference could be used. The package
scRNAseq in Bioconductor includes several scRNAseq datasets that can be used as reference to
SingleR. One could also use a reference made of bulk RNA seq data. Here we are using the Human Primary Cell Atlas dataset from
celldex. Check out what’s in there:
hpca.se <- celldex::HumanPrimaryCellAtlasData() class(hpca.se) table(hpca.se$label.main)
SingleR compares our normalized count data to a reference set, and finds the most probable annation:
gbm_SingleR <- SingleR::SingleR(test = Seurat::GetAssayData(gbm, slot = "data"), ref = hpca.se, labels = hpca.se$label.main)
See what’s in there by using
In order to visualize it in our UMAP, we have to add the annotation to
gbm$SingleR_annot <- gbm_SingleR$labels Seurat::DimPlot(gbm, group.by = "SingleR_annot", label = T, repel = T)
Exercise: Compare our manual annotation (based on the set of immune genes) to the annotation with
SingleR. Do they correspond?
We can have a look at the mean module score for each
SingleR annotation like this:
mean_scores <- tapply(gbm$immune_genes1, gbm$SingleR_annot, mean) mean_scores[order(mean_scores, decreasing = TRUE)[1:6]]
T_cells NK_cell DC Monocyte Macrophage Chondrocytes 1.015605822 0.679592135 0.017018871 -0.005414908 -0.011926460 -0.011975024
Showing that T-cells and NK-cells have a high module score based on our set of immune genes, which makes sense.
Of course, it was also already clear from the UMAP plots that cluster 6 (the cluster with the high module score for the immune genes) contained the T-cells and NK-cells.
Save the dataset and clear environment
Now, save the dataset so you can use it tomorrow:
Clear your environment:
rm(list = ls()) gc() .rs.restartR()