After having completed this chapter you will be able to:
- Explain what kind of information single cell RNA-seq can give you to answer a biological question
- Describe essential considerations during the design of a single cell RNA-seq experiment
- Describe the pros and cons of different single cell sequencing methods
- Load single cell data into R
- Explain the basic structure of a Seurat object and extract count data and metadata
- Perform a basic quality control by:
  - Evaluating the percentage of UMIs originating from mitochondrial genes
  - Detecting doublets
Introduction to scRNA-seq and techniques:
scRNA-seq with 10x genomics:
- Single cell introductory video on iBiology
- Seurat website
- Paper on experimental considerations
- Paper on experimental design
- SMART-seq3 protocol at protocols.io
- cellranger system requirements and installation
- Review by Tallulah Andrews
- Paper on correlation between mRNA and protein level in single cells
Have a look in the reference directory. In the reads directory you will find reads for one sample: ETV6-RUNX1_1. In the analysis part of the course we will work with six samples, but due to time and computational limitations we will run cellranger count on only one of the samples, and only on reads originating from chromosomes 21 and 22.
The inputs you need to run cellranger count are the sequence reads and a reference. Here, we have prepared a reference containing only chromosomes 21 and 22, but in ‘real life’ you would of course use the full reference genome of your species. The reference has a specific format. You can download precomputed human and mouse references from the 10X website. If your species of interest is not one of those, you will have to generate the reference yourself. For that, have a look here.
To be able to run cellranger in the compute environment, first run:
Have a look at the documentation of cellranger count (scroll down to Command-line argument reference).
You can find the input files here:
- reads: /home/rstudio/single_cell_course/course_data/reads/ (from the downloaded tar package in your home directory)
- pre-indexed reference:
Fill out the missing arguments (at FIXME) in the script below, and run it:
cellranger count \
--id=FIXME \
--sample=FIXME \
--transcriptome=FIXME \
--fastqs=FIXME \
--localcores=4
This will take a while
Once started, the process will need approximately 15 minutes to finish. Have a coffee and/or have a look at the other exercises.
Running a bash command with Rstudio
You can run a bash script or command using the terminal tab in Rstudio server:
cellranger count \
--id=ETV6-RUNX1_1 \
--sample=ETV6-RUNX1_1 \
--transcriptome=/group_work/cellranger_index \
--fastqs=/home/rstudio/single_cell_course/course_data/reads \
--localcores=4
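The value passed to --sample has to match the sample-name prefix of the FASTQ files in the --fastqs directory. As a minimal sketch, assuming the files follow the usual Illumina naming scheme (`<sample>_S<n>_L<lane>_<read>_001.fastq.gz`), the helper below lists the sample names found in a directory. The function name `list_fastq_samples` is made up for this illustration and is not part of cellranger.

```shell
# Hypothetical helper: list the sample-name prefixes found in a FASTQ directory.
# Assumes Illumina-style names: <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz
# Files that do not follow this scheme are printed with their full name.
list_fastq_samples() {
    fastq_dir="$1"
    for f in "$fastq_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip the unexpanded glob if nothing matches
        basename "$f"
    done |
        sed -E 's/_S[0-9]+_L[0-9]+_[RI][0-9]_[0-9]+\.fastq\.gz$//' |
        sort -u
}

# Example (path taken from the exercise):
# list_fastq_samples /home/rstudio/single_cell_course/course_data/reads
```

If the name printed here differs from what you pass to --sample, cellranger will not find the reads.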
Have a look at the output directory (i.e. ~/ETV6-RUNX1_1/outs). The analysis report (web_summary.html) is usually a good place to start.
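Before opening the report, it can help to confirm that the run actually produced its main outputs. The sketch below checks for three items that cellranger count normally writes to the outs directory; the function name `check_outs` is hypothetical, and the list of files is a small selection rather than the complete output.

```shell
# Hypothetical helper: report whether a few standard cellranger count
# outputs are present in an outs directory. Returns non-zero if any is missing.
check_outs() {
    outs_dir="$1"
    status=0
    for item in web_summary.html metrics_summary.csv filtered_feature_bc_matrix; do
        if [ -e "$outs_dir/$item" ]; then
            echo "found:   $item"
        else
            echo "missing: $item"
            status=1
        fi
    done
    return "$status"
}

# Example (output directory from the exercise):
# check_outs ~/ETV6-RUNX1_1/outs
```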
Open html files in Rstudio server
You can use the file browser in the bottom right (tab “Files”) to open html files:
Exercise: Have a good look inside web_summary.html. Does anything draw your attention? Is this report good enough to continue the analysis?
Not really. First of all, there is a warning: "Fraction of RNA read bases with Q-score >= 30 is low". This means that the base quality of the reads is low. Low base quality results in more sequencing errors and therefore possibly lower performance when mapping the reads to genes. However, a Q-score of 30 still represents 99.9% accuracy.
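The 99.9% figure follows from the definition of the Phred quality score: a score Q corresponds to an error probability of 10^(-Q/10) per base call. A small sketch of that conversion (the function name `phred_accuracy` is only for illustration):

```shell
# Convert a Phred quality score to the expected base-call accuracy (in percent).
# Error probability per base: P = 10^(-Q/10)
phred_accuracy() {
    awk -v q="$1" 'BEGIN { printf "%.2f\n", 100 * (1 - 10 ^ (-q / 10)) }'
}

phred_accuracy 20   # 99.00
phred_accuracy 30   # 99.90
```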
What should worry us more is the number of reads per cell (363) and the sequencing saturation (7.9%). In most cases you should aim for 30,000 - 50,000 reads per cell (depending on the application), so we do not have enough reads per cell. However, as you might remember, this was a subset of reads (1 million) mapped against chromosomes 21 and 22, while the original dataset contains 210,987,037 reads. You can check out the original report at
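To get a feeling for what 30,000 - 50,000 reads per cell means in practice: the total number of reads to sequence is roughly the number of cells multiplied by the target reads per cell. The sketch below does that arithmetic; the cell count of 5,000 is a made-up example, not a value from this dataset, and `required_reads` is a hypothetical helper.

```shell
# Total reads needed = number of cells x target reads per cell.
required_reads() {
    cells="$1"
    reads_per_cell="$2"
    echo $(( cells * reads_per_cell ))
}

# Hypothetical experiment with 5,000 cells:
required_reads 5000 30000   # 150000000
required_reads 5000 50000   # 250000000
```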
For more info on sequencing saturation, have a look here.
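In short, 10x Genomics describes sequencing saturation as 1 - (deduplicated reads / total reads), where deduplicated reads are the unique combinations of cell barcode, UMI and gene. A low value means that most reads still represent new molecules, so sequencing deeper would detect more of them. The sketch below applies the formula to made-up numbers; `sequencing_saturation` is a hypothetical helper and the inputs are not taken from the course data.

```shell
# Sequencing saturation (in percent) = 100 * (1 - deduplicated reads / total reads)
sequencing_saturation() {
    awk -v dedup="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - dedup / total) }'
}

# Made-up numbers for illustration:
sequencing_saturation 921 1000   # 7.9  (almost every read is a new molecule)
sequencing_saturation 500 1000   # 50.0 (half of the reads are duplicates)
```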