Introduction scRNAseq

Learning outcomes

After having completed this chapter you will be able to:

  • Explain what kind of information single cell RNA-seq can give you to answer a biological question
  • Describe essential considerations during the design of a single cell RNA-seq experiment
  • Describe the pros and cons of different single cell sequencing methods
  • Load single cell data into R
  • Explain the basic structure of a Seurat object and extract count data and metadata
  • Perform a basic quality control by:
    • Evaluating the percentage of UMIs originating from mitochondrial genes
    • Detecting doublets


Introduction to scRNA-seq and techniques:

Download the presentation

scRNA-seq with 10x genomics:

Download the presentation

Running cellranger count

Have a look in the directory course_data/reads and reference. In the reads directory you will find reads for one sample: ETV6-RUNX1_1. In the analysis part of the course we will work with six samples, but due to time and computational limitations we will run cellranger count on only one of the samples, and only on reads originating from chromosomes 21 and 22.

To run cellranger count you need two inputs: the sequence reads and a reference. Here, we have prepared a reference containing only chromosomes 21 and 22, but in ‘real life’ you would of course use the full reference genome of your species. The reference has a specific format. You can download precomputed human and mouse references from the 10x Genomics website. If your species of interest is not one of those, you will have to generate the reference yourself. For that, have a look here.

To be able to run cellranger in the compute environment, first run:

export PATH=/group_work/cellranger-6.1.2:$PATH

Have a look at the documentation of cellranger count (scroll down to Command-line argument reference).

You can find the input files here:

  • reads: /home/rstudio/single_cell_course/course_data/reads/ (from the downloaded tar package in your home directory)
  • pre-indexed reference: /group_work/cellranger_index
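The value passed to --sample has to match the sample prefix of the FASTQ file names in the reads directory. The sketch below lists the unique prefixes found in a directory. It assumes bcl2fastq-style file names (<sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz); the function name fastq_sample_names is hypothetical, and the actual file names in the course data may differ.

```shell
# List the unique sample prefixes in a directory of FASTQ files.
# Assumption: names follow <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq.gz
fastq_sample_names() {
    fastq_dir="$1"
    for f in "$fastq_dir"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip when the glob matched nothing
        basename "$f"
    done | sed -E 's/_S[0-9]+_L[0-9]+_[RI][0-9]_001\.fastq\.gz$//' | sort -u
}

# Usage on the course data (path taken from the list above)
reads_dir=/home/rstudio/single_cell_course/course_data/reads
if [ -d "$reads_dir" ]; then
    fastq_sample_names "$reads_dir"
fi
```

Each line printed is a candidate value for --sample. File names that do not follow the assumed pattern are printed unchanged, which is a hint that the naming needs a closer look.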

Fill out the missing arguments (at FIXME) in the script below, and run it:

cellranger count \
--id=FIXME \
--sample=FIXME \
--transcriptome=FIXME \
--fastqs=FIXME

This will take a while.

Once started, the process will need approximately 15 minutes to finish. Have a coffee and/or have a look at the other exercises.

Running a bash command with RStudio

You can run a bash script or command using the terminal tab in RStudio Server. The completed command is:

cellranger count \
--id=ETV6-RUNX1_1 \
--sample=ETV6-RUNX1_1 \
--transcriptome=/group_work/cellranger_index \
--fastqs=/home/rstudio/single_cell_course/course_data/reads

Have a look at the output directory (i.e. ~/ETV6-RUNX1_1/outs). The analysis report (web_summary.html) is usually a good place to start.
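Before opening the report, it can help to confirm that the run actually produced its outputs. The sketch below checks an outs directory for a handful of files and directories that cellranger count normally writes. The function name check_outs is hypothetical, and the list is a selection, not the complete set of outputs.

```shell
# Report which of the usual `cellranger count` outputs exist in an outs directory.
check_outs() {
    outs_dir="$1"
    for name in web_summary.html metrics_summary.csv \
                filtered_feature_bc_matrix raw_feature_bc_matrix \
                molecule_info.h5 possorted_genome_bam.bam; do
        if [ -e "$outs_dir/$name" ]; then
            printf 'found    %s\n' "$name"
        else
            printf 'missing  %s\n' "$name"
        fi
    done
}

# Usage: assumes the run used --id=ETV6-RUNX1_1 and was started from the home directory
check_outs "$HOME/ETV6-RUNX1_1/outs"
```

If everything is reported as missing, the run has most likely not finished yet or was started from a different working directory.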

Open HTML files in RStudio Server

You can use the file browser in the bottom right (tab “Files”) to open HTML files.

Exercise: Have a good look inside web_summary.html. Anything that draws your attention? Is this report good enough to continue the analysis?


Not really. First of all, there is a warning: Fraction of RNA read bases with Q-score >= 30 is low. This means that the base quality of the reads is low. Low base quality means more sequencing errors and therefore possibly lower performance when mapping the reads to genes. However, a Q-score of 30 still represents 99.9% accuracy.
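The 99.9% figure follows from the Phred definition of a quality score, Q = -10 * log10(P), where P is the probability that a base call is wrong. A minimal sketch of the conversion, using awk for the arithmetic (the function name phred_accuracy is hypothetical):

```shell
# Convert a Phred quality score Q to base-call accuracy: 1 - 10^(-Q/10)
phred_accuracy() {
    LC_ALL=C awk -v q="$1" 'BEGIN { printf "%.4f\n", 1 - 10 ^ (-q / 10) }'
}

for q in 10 20 30 40; do
    printf 'Q%s accuracy: %s\n' "$q" "$(phred_accuracy "$q")"
done
```

Each step of 10 in Q reduces the error probability tenfold, so Q30 corresponds to one expected error per 1,000 bases.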

What should worry us more is the number of reads per cell (363) and the sequencing saturation (7.9%). In most cases you should aim for 30,000 to 50,000 reads per cell (depending on the application). We therefore don’t have enough reads per cell. However, as you might remember, this was a subset of reads (1 million) mapped against chromosomes 21 and 22, while the original dataset contains 210,987,037 reads. You can check out the original report at course_data/count_matrices/ETV6-RUNX1_1/outs/web_summary.html.

For more info on sequencing saturation, have a look here.
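As a rough illustration, sequencing saturation can be computed as 1 - (deduplicated reads / total reads). This sketch assumes the definition used in the 10x Genomics documentation, where deduplicated reads are the unique combinations of valid cell barcode, valid UMI and gene among confidently mapped reads. The function name and the counts below are hypothetical and were chosen only to show the calculation.

```shell
# Sequencing saturation = 1 - (deduplicated reads / total reads)
sequencing_saturation() {
    LC_ALL=C awk -v dedup="$1" -v total="$2" \
        'BEGIN { printf "%.3f\n", 1 - dedup / total }'
}

# Hypothetical counts, not taken from the course data
sequencing_saturation 921 1000   # most reads are unique: low saturation
sequencing_saturation 400 1000   # many duplicated reads: higher saturation
```

A low value means that most reads still reveal a new molecule, so sequencing deeper would be expected to detect additional transcripts.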