Project 2
Project 2 focuses on a single-cell sequencing study of the Drosophila brain following acute cocaine exposure (see the paper). Flies were exposed to cocaine, which impaired locomotor activity and increased the incidence of seizures and compulsive grooming. To investigate which cell populations respond to cocaine, the authors analyzed single-cell transcriptional responses in duplicate samples of flies that consumed either sucrose or sucrose supplemented with cocaine. The study used the 10x Genomics Chromium platform for single-cell RNA sequencing.
Unsupervised clustering of transcriptional profiles from 86,224 cells revealed 36 distinct clusters, representing all major cell types (neuronal and glial) and neurotransmitter types from most brain regions. Differential expression analysis within individual clusters indicated cluster-specific responses to cocaine, with Kenyon cells of the mushroom bodies and glia showing particularly large transcriptional changes. The study highlighted profound sexual dimorphism in brain transcriptional responses to cocaine, with males exhibiting more pronounced changes than females.
Cluster-specific coexpression networks and global interaction networks revealed diverse cellular processes affected by acute cocaine exposure, providing an atlas of sexually dimorphic cocaine-modulated gene expression in the Drosophila brain.
Available data
The data have been downloaded from GEO (accession GSE152495) and prepared for you.
To download them, run:
wget https://single-cell-transcriptomics.s3.eu-central-1.amazonaws.com/projects/data/project2.tar.gz
tar -xzvf project2.tar.gz
After extraction, a directory project2 appears with the following structure:
.
├── data
│ ├── Female_Cocaine_1
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ ├── Female_Cocaine_2
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ ├── Female_Sucrose_1
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ ├── Female_Sucrose_2
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ ├── Male_Cocaine_1
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ ├── Male_Cocaine_2
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ ├── Male_Sucrose_1
│ │ ├── barcodes.tsv.gz
│ │ ├── features.tsv.gz
│ │ └── matrix.mtx.gz
│ └── Male_Sucrose_2
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
└── paper.pdf
9 directories, 25 files
This shows that there are four groups, defined by sex (male or female) and treatment (sucrose or cocaine), with two replicates each:
- Male_Sucrose: controls
- Female_Sucrose: controls
- Male_Cocaine: treatments
- Female_Cocaine: treatments
Now create a new RStudio project in the project2 directory (Project (None) > New Project …) and create a Seurat object from the count matrices:
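library(Seurat)
# vector of paths to all sample directories
# (relative to the working directory; if your RStudio project was created
# inside project2 itself, use path = "data" instead)
datadirs <- list.files(path = "project2/data", full.names = TRUE)
# name each path after its sample, replacing underscores with hyphens:
# Read10X() prefixes every barcode with "<name>_", and CreateSeuratObject()
# takes the part before the first "_" as orig.ident, so the sample name
# itself must not contain underscores (the |> pipe with the _ placeholder
# requires R >= 4.2)
names(datadirs) <- basename(datadirs) |> gsub("_", "-", x = _)
# create one large sparse matrix from all count data
sparse_matrix <- Seurat::Read10X(data.dir = datadirs)
# create a Seurat object from the sparse matrix
seu <- Seurat::CreateSeuratObject(counts = sparse_matrix,
                                  project = "CocaineStudy")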
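The later comparisons (cocaine vs. sucrose, males vs. females) need sex and treatment as separate metadata columns. Because the sample names were written with hyphens, orig.ident should contain values such as Male-Cocaine-1, which a small parser can split. parse_sample is a hypothetical helper, not a Seurat function, and it assumes exactly three hyphen-separated fields per name.

```r
# Hypothetical helper (not part of Seurat): split sample names such as
# "Male-Cocaine-1" into sex, treatment and replicate columns.
# Assumes every name has exactly three hyphen-separated fields.
parse_sample <- function(sample_names, cells = NULL) {
  fields <- strsplit(unname(as.character(sample_names)), "-", fixed = TRUE)
  stopifnot(all(lengths(fields) == 3))
  parts <- do.call(rbind, fields)
  meta <- data.frame(
    sex       = parts[, 1],
    treatment = parts[, 2],
    replicate = parts[, 3],
    stringsAsFactors = FALSE
  )
  # use cell names as row names so AddMetaData() can match rows to cells
  if (!is.null(cells)) rownames(meta) <- cells
  meta
}
```

Applied to the object above, seu <- Seurat::AddMetaData(seu, metadata = parse_sample(seu$orig.ident, cells = colnames(seu))) should add sex, treatment and replicate columns; table(seu$sex, seu$treatment) then gives the number of cells per group.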
With this dataset, go through the steps we have performed during the course, and try to reproduce the results provided in the paper.
Tips
Pay specific attention to the following key analysis steps and parameters from the paper:
1. Quality Control (QC)
The paper does not provide explicit QC thresholds for filtering cells. You should use standard single-cell RNA-seq QC metrics and apply appropriate thresholds to filter out low-quality cells and genes.
- Mitochondrial genes: The paper uses the pattern "^mt:". Calculate the percentage of mitochondrial reads.
- Ribosomal genes: The paper uses the pattern "^Rp[SL]". Calculate the percentage of ribosomal reads.
- Filtering: Use a combination of the number of features per cell, UMI counts per cell, and the calculated mitochondrial and ribosomal percentages to filter the data.
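A minimal sketch of the QC step, assuming the seu object created above. add_qc_metrics and filter_cells are hypothetical helpers, and every numeric threshold in filter_cells is a placeholder, because the paper reports none.

```r
# Hypothetical helpers for the QC step; `seu` is a Seurat object.
add_qc_metrics <- function(seu) {
  # % of counts from mitochondrial genes (fly symbols such as "mt:CoI")
  seu <- Seurat::PercentageFeatureSet(seu, pattern = "^mt:",
                                      col.name = "percent.mito")
  # % of counts from ribosomal protein genes (e.g. "RpL32", "RpS3")
  seu <- Seurat::PercentageFeatureSet(seu, pattern = "^Rp[SL]",
                                      col.name = "percent.ribo")
  seu
}

# All thresholds are placeholders, NOT values from the paper.
filter_cells <- function(seu, min_features = 200, max_features = 4000,
                         min_counts = 500, max_mito = 10, max_ribo = 30) {
  md <- seu[[]]  # per-cell metadata (nFeature_RNA, nCount_RNA, percentages)
  keep <- md$nFeature_RNA > min_features &
    md$nFeature_RNA < max_features &
    md$nCount_RNA > min_counts &
    md$percent.mito < max_mito &
    md$percent.ribo < max_ribo
  # which() drops NA values, e.g. from cells without any counts
  subset(seu, cells = rownames(md)[which(keep)])
}
```

A typical order is seu <- add_qc_metrics(seu), then a look at the distributions with Seurat::VlnPlot(seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mito", "percent.ribo"), group.by = "orig.ident", pt.size = 0), and only then seu <- filter_cells(seu, ...) with thresholds chosen from those plots.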
2. Data Normalization and Scaling
The paper mentions that transcriptional profiles were normalized by sequencing depth and log-transformed. This corresponds to the LogNormalize method in Seurat with a scale factor of 10,000. For scaling, the paper used SCTransform to regress out the effects of total UMI counts and the percentage of mitochondrial reads.
Reproduce:
- First, normalize the data using NormalizeData with the LogNormalize method.
- Then, use SCTransform to scale the data and remove unwanted variation.
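The two normalization steps can be sketched as follows, assuming a percent.mito metadata column exists (a hypothetical name; pass whatever column you created during QC). normalize_and_scale is a hypothetical wrapper. SCTransform models sequencing depth itself, so only the mitochondrial percentage is passed to vars.to.regress; it also selects variable features and stores its output in a new SCT assay, which becomes the default assay, while the log-normalized values stay in the RNA assay.

```r
# Hypothetical wrapper for the two normalization steps described above.
normalize_and_scale <- function(seu, mito_col = "percent.mito") {
  # LogNormalize: counts / total counts per cell * 10,000, then log1p;
  # stored in the RNA assay and useful later for differential expression
  seu <- Seurat::NormalizeData(seu, normalization.method = "LogNormalize",
                               scale.factor = 10000)
  # SCTransform accounts for sequencing depth in its model; the
  # mitochondrial percentage is regressed out in addition. The result is
  # stored in a new "SCT" assay, which becomes the default assay.
  seu <- Seurat::SCTransform(seu, vars.to.regress = mito_col,
                             verbose = FALSE)
  seu
}
```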
3. Dimensionality Reduction
The paper used Principal Component Analysis (PCA) for linear dimensionality reduction and UMAP for nonlinear dimensionality reduction and visualization.
Reproduce:
- Run PCA on the scaled data.
- Use an elbow plot to determine the appropriate number of principal components for subsequent steps.
- Run UMAP on the selected principal components for 2D visualization of the data.
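A sketch of the dimensionality-reduction steps, assuming seu has been normalized and scaled as described above. Besides the visual elbow plot, choose_n_pcs implements one common elbow heuristic (not taken from the paper): keep PCs up to the point where the percentage of variance explained stops dropping by more than 0.1 percentage points between consecutive PCs. Both function names, npcs = 50 and the 0.1 cutoff are assumptions.

```r
# Hypothetical elbow heuristic (not from the paper): find the last PC i
# whose % variance explained exceeds that of PC i + 1 by more than
# `min_drop` percentage points, and return i + 1.
choose_n_pcs <- function(stdev, min_drop = 0.1) {
  pct <- stdev^2 / sum(stdev^2) * 100     # % variance per computed PC
  drops <- pct[-length(pct)] - pct[-1]    # drop from PC i to PC i + 1
  big <- which(drops > min_drop)
  if (length(big) == 0) return(1L)
  max(big) + 1L
}

# Hypothetical helper: PCA, elbow plot, then UMAP on the first n_pcs PCs.
reduce_dimensions <- function(seu, npcs = 50, n_pcs = NULL) {
  seu <- Seurat::RunPCA(seu, npcs = npcs, verbose = FALSE)
  print(Seurat::ElbowPlot(seu, ndims = npcs))       # inspect visually
  if (is.null(n_pcs)) n_pcs <- choose_n_pcs(seu[["pca"]]@stdev)
  message("using ", n_pcs, " PCs for UMAP")
  Seurat::RunUMAP(seu, dims = seq_len(n_pcs), verbose = FALSE)
}
```

seu <- reduce_dimensions(seu) relies on the heuristic; passing n_pcs explicitly (for example the value you read off the elbow plot) overrides it. Use the same number of PCs for clustering.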
4. Unsupervised Clustering
The paper used the Shared Nearest Neighbor (SNN) clustering algorithm. A crucial detail is that a resolution of 0.8 was found to be stable and yielded 36 distinct clusters, which is a key finding you should try to reproduce.
Reproduce:
- Find the nearest neighbors.
- Apply the clustering algorithm with a resolution of 0.8 and check whether you obtain the 36 clusters.
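The clustering step might look like this. cluster_cells is a hypothetical wrapper; Seurat's FindClusters applies the Louvain algorithm (its default) to the SNN graph built by FindNeighbors. n_pcs should match the value used for UMAP; 30 is only a placeholder.

```r
# Hypothetical helper: SNN graph on the first n_pcs PCs, then clustering
# at the paper's resolution of 0.8.
cluster_cells <- function(seu, n_pcs = 30, resolution = 0.8) {
  seu <- Seurat::FindNeighbors(seu, dims = seq_len(n_pcs), verbose = FALSE)
  seu <- Seurat::FindClusters(seu, resolution = resolution, verbose = FALSE)
  # FindClusters stores the cluster labels in the seurat_clusters column
  message("number of clusters: ", nlevels(seu$seurat_clusters))
  seu
}
```

After seu <- cluster_cells(seu, n_pcs = ...), Seurat::DimPlot(seu, reduction = "umap", label = TRUE) shows the clusters. The exact number depends on your QC thresholds and number of PCs, so obtaining exactly 36 clusters is not guaranteed.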
5. Cluster Annotation
Annotate the 36 clusters by identifying top marker genes for each cluster. The paper provides specific filtering criteria for this analysis.
Reproduce:
- Use a function like FindAllMarkers to find marker genes for each cluster.
- Filter the marker genes using the paper’s criteria:
  - \(|\log_{e}\mathrm{FC}| > 0.5\)
  - Bonferroni-adjusted P-value < 0.05
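One detail matters when applying the paper's thresholds: they are stated on the natural-log scale, whereas Seurat 4 and later report avg_log2FC. Since \(\log_e \mathrm{FC} = \log_2 \mathrm{FC} \cdot \ln 2\), the criterion \(|\log_e \mathrm{FC}| > 0.5\) corresponds to \(|\log_2 \mathrm{FC}| > 0.5/\ln 2 \approx 0.72\). Seurat's p_val_adj is already Bonferroni-adjusted over all genes in the dataset. filter_markers is a hypothetical helper that applies both criteria to a FindAllMarkers table; the fallback to avg_logFC assumes output from Seurat 3, which reported natural-log fold changes.

```r
# Hypothetical helper: keep markers passing the paper's criteria
# (|ln FC| > 0.5 and Bonferroni-adjusted P < 0.05).
filter_markers <- function(markers, min_abs_lnfc = 0.5, max_padj = 0.05) {
  if ("avg_log2FC" %in% names(markers)) {
    ln_fc <- markers$avg_log2FC * log(2)   # Seurat >= 4: convert log2 to ln
  } else {
    ln_fc <- markers$avg_logFC             # Seurat 3: already natural log
  }
  # p_val_adj from Seurat is Bonferroni-corrected over all genes
  keep <- abs(ln_fc) > min_abs_lnfc & markers$p_val_adj < max_padj
  markers[which(keep), , drop = FALSE]
}
```

For example, markers <- Seurat::FindAllMarkers(seu, assay = "RNA") followed by filter_markers(markers). Using the log-normalized RNA assay here is an assumption; the SCT assay is another option.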
6. Differential Expression (DE) Analysis
The paper performed differential expression analysis within individual clusters. You should compare gene expression between cocaine-exposed and sucrose-exposed flies within specific cell clusters.
Reproduce:
- Use a function like FindMarkers to compare the two conditions (Cocaine vs. Sucrose) within each cluster.
- Filter the results for strongly differentially expressed genes using the paper’s more stringent criteria:
  - \(|\log_{e}\mathrm{FC}| > 1.0\)
  - Bonferroni-adjusted P-value < 0.05
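A sketch of the cluster-wise comparison, assuming clusters are stored in seurat_clusters and sample names such as Male-Cocaine-1 in orig.ident. de_per_cluster is a hypothetical helper: for each cluster it subsets the cells, compares Cocaine against Sucrose with FindMarkers (group.by = "treatment"), and keeps genes with \(|\log_e \mathrm{FC}| > 1\), i.e. \(|\log_2 \mathrm{FC}| > 1/\ln 2 \approx 1.44\), and Bonferroni-adjusted P < 0.05. Clusters with fewer than three cells in either condition are skipped, because FindMarkers requires at least three cells per group by default.

```r
# Hypothetical helper: Cocaine vs. Sucrose DE within each cluster, filtered
# with the paper's criteria (|ln FC| > 1, Bonferroni-adjusted P < 0.05).
de_per_cluster <- function(seu, min_abs_lnfc = 1, max_padj = 0.05,
                           assay = "RNA", min_cells = 3) {
  # treatment is the second hyphen-separated field of the sample name
  seu$treatment <- sub("^[^-]+-([^-]+)-.*$", "\\1",
                       as.character(seu$orig.ident))
  md <- seu[[]]
  results <- lapply(levels(factor(md$seurat_clusters)), function(cl) {
    cells <- rownames(md)[md$seurat_clusters == cl]
    n <- table(factor(md[cells, "treatment"],
                      levels = c("Cocaine", "Sucrose")))
    if (any(n < min_cells)) return(NULL)  # too few cells in one condition
    de <- Seurat::FindMarkers(subset(seu, cells = cells), assay = assay,
                              group.by = "treatment",
                              ident.1 = "Cocaine", ident.2 = "Sucrose",
                              verbose = FALSE)
    # avg_log2FC (Seurat >= 4) converted to natural log before filtering
    keep <- abs(de$avg_log2FC * log(2)) > min_abs_lnfc &
      de$p_val_adj < max_padj
    de <- de[which(keep), , drop = FALSE]
    if (nrow(de) == 0) return(NULL)
    data.frame(cluster = cl, gene = rownames(de), de, row.names = NULL)
  })
  do.call(rbind, results)  # NULL entries (no DE genes) are dropped
}
```

de <- de_per_cluster(seu) returns one row per DE gene and cluster; table(de$cluster) shows which clusters respond most strongly, which can be compared with the paper's report of large changes in Kenyon cells and glia once the clusters are annotated.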
7. Analysis of Sexual Dimorphism
Sexual dimorphism refers to differences between males and females of the same species; here, it refers to differences in how gene expression in the brain responds to cocaine exposure. The authors found that cocaine-induced transcriptional changes were much more pronounced and widespread in male flies than in female flies, which they describe as “profound sexual dimorphism.”
Reproduce:
- Perform a DE analysis between cocaine and sucrose conditions separately for male flies and female flies.
- Compare the results to see whether male flies exhibit more widespread changes than females, as reported in the paper.
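To compare the sexes, one option is to run the cluster-wise Cocaine vs. Sucrose analysis of step 6 separately on male and female cells and then count the DE genes per cluster. count_de_by_sex is a hypothetical helper; it assumes each result table has cluster and gene columns. The shared column gives the per-cluster overlap, the kind of information shown in Euler plots such as those in Figure 3.

```r
# Hypothetical helper: number of DE genes per cluster in males, in females,
# and in both, from two tables with `cluster` and `gene` columns.
count_de_by_sex <- function(de_male, de_female) {
  clusters <- sort(union(as.character(de_male$cluster),
                         as.character(de_female$cluster)))
  genes_in <- function(de, cl) unique(de$gene[as.character(de$cluster) == cl])
  data.frame(
    cluster = clusters,
    male    = vapply(clusters, function(cl) length(genes_in(de_male, cl)),
                     integer(1), USE.NAMES = FALSE),
    female  = vapply(clusters, function(cl) length(genes_in(de_female, cl)),
                     integer(1), USE.NAMES = FALSE),
    shared  = vapply(clusters, function(cl)
                       length(intersect(genes_in(de_male, cl),
                                        genes_in(de_female, cl))),
                     integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}
```

The male cells can be selected with subset(seu, cells = colnames(seu)[startsWith(as.character(seu$orig.ident), "Male")]), and the female cells likewise with "Female". If males show more DE genes in most clusters, that is consistent with the paper's finding.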
8. Pathway Enrichment Analysis
The paper used Reactome for pathway enrichment analysis. You can use the R package clusterProfiler for this task: it is a popular tool for enrichment analysis against various databases, and Reactome enrichment is provided by its companion package ReactomePA. You will need the Drosophila melanogaster gene annotation package (org.Dm.eg.db) for clusterProfiler to work correctly. To install a package, use the command renv::install("package-name"); for Bioconductor packages, prefix the name with bioc::, e.g. renv::install("bioc::org.Dm.eg.db").
Reproduce:
- Take the list of differentially expressed genes from your DE analysis.
- Perform a pathway enrichment analysis using clusterProfiler to identify affected biological pathways.
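A sketch of Reactome enrichment, assuming the Bioconductor packages clusterProfiler, org.Dm.eg.db and ReactomePA (the companion package that provides enrichPathway) are installed. reactome_enrichment is a hypothetical helper. ReactomePA works with Entrez gene IDs, so the gene symbols from features.tsv.gz are mapped with clusterProfiler::bitr; symbols that do not match org.Dm.eg.db are dropped with a warning. Using all genes in the dataset as the universe is a design choice, not something the paper specifies.

```r
# Hypothetical helper: Reactome over-representation analysis for a vector
# of fly gene symbols against a background (universe) of symbols.
reactome_enrichment <- function(genes, universe, pvalue_cutoff = 0.05) {
  # map gene symbols to Entrez IDs, which ReactomePA expects
  to_entrez <- function(symbols) {
    ids <- clusterProfiler::bitr(unique(symbols), fromType = "SYMBOL",
                                 toType = "ENTREZID",
                                 OrgDb = org.Dm.eg.db::org.Dm.eg.db)
    unique(ids$ENTREZID)
  }
  ReactomePA::enrichPathway(gene = to_entrez(genes),
                            organism = "fly",
                            universe = to_entrez(universe),
                            pvalueCutoff = pvalue_cutoff,
                            readable = TRUE)  # report gene symbols again
}
```

For example, res <- reactome_enrichment(genes = unique(de$gene), universe = rownames(seu)), where de is a filtered DE table with a gene column, followed by head(as.data.frame(res)).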
Key Figures and Tables to Compare
As you work through the analysis, you can compare your results to the following figures and tables from the paper:
- Figure 2: This figure shows UMAP plots and cluster annotations. Aim to reproduce a similar UMAP plot with 36 clusters.
- Figure 3: This figure shows heatmaps and Euler plots of differentially expressed genes.
The supplementary material of this paper contains a .docx file with some of the data analysis code.