Project 2

Project 2 focuses on a single-cell RNA sequencing study of the Drosophila brain after acute cocaine exposure (see the paper). Flies exposed to cocaine showed impaired locomotor activity and an increased incidence of seizures and compulsive grooming. To identify the cell populations that respond to cocaine, the authors analyzed single-cell transcriptional responses in duplicate samples of male and female flies that consumed either sucrose or sucrose supplemented with cocaine. Single-cell libraries were generated with the 10x Genomics Chromium platform.

Unsupervised clustering of transcriptional profiles from 86,224 cells revealed 36 distinct clusters, representing all major cell types (neuronal and glial) and neurotransmitter types from most brain regions. Differential expression analysis within individual clusters indicated cluster-specific responses to cocaine, with Kenyon cells of the mushroom bodies and glia showing particularly large transcriptional changes. The study highlighted profound sexual dimorphism in brain transcriptional responses to cocaine, with males exhibiting more pronounced changes than females.

Cluster-specific coexpression networks and global interaction networks revealed diverse cellular processes affected by acute cocaine exposure, providing an atlas of sexually dimorphic cocaine-modulated gene expression in the Drosophila brain.

Available data

The data were downloaded from GEO (accession GSE152495) and prepared for you.

To download and extract them, run:

wget https://single-cell-transcriptomics.s3.eu-central-1.amazonaws.com/projects/data/project2.tar.gz
tar -xzvf project2.tar.gz

After extraction, a directory named project2 appears with the following structure:

.
├── data
│   ├── Female_Cocaine_1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── Female_Cocaine_2
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── Female_Sucrose_1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── Female_Sucrose_2
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── Male_Cocaine_1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── Male_Cocaine_2
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── Male_Sucrose_1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   └── Male_Sucrose_2
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
└── paper.pdf

9 directories, 25 files

The directory names show that there are two treatments, each given to males and females, with two replicates per combination:

Male_Sucrose: controls
Female_Sucrose: controls
Male_Cocaine: treatments
Female_Cocaine: treatments

Next, create a new RStudio project in the project2 directory (Project (None) > New Project …) and create a Seurat object from the count matrices:

library(Seurat)

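# vector of paths to all sample directories
# (the path is relative to the working directory; if that is project2 itself, use "data")
datadirs <- list.files(path = "project2/data", full.names = TRUE)

# name each path after its sample, replacing underscores with hyphens:
# Read10X() prefixes every barcode as "<sample>_<barcode>", and CreateSeuratObject()
# takes orig.ident from the text before the first "_" (the `_` placeholder needs R >= 4.2)
names(datadirs) <- basename(datadirs) |> gsub("_", "-", x = _)

# read all samples into one large sparse matrix
sparse_matrix <- Seurat::Read10X(data.dir = datadirs)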

# create a seurat object from sparse matrix
seu <- Seurat::CreateSeuratObject(counts = sparse_matrix,
                                  project = "CocaineStudy")
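The sample names also encode sex, treatment, and replicate, which the later differential expression steps need as separate metadata columns. The sketch below, assuming Seurat v5 and the `seu` object created above, splits `orig.ident` (for example `Female-Cocaine-1`) on the hyphen; the helper name `parse_sample_names` and the column names `sex`, `treatment`, and `replicate` are hypothetical choices.

```r
library(Seurat)

# Hypothetical helper: split sample names such as "Female-Cocaine-1"
# into sex, treatment and replicate
parse_sample_names <- function(x) {
  parts <- do.call(rbind, strsplit(as.character(x), "-", fixed = TRUE))
  data.frame(sex = parts[, 1], treatment = parts[, 2], replicate = parts[, 3],
             stringsAsFactors = FALSE)
}

# orig.ident holds the sample name (the text before "_" in each cell barcode)
sample_info <- parse_sample_names(seu$orig.ident)
seu$sex <- sample_info$sex
seu$treatment <- sample_info$treatment
seu$replicate <- sample_info$replicate

# number of cells per sex and treatment
table(seu$sex, seu$treatment)
```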

Project exercise

With this dataset, go through the analysis steps from the course and try to reproduce the results reported in the paper.

Tips

Pay specific attention to the following key analysis steps and parameters from the paper:

1. Quality Control (QC)

The paper does not provide explicit QC thresholds for filtering cells. You should use standard single-cell RNA-seq QC metrics and apply appropriate thresholds to filter out low-quality cells and genes.

Mitochondrial Genes: The paper uses the pattern "^mt:". Calculate the percentage of mitochondrial reads.
Ribosomal Genes: The paper uses the pattern "^Rp[SL]". Calculate the percentage of ribosomal reads.
Filtering: Use a combination of the number of features per cell, UMI counts per cell, and the calculated mitochondrial and ribosomal percentages to filter the data.
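A minimal QC sketch, assuming Seurat v5 (for `LayerData()`) and the `seu` object from above. `PercentageFeatureSet()` stores each percentage as a metadata column; the column names `percent.mt` and `percent.ribo` and every numeric threshold below are hypothetical placeholders, not values from the paper, so inspect the violin plots and adjust them to your data.

```r
library(Seurat)

# Percentage of counts from mitochondrial (mt:...) and ribosomal (RpS*/RpL*) genes
seu <- PercentageFeatureSet(seu, pattern = "^mt:", col.name = "percent.mt")
seu <- PercentageFeatureSet(seu, pattern = "^Rp[SL]", col.name = "percent.ribo")

# Inspect the distributions per sample before choosing thresholds
VlnPlot(seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mt", "percent.ribo"),
        group.by = "orig.ident", pt.size = 0, ncol = 2)

# Hypothetical thresholds: placeholders only, not taken from the paper
min_features <- 200
max_features <- 3000
max_counts   <- 15000
max_mt       <- 10
max_ribo     <- 30

# Keep cells that pass all cell-level filters
seu <- subset(seu, subset = nFeature_RNA > min_features &
                            nFeature_RNA < max_features &
                            nCount_RNA < max_counts &
                            percent.mt < max_mt &
                            percent.ribo < max_ribo)

# Remove genes detected in fewer than 3 cells (hypothetical cutoff)
counts <- LayerData(seu, assay = "RNA", layer = "counts")
keep_genes <- rownames(counts)[Matrix::rowSums(counts > 0) >= 3]
seu <- subset(seu, features = keep_genes)
```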

2. Data Normalization and Scaling

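The paper mentions that transcriptional profiles were normalized by sequencing depth and log-transformed. This corresponds to Seurat's LogNormalize method with a scale factor of 10,000. For scaling, the paper used SCTransform to regress out the effects of total UMI counts and the percentage of mitochondrial reads. Note that SCTransform works directly on the raw counts (it does not use the LogNormalize output) and stores its results in a separate SCT assay, so the log-normalized data remain available in the RNA assay, for example for marker tests and plots.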

Reproduce:
1. First, normalize the data using NormalizeData with the LogNormalize method.
2. Then, use SCTransform to scale the data and remove unwanted variation.
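Both steps can be sketched as follows, assuming Seurat v5 and a QC-filtered `seu`. LogNormalize divides each cell's counts by its total count, multiplies by the scale factor, and takes the natural log of one plus the result, which is why the fold changes discussed later are natural-log based. Passing only `percent.mt` to `vars.to.regress` is an assumption: SCTransform already models sequencing depth as part of its regression.

```r
library(Seurat)

# LogNormalize: log1p(count / total counts of the cell * 10000), stored in the RNA assay
seu <- NormalizeData(seu, normalization.method = "LogNormalize",
                     scale.factor = 10000)

# percent.mt is the covariate to regress; (re)computed here so the step stands alone
seu <- PercentageFeatureSet(seu, pattern = "^mt:", col.name = "percent.mt")

# SCTransform models sequencing depth (total UMIs) internally, so only
# percent.mt is passed as an extra variable to regress out. It creates an
# "SCT" assay with corrected counts, normalized data and scaled data, and
# makes it the default assay.
seu <- SCTransform(seu, vars.to.regress = "percent.mt", verbose = FALSE)
```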

3. Dimensionality Reduction

The paper used Principal Component Analysis (PCA) for linear dimensionality reduction and UMAP for nonlinear dimensionality reduction and visualization.

Reproduce:
1. Run PCA on the scaled data.
2. Use an Elbow Plot to determine the appropriate number of principal components to use for subsequent steps.
3. Run UMAP on the selected principal components for 2D visualization of the data.
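The three steps can be sketched as follows, assuming Seurat v5 and that SCTransform has been run, so PCA uses the SCT assay. Computing 50 PCs and keeping 30 of them are hypothetical choices; read the number of PCs off your own elbow plot.

```r
library(Seurat)

# PCA on the SCT scaled data (the default assay after SCTransform)
seu <- RunPCA(seu, npcs = 50, verbose = FALSE)

# Look for the point where the standard deviation per PC levels off
ElbowPlot(seu, ndims = 50)

# Hypothetical choice: replace 30 with the number read from your elbow plot
n_pcs <- 30
seu <- RunUMAP(seu, dims = 1:n_pcs, verbose = FALSE)

# Color by sample to check whether replicates and conditions intermix
DimPlot(seu, reduction = "umap", group.by = "orig.ident")
```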

4. Unsupervised Clustering

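The paper used graph-based clustering on a shared nearest neighbor (SNN) graph, which in Seurat corresponds to FindNeighbors followed by FindClusters. A crucial detail is that a resolution of 0.8 was found to be stable and yielded 36 distinct clusters, a key result you should try to reproduce. The exact number of clusters you obtain also depends on your QC thresholds and the number of principal components, so it may differ slightly.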

Reproduce:
1. Find the nearest neighbors.
2. Apply the clustering algorithm with a resolution of 0.8 to identify the 36 clusters.
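A sketch of the clustering step, assuming the PCA and UMAP from the previous step exist in `seu`. `n_pcs = 30` is a hypothetical placeholder that should match the dimensions used for `RunUMAP()`. `FindClusters()` stores the result in the `seurat_clusters` metadata column and also sets the cell identities.

```r
library(Seurat)

# Hypothetical: use the same number of PCs as for RunUMAP()
n_pcs <- 30

# Build the k-nearest-neighbor and shared-nearest-neighbor graphs from the PCs
seu <- FindNeighbors(seu, dims = 1:n_pcs)

# Modularity-based clustering (Louvain by default) of the SNN graph
seu <- FindClusters(seu, resolution = 0.8)

# Number of clusters and cells per cluster (the paper reports 36 clusters)
nlevels(seu$seurat_clusters)
table(seu$seurat_clusters)

DimPlot(seu, reduction = "umap", label = TRUE) + NoLegend()
```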

5. Cluster Annotation

Annotate the 36 clusters by identifying top marker genes for each cluster. The paper provides specific filtering criteria for this analysis.

Reproduce:
1. Use a function like FindAllMarkers to find marker genes for each cluster.
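2. Filter the marker genes using the paper’s criteria:
  - \(|\log_{e} \mathrm{FC}| > 0.5\)
  - Bonferroni-adjusted P-value < 0.05
  The paper's fold changes use the natural log, whereas Seurat v4 and later report avg_log2FC; divide the cutoff by \(\ln 2\) to convert it (0.5 becomes about 0.72).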
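A sketch of the marker search, assuming Seurat v5 and the clustered `seu` from the previous steps. The natural-log cutoff is converted with \(\log_2 x = \ln x / \ln 2\). Testing on the log-normalized RNA assay is an assumption chosen to match the paper's LogNormalize step; testing on the SCT assay would require running `PrepSCTFindMarkers()` first.

```r
library(Seurat)

# Paper's cutoff |ln FC| > 0.5, converted to log2 units: 0.5 / ln(2) ~ 0.72
log2_fc_cutoff <- 0.5 / log(2)

Idents(seu) <- "seurat_clusters"

# Wilcoxon rank-sum test of each cluster against all other cells, on the
# log-normalized RNA data; logfc.threshold is only a pre-filter (log2 units)
markers <- FindAllMarkers(seu, assay = "RNA", logfc.threshold = 0.25,
                          verbose = FALSE)

# p_val_adj is Seurat's Bonferroni correction over all genes in the dataset
markers_filt <- markers[abs(markers$avg_log2FC) > log2_fc_cutoff &
                          markers$p_val_adj < 0.05, ]

# Top 5 up-regulated markers per cluster, as a starting point for annotation
top5 <- do.call(rbind, lapply(split(markers_filt, markers_filt$cluster),
                              function(df) {
                                df <- df[df$avg_log2FC > 0, ]
                                head(df[order(-df$avg_log2FC), ], 5)
                              }))
top5[, c("cluster", "gene", "avg_log2FC", "p_val_adj")]
```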

6. Differential Expression (DE) Analysis

The paper performed differential expression analysis within individual clusters. You should compare gene expression between cocaine-exposed and sucrose-exposed flies within specific cell clusters.

Reproduce:
1. Use a function like FindMarkers to compare the two conditions (Cocaine vs. Sucrose) within each cluster.
2. Filter the results for strongly differentially expressed genes using the paper’s more stringent criteria:
  - \(|\log_{e} \mathrm{FC}| > 1.0\)
  - Bonferroni-adjusted P-value < 0.05
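A per-cluster sketch under the same assumptions (Seurat v5, clustered `seu`, tests on the RNA assay). The treatment label is re-derived from `orig.ident` so the block runs on its own, and the cutoff \(|\ln \mathrm{FC}| > 1.0\) is converted to log2 units (1 / ln 2, about 1.44). `FindMarkers()` stops with an error when a group has fewer than three cells, so such clusters are skipped; the helper `de_in_cluster` is hypothetical.

```r
library(Seurat)

# Treatment from sample names such as "Male-Cocaine-1" -> "Cocaine"
seu$treatment <- sub("^[^-]+-([^-]+)-.*$", "\\1", seu$orig.ident)
Idents(seu) <- "seurat_clusters"

# Paper's cutoff |ln FC| > 1.0, converted to log2 units: 1 / ln(2) ~ 1.44
log2_fc_cutoff <- 1.0 / log(2)

# Hypothetical helper: cocaine vs sucrose within one cluster
de_in_cluster <- function(cl) {
  res <- tryCatch(
    FindMarkers(seu, assay = "RNA", subset.ident = cl, group.by = "treatment",
                ident.1 = "Cocaine", ident.2 = "Sucrose",
                logfc.threshold = 0.25, verbose = FALSE),
    error = function(e) NULL  # e.g. fewer than 3 cells in one condition
  )
  if (is.null(res) || nrow(res) == 0) return(NULL)
  res$gene <- rownames(res)
  res$cluster <- cl
  res[abs(res$avg_log2FC) > log2_fc_cutoff & res$p_val_adj < 0.05, ]
}

de_all <- do.call(rbind, lapply(levels(seu$seurat_clusters), de_in_cluster))

# Strongly DE genes per cluster (positive avg_log2FC = higher after cocaine)
table(factor(de_all$cluster, levels = levels(seu$seurat_clusters)))
```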

7. Analysis of Sexual Dimorphism

Sexual dimorphism refers to differences between males and females of the same species. Although the sexes share a core set of traits, they can also differ, and in this study the difference concerns the brain's transcriptional response to cocaine: the authors found that gene expression changes after cocaine exposure were much more pronounced and widespread in male flies than in female flies. They describe this difference in response between the sexes as “profound sexual dimorphism.”

Reproduce:
1. Perform a DE analysis between cocaine and sucrose conditions separately for male flies and female flies.
2. Compare the results to see if male flies exhibit more widespread changes than females, as reported in the paper.
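One way to sketch the comparison, again assuming Seurat v5, the clustered `seu`, and the strict cutoffs of the DE step: label each cell with a combined sex and treatment group derived from `orig.ident`, test cocaine against sucrose separately for each sex within each cluster, and then count DE genes per sex and their overlap. The overlap counts are the kind of input an Euler plot like those in Figure 3 summarizes. The helper `de_sex_cluster` and the column `sex_treatment` are hypothetical.

```r
library(Seurat)

# Sex and treatment from sample names such as "Male-Cocaine-1" -> "Male_Cocaine"
seu$sex_treatment <- sub("^([^-]+)-([^-]+)-.*$", "\\1_\\2", seu$orig.ident)
Idents(seu) <- "seurat_clusters"
log2_fc_cutoff <- 1.0 / log(2)  # |ln FC| > 1.0 in log2 units

# Hypothetical helper: cocaine vs sucrose for one sex within one cluster
de_sex_cluster <- function(cl, sex) {
  res <- tryCatch(
    FindMarkers(seu, assay = "RNA", subset.ident = cl, group.by = "sex_treatment",
                ident.1 = paste0(sex, "_Cocaine"), ident.2 = paste0(sex, "_Sucrose"),
                logfc.threshold = 0.25, verbose = FALSE),
    error = function(e) NULL  # e.g. fewer than 3 cells in one group
  )
  if (is.null(res) || nrow(res) == 0) return(NULL)
  res$gene <- rownames(res)
  res$cluster <- cl
  res$sex <- sex
  res[abs(res$avg_log2FC) > log2_fc_cutoff & res$p_val_adj < 0.05, ]
}

combos <- expand.grid(cluster = levels(seu$seurat_clusters),
                      sex = c("Male", "Female"), stringsAsFactors = FALSE)
de_sex <- do.call(rbind, unname(Map(de_sex_cluster, combos$cluster, combos$sex)))

# DE gene-cluster pairs per sex, and per cluster and sex
table(de_sex$sex)
table(de_sex$cluster, de_sex$sex)

# Genes DE (in any cluster) in males only, females only, or both
male_genes   <- unique(de_sex$gene[de_sex$sex == "Male"])
female_genes <- unique(de_sex$gene[de_sex$sex == "Female"])
c(male_only   = length(setdiff(male_genes, female_genes)),
  female_only = length(setdiff(female_genes, male_genes)),
  both        = length(intersect(male_genes, female_genes)))
```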

8. Pathway Enrichment Analysis

The paper used Reactome for pathway enrichment analysis. You can use the R package clusterProfiler for this task: it supports enrichment analysis against various databases, and Reactome pathways are available through its companion package ReactomePA (function enrichPathway). Reactome enrichment expects Entrez gene IDs, so you will also need the Drosophila melanogaster annotation package org.Dm.eg.db to convert gene symbols. To install a package, use renv::install("package-name"); for Bioconductor packages, prefix the name with bioc::, for example renv::install("bioc::org.Dm.eg.db").

Reproduce:
1. Take the list of differentially expressed genes from your DE analysis.
2. Perform a pathway enrichment analysis using clusterProfiler to identify affected biological pathways.
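A minimal Reactome sketch, assuming the Bioconductor packages clusterProfiler, org.Dm.eg.db, and ReactomePA are installed; ReactomePA's `enrichPathway()` runs the over-representation test and accepts `organism = "fly"`. Gene symbols are first mapped to Entrez IDs with clusterProfiler's `bitr()`. The helpers `symbols_to_entrez` and `run_reactome` are hypothetical, and using all genes in the dataset as the background universe is a design choice rather than something stated in the paper.

```r
library(clusterProfiler)
library(org.Dm.eg.db)
library(ReactomePA)

# Map fly gene symbols to Entrez IDs (unmapped symbols are dropped with a warning)
symbols_to_entrez <- function(symbols) {
  ids <- bitr(unique(symbols), fromType = "SYMBOL", toType = "ENTREZID",
              OrgDb = org.Dm.eg.db)
  unique(ids$ENTREZID)
}

# Hypothetical helper: Reactome over-representation test for a set of DE genes,
# optionally against a background of all genes detected in the dataset
run_reactome <- function(de_symbols, universe_symbols = NULL, p_cutoff = 0.05) {
  genes <- symbols_to_entrez(de_symbols)
  if (is.null(universe_symbols)) {
    enrichPathway(gene = genes, organism = "fly", pvalueCutoff = p_cutoff,
                  readable = TRUE)
  } else {
    enrichPathway(gene = genes, organism = "fly", pvalueCutoff = p_cutoff,
                  universe = symbols_to_entrez(universe_symbols),
                  readable = TRUE)
  }
}

# Usage (de_genes is a placeholder for a character vector of DE gene symbols):
# reactome_res <- run_reactome(de_genes, universe_symbols = rownames(seu))
# head(as.data.frame(reactome_res))
# enrichplot::dotplot(reactome_res)
```

Restricting the universe to genes measured in the experiment avoids counting pathways as enriched merely because their genes are expressed in the brain at all.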

Key Figures and Tables to Compare

As you work through the analysis, you can compare your results to the following figures and tables from the paper:

Figure 2: UMAP plots and cluster annotations. Aim to reproduce a similar UMAP plot with 36 clusters.
Figure 3: Heatmaps and Euler plots of differentially expressed genes.

Important

The supplementary material of this paper contains a .docx file with some of the data analysis code.