Project 4

Project 4 focuses on understanding the mechanisms of immune remodeling in cervical cancer (CC) and identifying potential therapeutic targets (see paper1 and paper2). Cervical cancer progression involves complex interactions between immune cells and tumor cells, leading to an immunosuppressive microenvironment. To investigate these interactions, the study uses scRNA-seq data from normal cervix, high-grade squamous intraepithelial lesions (HSIL), and cervical cancer tissues to analyze transcriptional responses in immune cells.

Unsupervised clustering of transcriptional profiles revealed distinct immune cell populations and their interactions with tumor cells. Differences in immune cell states were linked to the progression of cervical cancer and the establishment of an immunosuppressive microenvironment. Specifically, the study identified unique HPV-related epithelial clusters and critical node genes that regulate disease progression. The transition from normal cervix to HSIL and cervical cancer was marked by changes in immune cell populations, including T cells, dendritic cells, and macrophages.

Further analysis identified key immune cell subsets and their roles in shaping the tumor microenvironment. Network analyses demonstrated that immune cell interactions influence functional T cell programs and systemic immune coordination. This study provides insights into how immune remodeling impacts cervical cancer progression and highlights transcriptional markers that could be used to predict patient outcomes and guide therapeutic strategies.

Available data

The samples listed below represent different stages and conditions of cervical tissue, providing a dataset for analyzing the progression from normal cervix to cervical cancer.

Normal Cervix without HPV (NO_HPV):

N_HPV_NEG_1
N_HPV_NEG_2

Normal Cervix with HPV (N_HPV):

N_1
N_2

High-Grade Squamous Intraepithelial Lesions with HPV (HSIL_HPV):

HSIL_1
HSIL_2

Cervical Cancer with HPV (CA_HPV):

SCC_4
SCC_5
ADC_6

Data has been downloaded and prepared for you from GEO GSE208653.

In order to download the data, run:

wget https://single-cell-transcriptomics.s3.eu-central-1.amazonaws.com/projects/data/project4.tar.gz
tar -xzvf project4.tar.gz

After extracting, a directory project4 appears with the following content:

.
├── data
│   ├── ADC_6
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── HSIL_1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── HSIL_2
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── N_1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── N_2
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── N_HPV_NEG_1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── N_HPV_NEG_2
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── SCC_4
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   └── SCC_5
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
├── paper1.pdf
└── paper2.pdf

10 directories, 29 files

Now create a new project in the project4 directory (Project (None) > New Project …), and create a Seurat object from the count matrices:

library(Seurat)
# vector of paths to all sample directories
datadirs <- list.files(path = "data", full.names = TRUE)

# get the sample names
# replace underscores with hyphen to correctly extract sample names later on
names(datadirs) <- basename(datadirs) |> gsub("_", "-", x = _) 

# for now, we only take the HPV-negative normal and squamous cell carcinoma (SCC) samples
datadirs <- datadirs[c("N-HPV-NEG-1", "N-HPV-NEG-2", "SCC-4", "SCC-5")]

# create a large sparse matrix from all count data
sparse_matrix <- Seurat::Read10X(data.dir = datadirs)

# create a seurat object from sparse matrix
seu <- Seurat::CreateSeuratObject(counts = sparse_matrix,
                                  project = "CervicalCancerStudy")
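
The hyphen replacement in the code above matters because Read10X() prefixes each barcode with the name of its entry in datadirs followed by an underscore, and CreateSeuratObject() by default takes the first underscore-delimited field of each cell name (names.field = 1, names.delim = "_") as the sample identity. The following base R sketch mimics that extraction; the barcode strings and the helper first_field are made up for illustration.

```r
# Hypothetical barcodes as they would look after Read10X() prefixes the sample name
with_hyphen     <- c("N-HPV-NEG-1_AAACCTGAGCGT-1", "SCC-4_TTTGTCATCAGG-1")
with_underscore <- c("N_HPV_NEG_1_AAACCTGAGCGT-1", "SCC_4_TTTGTCATCAGG-1")

# Mimic the default identity extraction: first field when splitting on "_"
# (first_field is a hypothetical helper, not a Seurat function)
first_field <- function(x) {
  vapply(strsplit(x, "_", fixed = TRUE), `[`, character(1), 1)
}

ident_hyphen     <- first_field(with_hyphen)      # full sample names are kept
ident_underscore <- first_field(with_underscore)  # sample names are truncated

print(ident_hyphen)
print(ident_underscore)
```

With underscores left in place, the two normal samples would collapse to "N" and the two tumor samples to "SCC". After creating the object, table(seu$orig.ident) should list the four hyphenated sample names.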

Project exercise

With this dataset, go through the steps we have performed during the course, and try to reproduce the results provided in the papers. Pay specific attention to quality control, clustering and annotation.

Guidance questions

If you’d like more structure, the following questions may guide your analyses:

Quality control and filtering

How many cells are there in the data set (raw data)?
Is there evidence of dying cells? If so, how can you filter for dying cells?
How does your filtering strategy compare with what was done in the two papers?
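
As a starting point for these questions, here is a minimal base R sketch of threshold-based filtering on a toy metadata table. The values, the column name percent.mito and both thresholds are assumptions for illustration; they are not the cutoffs used in either paper.

```r
# Toy per-cell metadata; in a Seurat object this lives in seu@meta.data
meta <- data.frame(
  orig.ident   = c("N-HPV-NEG-1", "N-HPV-NEG-1", "SCC-4", "SCC-4", "SCC-5"),
  nFeature_RNA = c(1500, 150, 3200, 900, 2500),
  percent.mito = c(4.2, 35.0, 8.1, 22.5, 6.0)
)

# Number of cells per sample before filtering
cells_before <- table(meta$orig.ident)

# Illustrative thresholds only (assumptions, not the values from the papers)
min_features <- 200
max_mito     <- 20

# Keep cells with enough detected genes and a low mitochondrial fraction
keep <- meta$nFeature_RNA > min_features & meta$percent.mito < max_mito
meta_filtered <- meta[keep, ]

# Number of cells per sample after filtering
cells_after <- table(meta_filtered$orig.ident)

print(cells_before)
print(cells_after)
```

On the real object, ncol(seu) gives the number of cells, PercentageFeatureSet() can add the mitochondrial percentage, and subset(seu, subset = nFeature_RNA > 200 & percent.mito < 20) would apply the same kind of filter. Comparing the cell counts before and after helps to judge how strict a threshold is.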

Bonus questions (advanced)

Why do you think paper 2 chose not to remove potential doublets?
Do you think paper 1 or paper 2 chose a better threshold for removing cells based on high mitochondrial gene expression?

Normalization & scaling

Which normalization strategy was used in the papers?
How many principal components would you suggest for downstream analyses?
Are there batch-effects in this dataset? If so, how can you tell? How could you correct for batch effects?
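
One possible way to support the choice of the number of principal components is to look at the cumulative share of variance, in addition to inspecting an elbow plot. The sketch below uses made-up standard deviations, and the 90 % cutoff is an arbitrary assumption, not a recommendation from the papers.

```r
# Toy standard deviations per principal component (hypothetical values);
# with Seurat these could come from Stdev(seu, reduction = "pca")
sdev <- c(10, 8, 6, 3, 2, 1.5, 1.2, 1.1, 1.0, 1.0)

# Share of variance per PC, relative to the PCs that were computed
pct_var <- sdev^2 / sum(sdev^2) * 100
cum_var <- cumsum(pct_var)

# Heuristic: first PC at which the cumulative share reaches 90 %
n_pcs <- which(cum_var >= 90)[1]

print(round(cum_var, 1))
print(n_pcs)
```

Note that the percentages are relative to the computed components only, so the result depends on how many PCs were calculated. Treat the number as a lower bound to compare against ElbowPlot(seu) and against the clustering results.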

Clustering & annotation

Which clustering resolution would you suggest for downstream analyses?
Which major cell types are you expecting for this dataset?
Which marker genes will you use for annotation?
Which clusters can you clearly identify using these marker genes?
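
Marker-based annotation can be sketched as scoring each cluster by the mean expression of a marker set and assigning the best-scoring cell type. The expression values below are made up, and the marker sets are widely used canonical markers chosen for illustration; they are an assumption and not the lists used in the papers.

```r
# Toy cluster-average expression (genes x clusters); values are made up
avg_expr <- matrix(
  c(5.0, 0.1, 0.2,   # CD3E
    4.5, 0.2, 0.1,   # CD3D
    0.1, 4.8, 0.3,   # LYZ
    0.2, 4.1, 0.2,   # CD68
    0.1, 0.3, 5.2,   # EPCAM
    0.2, 0.1, 4.7),  # KRT5
  nrow = 6, byrow = TRUE,
  dimnames = list(c("CD3E", "CD3D", "LYZ", "CD68", "EPCAM", "KRT5"),
                  c("cluster0", "cluster1", "cluster2"))
)

# Illustrative canonical markers (an assumption; check the papers for their lists)
markers <- list(
  "T cell"     = c("CD3E", "CD3D"),
  "Myeloid"    = c("LYZ", "CD68"),
  "Epithelial" = c("EPCAM", "KRT5")
)

# Mean marker expression per cell type and cluster (cell types x clusters)
scores <- t(sapply(markers, function(g) colMeans(avg_expr[g, , drop = FALSE])))

# Assign each cluster the cell type with the highest score
annotation <- rownames(scores)[apply(scores, 2, which.max)]
names(annotation) <- colnames(scores)

print(annotation)
```

In Seurat, per-cluster averages could be obtained with AverageExpression(), and per-cell scores for a marker set with AddModuleScore(). Clusters whose best score is low, or close to the second best, deserve a closer look before they are labeled.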

Bonus questions (advanced)

If you didn’t have such a list of marker genes, which other techniques could you try?
How do these compare to manual annotation?

Additional tips

For mitochondrial genes, ribosomal genes and hemoglobin genes you can use the following patterns: "^MT-", "^RP[SL]" and "^HB[^(P)]".
Work iteratively: based on the results of an analysis, adjust the previous steps. For example, if the clusters do not correspond to cell types, try adjusting the number of components or the resolution.
Please read the methods sections of the papers.
If the code for data analysis is available, try to adapt it (for specific parameters).
Check the supplementary figures.
Try to find out whether the authors used other tools for the data analysis.
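
To see what the three gene patterns from the first tip actually match, they can be tried on a small vector of gene symbols. This is a base R sketch; the gene list is a hand-picked illustration and not taken from the dataset.

```r
# Hand-picked gene symbols to probe the patterns (illustrative only)
genes <- c("MT-CO1", "MT-ND1", "MTOR", "RPS6", "RPL13", "MRPL12",
           "HBA1", "HBB", "HBP1", "HBEGF", "ACTB")

# Mitochondrial genes: symbols starting with "MT-" (MTOR is not matched)
mito <- grep("^MT-", genes, value = TRUE)

# Ribosomal protein genes: symbols starting with "RPS" or "RPL"
ribo <- grep("^RP[SL]", genes, value = TRUE)

# Hemoglobin genes: "HB" followed by any character except "(", "P" or ")"
hemo <- grep("^HB[^(P)]", genes, value = TRUE)

print(mito)
print(ribo)
print(hemo)
```

The hemoglobin pattern excludes HBP1 but still matches HBEGF, which is a growth factor and not a hemoglobin gene, so the patterns are approximations. In Seurat, a pattern can be passed directly, for example PercentageFeatureSet(seu, pattern = "^MT-", col.name = "percent.mito"), where the column name is a free choice.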