Project 3

Project 3 focuses on understanding the role of type I interferon (IFN-I) responsiveness in shaping immune outcomes following PD1 blockade therapy (see paper.pdf in the project data). Type I interferons are central regulators of anti-tumor immunity and of responses to immunotherapy, but they also drive the feedback inhibition that underlies therapeutic resistance. To investigate how pre-existing IFN-I responses influence therapeutic success, this study uses scRNA-seq data from two healthy donors and eight treated patients to analyze transcriptional responses in immune cells.

Unsupervised clustering of transcriptional profiles revealed distinct immune cell populations and IFN-I-induced responses. Differences in IFN-I responsiveness were linked to immune cell states and transcriptional programs that influence therapy outcomes. Patients with lower pre-therapy IFN-I responsiveness in CD4 and CD8 effector T cells (Teff cells) exhibited transcriptional signatures associated with improved immune function, whereas highly responsive Teff cells displayed gene expression patterns linked to immune dysfunction and therapy resistance.

Further analysis identified epigenetically imprinted IFN-I response states that predefine immune reactivity to therapy. Coexpression and network analyses demonstrated that IFN-I responsiveness influences functional T cell programs and systemic immune coordination. This study provides insights into how pre-existing immune states impact therapeutic success and highlights transcriptional markers that could be used to predict patient outcomes in PD1 blockade immunotherapy.

Available data

The data have been downloaded from GEO (accession GSE199994) and prepared for you.

To download and extract the data, run:

wget https://single-cell-transcriptomics.s3.eu-central-1.amazonaws.com/projects/data/project3.tar.gz

tar -xzvf project3.tar.gz

After extracting, a directory GSE199994 appears with the following format:

GSE199994/
├── data
│   ├── HD1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── HD2
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── P1
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── P2
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── P3
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── P4
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── P5
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── P6
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   ├── P7
│   │   ├── barcodes.tsv.gz
│   │   ├── features.tsv.gz
│   │   └── matrix.mtx.gz
│   └── P8
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
└── paper.pdf

The directory names show that the dataset contains two healthy donors (HD1 and HD2) and eight treated patients (P1 to P8).

Now create a new RStudio project in the project3 directory (Project (None) > New Project …) and create a combined Seurat object from all the count matrices:

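library(Seurat)

# vector of paths to all sample directories
# (adjust the path if the archive extracted elsewhere, e.g. GSE199994/data)
datadirs <- list.files(path = "project3/data", full.names = TRUE)

# name each path by its sample (HD1, P1, ...); underscores become hyphens
# because "_" later separates the sample prefix from the cell barcode
names(datadirs) <- basename(datadirs) |> gsub("_", "-", x = _)

# read all samples into one sparse matrix; cell names become "<sample>_<barcode>"
sparse_matrix <- Seurat::Read10X(data.dir = datadirs)

# create a Seurat object from the sparse matrix; by default the part of each
# cell name before the first "_" (the sample name) is stored in orig.ident
seu <- Seurat::CreateSeuratObject(counts = sparse_matrix,
                                  project = "InterferonStudy")

# sanity check: number of cells per sample
table(seu$orig.ident)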

Project exercise

With this dataset, go through the steps we performed during the course and try to reproduce the results reported in the paper.

Tips

Pay specific attention to the following key analysis steps and parameters from the paper:

1. Quality Control (QC)

The paper’s filtering was primarily based on mass cytometry (CyTOF) data. For your scRNA-seq data, apply standard QC metrics and remove low-quality cells with Seurat's subset function (rarely detected genes can be dropped with the min.cells argument of CreateSeuratObject()).

Mitochondrial Genes: Use the pattern "^MT-" to calculate the percentage of mitochondrial reads (PercentageFeatureSet).
Ribosomal Genes: Use the pattern "^RP[SL]" to calculate the percentage of ribosomal reads.
Hemoglobin Genes: Use the pattern "^HB[^(P)]" to calculate the percentage of hemoglobin reads.
Filtering: Use the subset function to filter cells based on the number of features (nFeature_RNA), UMI counts (nCount_RNA), and the calculated percentages of mitochondrial, ribosomal, and hemoglobin reads.
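A minimal sketch of these QC steps in R, assuming the combined object `seu` from above and human gene symbols. The helper names `add_qc_metrics()` and `filter_cells()` and the column names `percent.mito`, `percent.ribo` and `percent.globin` are hypothetical, and every default threshold is a placeholder to replace after inspecting the distributions, for example with `VlnPlot(seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mito"))`.

```r
library(Seurat)

# Add per-cell percentages of mitochondrial, ribosomal and hemoglobin reads
# using the gene-name patterns listed above (human gene symbols assumed).
add_qc_metrics <- function(seu) {
  seu[["percent.mito"]] <- PercentageFeatureSet(seu, pattern = "^MT-")
  seu[["percent.ribo"]] <- PercentageFeatureSet(seu, pattern = "^RP[SL]")
  seu[["percent.globin"]] <- PercentageFeatureSet(seu, pattern = "^HB[^(P)]")
  seu
}

# Keep only cells that pass every threshold. All defaults are HYPOTHETICAL
# placeholders; choose your own cut-offs from the QC violin plots.
filter_cells <- function(seu, min_features = 200, max_features = 5000,
                         min_counts = 500, max_mito = 20, max_globin = 5) {
  subset(seu, subset = nFeature_RNA > min_features &
           nFeature_RNA < max_features &
           nCount_RNA > min_counts &
           percent.mito < max_mito &
           percent.globin < max_globin)
}
```

Usage: `seu <- add_qc_metrics(seu)`, inspect the metrics, then `seu <- filter_cells(seu, max_mito = ...)` with your chosen values. The ribosomal percentage is computed but not filtered on here; add a condition on `percent.ribo` if your plots suggest it.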

2. Data Normalization and Scaling

The paper used an arcsinh transformation for its protein expression data. For your scRNA-seq data, use SCTransform, which normalizes the counts and can regress out unwanted variation.

Reproduce:
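1. Use the SCTransform() function to normalize and scale the data. SCTransform models sequencing depth (the number of UMIs per cell, nCount_RNA) directly; additional confounders, such as the percentage of mitochondrial reads, can be regressed out with the vars.to.regress argument. The paper mentions using scTransform residuals for differential expression calculation, so this method is highly relevant.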
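A minimal sketch of this step, assuming the filtered object from the QC step and a `percent.mito` metadata column (a hypothetical name). Regressing out the mitochondrial percentage is an illustrative choice, not necessarily what the paper did, and `normalize_sct()` is a hypothetical helper.

```r
library(Seurat)

# SCTransform fits a negative binomial model per gene with sequencing depth
# (UMIs per cell) as a covariate, so nCount_RNA does not need to be passed.
# Extra covariates (assumed here: percent.mito) go into vars.to.regress.
normalize_sct <- function(seu, regress = "percent.mito") {
  SCTransform(seu, vars.to.regress = regress, verbose = FALSE)
}
```

Usage: `seu <- normalize_sct(seu)`. Afterwards the new SCT assay is the default assay, so downstream PCA uses its scaled residuals.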

3. Dimensionality Reduction

The paper used UMAP for visualization. For this, you will first need to perform a linear dimensionality reduction.

Reproduce:
1. Run PCA on the scaled data using the RunPCA() function.
2. Use an Elbow Plot (ElbowPlot()) to determine the appropriate number of principal components.
3. Run UMAP on the selected principal components using the RunUMAP() function to visualize the data in 2D.
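The three steps above can be sketched as follows, assuming the SCTransform-normalized object. `reduce_dims()` is a hypothetical helper, and `n_dims = 20` is a placeholder default to replace with the number of PCs you read off the elbow plot.

```r
library(Seurat)

# PCA on the variable features of the active (SCT) assay, an elbow plot to
# choose the number of PCs, then UMAP on the chosen PCs.
# n_dims = 20 is a HYPOTHETICAL default; set it from the elbow plot.
reduce_dims <- function(seu, npcs = 50, n_dims = 20) {
  seu <- RunPCA(seu, npcs = npcs, verbose = FALSE)
  print(ElbowPlot(seu, ndims = npcs))
  RunUMAP(seu, dims = seq_len(n_dims), verbose = FALSE)
}
```

In practice you may prefer to run `RunPCA()` and `ElbowPlot()` first, look at the plot, and only then call `RunUMAP()` with the chosen `dims`.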

4. Unsupervised Clustering

The paper used the fast-PhenoGraph algorithm for clustering and identified 39 distinct clusters in their ex vivo dataset.

Reproduce:
1. Find the nearest neighbors using FindNeighbors().
2. Apply the clustering algorithm using FindClusters(). You can adjust the resolution parameter to obtain a number of clusters similar to that reported in the paper. A resolution between 0.5 and 1.0 is a good starting point.
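Because the resolution has to be tuned, a small helper that reports the cluster count for several resolutions can help. This is a sketch: `count_clusters()` is hypothetical, it assumes PCA has been run, and `n_dims` should match the number of PCs used for UMAP. The paper's 39 clusters were obtained with a different algorithm (fast-PhenoGraph), so an exact match is not expected.

```r
library(Seurat)

# Build the shared nearest-neighbour graph once, then cluster at each
# resolution and record how many clusters result.
count_clusters <- function(seu, n_dims = 20, resolutions = c(0.5, 0.75, 1.0)) {
  seu <- FindNeighbors(seu, dims = seq_len(n_dims), verbose = FALSE)
  n <- vapply(resolutions, function(res) {
    clustered <- FindClusters(seu, resolution = res, verbose = FALSE)
    nlevels(clustered$seurat_clusters)
  }, integer(1))
  data.frame(resolution = resolutions, n_clusters = n)
}
```

Once a resolution is chosen, run the two steps on the object itself, e.g. `seu <- FindNeighbors(seu, dims = 1:20)` followed by `seu <- FindClusters(seu, resolution = 0.8)` (both values placeholders).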

5. Cluster Annotation

Annotate your clusters by identifying top marker genes for each cluster. The paper manually classified clusters based on lineage marker expression.

Reproduce:
1. Use the FindAllMarkers() function to find marker genes for each cluster. You can set min.pct = 0.25 and logfc.threshold = 0.25 as a starting point.
2. Filter the marker genes (e.g., based on adjusted p-value) and use known canonical markers to assign cell type labels to your clusters.
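A sketch of marker detection and filtering, assuming Seurat 4 or later (where the fold-change column is called `avg_log2FC`) and a single SCTransform model; if you ran SCTransform per sample, call `PrepSCTFindMarkers()` first. `top_markers()` and the 0.05 adjusted p-value cut-off are assumptions for illustration.

```r
library(Seurat)
library(dplyr)

# Find positive markers for every cluster with the starting thresholds
# suggested above, keep significant hits (adjusted p below padj_cutoff,
# an assumed value) and return the top n genes per cluster by log2 fold change.
top_markers <- function(seu, n = 10, padj_cutoff = 0.05) {
  markers <- FindAllMarkers(seu, only.pos = TRUE, min.pct = 0.25,
                            logfc.threshold = 0.25, verbose = FALSE)
  markers |>
    filter(p_val_adj < padj_cutoff) |>
    group_by(cluster) |>
    slice_max(order_by = avg_log2FC, n = n) |>
    ungroup()
}
```

Comparing the resulting `gene` column per `cluster` against canonical lineage markers (e.g. CD3E, CD8A, MS4A1, CD14) is then a manual step.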

6. Differential Expression (DE) and IFN-I Response Analysis

A core part of the paper’s analysis is the IFN-I response capacity (IRC) score, which was based on the expression of a core set of IFN-I-stimulated proteins (ISPs). The paper also used the Wilcoxon rank-sum test for statistical testing.

IFN-I Response Score: To replicate this using gene expression data, you can use Seurat’s AddModuleScore() function. The paper used the ISPs BST2, PKR, ISG15, MX1, IFIT3, and IRF7. Use the corresponding genes to define your ISG list; note that PKR is a protein name and its gene symbol is EIF2AK2.
Differential Expression: Use the FindMarkers() function to compare gene expression between groups (e.g., patients with high vs. low IFN-I responsiveness). Specify test.use = "wilcox" for the Wilcoxon rank-sum test (this is also the default).
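A sketch of the score calculation, assuming the normalized object from earlier steps. Scoring transcripts only approximates the paper's protein-based IRC score; `add_ifn_score()`, `isg_genes` and the score name are hypothetical.

```r
library(Seurat)

# Gene symbols for the paper's ISPs; PKR is encoded by EIF2AK2.
isg_genes <- c("BST2", "EIF2AK2", "ISG15", "MX1", "IFIT3", "IRF7")

# Add a module score for the ISG list. AddModuleScore() appends "1" to the
# name, so with the default name the score is stored as "IFN_score1".
add_ifn_score <- function(seu, genes = isg_genes, name = "IFN_score", ctrl = 100) {
  present <- intersect(genes, rownames(seu))
  absent <- setdiff(genes, present)
  if (length(absent) > 0) {
    warning("ISGs not found in the object: ", paste(absent, collapse = ", "))
  }
  AddModuleScore(seu, features = list(present), name = name, ctrl = ctrl)
}
```

Usage: `seu <- add_ifn_score(seu)`, then `VlnPlot(seu, features = "IFN_score1")` to inspect the score per cluster.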

7. Comparison of Patient Groups

The paper’s key finding relates IFN-I responsiveness to therapy outcome.

Reproduce:
1. After calculating the IFN-I response score, classify patient cells into high and low IFN-I responsive groups based on their scores. A common approach is to use the median or a quantile-based cutoff.
2. Perform a differential expression analysis between these two groups within specific cell clusters to identify gene expression patterns associated with improved or poor outcomes.
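One possible sketch of both steps at the cell level, using a median split within a single cluster. It assumes the hypothetical score column `IFN_score1` and that healthy-donor cells have an `orig.ident` starting with "HD" (as produced by the loading code above); `de_ifn_high_vs_low()` is hypothetical. Because the paper relates responsiveness to outcome per patient, you may instead want to summarize the score per `orig.ident` and split patients rather than cells.

```r
library(Seurat)

# Within one cluster, split patient cells at the median IFN-I score and
# compare high- vs low-scoring cells with the Wilcoxon rank-sum test.
# Extra arguments (e.g. logfc.threshold) are passed on to FindMarkers().
de_ifn_high_vs_low <- function(seu, cluster, score_col = "IFN_score1", ...) {
  md <- FetchData(seu, vars = c("orig.ident", score_col),
                  cells = WhichCells(seu, idents = cluster))
  # drop healthy donors (assumed orig.ident prefix "HD")
  md <- md[!startsWith(as.character(md$orig.ident), "HD"), , drop = FALSE]
  cutoff <- median(md[[score_col]])
  high <- rownames(md)[md[[score_col]] > cutoff]
  low <- rownames(md)[md[[score_col]] <= cutoff]
  # ident.1 / ident.2 also accept vectors of cell names
  FindMarkers(seu, ident.1 = high, ident.2 = low, test.use = "wilcox", ...)
}
```

Usage: `de <- de_ifn_high_vs_low(seu, cluster = "3")`, where "3" stands for a hypothetical effector T cell cluster from your own annotation.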

8. Pathway Enrichment Analysis

The paper used Gene Set Enrichment Analysis (GSEA) and Ingenuity Pathway Analysis (IPA). You can use the R package clusterProfiler for this task.

Reproduce:
1. Take the list of differentially expressed genes from your DE analysis.
2. Perform a pathway enrichment analysis using clusterProfiler to identify affected biological pathways. You can use functions like enrichGO for Gene Ontology enrichment.
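A sketch of GO over-representation analysis with clusterProfiler, assuming the `FindMarkers()` output (gene symbols as row names, a `p_val_adj` column) and the human annotation package org.Hs.eg.db. Using all tested genes as the background universe and restricting to Biological Process terms are assumptions; `run_go()` is hypothetical.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# GO Biological Process over-representation of the significant DE genes,
# with all genes tested in the DE analysis as background (assumed choice).
# Extra arguments such as minGSSize are passed on to enrichGO().
run_go <- function(de, padj_cutoff = 0.05, ...) {
  sig <- rownames(de)[de$p_val_adj < padj_cutoff]
  enrichGO(gene = sig, universe = rownames(de), OrgDb = org.Hs.eg.db,
           keyType = "SYMBOL", ont = "BP", pAdjustMethod = "BH", ...)
}
```

Usage: `go <- run_go(de)` and `head(as.data.frame(go))`. Running it separately on up- and down-regulated genes (split by the sign of `avg_log2FC`) often gives a clearer picture.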

Key Figures to Compare

As you work through the analysis, you can compare your results to the following figures from the paper:

UMAP Plots: Compare your UMAP plots with Figure 1b and Figure 2a to check your clustering and visualization.
IRC Score Plots: Compare the distributions of your calculated IFN-I response score across different cell populations to the violin plots in Figure 2d.
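To produce plots comparable to these figures, a sketch along these lines may help. It assumes UMAP, clustering and the hypothetical `IFN_score1` column from earlier steps; `comparison_plots()` is hypothetical, and grouping the violins by cell type requires a cell-type metadata column of your own (passed as `group_by`).

```r
library(Seurat)
library(patchwork)

# UMAP coloured by cluster and by sample (cf. Figure 1b / 2a), and the IFN-I
# module score as violins per group (cf. Figure 2d). group_by = NULL uses the
# current identities (clusters).
comparison_plots <- function(seu, score_col = "IFN_score1", group_by = NULL) {
  umap <- DimPlot(seu, reduction = "umap", label = TRUE) +
    DimPlot(seu, reduction = "umap", group.by = "orig.ident")
  violin <- VlnPlot(seu, features = score_col, group.by = group_by, pt.size = 0)
  list(umap = umap, violin = violin)
}
```

Usage: `p <- comparison_plots(seu)`, then print `p$umap` and `p$violin`.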