# Download and extract project2 data into a folder named 'project2'
url <- "https://multiomics-biological-integration-training.s3.eu-central-1.amazonaws.com/project2.tar.gz"
dest <- "project2.tar.gz"
download.file(url, dest, mode = "wb")
utils::untar(dest, exdir = "project2")
Project 2
Description
During mouse embryonic brain development, E11.5 corresponds to early neural tube patterning and initial regionalization, and E15.5 to mid-neurogenesis, progenitor expansion, and early differentiation. At both stages, regionalization of the brain into forebrain (prosencephalon: telencephalon and diencephalon), midbrain (mesencephalon), and hindbrain (rhombencephalon) relies on key signaling centers (e.g., the mid-hindbrain boundary, or isthmus) and transcription factors (TFs). These TFs are encoded on various chromosomes, and no single chromosome dominates the process: development involves genetic networks distributed across the genome.
Midbrain
Major TFs and genes at E11.5–E15.5
Otx2 (dorsal midbrain, boundary maintenance), En1/En2 (mid-hindbrain organizer), Pax2/5, Lmx1a (floor plate/ventral dopaminergic progenitors), Wnt1 (isthmus signaling), Fgf8 (organizer), Neurog1/2, Neurod1/4 (neurogenesis), Msx2 (progenitors).
Important chromosomes (examples of key genes):
Chr1: En1
Chr5: En2
Chr14: Otx2
Chr15: Wnt1
Chr19: Pax2
At E11.5: Mid-hindbrain boundary (isthmus) is critical; Otx2 restricts posterior identity.
At E15.5: Ventral midbrain dopaminergic progenitors (e.g., for future substantia nigra/VTA) emerge, with dynamic enhancers.
This project focuses on E11.5 and E15.5 midbrain samples. WGBS should be treated as an observation-level assay (CpG-level methylation proportions with coverage) and represented as BSseq/GenomicRatioSet or a CpG BED with methylation and coverage prior to integration. Students will integrate RNA-seq, ATAC-seq, WGBS, and histone modification (H3K4me3, H3K27ac, H3K27me3, H3K4me1) data to identify regulatory programs and epigenomic changes during midbrain development. Deliverables: an analysis report, figures highlighting differential expression/peaks/methylation, and a short presentation.
Available data
Mouse Development Matrix: Midbrain
Students should download project data (group projects) from: https://multiomics-biological-integration-training.s3.eu-central-1.amazonaws.com/project2.tar.gz
Setup in R
After extracting, a directory project2 appears with the following content:
.
├── ATAC_11half.bw
├── ATAC_15half.bw
├── atac_se.rds
├── H3K27ac_11half.bw
├── H3K27ac_15half.bw
├── h3k27ac_se.rds
├── H3K27me3_11half.bw
├── H3K27me3_15half.bw
├── h3k27me3_se.rds
├── H3K4me1_11half.bw
├── H3K4me1_15half.bw
├── h3k4me1_se.rds
├── H3K4me3_11half.bw
├── H3K4me3_15half.bw
├── h3k4me3_se.rds
├── rna_se.rds
├── WGBS_11half.bed.gz
└── WGBS_15half.bed.gz
├─ rna_se.rds — SummarizedExperiment for RNA-seq (gene counts / normalized expression)
├─ atac_se.rds — SummarizedExperiment for ATAC-seq (peak counts / accessibility)
├─ h3k4me3_se.rds, h3k27ac_se.rds, h3k4me1_se.rds, h3k27me3_se.rds — SummarizedExperiments for histone marks (peak-level assays)
├─ *.bw (e.g., H3K4me3_11half.bw, ATAC_15half.bw) — bigWig signal tracks for visualization
├─ WGBS_11half.bed.gz, WGBS_15half.bed.gz — CpG-level methylation (columns typically: chr, start, end, methylation_prop, coverage); treat as observation assay
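As a rough illustration of treating WGBS as CpG-level observations, the sketch below reads a BED-like file into a data frame and keeps sites with enough coverage. It assumes the five-column layout listed above, no header line, and methylation proportions on a 0 to 1 scale; the real files may differ, so check the first few lines before relying on it. The helper `read_wgbs_bed()`, its `min_cov` threshold, and the toy file are hypothetical and only make the example runnable.

```r
# Toy stand-in for project2/WGBS_11half.bed.gz so the sketch runs anywhere.
toy <- data.frame(
  chr              = c("chr1", "chr1", "chr2", "chr2"),
  start            = c(100L, 250L, 400L, 900L),
  end              = c(101L, 251L, 401L, 901L),
  methylation_prop = c(0.90, 0.10, 0.50, 1.00),
  coverage         = c(20L, 3L, 10L, 40L)
)
path <- tempfile(fileext = ".bed.gz")
con <- gzfile(path, "w")
write.table(toy, con, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
close(con)

# Hypothetical helper: read the assumed 5-column layout and drop
# CpGs with fewer than `min_cov` reads.
read_wgbs_bed <- function(path, min_cov = 5L) {
  cpg <- read.table(
    path, sep = "\t", header = FALSE, stringsAsFactors = FALSE,
    col.names = c("chr", "start", "end", "methylation_prop", "coverage")
  )
  cpg[cpg$coverage >= min_cov, , drop = FALSE]
}

cpg <- read_wgbs_bed(path)

# Coverage-weighted mean methylation across the retained CpGs.
weighted_meth <- sum(cpg$methylation_prop * cpg$coverage) / sum(cpg$coverage)
print(cpg)
print(weighted_meth)
```

Weighting by coverage gives well-covered CpGs more influence than sparsely covered ones, which is one common reason to keep coverage next to the proportion instead of discarding it.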
The provided assay objects are RangedSummarizedExperiment (assay names: “counts”, “logCPM”) with rowRanges as GRanges and colData including sample/group fields; differential analyses have already been performed and are reflected in the supplied objects. Students should therefore focus on interpretation and integrative analysis rather than re-running DE tests.
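A minimal sketch of loading and inspecting one of these objects follows. To stay runnable without the project files, it builds a small toy RangedSummarizedExperiment, saves it, and reads it back with `readRDS()`; with the real data you would call `readRDS("project2/atac_se.rds")` directly. The `group` column name and the toy values are assumptions, so check `colData()` of the supplied objects for the actual field names.

```r
library(SummarizedExperiment)  # also attaches GenomicRanges and S4Vectors

# Toy stand-in for readRDS("project2/atac_se.rds").
counts <- matrix(
  c(10L, 0L, 5L, 20L, 3L, 8L, 15L, 1L), nrow = 2,
  dimnames = list(c("peak1", "peak2"), paste0("s", 1:4))
)
# Simple log2 counts-per-million with a small offset (illustrative only).
logcpm <- log2(t(t(counts + 0.5) / (colSums(counts) + 1)) * 1e6)

gr <- GRanges(c("chr1", "chr2"), IRanges(c(100, 500), width = c(200, 350)))
names(gr) <- rownames(counts)

toy <- SummarizedExperiment(
  assays    = list(counts = counts, logCPM = logcpm),
  rowRanges = gr,
  colData   = DataFrame(group = c("E11.5", "E11.5", "E15.5", "E15.5"),
                        row.names = colnames(counts))
)
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)

# The inspection steps you would run on the real object.
se <- readRDS(path)
print(assayNames(se))          # which assays are stored
print(dim(se))                 # features x samples
print(table(se$group))         # samples per group (assumed column name)
n_chr <- length(unique(as.character(seqnames(rowRanges(se)))))
print(n_chr)                   # chromosomes represented in rowRanges
```

Any columns holding the precomputed differential results would typically appear in `rowData(se)` or in the metadata columns of `rowRanges(se)`; their names are not stated here, so list them before use.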
Day 1
- Explore the dataset for the project:
  - How many samples are there?
  - Is the data properly normalized?
  - How about a volcano plot to visualize the differences?
  - How many chromosomes are present in the data?
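For the volcano plot, one possible base-R sketch is shown below. It assumes the differential results can be pulled into a data frame with columns named `logFC` and `FDR`; those names, the cutoffs, and the helper `classify_de()` are hypothetical, and the toy values are made up for illustration.

```r
# Toy differential-results table; with the real data this would come from
# the result columns stored in the supplied objects.
de <- data.frame(
  logFC = c(2.5, -3.0, 0.2, 1.5, -0.4),
  FDR   = c(0.001, 0.0005, 0.8, 0.2, 0.01)
)

# Hypothetical helper: label each feature as Up, Down, or not significant.
classify_de <- function(logFC, FDR, lfc_cut = 1, fdr_cut = 0.05) {
  status <- rep("NS", length(logFC))
  status[FDR < fdr_cut & logFC >=  lfc_cut] <- "Up"
  status[FDR < fdr_cut & logFC <= -lfc_cut] <- "Down"
  factor(status, levels = c("Down", "NS", "Up"))
}

de$status <- classify_de(de$logFC, de$FDR)
print(table(de$status))

# Volcano plot: effect size on x, significance on y.
cols <- c(Down = "steelblue", NS = "grey70", Up = "firebrick")
plot(de$logFC, -log10(de$FDR),
     col = cols[as.character(de$status)], pch = 16,
     xlab = "log2 fold change", ylab = "-log10(FDR)")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)
```

Which direction a positive `logFC` represents (E15.5 over E11.5 or the reverse) depends on how the contrast was defined, so confirm it before labelling the axes.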
- ATAC-seq data
  - How many peaks are there?
  - How many differentially accessible peaks are there between E11.5 and E15.5?
  - What is the median width of the ATAC peaks?
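The peak questions above reduce to a few GenomicRanges calls. The sketch below uses a toy GRanges in place of `rowRanges(atac_se)`; the metadata columns `logFC` and `FDR` and the 0.05 cutoff are assumptions about how the precomputed results are stored.

```r
library(GenomicRanges)

# Toy stand-in for rowRanges(atac_se); values are illustrative only.
peaks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr5"),
  ranges   = IRanges(start = c(100, 1000, 300, 50),
                     width = c(200, 400, 600, 1000)),
  logFC    = c(1.8, -0.2, -2.1, 0.4),
  FDR      = c(0.01, 0.60, 0.001, 0.20)
)

n_peaks      <- length(peaks)          # total number of peaks
median_width <- median(width(peaks))   # median peak width in bp

# Differentially accessible peaks under the assumed FDR cutoff.
is_da <- peaks$FDR < 0.05
n_da  <- sum(is_da)

# Split significant peaks by the sign of the fold change.
direction <- table(sign(peaks$logFC[is_da]))

print(c(peaks = n_peaks, median_width = median_width, da = n_da))
print(direction)
```

The same calls apply unchanged to each histone-mark object, for example by looping over a named list of the four `*_se.rds` objects.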
- Normalizing and plotting
  - Use the correct normalization method for each dataset.
  - Can you split the EnrichedHeatmap into regions with increased and decreased accessibility?
  - How about also splitting the EnrichedHeatmap by annotation?
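One way to split an enriched heatmap by direction of change is sketched below, using `normalizeToMatrix()` and the `row_split` argument that `EnrichedHeatmap()` passes on to ComplexHeatmap. The signal here is random toy data; with the project files it could instead be imported from a bigWig track, for example with `rtracklayer::import("project2/ATAC_15half.bw")`. The `direction` labels, window size, and extension are assumptions chosen for illustration.

```r
library(GenomicRanges)
library(EnrichedHeatmap)  # attaches ComplexHeatmap, which provides draw()

set.seed(42)
# Toy signal: 2000 consecutive 50-bp bins on chr1 with random scores.
signal <- GRanges(
  "chr1",
  IRanges(start = seq(1, 99951, by = 50), width = 50),
  score = rpois(2000, 5)
)

# Toy peak centres with a hypothetical direction-of-change label.
peaks <- GRanges(
  "chr1",
  IRanges(start = c(10000, 30000, 50000, 70000), width = 1),
  direction = c("increase", "decrease", "increase", "decrease")
)

# Summarise signal in 50-bp windows, 2 kb either side of each peak centre.
mat <- normalizeToMatrix(signal, peaks, value_column = "score",
                         extend = 2000, w = 50, mean_mode = "w0")

# One heatmap slice per direction label.
ht <- EnrichedHeatmap(mat, name = "ATAC", row_split = peaks$direction)
draw(ht)
```

Splitting by annotation works the same way: supply a vector with one annotation label per region (for instance promoter, intron, intergenic) to `row_split` instead of the direction labels.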
- ChIP-seq data
  - How many peaks are there for each histone mark?
  - How many differentially enriched peaks are there between E11.5 and E15.5 for each mark?
  - What is the median width of the peaks for each mark?
- RNA-seq data
  - How many genes are there?
  - How many differentially expressed genes are there between E11.5 and E15.5?
- Key gene expression boxplots: create boxplots showing the normalized expression of Otx2 and En1 across the samples for both E11.5 and E15.5. Discuss whether these expression profiles match their known roles in midbrain development.
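The boxplots can be drawn in base R once the expression values are in long format. In the sketch below the matrix is random toy data with arbitrary values that say nothing about real expression; with the project data it might be `assay(rna_se, "logCPM")` subset to the two genes. This assumes the row names are gene symbols, which may not hold if the object uses Ensembl IDs.

```r
set.seed(7)
# Toy logCPM matrix: 2 genes x 6 samples, values arbitrary.
group  <- factor(rep(c("E11.5", "E15.5"), each = 3))
logcpm <- rbind(
  Otx2 = rnorm(6, mean = 7, sd = 0.3),
  En1  = rnorm(6, mean = 7, sd = 0.3)
)
colnames(logcpm) <- paste0("sample", 1:6)

# Reshape to long format: one row per gene and sample.
# as.vector() walks the matrix column by column, so genes cycle fastest.
long <- data.frame(
  gene   = rep(rownames(logcpm), times = ncol(logcpm)),
  group  = rep(group, each = nrow(logcpm)),
  logCPM = as.vector(logcpm)
)

# One box per stage and gene.
boxplot(logCPM ~ group + gene, data = long,
        xlab = "Stage and gene", ylab = "logCPM",
        col = c("grey80", "grey50"))
```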
Day 2
- overlapMatrix
  - As done during the course exercise, make the overlapMatrix for your data.
  - Similar to Exercise 5, find different regions of activity in your data.
- Integrative analysis
  - Can you identify enhancers in the data?
  - Are there any repressed promoters?
  - Can you identify regions with increased chromatin accessibility, increased H3K27ac and H3K4me3, and increased gene expression that lie less than 2 kb from a TSS? What would you call these regions?
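Combining several layers of evidence comes down to a logical AND across per-region flags plus a distance filter. The sketch below shows that idea on a toy data frame; the layout of the course's overlapMatrix is not shown here, so the column names (`atac_up`, `h3k27ac_up`, `h3k4me3_up`, `rna_up`, `dist_to_tss`) are hypothetical and would need to be mapped to the real ones.

```r
# Toy table: one row per region, logical flags for "increased at E15.5"
# and the distance in bp to the nearest TSS.
regions <- data.frame(
  region      = paste0("r", 1:5),
  atac_up     = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  h3k27ac_up  = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  h3k4me3_up  = c(TRUE, FALSE, TRUE, TRUE, TRUE),
  rna_up      = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  dist_to_tss = c(150, 500, 900, 1200, 25000)
)

# Regions where every layer agrees.
all_up <- with(regions, atac_up & h3k27ac_up & h3k4me3_up & rna_up)

# Regions within 2 kb of a TSS (absolute value in case distances are signed).
near_tss <- abs(regions$dist_to_tss) < 2000

hits <- regions[all_up & near_tss, ]
print(hits)
```

Relaxing or swapping individual conditions (for example requiring H3K4me1 instead of H3K4me3, or a distance above 2 kb) gives the other activity categories asked about in this section.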
Day 3
- Enrichment analysis
  - Can you find any enriched GO terms in the Gained Enhancer regions in E15.5 compared to E11.5?
  - Can you find any enriched GO terms in the Repressed Promoter regions in E15.5?
  - Which biologically significant GO terms did you find in these regions? Do they make sense in the context of development?
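GO over-representation tools generally rest on a hypergeometric (one-sided Fisher) test, and a small worked example can make the reported p-values easier to interpret. The numbers below are invented for illustration; in practice a dedicated package such as clusterProfiler (`enrichGO()`) handles the annotation lookup and testing, and the course may use a different tool.

```r
# Made-up counts for a single GO term.
universe <- 20000  # background genes
in_term  <- 200    # background genes annotated to the term
selected <- 500    # genes linked to the regions of interest
overlap  <- 15     # selected genes that are annotated to the term

# Expected overlap by chance and the resulting fold enrichment.
expected <- selected * in_term / universe
fold     <- overlap / expected

# P(overlap >= observed) under the hypergeometric distribution.
p <- phyper(overlap - 1, in_term, universe - in_term, selected,
            lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test.
tab <- matrix(c(overlap, in_term - overlap,
                selected - overlap,
                universe - in_term - selected + overlap), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# With many terms tested, adjust for multiple testing (toy p-values added).
p_adj <- p.adjust(c(p, 0.2, 0.8), method = "BH")

print(c(expected = expected, fold = fold, p = p, p_fisher = p_fisher))
print(p_adj)
```

The choice of background matters: using all genes in the genome versus only genes near any tested region can change which terms come out as enriched.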
- Advanced analysis
  - For En1, which category fits (e.g., Repressed at E15.5)?
  - Obtain the overlapMatrix for the other brain regions from your colleagues.
  - Merge the overlapMatrices from all three regions to compare co-occurring signals.
  - Generate shared vs. unique regulatory regions (e.g., a Venn diagram of gained enhancers).
  - Are there region-specific regulatory programs distinguishing the forebrain, midbrain, and hindbrain projects?
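Once regions from the three projects share a common set of identifiers, shared and unique sets follow from plain set operations. The sketch below assumes that harmonisation has already been done (for instance by merging overlapping ranges into consensus regions); the region IDs and memberships are toy values.

```r
# Toy sets of gained-enhancer IDs per brain region (hypothetical).
gained <- list(
  forebrain = c("regA", "regB", "regC", "regD"),
  midbrain  = c("regB", "regC", "regE"),
  hindbrain = c("regC", "regE", "regF")
)

# Regions gained in all three tissues.
shared_all <- Reduce(intersect, gained)

# Regions gained in exactly one tissue.
unique_to <- setNames(lapply(names(gained), function(nm) {
  others <- unlist(gained[names(gained) != nm], use.names = FALSE)
  setdiff(gained[[nm]], others)
}), names(gained))

# Presence/absence matrix, a convenient input for Venn or UpSet-style plots.
membership <- table(
  region = unlist(gained, use.names = FALSE),
  tissue = rep(names(gained), lengths(gained))
) > 0

print(shared_all)
print(unique_to)
print(membership)
```

Comparing raw peak coordinates directly would undercount sharing, because the same regulatory element rarely has identical start and end positions in independently called peak sets.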