# Download and extract project3 data into a folder named 'project3'
url <- "https://multiomics-biological-integration-training.s3.eu-central-1.amazonaws.com/project3.tar.gz"
dest <- "project3.tar.gz"
download.file(url, dest, mode = "wb")
utils::untar(dest, exdir = "project3")

Project 3
Description
In mouse embryonic brain development at E11.5 (early neural tube patterning and initial regionalization) and E15.5 (mid-neurogenesis, progenitor expansion, and early differentiation), brain regionalization into forebrain (prosencephalon/telencephalon/diencephalon), midbrain (mesencephalon), and hindbrain (rhombencephalon) relies on key signaling centers (e.g., mid-hindbrain boundary/isthmus) and transcription factors (TFs). These TFs are encoded on many different chromosomes; no single chromosome dominates, and development relies on genetic networks distributed across the genome.
Hindbrain
Major TFs and genes at E11.5–E15.5
GBX2 (posterior hindbrain vs. midbrain boundary), HOX genes (rhombomere patterning, e.g., Hoxa/b clusters), PAX3 (dorsal hindbrain), KROX20 (rhombomeres 3/5), FGF8 (isthmus), EN1/EN2 (overlapping midbrain), IRX genes, dynamic enhancers in cerebellar primordium.
Important chromosomes (examples of key genes):
Chr1: En1
Chr2: Gbx2
At E11.5: rhombomere segmentation is strong, with active floor plate and roof plate signaling.
Available data
Data has been downloaded and prepared for you from Mouse Development Matrix: Hindbrain
Students should download project data (group projects) from: https://multiomics-biological-integration-training.s3.eu-central-1.amazonaws.com/project3.tar.gz
Setup in R
After extracting, a directory project3 appears with the following content:
.
├── ATAC_11half.bw
├── ATAC_15half.bw
├── atac_se.rds
├── H3K27ac_11half.bw
├── H3K27ac_15half.bw
├── h3k27ac_se.rds
├── H3K27me3_11half.bw
├── H3K27me3_15half.bw
├── h3k27me3_se.rds
├── H3K4me1_11half.bw
├── H3K4me1_15half.bw
├── h3k4me1_se.rds
├── H3K4me3_11half.bw
├── H3K4me3_15half.bw
├── h3k4me3_se.rds
├── rna_se.rds
├── WGBS_11half.bed.gz
└── WGBS_15half.bed.gz
File descriptions:
- rna_se.rds: SummarizedExperiment for RNA-seq (gene counts / normalized expression)
- atac_se.rds: SummarizedExperiment for ATAC-seq (peak counts / accessibility)
- h3k4me3_se.rds, h3k27ac_se.rds, h3k4me1_se.rds, h3k27me3_se.rds: SummarizedExperiments for histone marks (peak-level assays)
- *.bw (e.g., H3K4me3_11half.bw, ATAC_15half.bw): bigWig signal tracks for visualization
- WGBS_11half.bed.gz, WGBS_15half.bed.gz: CpG-level methylation (columns typically: chr, start, end, methylation_prop, coverage); treat as an observation assay
The provided assay objects are RangedSummarizedExperiment (assay names: “counts”, “logCPM”) with rowRanges as GRanges and colData including sample/group fields; differential analyses have already been performed and are reflected in the supplied objects. Students should therefore focus on interpretation and integrative analysis rather than re-running DE tests.
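A first look at one of the supplied objects might go as sketched below. This assumes the archive was extracted into project3/ in the working directory (adjust the path if extraction produced a nested folder) and that the SummarizedExperiment package is installed. The helper load_se is a hypothetical convenience function, not part of the course material.

```r
# Minimal sketch: load one of the supplied .rds objects and inspect it.
# `load_se` is a hypothetical helper; `dir` is an assumed location.
load_se <- function(name, dir = "project3") {
  path <- file.path(dir, paste0(name, "_se.rds"))
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  readRDS(path)
}

rna <- load_se("rna")

if (!is.null(rna) && requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  suppressPackageStartupMessages(library(SummarizedExperiment))
  print(rna)                 # dimensions, assay names, metadata overview
  print(assayNames(rna))     # per the description: "counts" and "logCPM"
  print(colData(rna))        # sample and group annotation
  print(head(rowData(rna)))  # per-feature columns (check here for DE statistics)
}
```

The same call with "atac", "h3k27ac", and so on loads the other assays.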
Day 1
- Explore the dataset for the project:
  - How many samples are there?
  - Is the data properly normalized?
  - How about a volcano plot to visualize the differences?
  - How many chromosomes are present in the data?
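The normalization check and the volcano plot can be sketched on toy data as follows. The supplied objects already contain a logCPM assay, so the log_cpm function here only illustrates what such a transformation does (it follows the offset formula used by limma's voom). The column names logFC and FDR are assumptions; check the actual names in the supplied objects.

```r
# Toy counts standing in for assay(se, "counts"); values are random.
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 2), ncol = 6,
                 dimnames = list(paste0("feature", 1:1000),
                                 paste0("sample", 1:6)))

# log2 counts-per-million with a small offset (voom-style formula).
log_cpm <- function(counts) {
  lib_size <- colSums(counts)
  t(log2((t(counts) + 0.5) / (lib_size + 1) * 1e6))
}
lcpm <- log_cpm(counts)

# Similar medians and spreads across samples suggest comparable scaling.
boxplot(lcpm, las = 2, ylab = "log2 CPM", main = "Per-sample distributions")

# Hypothetical DE table (random values); column names are assumptions.
de  <- data.frame(logFC = rnorm(1000), FDR = runif(1000))
sig <- de$FDR < 0.05 & abs(de$logFC) > 1

# Volcano plot: effect size on x, significance on y.
plot(de$logFC, -log10(de$FDR), pch = 16, cex = 0.5,
     col = ifelse(sig, "firebrick", "grey60"),
     xlab = "log2 fold change", ylab = "-log10(FDR)")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)
```

The thresholds (FDR below 0.05, absolute log2 fold change above 1) are common defaults rather than values prescribed by the project.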
- ATAC-seq data
  - How many peaks are there?
  - How many differentially accessible peaks are there between E11.5 and E15.5?
  - What is the median width of the ATAC peaks?
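These three questions reduce to simple summaries over the peak table. The sketch below works on a data frame of the shape returned by as.data.frame(rowRanges(atac)) once SummarizedExperiment is attached. The columns logFC and FDR are assumed names for the differential accessibility statistics, and summarise_peaks is a hypothetical helper.

```r
# Summarise a peak table: total peaks, significant gains/losses, median width.
# Coordinates are treated as 1-based and inclusive, as in GRanges.
summarise_peaks <- function(peaks, fdr_cut = 0.05) {
  sig <- !is.na(peaks$FDR) & peaks$FDR < fdr_cut
  c(n_peaks      = nrow(peaks),
    n_up         = sum(sig & peaks$logFC > 0),
    n_down       = sum(sig & peaks$logFC < 0),
    median_width = median(peaks$end - peaks$start + 1))
}

# Toy table; replace with the real peak table from atac_se.rds.
toy <- data.frame(seqnames = c("chr1", "chr1", "chr2", "chr2"),
                  start    = c(100, 1000, 500, 9000),
                  end      = c(299, 1499, 899, 9099),
                  logFC    = c(2, -1.5, 0.1, 3),
                  FDR      = c(0.01, 0.001, 0.8, 0.2))

summarise_peaks(toy)
length(unique(toy$seqnames))  # number of chromosomes represented
```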
- Normalizing and plotting
  - Use the correct normalization method for each dataset.
  - Can you split the EnrichedHeatmap into regions of increased and decreased accessibility?
  - How about also splitting the EnrichedHeatmap by annotation?
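One way to split an enriched heatmap by direction of change is sketched below. It only runs when the Bioconductor packages and the project files are present. The mcols names logFC and FDR, the 2 kb window, and the helper direction_of_change are assumptions made for illustration.

```r
# Label each region by direction of change (hypothetical helper).
direction_of_change <- function(logFC, FDR, fdr_cut = 0.05) {
  out <- rep("unchanged", length(logFC))
  sig <- !is.na(FDR) & FDR < fdr_cut
  out[sig & logFC > 0] <- "increased"
  out[sig & logFC < 0] <- "decreased"
  factor(out, levels = c("increased", "decreased", "unchanged"))
}

pkgs <- c("SummarizedExperiment", "rtracklayer", "EnrichedHeatmap")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))
have_data <- all(file.exists(c("project3/atac_se.rds",
                               "project3/ATAC_15half.bw")))

if (have_pkgs && have_data) {
  suppressPackageStartupMessages({
    library(SummarizedExperiment)
    library(EnrichedHeatmap)
  })
  peaks <- rowRanges(readRDS("project3/atac_se.rds"))
  if (!all(c("logFC", "FDR") %in% names(mcols(peaks)))) {
    stop("Adjust the assumed column names 'logFC' and 'FDR' to your object.")
  }
  change  <- direction_of_change(peaks$logFC, peaks$FDR)
  keep    <- change != "unchanged"
  centers <- resize(peaks[keep], width = 1, fix = "center")

  # Import only the signal around the selected peaks to limit memory use.
  signal <- rtracklayer::import("project3/ATAC_15half.bw",
                                which = centers + 2000)
  mat <- normalizeToMatrix(signal, centers, value_column = "score",
                           extend = 2000, w = 50, mean_mode = "w0")

  # row_split produces one heatmap panel per direction of change.
  ht <- EnrichedHeatmap(mat, name = "ATAC E15.5",
                        row_split = droplevels(change[keep]))
  ComplexHeatmap::draw(ht)
}
```

Splitting by annotation works the same way: pass a factor of annotation labels (one per region) as row_split instead of the direction factor.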
- ChIP-seq data
  - How many peaks are there for each histone mark?
  - How many differentially enriched peaks are there between E11.5 and E15.5 for each mark?
  - What is the median width of the peaks for each mark?
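Because the four histone-mark objects share one layout, the per-mark summaries can be collected in a loop. The sketch below assumes the files sit in project3/ and returns NA for any file that is missing; peak_stats is a hypothetical helper.

```r
marks <- c("h3k4me3", "h3k27ac", "h3k4me1", "h3k27me3")
files <- file.path("project3", paste0(marks, "_se.rds"))
names(files) <- marks

have_se <- requireNamespace("SummarizedExperiment", quietly = TRUE)
if (have_se) suppressPackageStartupMessages(library(SummarizedExperiment))

# Number of peaks and median peak width for one mark.
peak_stats <- function(path) {
  if (!have_se || !file.exists(path)) {
    return(c(n_peaks = NA_real_, median_width = NA_real_))
  }
  rr <- rowRanges(readRDS(path))
  c(n_peaks = length(rr), median_width = median(width(rr)))
}

# One row per mark, one column per statistic.
mark_summary <- t(sapply(files, peak_stats))
mark_summary
```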
- RNA-seq data
  - How many genes are there?
  - How many differentially expressed genes are there between E11.5 and E15.5?
- Gene expression boxplots: For key hindbrain transcription factors such as GBX2 and KROX20 (official mouse symbol Egr2), create boxplots showing their normalized expression levels across samples for E11.5 and E15.5. Analyze and interpret the observed expression changes.
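A base R sketch of such boxplots follows. The matrix is random toy data standing in for assay(rna, "logCPM") and carries no biological meaning. It assumes rows are named by gene symbol (the real object may use other identifiers) and that the stage of each sample can be read from colData.

```r
# Toy logCPM matrix: 2 genes x 6 samples, random values only.
set.seed(42)
stage  <- factor(rep(c("E11.5", "E15.5"), each = 3))
genes  <- c("Gbx2", "Egr2")   # Egr2 is the official symbol for Krox20
logcpm <- matrix(rnorm(12, mean = 5), nrow = 2,
                 dimnames = list(genes, paste0("sample", 1:6)))

# One panel per gene, with individual samples overlaid as points.
op <- par(mfrow = c(1, length(genes)))
for (g in genes) {
  boxplot(logcpm[g, ] ~ stage, main = g, xlab = "Stage", ylab = "logCPM")
  stripchart(logcpm[g, ] ~ stage, vertical = TRUE, method = "jitter",
             add = TRUE, pch = 16)
}
par(op)
```

With only a few replicates per stage, the overlaid points are often more informative than the boxes themselves.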
Day 2
- overlapMatrix
  - As done during the course exercise, make the overlapMatrix for your data.
  - Similar to Exercise 5, find different regions of activity in your data.
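The course's own overlapMatrix construction is not reproduced here. As one plausible reading, the sketch below builds a logical matrix with one row per reference region and one column per assay, marking whether the region overlaps any peak of that assay. The base R helper overlaps_any is hypothetical and slow on large inputs; for real data, overlapsAny on GRanges objects performs the same test efficiently.

```r
# TRUE for each query region that overlaps at least one subject region.
# Coordinates are 1-based and inclusive.
overlaps_any <- function(query, subject) {
  q_chr <- as.character(query$seqnames)
  s_chr <- as.character(subject$seqnames)
  vapply(seq_len(nrow(query)), function(i) {
    any(s_chr == q_chr[i] &
        subject$start <= query$end[i] &
        subject$end   >= query$start[i])
  }, logical(1))
}

# Toy reference regions and toy peak sets for two assays.
ref <- data.frame(seqnames = c("chr1", "chr1", "chr2"),
                  start = c(100, 500, 100), end = c(200, 600, 200))
atac    <- data.frame(seqnames = c("chr1", "chr2"),
                      start = c(150, 50), end = c(250, 99))
h3k27ac <- data.frame(seqnames = c("chr1", "chr2"),
                      start = c(550, 200), end = c(700, 300))

# Rows: reference regions; columns: assays.
om <- sapply(list(ATAC = atac, H3K27ac = h3k27ac), overlaps_any, query = ref)
om
```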
- Integrative analysis
  - Can you identify enhancers in the data?
  - Are there any repressed promoters?
  - Can you identify regions with increased chromatin accessibility, increased H3K27ac and H3K4me3, and increased gene expression that lie less than 2 kb from a TSS? What would you call these regions?
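The last question is a logical combination of per-region flags. The sketch below assumes a table with one row per region, logical columns for each signal, and a signed distance to the nearest TSS in base pairs. All column names and values are hypothetical.

```r
# Toy per-region table; replace with columns derived from your overlapMatrix.
regions <- data.frame(
  region      = paste0("r", 1:5),
  atac_up     = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  h3k27ac_up  = c(TRUE, TRUE, TRUE,  TRUE, TRUE),
  h3k4me3_up  = c(TRUE, TRUE, TRUE,  TRUE, TRUE),
  expr_up     = c(TRUE, TRUE, TRUE,  FALSE, TRUE),
  dist_to_tss = c(500, 15000, 100, 1800, -300)
)

# All four signals gained AND within 2 kb of a TSS (either side).
regions$gained_near_tss <- with(
  regions,
  atac_up & h3k27ac_up & h3k4me3_up & expr_up & abs(dist_to_tss) < 2000
)

regions$region[regions$gained_near_tss]
```

Other categories, such as enhancers or repressed promoters, follow the same pattern with a different combination of flags and a different distance rule.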
Day 3
- Enrichment analysis
  - Can you find any enriched GO terms in the Gained Enhancer regions at E15.5 compared to E11.5?
  - Can you find any enriched GO terms in the Repressed Promoter regions at E15.5?
  - Which biologically significant GO terms did you find in these regions? Do they make sense in the context of development?
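GO over-representation tools (for example enrichGO in clusterProfiler) rest on a hypergeometric test. The sketch below shows that test in base R so the reported p-values are easier to interpret. The counts are invented for illustration, and go_overrep_p is a hypothetical helper.

```r
# Over-representation p-value for one GO term.
# k: genes linked to the regions AND annotated to the term
# n: genes linked to the regions
# K: background genes annotated to the term
# N: all background genes
go_overrep_p <- function(k, n, K, N) {
  # P(X >= k) when drawing n genes from N, of which K carry the term
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Invented counts for three terms tested against the same background.
p_raw <- c(termA = go_overrep_p(k = 12, n = 200, K = 150, N = 20000),
           termB = go_overrep_p(k = 3,  n = 200, K = 300, N = 20000),
           termC = go_overrep_p(k = 25, n = 200, K = 900, N = 20000))

# Correct for testing many terms (Benjamini-Hochberg).
p_adj <- p.adjust(p_raw, method = "BH")
p_adj
```

The choice of background (N) matters: restricting it to genes near any tested region usually gives more conservative results than using the whole genome.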
Advanced analysis
- For En2, which category applies (e.g., Silent at E15.5)?
- Obtain the overlapMatrix from the other brain regions from your colleagues.
- Merge the overlapMatrices from all three regions to compare co-occurring signals.
- Generate shared vs. unique regulatory regions (e.g., Venn diagram of gained enhancers).
- Are there region-specific regulatory programs distinguishing forebrain/midbrain/hindbrain projects?
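Shared and unique regions can be derived from a membership table, as sketched below. This assumes the three projects have first been harmonized to a common set of region identifiers (for example by merging overlapping regions across projects), since independently called peaks rarely share exact coordinates. The identifiers are invented.

```r
# Toy sets of gained-enhancer IDs per brain region (invented identifiers).
gained <- list(
  forebrain = c("chr1:100-600", "chr2:50-400", "chr3:10-90"),
  midbrain  = c("chr1:100-600", "chr4:700-900"),
  hindbrain = c("chr1:100-600", "chr2:50-400", "chr5:1-300")
)

# Membership table: rows are regions, columns are brain regions.
all_ids    <- sort(unique(unlist(gained)))
membership <- sapply(gained, function(ids) all_ids %in% ids)
rownames(membership) <- all_ids

# Regions gained in all three projects vs. only in hindbrain.
shared_all     <- all_ids[rowSums(membership) == ncol(membership)]
hindbrain_only <- all_ids[membership[, "hindbrain"] & rowSums(membership) == 1]

shared_all
hindbrain_only
```

The same membership table can feed a Venn or UpSet-style plot.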