Project 3

Description

In mouse embryonic brain development, E11.5 corresponds to early neural tube patterning and initial regionalization, and E15.5 to mid-neurogenesis, progenitor expansion, and early differentiation. At both stages, regionalization of the brain into forebrain (prosencephalon: telencephalon and diencephalon), midbrain (mesencephalon), and hindbrain (rhombencephalon) relies on signaling centers (e.g., the mid-hindbrain boundary, or isthmus) and on transcription factors (TFs). The genes encoding these TFs are spread across many chromosomes; no single chromosome dominates, and development is driven by gene networks distributed across the genome.

Hindbrain

Major TFs and genes at E11.5–E15.5

Gbx2 (anterior hindbrain, on the posterior side of the midbrain-hindbrain boundary), Hox genes (rhombomere patterning, e.g., the Hoxa and Hoxb clusters), Pax3 (dorsal hindbrain), Krox20 (official symbol Egr2; rhombomeres 3 and 5), Fgf8 (isthmus), En1/En2 (expression overlapping the midbrain), and Irx genes. Dynamic enhancers are also active in the cerebellar primordium.

Important chromosomes (examples of key genes):

Chr1: En1, Gbx2

At E11.5: rhombomere segmentation is pronounced, with active floor plate and roof plate signaling.

At E15.5: cerebellar rhombic lip and ventricular zone progenitors are active; many enhancers are shared with midbrain and forebrain, but some are hindbrain-specific.

This project focuses on hindbrain samples at E11.5 and E15.5. Students will integrate RNA-seq, ATAC-seq, WGBS, and histone modification (H3K4me3, H3K27ac, H3K27me3, H3K4me1) data to identify regulatory programs and epigenomic changes during hindbrain development. WGBS should be treated as an observation-level assay (CpG-level methylation proportions with coverage) and represented as a BSseq/GenomicRatioSet object, or as a CpG BED file with methylation and coverage columns, before integration. Deliverables: an analysis report, figures highlighting differential expression, peaks, and methylation, and a short presentation.

Available data

The data have been downloaded from the Mouse Development Matrix (hindbrain) and prepared for you.

Students should download project data (group projects) from: https://multiomics-biological-integration-training.s3.eu-central-1.amazonaws.com/project3.tar.gz

Setup in R

# Download and extract project3 data into a folder named 'project3'
url <- "https://multiomics-biological-integration-training.s3.eu-central-1.amazonaws.com/project3.tar.gz"
dest <- "project3.tar.gz"
# Large files can exceed R's default 60-second download timeout; raise it first
options(timeout = max(600, getOption("timeout")))
download.file(url, dest, mode = "wb")
utils::untar(dest, exdir = "project3")

After extracting, a directory project3 appears with the following content:

.
├── ATAC_11half.bw
├── ATAC_15half.bw
├── atac_se.rds
├── H3K27ac_11half.bw
├── H3K27ac_15half.bw
├── h3k27ac_se.rds
├── H3K27me3_11half.bw
├── H3K27me3_15half.bw
├── h3k27me3_se.rds
├── H3K4me1_11half.bw
├── H3K4me1_15half.bw
├── h3k4me1_se.rds
├── H3K4me3_11half.bw
├── H3K4me3_15half.bw
├── h3k4me3_se.rds
├── rna_se.rds
├── WGBS_11half.bed.gz
└── WGBS_15half.bed.gz

- rna_se.rds: SummarizedExperiment for RNA-seq (gene counts / normalized expression)
- atac_se.rds: SummarizedExperiment for ATAC-seq (peak counts / accessibility)
- h3k4me3_se.rds, h3k27ac_se.rds, h3k4me1_se.rds, h3k27me3_se.rds: SummarizedExperiments for histone marks (peak-level assays)
- *.bw (e.g., H3K4me3_11half.bw, ATAC_15half.bw): bigWig signal tracks for visualization
- WGBS_11half.bed.gz, WGBS_15half.bed.gz: CpG-level methylation (columns typically: chr, start, end, methylation_prop, coverage); treat as an observation-level assay
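
A minimal sketch of reading one of the WGBS BED files with base R is given here. It assumes the five-column layout listed above, tab separation, and no header line; check the first lines of your file before relying on it. The function names, the toy file, and the coverage cutoff of 5 reads are illustrative choices, not part of the course material.

```r
# Assumed layout: chr, start, end, methylation proportion, coverage (no header).
read_wgbs_bed <- function(path, min_cov = 5) {
  cpg <- read.table(gzfile(path), sep = "\t", header = FALSE,
                    col.names = c("chr", "start", "end", "meth_prop", "coverage"),
                    stringsAsFactors = FALSE)
  # Keep only CpGs with enough reads for a stable proportion estimate
  cpg[cpg$coverage >= min_cov, , drop = FALSE]
}

# Coverage-weighted mean methylation: deeply covered CpGs count more
weighted_methylation <- function(cpg) {
  sum(cpg$meth_prop * cpg$coverage) / sum(cpg$coverage)
}

# Toy file standing in for WGBS_11half.bed.gz (invented values)
toy <- data.frame(chr = c("chr1", "chr1", "chr2"),
                  start = c(100, 200, 300), end = c(101, 201, 301),
                  meth_prop = c(0.8, 0.2, 0.5), coverage = c(10, 2, 30))
tmp <- tempfile(fileext = ".bed.gz")
con <- gzfile(tmp, "w")
write.table(toy, con, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
close(con)

cpg <- read_wgbs_bed(tmp, min_cov = 5)
nrow(cpg)                  # number of CpGs passing the coverage filter
weighted_methylation(cpg)  # weighted mean over the retained CpGs
```

For the real data, replace tmp with file.path("project3", "WGBS_11half.bed.gz"). If the methylation column is stored as a percentage rather than a proportion, divide it by 100 first.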

The provided assay objects are RangedSummarizedExperiment (assay names: “counts”, “logCPM”) with rowRanges as GRanges and colData including sample/group fields; differential analyses have already been performed and are reflected in the supplied objects. Students should therefore focus on interpretation and integrative analysis rather than re-running DE tests.
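
One possible way to load all of the supplied objects at once is sketched below. The helper name load_se_objects is hypothetical, and the demonstration uses small placeholder files in a temporary folder so that it runs without the project data.

```r
# Read every *_se.rds file in a folder into a named list
load_se_objects <- function(dir) {
  files <- list.files(dir, pattern = "_se\\.rds$", full.names = TRUE)
  objs <- lapply(files, readRDS)
  names(objs) <- sub("_se\\.rds$", "", basename(files))
  objs
}

# Demonstration with placeholder objects (not the real data)
demo_dir <- file.path(tempdir(), "project3_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(list(assay = "rna"),  file.path(demo_dir, "rna_se.rds"))
saveRDS(list(assay = "atac"), file.path(demo_dir, "atac_se.rds"))
writeLines("not an rds file", file.path(demo_dir, "notes.txt"))

se <- load_se_objects(demo_dir)
names(se)  # one entry per *_se.rds file; other files are ignored

# With the real data (needs the SummarizedExperiment package installed):
# library(SummarizedExperiment)
# se <- load_se_objects("project3")
# assayNames(se$rna)   # the text above states "counts" and "logCPM"
# colData(se$rna)      # sample / group annotation
# rowRanges(se$atac)   # GRanges with one range per peak
```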

Day 1

Explore the dataset for the project:
- How many samples are there?
- Is the data properly normalized?
- How about a volcano plot to visualize the differences?
- How many chromosomes are present in the data?
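
To check normalization, one simple approach is to recompute log2 counts per million from the raw counts and compare the per-sample distributions. The sketch below uses a toy count matrix with invented values. The function log_cpm and the prior count of 1 are assumptions for illustration and need not match how the supplied "logCPM" assay was computed.

```r
# Library-size normalization to log2 counts per million (CPM).
# prior_count avoids log2(0); the value 1 is an arbitrary illustrative choice.
log_cpm <- function(counts, prior_count = 1) {
  lib_size <- colSums(counts)
  cpm <- t(t(counts) / lib_size) * 1e6
  log2(cpm + prior_count)
}

# Toy counts: the second sample was sequenced twice as deeply as the first
counts <- matrix(c(10, 90, 900,
                   20, 180, 1800),
                 ncol = 2,
                 dimnames = list(c("g1", "g2", "g3"),
                                 c("E11.5_rep1", "E15.5_rep1")))

ncol(counts)                   # number of samples
colSums(counts)                # library sizes differ before normalization
lc <- log_cpm(counts)
sample_medians <- apply(lc, 2, median)
sample_medians                 # similar medians suggest comparable scales

# With the real data, something like:
# counts <- SummarizedExperiment::assay(rna_se, "counts")
# boxplot(SummarizedExperiment::assay(rna_se, "logCPM"), las = 2)
```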

ATAC-seq data
- How many peaks are there?
- How many differentially accessible peaks are there between E11.5 and E15.5?
- What is the median width of the ATAC peaks?
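
The peak-level questions can be answered from a table of peak coordinates. The sketch below works on a plain data frame with columns seqnames, start, and end, which is what as.data.frame() returns for a GRanges object. The helper peak_summary and the toy coordinates are hypothetical.

```r
# Summarize a set of peaks: count, chromosomes covered, and median width
peak_summary <- function(peaks) {
  # GRanges coordinates are 1-based and inclusive, hence the + 1
  width <- peaks$end - peaks$start + 1
  list(n_peaks = nrow(peaks),
       n_chromosomes = length(unique(as.character(peaks$seqnames))),
       median_width = median(width))
}

# Toy peaks (invented coordinates)
toy_peaks <- data.frame(seqnames = c("chr1", "chr1", "chr2", "chrX"),
                        start = c(101, 1001, 501, 2001),
                        end   = c(300, 1500, 1000, 2100))
s <- peak_summary(toy_peaks)
s$n_peaks
s$n_chromosomes
s$median_width

# With the real data, after library(SummarizedExperiment):
# peak_summary(as.data.frame(rowRanges(atac_se)))
```

The same helper applies unchanged to each histone mark object.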

Normalizing and plotting
- Use the correct normalization method for each dataset.
- Can you split the EnrichedHeatmap into regions with increased and decreased accessibility?
- Can you also split the EnrichedHeatmap by annotation?
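
Splitting a heatmap requires one label per region. A minimal sketch of building such a factor from differential statistics follows. The argument names log_fc and padj and the cutoffs (absolute log2 fold change of 1, adjusted p-value of 0.05) are assumptions; look up how the differential results are actually stored in the supplied objects before using it.

```r
# Label each region by direction of change between E11.5 and E15.5
direction_split <- function(log_fc, padj, lfc_cut = 1, padj_cut = 0.05) {
  sig <- !is.na(padj) & padj < padj_cut   # regions without a p-value are not significant
  dir <- ifelse(sig & log_fc >= lfc_cut, "increased",
         ifelse(sig & log_fc <= -lfc_cut, "decreased", "unchanged"))
  factor(dir, levels = c("increased", "decreased", "unchanged"))
}

# Toy statistics (invented values)
log_fc <- c(2.5, -1.8, 0.3, 3.0)
padj   <- c(0.001, 0.01, 0.5, NA)
split_by <- direction_split(log_fc, padj)
table(split_by)

# The factor can then be handed to the heatmap, for example:
# EnrichedHeatmap::EnrichedHeatmap(mat, row_split = split_by, name = "ATAC")
# where mat comes from EnrichedHeatmap::normalizeToMatrix().
```

A factor of annotation labels (for example promoter versus distal) can be passed in the same way, with one label per row of the matrix.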

ChIP-seq data
- How many peaks are there for each histone mark?
- How many differentially enriched peaks are there between E11.5 and E15.5 for each mark?
- What is the median width of the peaks for each mark?

RNA-seq data
- How many genes are there?
- How many differentially expressed genes are there between E11.5 and E15.5?
- Gene expression boxplots: for key hindbrain transcription factors such as Gbx2 and Krox20 (official symbol Egr2), create boxplots of their normalized expression across the E11.5 and E15.5 samples, then interpret the observed expression changes.
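
For the boxplots, it helps to reshape the expression matrix into one row per gene and sample. The sketch below assumes that row names are gene symbols (they may instead be Ensembl IDs in the supplied object, in which case map them first). The helper expr_long and all expression values are invented for illustration.

```r
# Reshape a genes x samples matrix into long format for selected genes
expr_long <- function(logcpm, genes, group) {
  genes <- intersect(genes, rownames(logcpm))  # drop symbols that are absent
  data.frame(gene   = rep(genes, each = ncol(logcpm)),
             group  = rep(group, times = length(genes)),
             logCPM = as.vector(t(logcpm[genes, , drop = FALSE])))
}

# Toy logCPM matrix (invented values, not real measurements)
logcpm <- matrix(c(5.1, 5.3, 3.0, 3.2,
                   6.0, 6.2, 2.1, 2.0,
                   10.0, 10.1, 10.0, 9.9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Gbx2", "Egr2", "Actb"), paste0("s", 1:4)))
group <- factor(c("E11.5", "E11.5", "E15.5", "E15.5"))

df <- expr_long(logcpm, c("Gbx2", "Egr2", "NotAGene"), group)
df

# Draw the boxplot only in an interactive session
if (interactive()) {
  boxplot(logCPM ~ group + gene, data = df, las = 2, ylab = "logCPM")
}
```

With the real data, logcpm would be SummarizedExperiment::assay(rna_se, "logCPM") and group would come from the relevant column of colData(rna_se).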

Day 2

overlapMatrix
- As done during the course exercise, make the overlapMatrix for your data.
- Similar to Exercise 5, find different regions of activity in your data.
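
The course's own overlapMatrix code is not reproduced here. As a rough sketch of the idea, the example below builds a logical matrix with one row per reference region and one column per assay, marking whether any peak of that assay overlaps the region. The function names and toy coordinates are hypothetical, and the base R loop is only meant for small inputs; for the full data, overlap functions from GenomicRanges are the practical choice.

```r
# TRUE for each reference region that overlaps at least one peak
overlaps_any <- function(ref, peaks) {
  vapply(seq_len(nrow(ref)), function(i) {
    any(as.character(peaks$seqnames) == as.character(ref$seqnames[i]) &
        peaks$start <= ref$end[i] &
        peaks$end   >= ref$start[i])
  }, logical(1))
}

# One logical column per peak set, one row per reference region
build_overlap_matrix <- function(ref, peak_sets) {
  m <- do.call(cbind, lapply(peak_sets, function(p) overlaps_any(ref, p)))
  rownames(m) <- paste0(ref$seqnames, ":", ref$start, "-", ref$end)
  m
}

# Toy regions and peaks (invented coordinates)
ref <- data.frame(seqnames = c("chr1", "chr1", "chr2"),
                  start = c(100, 500, 100), end = c(200, 600, 200))
peak_sets <- list(
  ATAC    = data.frame(seqnames = c("chr1", "chr2"),
                       start = c(150, 300), end = c(250, 400)),
  H3K27ac = data.frame(seqnames = c("chr1", "chr2"),
                       start = c(550, 190), end = c(560, 210))
)
om <- build_overlap_matrix(ref, peak_sets)
om
```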

Integrative analysis
- Can you identify enhancers in the data?
- Are there any repressed promoters?
- Can you identify regions that show increased chromatin accessibility, increased H3K27ac and H3K4me3, and increased gene expression, and that lie less than 2 kb from a TSS? What would you call these regions?
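
One way to approach these questions is a small rule-based classifier on top of an overlap matrix plus the distance of each region to its nearest TSS. The rules and labels below are common heuristics chosen for illustration, not the definitions used in the course; the function name, the column names, and the toy values are assumptions.

```r
# Assign a coarse regulatory label to each region.
# om: logical matrix with columns H3K4me3, H3K27ac, H3K4me1, H3K27me3
# dist_to_tss: signed distance (bp) from each region to its nearest TSS
classify_regions <- function(om, dist_to_tss, promoter_window = 2000) {
  near_tss <- abs(dist_to_tss) < promoter_window
  label <- rep("other", nrow(om))
  label[!near_tss & om[, "H3K4me1"] & om[, "H3K27ac"]]  <- "active_enhancer"
  label[near_tss & om[, "H3K4me3"] & om[, "H3K27ac"]]   <- "active_promoter"
  label[near_tss & om[, "H3K27me3"] & !om[, "H3K27ac"]] <- "repressed_promoter"
  label
}

# Toy input (invented values)
om <- matrix(c(TRUE,  TRUE,  TRUE,  FALSE, FALSE,
               TRUE,  FALSE, TRUE,  TRUE,  FALSE,
               FALSE, FALSE, FALSE, FALSE, TRUE,
               TRUE,  FALSE, FALSE, FALSE, FALSE),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("r", 1:4),
                             c("ATAC", "H3K4me3", "H3K27ac", "H3K4me1", "H3K27me3")))
dist_to_tss <- c(500, 50000, -1200, 8000)

labels <- classify_regions(om, dist_to_tss)
table(labels)
```

To capture changes between stages rather than presence at one stage, the same rules can be applied to matrices of gained or lost signal instead of plain overlap.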

Day 3

Enrichment analysis
- Can you find any enriched GO terms in the Gained Enhancer regions in E15.5 compared to E11.5?
- Can you find any enriched GO terms in the Repressed Promoter regions at E15.5?
- Which biologically significant GO terms did you find in these regions? Do they make sense in the context of development?
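
GO enrichment is usually run with a dedicated package, but the underlying over-representation test is a hypergeometric test that fits in a few lines of base R. The sketch below assumes the regions have already been linked to genes (for example via the nearest gene). The gene IDs and the term names termA and termB are placeholders, not real GO terms.

```r
# Over-representation test of a gene list against a set of terms
go_enrichment <- function(hits, universe, term2gene) {
  hits <- intersect(hits, universe)
  res <- lapply(names(term2gene), function(term) {
    members <- intersect(term2gene[[term]], universe)
    k <- length(intersect(hits, members))
    # P(X >= k) when drawing length(hits) genes from the universe
    p <- phyper(k - 1, length(members), length(universe) - length(members),
                length(hits), lower.tail = FALSE)
    data.frame(term = term, overlap = k,
               term_size = length(members), p_value = p)
  })
  res <- do.call(rbind, res)
  res$padj <- p.adjust(res$p_value, method = "BH")  # multiple-testing correction
  res[order(res$p_value), ]
}

# Toy input (placeholder identifiers)
universe  <- paste0("g", 1:20)
term2gene <- list(termA = paste0("g", 1:5),
                  termB = paste0("g", 6:15))
hits <- c("g1", "g2", "g3", "g4", "g16")

res <- go_enrichment(hits, universe, term2gene)
res
```

The choice of universe matters: using all genes linked to any tested region, rather than every gene in the genome, avoids inflating the enrichment.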

Advanced analysis

- For En2, which category applies (e.g., Silent at E15.5)?
- Obtain the overlapMatrix objects for the other brain regions from your colleagues.
- Merge the overlapMatrices from all three regions to compare co-occurring signals.
- Identify shared vs. unique regulatory regions (e.g., with a Venn diagram of gained enhancers).
- Are there region-specific regulatory programs that distinguish the forebrain, midbrain, and hindbrain projects?
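
Counting shared and unique regions across the three projects amounts to partitioning a union of region IDs by membership. The sketch below assumes that all three projects describe their regions on a common set of IDs (for example shared reference bins); peaks called separately per tissue would first need to be mapped to such a reference. The helper name and the toy IDs are hypothetical.

```r
# Count regions by the combination of sets they belong to
venn_counts <- function(sets) {
  all_ids <- unique(unlist(sets))
  membership <- do.call(cbind, lapply(sets, function(s) all_ids %in% s))
  pattern <- apply(membership, 1, function(r) {
    paste(colnames(membership)[r], collapse = "&")
  })
  table(pattern)
}

# Toy sets of gained-enhancer IDs (invented)
gained <- list(forebrain = c("r1", "r2", "r3"),
               midbrain  = c("r2", "r3", "r4"),
               hindbrain = c("r3", "r5"))
vc <- venn_counts(gained)
vc
```

The resulting counts correspond to the areas of a three-set Venn diagram and can be passed to a plotting package of your choice.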