Quality Control Post-Alignment
Overview
After filtering our BAM files, we need to assess the quality of our ATAC-seq experiment using specialized metrics.
Learning Objectives
- Understand key ATAC-seq quality metrics
- Use ATAQV to generate comprehensive QC reports
- Interpret QC results to assess experiment quality
In order to assess the quality of the ATAC-seq experiment, there are several metrics that can be calculated from the aligned reads.
1. Key ATAC-seq Quality Metrics
Fragment size distribution: The distribution of fragment sizes can indicate the quality of the library preparation. A good ATAC-seq library should have a characteristic pattern with peaks corresponding to nucleosome-free regions (NFRs) and mono-, di-, and tri-nucleosomes.
TSS enrichment: The enrichment of reads at transcription start sites (TSS) is a key indicator of data quality. High-quality ATAC-seq data should show a strong enrichment of reads at TSSs.
2. ATAQV Tool
ATAQV is a specialized tool that calculates a variety of QC metrics for ATAC-seq data. It provides a comprehensive report that includes fragment size distribution, TSS enrichment, and other important metrics.
Task 1: Generate individual QC reports with ATAQV
- Create a new folder for QC results:
results/02_QC_post_alignment - Run ATAQV on each filtered BAM file to generate individual QC metrics
- Each sample will produce its own detailed QC report
Ataqv
You can find further information about ATAQC tool and commands in the Usage section here.
Ataqv needs a Transcription Start Site (TSS) reference file to compute TSS enrichment score. You will find this file in: /data/references/ENCODE_mm10_M21_TSS_reference.bed.
Source: TSS reference downloaded from ENCODE Project
Hint
Run ataqv for each *qc_bl_filt.sorted.bam file
Key parameters needed:
- --tss-file: Path to TSS reference file
- --metrics-file: Output JSON file path
- --name: Sample name
- mouse: Genome reference
Solution
mkdir -p results/02_QC_post_alignment
TSS_bed="/data/references/ENCODE_mm10_M21_TSS_reference.bed"
for sample_name in "${samples[@]}"; do
echo "Processing sample: $sample_name"
# run command
ataqv --name $sample_name --metrics-file results/02_QC_post_alignment/$sample_name.ataqv.json --tss-file $TSS_bed mouse $path_bams/$sample_name.qc_bl_filt.sorted.bam > results/02_QC_post_alignment/$sample_name.ataqv.out
done
--name: Sample name for output files-
--metrics-file: Output file for metrics in JSON format-
--tss-file: BED file with TSS locations-
mouse: Genome reference (mouse or human)-
$bam: Input BAM file
Task 2: Create multi-sample summary report
Compile all individual QC reports into a comprehensive summary. Use the mkarv tool from ATAQV to combine all JSON files into an interactive HTML report
Hint
Use mkarv command to process all *.json files
- Specify output directory name for the HTML report
- Include all JSON files with wildcard pattern
Solution
# Create summary report from all JSON files without changing directories
mkarv results/02_QC_post_alignment/summary_ataqv results/02_QC_post_alignment/*.ataqv.json
What this does:
- mkarv: Tool to create interactive HTML summary report.
- results/02_QC_post_alignment/summary_ataqv: Output directory for the HTML report.
- results/02_QC_post_alignment/*.ataqv.json: Include all JSON files using full path.
3. Interpret QC Results
Open the QC report located at: results/02_QC_post_alignment/summary_ataqv/index.html
Task 3: Assess the quality of ATAC-seq experiment
- Overall Assessment: Do you think the experiment worked well?
- Fragment Size Distribution: Do you see the expected nucleosome pattern?
- TSS Enrichment: Are the TSS enrichment scores acceptable?
- Concerning Metrics: Are there any metrics that would concern you?