Reproducibility
Learning outcomes
After having completed this chapter you will be able to:
- Understand the importance of reproducibility
- Apply some basic rules to support reproducibility in computational research
Material
Some good practices for reproducibility
Today and tomorrow we will work with a small E. coli dataset to practice quality control, alignment, and alignment filtering. You can consider this a small project. During the exercises you will be guided to adhere to the following basic principles for reproducibility:
- Execute the commands from a script in order to be able to trace back your steps
- Number scripts based on their order of execution (e.g. 01_download_reads.sh)
- Give your scripts a descriptive and active name, e.g. 06_build_bowtie_index.sh
- Make your scripts specific, i.e. do not combine many different commands in the same script
- Define directories and variables at the beginning of the script
By adhering to these simple principles it will be relatively straightforward to redo your analysis steps based only on the scripts, and you will be well on your way to following the Ten Simple Rules for Reproducible Computational Research.
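As an illustration of the principles listed above, here is a minimal sketch of how such a script could be laid out. It is only a skeleton: the file name is borrowed from the project tree, while the variable names, the THREADS value, and the `set -euo pipefail` safety line are assumptions made for this example and not part of the course exercises. No actual trimming is performed.

```shell
#!/usr/bin/env bash
# 03_trim_reads.sh (skeleton): shows the layout only, no real trimming is run.

# Optional safety settings (an assumption of this sketch):
# stop on errors, on unset variables, and on failures inside pipelines.
set -euo pipefail

# Directories and variables are defined once, at the top of the script.
# PROJECT_DIR defaults to ~/project but can be overridden from the environment.
PROJECT_DIR="${PROJECT_DIR:-$HOME/project}"
READS_DIR="$PROJECT_DIR/reads"
TRIMMED_DIR="$PROJECT_DIR/trimmed_data"
THREADS=4  # hypothetical value, adjust to your machine

# Create the output directory if it does not exist yet.
mkdir -p "$TRIMMED_DIR"

# The single, specific task of this script would go here.
# In this sketch we only report what would be done.
echo "Would trim reads from $READS_DIR into $TRIMMED_DIR using $THREADS threads"
```

Because every path is derived from PROJECT_DIR, moving the project to another location only requires changing (or exporting) that one variable.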
By the end of day 2, ~/project should look (something) like this:
.
├── alignment_output
├── reads
├── ref_genome
├── scripts
│ ├── 01_download_reads.sh
│ ├── 02_run_fastqc.sh
│ ├── 03_trim_reads.sh
│ ├── 04_run_fastqc_trimmed.sh
│ ├── 05_download_ecoli_reference.sh
│ ├── 06_build_bowtie_index.sh
│ ├── 07_align_reads.sh
│ ├── 08_compress_sort.sh
│ ├── 09_extract_unmapped.sh
│ ├── 10_extract_region.sh
│ └── 11_align_sort.sh
└── trimmed_data
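One benefit of zero-padded numbering is that the shell expands file name patterns in sorted order, so the scripts are listed in their order of execution. The sketch below shows this as a dry run that prints the commands instead of executing them. The function name list_scripts_in_order and the SCRIPTS_DIR variable are hypothetical names introduced for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the numbered scripts in a directory, one per line, in execution order.
# The pattern matches names that start with two digits and an underscore.
list_scripts_in_order() {
    local scripts_dir="$1"
    local script
    for script in "$scripts_dir"/[0-9][0-9]_*.sh; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$script" ] || continue
        echo "$script"
    done
}

# Dry run: show what would be executed, without running anything.
# Note: reading line by line assumes script paths contain no newlines.
list_scripts_in_order "${SCRIPTS_DIR:-$HOME/project/scripts}" | while read -r script; do
    echo "bash $script"
done
```

To actually rerun the whole analysis, the `echo "bash $script"` line could be replaced by `bash "$script"`, provided each script can run without manual intervention.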