{"DOI":"10.5281/zenodo.5570116","abstract":"Reproducibility is one of the cornerstones of scientific research. However, it has been a long-standing challenge across many scientific fields [1-3]. These difficulties arise from complexities in experimental and bioinformatic workflows that diverge over time, across different operators, and often with limited versioning [4-6]. In the field of genomics, collections of massive datasets that can be parsed in many ways have added to the reproducibility challenge [7-11]. What is needed is systematic metadata capture and management software that is tailored to (epi)genomic data collection.
In general, a genomic project is composed of two distinct but interrelated components: 'wet-bench' biochemistry experiments and 'dry-bench' bioinformatic analysis. In wet-bench experiments, sample type (human tissue biopsy, yeast, etc.), reagents (catalogue number, wash buffer recipe, etc.), growth environment (log growth, % confluence, etc.), and experimental protocols (ChIP-seq, Western blot, etc.) are examples of critical metadata that need to be captured. Minor variations in these experimental components can result in distinct experimental outcomes [12, 13]. Compounding these issues is the traditional reliance on storing experiment metadata in hand-written notebooks, which are not searchable and are often incomprehensible to a third party [14]. Consequently, it can be difficult to follow and accurately reproduce an experimental protocol from start to finish.
Similarly, in bioinformatics analysis, different analytical tools, software versions, and tool parameters may generate different analytical outcomes. While progress has been made in tracking and reproducing informatic workflows (e.g., Pegasus, Galaxy), these platforms are generally limited to reproducing software workflows [15, 16]. To our knowledge, there are no free open-source platforms that manage entire experimental pipelines, from wet-bench experiments to bioinformatic analyses. Laboratory information management systems (LIMS) typically focus on inventory m [...]","author":[{"family":"Shao","given":"Danying"},{"family":"Kellogg","given":"Gretta"},{"family":"Nematbakhsh","given":"Ali"},{"family":"Kuntala","given":"Prashant"},{"family":"Mahony","given":"Shaun"},{"family":"Pugh","given":"B Frank"},{"family":"Lai","given":"William"}],"id":"unknown","issued":{"date-parts":[[2021,10,14]]},"publisher":"Zenodo","title":"Platform for EpiGenomic Research (PEGR): A flexible management platform for reproducible epigenomic and genomic research","type":"article-journal"}