lpEdit: an editor to facilitate reproducible analysis via literate programming

Abstract

There is evidence to suggest that a surprising proportion of published experiments in science are difficult if not impossible to reproduce. The concepts of data sharing, leaving an audit trail and extensive documentation are fundamental to reproducible research, whether it is in the laboratory or as part of an analysis. In this work, we introduce a tool for documentation that aims to make analyses more reproducible in the general scientific community.

The application, lpEdit, is a cross-platform editor, written with PyQt4, that enables a broad range of scientists to carry out the analytic component of their work in a reproducible manner—through the use of literate programming. Literate programming mixes code and prose to produce a final report that reads like an article or book. lpEdit targets researchers getting started with statistics or programming, so the hurdles associated with setting up a proper pipeline are kept to a minimum and the learning burden is reduced through the use of templates and documentation. The documentation for lpEdit is centered around learning by example, and accordingly we use several increasingly involved examples to demonstrate the software’s capabilities.

We first consider applications of lpEdit to process analyses mixing R and Python code with the documentation system. Finally, we illustrate the use of lpEdit to conduct a reproducible functional analysis of high-throughput sequencing data, using the transcriptome of the butterfly species Pieris brassicae.

Keywords:reproducible researchtext editorRNA-seq