Exploring the future of bioinformatics data sharing and mining with Pygr and Worldbase

Abstract

Worldbase is a virtual namespace for scientific data sharing, accessed in Python via from pygr import worldbase. It lets users access, save and share complex datasets simply by giving the name of a commonly used dataset (e.g. Bio.Seq.Genome.HUMAN.hg17 for draft 17 of the human genome). Worldbase transparently handles how to access the dataset, what code must be imported to use it, what dependencies on other datasets it may have, and how to use its relations with other datasets as specified by its schema. It works with a wide variety of “back-end” storage, including data stored on local file systems, relational databases such as MySQL, remote services via XMLRPC, and “downloadable” resources that are obtained from the network and then automatically installed locally by Worldbase.
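To make the idea of a named, dependency-aware namespace concrete, here is a minimal toy sketch in plain Python. It is not Pygr's implementation and does not import pygr. The classes Registry and Namespace, the call-to-load syntax, the in-memory loaders, and the resource name Bio.Annotation.HUMAN.hg17.exons are all hypothetical and exist only for illustration. Only the name Bio.Seq.Genome.HUMAN.hg17 comes from the abstract.

```python
class Registry:
    """Toy store mapping dotted names to loaders (hypothetical, not Pygr code)."""

    def __init__(self):
        self._loaders = {}  # name -> (loader function, names it depends on)
        self._cache = {}    # name -> object that has already been loaded

    def add(self, name, loader, depends=()):
        """Register a loader and the names of the resources it needs."""
        self._loaders[name] = (loader, tuple(depends))

    def load(self, name):
        """Load a resource by name, loading its dependencies first."""
        if name in self._cache:
            return self._cache[name]
        if name not in self._loaders:
            raise KeyError("no resource named %r" % name)
        loader, depends = self._loaders[name]
        # Resolve dependencies recursively (no cycle detection in this sketch).
        loaded_deps = [self.load(dep) for dep in depends]
        obj = loader(*loaded_deps)
        self._cache[name] = obj
        return obj


class Namespace:
    """Builds a dotted name through attribute access; calling it loads the data."""

    def __init__(self, registry, path=()):
        self._registry = registry
        self._path = path

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. name components.
        if name.startswith("_"):
            raise AttributeError(name)
        return Namespace(self._registry, self._path + (name,))

    def __call__(self):
        return self._registry.load(".".join(self._path))


registry = Registry()

# Stand-in for a genome: a dict of sequence name -> sequence string.
registry.add("Bio.Seq.Genome.HUMAN.hg17", lambda: {"chr1": "ACGT"})

# Hypothetical annotation resource that depends on the genome above.
registry.add(
    "Bio.Annotation.HUMAN.hg17.exons",
    lambda genome: [("chr1", 0, 2, genome["chr1"][0:2])],
    depends=["Bio.Seq.Genome.HUMAN.hg17"],
)

toy_worldbase = Namespace(registry)
genome = toy_worldbase.Bio.Seq.Genome.HUMAN.hg17()
exons = toy_worldbase.Bio.Annotation.HUMAN.hg17.exons()
```

In this sketch the caller only names the dataset. The registry decides how to build it and which other datasets to load first, which is the division of labour the abstract describes. A real system would replace the in-memory loaders with file, database, or network back ends.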