datreant: persistent, Pythonic trees for heterogeneous data

Abstract

In science the filesystem often serves as a de facto database, with directory trees being the zeroth-order scientific data structure. But it can be tedious and error prone to work directly with the filesystem to retrieve and store heterogeneous datasets. datreant makes working with directory structures and files Pythonic with Treants: specially marked directories with distinguishing characteristics that can be discovered, queried, and filtered. Treants can be manipulated individually and in aggregate, with mechanisms for granular access to the directories and files in their trees. Disparate datasets stored in any format (CSV, HDF5, NetCDF, Feather, etc.) scattered throughout a filesystem can thus be manipulated as meta-datasets of Treants. datreant is modular and extensible by design to allow specialized applications to be built on top of it, with `MDSynthesis as an example for working with molecular dynamics simulation data. http://datreant.org/

Keywords:data managementsciencefilesystems