signac: Data Management and Workflows for Computational Researchers

Abstract

The signac data management framework (https://signac.io) helps researchers execute reproducible computational studies, scales workflows from laptops to supercomputers, and emphasizes portability and fast prototyping. With signac, users can track, search, and archive data and metadata for file-based workflows and automate workflow submission on high performance computing (HPC) clusters. We will discuss recent improvements to the software’s feature set, scalability, scientific applications, usability, and community. Newly implemented synced data structures, features for generalized workflow execution, and performance optimizations will be covered, as well as recent research using the framework and changes to the project’s outreach and governance as a response to its growth.

Keywords:data managementdata sciencedatabasesimulationcollaborationworkflowHPCreproducibility