Proceedings of SciPy 2016

SciPy 2016, the 15th annual Scientific Computing with Python conference, was held July 11-17, 2016, in Austin, Texas. Twenty peer-reviewed articles were published in the conference proceedings.

Launching Python Applications on Peta-scale Massively Parallel Systems

We introduce a method to launch Python applications at near-native speed on large high-performance computing systems. The Python run-time and other dependencies are bundled and delivered to computing nodes via a broadcast operation.
Yu Feng, Nick Hand

An Ecological Approach to Software Supply Chain Risk Management

We approach the problem of software assurance in a novel way inspired by an analytic framework used in natural hazard risk mitigation. Existing approaches to software assurance focus on evaluating individual software projects in isolation.
Sebastian Benthall, Travis Pinney, JC Herz, and one other author

PySPH: a reproducible and high-performance framework for smoothed particle hydrodynamics

Smoothed Particle Hydrodynamics (SPH) is a general-purpose technique for numerically computing solutions to partial differential equations, such as those used to simulate fluid and solid mechanics. The method is grid-free and uses particles to discretize the various properties of interest (such as density, fluid velocity, pressure, etc.).
Prabhu Ramachandran
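As a rough illustration of how particles carry field properties in SPH, the standard summation-density estimate, rho_i = sum_j m_j W(|x_i - x_j|, h), can be computed in NumPy with a 1D cubic spline kernel. This is a minimal sketch assuming a 1D domain with uniformly spaced particles; the function names are hypothetical and are not PySPH's API.

```python
import numpy as np

def cubic_spline_1d(r, h):
    """Standard 1D cubic spline (M4) smoothing kernel W(r, h) with support 2h."""
    q = np.abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalization so the kernel integrates to 1
    w = np.where(
        q < 1.0,
        1.0 - 1.5 * q**2 + 0.75 * q**3,
        np.where(q < 2.0, 0.25 * (2.0 - q) ** 3, 0.0),
    )
    return sigma * w

def summation_density(x, m, h):
    """Estimate density at each particle: rho_i = sum_j m_j W(x_i - x_j, h)."""
    dx = x[:, None] - x[None, :]  # pairwise separations, shape (N, N)
    return (m[None, :] * cubic_spline_1d(dx, h)).sum(axis=1)

# Uniformly spaced particles representing a fluid of density 1000
spacing = 0.01
x = np.arange(0.0, 1.0, spacing)
rho0 = 1000.0
m = np.full_like(x, rho0 * spacing)  # each particle carries mass rho0 * spacing
rho = summation_density(x, m, h=1.2 * spacing)
print(rho[50])  # close to 1000 away from the domain ends
```

Particles near the ends of the domain have fewer neighbours, so their estimated density falls below 1000; this boundary deficiency is a known property of the plain summation estimate.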

Spreading the Adoption of Python in India: the FOSSEE Python Project

The FOSSEE (Free Open Source Software for Science and Engineering Education) project is funded by the Ministry of Human Resource Development (MHRD) of the Government of India.
Prabhu Ramachandran

Validating Function Arguments in Python Signal Processing Applications

Python does not have a built-in mechanism to validate the value of function arguments. This can lead to nonsensical exceptions, unexpected behaviour, erroneous results and the like. In the present paper, we define the concept of so-called application-driven data types which place a layer of abstraction on top of Python data types.
Patrick Steffen Pedersen, Christian Schou Oxvig, Jan Østergaard, and one other author
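The idea of layering value checks on top of Python types can be sketched with a decorator that validates named arguments before each call. In this minimal sketch, `PositiveInt`, `Probability`, `validate`, and `count_nonzeros` are hypothetical names chosen for illustration; they are not the API described in the paper.

```python
import functools
import inspect

class PositiveInt:
    """Hypothetical application-driven type: an integer strictly greater than 0."""

    def check(self, name, value):
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
        if value <= 0:
            raise ValueError(f"{name} must be > 0, got {value}")

class Probability:
    """Hypothetical application-driven type: a real number in [0, 1]."""

    def check(self, name, value):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a real number, got {type(value).__name__}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

def validate(**specs):
    """Decorator that checks named arguments against their specs before each call."""
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()  # validate default values too
            for name, spec in specs.items():
                spec.check(name, bound.arguments[name])
            return func(*args, **kwargs)
        return wrapper
    return decorator

@validate(n_samples=PositiveInt(), density=Probability())
def count_nonzeros(n_samples, density=0.1):
    """Hypothetical example: number of non-zero entries in a sparse test signal."""
    return round(n_samples * density)

print(count_nonzeros(100, 0.05))  # 5
try:
    count_nonzeros(100, 1.5)
except ValueError as err:
    print(err)  # density must lie in [0, 1], got 1.5
```

The invalid call fails immediately with a message naming the offending argument, instead of producing a confusing exception or a silently wrong result deeper in the computation.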

MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations

MDAnalysis is a library for structural and temporal analysis of molecular dynamics (MD) simulation trajectories and individual protein structures. MD simulations of biological molecules have become an important tool to elucidate the relationship between molecular structure and physiological function.
Richard J. Gowers, Max Linke, Jonathan Barnoud, and eight other authors

Linting science prose and the science of prose linting

The craft of writing is hard despite the abundance of thoughtful advice available in usage guides and other sources. This is partly a problem of medium: amassing advice is not enough to improve writing.
Michael D. Pacer, Jordan W. Suchow

PyTeCK: a Python-based automatic testing package for chemical kinetic models

Combustion simulations require detailed chemical kinetic models to predict fuel oxidation, heat release, and pollutant emissions. These models are typically validated using qualitative rather than quantitative comparisons with limited sets of experimental data.
Kyle E. Niemeyer

Tell Me Something I Don't Know: Analyzing OkCupid Profiles

In this paper, we present an analysis of 59,000 OkCupid user profiles that examines online self-presentation by combining natural language processing (NLP) with machine learning. We analyze word usage patterns by self-reported sex and drug usage status.
Juan Shishido, Jaya Narasimhan, Matar Haller

The Climate Modelling Toolkit

The Climate Modelling Toolkit (CliMT) is a Python-based software component toolkit providing a flexible problem-solving environment for climate science problems. It aims to simplify the development of models of complexity 'appropriate' to the scientific question at hand.
Joy Merwin Monteiro, Rodrigo Caballero

MONTE Python for Deep Space Navigation

The Mission Analysis, Operations, and Navigation Toolkit Environment (MONTE) is the Jet Propulsion Laboratory's (JPL) signature astrodynamic computing platform. It was built to support JPL's deep space exploration program, and has been used to fly robotic spacecraft to Mars, Jupiter, Saturn, Ceres, and many solar system small bodies.
Jonathon Smith, William Taber, Theodore Drain, Scott Evans, James Evans, Michelle Guevara, William Schulze, Richard Sunseri, Hsi-Cheng Wu

Comparison of machine learning methods applied to birdsong element classification

Songbirds provide neuroscience with a model system for understanding how the brain learns and produces a motor skill similar to speech. Much like humans, songbirds learn their vocalizations from social interactions during a critical period in development.
David Nicholson

datreant: persistent, Pythonic trees for heterogeneous data

In science, the filesystem often serves as a de facto database, with directory trees being the zeroth-order scientific data structure. But it can be tedious and error-prone to work directly with the filesystem to retrieve and store heterogeneous datasets.
David L. Dotson, Sean L. Seyler, Max Linke, and two other authors

Storing Reproducible Results from Computational Experiments using Scientific Python Packages

Computational methods have become a prime branch of modern science. Unfortunately, retractions of papers in high-ranked journals due to erroneous computations as well as a general lack of reproducibility of results have led to a so-called credibility crisis.
Christian Schou Oxvig, Thomas Arildsen, Torben Larsen

UConnRCMPy: Python-based data analysis for Rapid Compression Machines

The ignition delay of a fuel/air mixture is an important quantity in designing combustion devices, and these data are also used to validate computational kinetic models for combustion. One of the typical experimental devices used to measure the ignition delay is called a Rapid Compression Machine (RCM).
Bryan W. Weber, Chih-Jen Sung

cesium: Open-Source Platform for Time-Series Inference

Inference on time series data is a common requirement in many scientific disciplines and Internet of Things (IoT) applications, yet few resources are available to help domain scientists build such complex inference workflows easily, robustly, and repeatably. Traditional statistical models of time series are often too rigid to explain complex time-domain behavior, while popular machine learning packages require already-featurized dataset inputs.
Brett Naul, Stéfan van der Walt, Arien Crellin-Quick, and two other authors

Generalized earthquake classification

We characterize the source of an earthquake based on identifying the nodal lines of the radiation pattern it produces. These characteristics are the mode of failure of the rock (shear or tensile), the orientation of the fault plane and direction of slip.
Ben Lasscock

Composable Multi-Threading for Python Libraries

Python is popular among numeric communities that value it for easy-to-use number-crunching modules such as NumPy, SciPy, Dask, Numba, and many others. These modules often use multi-threading to exploit all available CPU cores.
Anton Malakhov

Functional Uncertainty Constrained by Law and Experiment

Many physical processes are modeled by unspecified functions. Here, we introduce the F_UNCLE project which uses the Python ecosystem of scientific software to develop and explore techniques for estimating such unknown functions and our uncertainty about them.
Andrew M. Fraser, Stephen A. Andrews

Fitting Human Decision Making Models using Python

A topic of interest in experimental psychology and cognitive neuroscience is understanding how humans make decisions. A common approach uses computational models to represent the decision-making process and then uses the model parameters to analyze brain imaging data.
Alejandro Weinstein, Wael El-Deredy, Stéren Chabert, and one other author
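The abstract does not name the specific models used. As a hedged illustration, one widely used choice is a Q-learning model with a softmax choice rule, fitted by maximum likelihood. The sketch below simulates a two-armed bandit task and then estimates the learning rate and inverse temperature with SciPy. The task, reward probabilities, parameter bounds, and function names are all assumptions for illustration, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, choices, rewards, n_options=2):
    """Negative log-likelihood of observed choices under Q-learning + softmax."""
    alpha, beta = params  # learning rate, inverse temperature
    q = np.zeros(n_options)
    nll = 0.0
    for c, r in zip(choices, rewards):
        logits = beta * q
        logits -= logits.max()  # subtract max for numerical stability
        log_p = logits - np.log(np.exp(logits).sum())
        nll -= log_p[c]
        q[c] += alpha * (r - q[c])  # prediction-error update of the chosen option
    return nll

def simulate(alpha, beta, reward_probs, n_trials, rng):
    """Generate synthetic choices and binary rewards from the same model."""
    q = np.zeros(len(reward_probs))
    choices = np.empty(n_trials, dtype=int)
    rewards = np.empty(n_trials)
    for t in range(n_trials):
        p = np.exp(beta * q - (beta * q).max())
        p /= p.sum()
        c = rng.choice(len(q), p=p)
        r = float(rng.random() < reward_probs[c])
        q[c] += alpha * (r - q[c])
        choices[t], rewards[t] = c, r
    return choices, rewards

rng = np.random.default_rng(0)
choices, rewards = simulate(alpha=0.3, beta=5.0, reward_probs=[0.8, 0.2],
                            n_trials=500, rng=rng)

x0 = [0.5, 1.0]  # initial guess for (alpha, beta)
result = minimize(neg_log_likelihood, x0=x0, args=(choices, rewards),
                  method="L-BFGS-B", bounds=[(1e-3, 1.0), (1e-3, 20.0)])
alpha_hat, beta_hat = result.x
print(alpha_hat, beta_hat)
```

Fitting to simulated data with known parameters, as here, is a common sanity check before applying the same likelihood to real behavioural data; the fitted per-subject parameters are what would then be related to imaging measurements.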