Proceedings of SciPy 2015

SciPy 2015, the 14th annual Scientific Computing with Python conference, was held July 6-12, 2015, in Austin, Texas. Thirty peer-reviewed articles were published in the conference proceedings.

Building a Cloud Service for Reproducible Simulation Management

The notion of capturing each execution of a script and workflow and its associated metadata is enormously appealing and should be at the heart of any attempt to make scientific simulations repeatable and reproducible.
Faical Yannick Palingwende Congo

Visualizing physiological signals in real-time

This article presents an open-source Python software package, dubbed RTGraph, to visualize, process and record physiological signals (electrocardiography, electromyography, etc.) in real-time. RTGraph has a multiprocess architecture.
Sebastián Sepúlveda, Pablo Reyes, Alejandro Weinstein

Testing Generative Models of Online Collaboration with BigBang

We introduce BigBang, a new Python toolkit for analyzing online collaborative communities such as those that build open source software. Mailing lists serve as critical communications infrastructure for many communities, including several of the open source software development communities that build scientific Python packages.
Sebastian Benthall

Relation: The Missing Container

The humble mathematical relation, a fundamental (if implicit) component in computational algorithms, is conspicuously absent in most standard container collections, including Python’s. In this paper, we present the basics of a relation container, and why you might use it instead of other methods.
Scott James, James Larkin
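As an illustration of the idea only, the following is a minimal sketch of a many-to-many relation container in plain Python. The class name `Relation` and its methods (`add`, `discard`, `images`, `preimages`) are hypothetical and are not the API described in the paper. The sketch stores a set of (left, right) pairs in two dictionaries so that lookups are fast in both directions.

```python
from collections import defaultdict


class Relation:
    """Hypothetical many-to-many relation: a set of (left, right) pairs
    indexed in both directions."""

    def __init__(self, pairs=()):
        self._fwd = defaultdict(set)  # left -> set of related right values
        self._inv = defaultdict(set)  # right -> set of related left values
        for left, right in pairs:
            self.add(left, right)

    def add(self, left, right):
        self._fwd[left].add(right)
        self._inv[right].add(left)

    def discard(self, left, right):
        # Remove the pair if present and drop buckets that become empty.
        if right in self._fwd.get(left, ()):
            self._fwd[left].discard(right)
            self._inv[right].discard(left)
            if not self._fwd[left]:
                del self._fwd[left]
            if not self._inv[right]:
                del self._inv[right]

    def images(self, left):
        """All right values related to `left`."""
        return frozenset(self._fwd.get(left, ()))

    def preimages(self, right):
        """All left values related to `right`."""
        return frozenset(self._inv.get(right, ()))

    def __contains__(self, pair):
        left, right = pair
        return right in self._fwd.get(left, ())

    def __len__(self):
        return sum(len(rights) for rights in self._fwd.values())

    def __iter__(self):
        for left, rights in self._fwd.items():
            for right in rights:
                yield (left, right)


courses = Relation([("alice", "physics"), ("alice", "math"), ("bob", "math")])
print(sorted(courses.images("alice")))    # ['math', 'physics']
print(sorted(courses.preimages("math")))  # ['alice', 'bob']
```

Keeping an inverse index doubles the storage but makes "who is related to this value?" as cheap as the forward query, which a plain `dict` of sets does not provide.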

Python in Data Science Research and Education

In this paper we demonstrate how Python can be used throughout the entire life cycle of a graduate program in Data Science. In interdisciplinary fields, such as Data Science, the students often come from a variety of different backgrounds where, for example, some students may have strong mathematical training but less experience in programming.
Randy Paffenroth, Xiangnan Kong

Qiita: report of progress towards an open access microbiome data analysis and visualization platform

Advances in sequencing, proteomics, transcriptomics and metabolomics are giving us new insights into the microbial world and dramatically improving our ability to understand microbial community composition and function at high resolution.
The Qiita Development Team

Geodynamic simulations in HPC with Python

The deformation of the Earth's surface reflects the action of several forces that act inside the planet. To understand how the Earth's surface evolves, complex models must be built to reconcile observations with theoretical numerical simulations.
Nicola Creati, Roberto Vidmar, Paolo Sterzai

Causal Bayesian NetworkX

Probabilistic graphical models are useful tools for modeling systems governed by probabilistic structure. Bayesian networks are one class of probabilistic graphical model that has proven useful both for characterizing formal systems and for reasoning with those systems.
Michael D. Pacer

TrendVis: an Elegant Interface for dense, sparkline-like, quantitative visualizations of multiple series using matplotlib

TrendVis is a plotting package that uses matplotlib to create information-dense, sparkline-like, quantitative visualizations of multiple disparate data sets in a common plot area against a common variable.
Mellissa Cross

PySPLIT: a Package for the Generation, Analysis, and Visualization of HYSPLIT Air Parcel Trajectories

The National Oceanic and Atmospheric Administration (NOAA) Air Resources Laboratory's HYSPLIT (HYbrid Single-Particle Lagrangian Integrated Trajectory) model [Drax98], [Drax97] uses a hybrid Lagrangian and Eulerian calculation method to compute air parcel trajectories as well as particle dispersion and deposition simulations.
Mellissa Cross

Dask: Parallel Computation with Blocked algorithms and Task Scheduling

Dask enables parallel and out-of-core computation. We couple blocked algorithms with dynamic and memory aware task scheduling to achieve a parallel and out-of-core NumPy clone. We show how this extends the effective scale of modern hardware to larger datasets and discuss how these ideas can be more broadly applied to other parallel collections.
Matthew Rocklin
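The blocked-algorithm idea can be sketched without Dask itself: split a large computation into per-block tasks, record them in a dictionary that maps keys to `(function, *args)` tuples (similar in spirit to Dask's task-graph format), and let a scheduler evaluate the graph. The helpers `blocked_sum_graph`, `run_graph`, and `_resolve` are hypothetical. This toy scheduler runs tasks sequentially and keeps every result, so unlike Dask's scheduler it is neither parallel nor memory aware.

```python
import numpy as np


def blocked_sum_graph(n, blocksize):
    """Task graph that sums np.arange(n) one block at a time (hypothetical helper).

    Every value is a task tuple (function, *args); string arguments that
    name another key in the graph are treated as dependencies.
    """
    graph = {}
    partials = []
    for i, start in enumerate(range(0, n, blocksize)):
        stop = min(start + blocksize, n)
        graph[f"block-{i}"] = (np.arange, start, stop)   # create one block
        graph[f"partial-{i}"] = (np.sum, f"block-{i}")   # reduce that block
        partials.append(f"partial-{i}")
    graph["total"] = (sum, partials)                     # combine the partial sums
    return graph


def run_graph(graph, key, cache=None):
    """Evaluate `key` after evaluating its dependencies (sequential toy scheduler)."""
    cache = {} if cache is None else cache
    if key not in cache:
        func, *args = graph[key]
        cache[key] = func(*(_resolve(graph, a, cache) for a in args))
    return cache[key]


def _resolve(graph, arg, cache):
    # Lists are resolved element-wise; strings naming graph keys become results.
    if isinstance(arg, list):
        return [_resolve(graph, a, cache) for a in arg]
    if isinstance(arg, str) and arg in graph:
        return run_graph(graph, arg, cache)
    return arg


print(run_graph(blocked_sum_graph(1_000, 100), "total"))  # 499500
```

Because each `partial-i` task depends only on its own block, a real scheduler is free to run those tasks in parallel and to release each block as soon as its partial sum is computed.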

Widgets and Astropy: Accomplishing Productive Research with Undergraduates

This paper describes a tool for astronomical research implemented as an IPython notebook with a widget interface. The notebook uses Astropy, a community-developed package of fundamental tools for astronomy, and Astropy affiliated packages, as the back end.
Matthew Craig

pyDEM: Global Digital Elevation Model Analysis

Hydrological terrain analysis is important for applications such as environmental resource management, agriculture, and flood risk management. It relies on processing high-resolution, tiled digital elevation model (DEM) data for geographic regions of interest.
Mattheus P. Ueckermann, Robert D. Chambers, Christopher A. Brooks, +2

Signal Processing and Communications: Teaching and Research Using IPython Notebook

This paper will take the audience through the story of how an electrical and computer engineering faculty member has come to embrace Python, in particular IPython Notebook (IPython kernel for Jupyter), as an analysis and simulation tool for both teaching and research in signal processing and communications.
Mark Wickert

White Noise Test: detecting autocorrelation and nonstationarities in long time series after ARIMA modeling

Time series analysis has been a dominant technique for assessing relations within datasets collected over time and is becoming increasingly prevalent in the scientific community; for example, assessing brain networks by calculating pairwise correlations of time series generated from different areas of the brain.
Margaret Y Mahan, Chelley R Chorn, Apostolos P Georgopoulos
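One standard way to check ARIMA residuals for remaining autocorrelation is the Ljung-Box portmanteau statistic, sketched below with NumPy. This is an assumption made for illustration: the sketch is not necessarily the procedure used in the paper, and the function names are hypothetical. A p-value can be obtained from the statistic with `scipy.stats.chi2.sf(q, df)`, where `df` is the number of lags minus the number of fitted ARMA parameters.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelation r_1, ..., r_max_lag of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1}^{h} r_k^2 / (n - k).

    A large Q means the residuals still contain autocorrelation,
    i.e. they are not white noise.
    """
    n = len(residuals)
    r = sample_acf(residuals, max_lag)
    k = np.arange(1, max_lag + 1)
    return n * (n + 2) * np.sum(r**2 / (n - k))


rng = np.random.default_rng(0)
white = rng.standard_normal(2000)  # stand-in for well-behaved residuals
ar1 = np.zeros(2000)               # stand-in for residuals with leftover AR(1) structure
for t in range(1, 2000):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]
print(ljung_box_q(white, 10), ljung_box_q(ar1, 10))
```

For the white series Q should be on the order of its degrees of freedom, while the AR(1) series produces a Q in the thousands, flagging the structure the model failed to capture.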

VisPy: Harnessing The GPU For Fast, High-Level Visualization

The growing availability of large, multidimensional data sets has created demand for high-performance, interactive visualization tools. VisPy leverages the GPU to provide fast, interactive, and beautiful visualizations in a high-level API.
Luke Campagnola, Almar Klein, Eric Larson, +2

PyRK: A Python Package For Nuclear Reactor Kinetics

In this work, a new Python package, PyRK (Python for Reactor Kinetics), is introduced. PyRK has been designed to simulate, in zero dimensions, the transient, coupled thermal-hydraulics and neutronics of time-dependent behavior in nuclear reactors.
Kathryn Huff

Automated Image Quality Monitoring with IQMon

Automated telescopes are capable of generating images more quickly than a human can inspect them, but detailed information on telescope performance is valuable for monitoring and tuning their operation.
Josh Walawender

Structural Cohesion: Visualization and Heuristics for Fast Computation with NetworkX and matplotlib

The structural cohesion model is a powerful sociological conception of cohesion in social groups, but its diffusion in the empirical literature has been hampered by computational problems. We present useful heuristics for computing structural cohesion that allow a speed-up of one order of magnitude over the algorithms currently available.
Jordi Torrents, Fabrizio Ferraro

HoloViews: Building Complex Visualizations Easily for Reproducible Science

Scientific visualization typically requires large amounts of custom coding that obscures the underlying principles of the work and makes it difficult to reproduce the results. Here we describe how the new HoloViews Python package, when combined with the IPython Notebook and a plotting library, provides a rich, interactive interface for flexible and nearly code-free visualization of your results while storing a full record of the process for later reproduction.
Jean-Luc R. Stevens, Philipp Rudiger, James A. Bednar

Mesa: An Agent-Based Modeling Framework

Agent-based modeling is a computational methodology used in social science, biology, and other fields, which involves simulating the behavior and interaction of many autonomous entities, or agents, over time.
David Masad, Jacqueline Kazil
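The methodology described above (autonomous agents following simple rules and interacting over discrete time steps) can be sketched in a few lines of plain Python. This is a toy wealth-exchange model; the classes `MoneyAgent` and `WealthModel` are hypothetical and are not Mesa's API. Each tick activates every agent once in random order, and an agent with wealth gives one unit to a randomly chosen other agent.

```python
import random


class MoneyAgent:
    """An autonomous agent holding integer units of wealth (hypothetical)."""

    def __init__(self, unique_id, wealth=1):
        self.unique_id = unique_id
        self.wealth = wealth

    def step(self, model):
        # Interaction rule: an agent with wealth gives one unit to a random other agent.
        others = [a for a in model.agents if a is not self]
        if self.wealth > 0 and others:
            partner = model.rng.choice(others)
            partner.wealth += 1
            self.wealth -= 1


class WealthModel:
    """Holds the agents and advances the simulation one tick at a time."""

    def __init__(self, n_agents, seed=None):
        self.rng = random.Random(seed)  # seeded RNG for reproducible runs
        self.agents = [MoneyAgent(i) for i in range(n_agents)]

    def step(self):
        # Activate every agent once per tick, in a freshly shuffled order.
        order = list(self.agents)
        self.rng.shuffle(order)
        for agent in order:
            agent.step(self)


model = WealthModel(n_agents=50, seed=42)
for _ in range(100):
    model.step()
print(sorted(a.wealth for a in model.agents)[-5:])  # wealth of the five richest agents
```

Even though every agent starts with identical wealth and follows the same rule, an unequal distribution emerges over time, which is the kind of aggregate behavior agent-based models are used to study.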

Circumventing The Linker: Using SciPy's BLAS and LAPACK Within Cython

BLAS, LAPACK, and other libraries like them have formed the underpinnings of much of the scientific stack in Python. Until now, the standard practice in many packages for using BLAS and LAPACK has been to link each Python extension directly against the libraries needed.
Ian Henriksen

The James Webb Space Telescope Data Calibration Pipeline

The James Webb Space Telescope (JWST) is the successor to the Hubble Space Telescope (HST) and is currently expected to be launched in late 2018. The Space Telescope Science Institute (STScI) is developing the software systems that will be used to provide routine calibration of the science data received from JWST.
Howard Bushouse, Michael Droettboom, Perry Greenfield

Creating a Real-Time Recommendation Engine using Modified K-Means Clustering and Remote Sensing Signature Matching Algorithms

Built on Google App Engine (GAE), RealMassive encountered challenges while attempting to scale its recommendation engine to match its nationwide, multi-market expansion. To address this problem, we borrowed a conceptual model from spectral data processing to transform our domain-specific problem into one that GAE's search engine could solve.
David Lippa, Jason Vertrees

Scientific Data Analysis and Visualization with Python, VTK, and ParaView

VTK and ParaView are leading software packages for data analysis and visualization. Since their early years, Python has played an important role in each package. In many use cases, VTK and ParaView serve as modules used by Python applications.
Cory Quammen

PyEDA: Data Structures and Algorithms for Electronic Design Automation

This paper introduces PyEDA, a Python library for electronic design automation (EDA). PyEDA provides both a high-level interface to the representation of Boolean functions and blazingly fast C extensions for fundamental algorithms where performance is essential.
Chris Drake

librosa: Audio and Music Signal Analysis in Python

This document describes version 0.4.0 of librosa: a Python package for audio and music signal processing. At a high level, librosa provides implementations of a variety of common functions used throughout the field of music information retrieval.
Brian McFee, Colin Raffel, Dawen Liang, +4

Python as a First Programming Language for Biomedical Scientists

We have been teaching Python to biomedical scientists since 2005. In all, seven courses have been taught, five of them at the University of Pittsburgh as a required course for biomedical informatics graduate students.
Brian E. Chapman, Ph.D., Jeannie Irwin, Ph.D.

pgmpy: Probabilistic Graphical Models using Python

Probabilistic graphical models (PGMs) are a technique for compactly representing a joint distribution by exploiting dependencies between the random variables. They also allow inference on joint distributions to be carried out more cheaply than with traditional methods.
Ankur Ankan, Abinash Panda
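To make the "compact representation" idea concrete, the sketch below encodes a hypothetical chain A -> B -> C over binary variables with plain Python dictionaries (not pgmpy's API; all names and probabilities are illustrative). A full joint table over three binary variables needs 2^3 - 1 = 7 free parameters, while the factorized chain needs 1 + 2 + 2 = 5. Inference on P(C) by summing out one variable at a time gives the same answer as enumerating the whole joint.

```python
from itertools import product

# Hypothetical chain A -> B -> C over binary variables (states 0 and 1).
p_a = {0: 0.6, 1: 0.4}                                    # P(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # P(B | A), indexed [a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # P(C | B), indexed [b][c]


def joint(a, b, c):
    """Factorized joint distribution: P(A, B, C) = P(A) P(B | A) P(C | B)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def marginal_c_brute_force(c):
    # Enumerate every (a, b) setting and sum the full joint.
    return sum(joint(a, b, c) for a, b in product((0, 1), repeat=2))


def marginal_c_elimination(c):
    # Variable elimination: sum out A to get P(B), then sum out B.
    p_b = {b: sum(p_a[a] * p_b_given_a[a][b] for a in (0, 1)) for b in (0, 1)}
    return sum(p_b[b] * p_c_given_b[b][c] for b in (0, 1))


print(round(marginal_c_elimination(0), 6), round(marginal_c_brute_force(0), 6))  # 0.7 0.7
```

Brute-force enumeration grows exponentially with the number of variables, whereas elimination along a chain only ever touches small local tables, which is where the computational savings come from.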

Will Millennials Ever Get Married?

Using data from the National Survey of Family Growth (NSFG), we investigate marriage patterns among women in the United States. We describe and predict age at first marriage for successive generations based on decade of birth.
Allen B. Downey