Proceedings of SciPy 2018

SciPy 2018, the 17th annual Scientific Computing with Python conference, was held July 9-15, 2018 in Austin, Texas. Twenty-four peer-reviewed articles were published in the conference proceedings.

Yaksh: Facilitating Learning by Doing

Yaksh is a free and open-source online evaluation platform. At its core, Yaksh focuses on problem-based learning and lets teachers create practice exercises and quizzes which are evaluated in real-time.
Prabhu Ramachandran, Prathamesh Salunke, Ankit Javalkar, +3

signac: A Python framework for data and workflow management

Computational research requires versatile data and workflow management tools that can easily adapt to the highly dynamic requirements of scientific investigations. Many existing tools require strict adherence to a particular usage pattern, so researchers often use less robust ad hoc solutions that they find easier to adopt.
Vyas Ramasubramani, Carl S. Adorf, Paul M. Dodd, +2

Scalable Feature Extraction with Aerial and Satellite Imagery

Deep learning techniques have greatly advanced the performance of the already rapidly developing field of computer vision, which powers a variety of emerging technologies—from facial recognition to augmented reality to self-driving cars.
Virginia Ng, Daniel Hofmann

A Bayesian’s journey to a better research workflow

This work began when the two authors met at a software development meeting. Konstantinos was building Bayesian models in his research and wanted to learn how to better manage his research process. Marianne was working on data analysis workflows in industry and wanted to learn more about Bayesian statistics.
Konstantinos Vamvourellis, Marianne Corvellec

Design and Implementation of pyPRISM: A Polymer Liquid-State Theory Framework

In this work, we describe the code structure, implementation, and usage of a Python-based, open-source framework, pyPRISM, for conducting polymer liquid-state theory calculations. Polymer Reference Interaction Site Model (PRISM) theory describes the equilibrium spatial correlations, thermodynamics, and structure of liquid-like polymer systems and macromolecular materials.
Tyler B. Martin, Thomas E. Gartner III, Ronald L. Jones, +2

Spatio-temporal analysis of socioeconomic neighborhoods: The Open Source Longitudinal Neighborhood Analysis Package (OSLNAP)

The neighborhood effects literature represents a wide span of the social sciences broadly concerned with the influence of spatial context on social processes. From the study of segregation dynamics and the relationships between the built environment and health outcomes to the impact of concentrated poverty on social efficacy, neighborhoods are a central construct in empirical work.
Sergio Rey, Elijah Knaap, Su Han, +2

Binder 2.0 - Reproducible, interactive, sharable environments for science at scale

Binder is an open source web service that lets users create sharable, interactive, reproducible environments in the cloud. It is powered by other core projects in the open source ecosystem, including JupyterHub and Kubernetes for managing cloud resources.
Project Jupyter, Matthias Bussonnier, Jessica Forde, +12

Harnessing the Power of Scientific Python to Investigate Biogeochemistry and Metaproteomes of the Central Pacific Ocean

Oceanographic expeditions commonly generate millions of data points for various chemical, biological, and physical features, all in different formats. Scientific Python tools are extremely useful for synthesizing this data to make sense of major trends in the changing ocean environment.
Noelle A. Held, Jaclyn K. Saunders, Joe Futrelle, +1

Organic Molecules in Space: Insights from the NASA Ames Molecular Database in the era of the James Webb Space Telescope

We present the software tool pyPAHdb to the scientific astronomical community, which is used to characterize emission from one of the most prevalent types of organic molecules in space, namely polycyclic aromatic hydrocarbons (PAHs).
Matthew J. Shannon, Christiaan Boersma

Real-Time Digital Signal Processing Using pyaudio_helper and the ipywidgets

The focus of this paper is on teaching real-time digital signal processing to electrical and computer engineers using the Jupyter notebook and the code module `pyaudio_helper`, which is a component of the package scikit-dsp-comm.
Mark Wickert

Exploring the Extended Kalman Filter for GPS Positioning Using Simulated User and Satellite Track Data

This paper describes a Python computational tool for exploring the use of the extended Kalman filter (EKF) for position estimation using the Global Positioning System (GPS) pseudorange measurements. The development was motivated by the need for an example generator in a training class on Kalman filtering, with emphasis on GPS.
Mark Wickert, Chiranth Siddappa
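The tool above is built around the extended Kalman filter's predict/update cycle. As a much simpler illustration of that cycle, here is a minimal one-dimensional *linear* Kalman filter with a constant-position model; the function name, noise parameters, and synthetic data are assumptions for this sketch, not taken from the paper:

```python
import numpy as np

def kalman_1d(measurements, q=1e-4, r=0.5**2):
    """Minimal 1-D Kalman filter assuming a constant-position model."""
    x, p = 0.0, 1.0          # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + q            # predict: variance grows by process noise q
        k = p / (p + r)      # Kalman gain from predicted vs. measurement noise r
        x = x + k * (z - x)  # update: pull the estimate toward measurement z
        p = (1.0 - k) * p    # update: shrink the variance
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(0)
true_value = 10.0
noisy = true_value + rng.normal(0.0, 0.5, size=200)
est = kalman_1d(noisy)
```

The EKF used for GPS differs in that the pseudorange measurement model is nonlinear, so the gain is computed from a Jacobian linearized about the current estimate, but the predict/update structure is the same.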

WrightSim: Using PyCUDA to Simulate Multidimensional Spectra

Nonlinear multidimensional spectroscopy (MDS) is a powerful experimental technique used to interrogate complex chemical systems. MDS promises to reveal energetics, dynamics, and coupling features of and between the many quantum-mechanical states that these systems contain.
Kyle F Sunden, Blaise J Thompson, John C Wright

Bringing ipywidgets Support to plotly.py

Plotly.js is a declarative JavaScript data visualization library built on D3 and WebGL that supports a wide range of statistical, scientific, financial, geographic, and 3-dimensional visualizations. Support for creating Plotly.js figures from Python is provided by the plotly.py library, to which this work adds ipywidgets integration.
Jon Mease

Sparse: A more modern sparse array library

This paper is about sparse multi-dimensional arrays in Python. We discuss their applications, layouts, and current implementations in the SciPy ecosystem along with strengths and weaknesses. We then introduce a new package for sparse arrays that builds on the legacy of scipy.sparse.
Hameer Abbasi
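To make the sparse-array layouts discussed above concrete, here is a toy sketch of the COO ("coordinate") layout, in which only the nonzero entries are stored as parallel arrays of coordinates and values. This is an illustration of the storage scheme only, not the API of the package described in the paper:

```python
import numpy as np

# COO layout: one row of coordinates per dimension, plus the nonzero values.
coords = np.array([[0, 0, 2],   # axis-0 indices of the nonzeros
                   [1, 3, 2]])  # axis-1 indices of the nonzeros
data = np.array([4.0, 5.0, 7.0])
shape = (3, 4)

def to_dense(coords, data, shape):
    """Scatter the stored nonzeros into a dense array of zeros."""
    dense = np.zeros(shape, dtype=data.dtype)
    dense[tuple(coords)] = data
    return dense

dense = to_dense(coords, data, shape)
```

Because only three of the twelve entries are stored, memory use scales with the number of nonzeros rather than with the full shape, which is what makes such layouts attractive for large, mostly-empty arrays.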

Text and data mining scientific articles with allofplos

Mining scientific articles is hard when many of them are inaccessible behind paywalls. The Public Library of Science (PLOS) is a non-profit Open Access science publisher of the single largest journal (PLOS ONE), whose articles are all freely available to read and re-use.
Elizabeth Seiver, M Pacer, Sebastian Bassi

Safe handling instructions for missing data

In machine learning tasks, it is common to handle missing data by removing observations with missing values, or replacing missing data with the mean value for its feature. To show why this is problematic, we use listwise deletion and mean imputation to recover missing values from artificially created datasets, and we compare those models against ones with full information.
Dillon Niederhut
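A toy demonstration of why mean imputation is problematic (synthetic data, not the paper's actual experiment): filling missing entries with the feature mean preserves the mean but shrinks the variance by roughly the fraction of values imputed, which distorts any downstream model that depends on the spread of the data.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=1000)

# Knock out ~30% of the values completely at random.
mask = rng.random(x.size) < 0.3
x_missing = x.copy()
x_missing[mask] = np.nan

# Mean imputation: fill every missing entry with the observed mean.
observed_mean = np.nanmean(x_missing)
x_imputed = np.where(np.isnan(x_missing), observed_mean, x_missing)

# The mean is preserved exactly, but the standard deviation is
# biased downward: the imputed points contribute zero spread.
print(x.std(), x_imputed.std())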

EarthSim: Flexible Environmental Simulation Workflows Entirely Within Jupyter Notebooks

Building environmental simulation workflows is typically a slow process involving multiple proprietary desktop tools that do not interoperate well. In this work, we demonstrate building flexible, lightweight workflows entirely in Jupyter notebooks.
Dharhas Pothina, Philipp J. F. Rudiger, James A Bednar, +5

Practical Applications of Astropy

Packages developed under the auspices of the Astropy Project (astropy2013, astropy2018) address many common problems faced by astronomers in their computational projects. In this paper we describe how capabilities provided by Astropy have been employed in two current projects.
David Shupe, Frank Masci, Russ Laher, +2

Developing a Start-to-Finish Pipeline for Accelerometer-Based Activity Recognition Using Long Short-Term Memory Recurrent Neural Networks

Increased prevalence of smartphones and wearable devices has facilitated the collection of triaxial accelerometer data for numerous Human Activity Recognition (HAR) tasks. Concurrently, advances in the theory and implementation of long short-term memory (LSTM) recurrent neural networks (RNNs) has made it possible to process this data in its raw form, enabling on-device online analysis.
Christian McDaniel, Shannon Quinn

The Econ-ARK and HARK: Open Source Tools for Computational Economics

The Economics Algorithmic Repository and toolKit (Econ-ARK) aims to become a focal resource for computational economics. Its first ‘framework,’ the Heterogeneous Agent Resources and Toolkit (HARK), provides a modern, robust, transparent set of tools to solve a class of macroeconomic models whose usefulness has become increasingly apparent both for economic policy and for research purposes, but whose adoption has been limited because the existing literature derives from idiosyncratic, hand-crafted, and often impenetrable legacy code.
Christopher D. Carroll, Alexander M. Kaufman, Jacqueline L. Kazil, +2

Composable Multi-Threading and Multi-Processing for Numeric Libraries

Python is popular among scientific communities that value its simplicity and power, especially as it comes along with numeric libraries such as NumPy, SciPy, Dask, and Numba. As CPU core counts keep increasing, these modules can make use of many cores via multi-threading for efficient multi-core parallelism.
Anton Malakhov, David Liu, Anton Gorshkov, +1

Equity, Scalability, and Sustainability of Data Science Infrastructure

We seek to understand the current state of equity, scalability, and sustainability of data science education infrastructure in both the U.S. and Canada. Our analysis of the technological, funding, and organizational structure of four types of institutions shows an increasing divergence in the ability of universities across the United States to provide students with accessible data science education infrastructure, primarily JupyterHub.
Anthony Suen, Laura Norén, Alan Liang, +1

Dynamic Social Network Modeling of Diffuse Subcellular Morphologies

The use of fluorescence microscopy has catalyzed new insights into biological function, and spurred the development of quantitative models from rich biomedical image datasets. While image processing in some capacity is commonplace for extracting and modeling quantitative knowledge from biological systems at varying scales, general-purpose approaches for more advanced modeling are few.
Andrew Durden, Allyson T Loy, Barbara Reaves, +5

Cloudknot: A Python Library to Run your Existing Code on AWS Batch

We introduce Cloudknot, a software library that simplifies cloud-based distributed computing by programmatically executing user-defined functions (UDFs) in AWS Batch. It takes as input a Python function, packages it as a container, creates all the necessary AWS constituent resources to submit jobs, monitors their execution and gathers the results, all from within the Python environment.
Adam Richie-Halford, Ariel Rokem