Contents
Proceedings of SciPy 2018
SciPy 2018, the 17th annual Scientific Computing with Python conference, was held July 9-15, 2018 in Austin, Texas. 24 peer-reviewed articles were published in the conference proceedings. Full proceedings and organizing committee can be found at https://
Yaksh: Facilitating Learning by Doing
Yaksh is a free and open-source online evaluation platform. At its core, Yaksh focuses on problem-based learning and lets teachers create practice exercises and quizzes which are evaluated in real-time.
Prabhu Ramachandran, Prathamesh Salunke, Ankit Javalkar, +3
https://doi.org/10.25080/Majora-4af1f417-017
signac: A Python framework for data and workflow management
Computational research requires versatile data and workflow management tools that can easily adapt to the highly dynamic requirements of scientific investigations. Many existing tools require strict adherence to a particular usage pattern, so researchers often use less robust ad hoc solutions that they find easier to adopt.
Vyas Ramasubramani, Carl S. Adorf, Paul M. Dodd, +2
https://doi.org/10.25080/Majora-4af1f417-016
Scalable Feature Extraction with Aerial and Satellite Imagery
Deep learning techniques have greatly advanced the performance of the already rapidly developing field of computer vision, which powers a variety of emerging technologies—from facial recognition to augmented reality to self-driving cars.
Virginia Ng, Daniel Hofmann
https://doi.org/10.25080/Majora-4af1f417-015
A Bayesian’s journey to a better research workflow
This work began when the two authors met at a software development meeting. Konstantinos was building Bayesian models in his research and wanted to learn how to better manage his research process. Marianne was working on data analysis workflows in industry and wanted to learn more about Bayesian statistics.
Konstantinos Vamvourellis, Marianne Corvellec
https://doi.org/10.25080/Majora-4af1f417-014
Design and Implementation of pyPRISM: A Polymer Liquid-State Theory Framework
In this work, we describe the code structure, implementation, and usage of a Python-based, open-source framework, pyPRISM, for conducting polymer liquid-state theory calculations. Polymer Reference Interaction Site Model (PRISM) theory describes the equilibrium spatial correlations, thermodynamics, and structure of liquid-like polymer systems and macromolecular materials.
Tyler B. Martin, Thomas E. Gartner III, Ronald L. Jones, +2
https://doi.org/10.25080/Majora-4af1f417-013
Spatio-temporal analysis of socioeconomic neighborhoods: The Open Source Longitudinal Neighborhood Analysis Package (OSLNAP)
The neighborhood effects literature represents a wide span of the social sciences broadly concerned with the influence of spatial context on social processes. From the study of segregation dynamics and the relationships between the built environment and health outcomes to the impact of concentrated poverty on social efficacy, neighborhoods are a central construct in empirical work.
Sergio Rey, Elijah Knaap, Su Han, +2
https://doi.org/10.25080/Majora-4af1f417-012
Binder 2.0 - Reproducible, interactive, sharable environments for science at scale
Binder is an open source web service that lets users create sharable, interactive, reproducible environments in the cloud. It is powered by other core projects in the open source ecosystem, including JupyterHub and Kubernetes for managing cloud resources.
Project Jupyter, Matthias Bussonnier, Jessica Forde, +12
https://doi.org/10.25080/Majora-4af1f417-011
Harnessing the Power of Scientific Python to Investigate Biogeochemistry and Metaproteomes of the Central Pacific Ocean
Oceanographic expeditions commonly generate millions of data points for various chemical, biological, and physical features, all in different formats. Scientific Python tools are extremely useful for synthesizing this data to make sense of major trends in the changing ocean environment.
Noelle A. Held, Jaclyn K. Saunders, Joe Futrelle, +1
https://doi.org/10.25080/Majora-4af1f417-010
Organic Molecules in Space: Insights from the NASA Ames Molecular Database in the era of the James Webb Space Telescope
We present pyPAHdb, a software tool for the astronomical community that is used to characterize emission from one of the most prevalent types of organic molecules in space, namely polycyclic aromatic hydrocarbons (PAHs).
Matthew J. Shannon, Christiaan Boersma
https://doi.org/10.25080/Majora-4af1f417-00f
Real-Time Digital Signal Processing Using pyaudio_helper and the ipywidgets
The focus of this paper is on teaching real-time digital signal processing to electrical and computer engineers using the Jupyter notebook and the code module `pyaudio_helper`, which is a component of the package scikit-dsp-comm.
Mark Wickert
https://doi.org/10.25080/Majora-4af1f417-00e
Exploring the Extended Kalman Filter for GPS Positioning Using Simulated User and Satellite Track Data
This paper describes a Python computational tool for exploring the use of the extended Kalman filter (EKF) for position estimation using the Global Positioning System (GPS) pseudorange measurements. The development was motivated by the need for an example generator in a training class on Kalman filtering, with emphasis on GPS.
Mark Wickert, Chiranth Siddappa
https://doi.org/10.25080/Majora-4af1f417-00d
WrightSim: Using PyCUDA to Simulate Multidimensional Spectra
Nonlinear multidimensional spectroscopy (MDS) is a powerful experimental technique used to interrogate complex chemical systems. MDS promises to reveal energetics, dynamics, and coupling features of and between the many quantum-mechanical states that these systems contain.
Kyle F Sunden, Blaise J Thompson, John C Wright
https://doi.org/10.25080/Majora-4af1f417-00c
Bringing ipywidgets Support to plotly.py
Plotly.js is a declarative JavaScript data visualization library built on D3 and WebGL that supports a wide range of statistical, scientific, financial, geographic, and 3-dimensional visualizations. Support for creating Plotly.
Jon Mease
https://doi.org/10.25080/Majora-4af1f417-00b
Sparse: A more modern sparse array library
This paper is about sparse multi-dimensional arrays in Python. We discuss their applications, layouts, and current implementations in the SciPy ecosystem along with strengths and weaknesses. We then introduce a new package for sparse arrays that builds on the legacy of the scipy.
Hameer Abbasi
https://doi.org/10.25080/Majora-4af1f417-00a
Text and data mining scientific articles with allofplos
Mining scientific articles is hard when many of them are inaccessible behind paywalls. The Public Library of Science (PLOS) is a non-profit Open Access science publisher of the single largest journal (PLOS ONE), whose articles are all freely available to read and re-use.
Elizabeth Seiver, M Pacer, Sebastian Bassi
https://doi.org/10.25080/Majora-4af1f417-009
Safe handling instructions for missing data
In machine learning tasks, it is common to handle missing data by removing observations with missing values, or replacing missing data with the mean value for its feature. To show why this is problematic, we use listwise deletion and mean imputation to recover missing values from artificially created datasets, and we compare those models against ones with full information.
Dillon Niederhut
https://doi.org/10.25080/Majora-4af1f417-008
EarthSim: Flexible Environmental Simulation Workflows Entirely Within Jupyter Notebooks
Building environmental simulation workflows is typically a slow process involving multiple proprietary desktop tools that do not interoperate well. In this work, we demonstrate building flexible, lightweight workflows entirely in Jupyter notebooks.
Dharhas Pothina, Philipp J. F. Rudiger, James A Bednar, +5
https://doi.org/10.25080/Majora-4af1f417-007
Practical Applications of Astropy
Packages developed under the auspices of the Astropy Project (astropy2013, astropy2018) address many common problems faced by astronomers in their computational projects. In this paper we describe how capabilities provided by Astropy have been employed in two current projects.
David Shupe, Frank Masci, Russ Laher, +2
https://doi.org/10.25080/Majora-4af1f417-006
Developing a Start-to-Finish Pipeline for Accelerometer-Based Activity Recognition Using Long Short-Term Memory Recurrent Neural Networks
Increased prevalence of smartphones and wearable devices has facilitated the collection of triaxial accelerometer data for numerous Human Activity Recognition (HAR) tasks. Concurrently, advances in the theory and implementation of long short-term memory (LSTM) recurrent neural networks (RNNs) have made it possible to process this data in its raw form, enabling on-device online analysis.
Christian McDaniel, Shannon Quinn
https://doi.org/10.25080/Majora-4af1f417-005
The Econ-ARK and HARK: Open Source Tools for Computational Economics
The Economics Algorithmic Repository and toolKit (Econ-ARK) aims to become a focal resource for computational economics. Its first ‘framework,’ the Heterogeneous Agent Resources and Toolkit (HARK), provides a modern, robust, transparent set of tools to solve a class of macroeconomic models whose usefulness has become increasingly apparent both for economic policy and for research purposes, but whose adoption has been limited because the existing literature derives from idiosyncratic, hand-crafted, and often impenetrable legacy code.
Christopher D. Carroll, Alexander M. Kaufman, Jacqueline L. Kazil, +2
https://doi.org/10.25080/Majora-4af1f417-004
Composable Multi-Threading and Multi-Processing for Numeric Libraries
Python is popular among scientific communities that value its simplicity and power, especially because it comes with numeric libraries such as NumPy, SciPy, Dask, and Numba. As CPU core counts keep increasing, these modules can make use of many cores via multi-threading for efficient multi-core parallelism.
Anton Malakhov, David Liu, Anton Gorshkov, +1
https://doi.org/10.25080/Majora-4af1f417-003
Equity, Scalability, and Sustainability of Data Science Infrastructure
We seek to understand the current state of equity, scalability, and sustainability of data science education infrastructure in both the U.S. and Canada. Our analysis of the technological, funding, and organizational structure of four types of institutions shows an increasing divergence in the ability of universities across the United States to provide students with accessible data science education infrastructure, primarily JupyterHub.
Anthony Suen, Laura Norén, Alan Liang, +1
https://doi.org/10.25080/Majora-4af1f417-002
Dynamic Social Network Modeling of Diffuse Subcellular Morphologies
The use of fluorescence microscopy has catalyzed new insights into biological function and spurred the development of quantitative models from rich biomedical image datasets. While image processing in some capacity is commonplace for extracting and modeling quantitative knowledge from biological systems at varying scales, general-purpose approaches for more advanced modeling are few.
Andrew Durden, Allyson T Loy, Barbara Reaves, +5
https://doi.org/10.25080/Majora-4af1f417-000
Cloudknot: A Python Library to Run your Existing Code on AWS Batch
We introduce Cloudknot, a software library that simplifies cloud-based distributed computing by programmatically executing user-defined functions (UDFs) in AWS Batch. It takes as input a Python function, packages it as a container, creates all the necessary AWS constituent resources to submit jobs, monitors their execution and gathers the results, all from within the Python environment.
Adam Richie-Halford, Ariel Rokem
https://doi.org/10.25080/Majora-4af1f417-001