Proceedings of SciPy 2019

SciPy 2019, the 18th annual Scientific Computing with Python conference, was held July 8-14, 2019 in Austin, Texas. Twenty peer-reviewed articles were published in the conference proceedings. The full proceedings, posters and slides, and organizing committee can be found at https://proceedings.scipy.org/articles/Majora-7ddc1dd1-026.

PMDA - Parallel Molecular Dynamics Analysis

MDAnalysis is an object-oriented Python library to analyze trajectories from molecular dynamics (MD) simulations in many popular formats. With the development of highly optimized MD software packages on high performance computing (HPC) resources, simulation trajectories are growing to many terabytes in size.
Shujie Fan, Max Linke, Ioannis Paraskevakos, +3
https://doi.org/10.25080/Majora-7ddc1dd1-013

Visualization of Bioinformatics Data with Dash Bio

Plotly's Dash is a library that empowers data scientists to create interactive web applications declaratively in Python. Dash Bio is a bioinformatics-oriented suite of components that are compatible with Dash.
Shammamah Hossain
https://doi.org/10.25080/Majora-7ddc1dd1-012

Better and faster hyperparameter optimization with Dask

Nearly every machine learning model requires hyperparameters: parameters that the user must specify before training begins and that influence model performance. Finding the optimal set of hyperparameters is often a time- and resource-consuming process.
Scott Sievert, Tom Augspurger, Matthew Rocklin
https://doi.org/10.25080/Majora-7ddc1dd1-011

PyDDA: A new Pythonic Wind Retrieval Package

PyDDA is a new community framework aimed at wind retrievals that depends only upon utilities in the SciPy ecosystem such as scipy, numpy, and dask. It can support retrievals of winds using information from weather radar networks constrained by high resolution forecast models over grids that cover thousands of kilometers at kilometer-scale resolution.
Robert Jackson, Scott Collis, Timothy Lang, +2
https://doi.org/10.25080/Majora-7ddc1dd1-010

Parkinson's Classification and Feature Extraction from Diffusion Tensor Images

Parkinson’s disease (PD) affects over 6.2 million people around the world. Despite its prevalence, there is still no cure, and diagnostic methods are extremely subjective, relying on observation of physical motor symptoms and response to treatment protocols.
Rajeswari Sivakumar, Shannon Quinn
https://doi.org/10.25080/Majora-7ddc1dd1-00f

PyLZJD: An Easy to Use Tool for Machine Learning

As Machine Learning (ML) becomes more widely known and popular, so too does the desire for new users from other backgrounds to apply ML techniques to their own domains. A difficult prerequisite that often confounds new users is the feature creation and engineering process.
Edward Raff, Joe Aurelio, Charles Nicholas
https://doi.org/10.25080/Majora-7ddc1dd1-00e

Parameter Estimation Using the Python Package pymcmcstat

A Bayesian approach to solving inverse problems provides insight regarding model limitations as well as the underlying model and observation uncertainty. In this paper we introduce pymcmcstat, which provides a wide variety of tools for estimating unknown parameter distributions.
Paul R. Miles, Ralph C. Smith
https://doi.org/10.25080/Majora-7ddc1dd1-00d

An intelligent shopping list based on the application of partitioning and machine learning algorithms

A grocery list is an integral part of the shopping experience of many consumers. Several mobile retail studies of grocery apps indicate that potential customers place the highest priority on features that help them to create and manage personalized shopping lists.
Nadia Tahiri, Bogdan Mazoure, Vladimir Makarenkov
https://doi.org/10.25080/Majora-7ddc1dd1-00c

A Real-Time 3D Audio Simulator for Cognitive Hearing Science

This paper describes the development of a 3D audio simulator for use in cognitive hearing science studies and also for general 3D audio experimentation. The framework that the simulator is built upon is pyaudio_helper, which is a module of the package scikit-dsp-comm.
Mark Wickert
https://doi.org/10.25080/Majora-7ddc1dd1-00b

Optimizing Python-Based Spectroscopic Data Processing on NERSC Supercomputers

We present a case study of optimizing a Python-based cosmology data processing pipeline designed to run in parallel on thousands of cores using supercomputers at the National Energy Research Scientific Computing Center (NERSC).
Laurie A. Stephey, Rollin C. Thomas, Stephen J. Bailey
https://doi.org/10.25080/Majora-7ddc1dd1-00a

Solving Polynomial Systems with phcpy

The solutions of a system of polynomials in several variables are often needed, e.g., in the design of mechanical systems and in phase-space analyses of nonlinear biological dynamics. Reliable, accurate, and comprehensive numerical solutions are available through PHCpack, a FOSS package for solving polynomial systems with homotopy continuation.
Jasmine Otto, Angus Forbes, Jan Verschelde
https://doi.org/10.25080/Majora-7ddc1dd1-009

Case study: Real-world machine learning application for hardware failure detection

When designing microprocessors, engineers must verify whether the proposed design, defined in hardware description language, does what is intended. During this verification process, engineers run simulation tests and can fix bugs if the tests have failed.
Hongsup Shin
https://doi.org/10.25080/Majora-7ddc1dd1-001

Codebraid: Live Code in Pandoc Markdown

Codebraid executes code blocks and inline code in Pandoc Markdown documents as part of the document build process. Code can be executed with a built-in system or Jupyter kernels. Either way, a single document can involve multiple programming languages, as well as multiple independent sessions or processes per language.
Geoffrey M. Poore
https://doi.org/10.25080/Majora-7ddc1dd1-008

pyjanitor: A Cleaner API for Cleaning Data

The pandas library has become the de facto library for data wrangling in the Python programming language. However, inconsistencies in the pandas application programming interface (API), while idiomatic due to historical use, prevent use of expressive, fluent programming idioms that enable self-documenting pandas code.
Eric J. Ma, Zachary Barry, Sam Zuckerman, +1
https://doi.org/10.25080/Majora-7ddc1dd1-007

Developing a Graph Convolution-Based Analysis Pipeline for Multi-Modal Neuroimage Data: An Application to Parkinson's Disease

Parkinson's disease (PD) is a highly prevalent neurodegenerative condition originating in subcortical areas of the brain and resulting in progressively worsening motor, cognitive, and psychiatric (e.g.
Christian McDaniel, Shannon Quinn, PhD
https://doi.org/10.25080/Majora-7ddc1dd1-006

CAF Implementation on FPGA Using Python Tools

The purpose of this project is to provide a real-time geolocation solution by generating code for the complex ambiguity function (CAF) in a hardware description language (HDL) and implementing it on FPGA hardware.
Chiranth Siddappa, Mark Wickert
https://doi.org/10.25080/Majora-7ddc1dd1-005

Analyzing Particle Systems for Machine Learning and Data Visualization with freud

The freud Python library analyzes particle data output from molecular dynamics simulations. The library's design and its variety of high-performance methods make it a powerful tool for many modern applications.
Bradley D. Dice, Vyas Ramasubramani, Eric S. Harper, +3
https://doi.org/10.25080/Majora-7ddc1dd1-004

Accelerating the Advancement of Data Science Education

We outline a synthesis of strategies created in collaboration with 35+ colleges and universities on how to advance undergraduate data science education on a national scale. The four core pillars of this strategy include the integration of data science education across all domains, establishing adoptable and scalable cyberinfrastructure, applying data science to non-traditional domains, and incorporating ethical content into data science curricula.
Eric Van Dusen, Anthony Suen, Alan Liang, +1
https://doi.org/10.25080/Majora-7ddc1dd1-000

Deep and Ensemble Learning to Win the Army RCO AI Signal Classification Challenge

Automatic modulation classification is a challenging problem with multiple applications including cognitive radio and signals intelligence. Most of the existing efforts to solve this problem are only applicable when the signal to noise ratio (SNR) is high and/or long observations of the signal are available.
Andres Vila, Donna Branchevsky, Kyle Logue, +5
https://doi.org/10.25080/Majora-7ddc1dd1-003

Expert RF Feature Extraction to Win the Army RCO AI Signal Classification Challenge

Automatic modulation classification is a challenging problem with multiple applications including cognitive radio and signals intelligence. Most of the existing efforts to solve this problem are only applicable when the signal to noise ratio (SNR) is high and/or long observations of the signal are available.
Kyle Logue, Esteban Valles, Andres Vila, +5
https://doi.org/10.25080/Majora-7ddc1dd1-002