Proceedings of SciPy 2020

SciPy 2020, the 19th annual Scientific Computing with Python conference, was held virtually July 6-12, 2020. The conference proceedings contain 23 peer-reviewed articles.

Towards an Unsupervised Spatiotemporal Representation of Cilia Video Using A Modular Generative Pipeline

Motile cilia are a highly conserved organelle found on the exterior of many human cells. Cilia beat in rhythmic patterns to transport substances or generate signaling gradients. Disruption of these patterns is often indicative of diseases known as ciliopathies, whose consequences can include dysfunction of macroscopic structures within the lungs, kidneys, brain, and other organs.
Meekail Zain, Sonia Rao, Nathan Safir, +6

Falsify your Software: validating scientific code with property-based testing

Where traditional example-based tests check software using manually-specified input-output pairs, property-based tests exploit a general description of valid inputs and program behaviour to automatically search for falsifying examples.
Zac Hatfield-Dodds
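
As a rough illustration of the idea in the abstract above, the following minimal sketch searches randomly generated inputs for one that falsifies a stated property. It uses only the Python standard library rather than any particular property-based testing tool, and all names in it (`find_falsifying_example`, `generate_int_list`, `buggy_sort`, `length_preserved`) are hypothetical, chosen for illustration rather than taken from the paper.

```python
import random


def find_falsifying_example(prop, generate, trials=1000, seed=0):
    """Return a generated input for which `prop` is False, or None if none is found."""
    rng = random.Random(seed)  # fixed seed so the search is reproducible
    for _ in range(trials):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def generate_int_list(rng):
    """General description of valid inputs: short lists of small integers."""
    return [rng.randint(0, 9) for _ in range(rng.randint(0, 10))]


def buggy_sort(xs):
    """Deliberately wrong sort (hypothetical bug): silently drops duplicates."""
    return sorted(set(xs))


def length_preserved(sort_fn):
    """Build the property 'sorting never changes the number of elements'."""
    def prop(xs):
        return len(sort_fn(xs)) == len(xs)
    return prop


# The search should turn up a list containing a repeated value.
counterexample = find_falsifying_example(length_preserved(buggy_sort), generate_int_list)
print("falsifying example for buggy_sort:", counterexample)

# The built-in sorted() satisfies the property, so no counterexample exists.
print("falsifying example for sorted:",
      find_falsifying_example(length_preserved(sorted), generate_int_list))
```

An example-based test would catch this bug only if the author happened to write an input with duplicates; the property states the expectation once and leaves the search for a counterexample to the generator.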

Software Engineering as Research Method: Aligning Roles in Econ-ARK

While general purpose scientific software has enjoyed great success in industry and academia, domain specific scientific software has not yet become well-established in many disciplines where it has potential.
Sebastian Benthall, Mridul Seth

SHADOW: A workflow scheduling algorithm reference and testing framework

As the scale of science projects increases, so does the demand on computing infrastructures. The complexity of science processing pipelines, and the heterogeneity of the environments on which they are run, continue to increase; the algorithmic approaches to executing these applications must therefore be adapted and improved to deal with this increased complexity.
Ryan W. Bunney, Andreas Wicenec, Mark Reynolds

Leading magnetic fusion energy science into the big-and-fast data lane

We present Delta, a Python framework that connects magnetic fusion experiments to high-performance computing (HPC) facilities in order to leverage advanced data analysis for near real-time decisions. Using the ADIOS I/O framework, Delta streams measurement data at over 300 MByte/sec from a remote experimental site in Korea to Cori, a Cray XC-40 supercomputer at the National Energy Research Scientific Computing Center in California.
Ralph Kube, R Michael Churchill, Jong Youl Choi, +5

Pydra - a flexible and lightweight dataflow engine for scientific analyses

This paper presents a new lightweight dataflow engine written in Python: Pydra. Pydra is developed as an open-source project in the neuroimaging community, but it is designed as a general-purpose dataflow engine to support any scientific domain.
Dorota Jarecka, Mathias Goncalves, Christopher J. Markiewicz, +4

Combining Physics-Based and Data-Driven Modeling for Pressure Prediction in Well Construction

A framework for combining physics-based and data-driven models to improve well construction is presented in this study. Additionally, the proposed approach provides a more robust and accurate model that mitigates the disadvantages of using purely physics-based or data-driven models.
Oney Erge, Eric van Oort

pandera: Statistical Data Validation of Pandas Dataframes

pandas is an essential tool in the data scientist’s toolkit for modern data engineering, analysis, and modeling in the Python ecosystem. However, dataframes can often be difficult to reason about in terms of their data types and statistical properties as data is reshaped from its raw form to one that’s ready for analysis.
Niels Bantilan

Having your cake and eating it: Exploiting Python for programmer productivity and performance on micro-core architectures using ePython

Micro-core architectures combine many simple, low memory, low power computing cores together in a single package. These can be used as a co-processor or standalone, but due to the limited on-chip memory and the esoteric nature of the hardware, writing efficient parallel code for these chips is challenging.
Maurice Jamieson, Nick Brown, Sihang Liu

Matched Filter Mismatch Losses in MPSK and MQAM Using Semi-Analytic BEP Modeling

The focus of this paper is the bit error probability (BEP) performance degradation when the transmit and receive pulse shaping filters are mismatched. The modulation schemes considered are MPSK and MQAM.
Mark Wickert, David Peckham

Spectral Analysis of Mitochondrial Dynamics: A Graph-Theoretic Approach to Understanding Subcellular Pathology

Perturbations of organellar structures within a cell are useful indicators of the cell’s response to viral or bacterial invaders. Of the various organelles, mitochondria are meaningful to model because they show distinct migration patterns in the presence of potentially fatal infections, such as tuberculosis.
Marcus Hill, Mojtaba Fazli, Rachel Mattson, +8

High-performance operator evaluations with ease of use: libCEED's Python interface

libCEED is a new lightweight, open-source library for high-performance matrix-free Finite Element computations. libCEED offers a portable interface to high-performance implementations, selectable at runtime, tuned for a variety of current and emerging computational architectures, including CPUs and GPUs.
Valeria Barra, Jed Brown, Jeremy Thompson, +1

Awkward Array: JSON-like data, NumPy-like idioms

NumPy simplifies and accelerates mathematical calculations in Python, but only for rectilinear arrays of numbers. Awkward Array provides a similar interface for JSON-like data: slicing, masking, broadcasting, and performing vectorized math on the attributes of objects, unequal-length nested lists (i.
Jim Pivarski, Ianna Osborne, Pratyush Das, +2

Learning from evolving data streams

Ubiquitous data challenges current machine learning systems to store, handle, and analyze data at scale. Traditionally, this task is tackled by dividing the data into (large) batches. Models are trained on a data batch and then used to obtain predictions.
Jacob Montiel

Boost-histogram: High-Performance Histograms as Objects

Unlike arrays and tables, histograms in Python have usually been denied their own object, and have been represented as a single operation producing several arrays. Boost-histogram is a new Python library that provides histograms that can be filled, manipulated, sliced, and projected as objects.
Henry Schreiner, Hans Dembinski, Shuo Liu, +1

Network visualizations with Pyvis and VisJS

Pyvis is a Python module that enables visualizing and interactively manipulating network graphs in the Jupyter notebook, or as a standalone web application. Pyvis is built on top of the powerful and mature VisJS JavaScript library, which allows for fast and responsive interactions while also abstracting away the low-level JavaScript and HTML.
Giancarlo Perrone, Jose Unpingco, Haw-minn Lu

Introduction to Geometric Learning in Python with Geomstats

There is a growing interest in leveraging differential geometry in the machine learning community. Yet, the adoption of the associated geometric computations has been inhibited by the lack of a reference implementation.
Nina Miolane, Nicolas Guigui, Hadi Zaatiti, +17

Netlist Analysis and Transformations Using SpyDrNet

Digital hardware circuits (i.e., for application specific integrated circuits or field programmable gate array circuits) can contain a large number of discrete components and connections. These connections are defined by a data structure called a "netlist".
Dallin Skouson, Andrew Keller, Michael Wirthlin

Compyle: a Python package for parallel computing

Compyle allows users to execute a restricted subset of Python on a variety of HPC platforms. It is an embedded domain-specific language (eDSL) for parallel computing. It currently supports multi-core execution using Cython, and OpenCL and CUDA for GPU devices.
Aditya Bhosale, Prabhu Ramachandran

HOOMD-blue version 3.0 A Modern, Extensible, Flexible, Object-Oriented API for Molecular Simulations

HOOMD-blue is a library for running molecular dynamics and hard particle Monte Carlo simulations that uses pybind11 to provide a Python interface to fast C++ internals. The package is designed to scale from a single CPU core to thousands of NVIDIA or AMD GPUs.
Brandon L. Butler, Vyas Ramasubramani, Joshua A. Anderson, +1

Fluctuation X-ray Scattering real-time app

The Linac Coherent Light Source (LCLS) at the SLAC National Accelerator Laboratory is an X-ray Free Electron Laser (X-FEL) facility enabling scientists to take snapshots of single macromolecules to study their structure and dynamics.
Antoine Dujardin, Elliott Slaughter, Jeffrey Donatelli, +3

Quasi-orthonormal Encoding for Machine Learning Applications

Most machine learning models, especially artificial neural networks, require numerical, not categorical data. We briefly describe the advantages and disadvantages of common encoding schemes. For example, one-hot encoding is commonly used for attributes with a few unrelated categories and word embeddings for attributes with many related categories (e.
Haw-minn Lu
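
For readers unfamiliar with the one-hot baseline mentioned in the abstract above, here is a minimal sketch in plain Python. It shows only conventional one-hot encoding, not the quasi-orthonormal encoding the paper proposes; the function name `one_hot_encode` and the sample color values are hypothetical.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1.

    Returns the sorted category list (one per vector position) and the
    encoded rows, in the same order as the input values.
    """
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1  # mark the slot belonging to this category
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each category gets its own dimension, so the vector length grows with the number of categories, which is why the scheme suits attributes with only a few unrelated categories.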

Securing Your Collaborative Jupyter Notebooks in the Cloud using Container and Load Balancing Services

Jupyter has become the go-to platform for developing data applications, but data and security concerns, especially when dealing with healthcare, have become paramount for many institutions and applications that handle sensitive information.
Haw-minn Lu, Adrian Kwong, José Unpingco