Proceedings of SciPy 2022

SciPy 2022, the 21st annual Scientific Computing with Python conference, was held in Austin, TX, July 11-17, 2022. The conference proceedings contain 39 peer-reviewed articles.

Automatic random variate generation in Python

The generation of random variates is an important tool that is required in many applications. Various software programs or packages contain generators for standard distributions like the normal, exponential or Gamma, e.g., the programming language R and the packages SciPy and NumPy in Python.
Christoph Baumgarten, Tirth Patel

Global optimization software library for research and education

Machine learning models are often represented by functions given by computer programs. Optimization of such functions is a challenging task because traditional derivative-based optimization methods with guaranteed convergence properties cannot be used.
Nadia Udler

Search for Extraterrestrial Intelligence: GPU Accelerated TurboSETI

A common technique adopted by the Search For Extraterrestrial Intelligence (SETI) community is monitoring electromagnetic radiation for signs of extraterrestrial technosignatures using ground-based radio observatories.
Luigi Cruz, Wael Farah, Richard Elkins

A New Python API for Webots Robotics Simulations

Webots is a popular open-source package for 3D robotics simulations. It can also be used as a 3D interactive environment for other physics-based modeling, virtual reality, teaching or games. Webots has provided a simple API allowing Python programs to control robots and/or the simulated world, but this API is inefficient and does not provide many "pythonic" conveniences.
Justin C. Fisher

poliastro: a Python library for interactive astrodynamics

Space is more popular than ever, with the growing public awareness of interplanetary scientific missions, as well as the increasingly large number of satellite companies planning to deploy satellite constellations.
Juan Luis Cano Rodríguez, Jorge Martínez Garrido

Papyri: better documentation for the scientific ecosystem in Jupyter

We present here the idea behind Papyri, a framework we are developing to provide a better documentation experience for the scientific ecosystem.
Matthias Bussonnier, Camille Carvalho

Experience report of physics-informed neural networks in fluid simulations: pitfalls and frustration

Though PINNs (physics-informed neural networks) are now deemed a complement to traditional CFD (computational fluid dynamics) solvers rather than a replacement, their ability to solve the Navier-Stokes equations without given data is still of great interest.
Pi-Yueh Chuang, Lorena A. Barba

Low Level Feature Extraction for Cilia Segmentation

Cilia are organelles found on the surface of some cells in the human body that sweep rhythmically to transport substances. Dysfunction of ciliary motion is often indicative of diseases known as ciliopathies, which disrupt the functionality of macroscopic structures within the lungs, kidneys and other organs.
Meekail Zain, Eric Miller, Shannon P Quinn, +1

Enabling Active Learning Pedagogy and Insight Mining with a Grammar of Model Analysis

Modern engineering models are complex, with dozens of inputs, uncertainties arising from simplifying assumptions, and dense output data. While major strides have been made in the computational scalability of complex models, relatively less attention has been paid to user-friendly, reusable tools to explore and make sense of these models.
Zachary del Rosario

atoMEC: An open-source average-atom Python code

Average-atom models are an important tool in studying matter under extreme conditions, such as those conditions experienced in planetary cores, brown and white dwarfs, and during inertial confinement fusion.
Timothy J. Callow, Daniel Kotik, Eli Kraisler, +1

Monaco: A Monte Carlo Library for Performing Uncertainty and Sensitivity Analyses

This paper introduces *monaco*, a Python library for conducting Monte Carlo simulations of computational models, and performing uncertainty analysis (UA) and sensitivity analysis (SA) on the results.
W. Scott Shambaugh

A Python Pipeline for Rapid Application Development (RAD)

Rapid Application Development (RAD) is the ability to rapidly prototype an interactive interface through frequent feedback, so that it can be quickly deployed and delivered to stakeholders and customers.
Scott D. Christensen, Marvin S. Brown, Robert B. Haehnel, +6

Variational Autoencoders For Semi-Supervised Deep Metric Learning

Deep metric learning (DML) methods generally do not incorporate unlabelled data. We propose borrowing components of the variational autoencoder (VAE) methodology to extend DML methods to train on semi-supervised datasets.
Nathan Safir, Meekail Zain, Curtis Godwin, +3

Wailord: Parsers and Reproducibility for Quantum Chemistry

Data-driven advances dominate the applied sciences landscape, with quantum chemistry being no exception to the rule. Dataset biases and human error are key bottlenecks in the development of reproducible and generalized insights.
Rohit Goswami

RocketPy: Combining Open-Source and Scientific Libraries to Make the Space Sector More Modern and Accessible

The space sector has seen exponential growth in recent years, with new companies emerging in it. In addition, more people are eager to participate in the aerospace revolution, which motivates students and hobbyists to build more high-powered and sounding rockets.
João Lemes Gribel Soares, Mateus Stano Junqueira, Oscar Mauricio Prada Ramirez, +4

Improving PyDDA's atmospheric wind retrievals using automatic differentiation and Augmented Lagrangian methods

Meteorologists require information about the spatiotemporal distribution of winds in thunderstorms in order to analyze how physical and dynamical processes govern thunderstorm evolution. Knowledge of such processes is vital for predicting severe and hazardous weather events.
Robert Jackson, Rebecca Gjini, Sri Hari Krishna Narayanan, +4

pyDAMPF: a Python package for modeling mechanical properties of hygroscopic materials under interaction with a nanoprobe

Willy Menacho, Gonzalo Marcelo Ramírez-Ávila, Horacio V. Guzman

popmon: Analysis Package for Dataset Shift Detection

popmon is an open-source Python package to check the stability of a tabular dataset.
Simon Brugman, Tomas Sostak, Pradyot Patil, +1

The Geoscience Community Analysis Toolkit: An Open Development, Community Driven Toolkit in the Scientific Python Ecosystem

The Geoscience Community Analysis Toolkit (GeoCAT) team develops and maintains data analysis and visualization tools on structured and unstructured grids for the geosciences community in the Scientific Python Ecosystem (SPE).
Orhan Eroglu, Anissa Zacharias, Michaela Sizemore, +3

Design of a Scientific Data Analysis Support Platform

Software data analytic workflows are a critical aspect of modern scientific research and play a crucial role in testing scientific hypotheses.
Nathan Martindale, Jason Hite, Scott Stewart, +1

Temporal Word Embeddings Analysis for Disease Prevention

Human languages' semantics and structure constantly change over time through mediums such as culturally significant events. By viewing the semantic changes of words during notable events, contexts of existing and novel words can be predicted for similar, current events.
Nathan Jacobi, Ivan Mo, Albert You, +4

Phylogeography: Analysis of genetic and climatic data of SARS-CoV-2

As the SARS-CoV-2 pandemic reaches its peak, researchers around the globe are combining efforts to investigate the genetics of different variants to better deal with its distribution. This paper discusses phylogeographic approaches to examine how patterns of divergence within SARS-CoV-2 coincide with geographic features, such as climatic conditions.
Aleksandr Koshkarov, Wanlin Li, My-Linh Luu, +1

pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling

pyAudioProcessing is a Python-based library for processing audio data, constructing and extracting numerical features from audio, building and testing machine learning models, and classifying data with existing pre-trained audio classification models or custom user-built models.
Jyotika Singh

Likeness: a toolkit for connecting the social fabric of place to human dynamics

The ability to produce richly-attributed synthetic populations is key for understanding human dynamics, responding to emergencies, and preparing for future events, all while protecting individual privacy. The Likeness toolkit accomplishes these goals.
Joseph V. Tuccillo, James D. Gaboardi

Keeping your Jupyter notebook code quality bar high (and production ready) with Ploomber

This paper walks through the ploomber interactive tutorial.
Ido Michael

Awkward Packaging: building Scikit-HEP

Scikit-HEP has grown rapidly over the last few years, not just to serve the needs of the High Energy Physics (HEP) community, but in many ways, the Python ecosystem at large. AwkwardArray, boost-histogram/hist, and iminuit are examples of libraries that are used beyond the original HEP focus. In this paper we will look at key packages in the ecosystem.
Henry Schreiner, Jim Pivarski, Eduardo Rodrigues

Incorporating Task-Agnostic Information in Task-Based Active Learning Using a Variational Autoencoder

It is often much easier and less expensive to collect data than to label it. Active learning (AL) responds to this issue by selecting which unlabeled data are best to label next.
Curtis Godwin, Meekail Zain, Nathan Safir, +2

Codebraid Preview for VS Code: Pandoc Markdown Preview with Jupyter Kernels

Codebraid Preview is a VS Code extension that provides a live preview of Pandoc Markdown documents with optional support for executing embedded code. Unlike typical Markdown previews, all Pandoc features are fully supported because Pandoc itself generates the preview.
Geoffrey M. Poore

Pylira: deconvolution of images in the presence of Poisson noise

All physical and astronomical imaging observations are degraded by the finite angular resolution of the camera and telescope systems. The recovery of the true image is limited by both how well the instrument characteristics are known and by the magnitude of measurement noise.
Axel Donath, Aneta Siemiginowska, Vinay Kashyap, +3

Python vs. the pandemic: a case study in high-stakes software development

When it became clear in early 2020 that COVID-19 was going to be a major public health threat, politicians and public health officials turned to academic disease modelers like us for urgent guidance.
Cliff C. Kerr, Robyn M. Stuart, Dina Mistry, +7

Bayesian Estimation and Forecasting of Time Series in statsmodels

Statsmodels, a Python library for statistical and econometric analysis, has traditionally focused on frequentist inference, including in its models for time series data.
Chad Fulton

USACE Coastal Engineering Toolkit and a Method of Creating a Web-Based Application

In the early 1990s the Automated Coastal Engineering Systems, ACES, was created with the goal of providing state-of-the-art computer-based tools to increase the accuracy, reliability, and cost-effectiveness of Corps coastal engineering endeavors.
Amanda Catlett, Theresa R. Coumbe, Scott D. Christensen, +1

Python for Global Applications: teaching scientific Python in context to law and diplomacy students

For students across domains and disciplines, the message has been communicated loud and clear: data skills are an essential qualification for today’s job market.
Anna Haensch, Karin Knudson

The myth of the normal curve and what to do about it

Reliance on the normal curve as a tool for measurement is almost a given. It shapes our grading systems, our measures of intelligence, and importantly, it forms the mathematical backbone of many of our inferential statistical tests and algorithms.
Allan Campopiano

A Novel Pipeline for Cell Instance Segmentation, Tracking and Motility Classification of Toxoplasma Gondii in 3D Space

Toxoplasma gondii is the parasitic protozoan that causes disseminated toxoplasmosis, a disease that is estimated to infect around one-third of the world's population. TSeg is developed for segmenting, tracking, and classifying the motility phenotypes of T. gondii in 3D microscopic images.
Seyed Alireza Vaezi, Gianni Orlando, Mojtaba Fazli, +3

Utilizing SciPy and other open source packages to provide a powerful API for materials manipulation in the Schrödinger Materials Suite

The use of several open source scientific packages in the Schrödinger Materials Science Suite will be discussed.
Alexandr Fonari, Farshad Fallah, Michael Rauch

Galyleo: A General-Purpose Extensible Visualization Solution

Galyleo is an open-source, extensible dashboarding solution integrated with JupyterLab.
Rick McGeer, Andreas Bergen, Mahdiyar Biazi, +2

Semi-Supervised Semantic Annotator (S3A): Toward Efficient Semantic Labeling

Most semantic image annotation platforms suffer severe bottlenecks when handling large images, complex regions of interest, or numerous distinct foreground regions in a single image. We have developed the Semi-Supervised Semantic Annotator (S3A) to address each of these issues and facilitate rapid collection of ground truth pixel-level labeled data.
Nathan Jessurun, Daniel E. Capecci, Olivia P. Dizon-Paradis, +2

The Advanced Scientific Data Format (ASDF): An Update

We report on progress in developing and extending the new Advanced Scientific Data Format (ASDF), which we have developed for data from the James Webb and Nancy Grace Roman Space Telescopes, since we reported on it at a previous SciPy conference.
Perry Greenfield, Edward Slavich, William Jamieson, +1