
Posters and Slides
Accepted Paper Slides
Building Binary Extensions with pybind11, scikit-build, and cibuildwheel
Building binary extensions is easier than ever thanks to several key libraries. Pybind11 provides a natural C++ language for extensions without requiring pre-processing or special dependencies. Scikit-build ties the premier C++ build system, CMake, into the Python extension build process. And cibuildwheel makes it easy to build highly compatible wheels for over 80 different platforms using CI or on your local machine.
Henry Schreiner, Joe Rickerby, Ralf Grosse-Kunstleve, Wenzel Jakob, Matthieu Darbois, Aaron Gokaslan, Jean-Christophe Fillion-Robin, Matt McCormick
Python Development Schemes for Monte Carlo Neutronics on High Performance Computing
We investigate three methods of hardware acceleration on both GPUs and CPUs for a Monte Carlo neutron transport simulation code written in Python. The acceleration schemes we examine are PyKokkos, Numba, and hardware code-generating libraries such as PyCUDA. This work was supported by the Center for Exascale Monte-Carlo Neutron Transport (CEMeNT), a PSAAP-III project funded by the Department of Energy, grant number DE-NA003967.
Jackson P. Morgan, Kyle E. Niemeyer
Awkward Packaging: Building scikit-HEP
Scikit-HEP has grown rapidly over the last few years, not just to serve the needs of the High Energy Physics (HEP) community, but in many ways, the Python ecosystem at large. AwkwardArray, boost-histogram/hist, and iMinuit are examples of libraries that are used beyond the original HEP focus.
Henry Schreiner, Jim Pivarski, Eduardo Rodrigues
Development of Accessible, Aesthetically-Pleasing Color Sequences
Many types of data visualization, e.g., line plots and scatter plots, utilize a discrete palette of colors, a color sequence, to differentiate between the categories of data being plotted. Unfortunately, many commonly-used color sequences offer poor accessibility to individuals with color-vision deficiencies, using colors that such individuals find difficult to differentiate between. Here, the development of new, accessible color sequences is discussed. As new color sequences must be aesthetically pleasing if they are to see widespread adoption, a crowd-sourced survey was used to estimate aesthetic preference, while accessibility aspects were handled via quantitative analysis.
Matthew A. Petroff
Cutting Edge Climate Science in the Cloud with Pangeo
Climate change is one of the most challenging issues of our time. To prevent the worst outcomes, we need to drastically accelerate the creation and distribution of scientific knowledge. But the complex and massive datasets produced by numerical climate models render the common ‘download and analyze’ workflow inefficient, blocking innovative analysis and fast scientific discoveries. We present Python tools and cloud infrastructure developed within the Pangeo community that enable cutting-edge climate science from a web browser, making it efficient, reproducible, and inclusive. To demonstrate these capabilities, we will reproduce a plot from the IPCC report in a live cloud demonstration.
Julius Busecke
Pylira: deconvolution of images in the presence of Poisson noise
Pylira is a Python package for the deconvolution of images in the presence of Poisson noise. In this presentation I will explain the method in detail, show the setup and API of the Python package, and show application examples using real astronomical data.
Axel Donath, Aneta Siemiginowska, Vinay Kashyap, Douglas Burke, Karthik Reddy Solipuram, David van Dyk
Accelerating Science with the Generative Toolkit for Scientific Discovery (GT4SD)
A presentation about GT4SD: an open-source platform to accelerate hypothesis generation in the scientific discovery process. It provides a library for making state-of-the-art generative AI models easier to use.
GT4SD team
MModel: a modular modeling framework for scientific prototyping
MModel is a Python framework that allows for fast and modular prototyping. The library uses networkx graphs for workflow construction and provides built-in toolkits such as subgraph modification and graph visualization with rich metadata.
Peter Sun, John A. Marohn
Monaco: Quantify Uncertainty and Sensitivities in Your Computational Models with a Monte Carlo Library
Quantify uncertainty and sensitivities in your existing computational models with the “monaco” library. Users define input variables randomly drawn from any of SciPy’s statistical distributions, run their model in parallel anywhere from 1 to millions of times, and postprocess the outputs to obtain meaningful, statistically significant conclusions. This talk goes over why you should always be running Monte Carlo simulations, a demo of how to set up and run a sim, and a crash course in generating relevant plots and statistics.
W. Scott Shambaugh
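The workflow described in the monaco abstract above (draw random inputs, run the model many times, postprocess the outputs) can be sketched with the Python standard library alone. This is a minimal illustration and does not use monaco's API: the `beam_deflection` model, its input distributions, and the function names are all hypothetical, and the cases run serially rather than in parallel.

```python
import random
import statistics


def beam_deflection(load, stiffness):
    """Hypothetical toy model: deflection grows with load, shrinks with stiffness."""
    return load / stiffness


def run_monte_carlo(n_cases, seed=0):
    """Run the toy model n_cases times with randomly drawn inputs."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    outputs = []
    for _ in range(n_cases):
        load = rng.gauss(100.0, 10.0)        # assumed normal input
        stiffness = rng.uniform(40.0, 60.0)  # assumed uniform input
        outputs.append(beam_deflection(load, stiffness))
    return outputs


# Postprocess: summary statistics and an empirical 5%-95% interval.
results = run_monte_carlo(10_000)
ordered = sorted(results)
mean = statistics.fmean(results)
stdev = statistics.stdev(results)
p05 = ordered[int(0.05 * len(ordered))]
p95 = ordered[int(0.95 * len(ordered))]
print(f"mean={mean:.3f} stdev={stdev:.3f} 5%-95% interval=({p05:.3f}, {p95:.3f})")
```

Seeding the generator keeps the run reproducible, which makes it easier to tell a real change in the model from sampling noise.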
UFuncs and DTypes: new possibilities in NumPy
Over the past three years, NumPy has seen large changes to much of its core functionality, including universal functions, casting, and DTypes. The goal of this refactoring was to introduce extensible APIs to improve existing user-defined DTypes and unlock new ones. This refactoring is nearing its conclusion, with the work being surfaced as public-facing API. In this talk we will discuss what has been done and the newly possible applications, such as a custom NumPy DType that is aware of physical units.
Sebastian Berg, Stéfan van der Walt
Per Python ad astra: interactive Astrodynamics with poliastro
This talk presents poliastro, an open-source Python library for interactive Astrodynamics that features an easy-to-use API and tools for quick visualization. poliastro implements core Astrodynamics algorithms and leverages numba and Astropy. During the talk, we will describe the two-layer architecture that allows poliastro to offer an approachable API with good performance, discuss the challenges we faced to validate our code, and comment on the successes and failures of the project in trying to build a rich and diverse community. Source code of poliastro is available at https://
Juan Luis Cano Rodríguez
pyampute: a Python library for data amputation
Amputation is the opposite of imputation; it is the creation of a missing data mask for complete datasets. Amputation is useful for evaluating the effect of missing values on the outcome of a statistical or machine learning model. In this talk, we present pyampute: the first open-source Python library for data amputation. Our package is compatible with the scikit-learn-style fit and transform paradigm, which allows for seamless integration of amputation in a larger, more complex data processing pipeline.
Rianne M Schouten, Davina Zamanzadeh, Prabhant Singh
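The idea of amputation described in the pyampute abstract above, creating a missing data mask for a complete dataset, can be sketched as follows. This is a minimal illustration of the simplest case, where every cell is removed completely at random with the same probability. It is not pyampute's API; the function names and the `prop` parameter are hypothetical.

```python
import random


def make_mcar_mask(n_rows, n_cols, prop, seed=0):
    """Build a boolean mask where True marks a cell to be made missing.

    Each cell is masked independently with probability `prop`.
    """
    rng = random.Random(seed)
    return [[rng.random() < prop for _ in range(n_cols)] for _ in range(n_rows)]


def apply_mask(rows, mask):
    """Return a copy of `rows` with masked cells replaced by None."""
    return [
        [None if masked else value for value, masked in zip(row, mask_row)]
        for row, mask_row in zip(rows, mask)
    ]


# A small complete dataset (hypothetical values).
complete = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]
mask = make_mcar_mask(len(complete), len(complete[0]), prop=0.3, seed=42)
amputed = apply_mask(complete, mask)
print(amputed)
```

Because the complete data are kept alongside the amputed copy, a model fitted on the amputed data can be compared directly against one fitted on the original.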
Scientific Python: From GitHub to TikTok
The Scientific Python project aims to better coordinate the ecosystem and grow the community. This talk focuses on our efforts to expand our community by generating a welcoming and friendly environment where people collaborate, build, and improve together.
Juanita Gomez Romero, Stéfan van der Walt, K. Jarrod Millman, Melissa Weber Mendonça, Inessa Pawson
Scientific Python: By maintainers, for maintainers
Tools for maintainers and how we can help each other.
Pamphile T. Roy, Stéfan van der Walt, K. Jarrod Millman, Melissa Weber Mendonça
Improving random sampling in Python: scipy.stats.sampling and scipy.stats.qmc
Why and how to use scipy.stats.sampling and scipy.stats.qmc?
Pamphile T. Roy, Matt Haberland, Christoph Baumgarten, Tirth Patel
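To give a flavor of the quasi-Monte Carlo sampling that scipy.stats.qmc addresses, the sketch below builds a two-dimensional Halton sequence in pure Python and uses it to estimate pi. It illustrates the idea of a low-discrepancy sequence only; it is not SciPy's implementation, and the function names are hypothetical.

```python
import math


def van_der_corput(index, base):
    """Radical inverse of `index` in `base`: a value in [0, 1)."""
    result, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        result += digit / denom
    return result


def halton(n_points, bases=(2, 3)):
    """First n_points of a Halton sequence, one coprime base per dimension."""
    return [
        tuple(van_der_corput(i, base) for base in bases)
        for i in range(1, n_points + 1)
    ]


def estimate_pi(n_points):
    """Estimate pi from the fraction of points inside the unit quarter circle."""
    inside = sum(1 for x, y in halton(n_points) if x * x + y * y <= 1.0)
    return 4.0 * inside / n_points


print(estimate_pi(2000), math.pi)
```

The points fill the unit square more evenly than independent pseudo-random draws, which is the property that quasi-Monte Carlo methods rely on.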
Petabyte-scale ocean data analytics on staggered grids via the grid ufunc protocol in xGCM
We analysed the highest resolution global ocean simulation to date, using xGCM, xhistogram, and dask.
Thomas Nicholas, Julius Busecke, Ryan Abernathey
Accepted Posters
Optimal Review Assignments for the SciPy Conference Using Binary Integer Linear Programming in SciPy 1.9
Each year, the SciPy Conference receives hundreds of submissions, and dozens of volunteers offer to review them to help make selections for the conference. How should submissions be assigned to reviewers to distribute the work fairly while 1) ensuring that each submission receives at least three reviews, 2) preventing conflicts of interest, and 3) respecting reviewers’ domains of expertise? Binary integer linear programming is an ideal framework for defining and solving ‘scheduling’ or ‘assignment’ problems like this. In this poster, we show how users can formulate and solve problems of this type with new, accessible tools in the scientific Python ecosystem.
Matt Haberland, Nicholas McKibben
Contributing to Open Source Software: From not knowing Python to becoming a Spyder core developer
An overview of the experience of becoming an open source developer, and updates on the work being done in the Spyder IDE project for 2022.
Daniel Althviz Moré
Semi-Supervised Semantic Annotator (S3A): Toward Efficient Semantic Image Labeling
Python GUI and library for semantic image segmentation and annotation
Nathan Jessurun, Olivia P. Dizon-Paradis, Dan E. Capecci, Damon L. Woodard, Navid Asadizanjani
Bioframe: Operating on Genomic Interval Dataframes
Python library for working with genomic interval dataframes.
Nezar Abdennur, Geoffrey Fudenberg, Ilya M. Flyamer, Aleksandra Galitsyna, Anton Goloborodko, Maxim Imakaev, Trevor Manz, Sergey V. Venev
Likeness: a toolkit for connecting the social fabric of place to human dynamics
Richly-attributed synthetic population data are crucial for discerning human dynamics while preserving privacy. The Likeness toolkit provides a solution to this problem with a suite of Python packages that generate population data as individual agents in appropriate nighttime locations and allocate them to probable daytime activity spaces. Through a case study utilizing students and faculty as agents, the results of Likeness simulations are shown to recreate high-fidelity school capacities, comparable to empirical data sources.
Joseph V. Tuccillo, James D. Gaboardi
pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling
pyAudioProcessing is a Python-based library for processing audio data, constructing and extracting numerical features from audio, building and testing machine learning models, and classifying data with existing pre-trained audio classification models or custom user-built models. This library contains features built in Python that were originally published in MATLAB. pyAudioProcessing allows the user to compute various features from audio files, including Gammatone Frequency Cepstral Coefficients (GFCC), Mel Frequency Cepstral Coefficients (MFCC), spectral features, chroma features, and others such as beat-based and cepstrum-based features. One can use these features along with one’s own classification backend or any of the popular scikit-learn classifiers that have been integrated into pyAudioProcessing. Cleaning functions to strip unwanted portions from the audio are another offering of the library. It further contains integrations with other audio functionalities such as frequency and time-series visualizations and audio format conversions. This software aims to provide machine learning engineers, data scientists, researchers, and students with a set of baseline models to classify audio. The library is available at https://
Jyotika Singh
Kiwi: Python Tool for Text Processing and Classification
A user-friendly desktop tool for text visualization and classification. This allows users within the field to avoid creating boilerplate code for basic NLP tasks and users new to machine learning to plug and play with various models and methods. Our main goal is to make natural language processing accessible and easy.
Neelima Pulagam, Sai Marasani, Brian Sass
Phylogeography: Analysis of genetic and climatic data of SARS-CoV-2
As the SARS-CoV-2 pandemic reaches its peak, researchers around the globe are combining efforts to investigate the genetics of different variants to better deal with its distribution. This paper discusses phylogeographic approaches to examine how patterns of divergence within SARS-CoV-2 coincide with geographic features, such as climatic features. First, we propose aPhylogeo, a bioinformatic pipeline for phylogeographic analysis written in Python 3, which helps researchers better understand the distribution of the virus in specific regions: the analysis is set up via a configuration file, and all analysis operations are then executed in a single run. In particular, the aPhylogeo tool determines which parts of the genetic sequence undergo a high mutation rate depending on geographic conditions, using a sliding window that moves along the genetic sequence alignment with a user-defined step and window size. As a Python-based cross-platform program, aPhylogeo works on Windows®, MacOS X® and GNU/Linux. The implementation of this pipeline is publicly available on GitHub (https://
Wanlin Li, Aleksandr Koshkarov, My-Linh Luu, Nadia Tahiri
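The sliding-window scan mentioned in the aPhylogeo abstract above can be illustrated with a short sketch. It moves a window of user-defined size along a sequence alignment in user-defined steps and reports, for each window, the fraction of alignment columns in which the sequences differ. This is an assumption-laden illustration, not aPhylogeo's code: the function name, the variability measure, and the toy alignment are all hypothetical.

```python
def window_variability(alignment, window_size, step):
    """Return (start, fraction of variable columns) for each window.

    `alignment` is a list of equal-length aligned sequences.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    results = []
    for start in range(0, length - window_size + 1, step):
        # A column is variable when the sequences do not all agree on it.
        variable = sum(
            1
            for col in range(start, start + window_size)
            if len({seq[col] for seq in alignment}) > 1
        )
        results.append((start, variable / window_size))
    return results


# Toy alignment (hypothetical data, not real SARS-CoV-2 sequences).
toy_alignment = [
    "ATGCATGCAT",
    "ATGCTTGCAA",
    "ATGCATGGAT",
]
for start, fraction in window_variability(toy_alignment, window_size=5, step=5):
    print(start, fraction)
```

Windows with a high fraction of variable columns are candidates for regions with an elevated mutation rate; relating them to climatic conditions is a separate analysis step that this sketch does not cover.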
Design of a Scientific Data Analysis Support Platform
Studying the design features necessary for a workflow and experiment management system, and presenting Curifactory: an open source package that meets these design features.
Nathan Martindale, Jason Hite, Scott Stewart, Mark Adams
Opening ARM: A pivot to community software to meet the needs of users and stakeholders of the planet’s largest cloud observatory
This presentation discusses the evolution of the Atmospheric Radiation Measurement (ARM) program’s open source endeavors and the hurdles that came with it, from the Python ARM Radar Toolkit to the Atmospheric data Community Toolkit in 2018, the expansion of our open-source presence on GitHub in 2019, and what is planned for the future.
Zachary Sherman, Scott Collis, Max Grover, Robert Jackson, Adam Theisen
SciPy Tools Plenaries
SciPy Tools Plenary - CEL team
Introducing the Contributor Experience Lead team at SciPy 2022
Inessa Pawson
SciPy Tools Plenary on Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. This presentation summarizes changes over the past year, new features, and future plans.
Elliott Sales de Andrade
SciPy Tools Plenary - NumPy
Annual update on the NumPy project at SciPy 2022
Inessa Pawson
Lightning Talks
Downsampling Time Series Data for Visualizations
Exploring the Largest-Triangle-Three-Buckets (LTTB) algorithm to downsample time series data.
Delaina Moore
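For readers unfamiliar with the downsampling algorithm named in the talk above, here is a minimal pure-Python sketch of Largest-Triangle-Three-Buckets. It keeps the first and last points, splits the rest into buckets, and from each bucket keeps the point forming the largest triangle with the previously kept point and the average of the next bucket. The function name and the test series are hypothetical, and this is not the speaker's code.

```python
def lttb(points, n_out):
    """Downsample a list of (x, y) points to n_out points."""
    n = len(points)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return list(points)

    bucket_size = (n - 2) / (n_out - 2)  # first and last points are always kept
    sampled = [points[0]]
    prev = 0  # index of the most recently kept point

    for i in range(n_out - 2):
        # Average of the next bucket (just the last point for the final bucket).
        nxt_start = int((i + 1) * bucket_size) + 1
        nxt_end = min(int((i + 2) * bucket_size) + 1, n)
        nxt = points[nxt_start:nxt_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        # Pick the point in the current bucket with the largest triangle area.
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1
        px, py = points[prev]
        best, best_area = start, -1.0
        for j in range(start, end):
            x, y = points[j]
            area = abs((px - avg_x) * (y - py) - (px - x) * (avg_y - py)) / 2.0
            if area > best_area:
                best, best_area = j, area
        sampled.append(points[best])
        prev = best

    sampled.append(points[-1])
    return sampled


# Hypothetical sawtooth-like series of 100 points reduced to 10.
series = [(x, (x % 10) ** 2) for x in range(100)]
reduced = lttb(series, 10)
print(reduced)
```

Unlike taking every k-th point, this selection tends to retain peaks and troughs, which is why it is popular for plotting long time series.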
Analysis as Applications: Quick introduction to lockfiles
An opinionated argument for the use of lockfiles in scientific analysis in a similar manner to Python application deployment. This talk was inspired by Brett Cannon’s ‘pip-secure-install’ project and a Twitter conversation with Dustin Ingram on April 20, 2020.
Matthew Feickert