
Posters and Slides

Accepted Paper Slides

Building Binary Extensions with pybind11, scikit-build, and cibuildwheel

Building binary extensions is easier than ever thanks to several key libraries. Pybind11 provides a natural C++ language for extensions without requiring pre-processing or special dependencies. Scikit-build ties the premier C++ build system, CMake, into the Python extension build process. And cibuildwheel makes it easy to build highly compatible wheels for over 80 different platforms using CI or on your local machine.

Henry Schreiner, Joe Rickerby, Ralf Grosse-Kunstleve, Wenzel Jakob, Matthieu Darbois, Aaron Gokaslan, Jean-Christophe Fillion-Robin, Matt McCormick

10.25080/majora-212e5952-033

Python Development Schemes for Monte Carlo Neutronics on High Performance Computing

We investigate three methods of hardware acceleration on both GPUs and CPUs for a Monte Carlo neutron transport simulation code written in Python. The accelerating schemes we examine are PyKokkos, Numba, and hardware code generating libraries like PyCUDA. This work was supported by the Center for Exascale Monte-Carlo Neutron Transport (CEMeNT), a PSAAP-III project funded by the Department of Energy, grant number: DE-NA003967.

Jackson P. Morgan, Kyle E. Niemeyer

10.25080/majora-212e5952-034

Awkward Packaging: Building scikit-HEP

Scikit-HEP has grown rapidly over the last few years, not just to serve the needs of the High Energy Physics (HEP) community, but in many ways, the Python ecosystem at large. AwkwardArray, boost-histogram/hist, and iMinuit are examples of libraries that are used beyond the original HEP focus.

Henry Schreiner, Jim Pivarski, Eduardo Rodrigues

10.25080/majora-212e5952-035

Development of Accessible, Aesthetically-Pleasing Color Sequences

Many types of data visualization, e.g., line plots and scatter plots, utilize a discrete palette of colors, a color sequence, to differentiate between the categories of data being plotted. Unfortunately, many commonly-used color sequences offer poor accessibility to individuals with color-vision deficiencies, using colors that such individuals find difficult to differentiate between. Here, the development of new, accessible color sequences is discussed. As new color sequences must be aesthetically pleasing if they are to see widespread adoption, a crowd-sourced survey was used to estimate aesthetic preference, while accessibility aspects were handled via quantitative analysis.

Matthew A. Petroff

10.25080/majora-212e5952-036

Cutting Edge Climate Science in the Cloud with Pangeo

Climate change is one of the most challenging issues of our time. To prevent the worst outcomes, we need to drastically accelerate the creation and distribution of scientific knowledge. But the complex and massive datasets produced by numerical climate models render the common ‘download and analyze’ workflow inefficient, blocking innovative analysis and fast scientific discoveries. We present Python tools and cloud infrastructure developed within the Pangeo community that enable cutting-edge climate science from a web browser, making it efficient, reproducible, and inclusive. To demonstrate these capabilities we will reproduce a plot from the IPCC report in a live cloud demonstration.

Julius Busecke

10.25080/majora-212e5952-037

Pylira: deconvolution of images in the presence of Poisson noise

Pylira is a Python package for the deconvolution of images in the presence of Poisson noise. In this presentation I will explain the method in detail, show the setup and API of the Python package, and show application examples using real astronomical data.

Axel Donath, Aneta Siemiginowska, Vinay Kashyap, Douglas Burke, Karthik Reddy Solipuram, David van Dyk

10.25080/majora-212e5952-038

Accelerating Science with the Generative Toolkit for Scientific Discovery (GT4SD)

A presentation about GT4SD: an open-source platform to accelerate hypothesis generation in the scientific discovery process. It provides a library for making state-of-the-art generative AI models easier to use.

GT4SD team

10.25080/majora-212e5952-039

MModel: a modular modeling framework for scientific prototyping

MModel is a Python framework that allows for fast and modular prototyping. The library uses networkx graphs for workflow construction and provides built-in toolkits such as subgraph modification and graph visualization with rich metadata.

Peter Sun, John A. Marohn

10.25080/majora-212e5952-03a

Monaco: Quantify Uncertainty and Sensitivities in Your Computational Models with a Monte Carlo Library

Quantify uncertainty and sensitivities in your existing computational models with the “monaco” library. Users define input variables randomly drawn from any of SciPy’s statistical distributions, run their model in parallel anywhere from 1 to millions of times, and postprocess the outputs to obtain meaningful, statistically significant conclusions. This talk goes over why you should always be running Monte Carlo simulations, a demo of how to set up and run a sim, and a crash course in generating relevant plots and statistics.

W. Scott Shambaugh

10.25080/majora-212e5952-03b
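
The workflow described in the Monaco abstract above (draw inputs from SciPy distributions, run the model many times, postprocess the outputs) can be sketched in a few lines. This is a generic illustration only and does not use the monaco API: the `model` function, the input distributions, and the draw count are all hypothetical, and the runs are vectorized with NumPy rather than dispatched in parallel.

```python
import numpy as np
from scipy import stats


def model(mass, velocity):
    """Hypothetical model under study: kinetic energy of a moving body."""
    return 0.5 * mass * velocity**2


rng = np.random.default_rng(42)
n_draws = 10_000

# Input variables drawn from SciPy statistical distributions (assumed values).
mass = stats.norm(loc=2.0, scale=0.1).rvs(size=n_draws, random_state=rng)
# stats.uniform(loc, scale) is uniform on [loc, loc + scale], here [9, 11].
velocity = stats.uniform(loc=9.0, scale=2.0).rvs(size=n_draws, random_state=rng)

# Run the model once per draw (vectorized over all draws).
energy = model(mass, velocity)

# Postprocess: summary statistics and a crude correlation-based sensitivity.
mean_energy = energy.mean()
p5, p95 = np.percentile(energy, [5, 95])
corr_mass = np.corrcoef(mass, energy)[0, 1]
corr_velocity = np.corrcoef(velocity, energy)[0, 1]

print(f"mean = {mean_energy:.2f}, 90% interval = [{p5:.2f}, {p95:.2f}]")
print(f"correlation with mass = {corr_mass:.2f}, with velocity = {corr_velocity:.2f}")
```

The correlation coefficients are a simple stand-in for a proper sensitivity analysis; they only indicate which input drives more of the output spread under these assumed distributions.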

UFuncs and DTypes: new possibilities in NumPy

Over the past three years, NumPy has seen large changes to much of its core functionalities including universal functions, casting, and DTypes. The goal of this refactoring was to introduce extensible APIs to improve existing user-defined DTypes and unlock new ones. This refactoring is nearing its conclusion, with the work being surfaced as public-facing API. In this talk we will discuss what has been done, and newly possible applications—such as a custom NumPy DType that is aware of physical units.

Sebastian Berg, Stéfan van der Walt

10.25080/majora-212e5952-03c

10.25080/majora-212e5952-03d

pyampute: a Python library for data amputation

Amputation is the opposite of imputation; it is the creation of a missing data mask for complete datasets. Amputation is useful for evaluating the effect of missing values on the outcome of a statistical or machine learning model. In this talk, we present pyampute: the first open-source Python library for data amputation. Our package is compatible with the scikit-learn-style fit and transform paradigm, which allows for seamless integration of amputation in a larger, more complex data processing pipeline.

Rianne M Schouten, Davina Zamanzadeh, Prabhant Singh

10.25080/majora-212e5952-03e
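
To make the idea of amputation in the pyampute abstract above concrete, here is a minimal sketch of the simplest case: a missing-completely-at-random mask applied to a complete dataset. It uses plain NumPy and is not the pyampute API; the function name `ampute_mcar` and its parameters are hypothetical.

```python
import numpy as np


def ampute_mcar(X, prop=0.3, seed=None):
    """Return a copy of X with roughly `prop` of its cells set to NaN.

    Every cell is removed independently with the same probability, so the
    missingness does not depend on any observed or unobserved value.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    mask = rng.random(X.shape) < prop  # True marks a cell to remove
    X_incomplete = X.copy()
    X_incomplete[mask] = np.nan
    return X_incomplete, mask


# Complete toy dataset: 1000 rows, 5 features.
X_complete = np.random.default_rng(0).normal(size=(1000, 5))
X_incomplete, mask = ampute_mcar(X_complete, prop=0.3, seed=1)
print(f"fraction missing: {np.isnan(X_incomplete).mean():.3f}")
```

The returned mask makes it possible to compare a model fitted on the incomplete data against one fitted on the complete data, which is the evaluation use case the abstract describes.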

Scientific Python: From GitHub to TikTok

The Scientific Python project aims to better coordinate the ecosystem and grow the community. This talk focuses on our efforts to expand our community by generating a welcoming and friendly environment where people collaborate, build, and improve together.

Juanita Gomez Romero, Stéfan van der Walt, K. Jarrod Millman, Melissa Weber Mendonça, Inessa Pawson

10.25080/majora-212e5952-03f

Scientific Python: By maintainers, for maintainers

Tools for maintainers and how we can help each other.

Pamphile T. Roy, Stéfan van der Walt, K. Jarrod Millman, Melissa Weber Mendonça

10.25080/majora-212e5952-040

Improving random sampling in Python: scipy.stats.sampling and scipy.stats.qmc

Why and how to use scipy.stats.sampling and scipy.stats.qmc?

Pamphile T. Roy, Matt Haberland, Christoph Baumgarten, Tirth Patel

10.25080/majora-212e5952-041
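
As a small illustration of why one might reach for `scipy.stats.qmc`, the sketch below compares a scrambled Sobol' sample with an ordinary pseudo-random sample of the same size. It covers only `scipy.stats.qmc`, not `scipy.stats.sampling`; the dimension, sample size, seeds, and bounds are arbitrary assumptions, and the `seed` keyword is used as it appeared in SciPy 1.9.

```python
import numpy as np
from scipy.stats import qmc

n_dim = 2
m = 8  # Sobol' sequences are best drawn in powers of two: 2**8 = 256 points

# Quasi-Monte Carlo sample in the unit hypercube.
sobol = qmc.Sobol(d=n_dim, scramble=True, seed=12345)
sobol_sample = sobol.random_base2(m=m)

# Pseudo-random sample of the same size for comparison.
rng = np.random.default_rng(12345)
random_sample = rng.random((2**m, n_dim))

# Discrepancy measures how unevenly the points fill the unit hypercube
# (lower means more uniform coverage).
sobol_disc = qmc.discrepancy(sobol_sample)
random_disc = qmc.discrepancy(random_sample)
print(f"Sobol' discrepancy:        {sobol_disc:.2e}")
print(f"pseudo-random discrepancy: {random_disc:.2e}")

# Map the unit-cube sample onto the (assumed) parameter ranges of a problem.
scaled = qmc.scale(sobol_sample, l_bounds=[-1.0, 0.0], u_bounds=[1.0, 10.0])
```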

Petabyte-scale ocean data analytics on staggered grids via the grid ufunc protocol in xGCM

We analysed the highest resolution global ocean simulation to date, using xGCM, xhistogram, and dask.

Thomas Nicholas, Julius Busecke, Ryan Abernathey

10.25080/majora-212e5952-042

Accepted Posters

Optimal Review Assignments for the SciPy Conference Using Binary Integer Linear Programming in SciPy 1.9

Each year, the SciPy Conference receives hundreds of submissions, and dozens of volunteers offer to review them to help make selections for the conference. How should submissions be assigned to reviewers to distribute the work fairly while 1) ensuring that each submission receives at least three reviews, 2) preventing conflicts of interest, and 3) respecting reviewers’ domains of expertise? Binary integer linear programming is an ideal framework for defining and solving ‘scheduling’ or ‘assignment’ problems like this. In this poster, we show how users can formulate and solve problems of this type with new, accessible tools in the scientific Python ecosystem.

Matt Haberland, Nicholas McKibben

10.25080/majora-212e5952-029
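
The review assignment problem described in the abstract above can be sketched with `scipy.optimize.milp`, which was added in SciPy 1.9. This is a toy formulation under stated assumptions, not the authors' actual model: the cost matrix, the conflict list, and the per-reviewer limit of three submissions are all made up for illustration.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

n_reviewers, n_submissions = 5, 4

# Hypothetical mismatch costs: cost[r, s] is low when reviewer r is an
# expert on the topic of submission s.
cost = np.array([
    [1, 4, 2, 5],
    [3, 1, 4, 2],
    [2, 3, 1, 4],
    [5, 2, 3, 1],
    [4, 5, 2, 3],
], dtype=float)

# Hypothetical conflicts of interest as (reviewer, submission) pairs.
conflicts = [(0, 1), (3, 2)]

# Binary decision variable x[r, s], flattened row-major:
# index = r * n_submissions + s.
n_vars = n_reviewers * n_submissions

# 1) Every submission receives at least three reviews.
A_sub = np.zeros((n_submissions, n_vars))
for s in range(n_submissions):
    A_sub[s, s::n_submissions] = 1
per_submission = LinearConstraint(A_sub, lb=3, ub=np.inf)

# Fair workload (assumed limit): no reviewer gets more than three submissions.
A_rev = np.zeros((n_reviewers, n_vars))
for r in range(n_reviewers):
    A_rev[r, r * n_submissions:(r + 1) * n_submissions] = 1
per_reviewer = LinearConstraint(A_rev, lb=0, ub=3)

# 2) Conflicts of interest: force the corresponding variables to zero.
upper = np.ones(n_vars)
for r, s in conflicts:
    upper[r * n_submissions + s] = 0
bounds = Bounds(lb=np.zeros(n_vars), ub=upper)

# 3) Expertise is respected by minimizing the total mismatch cost.
res = milp(
    c=cost.ravel(),
    integrality=np.ones(n_vars),  # 1 marks a variable as integer
    bounds=bounds,
    constraints=[per_submission, per_reviewer],
)

assignment = np.round(res.x).astype(int).reshape(n_reviewers, n_submissions)
print(assignment)  # rows are reviewers, columns are submissions
```

Integer variables bounded between 0 and 1 are exactly the binary variables the abstract refers to; conflicts are handled through the bounds rather than as extra constraint rows to keep the constraint matrices small.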

Contributing to Open Source Software: From not knowing Python to becoming a Spyder core developer

An overview of the experience of becoming an open source developer, and updates on the work being done in the Spyder IDE project for 2022.

Daniel Althviz Moré

10.25080/majora-212e5952-02a

Semi-Supervised Semantic Annotator (S3A): Toward Efficient Semantic Image Labeling

Python GUI and library for semantic image segmentation and annotation

Nathan Jessurun, Olivia P. Dizon-Paradis, Dan E. Capecci, Damon L. Woodard, Navid Asadizanjani

10.25080/majora-212e5952-02b

Bioframe: Operating on Genomic Interval Dataframes

Python library for working with genomic interval dataframes.

Nezar Abdennur, Geoffrey Fudenberg, Ilya M. Flyamer, Aleksandra Galitsyna, Anton Goloborodko, Maxim Imakaev, Trevor Manz, Sergey V. Venev

10.25080/majora-212e5952-02c

Likeness: a toolkit for connecting the social fabric of place to human dynamics

Richly-attributed synthetic population data are crucial for discerning human dynamics while preserving privacy. The Likeness toolkit provides a solution to this problem with a suite of Python packages that generate population data as individual agents in appropriate nighttime locations and allocate them to probable daytime activity spaces. Through a case study utilizing students and faculty as agents, the results of Likeness simulations are shown to recreate high-fidelity school capacities, comparable to empirical data sources.

Joseph V. Tuccillo, James D. Gaboardi

10.25080/majora-212e5952-02d

10.25080/majora-212e5952-02e

Kiwi: Python Tool for Text Processing and Classification

A user-friendly desktop tool for text visualization and classification. This allows users within the field to avoid creating boilerplate code for basic NLP tasks and users new to machine learning to plug and play with various models and methods. Our main goal is to make natural language processing accessible and easy.

Neelima Pulagam, Sai Marasani, Brian Sass

10.25080/majora-212e5952-02f

10.25080/majora-212e5952-030

Design of a Scientific Data Analysis Support Platform

Studying the design features necessary for a workflow and experiment management system, and presenting Curifactory: an open source package that meets these design features.

Nathan Martindale, Jason Hite, Scott Stewart, Mark Adams

10.25080/majora-212e5952-031

Opening ARM: A pivot to community software to meet the needs of users and stakeholders of the planet’s largest cloud observatory

This presentation discusses the evolution of the Atmospheric Radiation Measurement (ARM) program’s open source endeavors and the hurdles that came with it: starting with the Python ARM Radar Toolkit, moving to the Atmospheric data Community Toolkit in 2018 and the expansion of our open-source presence on GitHub in 2019, and ending with what is planned for the future.

Zachary Sherman, Scott Collis, Max Grover, Robert Jackson, Adam Theisen

10.25080/majora-212e5952-032

SciPy Tools Plenaries

SciPy Tools Plenary - CEL team

Introducing the Contributor Experience Lead team at SciPy 2022

Inessa Pawson

10.25080/majora-212e5952-043

SciPy Tools Plenary on Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. This presentation summarizes changes over the past year, new features, and future plans.

Elliott Sales de Andrade

10.25080/majora-212e5952-044

SciPy Tools Plenary - NumPy

Annual update on the NumPy project at SciPy 2022

Inessa Pawson

10.25080/majora-212e5952-045

Lightning Talks

Downsampling Time Series Data for Visualizations

Exploring the Largest Triangle Three Buckets (LTTB) algorithm to downsample time series data.

Delaina Moore

10.25080/majora-212e5952-027
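
For readers unfamiliar with the algorithm named in the lightning talk above, here is a minimal pure-Python sketch of Largest Triangle Three Buckets downsampling. It is an independent illustration, not code from the talk; the function name `lttb` and the toy data are assumptions. The first and last points are always kept, the interior points are split into equal-width buckets, and from each bucket the point is chosen that forms the largest triangle with the previously chosen point and the average of the next bucket.

```python
def lttb(points, n_out):
    """Downsample a list of (x, y) points, sorted by x, to n_out points."""
    n = len(points)
    if n_out >= n or n_out < 3:
        return list(points)

    sampled = [points[0]]  # the first point is always kept
    a = 0  # index of the most recently selected point

    for i in range(n_out - 2):
        # Integer bucket boundaries over the interior points 1 .. n-2.
        start = i * (n - 2) // (n_out - 2) + 1
        end = (i + 1) * (n - 2) // (n_out - 2) + 1

        # Average of the next bucket; after the last interior bucket this
        # is just the final point.
        next_end = min((i + 2) * (n - 2) // (n_out - 2) + 1, n)
        next_bucket = points[end:next_end]
        avg_x = sum(p[0] for p in next_bucket) / len(next_bucket)
        avg_y = sum(p[1] for p in next_bucket) / len(next_bucket)

        # Pick the point in the current bucket that maximizes the area of
        # the triangle (previous pick, candidate, next-bucket average).
        ax, ay = points[a]
        best_area, best_idx = -1.0, start
        for j in range(start, end):
            bx, by = points[j]
            area = abs((ax - avg_x) * (by - ay) - (ax - bx) * (avg_y - ay)) / 2
            if area > best_area:
                best_area, best_idx = area, j

        sampled.append(points[best_idx])
        a = best_idx

    sampled.append(points[-1])  # the last point is always kept
    return sampled


# Toy series: a flat line with a single spike that should survive downsampling.
series = [(float(i), 0.0) for i in range(100)]
series[50] = (50.0, 10.0)
downsampled = lttb(series, 10)
print(downsampled)
```

Because the selection favors points that change the shape of the line the most, visually important features such as the spike tend to be retained even at aggressive downsampling ratios.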

Analysis as Applications: Quick introduction to lockfiles

An opinionated argument for the use of lockfiles in scientific analysis in a similar manner to Python application deployment. This talk was inspired by Brett Cannon’s ‘pip-secure-install’ project and a Twitter conversation with Dustin Ingram on April 20, 2020.

Matthew Feickert

10.25080/majora-212e5952-028