Proceedings of the 21st Python in Science Conference

doi:10.25080/majora-212e5952-046

Posters and Slides

Accepted Paper Slides

Building Binary Extensions with pybind11, scikit-build, and cibuildwheel

Building binary extensions is easier than ever thanks to several key libraries. Pybind11 provides a natural C++ language for extensions without requiring pre-processing or special dependencies. Scikit-build ties the premier C++ build system, CMake, into the Python extension build process. And cibuildwheel makes it easy to build highly compatible wheels for over 80 different platforms using CI or on your local machine.

Henry Schreiner, Joe Rickerby, Ralf Grosse-Kunstleve, +5

https://doi.org/10.25080/majora-212e5952-033

Python Development Schemes for Monte Carlo Neutronics on High Performance Computing

We investigate three methods of hardware acceleration on both GPUs and CPUs for a Monte Carlo neutron transport simulation code written in Python. The acceleration schemes we examine are PyKokkos, Numba, and hardware code-generating libraries like PyCUDA. This work was supported by the Center for Exascale Monte-Carlo Neutron Transport (CEMeNT), a PSAAP-III project funded by the Department of Energy, grant number DE-NA003967.

Jackson P. Morgan, Kyle E. Niemeyer

https://doi.org/10.25080/majora-212e5952-034

Awkward Packaging: Building scikit-HEP

Scikit-HEP has grown rapidly over the last few years, not just to serve the needs of the High Energy Physics (HEP) community, but in many ways, the Python ecosystem at large. AwkwardArray, boost-histogram/hist, and iMinuit are examples of libraries that are used beyond the original HEP focus.

Henry Schreiner, Jim Pivarski, Eduardo Rodrigues

https://doi.org/10.25080/majora-212e5952-035

Development of Accessible, Aesthetically-Pleasing Color Sequences

Many types of data visualization, e.g., line plots and scatter plots, utilize a discrete palette of colors, a color sequence, to differentiate between the categories of data being plotted. Unfortunately, many commonly-used color sequences offer poor accessibility to individuals with color-vision deficiencies, using colors that such individuals find difficult to differentiate between. Here, the development of new, accessible color sequences is discussed. As new color sequences must be aesthetically pleasing if they are to see widespread adoption, a crowd-sourced survey was used to estimate aesthetic preference, while accessibility aspects were handled via quantitative analysis.

Matthew A. Petroff

https://doi.org/10.25080/majora-212e5952-036

Cutting Edge Climate Science in the Cloud with Pangeo

Climate change is one of the most challenging issues of our time. To prevent the worst outcomes, we need to drastically accelerate the creation and distribution of scientific knowledge. But the complex and massive datasets produced by numerical climate models render the common 'download and analyze' workflow inefficient, blocking innovative analysis and fast scientific discoveries. We present Python tools and cloud infrastructure developed within the Pangeo community that enable cutting-edge climate science from a web browser, making it efficient, reproducible, and inclusive. To demonstrate these capabilities, we will reproduce a plot from the IPCC report in a live cloud demonstration.

Julius Busecke

https://doi.org/10.25080/majora-212e5952-037

Pylira: deconvolution of images in the presence of Poisson noise

Pylira is a Python package for the deconvolution of images in the presence of Poisson noise. In this presentation I will explain the method in detail, show the setup and API of the Python package, and present application examples using real astronomical data.

Axel Donath, Aneta Siemiginowska, Vinay Kashyap, +3

https://doi.org/10.25080/majora-212e5952-038

Accelerating Science with the Generative Toolkit for Scientific Discovery (GT4SD)

A presentation about GT4SD: an open-source platform to accelerate hypothesis generation in the scientific discovery process. It provides a library for making state-of-the-art generative AI models easier to use.

GT4SD team

https://doi.org/10.25080/majora-212e5952-039

MModel: a modular modeling framework for scientific prototyping

MModel is a Python framework that allows for fast and modular prototyping. The library uses networkx graphs for workflow construction and provides built-in toolkits such as subgraph modification and graph visualization with rich metadata.

Peter Sun, John A. Marohn

https://doi.org/10.25080/majora-212e5952-03a

Monaco: Quantify Uncertainty and Sensitivities in Your Computational Models with a Monte Carlo Library

Quantify uncertainty and sensitivities in your existing computational models with the “monaco” library. Users define input variables randomly drawn from any of SciPy's statistical distributions, run their model in parallel anywhere from 1 to millions of times, and postprocess the outputs to obtain meaningful, statistically significant conclusions. This talk goes over why you should always be running Monte Carlo simulations, a demo of how to set up and run a sim, and a crash course in generating relevant plots and statistics.

W. Scott Shambaugh

https://doi.org/10.25080/majora-212e5952-03b
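
The Monte Carlo workflow outlined in the monaco abstract above (draw random inputs, run the model many times, postprocess the outputs) can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library, not the monaco API and not SciPy's distributions; the `plate_area` model, the `samplers` mapping, and the `run_monte_carlo` helper are hypothetical names chosen for this example.

```python
import random
import statistics


def run_monte_carlo(model, samplers, n_cases, seed=0):
    """Run `model` n_cases times with randomly drawn keyword inputs.

    `samplers` maps each input name to a function that takes a
    random.Random instance and returns one draw for that input.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    outputs = []
    for _ in range(n_cases):
        inputs = {name: draw(rng) for name, draw in samplers.items()}
        outputs.append(model(**inputs))
    return outputs


def plate_area(width, height):
    """Hypothetical model under study: area of a rectangular plate."""
    return width * height


# Assumed input uncertainties: width is normal, height is uniform.
samplers = {
    "width": lambda rng: rng.gauss(2.0, 0.1),
    "height": lambda rng: rng.uniform(1.0, 3.0),
}

areas = run_monte_carlo(plate_area, samplers, n_cases=20_000)

# Postprocess: summary statistics and an empirical 95% interval.
mean_area = statistics.fmean(areas)
std_area = statistics.stdev(areas)
sorted_areas = sorted(areas)
low = sorted_areas[int(0.025 * len(sorted_areas))]
high = sorted_areas[int(0.975 * len(sorted_areas))]
```

Because the cases are independent, the loop is the part a library such as monaco can run in parallel; the sketch keeps it serial for clarity.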

UFuncs and DTypes: new possibilities in NumPy

Over the past three years, NumPy has seen large changes to much of its core functionalities including universal functions, casting, and DTypes. The goal of this refactoring was to introduce extensible APIs to improve existing user-defined DTypes and unlock new ones. This refactoring is nearing its conclusion, with the work being surfaced as public-facing API. In this talk we will discuss what has been done, and newly possible applications—such as a custom NumPy DType that is aware of physical units.

Sebastian Berg, Stéfan van der Walt

https://doi.org/10.25080/majora-212e5952-03c

Per Python ad astra: interactive Astrodynamics with poliastro

This talk presents poliastro, an open-source Python library for interactive Astrodynamics that features an easy-to-use API and tools for quick visualization. poliastro implements core Astrodynamics algorithms and leverages numba and Astropy. During the talk, we will describe the two-layer architecture that allows poliastro to offer an approachable API with good performance, discuss the challenges we faced to validate our code, and comment on the successes and failures of the project in trying to build a rich and diverse community. Source code of poliastro is available at https://github.com/poliastro/poliastro/ and documentation is online at https://docs.poliastro.space/.

Juan Luis Cano Rodríguez

https://doi.org/10.25080/majora-212e5952-03d

pyampute: a Python library for data amputation

Amputation is the opposite of imputation; it is the creation of a missing data mask for complete datasets. Amputation is useful for evaluating the effect of missing values on the outcome of a statistical or machine learning model. In this talk, we present pyampute: the first open-source Python library for data amputation. Our package is compatible with the scikit-learn-style fit and transform paradigm, which allows for seamless integration of amputation in a larger, more complex data processing pipeline.

Rianne M Schouten, Davina Zamanzadeh, Prabhant Singh

https://doi.org/10.25080/majora-212e5952-03e
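
To make the idea of amputation in the pyampute abstract above concrete, the sketch below builds a missing data mask for a complete dataset and applies it. It is not the pyampute API: it assumes the simplest mechanism, in which every cell is hidden independently with the same probability (missing completely at random), and the `ampute_mcar` helper and toy data are hypothetical.

```python
import random


def ampute_mcar(rows, prop, seed=0):
    """Hide each cell independently with probability `prop`.

    Returns (amputed_rows, mask); mask[i][j] is True where the value
    was removed. The input rows are left unmodified.
    """
    rng = random.Random(seed)
    mask = [[rng.random() < prop for _ in row] for row in rows]
    amputed = [
        [None if hidden else value for value, hidden in zip(row, row_mask)]
        for row, row_mask in zip(rows, mask)
    ]
    return amputed, mask


# Toy complete dataset: 100 rows, 10 numeric features.
complete = [[float(r * 10 + c) for c in range(10)] for r in range(100)]

incomplete, mask = ampute_mcar(complete, prop=0.3)
missing_fraction = sum(sum(row_mask) for row_mask in mask) / 1000
```

Keeping the complete data alongside the mask is what makes evaluation possible: a model can be fit on `incomplete` and its output compared with the result obtained from `complete`.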

Scientific Python: From GitHub to TikTok

The Scientific Python project aims to better coordinate the ecosystem and grow the community. This talk focuses on our efforts to expand our community by generating a welcoming and friendly environment where people collaborate, build, and improve together.

Juanita Gomez Romero, Stéfan van der Walt, K. Jarrod Millman, +2

https://doi.org/10.25080/majora-212e5952-03f

Scientific Python: By maintainers, for maintainers

Tools for maintainers and how we can help each other.

Pamphile T. Roy, Stéfan van der Walt, K. Jarrod Millman, +1

https://doi.org/10.25080/majora-212e5952-040

Improving random sampling in Python: scipy.stats.sampling and scipy.stats.qmc

Why and how to use scipy.stats.sampling and scipy.stats.qmc?

Pamphile T. Roy, Matt Haberland, Christoph Baumgarten, +1

https://doi.org/10.25080/majora-212e5952-041

Petabyte-scale ocean data analytics on staggered grids via the grid ufunc protocol in xGCM

We analysed the highest resolution global ocean simulation to date, using xGCM, xhistogram, and dask.

Thomas Nicholas, Julius Busecke, Ryan Abernathey

https://doi.org/10.25080/majora-212e5952-042

Accepted Posters

Optimal Review Assignments for the SciPy Conference Using Binary Integer Linear Programming in SciPy 1.9

Each year, the SciPy Conference receives hundreds of submissions, and dozens of volunteers offer to review them to help make selections for the conference. How should submissions be assigned to reviewers to distribute the work fairly while 1) ensuring that each submission receives at least three reviews, 2) preventing conflicts of interest, and 3) respecting reviewers' domains of expertise? Binary integer linear programming is an ideal framework for defining and solving 'scheduling' or 'assignment' problems like this. In this poster, we show how users can formulate and solve problems of this type with new, accessible tools in the scientific Python ecosystem.

Matt Haberland, Nicholas McKibben

https://doi.org/10.25080/majora-212e5952-029
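
The review assignment problem in the abstract above can be written as a small binary integer linear program and solved with `scipy.optimize.milp`, which was added in SciPy 1.9. The following is a toy sketch, not the poster's actual formulation: the expertise scores, the single conflict of interest, the requirement of exactly two reviews per submission, and the load limit of three per reviewer are all assumptions made to keep the example small.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Hypothetical data: expertise[i, j] scores how well reviewer j
# matches submission i (higher is better).
expertise = np.array([
    [3, 1, 2],
    [1, 3, 2],
    [2, 2, 3],
    [3, 2, 1],
])
n_sub, n_rev = expertise.shape
conflicts = [(0, 0)]   # (submission, reviewer) pairs that are not allowed
reviews_needed = 2     # reviews per submission in this toy problem
max_load = 3           # most submissions any one reviewer may receive

# Decision variable x[i * n_rev + j] is 1 if reviewer j reviews submission i.
# milp minimizes, so negate the scores to maximize total expertise.
c = -expertise.ravel().astype(float)

# Row i sums the variables of submission i; row j sums those of reviewer j.
A_sub = np.kron(np.eye(n_sub), np.ones((1, n_rev)))
A_rev = np.kron(np.ones((1, n_sub)), np.eye(n_rev))
constraints = [
    LinearConstraint(A_sub, reviews_needed, reviews_needed),
    LinearConstraint(A_rev, 0, max_load),
]

# Conflicts of interest are enforced by fixing the upper bound to 0.
upper = np.ones(n_sub * n_rev)
for i, j in conflicts:
    upper[i * n_rev + j] = 0

result = milp(
    c,
    integrality=np.ones_like(c),   # every variable is an integer
    bounds=Bounds(0, upper),       # and restricted to 0 or 1
    constraints=constraints,
)
assignment = np.round(result.x).astype(int).reshape(n_sub, n_rev)
```

Each row of `assignment` is a submission and each column a reviewer. An "at least" requirement, as in the abstract, would replace the upper limit of the first constraint with `np.inf`.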

Contributing to Open Source Software: From not knowing Python to becoming a Spyder core developer

Experience overview of becoming an open source developer and updates on the work being done in the Spyder IDE project for 2022

Daniel Althviz Moré

https://doi.org/10.25080/majora-212e5952-02a

Semi-Supervised Semantic Annotator (S3A): Toward Efficient Semantic Image Labeling

Python GUI and library for semantic image segmentation and annotation

Nathan Jessurun, Olivia P. Dizon-Paradis, Dan E. Capecci, +2

https://doi.org/10.25080/majora-212e5952-02b

Bioframe: Operating on Genomic Interval Dataframes

Python library for working with genomic interval dataframes.

Nezar Abdennur, Geoffrey Fudenberg, Ilya M. Flyamer, +5

https://doi.org/10.25080/majora-212e5952-02c

Likeness: a toolkit for connecting the social fabric of place to human dynamics

Richly-attributed synthetic population data are crucial for discerning human dynamics while preserving privacy. The Likeness toolkit provides a solution to this problem with a suite of Python packages that generate population data as individual agents in appropriate nighttime locations and allocate them to probable daytime activity spaces. Through a case study utilizing students and faculty as agents, the results of Likeness simulations are shown to recreate high-fidelity school capacities, comparable to empirical data sources.

Joseph V. Tuccillo, James D. Gaboardi

https://doi.org/10.25080/majora-212e5952-02d

pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling

pyAudioProcessing is a Python-based library for processing audio data, constructing and extracting numerical features from audio, building and testing machine learning models, and classifying data with existing pre-trained audio classification models or custom user-built models. This library contains features built in Python that were originally published in MATLAB. pyAudioProcessing allows the user to compute various features from audio files including Gammatone Frequency Cepstral Coefficients (GFCC), Mel Frequency Cepstral Coefficients (MFCC), spectral features, chroma features, and others such as beat-based and cepstrum-based features. One can use these features along with one’s own classification backend or any of the popular scikit-learn classifiers that have been integrated into pyAudioProcessing. Cleaning functions to strip unwanted portions from the audio are another offering of the library. It further contains integrations with other audio functionalities such as frequency and time-series visualizations and audio format conversions. This software aims to provide machine learning engineers, data scientists, researchers, and students with a set of baseline models to classify audio. The library is available at https://github.com/jsingh811/pyAudioProcessing and is released under the GPL-3.0 license.

Jyotika Singh

https://doi.org/10.25080/majora-212e5952-02e

Kiwi: Python Tool for Text Processing and Classification

A user-friendly desktop tool for text visualization and classification. This allows users within the field to avoid creating boilerplate code for basic NLP tasks and users new to machine learning to plug and play with various models and methods. Our main goal is to make natural language processing accessible and easy.

Neelima Pulagam, Sai Marasani, Brian Sass

https://doi.org/10.25080/majora-212e5952-02f

Phylogeography: Analysis of genetic and climatic data of SARS-CoV-2

As the SARS-CoV-2 pandemic reaches its peak, researchers around the globe are combining efforts to investigate the genetics of different variants to better deal with its distribution. This paper discusses phylogeographic approaches to examine how patterns of divergence within SARS-CoV-2 coincide with geographic features, such as climatic features. First, we propose aPhylogeo, a bioinformatic pipeline for phylogeographic analysis written in Python 3 that helps researchers better understand the distribution of the virus in specific regions. Users set the parameters in a configuration file and then run all the analysis operations in a single run. In particular, the aPhylogeo tool determines which parts of the genetic sequence undergo a high mutation rate depending on geographic conditions, using a sliding window that moves along the genetic sequence alignment with a user-defined step and window size. As a Python-based cross-platform program, aPhylogeo works on Windows®, MacOS X® and GNU/Linux. The implementation of this pipeline is publicly available on GitHub (https://github.com/tahiri-lab/aPhylogeo). Second, we present an example analysis of real SARS-CoV-2 data with our new aPhylogeo tool to understand the occurrence of different variants.

Wanlin Li, Aleksandr Koshkarov, My-Linh Luu, +1

https://doi.org/10.25080/majora-212e5952-030
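
The sliding window mentioned in the aPhylogeo abstract above can be illustrated with a short sketch. It is not code from the aPhylogeo repository: it assumes an alignment given as equal-length strings and uses the fraction of variable columns in each window as a simple stand-in for a mutation rate; the `variable_fraction_by_window` helper and the toy alignment are hypothetical.

```python
def variable_fraction_by_window(alignment, window_size, step):
    """Slide a window along an alignment and score each position.

    `alignment` is a list of equal-length sequences. For every window
    start, return (start, fraction of columns in the window where the
    sequences do not all agree).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # A column is variable if it contains more than one distinct symbol.
    variable = [len({seq[i] for seq in alignment}) > 1 for i in range(length)]

    results = []
    for start in range(0, length - window_size + 1, step):
        window = variable[start:start + window_size]
        results.append((start, sum(window) / window_size))
    return results


# Toy alignment of three sequences; columns 4 and 5 differ.
toy_alignment = ["AAAAAAAA", "AAAATTAA", "AAAACTAA"]
scores = variable_fraction_by_window(toy_alignment, window_size=4, step=2)
# scores == [(0, 0.0), (2, 0.5), (4, 0.5)]
```

Windows with a high score mark regions of the alignment that could then be compared against climatic or geographic variables.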

Design of a Scientific Data Analysis Support Platform

Studying the design features necessary for a workflow and experiment management system, and presenting Curifactory: an open source package that meets these design features.

Nathan Martindale, Jason Hite, Scott Stewart, +1

https://doi.org/10.25080/majora-212e5952-031

Opening ARM: A pivot to community software to meet the needs of users and stakeholders of the planet’s largest cloud observatory

This presentation discusses the evolution of the Atmospheric Radiation Measurement (ARM) program's open source endeavors and the hurdles that came with it, from the Python ARM Radar Toolkit to the Atmospheric data Community Toolkit in 2018, the expansion of our open-source presence on GitHub in 2019, and what is planned for the future.

Zachary Sherman, Scott Collis, Max Grover, +2

https://doi.org/10.25080/majora-212e5952-032

SciPy Tools Plenaries

SciPy Tools Plenary - CEL team

Introducing the Contributor Experience Lead team at SciPy 2022

Inessa Pawson

https://doi.org/10.25080/majora-212e5952-043

SciPy Tools Plenary on Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. This presentation summarizes changes over the past year, new features, and future plans.

Elliott Sales de Andrade

https://doi.org/10.25080/majora-212e5952-044

SciPy Tools Plenary - NumPy

Annual update on the NumPy project at SciPy 2022

Inessa Pawson

https://doi.org/10.25080/majora-212e5952-045

Lightning Talks

Downsampling Time Series Data for Visualizations

Exploring the Largest-Triangle-Three-Buckets (LTTB) algorithm to downsample time series data.

Delaina Moore

https://doi.org/10.25080/majora-212e5952-027
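
The downsampling algorithm named in the talk above keeps the first and last points, splits the remaining points into buckets, and picks from each bucket the point that forms the largest triangle with the previously selected point and the average of the next bucket. The sketch below is a plain Python illustration of that idea under these assumptions, not the speaker's code; the `lttb` function name and the sine-wave test data are hypothetical.

```python
import math


def lttb(points, n_out):
    """Downsample (x, y) points, sorted by x, to n_out points."""
    n = len(points)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return list(points)

    bucket_size = (n - 2) / (n_out - 2)  # first and last points are fixed
    sampled = [points[0]]
    prev = 0  # index of the most recently selected point

    for i in range(n_out - 2):
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1

        # Average of the next bucket (for the final bucket this is
        # the last point of the series).
        next_end = min(int((i + 2) * bucket_size) + 1, n)
        nxt = points[end:next_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        # Pick the point in the current bucket with the largest triangle.
        ax, ay = points[prev]
        best_area, best_idx = -1.0, start
        for j in range(start, end):
            bx, by = points[j]
            area = 0.5 * abs((ax - avg_x) * (by - ay) - (ax - bx) * (avg_y - ay))
            if area > best_area:
                best_area, best_idx = area, j

        sampled.append(points[best_idx])
        prev = best_idx

    sampled.append(points[-1])
    return sampled


# Toy series: 1000 samples of a sine wave reduced to 50 points.
series = [(float(i), math.sin(i / 10)) for i in range(1000)]
reduced = lttb(series, 50)
```

Because every selected point is an original sample, peaks and troughs survive the reduction, which is why the method suits visualization better than simple averaging.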

Analysis as Applications: Quick introduction to lockfiles

An opinionated argument for the use of lockfiles in scientific analysis in a similar manner to Python application deployment. This talk was inspired by Brett Cannon's 'pip-secure-install' project and a Twitter conversation with Dustin Ingram on April 20, 2020.

Matthew Feickert

https://doi.org/10.25080/majora-212e5952-028
