Posters and Slides

Proceedings of the 21st Python in Science Conference
SciPy 2022, Austin, Texas, July 11 - July 17
July 11, 2022
https://doi.org/10.25080/majora-212e5952-046
Content License: Creative Commons Attribution 3.0 Unported (CC-BY-3.0). Credit must be given to the creator.

Accepted Paper Slides

Building Binary Extensions with pybind11, scikit-build, and cibuildwheel
Building binary extensions is easier than ever thanks to several key libraries. Pybind11 provides a natural C++ language for extensions without requiring pre-processing or special dependencies. Scikit-build ties the premier C++ build system, CMake, into the Python extension build process. And cibuildwheel makes it easy to build highly compatible wheels for over 80 different platforms using CI or on your local machine.
Henry Schreiner, Joe Rickerby, Ralf Grosse-Kunstleve, +5
https://doi.org/10.25080/majora-212e5952-033

Python Development Schemes for Monte Carlo Neutronics on High Performance Computing
We investigate three methods of hardware acceleration on both GPUs and CPUs for a Monte Carlo neutron transport simulation code written in Python. The acceleration schemes we examine are PyKokkos, Numba, and hardware code generating libraries like PyCUDA. This work was supported by the Center for Exascale Monte-Carlo Neutron Transport (CEMeNT), a PSAAP-III project funded by the Department of Energy, grant number: DE-NA003967.
Jackson P. Morgan, Kyle E. Niemeyer
https://doi.org/10.25080/majora-212e5952-034

Awkward Packaging: Building scikit-HEP
Scikit-HEP has grown rapidly over the last few years, not just to serve the needs of the High Energy Physics (HEP) community, but in many ways, the Python ecosystem at large. AwkwardArray, boost-histogram/hist, and iMinuit are examples of libraries that are used beyond the original HEP focus.
Henry Schreiner, Jim Pivarski, Eduardo Rodrigues
https://doi.org/10.25080/majora-212e5952-035

Development of Accessible, Aesthetically-Pleasing Color Sequences
Many types of data visualization, e.g., line plots and scatter plots, utilize a discrete palette of colors, a color sequence, to differentiate between the categories of data being plotted. Unfortunately, many commonly-used color sequences offer poor accessibility to individuals with color-vision deficiencies, using colors that such individuals find difficult to differentiate between. Here, the development of new, accessible color sequences is discussed. As new color sequences must be aesthetically pleasing if they are to see widespread adoption, a crowd-sourced survey was used to estimate aesthetic preference, while accessibility aspects were handled via quantitative analysis.
Matthew A. Petroff
https://doi.org/10.25080/majora-212e5952-036

Cutting Edge Climate Science in the Cloud with Pangeo
Climate change is one of the most challenging issues of our time. To prevent the worst outcomes, we need to drastically accelerate the creation and distribution of scientific knowledge. But the complex and massive datasets produced by numerical climate models render the common 'download and analyze' workflow inefficient, blocking innovative analysis and fast scientific discoveries.
We present Python tools and cloud infrastructure developed within the Pangeo community, enabling cutting edge climate science from a web browser, making it efficient, reproducible, and inclusive. To demonstrate these capabilities we will reproduce a plot from the IPCC report in a live cloud demonstration.
Julius Busecke
https://doi.org/10.25080/majora-212e5952-037

Pylira: deconvolution of images in the presence of Poisson noise
Pylira is a Python package for the deconvolution of images in the presence of Poisson noise. In this presentation I will explain the method in detail, show the setup and API of the Python package, and show application examples using real astronomical data.
Axel Donath, Aneta Siemiginowska, Vinay Kashyap, +3
https://doi.org/10.25080/majora-212e5952-038

Accelerating Science with the Generative Toolkit for Scientific Discovery (GT4SD)
A presentation about GT4SD: an open-source platform to accelerate hypothesis generation in the scientific discovery process. It provides a library for making state-of-the-art generative AI models easier to use.
GT4SD team
https://doi.org/10.25080/majora-212e5952-039

MModel: a modular modeling framework for scientific prototyping
MModel is a Python framework that allows for fast and modular prototyping. The library uses networkx graphs for workflow construction and provides built-in toolkits such as subgraph modification and graph visualization with rich metadata.
Peter Sun, John A. Marohn
https://doi.org/10.25080/majora-212e5952-03a

Monaco: Quantify Uncertainty and Sensitivities in Your Computational Models with a Monte Carlo Library
Quantify uncertainty and sensitivities in your existing computational models with the “monaco” library. Users define input variables randomly drawn from any of SciPy's statistical distributions, run their model in parallel anywhere from 1 to millions of times, and postprocess the outputs to obtain meaningful, statistically significant conclusions. This talk goes over why you should always be running Monte Carlo simulations, a demo of how to set up and run a sim, and a crash course in generating relevant plots and statistics.
W. Scott Shambaugh
https://doi.org/10.25080/majora-212e5952-03b

UFuncs and DTypes: new possibilities in NumPy
Over the past three years, NumPy has seen large changes to much of its core functionality, including universal functions, casting, and DTypes. The goal of this refactoring was to introduce extensible APIs to improve existing user-defined DTypes and unlock new ones. This refactoring is nearing its conclusion, with the work being surfaced as public-facing API. In this talk we will discuss what has been done and the newly possible applications, such as a custom NumPy DType that is aware of physical units.
Sebastian Berg, Stéfan van der Walt
https://doi.org/10.25080/majora-212e5952-03c

Per Python ad astra: interactive Astrodynamics with poliastro
This talk presents poliastro, an open-source Python library for interactive Astrodynamics that features an easy-to-use API and tools for quick visualization. poliastro implements core Astrodynamics algorithms and leverages numba and Astropy.
During the talk, we will describe the two-layer architecture that allows poliastro to offer an approachable API with good performance, discuss the challenges we faced to validate our code, and comment on the successes and failures of the project in trying to build a rich and diverse community. Source code of poliastro is available at https://github.com/poliastro/poliastro/ and documentation is online at https://docs.poliastro.space/.
Juan Luis Cano Rodríguez
https://doi.org/10.25080/majora-212e5952-03d

pyampute: a Python library for data amputation
Amputation is the opposite of imputation; it is the creation of a missing data mask for complete datasets. Amputation is useful for evaluating the effect of missing values on the outcome of a statistical or machine learning model. In this talk, we present pyampute: the first open-source Python library for data amputation. Our package is compatible with the scikit-learn-style fit and transform paradigm, which allows for seamless integration of amputation in a larger, more complex data processing pipeline.
Rianne M Schouten, Davina Zamanzadeh, Prabhant Singh
https://doi.org/10.25080/majora-212e5952-03e

Scientific Python: From GitHub to TikTok
The Scientific Python project aims to better coordinate the ecosystem and grow the community. This talk focuses on our efforts to expand our community by generating a welcoming and friendly environment where people collaborate, build, and improve together.
Juanita Gomez Romero, Stéfan van der Walt, K. Jarrod Millman, +2
https://doi.org/10.25080/majora-212e5952-03f

Scientific Python: By maintainers, for maintainers
Tools for maintainers and how we can help each other.
Pamphile T. Roy, Stéfan van der Walt, K. Jarrod Millman, +1
https://doi.org/10.25080/majora-212e5952-040

Improving random sampling in Python: scipy.stats.sampling and scipy.stats.qmc
Why and how to use scipy.stats.sampling and scipy.stats.qmc?
Pamphile T. Roy, Matt Haberland, Christoph Baumgarten, +1
https://doi.org/10.25080/majora-212e5952-041

Petabyte-scale ocean data analytics on staggered grids via the grid ufunc protocol in xGCM
We analysed the highest resolution global ocean simulation to date, using xGCM, xhistogram, and dask.
Thomas Nicholas, Julius Busecke, Ryan Abernathey
https://doi.org/10.25080/majora-212e5952-042

Accepted Posters

Optimal Review Assignments for the SciPy Conference Using Binary Integer Linear Programming in SciPy 1.9
Each year, the SciPy Conference receives hundreds of submissions, and dozens of volunteers offer to review them to help make selections for the conference. How should submissions be assigned to reviewers to distribute the work fairly while 1) ensuring that each submission receives at least three reviews, 2) preventing conflicts of interest, and 3) respecting reviewers' domains of expertise? Binary integer linear programming is an ideal framework for defining and solving 'scheduling' or 'assignment' problems like this.
In this poster, we show how users can formulate and solve problems of this type with new, accessible tools in the scientific Python ecosystem.
Matt Haberland, Nicholas McKibben
https://doi.org/10.25080/majora-212e5952-029

Contributing to Open Source Software: From not knowing Python to becoming a Spyder core developer
An overview of the experience of becoming an open source developer, and updates on the work being done in the Spyder IDE project for 2022.
Daniel Althviz Moré
https://doi.org/10.25080/majora-212e5952-02a

Semi-Supervised Semantic Annotator (S3A): Toward Efficient Semantic Image Labeling
Python GUI and library for semantic image segmentation and annotation.
Nathan Jessurun, Olivia P. Dizon-Paradis, Dan E. Capecci, +2
https://doi.org/10.25080/majora-212e5952-02b

Bioframe: Operating on Genomic Interval Dataframes
Python library for working with genomic interval dataframes.
Nezar Abdennur, Geoffrey Fudenberg, Ilya M. Flyamer, +5
https://doi.org/10.25080/majora-212e5952-02c

Likeness: a toolkit for connecting the social fabric of place to human dynamics
Richly-attributed synthetic population data are crucial for discerning human dynamics while preserving privacy. The Likeness toolkit addresses this need with a suite of Python packages that generate population data as individual agents in appropriate nighttime locations and allocate them to probable daytime activity spaces. Through a case study utilizing students and faculty as agents, the results of Likeness simulations are shown to recreate high-fidelity school capacities, comparable to empirical data sources.
Joseph V. Tuccillo, James D. Gaboardi
https://doi.org/10.25080/majora-212e5952-02d

pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling
pyAudioProcessing is a Python-based library for processing audio data, constructing and extracting numerical features from audio, building and testing machine learning models, and classifying data with existing pre-trained audio classification models or custom user-built models. This library contains features built in Python that were originally published in MATLAB. pyAudioProcessing allows the user to compute various features from audio files including Gammatone Frequency Cepstral Coefficients (GFCC), Mel Frequency Cepstral Coefficients (MFCC), spectral features, chroma features, and others such as beat-based and cepstrum-based features. One can use these features along with one’s own classification backend or any of the popular scikit-learn classifiers that have been integrated into pyAudioProcessing. Cleaning functions to strip unwanted portions from the audio are another offering of the library. It further contains integrations with other audio functionalities such as frequency and time-series visualizations and audio format conversions. This software aims to provide machine learning engineers, data scientists, researchers, and students with a set of baseline models to classify audio. The library is available at https://github.com/jsingh811/pyAudioProcessing and is under the GPL-3.0 license.
Jyotika Singh
https://doi.org/10.25080/majora-212e5952-02e

Kiwi: Python Tool for Tex Processing and Classification
A user-friendly desktop tool for text visualization and classification. This allows users within the field to avoid creating boilerplate code for basic NLP tasks and users new to machine learning to plug and play with various models and methods.
Our main goal is to make natural language processing accessible and easy.
Neelima Pulagam, Sai Marasani, Brian Sass
https://doi.org/10.25080/majora-212e5952-02f

Phylogeography: Analysis of genetic and climatic data of SARS-CoV-2
As the SARS-CoV-2 pandemic reaches its peak, researchers around the globe are combining efforts to investigate the genetics of different variants to better deal with its distribution. This paper discusses phylogeographic approaches to examine how patterns of divergence within SARS-CoV-2 coincide with geographic features, such as climatic features. First, we propose a Python-based bioinformatic pipeline for phylogeographic analysis, called aPhylogeo and written in Python 3, that helps researchers better understand the distribution of the virus in specific regions: users describe the analysis in a configuration file and then run all the analysis operations in a single run. In particular, the aPhylogeo tool determines which parts of the genetic sequence undergo a high mutation rate depending on geographic conditions, using a sliding window that moves along the genetic sequence alignment with a user-defined step and window size. As a Python-based cross-platform program, aPhylogeo works on Windows®, MacOS X® and GNU/Linux. The implementation of this pipeline is publicly available on GitHub (https://github.com/tahiri-lab/aPhylogeo).
Second, we present an example analysis with our new aPhylogeo tool on real data (SARS-CoV-2) to understand the occurrence of different variants.
Wanlin Li, Aleksandr Koshkarov, My-Linh Luu, +1
https://doi.org/10.25080/majora-212e5952-030

Design of a Scientific Data Analysis Support Platform
Studying the design features necessary for a workflow and experiment management system, and presenting Curifactory: an open source package that meets these design features.
Nathan Martindale, Jason Hite, Scott Stewart, +1
https://doi.org/10.25080/majora-212e5952-031

Opening ARM: A pivot to community software to meet the needs of users and stakeholders of the planet’s largest cloud observatory
This presentation discusses the evolution of the Atmospheric Radiation Measurement (ARM) program's open source endeavors, and the hurdles that came with it: from the Python ARM Radar Toolkit to the Atmospheric data Community Toolkit in 2018, through the expansion of our open-source presence on GitHub in 2019, to what is planned for the future.
Zachary Sherman, Scott Collis, Max Grover, +2
https://doi.org/10.25080/majora-212e5952-032

SciPy Tools Plenaries

SciPy Tools Plenary - CEL team
Introducing the Contributor Experience Lead team at SciPy 2022.
Inessa Pawson
https://doi.org/10.25080/majora-212e5952-043

SciPy Tools Plenary on Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
This presentation summarizes changes over the past year, new features, and future plans.
Elliott Sales de Andrade
https://doi.org/10.25080/majora-212e5952-044

SciPy Tools Plenary - NumPy
Annual update on the NumPy project at SciPy 2022.
Inessa Pawson
https://doi.org/10.25080/majora-212e5952-045

Lightning Talks

Downsampling Time Series Data for Visualizations
Exploring the largest-triangle-three-buckets algorithm to downsample time series data.
Delaina Moore
https://doi.org/10.25080/majora-212e5952-027

Analysis as Applications: Quick introduction to lockfiles
An opinionated argument for the use of lockfiles in scientific analysis in a similar manner to Python application deployment. This talk was inspired by Brett Cannon's 'pip-secure-install' project and a Twitter conversation with Dustin Ingram on April 20, 2020.
Matthew Feickert
https://doi.org/10.25080/majora-212e5952-028
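
Illustrative sketch (not taken from any of the talks): the downsampling lightning talk listed above names the largest-triangle-three-buckets algorithm. For readers unfamiliar with it, the general technique can be sketched in plain Python as below. The function name `lttb`, its parameters, and the sample data are hypothetical choices for this illustration, and the input is assumed to be a list of (x, y) pairs already sorted by x.

```python
def lttb(points, n_out):
    """Downsample (x, y) pairs, sorted by x, to n_out points.

    Hypothetical helper written for illustration only.
    """
    n = len(points)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return list(points)  # nothing to drop

    # The first and last points are always kept; the n - 2 interior points
    # are split into n_out - 2 buckets of roughly equal size.
    bucket = (n - 2) / (n_out - 2)
    sampled = [points[0]]
    a = 0  # index of the most recently selected point

    for i in range(n_out - 2):
        # Average of the *next* bucket (for the final bucket this is
        # simply the last point of the series).
        nxt_lo = int((i + 1) * bucket) + 1
        nxt_hi = min(int((i + 2) * bucket) + 1, n)
        count = nxt_hi - nxt_lo
        avg_x = sum(p[0] for p in points[nxt_lo:nxt_hi]) / count
        avg_y = sum(p[1] for p in points[nxt_lo:nxt_hi]) / count

        # In the current bucket, keep the point that forms the largest
        # triangle with the previously kept point and that average.
        lo = int(i * bucket) + 1
        hi = int((i + 1) * bucket) + 1
        ax, ay = points[a]
        best, best_area = lo, -1.0
        for j in range(lo, hi):
            x, y = points[j]
            area = abs((ax - avg_x) * (y - ay) - (ax - x) * (avg_y - ay)) / 2
            if area > best_area:
                best, best_area = j, area
        sampled.append(points[best])
        a = best

    sampled.append(points[-1])
    return sampled


# Made-up data: a flat series of 100 samples with a single spike at x = 50.
series = [(i, 10.0 if i == 50 else 0.0) for i in range(100)]
small = lttb(series, 10)
```

In this made-up series the spike at x = 50 is among the ten points kept, which is the property that makes the approach attractive for plotting: visually prominent extremes tend to be preserved while flat stretches are thinned out.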