Proceedings of SciPy 2025

The 24th annual SciPy conference was held in Tacoma, WA at the Tacoma Convention Center, July 7-13, 2025.

SciPy brings together attendees from industry, academia and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.

Full proceedings, posters and slides, and organizing committee can be found at https://proceedings.scipy.org/articles/hptk5424

CFSpy: A Python Library for the Computation of Chen-Fliess Series

CFSpy is a Python package for computing reachable sets of nonlinear control-affine systems via Chen-Fliess series. The optimization is performed using SciPy Optimize.

Ivan Perez Avellaneda
https://doi.org/10.25080/mfwm5796
Quantum Chemistry Acceleration: Comparative Performance Analysis of Modern DFT Implementations

This paper examines the acceleration of quantum chemistry calculations through modern implementations of Density Functional Theory (DFT). We present a comparative performance analysis between traditional frameworks and optimized implementations, demonstrating substantial computational efficiency gains. Applications to electrolyte membrane structure analysis illustrate practical benefits, enabling more extensive simulations in reduced timeframes. Benchmarks highlight speedups achieved via modern code optimization techniques in Python-based quantum chemistry environments.

Kyohei Sahara
https://doi.org/10.25080/dvta2583
Explore Solvable and Unsolvable Equations with SymPy

Why can we solve some equations easily, while others seem impossible? And another thing: why is this knowledge hidden from us?

Carl Kadie
https://doi.org/10.25080/kcht7476
Reproducible Machine Learning Workflows for Scientists with Pixi

Scientific researchers need reproducible software environments for complex applications that can run across heterogeneous computing platforms. Modern open source tools, like Pixi and the CUDA conda-forge packages, provide reproducibility solutions while offering high-level semantics well suited to researchers.

Matthew Feickert, Ruben Arts, and John Kirkham
https://doi.org/10.25080/nwuf8465
Redist, a Python tool for model-agnostic binned-likelihood fits in High Energy Physics

Redist is a package for building model-agnostic binned-likelihood fits, allowing the combination of datasets and enhancing their reinterpretation. Redist allows direct inference of theoretical parameters of any beyond the Standard Model scenario through histogram reweighting, properly taking into account changes in kinematic distributions.

Marco Colonna, Johannes Albrecht, Florian Bernlochner et al.
https://doi.org/10.25080/ppgk2467
Advancing High Energy Physics Data Analysis with Julia -- A Case for JuliaHEP

The computational challenges in HEP require optimized solutions for handling complex data structures, parallel computing, and Just-in-Time (JIT) compilation. While Python and C++ remain the standard, Julia presents an opportunity to improve performance while maintaining usability.

Ianna Osborne and Jerry 🦑 Ling
https://doi.org/10.25080/yunw5822
Ocetrac: An Object-based Framework for Tracking and Quantifying Climate Events in Gridded Datasets

Ocetrac is a Python package for tracking and analyzing spatiotemporal structures in gridded climate data. Building on its core submodule, which detects objects using morphological image processing, Ocetrac’s Measures submodule enables quantification of tracked objects, computing shape, motion, intensity, contextual, and temporal measures.
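The object-detection step described above can be sketched with SciPy's connected-component labeling, the standard morphological tool for this task (this is an illustrative example, not Ocetrac's actual API; the threshold field and sizes are made up):

```python
import numpy as np
from scipy import ndimage

# Binary field, e.g. grid cells exceeding a temperature threshold.
field = np.array([[1, 1, 0, 0],
                  [0, 1, 0, 1],
                  [0, 0, 0, 1],
                  [0, 0, 0, 0]])

# Label connected components (4-connectivity by default): each
# contiguous blob of 1s becomes one tracked object.
labels, n_objects = ndimage.label(field)

# A simple per-object measure: the area (cell count) of each object.
sizes = ndimage.sum_labels(field, labels, index=range(1, n_objects + 1))
```

Here the two blobs are found as separate objects, and per-object measures like area follow from reductions over the label array.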

Cassia Cai, Jacob T. Cohen, Hillary Scannell et al.
https://doi.org/10.25080/vxkf4244
NeuroConv: Streamlining Neurophysiology Data Conversion to the NWB Standard

The field of experimental neurophysiology uses dozens of unique acquisition devices, many of which store data in their own proprietary formats, making data sharing a difficult task. Our library reads these formats and transforms them to the standard NWB format, a FAIR data structure supported by major data archives.

Heberto Mayorquin, Cody Baker, Paul Adkisson-Floro et al.
https://doi.org/10.25080/cehj4257
An Active Learning Plugin In Napari To Fine-Tune Models For Large-scale Bioimage Analysis

A plugin to perform active learning for fine-tuning deep learning models for bioimage analysis and understanding, all integrated within the napari viewer and Next-Generation-File-Formats ecosystem for large-scale image analysis.

Fernando Cervantes-Sanchez
https://doi.org/10.25080/ectn7568
Phlower: A Deep Learning Framework Supporting PyTorch Tensors with Physical Dimensions
Riku Sakamoto
https://doi.org/10.25080/vwty6796
OptiMask: Efficiently Finding the Largest NaN-Free Submatrix

OptiMask is a heuristic designed to compute the largest, not necessarily contiguous, submatrix of a matrix with missing data. It identifies the optimal set of columns and rows to remove to maximize the number of retained elements.
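The problem OptiMask addresses can be illustrated with a naive greedy baseline (this is not OptiMask's algorithm or API; `greedy_nan_free` is a hypothetical helper that repeatedly drops whichever row or column clears the most NaNs per retained element lost):

```python
import math

def greedy_nan_free(matrix):
    """Return (rows, cols) indexing a NaN-free submatrix (greedy, not optimal)."""
    rows = set(range(len(matrix)))
    cols = set(range(len(matrix[0])))
    while True:
        nan_cells = [(i, j) for i in rows for j in cols
                     if math.isnan(matrix[i][j])]
        if not nan_cells:
            return sorted(rows), sorted(cols)
        # Count NaNs per surviving row and column.
        row_counts = {i: 0 for i in rows}
        col_counts = {j: 0 for j in cols}
        for i, j in nan_cells:
            row_counts[i] += 1
            col_counts[j] += 1
        worst_row = max(rows, key=row_counts.get)
        worst_col = max(cols, key=col_counts.get)
        # Removing a row costs len(cols) retained elements, a column len(rows);
        # drop whichever clears more NaNs per element sacrificed.
        if row_counts[worst_row] / len(cols) >= col_counts[worst_col] / len(rows):
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)

nan = float("nan")
m = [[1.0, nan, 3.0],
     [4.0, 5.0, 6.0],
     [nan, 8.0, 9.0]]
keep_rows, keep_cols = greedy_nan_free(m)
```

For this small matrix the greedy choice (drop row 0 and column 0, keeping 4 elements) happens to be optimal; OptiMask's contribution is a heuristic that scales this search to large matrices.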

Cyril Joly
https://doi.org/10.25080/uaha7744
Numba v2: Towards a SuperOptimizing Python Compiler

This paper presents early work on Numba v2, a general-purpose compiler for Python that integrates equality saturation (EqSat) as a foundation for program analysis and transformation. Whilst inspired in part by the needs of AI/ML workloads, and also supporting tensor optimizations, Numba v2 is not a tensor-oriented compiler. Instead, it provides a flexible framework where user-defined mathematical and domain-specific rewrites participate in the compilation process as a complement to the compilation supported by Numba today.

Siu Kwan Lam, Stuart Archibald, and Stan Seibert
https://doi.org/10.25080/fncj2446
Enhancing Curve Fitting with SciPy: Interactive Spline Modeling and Reproducibility with SplineCloud

This article explores how SciPy’s spline fitting tools power the core of SplineCloud — an open platform for interactively constructing and sharing regression models. It introduces a modern workflow for creating, refining, and reusing curve-fitting models, making data modeling and analysis more transparent, collaborative, and reproducible.
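The SciPy spline fitting underlying this kind of workflow can be sketched as follows (an illustrative example, not SplineCloud's API; the data and smoothing value are made up):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Noisy samples of an underlying smooth curve.
x = np.linspace(0.0, 10.0, 50)
rng = np.random.default_rng(0)
y = np.sin(x) + 0.1 * rng.standard_normal(50)

# s controls the smoothing trade-off: s=0 interpolates every point,
# larger s yields a smoother fit with fewer knots.
spline = UnivariateSpline(x, y, k=3, s=0.5)
y_fit = spline(x)
```

Interactive tools like SplineCloud expose exactly these knobs — knot placement and the smoothing factor — so a fitted model can be tuned visually and then reused programmatically.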

Vadym Pasko
https://doi.org/10.25080/xydg9873
Zamba: Computer vision for wildlife conservation

We present Zamba, an open source Python package designed to streamline camera trap data processing through machine learning. Zamba supports inference and custom model training for species classification on videos and images, as well as depth estimation inference for videos.

Emily Dorne, Jay Qi, Peter Bull et al.
https://doi.org/10.25080/crcw9835
Python is all you need: an overview of the composable, Python-native data stack

For the past decade, tools like dbt have formed a cornerstone of the modern data stack, and Python-first alternatives couldn't compete with the scale and performance of modern SQL—until now. New integrations with Ibis, the portable Python dataframe library, enable building and orchestrating scalable data engineering pipelines using existing open-source libraries like Kedro, Pandera, and more.

Deepyaman Datta
https://doi.org/10.25080/wjjm7869
Pipeline-level differentiable programming for the real world

We present Tesseract, a software ecosystem that provides pipeline-level automatic differentiation and differentiable physics programming at scale, and demonstrate its utility on a parametric shape optimization problem.
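The core idea of differentiating an entire pipeline end to end can be illustrated with forward-mode dual numbers (a from-scratch teaching sketch, not Tesseract's implementation or API):

```python
class Dual:
    """Number carrying a value and its derivative through a pipeline."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule propagates derivatives across stages.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def pipeline(x):
    # Two chained stages, differentiated end to end: (3x + 1)^2.
    stage1 = 3.0 * x + 1.0
    return stage1 * stage1

x = Dual(2.0, 1.0)   # seed derivative dx/dx = 1
out = pipeline(x)    # value (3*2+1)^2 = 49, derivative 6*(3*2+1) = 42
```

Systems like Tesseract apply this principle at the scale of whole simulation and optimization pipelines rather than single scalar functions.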

Andrin Rehmann, Dion Häfner, Alessandro Angioi et al.
https://doi.org/10.25080/kvfm5762
Imprecise uncertainty management with uncertain numbers to facilitate trustworthy computations

This paper demonstrates the uncertain-number framework, which enables a closed computation ecosystem in which trustworthy computations can be conducted rigorously. It presents an overview of the main capabilities of the PyUncertainNumber library.

Yu Chen and Scott Ferson
https://doi.org/10.25080/ahrt5264
On the Path to Seamless Python Serverless Data Analytics
Enrique Molina-Giménez, German T. Eizaguirre, and Pedro García-López
https://doi.org/10.25080/yjpp7553
Challenges and Implementations for ML Inference in High-energy Physics

Accelerating the inference of learning methods for analysis at the world's largest scientific facility

Sanjiban Sengupta and Lorenzo Moneta
https://doi.org/10.25080/rpaa9684
A Practical Guide to Data Quality: Problems, Detection, and Strategy

This paper serves as a practical guide for software developers and data practitioners to understand, identify, and address data quality issues. We present a comprehensive taxonomy of 23 common data quality problems, synthesizing definitions and contexts from empirical software engineering and database literature. Subsequently, we catalogue 22 distinct methods for detecting these issues, categorizing them into actionable strategies.
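Detection methods of the kind catalogued above can be made concrete with a small rule-based checker (the specific checks, thresholds, and the `detect_issues` helper are illustrative, not the paper's taxonomy):

```python
def detect_issues(rows, column, lo=None, hi=None):
    """Flag missing values, duplicate rows, and out-of-range values."""
    issues = []
    seen = set()
    for idx, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            issues.append((idx, "missing value"))
            continue
        # Range check against domain constraints.
        if lo is not None and value < lo or hi is not None and value > hi:
            issues.append((idx, "out of range"))
        # Duplicate check against previously seen rows.
        key = tuple(sorted(row.items()))
        if key in seen:
            issues.append((idx, "duplicate row"))
        seen.add(key)
    return issues

records = [{"age": 34}, {"age": None}, {"age": 34}, {"age": 210}]
problems = detect_issues(records, "age", lo=0, hi=120)
```

Each flagged index pairs a record with the category of problem found, mirroring how a detection strategy maps onto the taxonomy of data quality issues.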

Edson Bomfim
https://doi.org/10.25080/tyyd7356
Performing Object Detection on Drone Orthomosaics with Meta's Segment Anything Model (SAM)

This article presents a workflow that uses SAM's automatic mask generation capability to perform zero-shot object detection on a high-resolution drone orthomosaic. The generated output is 20% more spatially accurate than that produced using proprietary software, with 400% greater IoU.

Nicholas McCarty
https://doi.org/10.25080/uhje9464
Unlocking the Missing 78%: Inclusive Communities for the Future of Scientific Python

This article quantifies the gender gap in the scientific Python community, analyzes its root causes, and proposes the Visibility–Invitation–Mechanism (VIM) framework as a practical solution. A case study of the IBM Women in AI (WAI) User Group is presented to demonstrate the framework's effectiveness.

Noor Aftab
https://doi.org/10.25080/ncmh8429
Explaining ML predictions with SHAP

This article explores how SHAP (SHapley Additive exPlanations) can be used to interpret machine learning model predictions by providing consistent and theoretically grounded feature attributions.
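The Shapley values that SHAP builds on can be computed from scratch for a tiny model (a teaching sketch, not the shap library's API; the toy additive model is made up):

```python
from itertools import permutations

def model(features):
    # Toy additive model: prediction = x1 + 2*x2, baseline 0 for absent features.
    return features.get("x1", 0.0) + 2.0 * features.get("x2", 0.0)

def shapley_values(instance):
    """Average each feature's marginal contribution over all orderings."""
    names = list(instance)
    contrib = {n: 0.0 for n in names}
    orderings = list(permutations(names))
    for order in orderings:
        present = {}
        for name in order:
            before = model(present)
            present[name] = instance[name]
            contrib[name] += model(present) - before
    return {n: c / len(orderings) for n, c in contrib.items()}

phi = shapley_values({"x1": 3.0, "x2": 1.0})
```

For an additive model the attributions are exact — here x1 contributes 3.0 and x2 contributes 2.0, summing to the prediction minus the baseline, which is the consistency property that makes Shapley-based explanations theoretically grounded.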

Avik Basu
https://doi.org/10.25080/mhum9729
Eyes in the sky: Estimating Inland Water Quality Using Landsat Data

Algal blooms threaten human health and aquatic ecosystems, making monitoring essential. While Chlorophyll-A (Chl-a) effectively indicates algal presence, laboratory analysis is complex. This study utilizes satellite imagery as an alternative, addressing previous research limitations caused by scarce lab data.

Kedar Dabhadkar
https://doi.org/10.25080/jcuj3732
Empowering Learners - Teaching Reproducible Research with Open-Source Tools

This paper outlines a strategy to teach scientific analysis using Python, integrated with open science publishing concepts.

Deborah Khider, Julien Emile-Geay, David Edge et al.
https://doi.org/10.25080/wcfp5784
A Lightweight Pipeline for Rewards-Guided Synthetic Text Generation Using NeMo and RAPIDS

The paper introduces a lightweight pipeline for rewards-guided synthetic text generation using two NVIDIA products: NeMo and RAPIDS (i.e., cuDF, cuML).

Jiajia Ding, Arham Mehta, and Nirmal Juluru
https://doi.org/10.25080/hprh2773
Extension of the OpenMC depletion module for transport-independent depletion

This paper describes new functionality in OpenMC's depletion module for depleting materials independent of a neutron transport simulation. The paper validates the capability against transport-coupled depletion on a simple model.

Oleksandr Yardas, Paul Romano, Madicken Muk et al.
https://doi.org/10.25080/ngdf5738
Jupyter Book 2 and the MyST Document Stack

Jupyter Book allows researchers and educators to create books and knowledge bases that are reusable, reproducible, and interactive. This new foundation introduces a scalable way to publish interactive computational content, support structured metadata, and enable content reuse across contexts.

Project Jupyter, Evan Bolyen, J Gregory Caporaso et al.
https://doi.org/10.25080/hwcj9957
SciPy Proceedings: An Exemplar for Publishing Computational Open Science

The SciPy Proceedings have served as a cornerstone of scholarly communication within the scientific Python ecosystem since their introduction in 2008. In 2024, the publication process underwent a significant transformation, adopting a new open-source infrastructure built on MyST Markdown and Curvenote. This transition enabled a web-first, interactive publishing workflow that improves reproducibility, readability, and metadata quality.

Rowan Cockett, Franklin Koch, and Steve Purves
https://doi.org/10.25080/frwc3537