Proceedings of SciPy 2025
The 24th annual SciPy conference was held at the Tacoma Convention Center in Tacoma, WA, July 7-13, 2025.
SciPy brings together attendees from industry, academia and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.
Full proceedings, posters and slides, and organizing committee can be found at https://
We present CFSpy, a Python package for computing reachable sets of nonlinear control-affine systems. The underlying optimization is performed with SciPy's optimize module.
This paper examines the acceleration of quantum chemistry calculations through modern implementations of Density Functional Theory (DFT). We present a comparative performance analysis between traditional frameworks and optimized implementations, demonstrating substantial computational efficiency gains. Applications to electrolyte membrane structure analysis illustrate practical benefits, enabling more extensive simulations in reduced timeframes. Benchmarks highlight speedups achieved via modern code optimization techniques in Python-based quantum chemistry environments.
Why can we solve some equations easily, while others seem impossible? And why is this knowledge so often hidden from us?
Scientific researchers need reproducible software environments for complex applications that can run across heterogeneous computing platforms. Modern open source tools, like Pixi and the CUDA conda-forge packages, provide reproducibility solutions while offering high-level semantics well suited to researchers.
Redist is a package for building model-agnostic binned-likelihood fits, enabling the combination of datasets and enhancing their reinterpretation. Redist allows direct inference of theoretical parameters of any beyond-the-Standard-Model scenario through histogram reweighting, properly accounting for changes in kinematic distributions.
The computational challenges in HEP require optimized solutions for handling complex data structures, parallel computing, and Just-in-Time (JIT) compilation. While Python and C++ remain the standard, Julia presents an opportunity to improve performance while maintaining usability.
Ocetrac is a Python package for tracking and analyzing spatiotemporal structures in gridded climate data. Building on its core submodule, which detects objects using morphological image processing, Ocetrac’s Measures submodule enables quantification of tracked objects, computing shape, motion, intensity, contextual, and temporal measures.
The field of experimental neurophysiology uses dozens of unique acquisition devices, many of which store data in their own proprietary formats, making data sharing a difficult task. Our library reads these formats and transforms them to the standard NWB format, a FAIR data structure supported by major data archives.
A plugin to perform active learning for fine-tuning deep learning models for bioimage analysis and understanding, all integrated within the napari viewer and Next-Generation-File-Formats ecosystem for large-scale image analysis.
OptiMask is a heuristic designed to compute the largest, not necessarily contiguous, submatrix of a matrix with missing data. It identifies the optimal set of columns and rows to remove to maximize the number of retained elements.
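The combinatorial problem OptiMask tackles can be sketched with a toy brute-force search (hypothetical illustrative code, not the package's own heuristic or API): choose which rows and columns to keep so that the remaining submatrix contains no NaNs while retaining as many elements as possible.

```python
import itertools
import numpy as np

# A small matrix with missing entries
X = np.array([
    [1.0,    np.nan, 3.0],
    [4.0,    5.0,    6.0],
    [np.nan, 8.0,    9.0],
])

def largest_nan_free_submatrix(X):
    """Exhaustively search row/column subsets for the largest NaN-free
    submatrix (feasible only for tiny matrices; OptiMask uses a heuristic
    to scale this idea to real data)."""
    n_rows, n_cols = X.shape
    best_size, best_rows, best_cols = 0, (), ()
    for r in range(1, n_rows + 1):
        for rows in itertools.combinations(range(n_rows), r):
            for c in range(1, n_cols + 1):
                for cols in itertools.combinations(range(n_cols), c):
                    sub = X[np.ix_(rows, cols)]
                    if not np.isnan(sub).any() and sub.size > best_size:
                        best_size, best_rows, best_cols = sub.size, rows, cols
    return best_size, best_rows, best_cols

size, rows, cols = largest_nan_free_submatrix(X)
```

Here dropping one row and one column retains 4 of the 9 elements, which is the optimum for this example; the exhaustive search is exponential, which is precisely why a heuristic is needed in practice.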
This paper presents early work on Numba v2, a general-purpose compiler for Python that integrates equality saturation (EqSat) as a foundation for program analysis and transformation. Whilst inspired in part by the needs of AI/ML workloads, and also supporting tensor optimizations, Numba v2 is not a tensor-oriented compiler. Instead, it provides a flexible framework where user-defined mathematical and domain-specific rewrites participate in the compilation process as a complement to the compilation supported by Numba today.
This article explores how SciPy’s spline fitting tools power the core of SplineCloud — an open platform for interactively constructing and sharing regression models. It introduces a modern workflow for creating, refining, and reusing curve-fitting models, making data modeling and analysis more transparent, collaborative, and reproducible.
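The SciPy spline tools referred to above can be exercised in a few lines; the sketch below (with made-up sample data) fits a smoothing cubic B-spline to noisy observations, the basic building block of such curve-fitting workflows.

```python
import numpy as np
from scipy.interpolate import splrep, BSpline

# Noisy samples of a smooth underlying function
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + rng.normal(scale=0.05, size=x.size)

# Fit a smoothing cubic B-spline; `s` trades fidelity against smoothness
tck = splrep(x, y, k=3, s=0.1)
spline = BSpline(*tck)

# Evaluate the fitted model on a fine grid
x_fine = np.linspace(0, 2 * np.pi, 200)
y_fit = spline(x_fine)
```

The `(t, c, k)` tuple returned by `splrep` (knots, coefficients, degree) is a compact, serializable representation of the fitted curve, which is what makes spline models convenient to store and share.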
We present Zamba, an open source Python package designed to streamline camera trap data processing through machine learning. Zamba supports inference and custom model training for species classification on videos and images, as well as depth estimation inference for videos.
For the past decade, tools like dbt have formed a cornerstone of the modern data stack, and Python-first alternatives couldn't compete with the scale and performance of modern SQL—until now. New integrations with Ibis, the portable Python dataframe library, enable building and orchestrating scalable data engineering pipelines using existing open-source libraries like Kedro, Pandera, and more.
We present Tesseract, a software ecosystem that provides pipeline-level automatic differentiation and differentiable physics programming at scale, and demonstrate its utility on a parametric shape optimization problem.
This paper demonstrates the uncertain number framework, which enables a closed computation ecosystem in which trustworthy computations can be conducted rigorously. It presents an overview of the main capabilities of the PyUncertainNumber library.
Accelerating the inference of learning methods for analysis at the world's largest scientific facility
This paper serves as a practical guide for software developers and data practitioners to understand, identify, and address data quality issues. We present a comprehensive taxonomy of 23 common data quality problems, synthesizing definitions and contexts from empirical software engineering and database literature. Subsequently, we catalogue 22 distinct methods for detecting these issues, categorizing them into actionable strategies.
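To make the idea of actionable detection strategies concrete, here is a minimal sketch of rule-based checks for three common issues (missing values, out-of-range values, exact duplicates); the records, field names, and thresholds are invented for illustration and are not taken from the paper's catalogue.

```python
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # missing value
    {"id": 3, "age": 210},    # out-of-range value
    {"id": 1, "age": 34},     # exact duplicate of the first record
]

def detect_issues(rows, age_range=(0, 120)):
    """Return (index, issue_kind) pairs for three simple quality checks."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            issues.append((i, "duplicate"))
        seen.add(key)
        if row["age"] is None:
            issues.append((i, "missing"))
        elif not age_range[0] <= row["age"] <= age_range[1]:
            issues.append((i, "out_of_range"))
    return issues

found = detect_issues(records)
```

Each check is deliberately independent, mirroring how detection methods can be composed into a pipeline and reported per record.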
This article presents a workflow that uses SAM's automatic mask generation capability to perform zero-shot object detection on a high-resolution drone orthomosaic. The generated output is 20% more spatially accurate than that produced with proprietary software, with 400% greater IoU.
This article quantifies the gender gap in the scientific Python community, analyzes its root causes, and proposes the Visibility–Invitation–Mechanism (VIM) framework as a practical solution. A case study of the IBM Women in AI (WAI) User Group is presented to demonstrate the framework's effectiveness.
This article explores how SHAP (SHapley Additive exPlanations) can be used to interpret machine learning model predictions by providing consistent and theoretically grounded feature attributions.
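The attribution idea underlying SHAP can be shown by computing exact Shapley values by hand for a tiny linear model; this is an educational sketch (the weights, input, and background data are made up), not the shap package's API.

```python
import itertools
import math
import numpy as np

# A toy linear model f(x) = w @ x
w = np.array([2.0, -1.0])
x = np.array([3.0, 2.0])           # instance to explain
background = np.array([[0.0, 0.0],
                       [1.0, 1.0]])  # reference dataset

def value(coalition):
    # Expected model output when only the features in `coalition` are
    # known; the remaining features are averaged over the background data.
    z = background.copy()
    idx = list(coalition)
    z[:, idx] = x[idx]
    return (z @ w).mean()

n = len(x)
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for k in range(n):
        for S in itertools.combinations(others, k):
            weight = (math.factorial(k) * math.factorial(n - k - 1)
                      / math.factorial(n))
            phi[i] += weight * (value(S + (i,)) - value(S))
```

The attributions satisfy the additivity (efficiency) property: they sum to the model's prediction minus the mean prediction over the background data, which is the consistency guarantee the article builds on.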
Algal blooms threaten human health and aquatic ecosystems, making monitoring essential. While Chlorophyll-A (Chl-a) effectively indicates algal presence, laboratory analysis is complex. This study utilizes satellite imagery as an alternative, addressing previous research limitations caused by scarce lab data.
This paper outlines a strategy to teach scientific analysis using Python, integrated with open science publishing concepts.
The paper introduces a lightweight pipeline for rewards-guided synthetic text generation using two NVIDIA products: NeMo and RAPIDS (i.e., cuDF, cuML).
This paper describes new functionality in OpenMC's depletion module for depleting materials independent of a neutron transport simulation. The paper validates the capability against transport-coupled depletion on a simple model.
Jupyter Book allows researchers and educators to create books and knowledge bases that are reusable, reproducible, and interactive. Its new foundation introduces a scalable way to publish interactive computational content, support structured metadata, and enable content reuse across contexts.
The SciPy Proceedings have served as a cornerstone of scholarly communication within the scientific Python ecosystem since their introduction in 2008. In 2024, the publication process underwent a significant transformation, adopting a new open-source infrastructure built on MyST Markdown and Curvenote. This transition enabled a web-first, interactive publishing workflow that improves reproducibility, readability, and metadata quality.