Proceedings of SciPy 2024

The 23rd annual SciPy conference will be held in Tacoma, WA at the Tacoma Convention Center, July 8-14, 2024.

SciPy brings together attendees from industry, academia and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.

Full proceedings, posters and slides, and organizing committee can be found at https://proceedings.scipy.org/articles/proceedings-2024

Any notebook served: authoring and sharing reusable interactive widgets

Jupyter Widgets enable interactive code and data visualization in notebooks, but creating and distributing widgets across the Jupyter ecosystem is challenging. The anywidget project introduces a standard and toolset for portable, web-based widgets in various computing environments, simplifying development and extending compatibility beyond Jupyter. Its approach has fostered a rich widget ecosystem, driving the creation of new widgets and adoption of the standard by multiple platforms.
Trevor Manz, Nils Gehlenborg, Nezar Abdennur
https://doi.org/10.25080/NRPV2311

Making Research Data Flow With Python

The increasing volume of research data in fields such as astronomy, biology, and engineering necessitates efficient distributed data management. This paper presents the Librarian, a custom framework for data transfer in large academic collaborations, built for the Simons Observatory.
Josh Borrow, Paul La Plante, James Aguirre, +1
https://doi.org/10.25080/HWGA5253

Echostack: A flexible and scalable open-source software toolbox for echosounder data processing

Wu-Jung Lee, Valentina Staneva, Landung “Don” Setiawan, +5
https://doi.org/10.25080/WXRH8633

Scikit-build-core

Discover how scikit-build-core revolutionizes Python extension building with its seamless integration of CMake and Python packaging standards. Learn about its enhanced features for cross-compilation, multi-platform support, and simplified configuration, which enable writing binary extensions with pybind11, Nanobind, Fortran, Cython, C++, and more.
Henry Schreiner, Jean-Christophe Fillion-Robin, Matt McCormick
https://doi.org/10.25080/FMKR8387

Model Share AI

Machine learning is revolutionizing a wide range of research areas and industries, but many ML projects never progress past the proof-of-concept stage. To address this problem, we introduce Model Share AI, a platform designed to streamline collaborative model development, model provenance tracking, and model deployment.
Heinrich Peters, Michael Parrott
https://doi.org/10.25080/MDCE8355

Ecological and Geographic Influences on Cumacea Genetics in the Northern North Atlantic

Cumacea are vital indicators of benthic health in marine ecosystems. This study investigated the influence of environmental (i.e., biological or ecosystemic), climatic (i.e., meteorological or atmospheric), and geographic (i.e., spatial or regional) attributes on their genetic variability in the Northern North Atlantic, focusing on Icelandic waters.
Justin Gagnon, Nadia Tahiri
https://doi.org/10.25080/NVYF1037

Funix - The laziest way to build GUI apps in Python

Presenting a model or algorithm as a GUI application is a common need in the scientific and engineering community. Funix was created to automatically launch apps from existing Python functions, selecting widgets based on the types of each function's arguments and return values, according to the type-to-widget mapping defined in a theme.
Forrest Sheng Bao, Mike Qi, Ruixuan Tu, +1
https://doi.org/10.25080/JFYN3740
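
The type-to-widget idea in the Funix abstract above can be sketched with the standard library alone. This is a minimal illustration, not Funix's API: the `THEME` dictionary, the widget names, and the `widgets_for` helper are all hypothetical.

```python
import inspect
from typing import Callable, get_type_hints

# Hypothetical theme mapping Python types to widget names.
# This is an assumption for illustration, not Funix's real theme format.
THEME = {int: "slider", float: "number_input", str: "textbox", bool: "checkbox"}


def widgets_for(func: Callable) -> dict:
    """Pick a widget name for each argument and for the return value of func."""
    hints = get_type_hints(func)
    chosen = {}
    for name in inspect.signature(func).parameters:
        # Fall back to a plain textbox when the argument has no known type.
        chosen[name] = THEME.get(hints.get(name), "textbox")
    chosen["return"] = THEME.get(hints.get("return"), "textbox")
    return chosen


def bmi(weight_kg: float, height_m: float, metric: bool = True) -> float:
    return weight_kg / height_m ** 2


layout = widgets_for(bmi)
print(layout)
```

Here the annotations on `bmi` alone decide the layout; swapping in a different `THEME` would change the widgets without touching the function.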

Cyanobacteria detection in small, inland water bodies with CyFi

Harmful algal blooms pose major health risks to human and aquatic life. CyFi is an open-source Python package that enables detection of cyanobacteria in inland water bodies using 10-30m Sentinel-2 imagery and a computationally efficient tree-based machine learning model.
Emily Dorne, Katie Wetstone, Trista Brophy Cerquera, +1
https://doi.org/10.25080/PDHK7238

Continuous Tools for Scientific Publishing

Science requires new mediums to compose ideas and ways to share research findings iteratively, as early as possible and connected directly to software and data. In this paper we discuss two tools for scientific authoring and publishing, MyST Markdown and Curvenote, and illustrate examples of improving metadata, reimagining the reading experience, including computational content, and transforming publishing practices for individuals and societies through automation and continuous practices.
Rowan Cockett, Steve Purves, Franklin Koch, +1
https://doi.org/10.25080/NKVC9349

geosnap: The Geospatial Neighborhood Analysis Package

Understanding neighborhood context is critical for social science research, public policy analysis, and urban planning. We introduce geosnap, the Geospatial Neighborhood Analysis Package, a suite of tools for exploring, modeling, and visualizing the social context and spatial extent of neighborhoods and regions over time.
Elijah Knaap, Sergio Rey
https://doi.org/10.25080/FVWM4182

Orchestrating Bioinformatics Workflows Across a Heterogeneous Toolset with Flyte

Pryce Turner
https://doi.org/10.25080/DDJJ4932

Echodataflow: Recipe-based Fisheries Acoustics Workflow Orchestration

Valentina Staneva, Soham Butala, Landung (Don) Setiawan, +1
https://doi.org/10.25080/JXDK4427

How the Scientific Python ecosystem helps answer fundamental questions of the Universe

Matthew Feickert, Nikolai Hartmann, Lukas Heinrich, +6
https://doi.org/10.25080/KMXN4784

RoughPy

Sam Morley, Terry Lyons
https://doi.org/10.25080/DXWY3560

ITK-Wasm

Matthew McCormick, Paul Elliott
https://doi.org/10.25080/TCFJ5130

Supporting Greater Interactivity in the IPython Visualization Ecosystem

Nathan Martindale, Jacob Smith, Lisa Linville
https://doi.org/10.25080/GVHT1072

THEIA: An Offline Tool for Tradespace Visualization

Samuel Williams, Scott Christensen, Marvin Brown
https://doi.org/10.25080/RVRR7774

Improving Code Quality with Array and DataFrame Type Hints

This article demonstrates practical approaches to fully type-hinting generic NumPy arrays and StaticFrame DataFrames, and shows how the same annotations can improve code quality with both static analysis and runtime validation.
Christopher Ariza
https://doi.org/10.25080/WPXM6451

Predx-Tools

Histopathological images, which are digitized images of human or animal tissue, contain insights into disease state. We present PredX-Tools, a suite of simple, easy-to-use Python GUI applications that facilitate analysis of histopathological images and provide a no-code platform for data scientists and researchers to analyze raw and transformed data.
Brian Falkenstein, Shannon Quinn, Chakra Chennubhotla, +2
https://doi.org/10.25080/YCFW5807

Voice Computing with Python in Jupyter Notebooks

Jupyter is a popular platform for writing interactive computational narratives that contain computer code and its output interleaved with prose that describes the code and the output. It is possible to use one’s voice to interact with Jupyter notebooks.
Blaine H. M. Mooers
https://doi.org/10.25080/MCYV2126

Evaluating Probabilistic Forecasters with sktime and tsbootstrap — Easy-to-Use, Configurable Frameworks for Reproducible Science

Evaluating probabilistic forecasts is complex and essential across various domains, yet no comprehensive software framework exists to simplify this task. Despite extensive literature on evaluation methodologies, current practices are fragmented and often lack reproducibility. To address this gap, we introduce a reproducible experimental workflow for evaluating probabilistic forecasting algorithms using the sktime package.
Benedikt Heidrich, Sankalp Gilda, Franz Kiraly
https://doi.org/10.25080/VPNX1595

AI-Driven Watermarking Technique for Safeguarding Text Integrity in the Digital Age

Identifying sources is vital for generative AI models such as ChatGPT and Bard because of concerns about copyright infringement and plagiarism. In this paper, we explore text watermarking as a potential solution, investigating both physical and logical watermarking techniques.
Atharva Rasane
https://doi.org/10.25080/DHKD1726

Computational Resource Optimisation in Feature Selection under Class Imbalance Conditions

Feature selection is crucial for reducing data dimensionality as well as enhancing model interpretability and performance in machine learning tasks. This study explores the possibility of performing feature selection on a subset of data to reduce the computational burden.
Amadi Gabriel Udu, Andrea Lecchini-Visintini, Steve R. Gunn, +3
https://doi.org/10.25080/TPGN6857

Mandala: Compositional Memoization for Simple & Powerful Scientific Data Management

We present mandala, a Python library that largely eliminates the accidental complexity of scientific data management and incremental computing. While most traditional and/or popular data management solutions are based on logging, mandala takes a different approach: it uses memoization of function calls as the fundamental unit of saving, loading, querying and deleting computational artifacts.
Aleksandar Makelov
https://doi.org/10.25080/JHPV7385
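
As a rough illustration of treating a memoized function call as the unit of storage, the sketch below keys each result by a hash of the function name and its arguments. It is not mandala's API: the `STORE` dictionary, the `memoized` decorator, and the `CALLS` log are hypothetical names, and a real system would persist results rather than keep them in memory.

```python
import functools
import hashlib
import pickle

# Hypothetical in-memory store; an assumption standing in for persistent storage.
STORE = {}


def memoized(func):
    """Save each call's result under a key derived from the function name and arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
        key = hashlib.sha256(payload).hexdigest()
        if key not in STORE:
            # First time this exact call is seen: compute and save the result.
            STORE[key] = func(*args, **kwargs)
        return STORE[key]
    return wrapper


CALLS = []  # records which inputs actually triggered a computation


@memoized
def square(x):
    CALLS.append(x)
    return x * x


first = square(4)
second = square(4)  # served from STORE, so square's body is not re-run
third = square(5)
```

Because results are addressed by the call itself, re-running the same script acts as both the "save" and the "load" step.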

multinterp

Multivariate interpolation is a fundamental tool in scientific computing used to approximate the values of a function between known data points in multiple dimensions. Despite its importance, the Python ecosystem offers a fragmented landscape of specialized tools for this task; the multinterp package was developed to address this challenge.
Alan Lujan
https://doi.org/10.25080/FGCJ9164
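
For readers unfamiliar with the term, one of the simplest forms of multivariate interpolation is bilinear interpolation on a rectilinear grid. The sketch below is a pure-Python illustration under that assumption; the `bilinear` function is a hypothetical helper and does not reflect the multinterp API.

```python
from bisect import bisect_right


def bilinear(xs, ys, values, x, y):
    """Interpolate values[i][j] = f(xs[i], ys[j]) at the point (x, y).

    xs and ys must be sorted in increasing order. Points outside the grid
    are linearly extrapolated from the nearest edge cell.
    """
    # Locate the grid cell containing the point, clamping to the edge cells.
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect_right(ys, y) - 1, 0), len(ys) - 2)
    # Fractional position of the point inside the cell along each axis.
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    # Interpolate along x on both y-edges of the cell, then along y.
    low = values[i][j] * (1 - tx) + values[i + 1][j] * tx
    high = values[i][j + 1] * (1 - tx) + values[i + 1][j + 1] * tx
    return low * (1 - ty) + high * ty


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0]
values = [[x + 2 * y for y in ys] for x in xs]  # samples of f(x, y) = x + 2y
result = bilinear(xs, ys, values, 1.5, 0.5)
print(result)
```

The test function is linear in both variables, so the interpolated value matches the true value of `f(1.5, 0.5)`.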

Python-Based GeoImagery Dataset Development for Deep Learning-Driven Forest Wildfire Detection

Valeria Martin, Derek Morgan, K. Brent Venable
https://doi.org/10.25080/YADT7194

Mamba Models a possible replacement for Transformers?

Suvrakamal Das, Rounak Sen, Saikrishna Devendiran
https://doi.org/10.25080/XHDR4700

Algorithms to Determine Asteroid’s Physical Properties using Sparse and Dense Photometry, Robotic Telescopes and Open Data

Arushi Nath
https://doi.org/10.25080/TWCF2755

Training a Supervised Cilia Segmentation Model from Self-Supervision

Understanding cilia behavior is essential in diagnosing and treating cilia-related diseases, but automatically analyzing cilia is often labor- and time-intensive. In this work we overcome this bottleneck by developing a robust, self-supervised framework that exploits the visual similarity of normal and dysfunctional cilia.
Seyed Alireza Vaezi, Shannon Quinn
https://doi.org/10.25080/HXCJ6205