Contents
Proceedings of SciPy 2024
The 23rd annual SciPy conference will be held in Tacoma, WA at the Tacoma Convention Center, July 8-14, 2024.
SciPy brings together attendees from industry, academia and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.
Full proceedings, posters and slides, and organizing committee can be found at https://
Any notebook served: authoring and sharing reusable interactive widgets
Jupyter Widgets enable interactive code and data visualization in notebooks, but creating and distributing widgets across the Jupyter ecosystem is challenging. The anywidget project introduces a standard and toolset for portable, web-based widgets in various computing environments, simplifying development and extending compatibility beyond Jupyter. Its approach has fostered a rich widget ecosystem, driving the creation of new widgets and adoption of the standard by multiple platforms.
Trevor Manz, Nils Gehlenborg, Nezar Abdennur
https://doi.org/10.25080/NRPV2311
Making Research Data Flow With Python
The increasing volume of research data in fields such as astronomy, biology, and engineering necessitates efficient distributed data management. This paper presents the Librarian, a custom framework for data transfer in large academic collaborations, built for the Simons Observatory.
Josh Borrow, Paul La Plante, James Aguirre, +1
https://doi.org/10.25080/HWGA5253
Echostack: A flexible and scalable open-source software toolbox for echosounder data processing
Wu-Jung Lee, Valentina Staneva, Landung “Don” Setiawan, +5
https://doi.org/10.25080/WXRH8633
Scikit-build-core
Discover how scikit-build-core revolutionizes Python extension building with its seamless integration of CMake and Python packaging standards. Learn about its enhanced features for cross-compilation, multi-platform support, and simplified configuration, which enable writing binary extensions with pybind11, Nanobind, Fortran, Cython, C++, and more.
Henry Schreiner, Jean-Christophe Fillion-Robin, Matt McCormick
https://doi.org/10.25080/FMKR8387
Model Share AI
Machine learning is revolutionizing a wide range of research areas and industries, but many ML projects never progress past the proof-of-concept stage. To address this problem, we introduce Model Share AI, a platform designed to streamline collaborative model development, model provenance tracking, and model deployment.
Heinrich Peters, Michael Parrott
https://doi.org/10.25080/MDCE8355
Ecological and Geographic Influences on Cumacea Genetics in the Northern North Atlantic
Cumacea are vital indicators of benthic health in marine ecosystems. This study investigated the influence of environmental (i.e., biological or ecosystemic), climatic (i.e., meteorological or atmospheric), and geographic (i.e., spatial or regional) attributes on their genetic variability in the Northern North Atlantic, focusing on Icelandic waters.
Justin Gagnon, Nadia Tahiri
https://doi.org/10.25080/NVYF1037
Funix - The laziest way to build GUI apps in Python
Presenting a model or algorithm as a GUI application is a common need in the scientific and engineering community. Funix was created to automatically launch apps from existing Python functions, selecting widgets based on the types of each function's arguments and return values according to the type-to-widget mapping defined in a theme.
Forrest Sheng Bao, Mike Qi, Ruixuan Tu, +1
https://doi.org/10.25080/JFYN3740
Cyanobacteria detection in small, inland water bodies with CyFi
Harmful algal blooms pose major health risks to human and aquatic life. CyFi is an open-source Python package that enables detection of cyanobacteria in inland water bodies using 10-30m Sentinel-2 imagery and a computationally efficient tree-based machine learning model.
Emily Dorne, Katie Wetstone, Trista Brophy Cerquera, +1
https://doi.org/10.25080/PDHK7238
Continuous Tools for Scientific Publishing
Science requires new mediums to compose ideas and ways to share research findings iteratively, as early as possible and connected directly to software and data. In this paper we discuss two tools for scientific authoring and publishing, MyST Markdown and Curvenote, and illustrate examples of improving metadata, reimagining the reading experience, including computational content, and transforming publishing practices for individuals and societies through automation and continuous practices.
Rowan Cockett, Steve Purves, Franklin Koch, +1
https://doi.org/10.25080/NKVC9349
geosnap: The Geospatial Neighborhood Analysis Package
Understanding neighborhood context is critical for social science research, public policy analysis, and urban planning. We introduce geosnap, the Geospatial Neighborhood Analysis Package, a suite of tools for exploring, modeling, and visualizing the social context and spatial extent of neighborhoods and regions over time.
Elijah Knaap, Sergio Rey
https://doi.org/10.25080/FVWM4182
Orchestrating Bioinformatics Workflows Across a Heterogeneous Toolset with Flyte
Pryce Turner
https://doi.org/10.25080/DDJJ4932
Echodataflow: Recipe-based Fisheries Acoustics Workflow Orchestration
Valentina Staneva, Soham Butala, Landung (Don) Setiawan, +1
https://doi.org/10.25080/JXDK4427
How the Scientific Python ecosystem helps answer fundamental questions of the Universe
Matthew Feickert, Nikolai Hartmann, Lukas Heinrich, +6
https://doi.org/10.25080/KMXN4784
RoughPy
Sam Morley, Terry Lyons
https://doi.org/10.25080/DXWY3560
ITK-Wasm
Matthew McCormick, Paul Elliott
https://doi.org/10.25080/TCFJ5130
Supporting Greater Interactivity in the IPython Visualization Ecosystem
Nathan Martindale, Jacob Smith, Lisa Linville
https://doi.org/10.25080/GVHT1072
THEIA: An Offline Tool for Tradespace Visualization
Samuel Williams, Scott Christensen, Marvin Brown
https://doi.org/10.25080/RVRR7774
Improving Code Quality with Array and DataFrame Type Hints
This article demonstrates practical approaches to fully type-hinting generic NumPy arrays and StaticFrame DataFrames, and shows how the same annotations can improve code quality with both static analysis and runtime validation.
Christopher Ariza
https://doi.org/10.25080/WPXM6451
Predx-Tools
Histopathological images, which are digitized images of human or animal tissue, contain insights into disease state. We present PredX-Tools, a suite of simple, easy-to-use Python GUI applications that facilitate analysis of histopathological images and provide a no-code platform for data scientists and researchers to analyze raw and transformed data.
Brian Falkenstein, Shannon Quinn, Chakra Chennubhotla, +2
https://doi.org/10.25080/YCFW5807
Voice Computing with Python in Jupyter Notebooks
Jupyter is a popular platform for writing interactive computational narratives that contain computer code and its output interleaved with prose that describes the code and the output. It is possible to use one’s voice to interact with Jupyter notebooks.
Blaine H. M. Mooers
https://doi.org/10.25080/MCYV2126
Evaluating Probabilistic Forecasters with sktime and tsbootstrap — Easy-to-Use, Configurable Frameworks for Reproducible Science
Evaluating probabilistic forecasts is complex and essential across various domains, yet no comprehensive software framework exists to simplify this task. Despite extensive literature on evaluation methodologies, current practices are fragmented and often lack reproducibility. To address this gap, we introduce a reproducible experimental workflow for evaluating probabilistic forecasting algorithms using the sktime package.
Benedikt Heidrich, Sankalp Gilda, Franz Kiraly
https://doi.org/10.25080/VPNX1595
AI-Driven Watermarking Technique for Safeguarding Text Integrity in the Digital Age
Identifying sources is vital for generative AI models such as ChatGPT and Bard, given concerns about copyright infringement and plagiarism. In this paper, we explore text watermarking as a potential solution. We investigate techniques including physical and logical watermarking.
Atharva Rasane
https://doi.org/10.25080/DHKD1726
Computational Resource Optimisation in Feature Selection under Class Imbalance Conditions
Feature selection is crucial for reducing data dimensionality as well as enhancing model interpretability and performance in machine learning tasks. This study explores the possibility of performing feature selection on a subset of data to reduce the computational burden.
Amadi Gabriel Udu, Andrea Lecchini-Visintini, Steve R. Gunn, +3
https://doi.org/10.25080/TPGN6857
Mandala: Compositional Memoization for Simple & Powerful Scientific Data Management
We present mandala, a Python library that largely eliminates the accidental complexity of scientific data management and incremental computing. While most traditional and/or popular data management solutions are based on logging, mandala takes a fundamentally different approach, using memoization of function calls as the fundamental unit of saving, loading, querying and deleting computational artifacts.
Aleksandar Makelov
https://doi.org/10.25080/JHPV7385
multinterp
Multivariate interpolation is a fundamental tool in scientific computing used to approximate the values of a function between known data points in multiple dimensions. Despite its importance, the Python ecosystem offers a fragmented landscape of specialized tools for this task; the multinterp package was developed to address this challenge.
Alan Lujan
https://doi.org/10.25080/FGCJ9164
Python-Based GeoImagery Dataset Development for Deep Learning-Driven Forest Wildfire Detection
Valeria Martin, Derek Morgan, K. Brent Venable
https://doi.org/10.25080/YADT7194
Mamba Models a possible replacement for Transformers?
Suvrakamal Das, Rounak Sen, Saikrishna Devendiran
https://doi.org/10.25080/XHDR4700
Algorithms to Determine Asteroid’s Physical Properties using Sparse and Dense Photometry, Robotic Telescopes and Open Data
Arushi Nath
https://doi.org/10.25080/TWCF2755
Training a Supervised Cilia Segmentation Model from Self-Supervision
Understanding cilia behavior is essential in diagnosing and treating diseases linked to cilia dysfunction, but automatically analyzing cilia is often labor- and time-intensive. In this work we overcome this bottleneck by developing a robust, self-supervised framework that exploits the visual similarity of normal and dysfunctional cilia.
Seyed Alireza Vaezi, Shannon Quinn
https://doi.org/10.25080/HXCJ6205