Research Articles - SciPy Proceedings

2018

Proceedings of the Python in Science Conference 2018

There are 24 articles published in this collection

2018 | Article

Yaksh is a free and open-source online evaluation platform. At its core, Yaksh focuses on problem-based learning and lets teachers create practice exercises and quizzes which are evaluated in real-time.

Prabhu Ramachandran, Prathamesh Salunke, Ankit Javalkar, Aditya Palaparthy, Mahesh Gudi, Hardik Ghaghada

2018 | Article

signac: A Python framework for data and workflow management

Computational research requires versatile data and workflow management tools that can easily adapt to the highly dynamic requirements of scientific investigations. Many existing tools require strict adherence to a particular usage pattern, so researchers often use less robust ad hoc solutions that they find easier to adopt.

Vyas Ramasubramani, Carl S. Adorf, Paul M. Dodd, Bradley D. Dice, Sharon C. Glotzer

2018 | Article

Scalable Feature Extraction with Aerial and Satellite Imagery

Deep learning techniques have greatly advanced the performance of the already rapidly developing field of computer vision, which powers a variety of emerging technologies—from facial recognition to augmented reality to self-driving cars.

Virginia Ng, Daniel Hofmann

2018 | Article

A Bayesian’s journey to a better research workflow

This work began when the two authors met at a software development meeting. Konstantinos was building Bayesian models in his research and wanted to learn how to better manage his research process. Marianne was working on data analysis workflows in industry and wanted to learn more about Bayesian statistics.

Konstantinos Vamvourellis, Marianne Corvellec

2018 | Article

Design and Implementation of pyPRISM: A Polymer Liquid-State Theory Framework

In this work, we describe the code structure, implementation, and usage of a Python-based, open-source framework, pyPRISM, for conducting polymer liquid-state theory calculations. Polymer Reference Interaction Site Model (PRISM) theory describes the equilibrium spatial-correlations, thermodynamics, and structure of liquid-like polymer systems and macromolecular materials.

Tyler B. Martin, Thomas E. Gartner III, Ronald L. Jones, Chad R. Snyder, Arthi Jayaraman

2018 | Article

Spatio-temporal analysis of socioeconomic neighborhoods: The Open Source Longitudinal Neighborhood Analysis Package (OSLNAP)

The neighborhood effects literature represents a wide span of the social sciences broadly concerned with the influence of spatial context on social processes. From the study of segregation dynamics, the relationships between the built environment and health outcomes, to the impact of concentrated poverty on social efficacy, neighborhoods are a central construct in empirical work.

Sergio Rey, Elijah Knaap, Su Han, Levi Wolf, Wei Kang

2018 | Article

Binder 2.0 - Reproducible, interactive, sharable environments for science at scale

Binder is an open source web service that lets users create sharable, interactive, reproducible environments in the cloud. It is powered by other core projects in the open source ecosystem, including JupyterHub and Kubernetes for managing cloud resources.

Project Jupyter, Matthias Bussonnier, Jessica Forde, Jeremy Freeman, Brian Granger, Tim Head, Chris Holdgraf, Kyle Kelley, Gladys Nalvarte, Andrew Osheroff, M Pacer, Yuvi Panda, Fernando Perez, Benjamin Ragan-Kelley, Carol Willing

2018 | Article

Harnessing the Power of Scientific Python to Investigate Biogeochemistry and Metaproteomes of the Central Pacific Ocean

Oceanographic expeditions commonly generate millions of data points for various chemical, biological, and physical features, all in different formats. Scientific Python tools are extremely useful for synthesizing this data to make sense of major trends in the changing ocean environment.

Noelle A. Held, Jaclyn K. Saunders, Joe Futrelle, Mak A. Saito

2018 | Article

Organic Molecules in Space: Insights from the NASA Ames Molecular Database in the era of the James Webb Space Telescope

We present to the scientific astronomical community the software tool pyPAHdb, which is used to characterize emission from one of the most prevalent types of organic molecules in space, namely polycyclic aromatic hydrocarbons (PAHs).

Matthew J. Shannon, Christiaan Boersma

2018 | Article

Real-Time Digital Signal Processing Using pyaudio_helper and the ipywidgets

The focus of this paper is on teaching real-time digital signal processing to electrical and computer engineers using the Jupyter notebook and the code module pyaudio_helper, which is a component of the package scikit-dsp-comm.

Mark Wickert

2018 | Article

Exploring the Extended Kalman Filter for GPS Positioning Using Simulated User and Satellite Track Data

This paper describes a Python computational tool for exploring the use of the extended Kalman filter (EKF) for position estimation using the Global Positioning System (GPS) pseudorange measurements. The development was motivated by the need for an example generator in a training class on Kalman filtering, with emphasis on GPS.

Mark Wickert, Chiranth Siddappa

2018 | Article

WrightSim: Using PyCUDA to Simulate Multidimensional Spectra

Nonlinear multidimensional spectroscopy (MDS) is a powerful experimental technique used to interrogate complex chemical systems. MDS promises to reveal energetics, dynamics, and coupling features of and between the many quantum-mechanical states that these systems contain.

Kyle F Sunden, Blaise J Thompson, John C Wright

2018 | Article

Bringing ipywidgets Support to plotly.py

Plotly.js is a declarative JavaScript data visualization library built on D3 and WebGL that supports a wide range of statistical, scientific, financial, geographic, and 3-dimensional visualizations. Support for creating Plotly.

Jon Mease

2018 | Article

Sparse: A more modern sparse array library

This paper is about sparse multi-dimensional arrays in Python. We discuss their applications, layouts, and current implementations in the SciPy ecosystem along with strengths and weaknesses. We then introduce a new package for sparse arrays that builds on the legacy of the scipy.

Hameer Abbasi

2018 | Article

Text and data mining scientific articles with allofplos

Mining scientific articles is hard when many of them are inaccessible behind paywalls. The Public Library of Science (PLOS) is a non-profit Open Access science publisher of the single largest journal (PLOS ONE), whose articles are all freely available to read and re-use.

Elizabeth Seiver, M Pacer, Sebastian Bassi

2018 | Article

Safe handling instructions for missing data

In machine learning tasks, it is common to handle missing data by removing observations with missing values, or replacing missing data with the mean value for its feature. To show why this is problematic, we use listwise deletion and mean imputing to recover missing values from artificially created datasets, and we compare those models against ones with full information.

Dillon Niederhut

2018 | Article

EarthSim: Flexible Environmental Simulation Workflows Entirely Within Jupyter Notebooks

Building environmental simulation workflows is typically a slow process involving multiple proprietary desktop tools that do not interoperate well. In this work, we demonstrate building flexible, lightweight workflows entirely in Jupyter notebooks.

Dharhas Pothina, Philipp J. F. Rudiger, James A Bednar, Scott Christensen, Kevin Winters, Kimberly Pevey, Christopher E. Ball, Gregory Brener

2018 | Article

Practical Applications of Astropy

Packages developed under the auspices of the Astropy Project (astropy2013, astropy2018) address many common problems faced by astronomers in their computational projects. In this paper we describe how capabilities provided by Astropy have been employed in two current projects.

David Shupe, Frank Masci, Russ Laher, Ben Rusholme, Lee Armus

2018 | Article

Developing a Start-to-Finish Pipeline for Accelerometer-Based Activity Recognition Using Long Short-Term Memory Recurrent Neural Networks

Increased prevalence of smartphones and wearable devices has facilitated the collection of triaxial accelerometer data for numerous Human Activity Recognition (HAR) tasks. Concurrently, advances in the theory and implementation of long short-term memory (LSTM) recurrent neural networks (RNNs) have made it possible to process this data in its raw form, enabling on-device online analysis.

Christian McDaniel, Shannon Quinn

2018 | Article

The Econ-ARK and HARK: Open Source Tools for Computational Economics

The Economics Algorithmic Repository and toolKit (Econ-ARK) aims to become a focal resource for computational economics. Its first ‘framework,’ the Heterogeneous Agent Resources and Toolkit (HARK), provides a modern, robust, transparent set of tools to solve a class of macroeconomic models whose usefulness has become increasingly apparent both for economic policy and for research purposes, but whose adoption has been limited because the existing literature derives from idiosyncratic, hand-crafted, and often impenetrable legacy code.

Christopher D. Carroll, Alexander M. Kaufman, Jacqueline L. Kazil, Nathan M. Palmer, Matthew N. White

2018 | Article

Composable Multi-Threading and Multi-Processing for Numeric Libraries

Python is popular among scientific communities that value its simplicity and power, especially as it comes along with numeric libraries such as NumPy, SciPy, Dask, and Numba. As CPU core counts keep increasing, these modules can make use of many cores via multi-threading for efficient multi-core parallelism.

Anton Malakhov, David Liu, Anton Gorshkov, Terry Wilmarth

2018 | Article

Equity, Scalability, and Sustainability of Data Science Infrastructure

We seek to understand the current state of equity, scalability, and sustainability of data science education infrastructure in both the U.S. and Canada. Our analysis of the technological, funding, and organizational structure of four types of institutions shows an increasing divergence in the ability of universities across the United States to provide students with accessible data science education infrastructure, primarily JupyterHub.

Anthony Suen, Laura Norén, Alan Liang, Andrea Tu

2018 | Article

Dynamic Social Network Modeling of Diffuse Subcellular Morphologies

The use of fluorescence microscopy has catalyzed new insights into biological function, and spurred the development of quantitative models from rich biomedical image datasets. While image processing in some capacity is commonplace for extracting and modeling quantitative knowledge from biological systems at varying scales, general-purpose approaches for more advanced modeling are few.

Andrew Durden, Allyson T Loy, Barbara Reaves, Mojtaba Fazli, Abigail Courtney, Frederick D Quinn, S Chakra Chennubhotla, Shannon P Quinn

2018 | Article

Cloudknot: A Python Library to Run your Existing Code on AWS Batch

We introduce Cloudknot, a software library that simplifies cloud-based distributed computing by programmatically executing user-defined functions (UDFs) in AWS Batch. It takes as input a Python function, packages it as a container, creates all the necessary AWS constituent resources to submit jobs, monitors their execution and gathers the results, all from within the Python environment.

Adam Richie-Halford, Ariel Rokem

2019

Proceedings of the Python in Science Conference 2019

There are 20 articles published in this collection

2019 | Article

PMDA - Parallel Molecular Dynamics Analysis

MDAnalysis is an object-oriented Python library to analyze trajectories from molecular dynamics (MD) simulations in many popular formats. With the development of highly optimized MD software packages on high performance computing (HPC) resources, simulation trajectories are growing to many terabytes in size.

Shujie Fan, Max Linke, Ioannis Paraskevakos, Richard J. Gowers, Michael Gecht, Oliver Beckstein

2019 | Article

Visualization of Bioinformatics Data with Dash Bio

Plotly's Dash is a library that empowers data scientists to create interactive web applications declaratively in Python. Dash Bio is a bioinformatics-oriented suite of components that are compatible with Dash.

Shammamah Hossain

2019 | Article

Better and faster hyperparameter optimization with Dask

Nearly every machine learning model requires hyperparameters: parameters that the user must specify before training begins and that influence model performance. Finding the optimal set of hyperparameters is often a time- and resource-consuming process.

Scott Sievert, Tom Augspurger, Matthew Rocklin

2019 | Article

PyDDA: A new Pythonic Wind Retrieval Package

PyDDA is a new community framework aimed at wind retrievals that depends only upon utilities in the SciPy ecosystem such as scipy, numpy, and dask. It can support retrievals of winds using information from weather radar networks constrained by high resolution forecast models over grids that cover thousands of kilometers at kilometer-scale resolution.

Robert Jackson, Scott Collis, Timothy Lang, Corey Potvin, Todd Munson

2019 | Article

Parkinson's Classification and Feature Extraction from Diffusion Tensor Images

Parkinson’s disease (PD) affects over 6.2 million people around the world. Despite its prevalence, there is still no cure, and diagnostic methods are extremely subjective, relying on observation of physical motor symptoms and response to treatment protocols.

Rajeswari Sivakumar, Shannon Quinn

2019 | Article

PyLZJD: An Easy to Use Tool for Machine Learning

As Machine Learning (ML) becomes more widely known and popular, so too does the desire for new users from other backgrounds to apply ML techniques to their own domains. A difficult prerequisite that often confounds new users is the feature creation and engineering process.

Edward Raff, Joe Aurelio, Charles Nicholas

2019 | Article

Parameter Estimation Using the Python Package pymcmcstat

A Bayesian approach to solving inverse problems provides insight regarding model limitations as well as the underlying model and observation uncertainty. In this paper we introduce pymcmcstat, which provides a wide variety of tools for estimating unknown parameter distributions.

Paul R. Miles, Ralph C. Smith

2019 | Article

An intelligent shopping list based on the application of partitioning and machine learning algorithms

A grocery list is an integral part of the shopping experience of many consumers. Several mobile retail studies of grocery apps indicate that potential customers place the highest priority on features that help them to create and manage personalized shopping lists.

Nadia Tahiri, Bogdan Mazoure, Vladimir Makarenkov

2019 | Article

A Real-Time 3D Audio Simulator for Cognitive Hearing Science

This paper describes the development of a 3D audio simulator for use in cognitive hearing science studies and also for general 3D audio experimentation. The framework that the simulator is built upon is pyaudio_helper, which is a module of the package scikit-dsp-comm.

Mark Wickert

2019 | Article

Optimizing Python-Based Spectroscopic Data Processing on NERSC Supercomputers

We present a case study of optimizing a Python-based cosmology data processing pipeline designed to run in parallel on thousands of cores using supercomputers at the National Energy Research Scientific Computing Center (NERSC).

Laurie A. Stephey, Rollin C. Thomas, Stephen J. Bailey

2019 | Article

Solving Polynomial Systems with phcpy

The solutions of a system of polynomials in several variables are often needed, e.g., in the design of mechanical systems and in phase-space analyses of nonlinear biological dynamics. Reliable, accurate, and comprehensive numerical solutions are available through PHCpack, a FOSS package for solving polynomial systems with homotopy continuation.

Jasmine Otto, Angus Forbes, Jan Verschelde

2019 | Article

Case study: Real-world machine learning application for hardware failure detection

When designing microprocessors, engineers must verify whether the proposed design, defined in hardware description language, does what is intended. During this verification process, engineers run simulation tests and can fix bugs if the tests have failed.

Hongsup Shin

2019 | Article

Codebraid: Live Code in Pandoc Markdown

Codebraid executes code blocks and inline code in Pandoc Markdown documents as part of the document build process. Code can be executed with a built-in system or Jupyter kernels. Either way, a single document can involve multiple programming languages, as well as multiple independent sessions or processes per language.

Geoffrey M. Poore

2019 | Article

pyjanitor: A Cleaner API for Cleaning Data

The pandas library has become the de facto library for data wrangling in the Python programming language. However, inconsistencies in the pandas application programming interface (API), while idiomatic due to historical use, prevent use of expressive, fluent programming idioms that enable self-documenting pandas code.

Eric J. Ma, Zachary Barry, Sam Zuckerman, Zachary Sailer

2019 | Article

Developing a Graph Convolution-Based Analysis Pipeline for Multi-Modal Neuroimage Data: An Application to Parkinson's Disease

Parkinson's disease (PD) is a highly prevalent neurodegenerative condition originating in subcortical areas of the brain and resulting in progressively worsening motor, cognitive, and psychiatric (e.g.

Christian McDaniel, Shannon Quinn, PhD

2019 | Article

CAF Implementation on FPGA Using Python Tools

The purpose of this project is to provide a real-time geolocation solution by generating code for the complex ambiguity function (CAF) in a hardware description language (HDL) and implementing it on FPGA hardware.

Chiranth Siddappa, Mark Wickert

2019 | Article

Analyzing Particle Systems for Machine Learning and Data Visualization with freud

The freud Python library analyzes particle data output from molecular dynamics simulations. The library's design and its variety of high-performance methods make it a powerful tool for many modern applications.

Bradley D. Dice, Vyas Ramasubramani, Eric S. Harper, Matthew P. Spellings, Joshua A. Anderson, Sharon C. Glotzer

2019 | Article

Accelerating the Advancement of Data Science Education

We outline a synthesis of strategies created in collaboration with 35+ colleges and universities on how to advance undergraduate data science education on a national scale. The four core pillars of this strategy include the integration of data science education across all domains, establishing adoptable and scalable cyberinfrastructure, applying data science to non-traditional domains, and incorporating ethical content into data science curricula.

Eric Van Dusen, Anthony Suen, Alan Liang, Amal Bhatnagar

2019 | Article

Deep and Ensemble Learning to Win the Army RCO AI Signal Classification Challenge

Automatic modulation classification is a challenging problem with multiple applications including cognitive radio and signals intelligence. Most of the existing efforts to solve this problem are only applicable when the signal to noise ratio (SNR) is high and/or long observations of the signal are available.

Andres Vila, Donna Branchevsky, Kyle Logue, Sebastian Olsen, Esteban Valles, Darren Semmen, Alex Utter, Eugene Grayver

2019 | Article

Expert RF Feature Extraction to Win the Army RCO AI Signal Classification Challenge

Kyle Logue, Esteban Valles, Andres Vila, Alex Utter, Darren Semmen, Eugene Grayver, Sebastian Olsen, Donna Branchevsky

2020

Proceedings of the Python in Science Conference 2020

There are 23 articles published in this collection

2020 | Article

Towards an Unsupervised Spatiotemporal Representation of Cilia Video Using A Modular Generative Pipeline

Motile cilia are a highly conserved organelle found on the exterior of many human cells. Cilia beat in rhythmic patterns to transport substances or generate signaling gradients. Disruption of these patterns is often indicative of diseases known as ciliopathies, whose consequences can include dysfunction of macroscopic structures within the lungs, kidneys, brain, and other organs.

Meekail Zain, Sonia Rao, Nathan Safir, Quinn Wyner, Isabella Humphrey, Alex Eldridge, Chenxiao Li, BahaaEddin AlAila, Shannon Quinn

2020 | Article

Falsify your Software: validating scientific code with property-based testing

Where traditional example-based tests check software using manually-specified input-output pairs, property-based tests exploit a general description of valid inputs and program behaviour to automatically search for falsifying examples.

Zac Hatfield-Dodds

2020 | Article

Software Engineering as Research Method: Aligning Roles in Econ-ARK

While general purpose scientific software has enjoyed great success in industry and academia, domain specific scientific software has not yet become well-established in many disciplines where it has potential.

Sebastian Benthall, Mridul Seth

2020 | Article

SHADOW: A workflow scheduling algorithm reference and testing framework

As the scale of science projects increases, so does the demand on computing infrastructures. The complexity of science processing pipelines, and the heterogeneity of the environments on which they are run, continue to increase; the algorithmic approaches to executing these applications must also be adapted and improved to deal with this increased complexity.

Ryan W. Bunney, Andreas Wicenec, Mark Reynolds

2020 | Article

Leading magnetic fusion energy science into the big-and-fast data lane

We present Delta, a Python framework that connects magnetic fusion experiments to high-performance computing (HPC) facilities in order to leverage advanced data analysis for near real-time decisions. Using the ADIOS I/O framework, Delta streams measurement data at over 300 MByte/sec from a remote experimental site in Korea to Cori, a Cray XC-40 supercomputer at the National Energy Research Scientific Computing Center in California.

Ralph Kube, R Michael Churchill, Jong Youl Choi, Ruonan Wang, Scott Klasky, CS Chang, Minjun J. Choi, Jinseop Park

2020 | Article

Pydra - a flexible and lightweight dataflow engine for scientific analyses

This paper presents a new lightweight dataflow engine written in Python: Pydra. Pydra is developed as an open-source project in the neuroimaging community, but it is designed as a general-purpose dataflow engine to support any scientific domain.

Dorota Jarecka, Mathias Goncalves, Christopher J. Markiewicz, Oscar Esteban, Nicole Lo, Jakub Kaczmarzyk, Satrajit Ghosh

2020 | Article

Combining Physics-Based and Data-Driven Modeling for Pressure Prediction in Well Construction

A framework for combining physics-based and data-driven models to improve well construction is presented in this study. Additionally, the proposed approach provides a more robust and accurate model that mitigates the disadvantages of using purely physics-based or data-driven models.

Oney Erge, Eric van Oort

2020 | Article

pandera: Statistical Data Validation of Pandas Dataframes

pandas is an essential tool in the data scientist’s toolkit for modern data engineering, analysis, and modeling in the Python ecosystem. However, dataframes can often be difficult to reason about in terms of their data types and statistical properties as data is reshaped from its raw form to one that’s ready for analysis.

Niels Bantilan

2020 | Article

Having your cake and eating it: Exploiting Python for programmer productivity and performance on micro-core architectures using ePython

Micro-core architectures combine many simple, low memory, low power computing cores together in a single package. These can be used as a co-processor or standalone, but due to limited on-chip memory and the esoteric nature of the hardware, writing efficient parallel codes for these chips is challenging.

Maurice Jamieson, Nick Brown, Sihang Liu

2020 | Article

Matched Filter Mismatch Losses in MPSK and MQAM Using Semi-Analytic BEP Modeling

The focus of this paper is the bit error probability (BEP) performance degradation when the transmit and receive pulse shaping filters are mismatched. The modulation schemes considered are MPSK and MQAM.

Mark Wickert, David Peckham

2020 | Article

Spectral Analysis of Mitochondrial Dynamics: A Graph-Theoretic Approach to Understanding Subcellular Pathology

Perturbations of organellar structures within a cell are useful indicators of the cell’s response to viral or bacterial invaders. Of the various organelles, mitochondria are meaningful to model because they show distinct migration patterns in the presence of potentially fatal infections, such as tuberculosis.

Marcus Hill, Mojtaba Fazli, Rachel Mattson, Meekail Zain, Andrew Durden, Allyson T Loy, Barbara Reaves, Abigail Courtney, Frederick D Quinn, S Chakra Chennubhotla, Shannon P Quinn

2020 | Article

High-performance operator evaluations with ease of use: libCEED's Python interface

libCEED is a new lightweight, open-source library for high-performance matrix-free Finite Element computations. libCEED offers a portable interface to high-performance implementations, selectable at runtime, tuned for a variety of current and emerging computational architectures, including CPUs and GPUs.

Valeria Barra, Jed Brown, Jeremy Thompson, Yohann Dudouit

2020 | Article

Awkward Array: JSON-like data, NumPy-like idioms

NumPy simplifies and accelerates mathematical calculations in Python, but only for rectilinear arrays of numbers. Awkward Array provides a similar interface for JSON-like data: slicing, masking, broadcasting, and performing vectorized math on the attributes of objects, unequal-length nested lists (i.

Jim Pivarski, Ianna Osborne, Pratyush Das, Anish Biswas, Peter Elmer

2020 | Article

Learning from evolving data streams

Ubiquitous data poses challenges for current machine learning systems to store, handle and analyze data at scale. Traditionally, this task is tackled by dividing the data into (large) batches. Models are trained on a data batch and then used to obtain predictions.

Jacob Montiel

2020 | Article

Boost-histogram: High-Performance Histograms as Objects

Unlike arrays and tables, histograms in Python have usually been denied their own object, and have been represented as a single operation producing several arrays. Boost-histogram is a new Python library that provides histograms that can be filled, manipulated, sliced, and projected as objects.

Henry Schreiner, Hans Dembinski, Shuo Liu, Jim Pivarski

2020 | Article

Network visualizations with Pyvis and VisJS

Pyvis is a Python module that enables visualizing and interactively manipulating network graphs in the Jupyter notebook, or as a standalone web application. Pyvis is built on top of the powerful and mature VisJS JavaScript library, which allows for fast and responsive interactions while also abstracting away the low-level JavaScript and HTML.

Giancarlo Perrone, Jose Unpingco, Haw-minn Lu

2020 | Article

Introduction to Geometric Learning in Python with Geomstats

There is a growing interest in leveraging differential geometry in the machine learning community. Yet, the adoption of the associated geometric computations has been inhibited by the lack of a reference implementation.

Nina Miolane, Nicolas Guigui, Hadi Zaatiti, Christian Shewmake, Hatem Hajri, Daniel Brooks, Alice Le Brigant, Johan Mathe, Benjamin Hou, Yann Thanwerdas, Stefan Heyder, Olivier Peltre, Niklas Koep, Yann Cabanes, Thomas Gerald, Paul Chauchat, Bernhard Kainz, Claire Donnat, Susan Holmes, Xavier Pennec

2020 | Article

Netlist Analysis and Transformations Using SpyDrNet

Digital hardware circuits (i.e., for application specific integrated circuits or field programmable gate array circuits) can contain a large number of discrete components and connections. These connections are defined by a data structure called a "netlist".

Dallin Skouson, Andrew Keller, Michael Wirthlin

2020 | Article

Compyle: a Python package for parallel computing

Compyle allows users to execute a restricted subset of Python on a variety of HPC platforms. It is an embedded domain-specific language (eDSL) for parallel computing. It currently supports multi-core execution using Cython, and OpenCL and CUDA for GPU devices.

Aditya Bhosale, Prabhu Ramachandran

2020 | Article

HOOMD-blue version 3.0 A Modern, Extensible, Flexible, Object-Oriented API for Molecular Simulations

HOOMD-blue is a library for running molecular dynamics and hard particle Monte Carlo simulations that uses pybind11 to provide a Python interface to fast C++ internals. The package is designed to scale from a single CPU core to thousands of NVIDIA or AMD GPUs.

Brandon L. Butler, Vyas Ramasubramani, Joshua A. Anderson, Sharon C. Glotzer

2020 | Article

Fluctuation X-ray Scattering real-time app

The Linac Coherent Light Source (LCLS) at the SLAC National Accelerator Laboratory is an X-ray Free Electron Laser (X-FEL) facility enabling scientists to take snapshots of single macromolecules to study their structure and dynamics.

Antoine Dujardin, Elliott Slaugther, Jeffrey Donatelli, Peter Zwart, Amedeo Perazzo, Chun Hong Yoon

2020 | Article

Quasi-orthonormal Encoding for Machine Learning Applications

Most machine learning models, especially artificial neural networks, require numerical, not categorical data. We briefly describe the advantages and disadvantages of common encoding schemes. For example, one-hot encoding is commonly used for attributes with a few unrelated categories and word embeddings for attributes with many related categories (e.

Haw-minn Lu

2020 | Article

Securing Your Collaborative Jupyter Notebooks in the Cloud using Container and Load Balancing Services

Jupyter has become the go-to platform for developing data applications but data and security concerns, especially when dealing with healthcare, have become paramount for many institutions and applications dealing with sensitive information.

Haw-minn Lu, Adrian Kwong, José Unpingco

2021

Proceedings of the Python in Science Conference 2021

There are 20 articles published in this collection

2021 | Article

Cell Tracking in 3D using deep learning segmentations

Live-cell imaging is a highly used technique to study cell migration and dynamics over time. Although many computational tools have been developed during the past years to automatically detect and track cells, they are optimized to detect cell nuclei with similar shapes and/or cells not clustering together.

Varun Kapoor, Claudia Carabaña

2021 | Article

CNN Based ToF Image Processing

In this paper a Time of Flight (ToF) camera specific data processing pipeline is presented, followed by real life applications using artificial intelligence. These applications include use cases such as gesture recognition, movement direction estimation or physical exercises monitoring.

Marian-Leontin Pop, Szilard Molnar, Alexandru Pop, Benjamin Kelenyi, Levente Tamas, Andrei Cozma

2021 | Article

Multithreaded parallel Python through OpenMP support in Numba

A modern CPU delivers performance through parallelism. A program that exploits the performance available from a CPU must run in parallel on multiple cores. This is usually best done through multithreading.

Todd Anderson, Tim Mattson

2021 | Article

Training machine learning models faster with Dask

Machine learning (ML) relies on stochastic algorithms, all of which rely on gradient approximations with "batch size" examples. Growing the batch size as the optimization proceeds is a simple and usable method to reduce the training time, provided that the number of workers grows with the batch size.

Joesph Holt, Scott Sievert

2021 | Article

Monitoring Scientific Python Usage on a Supercomputer

In 2021, more than 30% of users at the National Energy Research Scientific Computing Center (NERSC) used Python on the Cori supercomputer. To determine this, we have developed and open-sourced a simple, minimally invasive monitoring framework that leverages standard Python features to capture Python imports and other job data via a package called "Customs".

Rollin Thomas, Laurie Stephey, Annette Greiner, Brandon Cook

2021 | Article

Classification of Diffuse Subcellular Morphologies

Characterizing dynamic sub-cellular morphologies in response to perturbation remains a challenging and important problem. Many organelles are anisotropic and difficult to segment, and few methods exist for quantifying the shape, size, and quantity of these organelles.

Neelima Pulagam, Marcus Hill, Mojtaba Fazli, Rachel Mattson, Meekail Zain, Andrew Durden, Frederick D Quinn, S Chakra Chennubhotla, Shannon P Quinn

2021 | Article

PyRSB: Portable Performance on Multithreaded Sparse BLAS Operations

This article introduces PyRSB, a Python interface to the LIBRSB library. LIBRSB is a portable performance library offering so-called Sparse BLAS (Sparse Basic Linear Algebra Subprograms) operations for modern multicore CPUs.

Michele Martone, Simone Bacchio

2021 | Article

Programmatically Identifying Cognitive Biases Present in Software Development

Mitigating bias in AI-enabled systems is a topic of great concern within the research community. While efforts are underway to increase model interpretability and de-bias datasets, little attention has been given to identifying biases that are introduced by developers as part of the software engineering process.

Amanda E. Kraft, Matthew Widjaja, Trevor M. Sands, Brad J. Galego

2021 | Article

Conformal Mappings with SymPy: Towards Python-driven Analytical Modeling in Physics

This contribution shows how the symbolic computing Python library SymPy can be used to improve flow force modeling due to a Couette-type flow, i.e. a flow of viscous fluid in the region between two bodies, where one body is in tangential motion relative to the other.

Zoufiné Lauer-Baré, Erich Gaertig

2021 | Article

PyBMRB: Data visualization tool for BioMagResBank

The Biological Magnetic Resonance Data Bank (BioMagResBank or BMRB https://bmrb.io), founded in 1988, is the international, open archive for data generated by Nuclear Magnetic Resonance (NMR) spectroscopy of biological systems.

Kumaran Baskaran, Jonathan R Wedell, Eldon L. Ulrich, Jeffery C. Hoch, John L. Markley

2021 | Article

Social Media Analysis using Natural Language Processing Techniques

Social media is used widely every day, and the content that people view and post influences others around the world in a variety of ways. Social media platforms such as YouTube see a great deal of daily activity in video posting, watching, and commenting.

Jyotika Singh

2021 | Article

PyCID: A Python Library for Causal Influence Diagrams

Why did a decision maker select a certain decision? What behaviour does a certain objective incentivise? How can we improve this behaviour and ensure that a decision-maker chooses decisions with safer or fairer consequences? This paper introduces the Python package PyCID, built upon pgmpy, that implements (causal) influence diagrams, a widely used graphical modelling framework for decision-making problems.

James Fox, Tom Everitt, Ryan Carey, Eric Langlois, Alessandro Abate, Michael Wooldridge

2021 | Article

CLAIMED, a visual and scalable component library for Trusted AI

CLAIMED is a component library for artificial intelligence, machine learning, "extract, transform, load" processes and data science. The goal is to enable low-code/no-code rapid prototyping by providing ready-made components for various business domains, supporting various computer languages, working on various data flow editors and running on diverse execution engines.

Romeo Kienzler, Ivan Nesic

2021 | Article

Natural Language Processing with Pandas DataFrames

Most areas of Python data science have standardized on using Pandas DataFrames for representing and manipulating structured data in memory. Natural Language Processing (NLP), not so much. We believe that Pandas has the potential to serve as a universal data structure for NLP data.

Frederick Reiss, Bryan Cutler, Zachary Eichenberger

2021 | Article

MPI-parallel Molecular Dynamics Trajectory Analysis with the H5MD Format in the MDAnalysis Python Package

Molecular dynamics (MD) computer simulations help elucidate details of the molecular processes in complex biological systems, from protein dynamics to drug discovery. One major issue is that these MD simulation files are now commonly terabytes in size, which means analyzing the data from these files becomes a painstakingly expensive task.

Edis Jakupovic, Oliver Beckstein

2021 | Article

Accelerating Spectroscopic Data Processing Using Python and GPUs on NERSC Supercomputers

The Dark Energy Spectroscopic Instrument (DESI) will create the most detailed 3D map of the Universe to date by measuring redshifts in light spectra of over 30 million galaxies. The extraction of 1D spectra from 2D spectrograph traces in the instrument output is one of the main computational bottlenecks of the DESI data processing pipeline, which is predominantly implemented in Python.

Daniel Margala, Laurie Stephey, Rollin Thomas, Stephen Bailey

2021 | Article

signac: Data Management and Workflows for Computational Researchers

The signac data management framework (https://signac.io) helps researchers execute reproducible computational studies, scales workflows from laptops to supercomputers, and emphasizes portability and fast prototyping.

Bradley D. Dice, Brandon L. Butler, Vyas Ramasubramani, Alyssa Travitz, Michael M. Henry, Hardik Ojha, Kelly L. Wang, Carl S. Adorf, Eric Jankowski, Sharon C. Glotzer

2021 | Article

Modernizing computing by structural biologists with Jupyter and Colab

Protein crystallography produces most of the protein structures used in structure-based drug design. The process of protein structure determination is computationally intensive and error-prone because many software packages are involved.

Blaine H. M. Mooers

2021 | Article

Using Python for Analysis and Verification of Mixed-mode Signal Chains

Any application involving sensitive measurements of the physical world starts with an accurate, precise, and low-noise signal chain. Modern, highly integrated data acquisition devices can often be directly connected to sensor outputs, performing analog signal conditioning, digitization, and digital filtering on a single silicon device, greatly simplifying system electronics.

Mark Thoren, Cristina Suteu

2021 | Article

How PDFrw and fillable forms improves throughput at a Covid-19 Vaccine Clinic

PDFrw was used to prepopulate Covid-19 vaccination forms to improve the efficiency and integrity of the vaccination process in terms of federal and state privacy requirements. We will describe the vaccination process from the initial appointment, through the vaccination delivery, to the creation of subsequent required documentation.

Haw-minn Lu, José Unpingco

2024

Proceedings of the Python in Science Conference 2024

There are 0 articles published in this collection

2022

Proceedings of the Python in Science Conference 2022

There are 39 articles published in this collection

2022 | Article

Low Level Feature Extraction for Cilia Segmentation

Cilia are organelles found on the surface of some cells in the human body that sweep rhythmically to transport substances. Dysfunction of ciliary motion is often indicative of diseases known as ciliopathies, which disrupt the functionality of macroscopic structures within the lungs, kidneys and other organs.

Meekail Zain, Eric Miller, Shannon P Quinn, Cecilia Lo

2022 | Article

Enabling Active Learning Pedagogy and Insight Mining with a Grammar of Model Analysis

Modern engineering models are complex, with dozens of inputs, uncertainties arising from simplifying assumptions, and dense output data. While major strides have been made in the computational scalability of complex models, relatively less attention has been paid to user-friendly, reusable tools to explore and make sense of these models.

Zachary del Rosario

2022 | Article

Automatic random variate generation in Python

The generation of random variates is an important tool that is required in many applications. Various software programs or packages contain generators for standard distributions like the normal, exponential or Gamma.

Christoph Baumgarten, Tirth Patel

2022 | Article

atoMEC: An open-source average-atom Python code

Average-atom models are an important tool in studying matter under extreme conditions, such as those conditions experienced in planetary cores, brown and white dwarfs, and during inertial confinement fusion.

Timothy J. Callow, Daniel Kotik, Eli Kraisler, Attila Cangi

2022 | Article

Monaco: A Monte Carlo Library for Performing Uncertainty and Sensitivity Analyses

This paper introduces monaco, a Python library for conducting Monte Carlo simulations of computational models, and performing uncertainty analysis (UA) and sensitivity analysis (SA) on the results. UA and SA are critical to effective and responsible use of models in science, engineering, and public policy; however, their use is uncommon.

W. Scott Shambaugh

2022 | Article

A Python Pipeline for Rapid Application Development (RAD)

Rapid Application Development (RAD) is the ability to rapidly prototype an interactive interface through frequent feedback, so that it can be quickly deployed and delivered to stakeholders and customers.

Scott D. Christensen, Marvin S. Brown, Robert B. Haehnel, Joshua Q. Church, Amanda Catlett, Dallon C. Schofield, Quyen T. Brannon, Stacy T. Smith

2022 | Article

Variational Autoencoders For Semi-Supervised Deep Metric Learning

Deep metric learning (DML) methods generally do not incorporate unlabelled data. We propose borrowing components of the variational autoencoder (VAE) methodology to extend DML methods to train on semi-supervised datasets.

Nathan Safir, Meekail Zain, Curtis Godwin, Eric Miller, Bella Humphrey, Shannon P Quinn

2022 | Article

Wailord: Parsers and Reproducibility for Quantum Chemistry

Data driven advances dominate the applied sciences landscape, with quantum chemistry being no exception to the rule. Dataset biases and human error are key bottlenecks in the development of reproducible and generalized insights.

Rohit Goswami

2022 | Article

RocketPy: Combining Open-Source and Scientific Libraries to Make the Space Sector More Modern and Accessible

In recent years, the space sector has seen exponential growth, with new companies emerging in it. On top of that, more people are becoming eager to participate in the aerospace revolution, which motivates students and hobbyists to build more High Powered and Sounding Rockets.

João Lemes Gribel Soares, Mateus Stano Junqueira, Oscar Mauricio Prada Ramirez, Patrick Sampaio dos Santos Brandão, Adriano Augusto Antongiovanni, Guilherme Fernandes Alves, Giovani Hidalgo Ceotto

2022 | Article

Improving PyDDA's atmospheric wind retrievals using automatic differentiation and Augmented Lagrangian methods

Robert Jackson, Rebecca Gjini, Sri Hari Krishna Narayanan, Matt Menickelly, Paul Hovland, Jan Hückelheim, Scott Collis

2022 | Article

pyDAMPF: a Python package for modeling mechanical properties of hygroscopic materials under interaction with a nanoprobe

pyDAMPF is a tool oriented to the Atomic Force Microscopy (AFM) community, which allows the simulation of the physical properties of materials under variable relative humidity (RH). In particular, pyDAMPF is mainly focused on the mechanical properties of polymeric hygroscopic nanofibers that play an essential role in designing tissue scaffolds for implants and filtering devices.

Willy Menacho, Gonzalo Marcelo Ramírez-Ávila, Horacio V. Guzman

2022 | Article

popmon: Analysis Package for Dataset Shift Detection

popmon is an open-source Python package to check the stability of a tabular dataset. popmon creates histograms of features binned in time-slices, and compares the stability of its profiles and distributions using statistical tests, both over time and with respect to a reference dataset.

Simon Brugman, Tomas Sostak, Pradyot Patil, Max Baak

2022 | Article

Experience report of physics-informed neural networks in fluid simulations: pitfalls and frustration

Though PINNs (physics-informed neural networks) are now deemed a complement to traditional CFD (computational fluid dynamics) solvers rather than a replacement, their ability to solve the Navier-Stokes equations without given data is still of great interest.

Pi-Yueh Chuang, Lorena A. Barba

2022 | Article

The Geoscience Community Analysis Toolkit: An Open Development, Community Driven Toolkit in the Scientific Python Ecosystem

The Geoscience Community Analysis Toolkit (GeoCAT) team develops and maintains data analysis and visualization tools on structured and unstructured grids for the geosciences community in the Scientific Python Ecosystem (SPE).

Orhan Eroglu, Anissa Zacharias, Michaela Sizemore, Alea Kootz, Heather Craker, John Clyne

2022 | Article

Design of a Scientific Data Analysis Support Platform

Software data analytic workflows are a critical aspect of modern scientific research and play a crucial role in testing scientific hypotheses. A typical scientific data analysis life cycle in a research project must include several steps that may not be fundamental to testing the hypothesis, but are essential for reproducibility.

Nathan Martindale, Jason Hite, Scott Stewart, Mark Adams

2022 | Article

Temporal Word Embeddings Analysis for Disease Prevention

Human languages' semantics and structure constantly change over time through mediums such as culturally significant events. By viewing the semantic changes of words during notable events, contexts of existing and novel words can be predicted for similar, current events.

Nathan Jacobi, Ivan Mo, Albert You, Krishi Kishore, Zane Page, Shannon P. Quinn, Tim Heckman

2022 | Article

Global optimization software library for research and education

Machine learning models are often represented by functions given by computer programs. Optimization of such functions is a challenging task because traditional derivative based optimization methods with guaranteed convergence properties cannot be used.

Nadia Udler

2022 | Article

Phylogeography: Analysis of genetic and climatic data of SARS-CoV-2

As the SARS-CoV-2 pandemic reaches its peak, researchers around the globe are combining efforts to investigate the genetics of different variants to better deal with its distribution.

Aleksandr Koshkarov, Wanlin Li, My-Linh Luu, Nadia Tahiri

2022 | Article

Search for Extraterrestrial Intelligence: GPU Accelerated TurboSETI

A common technique adopted by the Search For Extraterrestrial Intelligence (SETI) community is monitoring electromagnetic radiation for signs of extraterrestrial technosignatures using ground-based radio observatories.

Luigi Cruz, Wael Farah, Richard Elkins

2022 | Article

pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling

pyAudioProcessing is a Python based library for processing audio data, constructing and extracting numerical features from audio, building and testing machine learning models, and classifying data with existing pre-trained audio classification models or custom user-built models.

Jyotika Singh

2022 | Article

A New Python API for Webots Robotics Simulations

Webots is a popular open-source package for 3D robotics simulations. It can also be used as a 3D interactive environment for other physics-based modeling, virtual reality, teaching or games. Webots has provided a simple API allowing Python programs to control robots and/or the simulated world, but this API is inefficient and does not provide many "pythonic" conveniences.

Justin C. Fisher

2022 | Article

poliastro: a Python library for interactive astrodynamics

Space is more popular than ever, with the growing public awareness of interplanetary scientific missions, as well as the increasingly large number of satellite companies planning to deploy satellite constellations.

Juan Luis Cano Rodríguez, Jorge Martínez Garrido

2022 | Article

Likeness: a toolkit for connecting the social fabric of place to human dynamics

The ability to produce richly-attributed synthetic populations is key for understanding human dynamics, responding to emergencies, and preparing for future events, all while protecting individual privacy.

Joseph V. Tuccillo, James D. Gaboardi

2022 | Article

Keeping your Jupyter notebook code quality bar high (and production ready) with Ploomber

This paper walks through this interactive tutorial. Running it interactively is highly recommended, as that makes it easier to follow and see the results in real-time. There’s a binder link in there as well, so you can launch it instantly.

Ido Michael

2022 | Article

Awkward Packaging: building Scikit-HEP

Scikit-HEP has grown rapidly over the last few years, not just to serve the needs of the High Energy Physics (HEP) community, but in many ways, the Python ecosystem at large. AwkwardArray, boost-histogram/hist, and iminuit are examples of libraries that are used beyond the original HEP focus.

Henry Schreiner, Jim Pivarski, Eduardo Rodrigues

2022 | Article

Incorporating Task-Agnostic Information in Task-Based Active Learning Using a Variational Autoencoder

It is often much easier and less expensive to collect data than to label it. Active learning (AL) responds to this issue by selecting which unlabeled data are best to label next. Standard approaches utilize task-aware AL, which identifies informative samples based on a trained supervised model.

Curtis Godwin, Meekail Zain, Nathan Safir, Bella Humphrey, Shannon P Quinn

2022 | Article

Codebraid Preview for VS Code: Pandoc Markdown Preview with Jupyter Kernels

Codebraid Preview is a VS Code extension that provides a live preview of Pandoc Markdown documents with optional support for executing embedded code. Unlike typical Markdown previews, all Pandoc features are fully supported because Pandoc itself generates the preview.

Geoffrey M. Poore

2022 | Article

Pylira: deconvolution of images in the presence of Poisson noise

All physical and astronomical imaging observations are degraded by the finite angular resolution of the camera and telescope systems. The recovery of the true image is limited by both how well the instrument characteristics are known and by the magnitude of measurement noise.

Axel Donath, Aneta Siemiginowska, Vinay Kashyap, Douglas Burke, Karthik Reddy Solipuram, David van Dyk

2022 | Article

Python vs. the pandemic: a case study in high-stakes software development

When it became clear in early 2020 that COVID-19 was going to be a major public health threat, politicians and public health officials turned to academic disease modelers like us for urgent guidance. Academic software development is typically a slow and haphazard process, and we realized that business-as-usual would not suffice for dealing with this crisis.

Cliff C. Kerr, Robyn M. Stuart, Dina Mistry, Romesh G. Abeysuriya, Jamie A. Cohen, Lauren George, Michał Jastrzebski, Michael Famulare, Edward Wenger, Daniel J. Klein

2022 | Article

Bayesian Estimation and Forecasting of Time Series in statsmodels

Statsmodels, a Python library for statistical and econometric analysis, has traditionally focused on frequentist inference, including in its models for time series data. This paper introduces the powerful features for Bayesian inference of time series models that exist in statsmodels, with applications to model fitting, forecasting, time series decomposition, data simulation, and impulse response functions.

Chad Fulton

2022 | Article

USACE Coastal Engineering Toolkit and a Method of Creating a Web-Based Application

In the early 1990s the Automated Coastal Engineering Systems, ACES, was created with the goal of providing state-of-the-art computer-based tools to increase the accuracy, reliability, and cost-effectiveness of Corps coastal engineering endeavors.

Amanda Catlett, Theresa R. Coumbe, Scott D. Christensen, Mary A. Byrant

2022 | Article

Papyri: better documentation for the scientific ecosystem in Jupyter

We present here the idea behind Papyri, a framework we are developing to provide a better documentation experience for the scientific ecosystem. In particular, we wish to provide a documentation browser (from within Jupyter or other IDEs and Python editors) that gives a unified experience, with cross-library navigation, search, and indexing.

Matthias Bussonnier, Camille Carvalho

2022 | Article

Python for Global Applications: teaching scientific Python in context to law and diplomacy students

For students across domains and disciplines, the message has been communicated loud and clear: data skills are an essential qualification for today’s job market. This includes not only the traditional introductory stats coursework but also machine learning, artificial intelligence, and programming in Python or R.

Anna Haensch, Karin Knudson

2022 | Article

The myth of the normal curve and what to do about it

This paper gives an overview of the issues associated with the normal curve. The concerns with traditional methods, in terms of robustness to violations of normality, have been known for over half a century and modern alternatives have been recommended; however, for various reasons that have been discussed, modern robust methods have not yet become commonplace in applied research settings.

Allan Campopiano

2022 | Article

A Novel Pipeline for Cell Instance Segmentation, Tracking and Motility Classification of Toxoplasma Gondii in 3D Space

Toxoplasma gondii is the parasitic protozoan that causes disseminated toxoplasmosis, a disease that is estimated to infect around one-third of the world's population. While the disease is commonly asymptomatic, the success of the parasite is in large part due to its ability to easily spread through nucleated cells.

Seyed Alireza Vaezi, Gianni Orlando, Mojtaba Fazli, Gary Ward, Silvia Moreno, Shannon Quinn

2022 | Article

Utilizing SciPy and other open source packages to provide a powerful API for materials manipulation in the Schrödinger Materials Suite

The use of several open source scientific packages in the Schrödinger Materials Science Suite will be discussed. A typical workflow for materials discovery will be described, discussing how open source packages have been incorporated at every stage.

Alexandr Fonari, Farshad Fallah, Michael Rauch

2022 | Article

Galyleo: A General-Purpose Extensible Visualization Solution

Galyleo is an open-source, extensible dashboarding solution integrated with JupyterLab. Galyleo is a standalone web application integrated as an iframe into a JupyterLab tab.

Rick McGeer, Andreas Bergen, Mahdiyar Biazi, Matt Hemmings, Robin Schreiber

2022 | Article

Semi-Supervised Semantic Annotator (S3A): Toward Efficient Semantic Labeling

Most semantic image annotation platforms suffer severe bottlenecks when handling large images, complex regions of interest, or numerous distinct foreground regions in a single image. We have developed the Semi-Supervised Semantic Annotator (S3A) to address each of these issues and facilitate rapid collection of ground truth pixel-level labeled data.

Nathan Jessurun, Daniel E. Capecci, Olivia P. Dizon-Paradis, Damon L. Woodard, Navid Asadizanjani

2022 | Article

The Advanced Scientific Data Format (ASDF): An Update

We report on progress in developing and extending the new Advanced Scientific Data Format (ASDF), which we developed for data from the James Webb and Nancy Grace Roman Space Telescopes, since we reported on it at a previous SciPy conference.

Perry Greenfield, Edward Slavich, William Jamieson, Nadia Dencheva

2023

Proceedings of the Python in Science Conference 2023

There are 19 articles published in this collection

2023 | Article

Python Array API Standard: Toward Array Interoperability in the Scientific Python Ecosystem

The Python array API standard specifies standardized application programming interfaces and behaviors for array and tensor objects and operations. The establishment and subsequent adoption of the standard aims to reduce ecosystem fragmentation and facilitate array library interoperability.

Aaron Meurer, Athan Reines, Ralf Gommers, Yao-Lung L. Fang, John Kirkham, Matthew Barber, Stephan Hoyer, Andreas Müller, Sheng Zha, Saul Shanabrook, Stephannie Jiménez Gacha, Mario Lezcano-Casado, Thomas J. Fan, Tyler Reddy, Alexandre Passos, Hyukjin Kwon, Travis Oliphant, Consortium for Python Data API Standards

2023 | Article

A Modified Strassen Algorithm to Accelerate Numpy Large Matrix Multiplication with Integer Entries

We present a Strassen-type algorithm for multiplying large matrices with integer entries. The algorithm is the standard Strassen divide-and-conquer algorithm, but it crosses over to NumPy when either the row or column dimension of one of the matrices drops below 128.

Anthony Breitzman
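
As a rough illustration of the crossover idea in the abstract above, here is a minimal Python sketch. It is not the author's implementation: the function name `strassen_int`, the `crossover` parameter, and the choice to hand odd-sized blocks to NumPy (rather than padding them) are assumptions made for this sketch. Only the threshold of 128 comes from the abstract.

```python
import numpy as np


def strassen_int(a, b, crossover=128):
    """Multiply integer matrices a (n x k) and b (k x m) with Strassen recursion.

    Hypothetical sketch: falls back to NumPy's own product when any
    dimension is below `crossover` or is odd (an assumption of this sketch).
    """
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")

    # Crossover: let NumPy handle small or odd-sized blocks directly.
    if min(n, k, m) < crossover or n % 2 or k % 2 or m % 2:
        return a @ b

    h, j, l = n // 2, k // 2, m // 2
    a11, a12, a21, a22 = a[:h, :j], a[:h, j:], a[h:, :j], a[h:, j:]
    b11, b12, b21, b22 = b[:j, :l], b[:j, l:], b[j:, :l], b[j:, l:]

    # The seven Strassen products (instead of eight block products).
    m1 = strassen_int(a11 + a22, b11 + b22, crossover)
    m2 = strassen_int(a21 + a22, b11, crossover)
    m3 = strassen_int(a11, b12 - b22, crossover)
    m4 = strassen_int(a22, b21 - b11, crossover)
    m5 = strassen_int(a11 + a12, b22, crossover)
    m6 = strassen_int(a21 - a11, b11 + b12, crossover)
    m7 = strassen_int(a12 - a22, b21 + b22, crossover)

    # Reassemble the four quadrants of the result.
    return np.block([
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ])
```

Because the entries are integers, the extra additions and subtractions introduce no rounding error, so the result can be compared exactly against `a @ b` as long as the values stay within the integer dtype's range.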

2023 | Article

An Accessible Python based Author Identification Process

Author identification, also known as ‘author attribution’ and more recently ‘forensic linguistics’, involves identifying true authors of anonymous texts. In this paper we replicate the analysis but in a much more accessible way using modern text mining methods and Python.

Anthony Breitzman

2023 | Article

Biomolecular Crystallographic Computing with Jupyter

To further advance this use of Jupyter, we developed a collection of code fragments that use the vast Computational Crystallography Toolbox (cctbx) library for novel analyses. We made versions of this library for use in JupyterLab and Colab.

Blaine H. M. Mooers

2023 | Article

Bayesian Statistics with Python, No Resampling Necessary

TensorFlow Probability is a powerful library for statistical analysis in Python. Using TensorFlow Probability’s implementation of Bayesian methods, modelers can incorporate prior information and obtain parameter estimates and a quantified degree of belief in the results.

Charles Lindsey

2023 | Article

Using Numba for GPU acceleration of Neutron Beamline Digital Twins

Digital twins of neutron instruments using Monte Carlo ray tracing have proven to be useful in neutron data analysis and verifying instrument and sample designs. In this paper, we present a GPU accelerated version of MCViNE using Python and Numba to balance user extensibility with performance.

Coleman J. Kendrick, Jiao Y. Y. Lin, Garrett E. Granroth

2023 | Article

EEG-to-fMRI Neuroimaging Cross Modal Synthesis in Python

Electroencephalography and functional magnetic resonance imaging are two ways of recording brain activity. We developed a Python package, EEG-to-fMRI, which provides cross modal neuroimaging synthesis functionalities.

David Calhas

2023 | Article

vak: a neural network framework for researchers studying animal acoustic communication

The study of acoustic communication is being revolutionized by deep neural network models. To address this need, we developed vak, a neural network framework designed for acoustic communication researchers.

David Nicholson, Yarden Cohen

2023 | Article

Emukit: A Python toolkit for decision making under uncertainty

Emukit is a highly flexible Python toolkit for enriching decision making under uncertainty with statistical emulation. It is particularly pertinent to complex processes and simulations where data are scarce or difficult to acquire.

Andrei Paleyes, Maren Mahsereci, Neil D. Lawrence

2023 | Article

Using Blosc2 NDim As A Fast Explorer Of The Milky Way (Or Any Other NDim Dataset)

Large multidimensional datasets are widely used in various engineering and scientific applications. We have added support for large dimensional datasets to Blosc2, a compression and format library.

Project Blosc, Francesc Alted, Marta Iborra, Oscar Guiñón, David Ibáñez, Sergio Barrachina

2023 | Article

MDAKits: A Framework for FAIR-Compliant Molecular Simulation Analysis

The reproducibility and transparency of scientific findings are widely recognized as crucial for promoting scientific progress. The MDAKits framework provides a cookiecutter template, best practices documentation, and a continually validated registry.

Irfan Alibay, Lily Wang, Fiona Naughton, Ian Kenney, Jonathan Barnoud, Richard J Gowers, Oliver Beckstein

2023 | Article

The Pandata Scalable Open-Source Analysis Stack

As the scale of scientific data analysis continues to grow, traditional domain-specific tools often struggle with data of increasing size and complexity. We introduce the Pandata open-source software stack as a solution, emphasizing the use of domain-independent tools at critical stages of the data life cycle, without compromising the depth of domain-specific analyses.

James A. Bednar, Martin Durant

2023 | Article

Spatial Microsimulation and Activity Allocation in Python: An Update on the Likeness Toolkit

Understanding human security and social equity issues within human systems requires large-scale models of population dynamics that simulate high-fidelity representations of individuals and access to essential activities (work/school, social, errands, health). Likeness is a Python toolkit for spatial microsimulation.

Joseph V. Tuccillo, James D. Gaboardi

2023 | Article

itk-elastix: Medical image registration in Python

Image registration plays a vital role in understanding changes that occur in 2D and 3D scientific imaging datasets. In this paper, we introduce itk-elastix, a user-friendly Python wrapping of the mature elastix registration toolbox.

Konstantinos Ntatsis, Niels Dekker, Viktor van der Valk, Tom Birdsong, Dženan Zukić, Stefan Klein, Marius Staring, Matthew McCormick

2023 | Article

PyQtGraph - High Performance Visualization for All Platforms

PyQtGraph is a plotting library with high performance, cross-platform support and interactivity as its primary objectives. These goals are achieved by connecting the Qt GUI framework and the scientific Python ecosystem.

Ognyan Moore, Nathan Jessurun, Martin Chase, Nils Nemitz, Luke Campagnola

2023 | Article

Pandera: Going Beyond Pandas Data Validation

Data quality remains a core concern for practitioners in machine learning, data science, and data engineering, and many specialized packages have emerged to fulfill the need of validating and monitoring data and models. This paper outlines pandera’s motivation and challenges that took it from being a pandas-only data validation framework to one that is extensible to other non-pandas-compliant dataframe-like libraries.

Niels Bantilan

2023 | Article

libyt: a Tool for Parallel In Situ Analysis with yt

In the era of exascale computing, storage and analysis of large-scale data have become more important and difficult. We present libyt, an open-source C++ library that allows researchers to analyze and visualize data using yt or other Python packages in parallel during simulation runtime.

Shin-Rong Tsai, Hsi-Yu Schive, Matthew J. Turk

2023 | Article

Data Reduction Network

Multidimensional categorical data is widespread but not easily visualized using standard methods. For example, questionnaire data generally consists of questions with categorical responses. Popular methods of handling categorical data include one-hot encoding and enumeration, which applies an unwarranted and potentially misleading notional order to the data. To address this, we introduce a novel visualization method named Data Reduction Network.

Haoyin Xu, Haw-minn Lu, José Unpingco
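
To make the point about enumeration in the abstract above concrete, here is a small Python sketch contrasting the two encodings. The questionnaire responses and all variable names are made up for illustration, and nothing here reflects the Data Reduction Network method itself.

```python
import numpy as np

# Hypothetical categorical responses to a single questionnaire item.
responses = ["agree", "neutral", "disagree", "agree"]

# Enumeration: map each category to an integer (alphabetical order here).
categories = sorted(set(responses))  # ['agree', 'disagree', 'neutral']
index = {c: i for i, c in enumerate(categories)}
codes = np.array([index[r] for r in responses])

# One-hot encoding: one indicator column per category.
one_hot = np.eye(len(categories), dtype=int)[codes]

# Enumeration implies an order and a spacing that the data never stated:
# "agree" ends up closer to "disagree" than to "neutral".
gap_to_disagree = abs(index["agree"] - index["disagree"])
gap_to_neutral = abs(index["agree"] - index["neutral"])

# With one-hot rows, every pair of distinct categories is equally far apart.
unit = np.eye(len(categories))
pairwise = {
    (p, q): float(np.linalg.norm(unit[index[p]] - unit[index[q]]))
    for p in categories for q in categories if p < q
}
```

In this sketch the integer codes depend entirely on the arbitrary alphabetical sort, which is the kind of notional order the abstract warns about. The one-hot rows avoid it at the cost of one column per category.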

2023 | Article

aPhyloGeo-Covid: A Web Interface for Reproducible Phylogeographic Analysis of SARS-CoV-2 Variation using Neo4j and Snakemake

The gene sequencing data, along with the associated lineage tracing and research data generated throughout the Coronavirus disease 2019 (COVID-19) pandemic, constitute invaluable resources that profoundly empower phylogeography research. To optimize the utilization of these resources, we have developed an interactive analysis platform called aPhyloGeo-Covid.

Wanlin Li, Nadia Tahiri

Articles

A collection of research articles

There are 0 articles published in this collection