2018

Proceedings of the Python in Science Conference 2018

There are 24 articles published in this collection
2018 | Article
Yaksh: Facilitating Learning by Doing
Prabhu Ramachandran, Prathamesh Salunke, Ankit Javalkar, Aditya Palaparthy, Mahesh Gudi, Hardik Ghaghada

Yaksh is a free and open-source online evaluation platform. At its core, Yaksh focuses on problem-based learning and lets teachers create practice exercises and quizzes which are evaluated in real-time.

2018 | Article
signac: A Python framework for data and workflow management
Vyas Ramasubramani, Carl S. Adorf, Paul M. Dodd, Bradley D. Dice, Sharon C. Glotzer

Computational research requires versatile data and workflow management tools that can easily adapt to the highly dynamic requirements of scientific investigations. Many existing tools require strict adherence to a particular usage pattern, so researchers often use less robust ad hoc solutions that they find easier to adopt.

2018 | Article
Scalable Feature Extraction with Aerial and Satellite Imagery
Virginia Ng, Daniel Hofmann

Deep learning techniques have greatly advanced the performance of the already rapidly developing field of computer vision, which powers a variety of emerging technologies—from facial recognition to augmented reality to self-driving cars.

2018 | Article
A Bayesian’s journey to a better research workflow
Konstantinos Vamvourellis, Marianne Corvellec

This work began when the two authors met at a software development meeting. Konstantinos was building Bayesian models in his research and wanted to learn how to better manage his research process. Marianne was working on data analysis workflows in industry and wanted to learn more about Bayesian statistics.

2018 | Article
Design and Implementation of pyPRISM: A Polymer Liquid-State Theory Framework
Tyler B. Martin, Thomas E. Gartner III, Ronald L. Jones, Chad R. Snyder, Arthi Jayaraman

In this work, we describe the code structure, implementation, and usage of a Python-based, open-source framework, pyPRISM, for conducting polymer liquid-state theory calculations. Polymer Reference Interaction Site Model (PRISM) theory describes the equilibrium spatial-correlations, thermodynamics, and structure of liquid-like polymer systems and macromolecular materials.

2018 | Article
Spatio-temporal analysis of socioeconomic neighborhoods: The Open Source Longitudinal Neighborhood Analysis Package (OSLNAP)
Sergio Rey, Elijah Knaap, Su Han, Levi Wolf, Wei Kang

The neighborhood effects literature represents a wide span of the social sciences broadly concerned with the influence of spatial context on social processes. From the study of segregation dynamics, the relationships between the built environment and health outcomes, to the impact of concentrated poverty on social efficacy, neighborhoods are a central construct in empirical work.

2018 | Article
Binder 2.0 - Reproducible, interactive, sharable environments for science at scale
Project Jupyter, Matthias Bussonnier, Jessica Forde, Jeremy Freeman, Brian Granger, Tim Head, Chris Holdgraf, Kyle Kelley, Gladys Nalvarte, Andrew Osheroff, M Pacer, Yuvi Panda, Fernando Perez, Benjamin Ragan-Kelley, Carol Willing

Binder is an open source web service that lets users create sharable, interactive, reproducible environments in the cloud. It is powered by other core projects in the open source ecosystem, including JupyterHub and Kubernetes for managing cloud resources.

2018 | Article
Harnessing the Power of Scientific Python to Investigate Biogeochemistry and Metaproteomes of the Central Pacific Ocean
Noelle A. Held, Jaclyn K. Saunders, Joe Futrelle, Mak A. Saito

Oceanographic expeditions commonly generate millions of data points for various chemical, biological, and physical features, all in different formats. Scientific Python tools are extremely useful for synthesizing this data to make sense of major trends in the changing ocean environment.

2018 | Article
Organic Molecules in Space: Insights from the NASA Ames Molecular Database in the era of the James Webb Space Telescope
Matthew J. Shannon, Christiaan Boersma

We present to the astronomical community the software tool pyPAHdb, which is used to characterize emission from one of the most prevalent types of organic molecules in space, namely polycyclic aromatic hydrocarbons (PAHs).

2018 | Article
Real-Time Digital Signal Processing Using pyaudio_helper and the ipywidgets
Mark Wickert

The focus of this paper is on teaching real-time digital signal processing to electrical and computer engineers using the Jupyter notebook and the code module pyaudio_helper, which is a component of the package scikit-dsp-comm.

2018 | Article
Exploring the Extended Kalman Filter for GPS Positioning Using Simulated User and Satellite Track Data
Mark Wickert, Chiranth Siddappa

This paper describes a Python computational tool for exploring the use of the extended Kalman filter (EKF) for position estimation using the Global Positioning System (GPS) pseudorange measurements. The development was motivated by the need for an example generator in a training class on Kalman filtering, with emphasis on GPS.

2018 | Article
WrightSim: Using PyCUDA to Simulate Multidimensional Spectra
Kyle F Sunden, Blaise J Thompson, John C Wright

Nonlinear multidimensional spectroscopy (MDS) is a powerful experimental technique used to interrogate complex chemical systems. MDS promises to reveal energetics, dynamics, and coupling features of and between the many quantum-mechanical states that these systems contain.

2018 | Article
Bringing ipywidgets Support to plotly.py
Jon Mease

Plotly.js is a declarative JavaScript data visualization library built on D3 and WebGL that supports a wide range of statistical, scientific, financial, geographic, and 3-dimensional visualizations. Support for creating Plotly.

2018 | Article
Sparse: A more modern sparse array library
Hameer Abbasi

This paper is about sparse multi-dimensional arrays in Python. We discuss their applications, layouts, and current implementations in the SciPy ecosystem along with strengths and weaknesses. We then introduce a new package for sparse arrays that builds on the legacy of the scipy.

2018 | Article
Text and data mining scientific articles with allofplos
Elizabeth Seiver, M Pacer, Sebastian Bassi

Mining scientific articles is hard when many of them are inaccessible behind paywalls. The Public Library of Science (PLOS) is a non-profit Open Access science publisher of the single largest journal (PLOS ONE), whose articles are all freely available to read and re-use.

2018 | Article
Safe handling instructions for missing data
Dillon Niederhut

In machine learning tasks, it is common to handle missing data by removing observations with missing values, or replacing missing data with the mean value for its feature. To show why this is problematic, we use listwise deletion and mean imputation to recover missing values from artificially created datasets, and we compare those models against ones with full information.
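As a rough illustration of the two strategies named above, the sketch below applies listwise deletion and mean imputation to a small, hypothetical dataset using only the standard library. It is not the paper's experimental setup; the dataset and function names are invented for illustration.

```python
from statistics import mean, pvariance

# Hypothetical toy dataset: each row is (x, y); None marks a missing value.
rows = [(1.0, 2.0), (2.0, None), (3.0, 6.0), (None, 8.0), (5.0, 10.0)]


def listwise_deletion(data):
    """Drop every observation (row) that has at least one missing value."""
    return [row for row in data if None not in row]


def mean_impute(data):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(data[0])
    col_means = [mean(row[j] for row in data if row[j] is not None) for j in range(n_cols)]
    return [tuple(col_means[j] if row[j] is None else row[j] for j in range(n_cols)) for row in data]


complete = listwise_deletion(rows)
imputed = mean_impute(rows)

# Compare the spread of y before and after imputation.
var_observed = pvariance([row[1] for row in rows if row[1] is not None])
var_imputed = pvariance([row[1] for row in imputed])

print(len(complete))              # 3: two of the five rows are discarded
print(imputed[1])                 # (2.0, 6.5): missing y filled with the mean of 2, 6, 8, 10
print(var_observed, var_imputed)  # 8.75 7.0: imputation shrinks the variance
```

Deletion throws away information carried by the partially observed rows, while mean imputation keeps the column mean but artificially reduces its variance; both are simple illustrations of how these shortcuts can distort a dataset.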

2018 | Article
EarthSim: Flexible Environmental Simulation Workflows Entirely Within Jupyter Notebooks
Dharhas Pothina, Philipp J. F. Rudiger, James A Bednar, Scott Christensen, Kevin Winters, Kimberly Pevey, Christopher E. Ball, Gregory Brener

Building environmental simulation workflows is typically a slow process involving multiple proprietary desktop tools that do not interoperate well. In this work, we demonstrate building flexible, lightweight workflows entirely in Jupyter notebooks.

2018 | Article
Practical Applications of Astropy
David Shupe, Frank Masci, Russ Laher, Ben Rusholme, Lee Armus

Packages developed under the auspices of the Astropy Project (astropy2013, astropy2018) address many common problems faced by astronomers in their computational projects. In this paper we describe how capabilities provided by Astropy have been employed in two current projects.

2018 | Article
Developing a Start-to-Finish Pipeline for Accelerometer-Based Activity Recognition Using Long Short-Term Memory Recurrent Neural Networks
Christian McDaniel, Shannon Quinn

Increased prevalence of smartphones and wearable devices has facilitated the collection of triaxial accelerometer data for numerous Human Activity Recognition (HAR) tasks. Concurrently, advances in the theory and implementation of long short-term memory (LSTM) recurrent neural networks (RNNs) have made it possible to process this data in its raw form, enabling on-device online analysis.

2018 | Article
The Econ-ARK and HARK: Open Source Tools for Computational Economics
Christopher D. Carroll, Alexander M. Kaufman, Jacqueline L. Kazil, Nathan M. Palmer, Matthew N. White

The Economics Algorithmic Repository and toolKit (Econ-ARK) aims to become a focal resource for computational economics. Its first ‘framework,’ the Heterogeneous Agent Resources and Toolkit (HARK), provides a modern, robust, transparent set of tools to solve a class of macroeconomic models whose usefulness has become increasingly apparent both for economic policy and for research purposes, but whose adoption has been limited because the existing literature derives from idiosyncratic, hand-crafted, and often impenetrable legacy code.

2018 | Article
Composable Multi-Threading and Multi-Processing for Numeric Libraries
Anton Malakhov, David Liu, Anton Gorshkov, Terry Wilmarth

Python is popular among scientific communities that value its simplicity and power, especially as it comes along with numeric libraries such as NumPy, SciPy, Dask, and Numba. As CPU core counts keep increasing, these modules can make use of many cores via multi-threading for efficient multi-core parallelism.

2018 | Article
Equity, Scalability, and Sustainability of Data Science Infrastructure
Anthony Suen, Laura Norén, Alan Liang, Andrea Tu

We seek to understand the current state of equity, scalability, and sustainability of data science education infrastructure in both the U.S. and Canada. Our analysis of the technological, funding, and organizational structure of four types of institutions shows an increasing divergence in the ability of universities across the United States to provide students with accessible data science education infrastructure, primarily JupyterHub.

2018 | Article
Dynamic Social Network Modeling of Diffuse Subcellular Morphologies
Andrew Durden, Allyson T Loy, Barbara Reaves, Mojtaba Fazli, Abigail Courtney, Frederick D Quinn, S Chakra Chennubhotla, Shannon P Quinn

The use of fluorescence microscopy has catalyzed new insights into biological function, and spurred the development of quantitative models from rich biomedical image datasets. While image processing in some capacity is commonplace for extracting and modeling quantitative knowledge from biological systems at varying scales, general-purpose approaches for more advanced modeling are few.

2018 | Article
Cloudknot: A Python Library to Run your Existing Code on AWS Batch
Adam Richie-Halford, Ariel Rokem

We introduce Cloudknot, a software library that simplifies cloud-based distributed computing by programmatically executing user-defined functions (UDFs) in AWS Batch. It takes as input a Python function, packages it as a container, creates all the necessary AWS constituent resources to submit jobs, monitors their execution and gathers the results, all from within the Python environment.

2019

Proceedings of the Python in Science Conference 2019

There are 20 articles published in this collection
2019 | Article
PMDA - Parallel Molecular Dynamics Analysis
Shujie Fan, Max Linke, Ioannis Paraskevakos, Richard J. Gowers, Michael Gecht, Oliver Beckstein

MDAnalysis is an object-oriented Python library to analyze trajectories from molecular dynamics (MD) simulations in many popular formats. With the development of highly optimized MD software packages on high performance computing (HPC) resources, simulation trajectories are growing to many terabytes in size.

2019 | Article
Visualization of Bioinformatics Data with Dash Bio
Shammamah Hossain

Plotly's Dash is a library that empowers data scientists to create interactive web applications declaratively in Python. Dash Bio is a bioinformatics-oriented suite of components that are compatible with Dash.

2019 | Article
Better and faster hyperparameter optimization with Dask
Scott Sievert, Tom Augspurger, Matthew Rocklin

Nearly every machine learning model requires hyperparameters, parameters that the user must specify before training begins and that influence model performance. Finding the optimal set of hyperparameters is often a time- and resource-consuming process.

2019 | Article
PyDDA: A new Pythonic Wind Retrieval Package
Robert Jackson, Scott Collis, Timothy Lang, Corey Potvin, Todd Munson

PyDDA is a new community framework aimed at wind retrievals that depends only upon utilities in the SciPy ecosystem such as scipy, numpy, and dask. It can support retrievals of winds using information from weather radar networks constrained by high resolution forecast models over grids that cover thousands of kilometers at kilometer-scale resolution.

2019 | Article
Parkinson's Classification and Feature Extraction from Diffusion Tensor Images
Rajeswari Sivakumar, Shannon Quinn

Parkinson’s disease (PD) affects over 6.2 million people around the world. Despite its prevalence, there is still no cure, and diagnostic methods are extremely subjective, relying on observation of physical motor symptoms and response to treatment protocols.

2019 | Article
PyLZJD: An Easy to Use Tool for Machine Learning
Edward Raff, Joe Aurelio, Charles Nicholas

As Machine Learning (ML) becomes more widely known and popular, so too does the desire for new users from other backgrounds to apply ML techniques to their own domains. A difficult prerequisite that often confounds new users is the feature creation and engineering process.

2019 | Article
Parameter Estimation Using the Python Package pymcmcstat
Paul R. Miles, Ralph C. Smith

A Bayesian approach to solving inverse problems provides insight regarding model limitations as well as the underlying model and observation uncertainty. In this paper we introduce pymcmcstat, which provides a wide variety of tools for estimating unknown parameter distributions.

2019 | Article
An intelligent shopping list based on the application of partitioning and machine learning algorithms
Nadia Tahiri, Bogdan Mazoure, Vladimir Makarenkov

A grocery list is an integral part of the shopping experience of many consumers. Several mobile retail studies of grocery apps indicate that potential customers place the highest priority on features that help them to create and manage personalized shopping lists.

2019 | Article
A Real-Time 3D Audio Simulator for Cognitive Hearing Science
Mark Wickert

This paper describes the development of a 3D audio simulator for use in cognitive hearing science studies and also for general 3D audio experimentation. The framework that the simulator is built upon is pyaudio_helper, which is a module of the package scikit-dsp-comm.

2019 | Article
Optimizing Python-Based Spectroscopic Data Processing on NERSC Supercomputers
Laurie A. Stephey, Rollin C. Thomas, Stephen J. Bailey

We present a case study of optimizing a Python-based cosmology data processing pipeline designed to run in parallel on thousands of cores using supercomputers at the National Energy Research Scientific Computing Center (NERSC).

2019 | Article
Solving Polynomial Systems with phcpy
Jasmine Otto, Angus Forbes, Jan Verschelde

The solutions of a system of polynomials in several variables are often needed, e.g.: in the design of mechanical systems, and in phase-space analyses of nonlinear biological dynamics. Reliable, accurate, and comprehensive numerical solutions are available through PHCpack, a FOSS package for solving polynomial systems with homotopy continuation.

2019 | Article
Case study: Real-world machine learning application for hardware failure detection
Hongsup Shin

When designing microprocessors, engineers must verify whether the proposed design, defined in hardware description language, does what is intended. During this verification process, engineers run simulation tests and can fix bugs if the tests have failed.

2019 | Article
Codebraid: Live Code in Pandoc Markdown
Geoffrey M. Poore

Codebraid executes code blocks and inline code in Pandoc Markdown documents as part of the document build process. Code can be executed with a built-in system or Jupyter kernels. Either way, a single document can involve multiple programming languages, as well as multiple independent sessions or processes per language.

2019 | Article
pyjanitor: A Cleaner API for Cleaning Data
Eric J. Ma, Zachary Barry, Sam Zuckerman, Zachary Sailer

The pandas library has become the de facto library for data wrangling in the Python programming language. However, inconsistencies in the pandas application programming interface (API), while idiomatic due to historical use, prevent use of expressive, fluent programming idioms that enable self-documenting pandas code.

2019 | Article
Developing a Graph Convolution-Based Analysis Pipeline for Multi-Modal Neuroimage Data: An Application to Parkinson's Disease
Christian McDaniel, Shannon Quinn, PhD

Parkinson's disease (PD) is a highly prevalent neurodegenerative condition originating in subcortical areas of the brain and resulting in progressively worsening motor, cognitive, and psychiatric (e.g.

2019 | Article
CAF Implementation on FPGA Using Python Tools
Chiranth Siddappa, Mark Wickert

The purpose of this project is to provide a real-time geolocation solution by generating code for the complex ambiguity function (CAF) in a hardware description language (HDL) and implementing it on FPGA hardware.

2019 | Article
Analyzing Particle Systems for Machine Learning and Data Visualization with freud
Bradley D. Dice, Vyas Ramasubramani, Eric S. Harper, Matthew P. Spellings, Joshua A. Anderson, Sharon C. Glotzer

The freud Python library analyzes particle data output from molecular dynamics simulations. The library's design and its variety of high-performance methods make it a powerful tool for many modern applications.

2019 | Article
Accelerating the Advancement of Data Science Education
Eric Van Dusen, Anthony Suen, Alan Liang, Amal Bhatnagar

We outline a synthesis of strategies created in collaboration with 35+ colleges and universities on how to advance undergraduate data science education on a national scale. The four core pillars of this strategy include the integration of data science education across all domains, establishing adoptable and scalable cyberinfrastructure, applying data science to non-traditional domains, and incorporating ethical content into data science curricula.

2019 | Article
Deep and Ensemble Learning to Win the Army RCO AI Signal Classification Challenge
Andres Vila, Donna Branchevsky, Kyle Logue, Sebastian Olsen, Esteban Valles, Darren Semmen, Alex Utter, Eugene Grayver

Automatic modulation classification is a challenging problem with multiple applications including cognitive radio and signals intelligence. Most of the existing efforts to solve this problem are only applicable when the signal to noise ratio (SNR) is high and/or long observations of the signal are available.

2019 | Article
Expert RF Feature Extraction to Win the Army RCO AI Signal Classification Challenge
Kyle Logue, Esteban Valles, Andres Vila, Alex Utter, Darren Semmen, Eugene Grayver, Sebastian Olsen, Donna Branchevsky

Automatic modulation classification is a challenging problem with multiple applications including cognitive radio and signals intelligence. Most of the existing efforts to solve this problem are only applicable when the signal to noise ratio (SNR) is high and/or long observations of the signal are available.

2020

Proceedings of the Python in Science Conference 2020

There are 23 articles published in this collection
2020 | Article
Towards an Unsupervised Spatiotemporal Representation of Cilia Video Using A Modular Generative Pipeline
Meekail Zain, Sonia Rao, Nathan Safir, Quinn Wyner, Isabella Humphrey, Alex Eldridge, Chenxiao Li, BahaaEddin AlAila, Shannon Quinn

Motile cilia are a highly conserved organelle found on the exterior of many human cells. Cilia beat in rhythmic patterns to transport substances or generate signaling gradients. Disruption of these patterns is often indicative of diseases known as ciliopathies, whose consequences can include dysfunction of macroscopic structures within the lungs, kidneys, brain, and other organs.

2020 | Article
Falsify your Software: validating scientific code with property-based testing
Zac Hatfield-Dodds

Where traditional example-based tests check software using manually-specified input-output pairs, property-based tests exploit a general description of valid inputs and program behaviour to automatically search for falsifying examples.
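To make the contrast concrete, here is a minimal, standard-library-only sketch of the idea: describe the valid inputs (here, short lists of small integers), state a property the code must satisfy, and let a random search look for a falsifying example. Dedicated libraries such as Hypothesis do this far more thoroughly, including shrinking failures to minimal cases; the function under test and all helper names below are hypothetical and are not any library's API.

```python
import random


def buggy_dedupe(xs):
    """Hypothetical function under test: should remove duplicates while keeping order,
    but it never looks at the final element of the input."""
    seen, out = set(), []
    for x in xs[:-1]:  # bug: the last element is skipped
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def prop_same_elements(xs):
    """Property: deduplication must preserve the set of distinct values."""
    return set(buggy_dedupe(xs)) == set(xs)


def search_for_counterexample(prop, trials=1000, seed=0):
    """Generate random valid inputs; return the first one that violates prop, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]
        if not prop(xs):
            return xs
    return None


print(search_for_counterexample(prop_same_elements))  # some list whose last value is unique
```

No input-output pair was written by hand: the search finds any list whose last value does not appear earlier, which an example-based test could easily miss.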

2020 | Article
Software Engineering as Research Method: Aligning Roles in Econ-ARK
Sebastian Benthall, Mridul Seth

While general purpose scientific software has enjoyed great success in industry and academia, domain specific scientific software has not yet become well-established in many disciplines where it has potential.

2020 | Article
SHADOW: A workflow scheduling algorithm reference and testing framework
Ryan W. Bunney, Andreas Wicenec, Mark Reynolds

As the scale of science projects increases, so does the demand on computing infrastructures. The complexity of science processing pipelines, and the heterogeneity of the environments on which they are run, continue to increase; to cope, the algorithmic approaches to executing these applications must also be adapted and improved.

2020 | Article
Leading magnetic fusion energy science into the big-and-fast data lane
Ralph Kube, R Michael Churchill, Jong Youl Choi, Ruonan Wang, Scott Klasky, CS Chang, Minjun J. Choi, Jinseop Park

We present Delta, a Python framework that connects magnetic fusion experiments to high-performance computing (HPC) facilities in order to leverage advanced data analysis for near real-time decisions. Using the ADIOS I/O framework, Delta streams measurement data at over 300 MByte/sec from a remote experimental site in Korea to Cori, a Cray XC-40 supercomputer at the National Energy Research Scientific Computing Center in California.

2020 | Article
Pydra - a flexible and lightweight dataflow engine for scientific analyses
Dorota Jarecka, Mathias Goncalves, Christopher J. Markiewicz, Oscar Esteban, Nicole Lo, Jakub Kaczmarzyk, Satrajit Ghosh

This paper presents a new lightweight dataflow engine written in Python: Pydra. Pydra is developed as an open-source project in the neuroimaging community, but it is designed as a general-purpose dataflow engine to support any scientific domain.

2020 | Article
Combining Physics-Based and Data-Driven Modeling for Pressure Prediction in Well Construction
Oney Erge, Eric van Oort

A framework for combining physics-based and data-driven models to improve well construction is presented in this study. Additionally, the proposed approach provides a more robust and accurate model that mitigates the disadvantages of using purely physics-based or data-driven models.

2020 | Article
pandera: Statistical Data Validation of Pandas Dataframes
Niels Bantilan

pandas is an essential tool in the data scientist’s toolkit for modern data engineering, analysis, and modeling in the Python ecosystem. However, dataframes can often be difficult to reason about in terms of their data types and statistical properties as data is reshaped from its raw form to one that’s ready for analysis.

2020 | Article
Having your cake and eating it: Exploiting Python for programmer productivity and performance on micro-core architectures using ePython
Maurice Jamieson, Nick Brown, Sihang Liu

Micro-core architectures combine many simple, low-memory, low-power computing cores in a single package. These can be used as a co-processor or standalone, but due to limited on-chip memory and the esoteric nature of the hardware, writing efficient parallel codes for these chips is challenging.

2020 | Article
Matched Filter Mismatch Losses in MPSK and MQAM Using Semi-Analytic BEP Modeling
Mark Wickert, David Peckham

The focus of this paper is the bit error probability (BEP) performance degradation when the transmit and receive pulse shaping filters are mismatched. The modulation schemes considered are MPSK and MQAM.

2020 | Article
Spectral Analysis of Mitochondrial Dynamics: A Graph-Theoretic Approach to Understanding Subcellular Pathology
Marcus Hill, Mojtaba Fazli, Rachel Mattson, Meekail Zain, Andrew Durden, Allyson T Loy, Barbara Reaves, Abigail Courtney, Frederick D Quinn, S Chakra Chennubhotla, Shannon P Quinn

Perturbations of organellar structures within a cell are useful indicators of the cell’s response to viral or bacterial invaders. Of the various organelles, mitochondria are meaningful to model because they show distinct migration patterns in the presence of potentially fatal infections, such as tuberculosis.

2020 | Article
High-performance operator evaluations with ease of use: libCEED's Python interface
Valeria Barra, Jed Brown, Jeremy Thompson, Yohann Dudouit

libCEED is a new lightweight, open-source library for high-performance matrix-free Finite Element computations. libCEED offers a portable interface to high-performance implementations, selectable at runtime, tuned for a variety of current and emerging computational architectures, including CPUs and GPUs.

2020 | Article
Awkward Array: JSON-like data, NumPy-like idioms
Jim Pivarski, Ianna Osborne, Pratyush Das, Anish Biswas, Peter Elmer

NumPy simplifies and accelerates mathematical calculations in Python, but only for rectilinear arrays of numbers. Awkward Array provides a similar interface for JSON-like data: slicing, masking, broadcasting, and performing vectorized math on the attributes of objects, unequal-length nested lists (i.

2020 | Article
Learning from evolving data streams
Jacob Montiel

Ubiquitous data poses challenges for current machine learning systems, which must store, handle, and analyze data at scale. Traditionally, this task is tackled by dividing the data into (large) batches. Models are trained on a data batch and then used to obtain predictions.

2020 | Article
Boost-histogram: High-Performance Histograms as Objects
Henry Schreiner, Hans Dembinski, Shuo Liu, Jim Pivarski

Unlike arrays and tables, histograms in Python have usually been denied their own object, and have been represented as a single operation producing several arrays. Boost-histogram is a new Python library that provides histograms that can be filled, manipulated, sliced, and projected as objects.

2020 | Article
Network visualizations with Pyvis and VisJS
Giancarlo Perrone, Jose Unpingco, Haw-minn Lu

Pyvis is a Python module that enables visualizing and interactively manipulating network graphs in the Jupyter notebook, or as a standalone web application. Pyvis is built on top of the powerful and mature VisJS JavaScript library, which allows for fast and responsive interactions while also abstracting away the low-level JavaScript and HTML.

2020 | Article
Introduction to Geometric Learning in Python with Geomstats
Nina Miolane, Nicolas Guigui, Hadi Zaatiti, Christian Shewmake, Hatem Hajri, Daniel Brooks, Alice Le Brigant, Johan Mathe, Benjamin Hou, Yann Thanwerdas, Stefan Heyder, Olivier Peltre, Niklas Koep, Yann Cabanes, Thomas Gerald, Paul Chauchat, Bernhard Kainz, Claire Donnat, Susan Holmes, Xavier Pennec

There is a growing interest in leveraging differential geometry in the machine learning community. Yet, the adoption of the associated geometric computations has been inhibited by the lack of a reference implementation.

2020 | Article
Netlist Analysis and Transformations Using SpyDrNet
Dallin Skouson, Andrew Keller, Michael Wirthlin

Digital hardware circuits (i.e., for application specific integrated circuits or field programmable gate array circuits) can contain a large number of discrete components and connections. These connections are defined by a data structure called a "netlist".

2020 | Article
Compyle: a Python package for parallel computing
Aditya Bhosale, Prabhu Ramachandran

Compyle allows users to execute a restricted subset of Python on a variety of HPC platforms. It is an embedded domain-specific language (eDSL) for parallel computing. It currently supports multi-core execution using Cython, and OpenCL and CUDA for GPU devices.

2020 | Article
HOOMD-blue version 3.0 A Modern, Extensible, Flexible, Object-Oriented API for Molecular Simulations
Brandon L. Butler, Vyas Ramasubramani, Joshua A. Anderson, Sharon C. Glotzer

HOOMD-blue is a library for running molecular dynamics and hard particle Monte Carlo simulations that uses pybind11 to provide a Python interface to fast C++ internals. The package is designed to scale from a single CPU core to thousands of NVIDIA or AMD GPUs.

2020 | Article
Fluctuation X-ray Scattering real-time app
Antoine Dujardin, Elliott Slaughter, Jeffrey Donatelli, Peter Zwart, Amedeo Perazzo, Chun Hong Yoon

The Linac Coherent Light Source (LCLS) at the SLAC National Accelerator Laboratory is an X-ray Free Electron Laser (X-FEL) facility enabling scientists to take snapshots of single macromolecules to study their structure and dynamics.

2020 | Article
Quasi-orthonormal Encoding for Machine Learning Applications
Haw-minn Lu

Most machine learning models, especially artificial neural networks, require numerical, not categorical data. We briefly describe the advantages and disadvantages of common encoding schemes. For example, one-hot encoding is commonly used for attributes with a few unrelated categories and word embeddings for attributes with many related categories (e.
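For reference, the baseline scheme mentioned in this abstract, one-hot encoding, can be sketched in a few lines of plain Python. This does not implement the paper's quasi-orthonormal encoding; the helper name `one_hot` and the sorted category order are illustrative assumptions.

```python
def one_hot(values):
    """Map each categorical value to a unit vector; categories are sorted for a stable order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)  # one dimension per distinct category
        vec[index[v]] = 1            # a single 1 marks this value's category
        vectors.append(vec)
    return categories, vectors


cats, vecs = one_hot(["red", "green", "red", "blue"])
print(cats)  # ['blue', 'green', 'red']
print(vecs)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Distinct categories map to mutually orthogonal vectors, which suits a few unrelated categories, but the vector length grows with the number of categories.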

2020 | Article
Securing Your Collaborative Jupyter Notebooks in the Cloud using Container and Load Balancing Services
Haw-minn Lu, Adrian Kwong, José Unpingco

Jupyter has become the go-to platform for developing data applications but data and security concerns, especially when dealing with healthcare, have become paramount for many institutions and applications dealing with sensitive information.

2021

Proceedings of the Python in Science Conference 2021

There are 20 articles published in this collection
2021 | Article
Cell Tracking in 3D using deep learning segmentations
Varun Kapoor, Claudia Carabaña

Live-cell imaging is a widely used technique to study cell migration and dynamics over time. Although many computational tools have been developed in recent years to automatically detect and track cells, they are optimized to detect cell nuclei with similar shapes and/or cells that do not cluster together.

2021 | Article
CNN Based ToF Image Processing
Marian-Leontin Pop, Szilard Molnar, Alexandru Pop, Benjamin Kelenyi, Levente Tamas, Andrei Cozma

In this paper, a data processing pipeline specific to Time of Flight (ToF) cameras is presented, followed by real-life applications using artificial intelligence. These applications include use cases such as gesture recognition, movement direction estimation, and physical exercise monitoring.

2021 | Article
Multithreaded parallel Python through OpenMP support in Numba
Todd Anderson, Tim Mattson

A modern CPU delivers performance through parallelism. A program that exploits the performance available from a CPU must run in parallel on multiple cores. This is usually best done through multithreading.

2021 | Article
Training machine learning models faster with Dask
Joseph Holt, Scott Sievert

Machine learning (ML) relies on stochastic algorithms, all of which rely on gradient approximations with "batch size" examples. Growing the batch size as the optimization proceeds is a simple and usable method to reduce the training time, provided that the number of workers grows with the batch size.
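The batch-growth idea can be sketched as a simple schedule. The geometric growth factor, the cap, and both helper names below are illustrative assumptions, not the paper's schedule or any Dask API; the second helper only expresses the stated condition that the number of workers grows with the batch size.

```python
def batch_size_schedule(initial, factor, every, max_size, epochs):
    """Hypothetical geometric schedule: multiply the batch size by `factor` every `every` epochs, capped at `max_size`."""
    sizes = []
    size = initial
    for epoch in range(epochs):
        if epoch > 0 and epoch % every == 0:
            size = min(size * factor, max_size)
        sizes.append(size)
    return sizes


def workers_for(batch_size, examples_per_worker):
    """Hypothetical helper: scale workers with the batch so each worker's share stays roughly constant."""
    return -(-batch_size // examples_per_worker)  # ceiling division on integers


sizes = batch_size_schedule(initial=32, factor=2, every=3, max_size=512, epochs=10)
print(sizes)
print([workers_for(b, examples_per_worker=64) for b in sizes])
```

Keeping the per-worker share fixed means each larger batch costs roughly the same wall-clock time per step, which is why the number of model updates, not the number of examples seen, drives the training time here.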

2021 | Article
Monitoring Scientific Python Usage on a Supercomputer
Rollin Thomas, Laurie Stephey, Annette Greiner, Brandon Cook

In 2021, more than 30% of users at the National Energy Research Scientific Computing Center (NERSC) used Python on the Cori supercomputer. To determine this, we have developed and open-sourced a simple, minimally invasive monitoring framework that leverages standard Python features to capture Python imports and other job data via a package called "Customs".

2021 | Article
Classification of Diffuse Subcellular Morphologies
Neelima Pulagam, Marcus Hill, Mojtaba Fazli, Rachel Mattson, Meekail Zain, Andrew Durden, Frederick D Quinn, S Chakra Chennubhotla, Shannon P Quinn

Characterizing dynamic sub-cellular morphologies in response to perturbation remains a challenging and important problem. Many organelles are anisotropic and difficult to segment, and few methods exist for quantifying the shape, size, and quantity of these organelles.

2021 | Article
PyRSB: Portable Performance on Multithreaded Sparse BLAS Operations
Michele Martone, Simone Bacchio

This article introduces PyRSB, a Python interface to the LIBRSB library. LIBRSB is a portable performance library offering so-called Sparse BLAS (Sparse Basic Linear Algebra Subprograms) operations for modern multicore CPUs.

2021 | Article
Programmatically Identifying Cognitive Biases Present in Software Development
Amanda E. Kraft, Matthew Widjaja, Trevor M. Sands, Brad J. Galego

Mitigating bias in AI-enabled systems is a topic of great concern within the research community. While efforts are underway to increase model interpretability and de-bias datasets, little attention has been given to identifying biases that are introduced by developers as part of the software engineering process.

2021 | Article
Conformal Mappings with SymPy: Towards Python-driven Analytical Modeling in Physics
Zoufiné Lauer-Baré, Erich Gaertig

This contribution shows how the symbolic computing Python library SymPy can be used to improve flow force modeling due to a Couette-type flow, i.e. a flow of viscous fluid in the region between two bodies, where one body is in tangential motion relative to the other.

2021 | Article
PyBMRB: Data visualization tool for BioMagResBank
Kumaran Baskaran, Jonathan R Wedell, Eldon L. Ulrich, Jeffery C. Hoch, John L. Markley

The Biological Magnetic Resonance Data Bank (BioMagResBank or BMRB https://bmrb.io), founded in 1988, is the international, open archive for data generated by Nuclear Magnetic Resonance (NMR) spectroscopy of biological systems.

2021 | Article
Social Media Analysis using Natural Language Processing Techniques
Jyotika Singh

Social media is used every day to view and post content, which in turn influences people around the world in a variety of ways. Social media platforms such as YouTube see a great deal of daily activity in the form of video posting, watching, and commenting.

2021 | Article
PyCID: A Python Library for Causal Influence Diagrams
James Fox, Tom Everitt, Ryan Carey, Eric Langlois, Alessandro Abate, Michael Wooldridge

Why did a decision maker select a certain decision? What behaviour does a certain objective incentivise? How can we improve this behaviour and ensure that a decision-maker chooses decisions with safer or fairer consequences? This paper introduces the Python package PyCID, built upon pgmpy, that implements (causal) influence diagrams, a widely used graphical modelling framework for decision-making problems.

2021 | Article
CLAIMED, a visual and scalable component library for Trusted AI
Romeo Kienzler, Ivan Nesic

CLAIMED is a component library for artificial intelligence, machine learning, "extract, transform, load" processes and data science. The goal is to enable low-code/no-code rapid prototyping by providing ready-made components for various business domains, supporting various computer languages, working on various data flow editors and running on diverse execution engines.

2021 | Article
Natural Language Processing with Pandas DataFrames
Frederick Reiss, Bryan Cutler, Zachary Eichenberger

Most areas of Python data science have standardized on using Pandas DataFrames for representing and manipulating structured data in memory. Natural Language Processing (NLP), not so much. We believe that Pandas has the potential to serve as a universal data structure for NLP data.

2021 | Article
MPI-parallel Molecular Dynamics Trajectory Analysis with the H5MD Format in the MDAnalysis Python Package
Edis Jakupovic, Oliver Beckstein

Molecular dynamics (MD) computer simulations help elucidate details of the molecular processes in complex biological systems, from protein dynamics to drug discovery. One major issue is that these MD simulation files are now commonly terabytes in size, which means analyzing the data from these files becomes a painstakingly expensive task.

2021 | Article
Accelerating Spectroscopic Data Processing Using Python and GPUs on NERSC Supercomputers
Daniel Margala, Laurie Stephey, Rollin Thomas, Stephen Bailey

The Dark Energy Spectroscopic Instrument (DESI) will create the most detailed 3D map of the Universe to date by measuring redshifts in light spectra of over 30 million galaxies. The extraction of 1D spectra from 2D spectrograph traces in the instrument output is one of the main computational bottlenecks of the DESI data processing pipeline, which is predominantly implemented in Python.

2021 | Article
signac: Data Management and Workflows for Computational Researchers
Bradley D. Dice, Brandon L. Butler, Vyas Ramasubramani, Alyssa Travitz, Michael M. Henry, Hardik Ojha, Kelly L. Wang, Carl S. Adorf, Eric Jankowski, Sharon C. Glotzer

The signac data management framework (https://signac.io) helps researchers execute reproducible computational studies, scales workflows from laptops to supercomputers, and emphasizes portability and fast prototyping.

2021 | Article
Modernizing computing by structural biologists with Jupyter and Colab
Blaine H. M. Mooers

Protein crystallography produces most of the protein structures used in structure-based drug design. The process of protein structure determination is computationally intensive and error-prone because many software packages are involved.

2021 | Article
Using Python for Analysis and Verification of Mixed-mode Signal Chains
Mark Thoren, Cristina Suteu

Any application involving sensitive measurements of the physical world starts with an accurate, precise, and low-noise signal chain. Modern, highly integrated data acquisition devices can often be directly connected to sensor outputs, performing analog signal conditioning, digitization, and digital filtering on a single silicon device, greatly simplifying system electronics.

2021 | Article
How PDFrw and fillable forms improves throughput at a Covid-19 Vaccine Clinic
Haw-minn Lu, José Unpingco

PDFrw was used to prepopulate Covid-19 vaccination forms to improve the efficiency and integrity of the vaccination process in terms of federal and state privacy requirements. We will describe the vaccination process from the initial appointment, through the vaccination delivery, to the creation of subsequent required documentation.

2024

Proceedings of the Python in Science Conference 2024

There are 0 articles published in this collection

2022

Proceedings of the Python in Science Conference 2022

There are 39 articles published in this collection
2022 | Article
Low Level Feature Extraction for Cilia Segmentation
Meekail Zain, Eric Miller, Shannon P Quinn, Cecilia Lo

Cilia are organelles found on the surface of some cells in the human body that sweep rhythmically to transport substances. Dysfunction of ciliary motion is often indicative of diseases known as ciliopathies, which disrupt the functionality of macroscopic structures within the lungs, kidneys and other organs (li2018composite).

2022 | Article
Enabling Active Learning Pedagogy and Insight Mining with a Grammar of Model Analysis
Zachary del Rosario

Modern engineering models are complex, with dozens of inputs, uncertainties arising from simplifying assumptions, and dense output data. While major strides have been made in the computational scalability of complex models, relatively less attention has been paid to user-friendly, reusable tools to explore and make sense of these models.

2022 | Article
Automatic random variate generation in Python
Christoph Baumgarten, Tirth Patel

The generation of random variates is an important tool that is required in many applications. Various software programs or packages contain generators for standard distributions like the normal, exponential or Gamma, e.
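As background, the simplest way to generate variates from a standard distribution is inversion: draw U uniformly on [0, 1) and apply the inverse CDF to it. The sketch below does this for the exponential distribution, whose inverse CDF has a closed form. Automatic generators of the kind the paper discusses aim to handle distributions where no such closed form exists; that machinery is not shown here, and the function name is hypothetical.

```python
import numpy as np


def exponential_by_inversion(rate, size, rng):
    """Hypothetical helper: draw Exponential(rate) variates by inverting F(x) = 1 - exp(-rate * x)."""
    u = rng.random(size)             # U ~ Uniform[0, 1)
    return -np.log1p(-u) / rate      # x = F^{-1}(u) = -ln(1 - u) / rate


rng = np.random.default_rng(42)
x = exponential_by_inversion(rate=2.0, size=100_000, rng=rng)
print(x.mean())  # should be close to 1 / rate = 0.5
```

`np.log1p(-u)` is used instead of `np.log(1 - u)` for better accuracy when `u` is tiny; since `u < 1`, the logarithm is always finite.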

2022 | Article
atoMEC: An open-source average-atom Python code
Timothy J. Callow, Daniel Kotik, Eli Kraisler, Attila Cangi

Average-atom models are an important tool in studying matter under extreme conditions, such as those conditions experienced in planetary cores, brown and white dwarfs, and during inertial confinement fusion.

2022 | Article
Monaco: A Monte Carlo Library for Performing Uncertainty and Sensitivity Analyses
W. Scott Shambaugh

This paper introduces monaco, a Python library for conducting Monte Carlo simulations of computational models and performing uncertainty analysis (UA) and sensitivity analysis (SA) on the results. UA and SA are critical to effective and responsible use of models in science, engineering, and public policy; however, their use is uncommon.
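A bare-bones illustration of the Monte Carlo workflow described above: sample the uncertain inputs, run the model once per draw, summarize the spread of the output (UA), and rank inputs by a crude sensitivity measure (SA). The model, the input distributions, and the function names are hypothetical, and this is plain NumPy rather than monaco's API. Correlation coefficients are only a rough sensitivity indicator compared with variance-based methods.

```python
import numpy as np


def model(length, load):
    """Hypothetical model used only for illustration: output grows with load * length**3."""
    return load * length**3


def monte_carlo_ua(n_draws, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: sample each uncertain input from an assumed distribution.
    length = rng.normal(loc=2.0, scale=0.05, size=n_draws)
    load = rng.uniform(low=90.0, high=110.0, size=n_draws)
    # Step 2: run the model once per draw (vectorized here).
    out = model(length, load)
    # Step 3 (UA): summarize the output with a median and a 95% interval.
    lo, med, hi = np.percentile(out, [2.5, 50, 97.5])
    # Step 4 (SA): crude sensitivity via input/output correlation.
    sensitivity = {
        "length": np.corrcoef(length, out)[0, 1],
        "load": np.corrcoef(load, out)[0, 1],
    }
    return out, (lo, med, hi), sensitivity


out, (lo, med, hi), sens = monte_carlo_ua(10_000)
print(f"median {med:.1f}, 95% interval [{lo:.1f}, {hi:.1f}]", sens)
```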

2022 | Article
A Python Pipeline for Rapid Application Development (RAD)
Scott D. Christensen, Marvin S. Brown, Robert B. Haehnel, Joshua Q. Church, Amanda Catlett, Dallon C. Schofield, Quyen T. Brannon, Stacy T. Smith

Rapid Application Development (RAD) is the ability to rapidly prototype an interactive interface through frequent feedback, so that it can be quickly deployed and delivered to stakeholders and customers.

2022 | Article
Variational Autoencoders For Semi-Supervised Deep Metric Learning
Nathan Safir, Meekail Zain, Curtis Godwin, Eric Miller, Bella Humphrey, Shannon P Quinn

Deep metric learning (DML) methods generally do not incorporate unlabelled data. We propose borrowing components of the variational autoencoder (VAE) methodology to extend DML methods to train on semi-supervised datasets.

2022 | Article
Wailord: Parsers and Reproducibility for Quantum Chemistry
Rohit Goswami

Data driven advances dominate the applied sciences landscape, with quantum chemistry being no exception to the rule. Dataset biases and human error are key bottlenecks in the development of reproducible and generalized insights.

2022 | Article
RocketPy: Combining Open-Source and Scientific Libraries to Make the Space Sector More Modern and Accessible
João Lemes Gribel Soares, Mateus Stano Junqueira, Oscar Mauricio Prada Ramirez, Patrick Sampaio dos Santos Brandão, Adriano Augusto Antongiovanni, Guilherme Fernandes Alves, Giovani Hidalgo Ceotto

In recent years, the space sector has seen exponential growth, with new companies emerging in it. On top of that, more people are eager to participate in the aerospace revolution, which motivates students and hobbyists to build more high-powered and sounding rockets.

2022 | Article
Improving PyDDA's atmospheric wind retrievals using automatic differentiation and Augmented Lagrangian methods
Robert Jackson, Rebecca Gjini, Sri Hari Krishna Narayanan, Matt Menickelly, Paul Hovland, Jan Hückelheim, Scott Collis

2022 | Article
pyDAMPF: a Python package for modeling mechanical properties of hygroscopic materials under interaction with a nanoprobe
Willy Menacho, Gonzalo Marcelo Ramírez-Ávila, Horacio V. Guzman

pyDAMPF is a tool oriented to the Atomic Force Microscopy (AFM) community, which allows the simulation of the physical properties of materials under variable relative humidity (RH). In particular, pyDAMPF is mainly focused on the mechanical properties of polymeric hygroscopic nanofibers that play an essential role in designing tissue scaffolds for implants and filtering devices.

2022 | Article
popmon: Analysis Package for Dataset Shift Detection
Simon Brugman, Tomas Sostak, Pradyot Patil, Max Baak

popmon is an open-source Python package to check the stability of a tabular dataset. popmon creates histograms of features binned in time-slices, and compares the stability of its profiles and distributions using statistical tests, both over time and with respect to a reference dataset.
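The idea of comparing binned feature profiles per time slice against a reference can be sketched with NumPy. The population stability index (PSI) used here is a common drift metric chosen for illustration; it is an assumption, not necessarily one of the statistical tests popmon applies, and none of the names below are popmon's API.

```python
import numpy as np


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """Hypothetical helper: PSI between two samples, using bins fixed on the reference."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Clip so values outside the reference range land in the outer bins instead of being dropped.
    current = np.clip(current, edges[0], edges[-1])
    ref_counts, _ = np.histogram(reference, bins=edges)
    cur_counts, _ = np.histogram(current, bins=edges)
    # Convert to proportions; eps avoids log(0) for empty bins.
    ref = ref_counts / ref_counts.sum() + eps
    cur = cur_counts / cur_counts.sum() + eps
    return float(np.sum((cur - ref) * np.log(cur / ref)))


def psi_per_slice(reference, slices, bins=10):
    """Score each time slice against the same reference distribution."""
    return [population_stability_index(reference, s, bins) for s in slices]


rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, 20_000)
slices = [rng.normal(0.0, 1.0, 5_000), rng.normal(0.5, 1.0, 5_000)]  # second slice has drifted
scores = psi_per_slice(reference, slices)
print(scores)  # the drifted slice should score noticeably higher
```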

2022 | Article
Experience report of physics-informed neural networks in fluid simulations: pitfalls and frustration
Pi-Yueh Chuang, Lorena A. Barba

Though PINNs (physics-informed neural networks) are now deemed a complement to traditional CFD (computational fluid dynamics) solvers rather than a replacement, their ability to solve the Navier-Stokes equations without given data is still of great interest.

2022 | Article
The Geoscience Community Analysis Toolkit: An Open Development, Community Driven Toolkit in the Scientific Python Ecosystem
Orhan Eroglu, Anissa Zacharias, Michaela Sizemore, Alea Kootz, Heather Craker, John Clyne

The Geoscience Community Analysis Toolkit (GeoCAT) team develops and maintains data analysis and visualization tools on structured and unstructured grids for the geosciences community in the Scientific Python Ecosystem (SPE).

2022 | Article
Design of a Scientific Data Analysis Support Platform
Nathan Martindale, Jason Hite, Scott Stewart, Mark Adams

Software data analytic workflows are a critical aspect of modern scientific research and play a crucial role in testing scientific hypotheses. A typical scientific data analysis life cycle in a research project must include several steps that may not be fundamental to testing the hypothesis, but are essential for reproducibility.

2022 | Article
Temporal Word Embeddings Analysis for Disease Prevention
Nathan Jacobi, Ivan Mo, Albert You, Krishi Kishore, Zane Page, Shannon P. Quinn, Tim Heckman

Human languages' semantics and structure constantly change over time through mediums such as culturally significant events. By viewing the semantic changes of words during notable events, contexts of existing and novel words can be predicted for similar, current events.

2022 | Article
Global optimization software library for research and education
Nadia Udler

Machine learning models are often represented by functions given by computer programs. Optimization of such functions is a challenging task because traditional derivative based optimization methods with guaranteed convergence properties cannot be used.

2022 | Article
Phylogeography: Analysis of genetic and climatic data of SARS-CoV-2
Aleksandr Koshkarov, Wanlin Li, My-Linh Luu, Nadia Tahiri

As the SARS-CoV-2 pandemic reaches its peak, researchers around the globe are combining efforts to investigate the genetics of different variants to better deal with its spread.

2022 | Article
Search for Extraterrestrial Intelligence: GPU Accelerated TurboSETI
Luigi Cruz, Wael Farah, Richard Elkins

A common technique adopted by the Search For Extraterrestrial Intelligence (SETI) community is monitoring electromagnetic radiation for signs of extraterrestrial technosignatures using ground-based radio observatories.

2022 | Article
pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling
Jyotika Singh

pyAudioProcessing is a Python based library for processing audio data, constructing and extracting numerical features from audio, building and testing machine learning models, and classifying data with existing pre-trained audio classification models or custom user-built models.

2022 | Article
A New Python API for Webots Robotics Simulations
Justin C. Fisher

Webots is a popular open-source package for 3D robotics simulations. It can also be used as a 3D interactive environment for other physics-based modeling, virtual reality, teaching or games. Webots has provided a simple API allowing Python programs to control robots and/or the simulated world, but this API is inefficient and does not provide many "pythonic" conveniences.

2022 | Article
poliastro: a Python library for interactive astrodynamics
Juan Luis Cano Rodríguez, Jorge Martínez Garrido

Space is more popular than ever, with the growing public awareness of interplanetary scientific missions, as well as the increasingly large number of satellite companies planning to deploy satellite constellations.

2022 | Article
Likeness: a toolkit for connecting the social fabric of place to human dynamics
Joseph V. Tuccillo, James D. Gaboardi

The ability to produce richly-attributed synthetic populations is key for understanding human dynamics, responding to emergencies, and preparing for future events, all while protecting individual privacy.

2022 | Article
Keeping your Jupyter notebook code quality bar high (and production ready) with Ploomber
Ido Michael

This paper walks through an accompanying interactive tutorial. We highly recommend running it interactively, which makes it easier to follow along and see the results in real time. The tutorial includes a Binder link, so you can launch it instantly.

2022 | Article
Awkward Packaging: building Scikit-HEP
Henry Schreiner, Jim Pivarski, Eduardo Rodrigues

Scikit-HEP has grown rapidly over the last few years, not just to serve the needs of the High Energy Physics (HEP) community, but in many ways, the Python ecosystem at large. AwkwardArray, boost-histogram/hist, and iminuit are examples of libraries that are used beyond the original HEP focus.

2022 | Article
Incorporating Task-Agnostic Information in Task-Based Active Learning Using a Variational Autoencoder
Curtis Godwin, Meekail Zain, Nathan Safir, Bella Humphrey, Shannon P Quinn

It is often much easier and less expensive to collect data than to label it. Active learning (AL) (settles2009active) responds to this issue by selecting which unlabeled data are best to label next. Standard approaches utilize task-aware AL, which identifies informative samples based on a trained supervised model.

2022 | Article
Codebraid Preview for VS Code: Pandoc Markdown Preview with Jupyter Kernels
Geoffrey M. Poore

Codebraid Preview is a VS Code extension that provides a live preview of Pandoc Markdown documents with optional support for executing embedded code. Unlike typical Markdown previews, all Pandoc features are fully supported because Pandoc itself generates the preview.

2022 | Article
Pylira: deconvolution of images in the presence of Poisson noise
Axel Donath, Aneta Siemiginowska, Vinay Kashyap, Douglas Burke, Karthik Reddy Solipuram, David van Dyk

All physical and astronomical imaging observations are degraded by the finite angular resolution of the camera and telescope systems. The recovery of the true image is limited by both how well the instrument characteristics are known and by the magnitude of measurement noise.

2022 | Article
Python vs. the pandemic: a case study in high-stakes software development
Cliff C. Kerr, Robyn M. Stuart, Dina Mistry, Romesh G. Abeysuriya, Jamie A. Cohen, Lauren George, Michał Jastrzebski, Michael Famulare, Edward Wenger, Daniel J. Klein

When it became clear in early 2020 that COVID-19 was going to be a major public health threat, politicians and public health officials turned to academic disease modelers like us for urgent guidance. Academic software development is typically a slow and haphazard process, and we realized that business-as-usual would not suffice for dealing with this crisis.

2022 | Article
Bayesian Estimation and Forecasting of Time Series in statsmodels
Chad Fulton

Statsmodels, a Python library for statistical and econometric analysis, has traditionally focused on frequentist inference, including in its models for time series data. This paper introduces the powerful features for Bayesian inference of time series models that exist in statsmodels, with applications to model fitting, forecasting, time series decomposition, data simulation, and impulse response functions.

2022 | Article
USACE Coastal Engineering Toolkit and a Method of Creating a Web-Based Application
Amanda Catlett, Theresa R. Coumbe, Scott D. Christensen, Mary A. Byrant

In the early 1990s, the Automated Coastal Engineering Systems (ACES) was created with the goal of providing state-of-the-art computer-based tools to increase the accuracy, reliability, and cost-effectiveness of Corps coastal engineering endeavors.

2022 | Article
Papyri: better documentation for the scientific ecosystem in Jupyter
Matthias Bussonnier, Camille Carvalho

We present here the idea behind Papyri, a framework we are developing to provide a better documentation experience for the scientific ecosystem. In particular, we wish to provide a documentation browser (from within Jupyter or other IDEs and Python editors) that gives a unified experience with cross-library navigation, search, and indexing.

2022 | Article
Python for Global Applications: teaching scientific Python in context to law and diplomacy students
Anna Haensch, Karin Knudson

For students across domains and disciplines, the message has been communicated loud and clear: data skills are an essential qualification for today’s job market. This includes not only the traditional introductory stats coursework but also machine learning, artificial intelligence, and programming in Python or R.

2022 | Article
The myth of the normal curve and what to do about it
Allan Campopiano

This paper gives an overview of the issues associated with the normal curve. The concern with traditional methods, in terms of robustness to violations of normality, has been known for over half a century, and modern alternatives have been recommended; however, for various reasons that have been discussed, modern robust methods have not yet become commonplace in applied research settings.
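One robust estimator often recommended as an alternative to the ordinary mean is the 20% trimmed mean, which discards the smallest and largest 20% of observations before averaging. A minimal sketch, assuming NumPy; the function name is hypothetical and the data are made up purely to show how a couple of outliers pull the ordinary mean.

```python
import numpy as np


def trimmed_mean(x, proportion=0.2):
    """Hypothetical helper: mean after removing floor(proportion * n) values from each tail."""
    x = np.sort(np.asarray(x, dtype=float))
    g = int(np.floor(proportion * x.size))  # number of values trimmed from each end
    return x[g:x.size - g].mean()


data = [2, 3, 3, 4, 4, 5, 5, 6, 40, 60]  # illustrative sample with two large outliers
print(np.mean(data))       # ordinary mean, dragged upward by 40 and 60
print(trimmed_mean(data))  # averages only the middle 60% of the sorted values
```

With ten values and 20% trimming, two values are dropped from each end, leaving 3, 4, 4, 5, 5, 6, whose mean is 4.5, whereas the ordinary mean is 13.2.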

2022 | Article
A Novel Pipeline for Cell Instance Segmentation, Tracking and Motility Classification of Toxoplasma Gondii in 3D Space
Seyed Alireza Vaezi, Gianni Orlando, Mojtaba Fazli, Gary Ward, Silvia Moreno, Shannon Quinn

Toxoplasma gondii is the parasitic protozoan that causes disseminated toxoplasmosis, a disease that is estimated to infect around one-third of the world's population. While the disease is commonly asymptomatic, the success of the parasite is in large part due to its ability to easily spread through nucleated cells.

2022 | Article
Utilizing SciPy and other open source packages to provide a powerful API for materials manipulation in the Schrödinger Materials Suite
Alexandr Fonari, Farshad Fallah, Michael Rauch

The use of several open source scientific packages in the Schrödinger Materials Science Suite will be discussed. A typical workflow for materials discovery will be described, discussing how open source packages have been incorporated at every stage.

2022 | Article
Galyleo: A General-Purpose Extensible Visualization Solution
Rick McGeer, Andreas Bergen, Mahdiyar Biazi, Matt Hemmings, Robin Schreiber

Galyleo is an open-source, extensible dashboarding solution integrated with JupyterLab (jupyterlab). Galyleo is a standalone web application integrated as an iframe (lawson2011introducing) into a JupyterLab tab.

2022 | Article
Semi-Supervised Semantic Annotator (S3A): Toward Efficient Semantic Labeling
Nathan Jessurun, Daniel E. Capecci, Olivia P. Dizon-Paradis, Damon L. Woodard, Navid Asadizanjani

Most semantic image annotation platforms suffer severe bottlenecks when handling large images, complex regions of interest, or numerous distinct foreground regions in a single image. We have developed the Semi-Supervised Semantic Annotator (S3A) to address each of these issues and facilitate rapid collection of ground truth pixel-level labeled data.

2022 | Article
The Advanced Scientific Data Format (ASDF): An Update
Perry Greenfield, Edward Slavich, William Jamieson, Nadia Dencheva

We report on progress in developing and extending the Advanced Scientific Data Format (ASDF), which we developed for the data from the James Webb and Nancy Grace Roman Space Telescopes, since we reported on it at a previous SciPy conference.

2023

Proceedings of the Python in Science Conference 2023

There are 19 articles published in this collection
2023 | Article
Python Array API Standard: Toward Array Interoperability in the Scientific Python Ecosystem
Aaron Meurer, Athan Reines, Ralf Gommers, Yao-Lung L. Fang, John Kirkham, Matthew Barber, Stephan Hoyer, Andreas Müller, Sheng Zha, Saul Shanabrook, Stephannie Jiménez Gacha, Mario Lezcano-Casado, Thomas J. Fan, Tyler Reddy, Alexandre Passos, Hyukjin Kwon, Travis Oliphant, Consortium for Python Data API Standards

The Python array API standard specifies standardized application programming interfaces and behaviors for array and tensor objects and operations. The establishment and subsequent adoption of the standard aims to reduce ecosystem fragmentation and facilitate array library interoperability.

2023 | Article
A Modified Strassen Algorithm to Accelerate Numpy Large Matrix Multiplication with Integer Entries
Anthony Breitzman

We present a Strassen-type algorithm for multiplying large matrices with integer entries. The algorithm is the standard Strassen divide-and-conquer algorithm, but it crosses over to NumPy when either the row or column dimension of one of the matrices drops below 128.
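The crossover idea can be sketched as the textbook Strassen recursion that hands off to NumPy's `@` once any dimension drops below a threshold. This is a minimal sketch, not the author's modified algorithm: the zero-padding of odd dimensions and the names `strassen` and `CROSSOVER` are assumptions made here so that the quadrants always split evenly.

```python
import numpy as np

CROSSOVER = 128  # below this row/column size, fall back to NumPy's matmul


def strassen(A, B, cutoff=CROSSOVER):
    """Multiply integer matrices A (n x m) and B (m x p) with Strassen's recursion."""
    n, m = A.shape
    m_b, p = B.shape
    if m != m_b:
        raise ValueError("inner dimensions do not match")
    # Crossover: once any dimension is small, recursing further no longer pays off.
    if min(n, m, p) < cutoff:
        return A @ B

    # Zero-pad every dimension to an even size so the matrices split into equal quadrants.
    n2, m2, p2 = n + n % 2, m + m % 2, p + p % 2
    Ap = np.zeros((n2, m2), dtype=A.dtype)
    Ap[:n, :m] = A
    Bp = np.zeros((m2, p2), dtype=B.dtype)
    Bp[:m, :p] = B
    h, k, w = n2 // 2, m2 // 2, p2 // 2

    A11, A12, A21, A22 = Ap[:h, :k], Ap[:h, k:], Ap[h:, :k], Ap[h:, k:]
    B11, B12, B21, B22 = Bp[:k, :w], Bp[:k, w:], Bp[k:, :w], Bp[k:, w:]

    # Seven recursive products instead of the naive eight.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    # Recombine the products into the four quadrants of the result.
    C = np.empty((n2, p2), dtype=M1.dtype)
    C[:h, :w] = M1 + M4 - M5 + M7
    C[:h, w:] = M3 + M5
    C[h:, :w] = M2 + M4
    C[h:, w:] = M1 - M2 + M3 + M6
    # Drop the padding rows and columns.
    return C[:n, :p]
```

The 128 threshold is the value quoted in the abstract; the best crossover point in practice depends on the hardware and matrix sizes, so it is exposed here as the `cutoff` parameter.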

2023 | Article
An Accessible Python based Author Identification Process
Anthony Breitzman

Author identification, also known as ‘author attribution’ and more recently ‘forensic linguistics’, involves identifying the true authors of anonymous texts. In this paper we replicate the analysis but in a much more accessible way using modern text mining methods and Python.

2023 | Article
Biomolecular Crystallographic Computing with Jupyter
Blaine H. M. Mooers

To further advance this use of Jupyter, we developed a collection of code fragments that use the vast Computational Crystallography Toolbox (cctbx) library for novel analyses. We made versions of this library for use in JupyterLab and Colab.

2023 | Article
Bayesian Statistics with Python, No Resampling Necessary
Charles Lindsey

TensorFlow Probability is a powerful library for statistical analysis in Python. Using TensorFlow Probability’s implementation of Bayesian methods, modelers can incorporate prior information and obtain parameter estimates and a quantified degree of belief in the results.

2023 | Article
Using Numba for GPU acceleration of Neutron Beamline Digital Twins
Coleman J. Kendrick, Jiao Y. Y. Lin, Garrett E. Granroth

Digital twins of neutron instruments using Monte Carlo ray tracing have proven to be useful in neutron data analysis and verifying instrument and sample designs. In this paper, we present a GPU accelerated version of MCViNE using Python and Numba to balance user extensibility with performance.

2023 | Article
EEG-to-fMRI Neuroimaging Cross Modal Synthesis in Python
David Calhas

Electroencephalography and functional magnetic resonance imaging are two ways of recording brain activity. We developed a Python package, EEG-to-fMRI, which provides cross-modal neuroimaging synthesis functionality.

2023 | Article
vak: a neural network framework for researchers studying animal acoustic communication
David Nicholson, Yarden Cohen

The study of acoustic communication is being revolutionized by deep neural network models. To address this need, we developed vak, a neural network framework designed for acoustic communication researchers.

2023 | Article
Emukit: A Python toolkit for decision making under uncertainty
Andrei Paleyes, Maren Mahsereci, Neil D. Lawrence

Emukit is a highly flexible Python toolkit for enriching decision making under uncertainty with statistical emulation. It is particularly pertinent to complex processes and simulations where data are scarce or difficult to acquire.

2023 | Article
Using Blosc2 NDim As A Fast Explorer Of The Milky Way (Or Any Other NDim Dataset)
Project Blosc, Francesc Alted, Marta Iborra, Oscar Guiñón, David Ibáñez, Sergio Barrachina

Large multidimensional datasets are widely used in various engineering and scientific applications. We have added support for large dimensional datasets to Blosc2, a compression and format library.

2023 | Article
MDAKits: A Framework for FAIR-Compliant Molecular Simulation Analysis
Irfan Alibay, Lily Wang, Fiona Naughton, Ian Kenney, Jonathan Barnoud, Richard J Gowers, Oliver Beckstein

The reproducibility and transparency of scientific findings are widely recognized as crucial for promoting scientific progress. The MDAKits framework provides a cookiecutter template, best practices documentation, and a continually validated registry.

2023 | Article
The Pandata Scalable Open-Source Analysis Stack
James A. Bednar, Martin Durant

As the scale of scientific data analysis continues to grow, traditional domain-specific tools often struggle with data of increasing size and complexity. We introduce the Pandata open-source software stack as a solution, emphasizing the use of domain-independent tools at critical stages of the data life cycle, without compromising the depth of domain-specific analyses.

2023 | Article
Spatial Microsimulation and Activity Allocation in Python: An Update on the Likeness Toolkit
Joseph V. Tuccillo, James D. Gaboardi

Understanding human security and social equity issues within human systems requires large-scale models of population dynamics that simulate high-fidelity representations of individuals and access to essential activities (work/school, social, errands, health). Likeness is a Python toolkit for spatial microsimulation.

2023 | Article
itk-elastix: Medical image registration in Python
Konstantinos Ntatsis, Niels Dekker, Viktor van der Valk, Tom Birdsong, Dženan Zukić, Stefan Klein, Marius Staring, Matthew McCormick

Image registration plays a vital role in understanding changes that occur in 2D and 3D scientific imaging datasets. In this paper, we introduce itk-elastix, a user-friendly Python wrapping of the mature elastix registration toolbox.

2023 | Article
PyQtGraph - High Performance Visualization for All Platforms
Ognyan Moore, Nathan Jessurun, Martin Chase, Nils Nemitz, Luke Campagnola

PyQtGraph is a plotting library with high performance, cross-platform support and interactivity as its primary objectives. These goals are achieved by connecting the Qt GUI framework and the scientific Python ecosystem.

2023 | Article
Pandera: Going Beyond Pandas Data Validation
Niels Bantilan

Data quality remains a core concern for practitioners in machine learning, data science, and data engineering, and many specialized packages have emerged to fulfill the need for validating and monitoring data and models. This paper outlines the motivation behind pandera and the challenges involved in taking it from a pandas-only data validation framework to one that is extensible to other dataframe-like libraries that are not pandas-compliant.

2023 | Article
libyt: a Tool for Parallel In Situ Analysis with yt
Shin-Rong Tsai, Hsi-Yu Schive, Matthew J. Turk

In the era of exascale computing, the storage and analysis of large-scale data have become both more important and more difficult. We present libyt, an open-source C++ library that allows researchers to analyze and visualize data in parallel during simulation runtime using yt or other Python packages.

2023 | Article
Data Reduction Network
Haoyin Xu, Haw-minn Lu, José Unpingco

Multidimensional categorical data is widespread but not easily visualized using standard methods. For example, questionnaire data generally consists of questions with categorical responses. Popular methods of handling categorical data include one-hot encoding and enumeration; the latter imposes an unwarranted and potentially misleading notional order on the data. To address this, we introduce a novel visualization method named Data Reduction Network.
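To make the point about enumeration concrete, the sketch below compares pairwise Euclidean distances between nominal categories under integer enumeration and under one-hot encoding. It is a general illustration, not the Data Reduction Network method itself, and the three colour categories are hypothetical stand-ins for unordered questionnaire responses.

```python
import math
from itertools import combinations

# Hypothetical nominal responses with no natural order between them.
categories = ["red", "green", "blue"]

def enumerate_encode(value, cats):
    """Enumeration: map a category to its integer index (imposes an order)."""
    return [cats.index(value)]

def one_hot_encode(value, cats):
    """One-hot: map a category to a binary indicator vector (no order)."""
    return [1 if c == value else 0 for c in cats]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_distances(encode, cats):
    """Distance between every unordered pair of categories under `encode`."""
    return {
        (a, b): euclidean(encode(a, cats), encode(b, cats))
        for a, b in combinations(cats, 2)
    }

enum_d = pairwise_distances(enumerate_encode, categories)
onehot_d = pairwise_distances(one_hot_encode, categories)

for pair in enum_d:
    print(pair, "enumeration:", enum_d[pair], "one-hot:", round(onehot_d[pair], 3))
```

Under enumeration, "red" ends up twice as far from "blue" as from "green" purely because of the arbitrary index assignment, whereas one-hot encoding places every pair at the same distance (the square root of 2).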

2023 | Article
aPhyloGeo-Covid: A Web Interface for Reproducible Phylogeographic Analysis of SARS-CoV-2 Variation using Neo4j and Snakemake
Wanlin Li, Nadia Tahiri

Gene sequencing data, along with the associated lineage tracing and research data generated throughout the Coronavirus disease 2019 (COVID-19) pandemic, constitute invaluable resources for phylogeography research. To make the best use of these resources, we have developed an interactive analysis platform called aPhyloGeo-Covid.

Articles

A collection of research articles

There are 0 articles published in this collection