
Posters and Slides
Accepted Paper Slides
It’s Time for the Atmospheric Science Community to ACT Together
The Atmospheric data Community Toolkit (ACT) is an open-source Python library for working with n-dimensional atmospheric time-series datasets. ACT contains functions for every aspect of the research lifecycle.
Adam Theisen
Adopting static typing in scientific projects
Are you interested in adding typing to your existing codebase, but are not sure how to get started? Are you worried about managing the typing process without pausing your project’s development?
In this talk, we’ll embrace the fact that a large project’s transition toward typing will likely happen over the course of many months, concurrently with ongoing development. However, that doesn’t mean that getting started with typing has to be difficult! We’ll share with you two examples of adopting typing in existing open-source codebases (100k and 40k lines of Python). We’ll particularly focus on the typing experience from the perspective of project maintainers, contributors, and users of these Python libraries.
We will discuss useful tools and strategies, surprising difficulties, the types of bugs and errors we found, and how the addition of typing changes the overall development experience. By the end of this talk, you’ll be able to confidently manage the migration toward typing in your own codebase.
Predrag Gruevski, Colin Carroll
cuCIM - A GPU image I/O and processing library
A presentation introducing RAPIDS cuCIM, a library for image I/O and processing on GPUs
Gregory R. Lee, Gigon Bae, Benjamin Zaitlen, John Kirkham, Rahul Choudhury
Distributed statistical inference with pyhf powered by funcX
In high energy physics (HEP), a core component of the analysis of data collected at the Large Hadron Collider is performing statistical inference for binned models to extract physics information. The statistical fitting tools used in HEP have traditionally been implemented in C++, but in recent years pyhf, a pure-Python library with automatic differentiation and hardware acceleration, has grown in use for analysis-related statistical inference problems. The fitting of multiple different hypotheses for new physics signatures (signals) is a computational problem that lends itself easily to parallelization, but it is hampered in HPC environments by the additional tooling overhead required, which can be very difficult to master. Through use of funcX, a pure-Python high-performance function serving system designed to orchestrate scientific workloads across heterogeneous computing resources, pyhf can be used as a highly scalable (fitting) function as a service (FaaS) on HPC systems.
Matthew Feickert
Accepted Posters
Towards a Scientific Workflow Description: a yt Project Prototype for Interdisciplinary Analysis
A scientific workflow description offers an alternative to the cognitive overhead of learning a new software package and to the imperative programming paradigms often used with Python. This description is encoded in a JSON schema, accessed by the user through a configuration file, and run using Python modules that attach the configuration file to the code which produces output. We use yt, a computational astrophysics tool, to demonstrate how domain-specific software can operate within a descriptive framework.
Samantha Walkow, Dr. Chris Havlin, Dr. Matthew Turk, Dr. Corentin Cadiou
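As a rough illustration of the descriptive approach in the abstract above, the sketch below reads a JSON configuration and dispatches it to Python code. The `OPERATIONS` registry, the config keys (`operation`, `params`), and the `run_workflow` helper are hypothetical; they are not part of yt or of the authors' schema.

```python
import json


# Hypothetical operations that a workflow description may name.
def mean(values):
    return sum(values) / len(values)


def value_range(values):
    return max(values) - min(values)


# Hypothetical registry mapping operation names to Python functions.
OPERATIONS = {"mean": mean, "range": value_range}


def run_workflow(config_text):
    """Parse a JSON workflow description and run the operation it names."""
    config = json.loads(config_text)
    name = config["operation"]
    if name not in OPERATIONS:
        raise ValueError(f"unknown operation: {name!r}")
    # Parameters from the config are passed as keyword arguments.
    return OPERATIONS[name](**config.get("params", {}))


# The user states what to compute; no imperative code is required.
CONFIG = '{"operation": "mean", "params": {"values": [1.0, 2.0, 6.0]}}'
result = run_workflow(CONFIG)
print(result)
```

A real schema would also validate parameter names and types before dispatch; this sketch only checks that the operation name is known.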
Using Python for Analysis and Verification of Mixed-mode Signal Chains for Analog Signal Acquisition
Accurate, precise, and low-noise sensor measurements are essential before any machine can learn about (or artificial-intelligently make decisions about) the physical world. Modern, highly integrated signal acquisition devices can perform analog signal conditioning, digitization, and digital filtering on a single silicon device, greatly simplifying system electronics. However, a complete understanding of the signal chain properties is still required to correctly utilize and debug these modern devices.
Mark Thoren, Cristina Suteu
Speeding Up Molecular Dynamics Trajectory Analysis with MPI Parallelization
Edis Jakupovic, Oliver Beckstein
Social Media Analysis using Natural Language Processing Techniques
Social media is used every day for viewing and posting content, which in turn influences people around the world in a variety of ways. Social media platforms such as YouTube see a great deal of daily activity in the form of video posting, watching, and commenting. While we can open the YouTube app on our phones and look at videos and what people are commenting, this gives us only a limited view of the kinds of things others around us care about and of what is trending among other consumers of our favorite topics or videos. Crawling some of this raw data and analyzing it with Natural Language Processing (NLP) can be tricky given the different styles of language people use today. This effort highlights YouTube's open Data API and how to use it in Python to get the raw data, data cleaning using NLP tricks and Machine Learning in Python for social media interactions, and automated extraction of trends and key influential factors from this data using pyYouTubeAnalysis.
Jyotika Singh
Programmatically Identifying Cognitive Biases Present in Software Development
Mitigating bias in AI-enabled systems is a topic of great concern within the research community. We began developing an approach to identify a subset of cognitive biases that may be present in development artifacts (e.g., version control commit messages): anchoring bias, availability bias, confirmation bias, and hyperbolic discounting. We developed multiple natural language processing (NLP) models to identify and classify the presence of bias in text originating from software development artifacts.
Amanda E. Kraft, Matthew Widjaja, Trevor M. Sands, Brad J. Galego
Visualize 3D scientific data in a Pythonic way like matplotlib
Do you want to visualize 3D scientific data in a Pythonic way like matplotlib? If so, this poster is for you: it introduces PyVista.
Tetsuo Koyama
causal-curve: tools to perform causal inference given a continuous treatment
There are a multitude of scenarios in both research and industry where it would be useful to evaluate the impact of a continuous “treatment” on an outcome of interest in a causal inference framework. Unfortunately, we are not aware of an established Python package that is able to perform this. The causal-curve package attempts to fill that gap, providing users with tools to generate causal dose-response curves (AKA causal curves).
Roni Kobrosly
SciPy 2021: An Accurate Implementation of the Studentized Range Distribution for Python
As data becomes more and more accessible, it can be tempting to misuse data analysis techniques to find statistically significant results, a practice known as ‘p-hacking’. Tukey’s HSD (Honestly Significant Difference) test is one of several tests that guard against this practice by using the studentized range distribution to compute p-values that account for the number of comparisons performed. Implementations of Tukey’s HSD already exist within the scientific Python ecosystem, but they rely on approximations of the studentized range distribution that may not behave well outside of their intended range and, even within the intended range, are only accurate to a few digits. In this document, we present a fast, highly accurate, and direct implementation of the studentized range distribution for SciPy, and we demonstrate its speed and accuracy.
Samuel Wallan, Dominic Chmiel, Matt Haberland
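To illustrate why p-values need to account for the number of comparisons, the sketch below computes the family-wise error rate when every pair of groups is tested separately at a fixed significance level. It assumes independent tests, which pairwise comparisons generally are not, so the numbers are only a rough guide. It does not use the studentized range implementation described above, and the function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(n_groups):
    """Number of distinct pairs among n_groups groups."""
    return comb(n_groups, 2)


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# The error rate grows quickly as more groups are compared pairwise.
for k in (2, 5, 10):
    m = pairwise_comparisons(k)
    print(k, m, round(familywise_error_rate(m), 3))
```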
Cell Tracking in 3D using Deep Learning Segmentations
Live-cell imaging is a widely used technique to study cell migration and dynamics over time. Automated analysis of fluorescently membrane-labelled cells can be highly challenging because their irregular shape, variability in size, and dynamic movement across Z planes make them difficult to detect and track. We introduce a detailed analysis pipeline to perform segmentation with accurate shape information, combined with BTrackmate, a customized version of the popular ImageJ/Fiji software TrackMate, to perform cell tracking inside the tissue of interest. We also created an interface in Napari to visualize the tracks along a chosen view, making it possible to follow a cell along the plane of motion. We provide a detailed protocol to implement this pipeline on a new dataset, together with the required Jupyter notebooks.
Varun Kapoor, Claudia Carabana
SciPy Tools Plenaries
Awkward Array
Tools update on Awkward Array.
Jim Pivarski
SciPy Tools Plenary on Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. This presentation summarizes changes over the past year, new features, and future plans.
Elliott Sales de Andrade
NumPy – Annual Update
Presentation about the highlights and milestones of the NumPy project in 2020-2021
Inessa Pawson
SciPy Tools Plenary: Jupyter Updates
Project Jupyter creates open source software, standards, and services for interactive computing. This presentation covers recent milestones and ideas for people to contribute across the Jupyter ecosystem.
Isabela Presedo-Floyd, Matthias Bussonnier
Scientific Python Ecosystem Coordination
Planning for the Next Decade of Scientific Python: outline of first phase
K. Jarrod Millman, Stéfan van der Walt
SciPy: SciPy 2021 Tools Track
2021 updates and outlooks in SciPy
Pamphile T. Roy
SciPy Tools Plenary: scikit-image annual update
A brief update on recent improvements and future plans for scikit-image.
Gregory R. Lee
Lightning Talks
Social Media Analysis using Natural Language Processing Techniques
Demonstration of social media noise and cleaning methods, followed by trend analysis on YouTube with NLP and statistics using pyYouTubeAnalysis.
Jyotika Singh
seaborn-image: image data visualization in Python
High-level API for attractive and descriptive image visualization in Python, built on top of matplotlib.
Sarthak Jariwala