Proceedings of the 20th Python in Science Conference
SciPy 2021, Austin, Texas, July 12 - July 18
July 12, 2021
https://doi.org/10.25080/majora-1b6fd038-02b
Content License: Creative Commons Attribution 3.0 Unported (CC-BY-3.0). Credit must be given to the creator.

Posters and Slides

Accepted Paper Slides

It’s Time for the Atmospheric Science Community to ACT Together
The Atmospheric data Community Toolkit (ACT) is an open-source Python library for working with n-dimensional atmospheric time-series datasets. ACT contains functions for every aspect of the research lifecycle.
Adam Theisen
https://doi.org/10.25080/majora-1b6fd038-020

Adopting static typing in scientific projects
Are you interested in adding typing to your existing codebase, but are not sure how to get started? Are you worried about managing the typing process without pausing your project's development? In this talk, we'll embrace the fact that a large project's transition toward typing will likely happen over the course of many months, concurrently with ongoing development. However, that doesn't mean that getting started with typing has to be difficult! We'll share with you two examples of adopting typing in existing open-source codebases (100k and 40k lines of Python). We'll particularly focus on the typing experience from the perspective of project maintainers, contributors, and users of these Python libraries. We will discuss useful tools and strategies, surprising difficulties, the types of bugs and errors we found, and how the addition of typing changes the overall development experience.
By the end of this talk, you'll be able to confidently manage the migration toward typing in your own codebase.
Predrag Gruevski, Colin Carroll
https://doi.org/10.25080/majora-1b6fd038-021

cuCIM - A GPU image I/O and processing library
A presentation introducing RAPIDS cuCIM, a library for image I/O and processing on GPUs
Gregory R. Lee, Gigon Bae, Benjamin Zaitlen, +2
https://doi.org/10.25080/majora-1b6fd038-022

Distributed statistical inference with pyhf powered by funcX
In high energy physics (HEP) a core component of analysis of data collected at the Large Hadron Collider is performing statistical inference for binned models to extract physics information. The statistical fitting tools used in HEP have traditionally been implemented in C++, but in recent years pyhf, a pure-Python library with automatic differentiation and hardware acceleration, has grown in use for analysis related statistical inference problems. The fitting of multiple different hypotheses for new physics signatures (signals) is a computational problem that lends itself easily to parallelization, but is hampered on HPC environments by the additional tooling overhead required, which can be very difficult to master.
Through use of funcX, a pure-Python high performance function serving system designed to orchestrate scientific workloads across heterogeneous computing resources, pyhf can be used as a highly scalable (fitting) function as a service (FaaS) on HPCs.
Matthew Feickert
https://doi.org/10.25080/majora-1b6fd038-023

Accepted Posters

Towards a Scientific Workflow Description: a yt Project Prototype for Interdisciplinary Analysis
Scientific workflow description provides an alternative to the cognitive overhead of learning a new software package and to the imperative programming paradigms often used with Python. This description is encoded in a JSON schema, accessed by the user through a configuration file, and run using Python modules that attach the configuration file to the code that produces output. We use yt, a computational astrophysics tool, to demonstrate how domain-specific software can operate within a descriptive framework.
Samantha Walkow, Dr. Chris Havlin, Dr. Matthew Turk, +1
https://doi.org/10.25080/majora-1b6fd038-017

Using Python for Analysis and Verification of Mixed-mode Signal Chains for Analog Signal Acquisition
Accurate, precise, and low-noise sensor measurements are essential before any machine can learn about (or artificial-intelligently make decisions about) the physical world. Modern, highly integrated signal acquisition devices can perform analog signal conditioning, digitization, and digital filtering on a single silicon device, greatly simplifying system electronics.
However, a complete understanding of the signal chain properties is still required to correctly utilize and debug these modern devices.
Mark Thoren, Cristina Suteu
https://doi.org/10.25080/majora-1b6fd038-018

Speeding Up Molecular Dynamics Trajectory Analysis with MPI Parallelization
Edis Jakupovic, Oliver Beckstein
https://doi.org/10.25080/majora-1b6fd038-019

Social Media Analysis using Natural Language Processing Techniques
Social media is widely used: people view and post content every day, which in turn influences people around the world in a variety of ways. Social media platforms such as YouTube see a lot of daily activity in the form of video posting, watching, and commenting. While we can open the YouTube app on our phones and look at videos and what people are commenting, that gives us only a limited view of the kinds of things others around us care about and what is trending among other consumers of our favorite topics or videos. Crawling some of this raw data and analyzing it with Natural Language Processing (NLP) can be tricky given the different styles of language people use today. This effort highlights YouTube's open Data API and how to use it in Python to get the raw data, data cleaning using NLP tricks and machine learning in Python for social media interactions, and the automated extraction of trends and key influential factors from this data using pyYouTubeAnalysis.
Jyotika Singh
https://doi.org/10.25080/majora-1b6fd038-01a

Programmatically Identifying Cognitive Biases Present in Software Development
Mitigating bias in AI-enabled systems is a topic of great concern within the research community.
We began developing an approach to identify a subset of cognitive biases that may be present in development artifacts (e.g., version control commit messages): anchoring bias, availability bias, confirmation bias, and hyperbolic discounting. We developed multiple natural language processing (NLP) models to identify and classify the presence of bias in text originating from software development artifacts.
Amanda E. Kraft, Matthew Widjaja, Trevor M. Sands, +1
https://doi.org/10.25080/majora-1b6fd038-01b

Visualize 3D scientific data in a Pythonic way like matplotlib
Do you want to visualize 3D scientific data in a Pythonic way like matplotlib? If so, this poster is for you. It introduces PyVista.
Tetsuo Koyama
https://doi.org/10.25080/majora-1b6fd038-01c

causal-curve: tools to perform causal inference given a continuous treatment
There are a multitude of scenarios in both research and industry where it would be useful to evaluate the impact of a continuous “treatment” on an outcome of interest in a causal inference framework. Unfortunately, we are not aware of an established Python package that is able to perform this. The `causal-curve` package attempts to fill that gap, providing users with tools to generate causal dose-response curves (AKA causal curves).
Roni Kobrosly
https://doi.org/10.25080/majora-1b6fd038-01d

SciPy 2021: An Accurate Implementation of the Studentized Range Distribution for Python
As data becomes more and more accessible, it can be tempting to misuse data analysis techniques to find statistically significant results, a practice known as 'p-hacking'.
Tukey's HSD (Honestly Significant Difference) test is one of several tests that guard against this practice by using the studentized range distribution to compute p-values that account for the number of comparisons performed. Implementations of Tukey's HSD already exist within the scientific Python ecosystem, but they rely on approximations of the studentized range distribution that may not behave well outside of their intended range and, even within the intended range, are only accurate to a few digits. In this document, we present a fast, highly accurate, and direct implementation of the studentized range distribution for SciPy, and we demonstrate its speed and accuracy.
Samuel Wallan, Dominic Chmiel, Matt Haberland
https://doi.org/10.25080/majora-1b6fd038-01e

Cell Tracking in 3D using Deep Learning Segmentations
Live-cell imaging is a widely used technique to study cell migration and dynamics over time. Automated analysis of fluorescently membrane-labelled cells can be highly challenging: their irregular shape, variability in size, and dynamic movement across Z planes make them difficult to detect and track. We introduce a detailed analysis pipeline to perform segmentation with accurate shape information, combined with BTrackmate, a customized codebase of the popular ImageJ/Fiji software Trackmate, to perform cell tracking inside the tissue of interest. We also created an interface in Napari to visualize the tracks along a chosen view, making it possible to follow a cell along the plane of motion.
We provide a detailed protocol to implement this pipeline on a new dataset, together with the required Jupyter notebooks.
Varun Kapoor, Claudia Carabana
https://doi.org/10.25080/majora-1b6fd038-01f

SciPy Tools Plenaries

Awkward Array
Tools update on Awkward Array.
Jim Pivarski
https://doi.org/10.25080/majora-1b6fd038-024

SciPy Tools Plenary on Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. This presentation summarizes changes over the past year, new features, and future plans.
Elliott Sales de Andrade
https://doi.org/10.25080/majora-1b6fd038-025

NumPy – Annual Update
Presentation about the highlights and milestones of the NumPy project in 2020-2021
Inessa Pawson
https://doi.org/10.25080/majora-1b6fd038-026

SciPy Tools Plenary: Jupyter Updates
Project Jupyter creates open source software, standards, and services for interactive computing. This presentation covers recent milestones and ideas for people to contribute across the Jupyter ecosystem.
Isabela Presedo-Floyd, Matthias Bussonnier
https://doi.org/10.25080/majora-1b6fd038-027

Scientific Python Ecosystem Coordination
Planning for the Next Decade of Scientific Python: outline of first phase
K. Jarrod Millman, Stéfan van der Walt
https://doi.org/10.25080/majora-1b6fd038-028

SciPy: SciPy 2021 Tools Track
2021 updates and outlooks in SciPy
Pamphile T. Roy
https://doi.org/10.25080/majora-1b6fd038-029

SciPy Tools Plenary: scikit-image annual update
A brief update on recent improvements and future plans for scikit-image.
Gregory R. Lee
https://doi.org/10.25080/majora-1b6fd038-02a

Lightning Talks

Social Media Analysis using Natural Language Processing Techniques
Demonstration of social media noise and cleaning methods, followed by trend analysis on YouTube with NLP and statistics using pyYouTubeAnalysis.
Jyotika Singh
https://doi.org/10.25080/majora-1b6fd038-015

seaborn-image: image data visualization in Python
High level API for attractive and descriptive image visualization in Python built on top of matplotlib
Sarthak Jariwala
https://doi.org/10.25080/majora-1b6fd038-016