Contents
Proceedings of SciPy 2021
SciPy 2021, the 20th annual Scientific Computing with Python conference, was a virtual conference held July 12-18, 2021. Twenty peer-reviewed articles were published in the conference proceedings. The full proceedings, posters and slides, and organizing committee can be found at https://
PyBMRB: Data visualization tool for BioMagResBank
The Biological Magnetic Resonance Data Bank (BioMagResBank or BMRB https://bmrb.io), founded in 1988, is the international, open archive for data generated by Nuclear Magnetic Resonance (NMR) spectroscopy of biological systems.
Kumaran Baskaran, Jonathan R Wedell, Eldon L. Ulrich, and 2 others
https://doi.org/10.25080/majora-1b6fd038-00a
Social Media Analysis using Natural Language Processing Techniques
Social media is used by many people every day to view and/or post content, which in turn influences people around the world in a variety of ways. Social media platforms such as YouTube see a great deal of daily activity in terms of video posting, watching, and commenting.
Jyotika Singh
https://doi.org/10.25080/majora-1b6fd038-009
PyCID: A Python Library for Causal Influence Diagrams
Why did a decision maker select a certain decision? What behaviour does a certain objective incentivise? How can we improve this behaviour and ensure that a decision-maker chooses decisions with safer or fairer consequences? This paper introduces the Python package PyCID, built upon pgmpy, that implements (causal) influence diagrams, a widely used graphical modelling framework for decision-making problems.
James Fox, Tom Everitt, Ryan Carey, and 3 others
https://doi.org/10.25080/majora-1b6fd038-008
CLAIMED, a visual and scalable component library for Trusted AI
CLAIMED is a component library for artificial intelligence, machine learning, "extract, transform, load" processes, and data science. The goal is to enable low-code/no-code rapid prototyping by providing ready-made components for various business domains, supporting various computer languages, working on various data flow editors, and running on diverse execution engines.
Romeo Kienzler, Ivan Nesic
https://doi.org/10.25080/majora-1b6fd038-007
Natural Language Processing with Pandas DataFrames
Most areas of Python data science have standardized on using Pandas DataFrames for representing and manipulating structured data in memory. Natural Language Processing (NLP), not so much.
We believe that Pandas has the potential to serve as a universal data structure for NLP data.
Frederick Reiss, Bryan Cutler, Zachary Eichenberger
https://doi.org/10.25080/majora-1b6fd038-006
MPI-parallel Molecular Dynamics Trajectory Analysis with the H5MD Format in the MDAnalysis Python Package
Molecular dynamics (MD) computer simulations help elucidate details of the molecular processes in complex biological systems, from protein dynamics to drug discovery. One major issue is that these MD simulation files are now commonly terabytes in size, which makes analyzing the data in these files a slow and expensive task.
Edis Jakupovic, Oliver Beckstein
https://doi.org/10.25080/majora-1b6fd038-005
Accelerating Spectroscopic Data Processing Using Python and GPUs on NERSC Supercomputers
The Dark Energy Spectroscopic Instrument (DESI) will create the most detailed 3D map of the Universe to date by measuring redshifts in light spectra of over 30 million galaxies. The extraction of 1D spectra from 2D spectrograph traces in the instrument output is one of the main computational bottlenecks of the DESI data processing pipeline, which is predominantly implemented in Python.
Daniel Margala, Laurie Stephey, Rollin Thomas, and 1 other
https://doi.org/10.25080/majora-1b6fd038-004
signac: Data Management and Workflows for Computational Researchers
The signac data management framework (https://signac.io) helps researchers execute reproducible computational studies, scales workflows from laptops to supercomputers, and emphasizes portability and fast prototyping.
Bradley D. Dice, Brandon L. Butler, Vyas Ramasubramani, and 7 others
https://doi.org/10.25080/majora-1b6fd038-003
Modernizing computing by structural biologists with Jupyter and Colab
Protein crystallography produces most of the protein structures used in structure-based drug design. The process of protein structure determination is computationally intensive and error-prone because many software packages are involved.
Blaine H. M. Mooers
https://doi.org/10.25080/majora-1b6fd038-002
Using Python for Analysis and Verification of Mixed-mode Signal Chains
Any application involving sensitive measurements of the physical world starts with an accurate, precise, and low-noise signal chain. Modern, highly integrated data acquisition devices can often be directly connected to sensor outputs, performing analog signal conditioning, digitization, and digital filtering on a single silicon device, greatly simplifying system electronics.
Mark Thoren, Cristina Suteu
https://doi.org/10.25080/majora-1b6fd038-001
How PDFrw and fillable forms improves throughput at a Covid-19 Vaccine Clinic
PDFrw was used to prepopulate Covid-19 vaccination forms to improve the efficiency and integrity of the vaccination process in terms of federal and state privacy requirements. We will describe the vaccination process from the initial appointment, through the vaccination delivery, to the creation of subsequent required documentation.
Haw-minn Lu, José Unpingco
https://doi.org/10.25080/majora-1b6fd038-000
Cell Tracking in 3D using deep learning segmentations
Live-cell imaging is a widely used technique to study cell migration and dynamics over time. Although many computational tools have been developed in recent years to automatically detect and track cells, they are optimized to detect cell nuclei with similar shapes and/or cells that do not cluster together.
Varun Kapoor, Claudia Carabaña
https://doi.org/10.25080/majora-1b6fd038-014
CNN Based ToF Image Processing
In this paper, a Time of Flight (ToF) camera-specific data processing pipeline is presented, followed by real-life applications using artificial intelligence. These applications include use cases such as gesture recognition, movement direction estimation, and physical exercise monitoring.
Marian-Leontin Pop, Szilard Molnar, Alexandru Pop, and 3 others
https://doi.org/10.25080/majora-1b6fd038-013
Multithreaded parallel Python through OpenMP support in Numba
A modern CPU delivers performance through parallelism. A program that exploits the performance available from a CPU must run in parallel on multiple cores. This is usually best done through multithreading.
Todd Anderson, Tim Mattson
https://doi.org/10.25080/majora-1b6fd038-012
Training machine learning models faster with Dask
Machine learning (ML) relies on stochastic algorithms, all of which rely on gradient approximations with "batch size" examples. Growing the batch size as the optimization proceeds is a simple and usable method to reduce the training time, provided that the number of workers grows with the batch size.
Joesph Holt, Scott Sievert
https://doi.org/10.25080/majora-1b6fd038-011
Monitoring Scientific Python Usage on a Supercomputer
In 2021, more than 30% of users at the National Energy Research Scientific Computing Center (NERSC) used Python on the Cori supercomputer. To determine this, we have developed and open-sourced a simple, minimally invasive monitoring framework that leverages standard Python features to capture Python imports and other job data via a package called "Customs".
Rollin Thomas, Laurie Stephey, Annette Greiner, and 1 other
https://doi.org/10.25080/majora-1b6fd038-010
Classification of Diffuse Subcellular Morphologies
Characterizing dynamic sub-cellular morphologies in response to perturbation remains a challenging and important problem. Many organelles are anisotropic and difficult to segment, and few methods exist for quantifying the shape, size, and quantity of these organelles.
Neelima Pulagam, Marcus Hill, Mojtaba Fazli, and 6 others
https://doi.org/10.25080/majora-1b6fd038-00f
PyRSB: Portable Performance on Multithreaded Sparse BLAS Operations
This article introduces PyRSB, a Python interface to the LIBRSB library. LIBRSB is a portable performance library offering so-called Sparse BLAS (Sparse Basic Linear Algebra Subprograms) operations for modern multicore CPUs.
Michele Martone, Simone Bacchio
https://doi.org/10.25080/majora-1b6fd038-00e
Programmatically Identifying Cognitive Biases Present in Software Development
Mitigating bias in AI-enabled systems is a topic of great concern within the research community. While efforts are underway to increase model interpretability and de-bias datasets, little attention has been given to identifying biases that are introduced by developers as part of the software engineering process.
Amanda E. Kraft, Matthew Widjaja, Trevor M. Sands, and 1 other
https://doi.org/10.25080/majora-1b6fd038-00c
Conformal Mappings with SymPy: Towards Python-driven Analytical Modeling in Physics
This contribution shows how the symbolic computing Python library SymPy can be used to improve flow force modeling due to a Couette-type flow, i.e. a flow of viscous fluid in the region between two bodies, where one body is in tangential motion relative to the other.
Zoufiné Lauer-Baré, Erich Gaertig
https://doi.org/10.25080/majora-1b6fd038-00b