Proceedings of the 19th Python in Science Conference
SciPy 2020, Austin, Texas, July 6 - July 12
July 6, 2020
https://doi.org/10.25080/Majora-342d178e-02b
Content License: Creative Commons Attribution 3.0 Unported (CC-BY-3.0). Credit must be given to the creator.

Posters and Slides

Accepted Paper Slides

Treating gridded geospatial data as point data to simplify analytics
Gridded geospatial remote sensing (satellite) data has traditionally been stored in file-based multidimensional arrays to preserve the locality of data. Measurements from locations that are physically next to each other on earth remain next to each other in the arrays. Maintaining this locality is useful when running calculations like reprojection, but unnecessary for many other calculations. This talk will go through a real-world example of a tool redesign at the Goddard Earth Sciences Data and Information Services Center (GES DISC), showing the advantages of using the data frame model for calculating summary statistics, where measurement proximity is unimportant.
Christine Smit, Hailiang Zhang, Mahabaleshwara Hegde, +2
https://doi.org/10.25080/Majora-342d178e-019

Arkouda: Terascale Data Science at Interactive Rates
This talk describes Arkouda, a Python package that we have developed for doing exploratory analysis on massive data sets at interactive rates. Arkouda's API is based on NumPy/Pandas, yet its arrays can be transparently distributed across the compute nodes of a cluster or supercomputer to support large-scale analytics.
In our work, we have run Arkouda operations from Jupyter notebooks on TB-sized data sets in seconds to a few minutes, achieving scalability and performance that we have not observed with competing technologies.
Benjamin Albrecht, Michael Merrill, William Reus, +1
https://doi.org/10.25080/Majora-342d178e-01a

Boost-histogram: High-Performance Histograms as Objects
Boost-histogram is a new Python library that provides histograms that can be filled, manipulated, sliced, and projected as objects.
Henry Schreiner, Hans Dembinski, Jim Pivarski, +1
https://doi.org/10.25080/Majora-342d178e-01b

Open-source bioimage analysis software to accelerate drug discovery
Anne Carpenter
https://doi.org/10.25080/Majora-342d178e-01c

cuSignal - GPU Accelerating SciPy Signal with Numba and CuPy
cuSignal is a GPU-accelerated signal processing library built around a SciPy Signal-like API, CuPy, and custom Numba and CuPy CUDA kernels. cuSignal is written exclusively in Python and demonstrates GPU speeds without a C++ software layer.
Adam Thompson, Matt Nicely, Graham Markall, +1
https://doi.org/10.25080/Majora-342d178e-01d

Frictionless Data for Reproducible Biology
This talk discusses how biologists can make their data more reproducible using Frictionless Data's open source Python libraries.
Lilly Winfree
https://doi.org/10.25080/Majora-342d178e-01e

Interactive Supercomputing with Jupyter at the National Energy Research Scientific Computing Center
At the National Energy Research Scientific Computing (NERSC) Center, interactive access to high-performance computing and data through Jupyter is a priority.
We will discuss the nuts and bolts of how Jupyter is deployed at NERSC, and how we've adapted to engage the Jupyter ecosystem and open-source community to deliver this key capability to our users. Jupyter is a major component in our Superfacility initiative, which aims to connect experimental and observational big data facilities (telescopes, microscopes, genome sequencers, light sources, etc.) with next-generation supercomputing and data capabilities at NERSC.
Rollin Thomas, Shane Canon, Shreyas Cholia, +7
https://doi.org/10.25080/Majora-342d178e-01f

Project Mjolnir: A Modular, Open-source Platform for Developing Scientific IoT Sensor Networks
From a humble beginning as a side effort using a Raspberry Pi to talk to lightning instruments, Project Mjolnir is evolving into a modular, open source client-server platform for developing scientific IoT sensor networks. Its goal is to enable scientists of many disciplines to employ low-cost hardware to robustly ingest, log and uplink periodic and on-demand science and engineering data and commands, controlled either autonomously or centrally, all with little or no bespoke code. The talk will discuss Mjolnir’s development and future, present examples of current projects built on it, and explore how to leverage it for new applications.
C.A.M. Gerlach
https://doi.org/10.25080/Majora-342d178e-020

Pandera: Statistical Data Validation of Pandas Dataframes
This talk introduces pandera, an open source Python package for pandas data validation.
It covers data validation in theory and practice, and goes through a case study analysis of the Fatal Encounters dataset to demonstrate how pandera can be used to make data analysis and machine learning more reproducible, robust, and reliable.
Niels Bantilan
https://doi.org/10.25080/Majora-342d178e-021

Molecular infrastructure for modeling viruses with pythonic-mediated packages: pyF4all
We model full viruses by coupling short, highly detailed molecular dynamics simulations with lower-resolution (but faster) continuum electrostatic models. This multiscale approach makes it possible to model a full virus on desktop or small-cluster infrastructure, which is available to most researchers. Here, we propose a first interfacing of Pythonic packages in a multiscale approach that automates access to state-of-the-art biomolecular simulations via Jupyter Notebooks.
Horacio V. Guzman
https://doi.org/10.25080/Majora-342d178e-022

pyhf: a pure Python statistical fitting library with tensors and autograd
pyhf is a pure-Python implementation of the HistFactory statistical model for multi-bin histogram-based analysis with asymptotic interval estimation, and part of the Scikit-HEP project ecosystem. pyhf supports modern computational graph libraries as computational backends in order to make use of features such as auto-differentiation and GPU acceleration.
Additionally, the statistical models are defined in a declarative JSON schema, readily enabling preservation and distribution through services such as the Durham High-Energy Physics Database (HEPData).
Matthew Feickert
https://doi.org/10.25080/Majora-342d178e-023

Bringing GPU Support to Datashader: A RAPIDS Case Study
A case study on using RAPIDS technologies to add GPU support to the Datashader Python library.
Jon Mease
https://doi.org/10.25080/Majora-342d178e-024

Learning from evolving data streams
A brief introduction to machine learning for evolving data streams. In this field, data is assumed to be infinite and can change over time. scikit-multiflow, a package for stream learning in Python, is also presented.
Jacob Montiel
https://doi.org/10.25080/Majora-342d178e-025

Spatial Algorithms at Scale with spatialpandas
How do you analyze 1 trillion rows of geospatial point data? We recently solved this problem using spatialpandas, dask, and the parquet file format to efficiently build and execute spatial algorithms at scale. We compare the spatialpandas solution's performance with other cases, and discuss the tradeoffs of the various approaches.
Dharhas Pothina, Kim Pevey, Adam Lewis
https://doi.org/10.25080/Majora-342d178e-026

Accepted Posters

Decentralized, Deterministic Robot Swarm Control using Blob Methods for PDEs
A Jupyter notebook about robot swarm control, simulation, digital experiments, and computational considerations.
https://doi.org/10.25080/Majora-342d178e-018

SciPy Tools Plenaries

HoloViz: What’s new and what’s next
Updates and roadmaps for Panel, hvPlot, HoloViews, GeoViews, Datashader, Param, and Colorcet.
The HoloViz suite of tools together form a unified approach for visualization from exploration to sharing applications and dashboards, building on the SciPy ecosystem to support easy visualization of large multidimensional or columnar datasets.
https://doi.org/10.25080/Majora-342d178e-028

SciPy Tools Plenary on Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. This presentation summarizes changes over the past year, new features, and future plans.
https://doi.org/10.25080/Majora-342d178e-029

SciPy Tools Plenary on Numba
Numba is a just-in-time compiler for a subset of Python. This is a short presentation of Numba updates for 2019-2020.
https://doi.org/10.25080/Majora-342d178e-02a

Lightning Talks

Building an AutoML System for Fun and Non-profit
This talk introduces metalearn, a MetaRL-based AutoML system that learns to learn how to propose hyperparameter selections that produce high validation scores on meta-test datasets.
https://doi.org/10.25080/Majora-342d178e-027