Proceedings of SciPy 2018
SciPy 2018, the 17th annual Scientific Computing with Python conference, was held July 9-15, 2018 in Austin, Texas. 24 peer reviewed articles were published in the conference proceedings. Full proceedings and organizing committee can be found at https://
Yaksh is a free and open-source online evaluation platform. At its core, Yaksh focuses on problem-based learning and lets teachers create practice exercises and quizzes which are evaluated in real-time.
Computational research requires versatile data and workflow management tools that can easily adapt to the highly dynamic requirements of scientific investigations. Many existing tools require strict adherence to a particular usage pattern, so researchers often use less robust ad hoc solutions that they find easier to adopt.
Deep learning techniques have greatly advanced the performance of the already rapidly developing field of computer vision, which powers a variety of emerging technologies—from facial recognition to augmented reality to self-driving cars.
This work began when the two authors met at a software development meeting. Konstantinos was building Bayesian models in his research and wanted to learn how to better manage his research process. Marianne was working on data analysis workflows in industry and wanted to learn more about Bayesian statistics.
In this work, we describe the code structure, implementation, and usage of a Python-based, open-source framework, pyPRISM, for conducting polymer liquid-state theory calculations. Polymer Reference Interaction Site Model (PRISM) theory describes the equilibrium spatial-correlations, thermodynamics, and structure of liquid-like polymer systems and macromolecular materials.
The neighborhood effects literature represents a wide span of the social sciences broadly concerned with the influence of spatial context on social processes. From the study of segregation dynamics, the relationships between the built environment and health outcomes, to the impact of concentrated poverty on social efficacy, neighborhoods are a central construct in empirical work.
Binder is an open source web service that lets users create sharable, interactive, reproducible environments in the cloud. It is powered by other core projects in the open source ecosystem, including JupyterHub and Kubernetes for managing cloud resources.
Oceanographic expeditions commonly generate millions of data points for various chemical, biological, and physical features, all in different formats. Scientific Python tools are extremely useful for synthesizing this data to make sense of major trends in the changing ocean environment.
We present the software tool pyPAHdb to the scientific astronomical community, which is used to characterize emission from one of the most prevalent types of organic molecules in space, namely polycyclic aromatic hydrocarbons (PAHs).
The focus of this paper is on teaching real-time digital signal processing to electrical and computer engineers using the Jupyter notebook and the code module `pyaudio_helper`, which is a component of the package scikit-dsp-comm.
This paper describes a Python computational tool for exploring the use of the extended Kalman filter (EKF) for position estimation using the Global Positioning System (GPS) pseudorange measurements. The development was motivated by the need for an example generator in a training class on Kalman filtering, with emphasis on GPS.
Nonlinear multidimensional spectroscopy (MDS) is a powerful experimental technique used to interrogate complex chemical systems. MDS promises to reveal energetics, dynamics, and coupling features of and between the many quantum-mechanical states that these systems contain.
Plotly.js is a declarative JavaScript data visualization library built on D3 and WebGL that supports a wide range of statistical, scientific, financial, geographic, and 3-dimensional visualizations. Support for creating Plotly.
This paper is about sparse multi-dimensional arrays in Python. We discuss their applications, layouts, and current implementations in the SciPy ecosystem along with strengths and weaknesses. We then introduce a new package for sparse arrays that builds on the legacy of the scipy.
Mining scientific articles is hard when many of them are inaccessible behind paywalls. The Public Library of Science (PLOS) is a non-profit Open Access science publisher of the single largest journal (PLOS ONE), whose articles are all freely available to read and re-use.
In machine learning tasks, it is common to handle missing data by removing observations with missing values, or replacing missing data with the mean value for its feature. To show why this is problematic, we use listwise deletion and mean imputing to recover missing values from artificially created datasets, and we compare those models against ones with full information.
Building environmental simulation workflows is typically a slow process involving multiple proprietary desktop tools that do not interoperate well. In this work, we demonstrate building flexible, lightweight workflows entirely in Jupyter notebooks.
Packages developed under the auspices of the Astropy Project (astropy2013, astropy2018) address many common problems faced by astronomers in their computational projects. In this paper we describe how capabilities provided by Astropy have been employed in two current projects.
Increased prevalence of smartphones and wearable devices has facilitated the collection of triaxial accelerometer data for numerous Human Activity Recognition (HAR) tasks. Concurrently, advances in the theory and implementation of long short-term memory (LSTM) recurrent neural networks (RNNs) has made it possible to process this data in its raw form, enabling on-device online analysis.
The Economics Algorithmic Repository and toolKit (Econ-ARK) aims to become a focal resource for computational economics. Its first ‘framework,’ the Heterogeneous Agent Resources and Toolkit (HARK), provides a modern, robust, transparent set of tools to solve a class of macroeconomic models whose usefulness has become increasingly apparent both for economic policy and for research purposes, but whose adoption has been limited because the existing literature derives from idiosyncratic, hand-crafted, and often impenetrable legacy code.
Python is popular among scientific communities that value its simplicity and power, especially as it comes along with numeric libraries such as NumPy, SciPy, Dask, and Numba. As CPU core counts keep increasing, these modules can make use of many cores via multi-threading for efficient multi-core parallelism.
We seek to understand the current state of equity, scalability, and sustainability of data science education infrastructure in both the U.S. and Canada. Our analysis of the technological, funding, and organizational structure of four types of institutions shows an increasing divergence in the ability of universities across the United States to provide students with accessible data science education infrastructure, primarily JupyterHub.
The use of fluorescence microscopy has catalyzed new insights into biological function, and spurred the development of quantitative models from rich biomedical image datasets. While image processing in some capacity is commonplace for extracting and modeling quantitative knowledge from biological systems at varying scales, general-purpose approaches for more advanced modeling are few.
We introduce Cloudknot, a software library that simplifies cloud-based distributed computing by programmatically executing user-defined functions (UDFs) in AWS Batch. It takes as input a Python function, packages it as a container, creates all the necessary AWS constituent resources to submit jobs, monitors their execution and gathers the results, all from within the Python environment.