From the 2023 Proceedings
Data Reduction Network
Multidimensional categorical data is widespread but not easily visualized using standard methods. For example, questionnaire data generally consists of questions with categorical responses. Popular methods of handling categorical data include one-hot encoding and enumeration; the latter applies an unwarranted and potentially misleading notional order to the data. To address this, we introduce a novel visualization method named Data Reduction Network.
Haoyin Xu, Haw-minn Lu, José Unpingco
https://doi.org/10.25080/gerudo-f2bc6f59-012
libyt: a Tool for Parallel In Situ Analysis with yt
In the era of exascale computing, storage and analysis of large scale data have become more important and difficult. We present libyt, an open source C++ library, that allows researchers to analyze and visualize data using yt or other Python packages in parallel during simulation runtime.
Shin-Rong Tsai, Hsi-Yu Schive, Matthew J. Turk
https://doi.org/10.25080/gerudo-f2bc6f59-011
Pandera: Going Beyond Pandas Data Validation
Data quality remains a core concern for practitioners in machine learning, data science, and data engineering, and many specialized packages have emerged to fulfill the need of validating and monitoring data and models. This paper outlines pandera’s motivation and challenges that took it from being a pandas-only data validation framework to one that is extensible to other non-pandas-compliant dataframe-like libraries.
Niels Bantilan
https://doi.org/10.25080/gerudo-f2bc6f59-010
aPhyloGeo-Covid: A Web Interface for Reproducible Phylogeographic Analysis of SARS-CoV-2 Variation using Neo4j and Snakemake
The gene sequencing data, along with the associated lineage tracing and research data generated throughout the Coronavirus disease 2019 (COVID-19) pandemic, constitute invaluable resources that profoundly empower phylogeography research. To optimize the utilization of these resources, we have developed an interactive analysis platform called aPhyloGeo-Covid.
Wanlin Li, Nadia Tahiri
https://doi.org/10.25080/gerudo-f2bc6f59-00f
The annual SciPy Conferences allow participants from academic, commercial, and governmental organizations to:
- showcase their latest Scientific Python projects,
- learn from skilled users and developers, and
- collaborate on code development.
The conferences generally consist of multiple days of tutorials followed by two to three days of presentations, and conclude with one to two days of developer sprints on projects of interest to the attendees.