
Empowering Learners - Teaching Reproducible Research with Open-Source Tools
Abstract
Reproducibility and open science are increasingly recognized as essential components of rigorous geoscientific research. However, gaps in training often hinder early-career scientists from fully adopting best practices in computational workflows and FAIR (Findable, Accessible, Interoperable, Reusable) publishing. The Facilitating Reproducible Open Geoscience (FROGS) initiative addresses this need by offering an ongoing series of integrated courses and self-paced online modules designed to build capacity in Python and R programming, time series analysis, and open science publishing within the geosciences. To date, three sequential courses have been completed, and this paper reports on these initial offerings. During the courses, participants engaged in hands-on learning activities combining synchronous instruction with asynchronous exercises, supported by the LeapFROGS platform. Survey results demonstrate high participant satisfaction, increased confidence in applying reproducible research methods, and active contributions to open source geoscience projects. This paper details the course design, curriculum content, participant outcomes, and the broader impact of FROGS in fostering a sustainable community of practice dedicated to open and reproducible geoscience. Our findings underscore the critical role of integrated training programs in advancing open science and highlight strategies for scaling such initiatives across scientific domains.
1 Introduction and Motivation
Sharing research data, software, and workflows enhances reproducibility and collaboration, helps shape the directions of future research Gil et al., 2016, and is fundamental to building a FAIR open science ecosystem. The desire to share and reuse scientific data and software has grown in recent years as funders (e.g., Nugroho et al., 2015; Eynden et al., 2016; Zuiderwijk & Janssen, 2014) and publishers (e.g., Fox et al., 2021; Erdmann et al., 2021; Springer Science and Business Media LLC, 2019) have introduced open science policies emphasizing reproducibility, and as scientists increasingly recognize the benefits of open science Lowndes et al., 2017; McKiernan et al., 2016. Consequently, the last decade has seen a proliferation of frameworks to promote open source resources, collaboration among scientists, and sharing of scientific research products. One critical element in the success and long-term sustainability of these resources is training scientists, especially those early in their careers, in their use. This training needs to span both scientific practice and publishing.
Facilitating Reproducible Open Geoscience (FROGS) is a U.S. National Science Foundation (NSF) funded initiative designed to speed up the integration of open source tools into the research and publication workflows of geoscientists. FROGS has offered three seven-week courses on geoscience analysis in Python and R and on FAIR publishing principles. The courses emphasized science publishing as an integral part of a research product. Each course included a synchronous workshop that introduced overarching concepts, followed by asynchronous, detailed online materials combined with biweekly office hours designed to help participants integrate the resources into their practice. The asynchronous component was facilitated by the LeapFROGS learning platform Khider et al., 2025, which provided summaries of the key concepts, curated additional resources from the open science community, and offered challenges for learners to assess their comprehension.
This paper provides an overview of the training activities in Section 2, detailing the goals and design of the LeapFROGS platform, before discussing educational outcomes and contributions to open science in Section 3.
2 Overview of training activities
2.1 LeapFROGS
The LeapFROGS platform combines lecture content with self-graded exercises to deliver self-paced modules covering diverse aspects of (geo)scientific research. Built using Gatsby and supported by a myBinder backend Jupyter et al., 2018, it offers a Python sandbox environment for hands-on learning, bypassing environment setup difficulties that are often a limiting step for first-timers. The platform features seven modules that explore different facets of scientific practice and publishing. These modules leverage open source tools from the scientific Python community and are specifically tailored to geoscience applications. The platform’s primary aim is to serve as a gateway into the extensive Python educational ecosystem.
The first module, Introduction to Python, covers fundamental concepts including numbers, variables, logic, strings, lists, tuples, sets, dictionaries, conditionals, loops, functions, and classes. The lessons reference the Trinket Python course Trinket, 2015, while additional coding exercises on LeapFROGS offer geoscience-relevant challenges to reinforce learning. The coding exercises are designed as fill-in-the-blank tasks (Figure 1). A key advantage of the platform is its ability to provide hints and reveal solutions to learners who encounter difficulties.

Figure 1: Example of training model on the LeapFROGS platform. (a) An exercise on numbers in Python with links to the Trinket Python course, and (b) a tutorial on Pandas using materials developed by Project Pythia. LeapFROGS tests understanding of these concepts by asking learners to fill in the blank in a code cell that can be executed on myBinder, so learners can run their code and check it against the solution. Learners can also ask for hints or see the solution.
The second module, The Scientific Python Stack, introduces libraries common to scientific Python such as Jupyter Kluyver et al., 2016, NumPy Harris et al., 2020, pandas The Pandas Development Team, 2020; McKinney, 2010, Matplotlib Hunter, 2007, Cartopy Met Office, 2010, Seaborn Waskom, 2021, statsmodels Seabold & Perktold, 2010, scikit-learn Pedregosa et al., 2011; Buitinck et al., 2013, PyTorch Paszke et al., 2019, and Xarray Hoyer & Hamman, 2017. This module adopts the same approach as the Introduction to Python module, featuring lessons from Project Pythia Foundations Rose et al., 2023, alongside tutorials from library developers and coding exercises tailored to reinforce learning within a geoscience context (Figure 1).
The third module, Timeseries Analysis, introduces key concepts and practicums about timeseries analysis, a common task in geoscience studies. The lecture materials were adapted from Emile-Geay’s class at the University of Southern California (USC) on data analysis in the Earth and Environmental Sciences and its accompanying e-book Emile-Geay, 2017. The module covers common data processing techniques, measures of association between timeseries, the use of surrogates for significance testing, and spectral and wavelet analyses. The module is supported by a Jupyter Book Emile-Geay, 2025 that makes use of the Pyleoclim software package Khider et al., 2022.
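To make the idea of surrogate-based significance testing concrete, the following is a minimal NumPy sketch and not the Pyleoclim implementation. It assumes evenly spaced series and uses AR(1) surrogates as one common choice of null model; the function names `ar1_surrogates` and `correlation_pvalue` are hypothetical and introduced here only for illustration.

```python
import numpy as np


def ar1_surrogates(x, n_surr, rng):
    """Simulate AR(1) series matching the lag-1 autocorrelation and variance of x.

    Hypothetical helper for illustration; assumes x is evenly spaced.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    # Lag-1 autocorrelation estimated from the data
    phi = np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2)
    # Innovation standard deviation that preserves the variance of x
    sigma = np.sqrt(max(1.0 - phi**2, 0.0)) * xc.std()
    n = x.size
    surr = np.empty((n_surr, n))
    # Start each surrogate from the stationary distribution
    surr[:, 0] = rng.normal(0.0, xc.std(), size=n_surr)
    for t in range(1, n):
        surr[:, t] = phi * surr[:, t - 1] + rng.normal(0.0, sigma, size=n_surr)
    return surr


def correlation_pvalue(x, y, n_surr=1000, seed=0):
    """Pearson correlation of x and y with a two-sided surrogate-based p-value."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    # Null distribution: correlation between y and AR(1) surrogates of x
    surr = ar1_surrogates(x, n_surr, rng)
    yz = (y - y.mean()) / y.std()
    sz = (surr - surr.mean(axis=1, keepdims=True)) / surr.std(axis=1, keepdims=True)
    r_null = sz @ yz / y.size
    # Fraction of surrogates at least as strongly correlated as the observation
    p = (np.sum(np.abs(r_null) >= abs(r_obs)) + 1) / (n_surr + 1)
    return r_obs, p


# Example: two synthetic series that share a common, autocorrelated signal
rng = np.random.default_rng(42)
signal = np.convolve(rng.normal(size=202), np.ones(3) / 3, mode="valid")
x = signal
y = signal + 0.1 * rng.normal(size=signal.size)
r, p = correlation_pvalue(x, y)
print(f"r = {r:.2f}, p = {p:.3f}")
```

Because geoscience series are often autocorrelated, a null distribution built from AR(1) surrogates is typically more conservative than a classical t-test, which assumes independent samples.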
The fourth module, The Scientific Paper of The Future, introduces concepts in FAIR scientific publishing. As its name indicates, this module presents an updated version Khider & Gil, 2024 of the Geoscience Paper of the Future (GPF) Gil et al., 2016. The GPF represented one of the earliest community efforts to fully document, share, and cite all research products, including data, software, and computational provenance (i.e., describing the workflow used in a study, which links together data and software with their associated parameter values for reproducibility). Although the GPF was published before the FAIR principles Wilkinson et al., 2016; Lamprecht et al., 2019; Goble et al., 2020, the guidelines outlined in that paper allow for compliance with FAIR. The updated materials reflect this compliance and provide additional guidance on technologies such as Binder Jupyter et al., 2018 that support FAIR, open, and reproducible science. Learners on the platform are able to test their understanding of these concepts using a multiple choice quiz (Figure 2).
The fifth module, Using GitHub for Your Research, focuses on GitHub as a platform to collaborate, share scientific workflows and software, and support project management. The module covers the basics of repositories, forks, and pull requests, understanding and setting up actions for automation, and linking a GitHub repository to Zenodo to obtain a persistent identifier. This module provides a high-level overview of GitHub Actions, including their purpose and the basic structure of a workflow file, while more detailed applications, such as containerization and continuous integration, are addressed in the final two modules. Lessons in this module were summarized from various sources including the GitHub documentation, blogs from Medium and other sources, and Project Pythia Foundations Rose et al., 2023.

Figure 2: Example of training model on the LeapFROGS platform. Learners can test their understanding using a self-paced quiz that provides feedback in case of an incorrect answer.
The sixth module, Sharing Reproducible Workflows, provides information about using Docker and myBinder to share reproducible science results.
Finally, the last module, Packaging Your Software for Sharing, walks through instructions and tutorials on creating Python packages, creating documentation using Sphinx Brandl, 2025 and publishing it on readthedocs.org, unit testing, setting up GitHub Actions for continuous integration, and publishing packages on software registries (e.g., the Python Package Index, PyPI). Most of these tutorials were based on materials developed by the pyOpenSci community.
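As an illustration of the unit-testing step, a minimal sketch using Python's built-in `unittest` module follows. The function `standardize` and its tests are hypothetical stand-ins for package code and are not taken from the course materials.

```python
import math
import unittest


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation.

    Hypothetical package function used only to have something to test.
    """
    values = [float(v) for v in values]
    if len(values) < 2:
        raise ValueError("need at least two values to standardize")
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    if var == 0.0:
        raise ValueError("cannot standardize a constant series")
    std = math.sqrt(var)
    return [(v - mean) / std for v in values]


class TestStandardize(unittest.TestCase):
    def test_zero_mean_unit_variance(self):
        # The output should have mean 0 and population variance 1
        z = standardize([1.0, 2.0, 3.0, 4.0])
        self.assertAlmostEqual(sum(z) / len(z), 0.0)
        self.assertAlmostEqual(sum(v**2 for v in z) / len(z), 1.0)

    def test_constant_series_raises(self):
        # A constant series has zero variance and cannot be rescaled
        with self.assertRaises(ValueError):
            standardize([5.0, 5.0, 5.0])

    def test_too_short_raises(self):
        # A single value is rejected explicitly
        with self.assertRaises(ValueError):
            standardize([1.0])
```

Saved in a file such as `test_standardize.py` (a hypothetical name), these tests can be run locally with `python -m unittest`, and the same command can be invoked from a continuous integration workflow.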
2.2 Training activities
We facilitated three complementary training workshops designed to build geoscientists’ skills in computational research, open science publishing, and software development. Participants in each workshop were selected to maximize diversity across geoscientific disciplines and backgrounds. A breakdown of participants by career stage and geoscientific discipline is given in Figure 3.

Figure 3: Distribution of participants by geoscience disciplines and career stage across all three training workshops.
2.2.1 PyRATES: Python and R Analysis of Time Series
Held in person in June 2024 at the USC Information Sciences Institute (ISI), PyRATES offered foundational training in Python and R tailored to geoscientists, with an emphasis on time series analysis, a core expertise of the principal investigators that spans multiple geoscience domains. This workshop targeted early-career researchers with little prior programming experience and aimed to introduce the basics of scientific Python and R, statistical concepts in time series analysis, and FAIR science publishing principles. The workshop gathered 18 participants and covered modules 1 through 4 on the LeapFROGS platform. At the end of the asynchronous period, participants were asked to submit a reproducible notebook applying FAIR principles.
2.2.2 FAIRLeap: FAIR Publishing in the Geosciences
FAIRLeap was designed for researchers actively engaged in geoscience projects, including those preparing manuscripts or reproducibility studies. The workshop introduced FAIR publishing concepts, GitHub for project and software management, and tools such as Docker, Binder, and myBinder for sharing reproducible workflows (modules 4 through 6 on the LeapFROGS platform). Held virtually in February 2025, FAIRLeap gathered 21 participants, most of whom attended asynchronously.
2.2.3 Open Geoscience Hackathon
The Open Geoscience Hackathon targeted researchers interested in open-source software development and contributions. The workshop focused on practical skills including opening pull requests, software packaging, unit testing, and continuous integration (module 7 on LeapFROGS). Held in person at USC ISI in May 2025, the hackathon gathered 17 participants. Morning sessions were reserved for high-level lectures and guided homework on the PyCatSim Khider, 2025 toy package. This Python package, which simulates cat ownership, was specifically designed for this hackathon. The functions were straightforward to implement and served to illustrate object-oriented programming principles, methods for integrating data into packages, and error handling, all while demonstrating the process of contributing to an open-source project. In the afternoons, participants worked on their own packages. By the end of the workshop, participants were able to share the foundations of their packages on GitHub.
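The kind of class used to practice these ideas can be sketched as follows. This is a hypothetical toy example and not the actual PyCatSim API: the class `Cat`, its attributes, and its `feed` method are assumptions chosen only to show object-oriented design and error handling in a few lines.

```python
class Cat:
    """A toy cat with a name and a hunger level from 0 (full) to 10 (starving).

    Hypothetical class for illustration; not part of PyCatSim.
    """

    def __init__(self, name, hunger=5):
        # Validate inputs early so errors point at the caller's mistake
        if not isinstance(name, str) or not name:
            raise ValueError("name must be a non-empty string")
        if not 0 <= hunger <= 10:
            raise ValueError("hunger must be between 0 and 10")
        self.name = name
        self.hunger = hunger

    def feed(self, portions=1):
        """Reduce hunger by the number of portions, never going below 0."""
        if portions <= 0:
            raise ValueError("portions must be positive")
        self.hunger = max(0, self.hunger - portions)
        return self.hunger

    def __repr__(self):
        return f"Cat(name={self.name!r}, hunger={self.hunger})"


cat = Cat("Boots", hunger=3)
cat.feed(2)
print(cat)  # Cat(name='Boots', hunger=1)
```

Raising `ValueError` with a specific message, rather than letting an invalid state propagate, gives contributors a clear behavior to document and to cover with unit tests.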
3 Contributions to open science
FROGS has contributed to open science in two main ways. First, it has trained the next generation of scientists in Python programming, data science libraries, time series analysis, geoscience data science, and open science sharing and publishing practices. Second, these scientists have in turn produced exemplars of reproducible studies within their respective communities.
3.1 Training the next generation of scientists
Exit surveys conducted following the three workshops demonstrated a highly favorable reception among participants. Specifically, 95% of respondents indicated they would recommend the workshops to colleagues, 90% reported that the activities enhanced their understanding of key concepts in geoscience practice and publishing, and 94% expressed increased confidence in applying these principles within their own research endeavors. Participants consistently highlighted the effectiveness of the workshops’ hands-on, project-oriented approach, emphasizing the value of integrating lectures with practical coding exercises, such as contributing to exemplar packages like PyCatSim. Instructional modules on automation, software testing, documentation, and version control via GitHub were particularly well received.
Nevertheless, participant feedback also identified areas for refinement. A notable challenge was accommodating the wide range of prior experience among attendees; some less-experienced participants encountered difficulties with advanced topics, including object-oriented programming, underscoring the need for preparatory materials or more structured introductory sessions. Suggestions were also made to extend the duration of workshops or incorporate additional applied exercises and case studies to foster deeper engagement. Furthermore, enhanced guidance on software environment setup and package publication workflows was recommended to mitigate technical barriers.
Taken together, these findings indicate that the workshops effectively impart practical skills and foster confidence in open science methodologies, while providing actionable insights to improve accessibility and pedagogical design for diverse participant cohorts.
3.2 Contributed workflows and software
In addition to providing comprehensive training, a significant outcome of the workshops was the active contribution of participants to open science projects. These contributions not only reinforced the practical skills acquired during the sessions but also fostered engagement within the open source geoscience community. The repositories developed or enhanced by participants span a diverse range of geoscience domains and are publicly available on GitHub. Table 1 summarizes these contributions, listing the associated scientific domains alongside the corresponding GitHub repositories.
Table 1: Open science projects contributed by participants, with repository names shown as user/repository_name on GitHub.
| Research Area | Repository |
|---|---|
| Paleoceanography and Paleoclimatology | frozenarchives/pyRATES |
| Volcanology, Geochemistry, and Petrology | ruixiabai/XYplot |
| Ocean Sciences | ksc005/pyrates |
| Earth and Planetary Surface Processes | Dewan-cpu/Decoding-Landslide-Hazard-Assessment |
| Atmospheric Sciences | sputhiyamadam/PYRATES_workshop |
| Geodesy | chongjh11/pyrates2024 |
| Geoinformatics | IGCCP/mindat-locality |
| Seismology | vikkybass/PyRates-reproduc |
| Paleoceanography and Paleoclimatology | kurtlindberg/leafwaxtools |
| Cryosphere Sciences | Duyi-Li/multipol |
| GeoHealth | jeanmico/structuralnoisebarriers |
| Hydrology | preetika11 |
| Paleoceanography and Paleoclimatology | tanaya-g/sedproxy_python |
| Hydrology | surabhiupadhyay/pyclimproj |
| Hydrology | jsacerot/Rpftools |
| Science and Society | nqulizada835/geocleaner |
| Earth and Planetary Surface Processes | lefitzpatrick/nbspredictor |
| Atmospheric Sciences | bosup/toy_package |
| Hydrology | zuhlmann/dataFrameGis |
| Geoinformatics | PratyushTripathy/pyrsgisHydrology |
4 Conclusions
In total, the workshops engaged 56 participants spanning over a dozen geoscience subfields and multiple career stages. Participants produced 8 reproducible notebooks and contributed to 9 open-source repositories for software development, underscoring tangible skill acquisition and active engagement with open science practices. These contributions not only exemplify the immediate outputs of the training but also serve as building blocks for a sustainable community of practice that extends beyond the duration of the workshops. By fostering collaborative development and dissemination of geoscience software and workflows, FROGS has catalyzed lasting shifts in research norms toward greater openness and reproducibility.
FROGS has demonstrated that targeted training in computational methods and FAIR publishing principles can effectively embed open science practices into routine geoscience research workflows. Beyond individual capacity building, the initiative has nurtured a vibrant community of practice that continues to support collaborative development and knowledge sharing. Future efforts will focus on scaling these training models, integrating them with institutional curricula, and expanding partnerships with related open science initiatives to amplify impact. Continued evaluation of long-term participant engagement and contributions will guide ongoing refinement and ensure that FROGS remains responsive to the evolving needs of the geoscience community.
Deborah Khider and Julien Emile-Geay are supported by the U.S. National Science Foundation (NSF) under award number RISE#2324732. David Edge and Nicholas McKay are supported by the U.S. NSF under award number RISE#2324733.
Copyright © 2025 Khider et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creators.
- FAIR: Findable, Accessible, Interoperable, and Reusable
- FROGS: Facilitating Reproducible Open Geoscience
- GPF: Geoscience Paper of the Future
- ISI: Information Sciences Institute
- NSF: National Science Foundation
- PyPI: Python Package Index
- USC: University of Southern California
- Gil, Y., David, C. H., Demir, I., Essawy, B. T., Fulweiler, R. W., Goodall, J. L., Karlstrom, L., Lee, H., Mills, H. J., Oh, J., Pierce, S. A., Pope, A., Tzeng, M. W., Villamizar, S. R., & Yu, X. (2016). Toward the Geoscience Paper of the Future: Best practices for documenting and sharing research from data to software to provenance. Earth and Space Science, 3(10), 388–415. 10.1002/2015ea000136
- Nugroho, R. P., Zuiderwijk, A., Janssen, M., & de Jong, M. (2015). A comparison of national open data policies: lessons learned. Transforming Government: People, Process and Policy, 9(3), 286–308. 10.1108/tg-03-2014-0008
- Eynden, V. V. D., Knight, G., Vlad, A., Radler, B., Tenopir, C., Leon, D., Manista, F., Whitworth, J., & Corti, L. (2016). Survey of Wellcome researchers and their attitudes to open research. 10.6084/M9.FIGSHARE.4055448.V1
- Zuiderwijk, A., & Janssen, M. (2014). Open data policies, their implementation and impact: A framework for comparison. Government Information Quarterly, 31(1), 17–29. 10.1016/j.giq.2013.04.003
- Fox, P., Erdmann, C., Stall, S., Griffies, S. M., Beal, L. M., Pinardi, N., Hanson, B., Friedrichs, M. A. M., Feakins, S., Bracco, A., Pirenne, B., & Legg, S. (2021). Data and Software Sharing Guidance for Authors Submitting to AGU Journals. Zenodo. 10.5281/ZENODO.5124741
- Erdmann, C., Stall, S., Gentemann, C., Holdgraf, C., Fernandes, F. P. A., Gehlen, K. P., & Corvellec, M. (2021). Guidance for AGU Authors - Jupyter Notebooks. Zenodo. 10.5281/ZENODO.4774440
- Springer Science and Business Media LLC. (2019). Nature Methods, 16(3), 207–207. 10.1038/s41592-019-0350-x
- Lowndes, J. S. S., Best, B. D., Scarborough, C., Afflerbach, J. C., Frazier, M. R., O’Hara, C. C., Jiang, N., & Halpern, B. S. (2017). Our path to better science in less time using open data science tools. Nature Ecology & Evolution, 1(6). 10.1038/s41559-017-0160
- McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., Lin, J., McDougall, D., Nosek, B. A., Ram, K., Soderberg, C. K., Spies, J. R., Thaney, K., Updegrove, A., Woo, K. H., & Yarkoni, T. (2016). How open science helps researchers succeed. eLife, 5. 10.7554/elife.16800
- Khider, D., Emile-Geay, J., & Edge, D. (2025). LeapFROGS: A learning platform for Python, time series analysis and open science publishing (v0.1.0) [Computer software]. 10.5281/zenodo.14783497
- Project Jupyter, Matthias Bussonnier, Jessica Forde, Jeremy Freeman, Brian Granger, Tim Head, Chris Holdgraf, Kyle Kelley, Gladys Nalvarte, Andrew Osheroff, Pacer, M., Yuvi Panda, Fernando Perez, & Benjamin Ragan Kelley. (2018). Binder 2.0 - Reproducible, interactive, sharable environments for science at scale. In Fatih Akici, David Lippa, Dillon Niederhut, & Pacer M (Eds.), Proceedings of the 17th Python in Science Conference (pp. 113–120). 10.25080/Majora-4af1f417-011
- Trinket. (2015-2024). Getting started with Python. https://docs.trinket.io/getting-started-with-python#/welcome/where-we-ll-go
- Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., Willing, C., & Jupyter development team. (2016). Jupyter Notebooks - a publishing format for reproducible computational workflows. In F. Loizides & B. Schmidt (Eds.), Positioning and Power in Academic Publishing: Players, Agents and Agendas (pp. 87–90). IOS Press. https://eprints.soton.ac.uk/403913/
- Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., … Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. 10.1038/s41586-020-2649-2
- The Pandas Development Team. (2020). pandas-dev/pandas: Pandas (latest). Zenodo. 10.5281/zenodo.3509134