NIH Workshop on the Role of Generalist and Institutional Repositories to Enhance Data Discoverability and Reuse

UPDATE: On July 16, 2020, the workshop co-chairs and participating generalist repositories published a generalist repository comparison chart.

The Office of Data Science Strategy at the National Institutes of Health (NIH) and the National Library of Medicine hosted a workshop on the Role of Generalist Repositories to Enhance Data Discoverability and Reuse on Feb. 11–12, 2020. The workshop was held at the Lister Hill Auditorium on the NIH main campus in Bethesda, MD, and a workshop summary is available.

The primary goals of the workshop were for participants to:

Learn how generalist repositories see themselves in the larger biomedical data repository landscape.
Understand how institutional data repositories are creating suites of solutions for their researchers and how they see generalist repositories fitting into this landscape.
Consider desired characteristics of data repositories and how they relate to institutional expectations of data storage and preservation solutions.
Explore adoption of common infrastructure, standards, and federated search solutions to enable greater discoverability of NIH research data across federated data repositories.
Address the role of data curators in ensuring that data and metadata are sufficiently well curated to enhance discovery and enable reuse.

Recordings are available for each day of the workshop (Day 1 and Day 2). Where available, presentation slides can be accessed by clicking the presentation title in the agenda below.

Agenda
Day 1
Setting the Stage (Patricia Flatley Brennan, National Library of Medicine)
Introduction (Maryann Martone, University of California, San Diego, Workshop Co-Chair; Shelley Stall, American Geophysical Union, Workshop Co-Chair)
Keynote Address: A Blueprint for the Research Data Landscape (Sayeed Choudhury, Johns Hopkins University)
Session 1: Introducing the Generalist and Institutional Repository Landscape
This session provided a brief introduction to several generalist repositories to establish a common understanding of how they operate. Each speaker introduced their platform and described its key characteristics.
Vivli: A Global Clinical Trial Data Sharing Platform (Ida Sim, Vivli)
Mendeley Data: Enhancing Data Discovery, Sharing, and Reuse (Anita de Waard, Elsevier)
Building Policy-Compliant Infrastructure for Research Data (Mark Hahnel, Figshare)
Community-Minded Data Publishing at Dryad (Daniella Lowenberg, California Digital Library)
Zenodo: Specialists Welcome! (Tim Smith, CERN)
Dataverse: A Software, a Community, a Network of Repositories (Mercè Crosas, Harvard Dataverse)
Session 2: Enabling Data Discovery
In a complex repository landscape where data sets may reside in any of thousands of existing repositories, often with little metadata, discovering data sets can be difficult. Although users may know of relevant subject-specific repositories, the discoverability challenge is compounded when data sets are located in generalist or institutional repositories where a user might not think to look. This session explored techniques for enabling discovery of data in generalist and institutional repositories, including the development of a common metadata model for data, expert curation to enhance metadata, and linking of digital research objects through identifiers.
Dataset Metadata Model (DATMM): A Common Model to Drive Discovery and Adoption (Pete Seibert, National Library of Medicine)
The Role of Institutional Repositories in Data Discovery (Lisa Johnston, University of Minnesota)
PID Graphs: Muggle Scientists Develop Harry Potter “Marauder’s Map” Technology (Luc Boruta, Thunken)
Session 3: Enabling Data Reuse
This session considered several aspects of data reuse, including two different “levels” of reuse and the implications of each for how much preparatory work the data require: (1) reusing the data to repeat the findings of the publication with which they are associated and (2) reusing the data to address new scientific questions. The second use case also often requires that the data be combined with data from other sources, sometimes of a similar type and sometimes of a different type.
What Researchers Need When Deciding Whether to Reuse Data: Experiences from Three Disciplines (Ixchel Faniel, OCLC, Inc.)
Collaboration and Re-Use: Experiences with Institutional Data Catalogs (Nicole Contaxis, New York University)
What Role Can Publishers Play in the Open Data Ecosystem? (Varsha Khodiyar, Springer Nature)
Breakout Groups: Identifying Common Practices in Discoverability and Reusability
Groups were asked to address specific challenges such as:
What information needs to be included with datasets to determine whether the data are fit for use in future research in a similar or different area?
Are there solutions that can be implemented now to support federated queries that improve data discovery?
Are there current solutions for determining whether an appropriate data repository already exists for a given dataset type?
What is the minimal set of functions that a generalist repository should support to participate in the biomedical ecosystem?
How might all data associated with specific NIH grant funding be discovered, accessed, and understood?
What licensing is recommended to encourage research data to be as open as possible but as protected as necessary?
How can datasets preserved in different data repositories be linked so that they can be discovered?
Recap of Day 1 and Recess

Day 2
Report Back from Day 1 Breakouts on Data Discovery and Data Reuse
Session 4: Facilitating Reproducibility
This session focused on reproducibility, another major use case for effective data sharing, and on how generalist and institutional repositories support reproducing the findings of particular experiments and publications.
Librarian Role in Facilitating Reproducibility through Repositories (Melissa Rethlefsen, University of Florida)
Reproducible and Rigorous Research on Open Science Framework (OSF) (Nici Pfeiffer, Center for Open Science)
Perspectives on Reuse and Reproducibility from a Commercial Research Repository (Travis Richardson, Flywheel)
Session 5: Managing Technical and Cultural Change in Research
Data sharing in generalist and institutional repositories will become increasingly important as NIH and other funders begin to require data sharing, but for some researchers this is a significant change in how they work with their data. This session addressed the challenges of changing how scientific resources are managed, supported, and used, considering personal and institutional incentives and how to align goals with these drivers of behavior and perspective.
Managing Technical and Cultural Change in Research (John Chodacki, California Digital Library)
Top Down, Bottom Up, and Everything In Between: ORCID’s Multifaceted Approach to Technical and Cultural Change (Liz Krznarich, ORCID)
Operationalization of Open Science at the Montreal Neurological Institute—Lessons Learned (Viviane Poupon, Tanenbaum Open Science Institute)
Generalist Repositories: NSF Policy and Perspective (Beth Plale, National Science Foundation)
Closing Remarks and Adjournment (Susan Gregurick, Office of Data Science Strategy)
Wrap-up Slides (Maryann Martone, University of California, San Diego, Workshop Co-Chair; Shelley Stall, American Geophysical Union, Workshop Co-Chair)