NIH Workshop on the Role of Generalist and Institutional Repositories to Enhance Data Discoverability and Reuse

UPDATE: On July 16, 2020, the workshop co-chairs and participating generalist repositories published a generalist repository comparison chart.

The Office of Data Science Strategy at the National Institutes of Health (NIH) and the National Library of Medicine hosted a workshop on the Role of Generalist Repositories to Enhance Data Discoverability and Reuse on Feb. 11–12, 2020. The workshop was held at the Lister Hill Auditorium on the NIH main campus in Bethesda, MD, and a workshop summary is available.

The primary goals of the workshop were to:

  • Learn how generalist repositories see themselves in the larger biomedical data repository landscape.
  • Understand how institutional data repositories are creating suites of solutions for their researchers and how they see generalist repositories fitting into this landscape.
  • Consider desired characteristics of data repositories and how they relate to institutional expectations of data storage and preservation solutions.
  • Explore adoption of common infrastructure, standards, and federated search solutions to enable greater discoverability of NIH research data across federated data repositories.
  • Address the role of data curators in ensuring that data and metadata are sufficiently well curated to enhance discovery and enable reuse.

Recordings are available for each day of the workshop (Day 1 and Day 2). Available presentations are accessible by clicking the name of the presentation in the agenda below.


February 11, 2020

9:00 a.m. – 9:30 a.m. Setting the Stage
Patricia Flatley Brennan, National Library of Medicine

Maryann Martone, University of California, San Diego, Workshop Co-Chair
Shelley Stall, American Geophysical Union, Workshop Co-Chair

9:30 a.m. – 10:15 a.m. Keynote Address
A Blueprint for the Research Data Landscape
Sayeed Choudhury, Johns Hopkins University

10:15 a.m. – 11:45 a.m. Session 1: Introducing the Generalist and Institutional Repository Landscape
This session will give a quick introduction to multiple generalist repositories to help establish a common understanding of how they operate. Each speaker will introduce their platform and describe its key characteristics.

Vivli: A Global Clinical Trial Data Sharing Platform
Ida Sim, Vivli

Mendeley Data: Enhancing Data Discovery, Sharing, and Reuse
Anita de Waard, Elsevier

Building Policy-Compliant Infrastructure for Research Data
Mark Hahnel, Figshare

Community-Minded Data Publishing at Dryad
Daniella Lowenberg, California Digital Library

Zenodo: Specialists Welcome!
Tim Smith, CERN

Dataverse: A Software, a Community, a Network of Repositories
Mercè Crosas, Harvard Dataverse

11:45 a.m. – 1:00 p.m. Lunch

1:00 p.m. – 2:00 p.m. Session 2: Enabling Data Discovery
In a complicated repository landscape, where datasets may be located in any one of thousands of existing repositories and often carry little metadata, discovering datasets can be difficult. Although users may know of relevant subject-specific repositories, the discoverability challenge is compounded when datasets are located in generalist or institutional repositories where a user might not think to look. This session will explore techniques for enabling discovery of data in generalist and institutional repositories, including the development of a common metadata model for data, expert curation to enhance metadata, and linking of digital research objects through identifiers.

Dataset Metadata Model (DATMM): A Common Model to Drive Discovery and Adoption
Pete Seibert, National Library of Medicine

The Role of Institutional Repositories in Data Discovery
Lisa Johnston, University of Minnesota

PID Graphs: Muggle Scientists Develop Harry Potter “Marauder’s Map” Technology
Luc Boruta, Thunken

2:00 p.m. – 3:15 p.m. Session 3: Enabling Data Reuse
This session will consider several aspects of data reuse, including two different “levels” of reuse and their implications for how much preparation the data require: (1) reusing data to repeat the findings in the publication with which the data are associated and (2) reusing data to address new scientific questions. The second use case often also requires that data be combined with data from other sources, sometimes of a similar type and sometimes of a different type.

What Researchers Need When Deciding Whether to Reuse Data: Experiences from Three Disciplines
Ixchel Faniel, OCLC, Inc.

Collaboration and Re-Use: Experiences with Institutional Data Catalogs
Nicole Contaxis, New York University

What Role Can Publishers Play in the Open Data Ecosystem?
Varsha Khodiyar, Springer Nature

3:15 p.m. – 3:30 p.m. Break

3:30 p.m. – 4:45 p.m. Breakout Groups: Identifying Common Practices in Discoverability and Reusability

Groups will be asked to address specific challenges such as:

  • What information needs to be included with datasets to determine whether the data are fit for use in future research in a similar or different area?
  • Are there solutions that can be implemented now to support federated queries to improve data discovery?
  • Are there current solutions for discovering whether an appropriate data repository already exists for a given dataset type?
  • What is the minimal set of functions that a generalist repository should support to participate in the biomedical ecosystem?
  • How might all data associated with specific NIH grant funding be discovered, accessed, and understood?
  • What licensing is recommended to encourage research data to be as open as possible but as protected as necessary?
  • How can datasets that have been preserved in different data repositories be linked so that they can be discovered?

4:45 p.m. – 5:00 p.m. Recap of Day 1 and Recess

February 12, 2020

8:30 a.m. – 9:30 a.m. Report Back from Day 1 Breakouts on Data Discovery and Data Reuse

9:30 a.m. – 10:45 a.m. Session 4: Facilitating Reproducibility
This session will focus on how generalist and institutional repositories support reproducibility of the findings of particular experiments and publications as another major use case for effective sharing of data.

Librarian Role in Facilitating Reproducibility through Repositories
Melissa Rethlefsen, University of Florida

Reproducible and Rigorous Research on Open Science Framework (OSF)
Nici Pfeiffer, Center for Open Science

Perspectives on Reuse and Reproducibility from a Commercial Research Repository
Travis Richardson, Flywheel

10:45 a.m. – 11:00 a.m. Break

11:00 a.m. – 12:15 p.m. Session 5: Managing Technical and Cultural Change in Research
Data sharing in generalist and institutional repositories will become increasingly important as the NIH and other funders begin to require data sharing, but for some researchers, this is a significant change in how they work with their data. This session will address the challenges in changing how scientific resources are managed, supported, and used—considering personal and institutional incentives and how to align goals with such drivers of behavior and perspective.

Managing Technical and Cultural Change in Research
John Chodacki, California Digital Library

Top Down, Bottom Up, and Everything In Between: ORCID’s Multifaceted Approach to Technical and Cultural Change
Liz Krznarich, ORCID

Operationalization of Open Science at the Montreal Neurological Institute—Lessons Learned
Viviane Poupon, Tanenbaum Open Science Institute

Generalist Repositories: NSF Policy and Perspective
Beth Plale, National Science Foundation

12:00 p.m. – 12:30 p.m. Closing Remarks and Adjournment
Susan Gregurick, Office of Data Science Strategy

Wrap-up Slides
Maryann Martone, University of California, San Diego, Workshop Co-Chair
Shelley Stall, American Geophysical Union, Workshop Co-Chair

