Biomedical Data Repositories and Knowledgebases

About Biomedical Data Repositories and Knowledgebases

Accessible, well-maintained, and efficiently operated data resources are critical enablers of modern biomedical research. Data resources, through good data management practices, are the key to data and knowledge discovery, integration, and data reuse, as outlined by the FAIR Data Principles. To better support such a modern data resource ecosystem, NIH makes a distinction between data repositories and knowledgebases. While each activity is important for advancing biomedical research, data repositories and knowledgebases can have unique functions, metrics for success and sustainability needs.

Funding Opportunities

NIH released two funding opportunities on Jan. 17 to support biomedical data repositories and knowledgebases:

NIH Institutes and Centers have issued related notices:

Biomedical data repositories accept submission of relevant data from the community to store, organize, validate, archive, preserve and distribute the data, in compliance with the FAIR Data Principles. Biomedical knowledgebases extract, accumulate, organize, annotate, and link the growing body of information that is related to and relies on core datasets. Both data repositories and knowledgebases contribute to the NIH data resource ecosystem.

Sustaining a healthy and productive data resource ecosystem means that each component:

  • delivers scientific impact to the communities that it serves.
  • employs and promotes good data management practices and efficient operation for quality and services.
  • engages with the user community and continuously addresses its needs.
  • supports a process for data life-cycle analysis, long-term preservation, and trustworthy governance.

 

Data Repositories
  • Store, organize, validate, and make accessible the core data related to a particular system or systems
  • Example: core data might include genome, transcriptome, and protein sequences, or imaging and spectroscopic data
  • Curation mostly focuses on QA/QC

Knowledgebases
  • Accumulate, organize, and link growing bodies of information related to core datasets
  • Example: information about expression patterns, splicing variants, localization, protein-protein interaction and pathway networks related to an organism or set of organisms; publication information
  • Traditionally require significant levels of human curation

Data Repositories

Biomedical data repositories accept submission of relevant data from the community to store, organize, validate, archive, preserve and distribute the data, in compliance with the FAIR Data Principles. Biomedical knowledgebases, on the other hand, extract, accumulate, organize, annotate, and link the growing body of information that is related to and relies on core datasets.

Domain-Specific vs. Generalist

The landscape of biomedical data repositories is vast and evolving. Currently, NIH supports many repositories for sharing biomedical data. These data repositories focus on either data type (e.g., sequence data, protein structure, continuous physiological signals, etc.) or biomedical research discipline (cancer, immunology, or clinical research data associated with a specific institute or center), and often form a nexus of resources for their research communities. These domain-specific, open-access data sharing repositories, whether funded by NIH or other sources, are a good first choice for researchers, and NIH encourages their use.

Read more about the importance of data repositories for enhanced data access and sharing in a blog post by Dr. Susan Gregurick, Associate Director for Data Science and Director, Office of Data Science Strategy at NIH.

There are, however, instances in which researchers are unable to find a domain-specific repository applicable to their research project. In these cases, a generalist repository that accepts data regardless of data type or discipline may be a good fit. While domain-specific repositories focus on careful and detailed data curation, generalist repositories tend to focus on robust findability and accessibility of the data.

The goal in either case is to provide researchers with an appropriate solution for making their data FAIR.

Exploring a Generalist Repository with Figshare

NIH is piloting an existing generalist repository, Figshare, to determine how biomedical researchers may use a generalist repository for sharing and reusing NIH-funded data. The NIH Figshare instance is currently available for use.

As part of the pilot project, the NIH Figshare instance offers some exclusive features:

  • Public open access to NIH-funded data on Figshare.
  • Support for larger datasets and data files of any type.
  • Detailed, NIH-specific metadata to improve discoverability of your research and direct links to NIH funding sources and publications.
  • User support from a Figshare team member with expertise in data curation and biomedical research, including complimentary review of data files and descriptions to ensure the highest quality and greatest discoverability.

This project will be evaluated in 2020 to determine whether NIH should add permanent generalist repositories to its supported repository landscape. Regardless of the outcome, researchers will still be able to access their published data via the main Figshare platform.

To learn more about the NIH Figshare instance, read the FAQs and view this flier. Specific information for researchers and librarians is also available, including a researcher-focused webinar and a librarian-focused webinar.

Three case studies are available:

For technical questions about NIH Figshare, email nihsupport@figshare.com.

For more information about the NIH Figshare pilot or to share your questions, ideas, or suggestions, email the Office of Data Science Strategy at datascience@nih.gov.




This page last reviewed on March 24, 2020