Biomedical Data Repositories and Knowledgebases

About Biomedical Data Repositories and Knowledgebases

To better support a modern data resource ecosystem, NIH makes a distinction between data repositories and knowledgebases. While both are important for advancing biomedical research, data repositories and knowledgebases can have unique functions, metrics for success, and sustainability needs.

Sustaining a healthy and productive data resource ecosystem means that each component:

  • Delivers scientific impact to the communities that it serves
  • Employs and promotes good data management practices and provides efficient operation for quality and services
  • Engages with the user community and continuously addresses their needs
  • Supports a process for data life-cycle analysis
  • Engages in exploration of the current landscape of biomedical data repository metrics to help NIH better understand how datasets and repositories are used
  • Provides long-term preservation and trustworthy governance

Both data repositories and knowledgebases contribute to the NIH data resource ecosystem.

Data Repositories

  • Biomedical data repositories accept the submission of relevant data from the research community to store, organize, validate, archive, preserve, and distribute data in compliance with the FAIR Data Principles.
  • Curation focuses on quality assurance and quality control.
  • Example: core data might include genome, transcriptome, and protein sequences, or imaging or spectroscopic data


Knowledgebases

  • Biomedical knowledgebases extract, accumulate, organize, annotate, and link the growing body of information that is related to, and relies on, core datasets.
  • Significant levels of human curation are traditionally required.
  • Example: information about expression patterns, splicing variants, localization, protein-protein interaction, and pathway networks related to an organism or set of organisms; publication information

View Trans-NIH BioMedical Informatics Coordinating Committee (BMIC) Data Sharing Resources.

Metrics and Lifecycle

Data repositories and knowledgebases exist on a spectrum of ability and readiness to adopt the desirable characteristics aligned with FAIR and TRUST principles. Because research data resources, repositories, and datasets are critical, metrics that evaluate the usage, utility, and impact of a given repository are essential. To that end, NIH conducted a survey and organized a workshop to better understand both existing and desired lifecycle metrics. NIH then issued a report presenting the findings on metrics currently used within the biomedical repository community; these findings can inform future NIH efforts to develop this space and to understand patterns of use across datasets and repositories.

Funding Opportunities

  • (Open) Enhancement and Management of Established Biomedical Data Repositories and Knowledgebases (PAR-23-237)
  • (Open) Early-stage Biomedical Data Repositories and Knowledgebases (PAR-23-236)
  • (Closed) Biomedical Data Repository (PAR-20-089) January 17, 2020
  • (Closed) Biomedical Knowledgebase (PAR-20-097) January 17, 2020
  • FAQs for PAR-20-089 and PAR-20-097
  • (Closed) Support for existing data repositories to align with FAIR and TRUST principles and evaluate usage, utility, and impact (NOT-OD-23-044) FAQs January 5, 2023
  • (Closed) Support for existing data repositories to align with FAIR and TRUST principles and evaluate usage, utility, and impact (NOT-OD-21-089) April 6, 2021
  • (Closed) Support for existing data repositories to align with FAIR and TRUST principles and evaluate usage, utility, and impact (NOT-OD-22-069) January 31, 2022


This page last reviewed on September 1, 2023