Biomedical Data Repositories and Knowledgebases

About Biomedical Data Repositories and Knowledgebases

To better support a modern data resource ecosystem, NIH makes a distinction between data repositories and knowledgebases. While both are important for advancing biomedical research, data repositories and knowledgebases can have unique functions, metrics for success, and sustainability needs.

Sustaining a healthy and productive data resource ecosystem means that each component:

  • Delivers scientific impact to the communities that it serves
  • Employs and promotes good data management practices and provides efficient operation for quality and services
  • Engages with the user community and continuously addresses their needs
  • Supports a process for data life-cycle analysis
  • Engages in exploration of the current landscape of biomedical data repository metrics to help NIH better understand how datasets and repositories are used
  • Provides long-term preservation and trustworthy governance

Both data repositories and knowledgebases contribute to the NIH data resource ecosystem.

Data Repositories

  • Biomedical data repositories accept the submission of relevant data from the research community to store, organize, validate, archive, preserve, and distribute data in compliance with the FAIR Data Principles.
  • Curation focuses on quality assurance and quality control.
  • Example: core data might include genome, transcriptome, and protein sequences, or imaging or spectroscopic data.

Knowledgebases

  • Biomedical knowledgebases extract, accumulate, organize, annotate, and link the growing body of information that is related to, and relies on, core datasets.
  • Significant levels of human curation are traditionally required.
  • Example: information about expression patterns, splicing variants, localization, protein-protein interaction, and pathway networks related to an organism or set of organisms; publication information

View Trans-NIH BioMedical Informatics Coordinating Committee (BMIC) Data Sharing Resources.

Metrics and Lifecycle

Data repositories and knowledgebases exist on a spectrum of ability and readiness to adopt the desirable characteristics aligned with the FAIR and TRUST principles. Because research data resources, repositories, and datasets are critical to biomedical research, it is essential to develop metrics that evaluate the usage, utility, and impact of a given repository. To that end, NIH conducted a survey and organized a workshop to better understand both existing and desired lifecycle metrics. NIH then issued a report presenting the findings, which describe the metrics currently used within the biomedical repository community and can inform future NIH efforts to develop this space and to understand patterns of use across datasets and repositories.

Open Funding Opportunities

  • (Open) Enhancement and Management of Established Biomedical Data Repositories and Knowledgebases (PAR-23-237) August 31, 2023
  • (Open) Early-stage Biomedical Data Repositories and Knowledgebases (PAR-23-236) August 31, 2023
  • FAQs for PAR-23-237 and PAR-23-236

Closed Funding Opportunities

  • (Closed) Support for existing data repositories to align with FAIR and TRUST principles and evaluate usage, utility, and impact (NOT-OD-23-044) FAQs January 5, 2023
  • (Closed) Support for existing data repositories to align with FAIR and TRUST principles and evaluate usage, utility, and impact (NOT-OD-22-069) January 31, 2022
  • (Closed) Administrative Supplements Available to Strengthen NIH-Funded Biomedical Data Repositories (NOT-OD-21-089), April 6, 2021
  • (Closed) NIH released two funding opportunities to support biomedical data repositories and knowledgebases, January 17, 2020

Funded Awards

PAR-20-089 and PAR-20-097 Award Recipients
Grant Number | Award IC | Principal Investigator | Project Title
GM148372-01 | NIGMS | Nuno Bandeira | Global proteomics mass spectrometry data sharing infrastructure
LM013115-04 | NLM | Adam R. Ferguson | Pan-Neurotrauma Data Commons
GM150793-01 | NIGMS | Jeffrey C Hoch | Biological Magnetic Resonance Data Bank Base
NS132940-01 | NINDS | Jonathan Rosand | An Imaging Repository for the Cerebrovascular Disease Knowledge Portal (iCDKP)
AA029959-01 | NIAAA | Samuel S. Wu | Southern HIV and Alcohol Research Consortium Biomedical Data Repository
GM144308-01 | NIGMS | Anita Elzbieta Bandrowski | From RRID to Resource Watch: A Knowledgebase of Biomedical Research Resources
ES035386-01 | NIEHS | Dinesh Barupal | Exposome Correlation and Interpretation Database (ECID)
HG007822-09 | NHGRI | Alex Bateman | UniProt: A Protein Sequence and Function Resource for Biomedical Science
AI177622-01 | NIAID | Lindsay G Cowell | i-AKC: Integrated AIRR Knowledge Commons
GM144232-01 | NIGMS | Michael K. Gilson | BindingDB: An Open Knowledgebase of Protein-Small Molecule Interactions
CA275783-01 | NCI | Malachi Griffith | Creation of a knowledgebase of high quality assertions of the clinical actionability of somatic variants in cancer
GM142435-01 | NIGMS | Marc S. Halfon | REDfly: The regulatory sequence resource for Drosophila and other insects
HG012556-01 | NHGRI | Carol Marie Hamilton | Establishing the PhenX Toolkit as a Biomedical Knowledgebase
AI171008-01 | NIAID | Yongqun He | VIOLIN 2.0: Vaccine Information and Ontology LInked kNowledgebase
GM150703-01 | NIGMS | Peter D Karp | Knowledgebase of Escherichia coli Genome and Metabolism
HG010615-05 | NHGRI | Teri Ellen Klein | PharmGKB
AI162625-03 | NIAID | Elliot J. Lefkowitz | Virus Taxonomy: A Community Knowledgebase Supporting Virus Research
ES033155-01 | NIEHS | Carolyn J. Mattingly | Comparative Toxicogenomics Database (CTD)
HG012750-01 | NHGRI | Nicola Mulder | African Genomics Data Hub Biomedical Knowledgebase
GM143402-02 | NIGMS | Mark A. Musen | BioPortal: An Expansive Knowledgebase of Biomedical Entities and Relations
HG006370-11 | NHGRI | Helen Parkinson | Strengthening community knowledge bases for genetic association studies and polygenic scores, the GWAS and PGS Catalogs
HG012557-01 | NHGRI | Lynn Marie Schriml | The Human Disease Ontology: An integrated, mechanistic knowledge resource for biomedical research.
HG012198-01 | NHGRI | Lincoln D. Stein | Reactome: An Open Knowledgebase of Human Pathways.
HG002223-24 | NHGRI | Paul W Sternberg | WormBase: a core data resource for C. elegans and other nematodes
HG012212-01 | NHGRI | Paul D. Thomas | Gene Ontology Consortium and Knowledgebase
GM146616-01-02 | NIGMS | Michael Tiemeyer | GlyGen growth and evolution into a central resource for glycans and glycoconjugates
ES035214-01 | NIEHS | Alexander Tropsha | Supporting Biomedical Discovery with the ROBOKOP Graph Knowledgebase.
CA265879-01 | NCI | Jeremy Lyle Warner | Enhancing the HemOnc Knowledgebase of Chemotherapy Drugs and Regimens

Engage with the community by joining the drkb@list.nih.gov listserv. Instructions on how to join can be found here.

This page last reviewed on April 11, 2024