Biomedical Data Repositories and Knowledgebases

About Biomedical Data Repositories and Knowledgebases

Accessible, well-maintained, and efficiently operated data resources are critical enablers of modern biomedical research. Through good data management practices, data resources are key to data and knowledge discovery, data integration, and data reuse, as outlined by the FAIR Data Principles. To better support such a modern data resource ecosystem, NIH distinguishes between data repositories and knowledgebases. Although both are important for advancing biomedical research, data repositories and knowledgebases can have distinct functions, metrics for success, and sustainability needs.

Funding Opportunities

NIH released two funding opportunities on Jan. 17 to support biomedical data repositories and knowledgebases:

NIH Institutes and Centers have issued related notices:

Biomedical data repositories accept submission of relevant data from the community to store, organize, validate, archive, preserve and distribute the data, in compliance with the FAIR Data Principles. Biomedical knowledgebases extract, accumulate, organize, annotate, and link the growing body of information that is related to and relies on core datasets. Both data repositories and knowledgebases contribute to the NIH data resource ecosystem.

Sustaining a healthy and productive data resource ecosystem means that each component:

  • delivers scientific impact to the communities it serves.
  • employs and promotes good data management practices and operates efficiently to ensure quality data and services.
  • engages with its user community and continuously addresses that community's needs.
  • supports a process for data life-cycle analysis, long-term preservation, and trustworthy governance.

 

Data Repositories
  • Store, organize, validate, and make accessible the core data related to a particular system or systems
  • Example: core data might include genome, transcriptome, and protein sequences, as well as imaging or spectroscopic data
  • Curation mostly focuses on quality assurance and quality control (QA/QC)
Knowledgebases
  • Accumulate, organize, and link growing bodies of information related to core datasets
  • Example: information about expression patterns, splicing variants, localization, protein-protein interaction and pathway networks related to an organism or set of organisms; publication information
  • Traditionally require significant levels of human curation

Data Repositories


Domain-Specific vs. Generalist

The landscape of biomedical data repositories is vast and evolving. NIH currently supports many repositories for sharing biomedical data. These repositories focus either on a data type (e.g., sequence data, protein structure, or continuous physiological signals) or on a biomedical research discipline (e.g., cancer, immunology, or clinical research data associated with a specific Institute or Center), and they often form a nexus of resources for their research communities. NIH encourages researchers to use domain-specific, open-access data sharing repositories, whether funded by NIH or by other sources, whenever possible. When a domain-specific option is not available, researchers are encouraged to use an institutional repository.

Read more about the importance of data repositories for enhanced data access and sharing in a blog post by Dr. Susan Gregurick, Associate Director for Data Science and Director, Office of Data Science Strategy at NIH.

There are, however, instances in which researchers cannot find a domain-specific or institutional repository suited to their research project. In these cases, researchers are encouraged to share their data through a generalist repository, which accepts data regardless of data type or discipline. While domain-specific repositories focus on careful and detailed data curation, generalist repositories tend to focus on robust findability and accessibility of the data. In either case, the goal is to give researchers an appropriate way to make their data FAIR.

To better understand the landscape of data repositories – specifically how generalist repositories can and should be used to share and reuse NIH-funded data – NIH completed a one-year pilot program with an existing generalist repository. Visit "Exploring a Generalist Repository for NIH-funded Data" to learn more.



This page last reviewed on July 17, 2020