
Biomedical Data Repositories and Knowledgebases
About Biomedical Data Repositories and Knowledgebases
Accessible, well-maintained, and efficiently operated data resources are critical enablers of modern biomedical research. Data resources, through good data management practices, are the key to data and knowledge discovery, integration, and data reuse, as outlined by the FAIR Data Principles. To better support such a modern data resource ecosystem, NIH makes a distinction between data repositories and knowledgebases. While each activity is important for advancing biomedical research, data repositories and knowledgebases can have distinct functions, metrics for success, and sustainability needs.
Key Elements to Consider in Preparing a Data Sharing Plan Under NIH Extramural Support
Research results developed with NIH funding should be broadly available to the research community for furthering research.
RELATED: Read the Final NIH Policy for Data Management and Sharing (NOT-OD-21-013), released on Oct. 29, 2020, and the supplemental materials:
- Elements of an NIH Data Management and Sharing Plan (NOT-OD-21-014)
- Allowable Costs for Data Management and Sharing (NOT-OD-21-015)
- Selecting a Repository for Data Resulting from NIH-Supported Research (NOT-OD-21-016)
Biomedical data repositories accept submission of relevant data from the community to store, organize, validate, archive, preserve and distribute the data, in compliance with the FAIR Data Principles. Biomedical knowledgebases extract, accumulate, organize, annotate, and link the growing body of information that is related to and relies on core datasets. Both data repositories and knowledgebases contribute to the NIH data resource ecosystem.
Sustaining a healthy and productive data resource ecosystem means that each component:
- delivers scientific impact to the communities that it serves.
- employs and promotes good data management practices and efficient operation for quality and services.
- engages with its user community and continuously addresses that community's needs.
- supports a process for data life-cycle analysis, long-term preservation, and trustworthy governance.
Data repositories:
- Store, organize, validate, and make accessible the core data related to a particular system or systems
- Example: core data might include genome, transcriptome, and protein sequences, or imaging or spectroscopic data
- Curation mostly focuses on quality assurance and quality control (QA/QC)
Knowledgebases:
- Accumulate, organize, and link growing bodies of information related to core datasets
- Example: information about expression patterns, splicing variants, localization, protein-protein interaction and pathway networks related to an organism or set of organisms; publication information
- Traditionally require significant levels of human curation
Data Repositories
Biomedical data repositories accept submission of relevant data from the community to store, organize, validate, archive, preserve, and distribute the data, in compliance with the FAIR Data Principles. Biomedical knowledgebases, on the other hand, extract, accumulate, organize, annotate, and link the growing body of information that is related to and relies on core datasets.
Domain-Specific vs. Generalist
The landscape of biomedical data repositories is vast and evolving. Currently, NIH supports many repositories for sharing biomedical data. These data repositories focus on either a data type (e.g., sequence data, protein structure, or continuous physiological signals) or a biomedical research discipline (e.g., cancer, immunology, or clinical research data associated with a specific institute or center), and often form a nexus of resources for their research communities. NIH encourages researchers to use domain-specific, open-access data sharing repositories, whether funded by NIH or by other sources, whenever possible. When a domain-specific option is not available, researchers are encouraged to use an institutional repository.
Read more about the importance of data repositories for enhanced data access and sharing in a blog post by Dr. Susan Gregurick, Associate Director for Data Science and Director, Office of Data Science Strategy at NIH. Read the follow-up blog post discussing insights on the roles and uses of generalist repositories.
There are, however, instances in which researchers are unable to find a domain-specific or institutional repository applicable to their research project. In these cases, researchers are encouraged to share their data through a generalist repository, which accepts data regardless of data type or discipline. While domain-specific repositories focus on careful and detailed data curation, generalist repositories tend to focus on robust findability and accessibility of the data. The goal in either case is to provide researchers with an appropriate solution for making their data FAIR.
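The order of preference described above (a domain-specific repository first, then an institutional repository, then a generalist repository) can be sketched as a simple fallback. This is an illustrative sketch only: the `RepositoryOptions` class, the `choose_repository` function, and the repository names are hypothetical and are not part of any NIH tool or policy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RepositoryOptions:
    """Hypothetical record of the repositories a researcher found for a dataset."""
    domain_specific: Optional[str] = None
    institutional: Optional[str] = None
    generalist: Optional[str] = None


# Order of preference as described in the text above.
PREFERENCE_ORDER = ("domain_specific", "institutional", "generalist")


def choose_repository(options: RepositoryOptions) -> Optional[Tuple[str, str]]:
    """Return (kind, name) for the first available option, or None if none exist."""
    for kind in PREFERENCE_ORDER:
        name = getattr(options, kind)
        if name:
            return kind, name
    return None


if __name__ == "__main__":
    # No domain-specific repository fits, so the institutional one is chosen.
    found = RepositoryOptions(
        institutional="Example University Repository",  # placeholder name
        generalist="Example Generalist Repository",      # placeholder name
    )
    print(choose_repository(found))
```

A real decision would also weigh factors this sketch ignores, such as access controls, data size limits, and the guidance in NOT-OD-21-016.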
To better understand the landscape of data repositories, specifically how generalist repositories can and should be used to share and reuse NIH-funded data, NIH completed a one-year pilot program with an existing generalist repository from July 2019 to July 2020. Visit "Exploring a Generalist Repository for NIH-funded Data" to learn more.
Funding Opportunities
NIH released two funding opportunities on Jan. 17, 2020, to support biomedical data repositories and knowledgebases:
- Biomedical Data Repository (PAR-20-089)
- Biomedical Knowledgebase (PAR-20-097)
NIH Institutes and Centers have issued related notices:
- NIGMS (NOT-GM-20-014)
- NHGRI (NOT-HG-20-017)
News and Events
- July 28, 2020: Associate Director for Data Science Susan Gregurick shares "Some Insights on the Roles and Uses of Generalist Repositories" in a blog post
- June 25, 2020: NIH Figshare Instance Pilot to Conclude July 15
- June 4, 2020: Figshare hosts webinar titled "Publishing research in the NIH Figshare Instance: A data repository resource for NIH-funded researchers"
- May 12, 2020: Figshare hosts webinar titled "Publishing research in the NIH Figshare Instance: A data repository resource for NIH-funded researchers"
- April 24, 2020: NIH hosts webinar on "Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories"
- March 24, 2020: Figshare hosts a webinar titled "Publishing Your Datasets in the NIH Figshare Instance: An Introduction for Biomedical Researchers"
- March 6, 2020: Figshare hosts a webinar for NIH intramural researchers
- Feb. 19, 2020: NIH Hosts Virtual Workshop on Data Metrics
- Feb. 11-12, 2020: NIH Hosts Workshop on Role of Generalist Repositories to Enhance Data Discoverability and Reuse
- Jan. 17, 2020: NIH announces two new funding opportunities related to biomedical data repositories (PAR-20-089) and knowledgebases (PAR-20-097)
- Jan. 17, 2020: The Office of Science and Technology Policy issues a request for public comment on draft desirable characteristics of repositories for managing and sharing data resulting from federally funded research
- Jan. 15, 2020: Figshare hosts a webinar for librarians to learn about the NIH Figshare instance
- Nov. 20, 2019: Figshare hosts a webinar for NIH-funded researchers to learn about the NIH Figshare instance
- Sept. 18, 2019: Associate Director for Data Science Susan Gregurick shares thoughts on "Enhancing Data Sharing, One Dataset at a Time" in a blog post
- July 23, 2019: NIH-funded Researchers Invited to Use NIH Figshare
- April 8-9, 2019: NIH Hosts TRUST-worthy Data Repositories Workshop
This page last reviewed on March 2, 2021