Biomedical Data Repositories and Knowledgebases
About Biomedical Data Repositories and Knowledgebases
Accessible, well-maintained, and efficiently operated data resources are critical enablers of modern biomedical research. Data resources, through good data management practices, are the key to data and knowledge discovery, integration, and data reuse, as outlined by the FAIR Data Principles. To better support such a modern data resource ecosystem, NIH makes a distinction between data repositories and knowledgebases. While both activities are important for advancing biomedical research, data repositories and knowledgebases can have distinct functions, metrics for success, and sustainability needs.
Biomedical data repositories accept submission of relevant data from the community to store, organize, validate, archive, preserve and distribute the data, in compliance with the FAIR Data Principles. Biomedical knowledgebases extract, accumulate, organize, annotate, and link the growing body of information that is related to and relies on core datasets. Both data repositories and knowledgebases contribute to the NIH data resource ecosystem.
Sustaining a healthy and productive data resource ecosystem means that each component:
- delivers scientific impact to the communities that it serves.
- employs and promotes good data management practices and efficient operation for quality and services.
- engages with the user community and continuously addresses its needs.
- supports a process for data life-cycle analysis, long-term preservation, and trustworthy governance.
Data repositories:
- Store, organize, validate, and make accessible the core data related to a particular system or systems
- Example: core data might include genome, transcriptome, and protein sequences, imaging or spectroscopic data
- Curation mostly focuses on QA/QC
Knowledgebases:
- Accumulate, organize, and link growing bodies of information related to core datasets
- Example: information about expression patterns, splicing variants, localization, protein-protein interaction and pathway networks related to an organism or set of organisms; publication information
- Traditionally require significant levels of human curation
Related: The White House Office of Science and Technology Policy is seeking public comments on a draft set of desirable characteristics of data repositories used to locate, manage, share, and use data resulting from federally funded research. Learn more and comment by March 6, 2020.
Domain-Specific vs. Generalist
The landscape of biomedical data repositories is vast and evolving. Currently, NIH supports many repositories for sharing biomedical data. These data repositories focus on either a data type (e.g., sequence data, protein structure, or continuous physiological signals) or a biomedical research discipline (e.g., cancer, immunology, or clinical research data associated with a specific institute or center), and often form a nexus of resources for their research communities. These domain-specific, open-access data sharing repositories, whether funded by NIH or other sources, are a good first choice for researchers, and NIH encourages their use.
Read more about the importance of data repositories for enhanced data access and sharing in a blog post by Dr. Susan Gregurick, Associate Director for Data Science and Director, Office of Data Science Strategy at NIH.
There are, however, instances in which researchers are unable to find a domain-specific repository applicable to their research project. In these cases, a generalist repository that accepts data regardless of data type or discipline may be a good fit. While domain-specific repositories focus on careful and detailed data curation, generalist repositories tend to focus on robust findability and accessibility of the data.
The goal in either case is to provide researchers with an appropriate solution for making their data FAIR.
Exploring a Generalist Repository with Figshare
NIH is piloting the use of an existing generalist repository, Figshare, to determine how biomedical researchers may use a generalist repository for sharing and reusing NIH-funded data. The NIH Figshare instance is currently available for use.
As part of the pilot project, the NIH Figshare instance offers some exclusive features:
- Support for larger datasets.
- Updated metadata to improve discoverability of research.
- Courtesy data validity checks and review of dataset title and associated text by the Figshare team.
- A view of NIH-funded data on Figshare.
This project will be evaluated in 2020 to determine whether NIH should add permanent generalist repositories to its supported repository landscape. Regardless of the outcome, researchers will still be able to access their published data via the main Figshare platform.
For technical questions about NIH Figshare, email email@example.com.
For more information about the NIH Figshare pilot or to share your questions, ideas, or suggestions, email the Office of Data Science Strategy at firstname.lastname@example.org.
News and Events
- UPCOMING March 6, 2020: Figshare to host a webinar for NIH intramural researchers
- UPCOMING Feb. 19, 2020: NIH to Host Virtual Workshop on Data Metrics
- Feb. 11-12, 2020: NIH to Host Workshop on Role of Generalist Repositories to Enhance Data Discoverability and Reuse
- Jan. 17, 2020: NIH announces two new funding opportunities related to biomedical data repositories (PAR-20-089) and knowledgebases (PAR-20-097)
- Jan. 17, 2020: The Office of Science and Technology Policy issues a request for public comment on draft desirable characteristics of repositories for managing and sharing data resulting from federally funded research
- Jan. 15, 2020: Figshare hosts a webinar for librarians to learn about the NIH Figshare instance
- Nov. 20, 2019: Figshare hosts a webinar for NIH-funded researchers to learn about the NIH Figshare instance
- July 23, 2019: NIH-funded Researchers Invited to Use NIH Figshare
This page last reviewed on February 14, 2020