Biomedical Data Repositories and Knowledgebases
About Biomedical Data Repositories and Knowledgebases
Accessible, well-maintained, and efficiently operated data resources are critical enablers of modern biomedical research. Data resources, through good data management practices, are the key to data and knowledge discovery, integration, and data reuse, as outlined by the FAIR Data Principles. To better support such a modern data resource ecosystem, NIH makes a distinction between data repositories and knowledgebases. While each activity is important for advancing biomedical research, data repositories and knowledgebases can have unique functions, metrics for success and sustainability needs.
NIH released two funding opportunities on Jan. 17, 2020, to support biomedical data repositories (PAR-20-089) and knowledgebases (PAR-20-097).
NIH Institutes and Centers have issued related notices:
Biomedical data repositories accept submission of relevant data from the community to store, organize, validate, archive, preserve and distribute the data, in compliance with the FAIR Data Principles. Biomedical knowledgebases extract, accumulate, organize, annotate, and link the growing body of information that is related to and relies on core datasets. Both data repositories and knowledgebases contribute to the NIH data resource ecosystem.
Sustaining a healthy and productive data resource ecosystem means that each component:
- delivers scientific impact to the communities that it serves.
- employs and promotes good data management practices and efficient operation for quality and services.
- engages with the user community and continuously addresses its needs.
- supports a process for data life-cycle analysis, long-term preservation, and trustworthy governance.
Data repositories:
- Store, organize, validate, and make accessible the core data related to a particular system or systems
- Example: core data might include genome, transcriptome, and protein sequences, imaging or spectroscopic data
- Curation mostly focuses on QA/QC
Knowledgebases:
- Accumulate, organize, and link growing bodies of information related to core datasets
- Example: information about expression patterns, splicing variants, localization, protein-protein interaction and pathway networks related to an organism or set of organisms; publication information
- Traditionally require significant levels of human curation
Domain-Specific vs. Generalist
The landscape of biomedical data repositories is vast and evolving. Currently, NIH supports many repositories for sharing biomedical data. These data repositories focus on either data type (e.g., sequence data, protein structure, or continuous physiological signals) or biomedical research discipline (e.g., cancer, immunology, or clinical research data associated with a specific institute or center), and often form a nexus of resources for their research communities. These domain-specific, open-access data sharing repositories, whether funded by NIH or other sources, are a good first choice for researchers, and NIH encourages their use.
Read more about the importance of data repositories for enhanced data access and sharing in a blog post by Dr. Susan Gregurick, Associate Director for Data Science and Director, Office of Data Science Strategy at NIH.
There are, however, instances in which researchers are unable to find a domain-specific repository applicable to their research project. In these cases, a generalist repository that accepts data regardless of data type or discipline may be a good fit. While domain-specific repositories focus on careful and detailed data curation, generalist repositories tend to focus on robust findability and accessibility of the data.
The goal in either case is to provide researchers with an appropriate solution for making their data FAIR.
Exploring a Generalist Repository with Figshare
NIH is piloting the use of an existing generalist repository, Figshare, to determine how biomedical researchers may use a generalist repository for sharing and reusing NIH-funded data. The NIH Figshare instance is currently available for use.
As part of the pilot project, the NIH Figshare instance offers some exclusive features:
- Public open access to NIH-funded data on Figshare.
- Support for larger datasets and data files of any type.
- Detailed, NIH-specific metadata to improve discoverability of your research and direct links to NIH funding sources and publications.
- User support from a Figshare team member with expertise in data curation and biomedical research, including complimentary review of data files and descriptions to ensure the highest quality and greatest discoverability.
This project will be evaluated in 2020 to determine whether NIH should add permanent generalist repositories to its supported repository landscape. Regardless of the outcome, researchers will still be able to access their published data via the main Figshare platform.
To learn more about the NIH Figshare instance, read the FAQs and view this flier. Specific information for researchers and librarians is also available, including a researcher-focused webinar and a librarian-focused webinar.
Four case studies are available:
- Storing and sharing x-ray scattering data on the NIH Figshare instance
For technical questions about NIH Figshare, email firstname.lastname@example.org.
For more information about the NIH Figshare pilot or to share your questions, ideas, or suggestions, email the Office of Data Science Strategy at email@example.com.
News and Events
- June 4, 2020: Figshare to host webinar titled "Publishing research in the NIH Figshare Instance: A data repository resource for NIH-funded researchers"
- May 12, 2020: Figshare hosts webinar titled "Publishing research in the NIH Figshare Instance: A data repository resource for NIH-funded researchers"
- April 24, 2020: NIH hosts webinar on "Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories"
- March 24, 2020: Figshare hosts a webinar titled "Publishing Your Datasets in the NIH Figshare Instance: An Introduction for Biomedical Researchers"
- March 6, 2020: Figshare hosts a webinar for NIH intramural researchers
- Feb. 19, 2020: NIH Hosts Virtual Workshop on Data Metrics
- Feb. 11-12, 2020: NIH Hosts Workshop on Role of Generalist Repositories to Enhance Data Discoverability and Reuse
- Jan. 17, 2020: NIH announces two new funding opportunities related to biomedical data repositories (PAR-20-089) and knowledgebases (PAR-20-097)
- Jan. 17, 2020: The Office of Science and Technology Policy issues a request for public comment on draft desirable characteristics of repositories for managing and sharing data resulting from federally funded research
- Jan. 15, 2020: Figshare hosts a webinar for librarians to learn about the NIH Figshare instance
- Nov. 20, 2019: Figshare hosts a webinar for NIH-funded researchers to learn about the NIH Figshare instance
- July 23, 2019: NIH-funded Researchers Invited to Use NIH Figshare
- April 8-9, 2019: NIH Hosts TRUST-worthy Data Repositories Workshop
This page last reviewed on May 22, 2020