Introducing the 2025-2030 NIH Strategic Plan for Data Science: What Researchers Need to Know

Wednesday, June 4, 2025

By: Dr. Susan Gregurick, Associate Director of Data Science, NIH

Exciting news from NIH! The final 2025-2030 Strategic Plan for Data Science has just been released, charting the course for how biomedical data will transform health research over the next five years. As former NIH Director Dr. Monica Bertagnolli notes in her opening letter, "The NIH Biomedical Data Ecosystem will bring increasingly effective data and tools that enable the broadest research community possible to contribute to our mission to bring better health to all people."

Discoveries and innovations in health would be impossible without the many dedicated researchers and scientists who collaborate and partner with or work for NIH. If that is you, thank you for your contributions to our mission! Let's dive into the five key goals that will shape your research data landscape:

Goal 1: Improve Data Management and Sharing Capabilities

Remember the NIH Data Management and Sharing Policy that went into effect in January 2023? NIH is doubling down on supporting and effectively implementing the policy. This goal focuses on three critical objectives: supporting the biomedical community in managing, sharing, and sustaining data; enhancing FAIR (Findable, Accessible, Interoperable, and Reusable) data principles and harmonization; and strengthening the NIH data repository ecosystem.

Expect new tools for preparing and annotating data, improved metadata quality standards, and a data steward program to guide sharing practices. NIH will also work with Tribal communities to develop appropriate data governance frameworks that respect Indigenous data sovereignty through CARE (Collective benefit, Authority to control, Responsibility, and Ethics) principles. For researchers working with sensitive data, streamlined processes for controlled data access are under development.

Goal 2: Enhance Human-Derived Data for Research

Clinical and real-world data offer incredible opportunities but are notoriously tricky to work with. This goal tackles improving access to clinical data sources, adopting health IT standards like Fast Healthcare Interoperability Resources (FHIR®) and the Trusted Exchange Framework and Common Agreement (TEFCA™), enhancing environmental and lifestyle data integration (the "exposome"), and providing cross-disciplinary training.

You will see new suggested methods for collecting informed consent when combining data from multiple sources, home health care device data standards, and federated frameworks allowing sensitive data use in clinical research. The plan specifically mentions developing governance frameworks for data linkages and real-world pilots integrating environmental factors with clinical common data elements (CDEs)—particularly valuable for understanding certain determinants of health.

Goal 3: Advance Software, Computational Methods, and Artificial Intelligence

Biomedical research generates massive amounts of data, and NIH wants to ensure you have cutting-edge tools to analyze it all. This goal balances investments across software development, computational methods, and AI applications. You'll see enhanced support for community-developed software tools with better visualization capabilities and established sustainability metrics following FAIR principles. You'll also see callouts to programs like NCI's Information Technology for Cancer Research (ITCR), which have funded support for tools across their entire lifecycle, helping ensure that the software you rely on doesn't disappear when grant funding ends!

Beyond traditional analysis approaches, the plan explores exciting computational frontiers like digital twin modeling, privacy-preserving computing, and integrating theory-based modeling with data-driven insights. For those interested in AI applications, the AIM-AHEAD program will continue building nationwide networks to democratize computational capabilities across institutions. Whether you're a computational expert or just beginning to incorporate advanced analytical methods into your research, NIH is working to provide accessible and sustainable tools that meet the growing complexity of biomedical data challenges.

Goal 4: Support a Federated Biomedical Research Data Infrastructure

Are you tired of data silos? NIH is working toward a federated data ecosystem where researchers can more easily connect disparate datasets across platforms like NHLBI's BioData Catalyst®, NCI's Cancer Research Data Commons (CRDC), the All of Us program, and the NIH database of Genotypes and Phenotypes (dbGaP) through the NIH Cloud Platform Interoperability (NCPI) program. This approach maintains institutional control of data while standardizing access processes and interfaces.

The implementation will focus on creating a robust connected data resource ecosystem with improved interoperability, developing new search and discovery capabilities through enhanced metadata standards, and exploring new computing paradigms. The Researcher Auth Service (RAS) initiative will expand single sign-on capabilities across NIH data resources, streamlining your access to data while maintaining privacy and security standards.

Goal 5: Strengthen the Data Science Community

Data science skills are increasingly essential in all areas of biomedical research. This goal addresses expanding data science expertise at every level—from pre-college students to established investigators. It includes increasing training opportunities, expanding the data science workforce, enhancing collaboration within NIH's Intramural Research Program, and building capacity for every researcher who works with or for NIH.

Look for expanded cross-disciplinary training programs, new mentorship initiatives, and greater integration of data science into existing research training. The successful DATA Scholars program will continue growing NIH's internal data science capacity, while partnerships with programs like the Native American Research Centers for Health (NARCH) will help democratize data science expertise across institutions nationwide.

In conclusion…

This strategic plan builds on significant progress made since the first Data Science Strategic Plan, with a renewed focus on partnership, capacity-building, and responsible innovation. As the research landscape evolves with unprecedented speed, NIH is working to ensure these powerful data tools and technologies benefit all Americans through more comprehensive scientific discoveries.

Want to learn more? Check out the full plan at the NIH Office of Data Science Strategy website!

June Data Sharing and Reuse Seminar

Friday, June 13, 2025

Fuhai Li, Ph.D., will present "Transformative AI for Deep Mining of Omics and Literature Data" from 12:00 p.m. to 1:00 p.m. EDT.

About the Seminar

Transformative AI models are powerful tools for large-scale mining of biomedical data. In this talk, I will present novel approaches that we have developed to combine large language models (LLMs) with graph-based AI to integrate and analyze vast omics datasets for identifying disease targets, mapping signaling pathways, and predicting effective drug combinations. The key component of this novel AI system is the text-numeric graph (TNG), a structure in which graph entities and associations carry both textual and numeric attributes. I will also introduce an AI multi-agent system that we have developed to accelerate biomedical discovery by unifying omics data analysis, literature-based deep search, and reasoning to generate novel scientific hypotheses. I will then showcase the applications of these novel AI tools with analysis of heterogeneous pharmacogenomics data for cancer research.

About the Speaker

Fuhai Li, Ph.D., Associate Professor, School of Medicine and Computer Science & Engineering, Washington University in St. Louis

Dr. Li is an Associate Professor in the Institute for Informatics (I2), School of Medicine and Computer Science & Engineering (CSE), Washington University in St. Louis (WashU). His research interests lie in developing large-scale, complex graph- and AI-based approaches to integrating and mining massive, diverse, and heterogeneous multimodal data for identifying biomarkers, uncovering signaling mechanisms of diseases, and discovering novel synergistic drugs and combination therapies. Before joining WashU in 2018, Dr. Li was an assistant professor in the Department of Biomedical Informatics at Ohio State University. He received his Ph.D. in applied mathematics from Beijing University and completed his postdoctoral training in computational biology at Harvard Medical School.

About the Seminar Series

The seminar is open to the public and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Allison Hurst at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series to highlight examples of data sharing and reuse on the second Friday of each month at noon ET. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.