Today I am excited to announce the next phase of NIH investment in Data Science!
As you know, NIH is committed to supporting data science to enable biomedical researchers to capitalize on the transformative power of big data. To date, much of our effort has focused on developing tools to improve researchers’ ability to mine increasingly complex data sets, training the workforce to ensure a pipeline of skilled data scientists, and fostering open science to increase the utility of data through sharing. These efforts have been multi-pronged. Data science issues that are common across all of NIH’s 27 Institutes and Centers (ICs) have been managed through a trans-NIH effort within the NIH Office of the Director, overseen by Phil Bourne, the former Associate Director for Data Science (ADDS). With Phil’s impending departure to a leadership position at the University of Virginia, I am pleased to be stepping into this role as the interim ADDS (iADDS). In addition, the NIH ICs have continued to address data science issues that are unique to their research priorities.
The amount and diversity of data generated by NIH-funded research programs continue to grow rapidly; safe, scalable storage solutions, new analytic approaches, and an adaptable workforce are urgently needed. As this unprecedented revolution in biomedical information unfolds and NIH looks to the future of data science, some pitfalls remain. We must ensure that researchers can make meaningful use of this increasingly massive biomedical data resource. It is timely and critical for NIH to identify and implement new strategies to improve data discoverability, utility, and sustainability, including moving many large data sets into the cloud and making them adhere to the FAIR Principles: Findable, Accessible, Interoperable, and Reusable. Meeting this challenge will require both leveraging the findings from the Big Data to Knowledge (BD2K) program and a major infusion of resources.
Over the next several months, I will be working with NIH’s 27 ICs to develop efficient strategies to improve data discoverability, utility, and sustainability for the biomedical research community. I will work with the Division of Program Coordination, Planning, and Strategic Initiatives to engage the various pilot projects that are identifying critical points of success for future data science efforts. As part of this effort, the second phase of NIH’s cornerstone data science initiative, the BD2K program, will include investments to accelerate progress in the development of these new strategies through a pilot program for an NIH Data Commons. The first phase of BD2K, which launched in FY2014, invested $200 million in grant awards to address some major data science challenges and stimulate data-driven discovery. These awards will continue through their award end dates, and lessons from this initial investment will help inform the Commons pilot as well as data science efforts supported across the NIH from FY2018 through FY2021. Previously issued BD2K funding opportunities with application due dates in August 2017 and beyond will be re-scoped, and new funding opportunities will be issued to support the implementation of the new strategies.
At the same time, the NLM Board of Regents is devising its strategic plan to address our future, including how to become the intellectual and programmatic hub for data science at the NIH, per the recommendations from the Advisory Committee to the (NIH) Director. NLM’s efforts will complement the trans-NIH effort to develop a sustainable data science infrastructure for biomedical research and implement the FAIR Principles. The NLM will also lead efforts to support data sharing and open science, and foster the development of the next generation of professionals in data science.
The NIH ICs will continue to pursue data science programs to support their research priorities. For example, the National Cancer Institute (NCI) has stood up a series of genomics cloud pilots designed to create a cost-effective way to provide computational support to the cancer research community and democratize access to NCI-generated genomic and related data. The National Institute of General Medical Sciences is supporting a series of initiatives for early stage development of technologies in biomedical computing, informatics, and big data science. Similar efforts at the National Human Genome Research Institute are addressing the challenges posed by the generation and analysis of the prodigious amounts of genome sequence data now routinely produced. The BRAIN Initiative, led by the National Institute of Neurological Disorders and Stroke and the National Institute of Mental Health, is partnering with the National Science Foundation, the Defense Advanced Research Projects Agency (DARPA), and the U.S. Food and Drug Administration to accelerate the development and application of innovative, data-driven technologies to produce a revolutionary new dynamic picture of the brain. The National Institute of Allergy and Infectious Diseases has made a significant investment in data science programs and has successfully partnered with the scientific community to provide critical data sets, advanced computational tools, and workspaces for scientists studying infectious and immune-mediated diseases around the world. Examples of these programs include ImmPort, an immunology database; the Bioinformatics Resource Centers for Infectious Diseases, which provide diverse omics data sets; the Immune Epitope Database; and the Immune Tolerance Network’s TrialShare, a repository that enables public access to de-identified patient-level clinical trial data.
The All of Us Research Program, part of the Precision Medicine Initiative (PMI), is following the PMI Data Security Principles and Framework to develop state-of-the-art approaches to ensure that data collected from one million or more participants are secure from unintended release. These approaches will also make the data readily available to researchers studying a wide range of health topics and to participants themselves. These are just a few of the many data science efforts underway and moving forward at the NIH.
We need your ideas, your guidance, and your wish list: how should NIH tackle the promise and the pitfalls of data science?