By Elizabeth Kittrie, Senior Advisor for Open Innovation & Policy and Joe Bonner, Health Scientist-AAAS Fellow
The Open Data Science Symposium on December 1, 2016 will showcase an approach to scientific inquiry that illustrates the power of open data in developing innovative technologies and services to benefit health and biomedical research. Open Data Science promotes the sharing of data and corresponding tools, allowing communities to freely collaborate on scientific endeavors. It leverages crowdsourcing and open innovation, and complements an existing suite of platforms, tools, and funding initiatives established by the Office of the Associate Director for Data Science (ADDS) over the past two years. This symposium will convene researchers and engage stakeholders to explore ways in which data can be shared, applied, and exploited for knowledge.
As the volume and complexity of health data grow, these data are increasingly important to every aspect of the biomedical research enterprise. Biomedical researchers at NIH and beyond are producing massive amounts of data. It is estimated that the total amount of data currently funded by NIH’s intramural and extramural communities is approximately 650 petabytes, and that number is growing rapidly. To give you a sense of just how much data this is, consider that the entire holdings of the Library of Congress are estimated at 5 petabytes!
When health data are openly available, and the infrastructure and tools exist to take advantage of them, they become a valuable resource to the biomedical research community as well as to innovators in the healthcare industry. A McKinsey study estimated that harnessing open data in healthcare could help generate $300 to $450 billion per year in value to the U.S. economy.
The ADDS Office is charged with developing and facilitating data science activities across the 27 Institutes and Centers at the National Institutes of Health (NIH) as well as funding extramural data science research through the Big Data to Knowledge (BD2K) Initiative. Data science is the development or use of technologies (algorithms, software, repositories, etc.) to extract new findings from data. This data can be generated by new research or by combining previous datasets into new, larger datasets. As datasets are shared and combined they give rise to “Big Data,” which in turn requires the development of new platforms, computational tools, and analytic capabilities at a sweeping scale to fully exploit the capacity of big data to derive insights faster and more reliably.
So why should the research community care about Open Data Science, and how can this expanding phenomenon benefit the NIH and beyond? Open Data Science is more than Open Data or Open Science: it is about how teams come together to use data and tools in new ways to advance scientific inquiry. The Human Genome Project is a quintessential example of Open Data Science, in which teams worked together to decode a problem that no one lab could solve alone, accelerating our understanding of disease etiology and expanding the universe of genomic discovery. Open Data Science, done at scale, can maximize the value of Big Data. It has the potential to accelerate the rate of discovery, broaden the dissemination of innovation, and enhance biomedical research.
Technological advances in cloud computing, social networking, natural language processing, artificial intelligence, and the semantic web are powering Open Data Science in a broad range of fields, from finance to retail to biopharmaceutical research. These same forces are reshaping the biomedical research enterprise, providing unprecedented opportunities for traditional scientists and non-traditional solvers to use data in new ways and to create synergies among teams that were never possible before. These trends allow us to:
- Ask powerful questions leading to fresh discoveries and new insights.
- Utilize existing data in novel ways by combining seemingly disparate data sources and allowing domains that were previously siloed to cross-pollinate.
- Link expertise from complementary fields that might not traditionally work together, but whose cumulative knowledge can shed new light on problems.
To advance understanding of Open Data Science, the Open Data Science Symposium will host a number of discussions and panels featuring leading experts in biomedical research and open science. These include thought leaders such as NIH Director Dr. Francis Collins, who led the Human Genome Project, and former NIH Director Dr. Harold Varmus, who led the development of PubMed Central at NIH, a free full-text archive of biomedical and life sciences journal literature. Also on display will be the results of the Open Science Prize, a collaborative initiative led by the NIH in partnership with the Wellcome Trust (UK) and the Howard Hughes Medical Institute (Chevy Chase, MD) to explore what solvers around the world could do with openly available biomedical data to advance discovery and human health.