To harness the full potential of Big Data, scientists must be able to readily find, cite, and access existing data and other digital objects, such as software. There is no existing infrastructure or incentive that enables this. These basic goals maximize data use, enable sharing, limit duplication of effort, and allow areas of sparse research coverage to be more readily identified. To advance the infrastructure and policies needed to meet these goals, awards in this area address the challenges of resource discovery, citation, and access.
The Data Discovery Index concept furthers BD2K’s goal of improving the sharing of biomedical data. It will enable researchers to make better use of what already exists. It will also allow them to produce datasets that complement existing data for greater analytic potential.
In 2014, BD2K awarded a Data Discovery Index Coordination Consortium grant to the bioCADDIE project. BD2K also made a series of Data Discovery Index Supplement awards. These grants permit existing NIH-funded projects to join the consortium activities.
Data Discovery Index Coordination Consortium (DDICC) Award
Biological and HealthCare Data Discovery and Indexing Ecosystem (bioCADDIE)
bioCADDIE seeks to develop a prototype Data Discovery Index (DDI) that will enable finding, accessing, and citing biomedical big data. bioCADDIE has a Community Engagement mandate to work with the broader biomedical community to better identify data and other digital objects, so that researchers can find shared data in ways that allow for extracting maximal knowledge.
Data Discovery Index (DDI) Supplement Awards
The Cardiovascular Research Grid
Johns Hopkins University
PI: Raimond Lester Winslow
Grant Number: 3R24HL085343-08S1
The Cardiovascular Research Grid (CVRG) Project is a national resource providing the capability to store, manage, and analyze data on the structure and function of the cardiovascular system in health and disease. The CVRG will develop new tools that will enhance the ability of researchers to explore and analyze their data to understand the cause and treatment of heart disease.
Computational tools for the analysis of high-throughput immunoglobulin sequencing
PI: Steven H. Kleinstein
Grant Number: 1R01AI104739-01A1
This project will develop and validate computational methods to analyze large-scale immunology sequencing data sets. These methods will provide insights into the mechanisms underlying autoimmune disease, as well as biomarkers for susceptibility to infection or vaccination response.
Discovering and Applying Knowledge in Clinical Databases
Columbia University Health Sciences
PI: George M. Hripcsak
Grant Number: 3R01LM006910-15S1
This project uses data mining and knowledge engineering to study the electronic health record in order to better understand how health care processes cause systematic bias and other problems in the data, which complicate its incorporation into scientific studies. By avoiding or correcting those problems, we hope to improve reuse of the data for purposes such as clinical research and quality improvement.
fMRI-based Biomarkers for Multiple Components of Pain
University of Colorado
PI: Tor Dessart Wager
Grant Number: 3R01DA035484-02S1
Current treatments for pain are only modestly effective, in large part because pain is created through a complex set of brain processes and can be measured only by patients' self-reports, which presents a serious barrier to effective research and treatment. This project capitalizes on recent breakthroughs in measuring human brain activity and using it to objectively assess the brain processes that underlie pain experience, which could transform the way pain is measured and new treatments are developed.
Generation of a centralized and integrated resource for exposure data
North Carolina State University – Raleigh
PI: Carolyn J. Mattingly
Grant Number: 3R01ES019604-04S1
Most human diseases involve interactions between genetic and environmental factors; however, the basis of these complex interactions is not well understood. This project will enhance the capacity for prediction, analysis and interpretation of environment-disease networks by developing novel analysis and visualization tools that include exposure data. These tools will leverage the public Comparative Toxicogenomics Database (CTD), which aims to promote understanding about environment-disease relationships.
A Hub for the Nuclear Receptor Signaling Atlas
Baylor College of Medicine
PIs: Bert W. O’Malley, Ronald Evans, and Neil McKenna
Grant Number: 3U24DK097748-03S1
Nuclear receptors (NRs) and their coregulators are important therapeutic targets in many different disease states including cancer, obesity, diabetes, inflammation, neurological disorders and senescent diseases. This project will produce an NR research community resource hub for information and data analysis tools and will provide community research grants to generate datasets to populate the hub. These initiatives will have tangible benefits for the progress of research in the field towards developing novel NR- and coregulator-based therapeutics.
Natural language processing for clinical and translational research
The Mayo Clinic – Rochester
PIs: Hongfang Liu, Serguei Pakhomov, and Hua Xu
Grant Number: 3R01GM102282-02S1
Rapid growth in the clinical implementation of large electronic medical records (EMRs) has led to an unprecedented expansion of datasets for clinical and translational research. This project will develop a novel natural language processing framework to enable the use of information embedded in clinical narratives for research.
Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance
PI: Andrew Robert Cohen
Grant Number: 5R01NS076709-04
This project will develop and evaluate methods to automatically identify biologically plausible adverse drug events found within clinical patient records, using knowledge extracted from the biomedical literature. If successful, these methods will provide the means for earlier detection of harmful drug effects, limiting consequent morbidity and mortality.