Data Science at the NIH ICs


The Office of the Associate Director of Data Science may be new, but Data Science activities at the NIH long predate it. Resources, research, and funding programs that affect the biomedical Data Science community exist across the Institutes and Centers. The breadth of this involvement reflects the NIH’s strong commitment to the potential of Big Data methodologies in biomedical research.

Each of these activities has a home in at least one Institute or Center, but they can be hard to find if you don’t already know about them. The Office of the Associate Director believes that highlighting Data Science activities across the NIH is an important step in growing the NIH biomedical Data Science community. There is so much that we can learn from each other.

Highlighted NIH Biomedical Data Science Activities

NCI Cancer Genomics Cloud Pilots

Sustainable database architecture and open accessibility are necessary for biomedical Big Data projects to succeed. The National Cancer Institute recently awarded Cancer Genomics Cloud pilots to develop infrastructure and test the efficacy of the cloud computing model for accessing and using high-content genomics data from The Cancer Genome Atlas (TCGA).

NIA Accelerating Medicines Partnership: Alzheimer’s Disease Knowledge Portal

The National Institute on Aging’s Alzheimer’s Disease Knowledge Portal is a product of a public-private partnership under the Accelerating Medicines Partnership (AMP) program. The partnership combines government, industry, and academic resources to speed target discovery and biomarker development. The portal hosts data from human samples and from animal and cell models, developed by members of the AMP-AD Target Discovery Consortium and the partner organizations, as well as legacy data supporting their work.

NIH Biowulf HPC Cluster

Big Data research requires computing! The Biowulf cluster is a trans-NIH Intramural resource for High-Performance Computing. It is a GNU/Linux parallel processing system, managed by a team of professional IT staff and pre-loaded with many bioinformatics applications and databases.


