Democratization of NIDDK Knowledgebases

Institute or Center: National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)

Project: Democratization of NIDDK Knowledgebases

Skills sought:

Data engineering skills related to cloud networking, infrastructure, and security for the purposes of cloud migration
Experience and/or interest in integration of multiple data types including genomic/omics, clinical, and imaging data
Experience developing APIs and computational tools for systematic analysis and advanced query frameworks for large, heterogeneous datasets
Experience as a technical liaison among multiple stakeholders

About the position: The NIDDK seeks a DATA Scholar to develop computational pipelines, strategies, and data management frameworks for migrating large public NIDDK datasets to a cloud environment. The migration will apply FAIR (Findable, Accessible, Interoperable, Reusable) principles to democratize data analysis and data usage and to accelerate discovery. The scholar will:

Develop well-defined data management frameworks and standards for migration of public data from various NIDDK consortia to a cloud-based environment using re-hosting and re-platforming strategies
Develop approaches to ensure data security, data integrity, and compliance
Develop APIs and computational tools for systematic analysis and advanced query frameworks for large, heterogeneous datasets
Develop novel standardized approaches to enable secure coordination of data integration across multiple sources

This project will give the scholar an opportunity to develop standards for migrating multiple NIDDK datasets to the cloud environment.

About the work: The NIDDK supports several research consortia and projects that employ state-of-the-art technologies to generate large, diverse, and complex datasets consisting of genomics/omics, clinical, and imaging data. This project will develop the framework and standards needed to understand and address the computational challenges of migrating and integrating large, diverse public datasets into a cloud-based infrastructure. The goal is a shared ecosystem that maximizes access to and reuse of biomedical research data.

Datasets included: The scholar will pilot this effort using the ReBuilding a Kidney (RBK) and GenitoUrinary Development Molecular Anatomy Project (GUDMAP) knowledgebases. In doing so, the scholar will pave the way for future harmonization of other kidney and urologic datasets, including the Kidney Precision Medicine Project (KPMP), Human Biomolecular Atlas Program (HuBMAP), and Human Cell Atlas (HCA), and larger integration efforts across the NIDDK (Accelerating Medicines Partnership-Common Metabolic Diseases, AMP-CMD) and NIH.

Why this project matters: Migrating GUDMAP and RBK data to the cloud environment democratizes data access and analysis, giving researchers access to unique and shared computational resources for analysis at scale. This project aligns with the NIH’s overall long-term vision of providing the research community with access to datasets and fostering increased data sharing, analysis, and reuse.

Work Location: Bethesda, MD

Work environment: The scholar will work with Drs. Chris Ketchum (deputy director), Eric Brunskill (data scientist), and Robert Star (division director). The scholar will be mentored primarily by Dr. Star, with Dr. Ketchum as co-mentor; Dr. Star will design an individualized leadership/management trajectory and identify needed resources. The scholar will also work with key project scientists within KUH and NIDDK. To extend the scholar’s interactions beyond the NIDDK and provide additional computational and cloud-environment expertise, the scholar will also be mentored by and consult with Dr. Erika Kim of the NCI, a program manager in CBIIT’s Informatics and Data Science Program.

To apply to this or other DATA Scholar positions, please see instructions here: datascience.nih.gov/data-scholars-2022.