Enhancing Interoperability of Multi-modality Medical Image and Multi-format Clinical Data Repositories for AI/ML Algorithm Development for Clinical Applications

Institute or Center: National Institute of Biomedical Imaging & Bioengineering (NIBIB)

Project: Enhancing Interoperability of Multi-modality Medical Image and Multi-format Clinical Data Repositories for AI/ML Algorithm Development for Clinical Applications

Skills sought:

  • Experience integrating multiple data types including medical images (e.g., DICOM), electronic health record (EHR), genomics, and pathology data.
  • Prior experience with data resource interoperability, Global Unique Identifiers (GUIDs) or other Privacy Preserving Record Linkage (PPRL) tools across multiple repositories is a plus.
  • Experience with large datasets and statistical methods for imputation, dealing with missing or incomplete data, particularly for EHR.
  • Programming (Python and R preferred) and cloud experience (e.g., AWS, GCP, Azure).

About the position: The NIBIB seeks a data scientist to integrate large, diverse types and formats of data for its Medical Imaging and Data Resource Center (MIDRC). The scholar should be able to:

  • Define practical and adequate clinical use case(s) for collaborating NIH and external repositories.
  • Develop a strategy to measure or define record completeness across multiple data types and recommend how to improve data completeness within selected repositories. Outline a framework to remediate missing data and bias, and develop tools/software to address them.
  • Pursue and implement pilot interoperability studies addressing issues such as data governance (data access, limitations) and privacy-preserving record linkage (PPRL).
  • Present strategies and recommendations to leadership and initiate implementation.
  • Demonstrate successful completion of a pilot study by deploying solutions at scale and creating multi-repository AI/ML-ready imaging and non-imaging datasets for algorithm development.
  • Quantify real-world performance of AI/ML algorithms trained with multi-repository datasets; compare the performance obtained with individual data types against that obtained with the combined data.
  • Disseminate learning and research materials to benefit the academic community.

About the work: Imaging contributes critical information for clinical decision making. Artificial intelligence (AI) methods combining imaging and clinical data are critical for addressing clinical needs. A key limiting factor is that image datasets are rarely co-present with clinical datasets. Interoperability, linking image and clinical data across repositories, is critical. MIDRC collaborates with NIH repositories such as N3C (NCATS), BioData Catalyst (NHLBI), and All of Us.

Datasets included: Data include medical images (X-ray, CT, MRI, Ultrasound, Nuclear Medicine) through MIDRC and electronic health records from a variety of sources. After identifying practical use cases, key contributions expected of the scholar are to establish data interoperability and extract enhanced value, beyond what is present in each individual dataset, from the combined datasets.

Why this project matters: Looking at health as a dynamic life-time process has direct public health benefits, from accelerating diagnoses and predicting outcomes to optimizing health management. MIDRC must serve in a new era in which researchers, health care providers, technology experts, community partners, and the public work together to develop individualized health care.

Work Location: Bethesda, MD

Work environment: The scholar will report to Dr. Krishna Kandarpa, NIBIB Director of Research Sciences & Strategic Directions. The scholar will interact with leaders and program staff within the Divisions of Science & Technology (DAST) and Health Information Technology (DHIT), who will serve as the NIBIB ‘technical contacts’ and provide guidance relevant to navigating the NIH. The scholar will interact regularly and frequently with all collaborators within MIDRC, NIH, and non-NIH repositories.

To apply to this or other DATA Scholar positions, please see instructions here: datascience.nih.gov/data-scholars-2022.

This page last reviewed on April 7, 2022