Creating Multi-modal Cancer Data Integration Solutions from Cross-atlas Datasets

Institute or Center: National Cancer Institute (NCI) in collaboration with the Office of the Director (OD) and the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)

Project: Creating Multi-modal Cancer Data Integration Solutions from Cross-atlas Datasets

Skills sought:

  • Expertise in cloud computing and programming (Python, R, etc.)
  • Development of artificial intelligence and machine learning models
  • Expertise in applying advanced analytics and visualization solutions
  • Experience with multi-omics, single-cell, and imaging data analysis

About the position:  The NCI seeks a data scientist to integrate large-scale single-cell omics and imaging data from multiple NIH platforms and to develop novel tools for analyzing and visualizing cross-atlas data.  The DATA Scholar will:

  • perform integration of large-scale single-cell omics and imaging data sets from multiple NIH platforms
  • develop proof-of-concept or prototype tools for visualization of cross-atlas data
  • document cross-atlas integration and development strategies, including for potential publication

About the work:  Pathogenesis in cancer and other diseases involves complex interactions among cells within the microenvironment. Recent advances in single-cell multi-omics and imaging technologies provide an unprecedented opportunity to interrogate this complexity at single-cell resolution. This project will leverage the DATA Scholar’s expertise to create novel 2D and 3D multi-modal data integration solutions from several major NIH atlases containing single-cell data. To accomplish this goal, the DATA Scholar will draw on the NCI’s Cancer Research Data Commons (CRDC) and will refine their skills in cross-atlas analysis and visualization.

Datasets included:  Several large-scale initiatives have focused on generating ‘atlases’ that integrate genetic, gene expression, molecular pathway, cell-state and cell-composition, tissue morphology, patient phenotype, and other data types to advance our understanding of health and disease. These include NCI’s Human Tumor Atlas Network (HTAN), the Common Fund’s Human BioMolecular Atlas Program (HuBMAP), and NIDDK’s Kidney Precision Medicine Project (KPMP).  The DATA Scholar will also have access to diverse data modalities within the CRDC (e.g., TCGA, CPTAC), precision medicine datasets from across the NIH (e.g., GTEx, TOPMed, Kids First), and data from NIH-wide initiatives (e.g., the All of Us Research Program, NCPI).

Why this project matters: The DATA Scholar will play a key role in the design, development, and implementation of high-impact analytical and visualization solutions that catalyze translational research on the initiation, transition, and progression of cancer across the full spectrum from healthy to diseased states (e.g., pre-cancer to metastatic disease). These solutions may also prove applicable beyond cancer.

Work location: Rockville, MD (as conditions allow).  Fully remote arrangements will be considered.

Work environment: The DATA Scholar will join an NCI team experienced in biomedical informatics, data science, and informatics project management. The DATA Scholar will work independently on multi-Institute interoperability projects (e.g., NCPI) under the direction of CRDC leadership and will collaborate closely with extramural program leadership from NCI’s HTAN, the Common Fund’s HuBMAP, and NIDDK’s KPMP programs. The DATA Scholar will be supported through the CRDC budget, and travel funds will be available for attending conferences and professional development training.

To apply for this or other DATA Scholar positions, please see the instructions at datascience.nih.gov/data-scholars-2022.

This page last reviewed on April 7, 2022