Supporting Cancer Knowledge Extraction Through the Cancer Research Data Commons

Institute or Center: National Cancer Institute (NCI)

Project: Supporting Cancer Knowledge Extraction Through the Cancer Research Data Commons (CRDC)

Skills sought:

  • Background in Computer Science/Data Science/Biomedical Informatics
  • Expertise in Cloud Computing, Artificial Intelligence (AI), and Knowledge Graphs/Graph Databases
  • Experience working with large, complex datasets and an interest in working with cancer research data
  • Excellent oral and written communication skills

About the position: NCI is seeking a DATA Scholar who can identify critical questions in translating data into knowledge and define detailed technical solutions using CRDC resources.

Tying together the CRDC themes of harmonization, integration, and aggregation, the Scholar will work with the Cancer Data Aggregator (CDA), a CRDC component that is still to be developed. They will contribute innovative perspectives and solutions for the downstream integrative data analyses that the CDA will enable once it is available. The DATA Scholar will perform aggregations and help develop the CDA API, working closely with the CDA team to beta test it and perform downstream analyses. The driving use case for the CDA is the Human Tumor Atlas Network (HTAN) project, an important component of the Cancer Moonshot™.

The DATA Scholar will use knowledge graphs, machine learning, and other advanced methodologies to meet the growing demand for data management, sharing, and analysis solutions driven by the large, complex datasets generated by major NCI projects. This position offers a unique opportunity to help NCI shape and build an integral component of the CRDC.

About the work: The vision for the NCI CRDC is a virtual, expandable infrastructure that provides secure access to diverse data types across scientific domains, along with the ability to perform cross-domain analysis of large datasets that can ultimately lead to new discoveries in cancer prevention, diagnosis, and treatment. "Domain-specific" nodes (e.g., NCI's Genomic, Proteomic, and Imaging Data Commons) act as data repositories, and three NCI Cloud Resources provide data analysis and computation in the cloud. NCI is currently adding core components to the CRDC infrastructure that will enable aggregation of multi-domain data for integrative analysis, so that knowledge can be extracted efficiently from large-scale datasets.

Datasets involved:

Why this project matters: The goal of the CRDC is to give cancer researchers access to valuable cancer data generated from basic research, clinical trials, and population-level studies, along with state-of-the-art analysis tools and a flexible cloud-based infrastructure. The program's success requires addressing many data-related and technical challenges, including the aggregation and harmonization of multi-petabyte datasets across scientific domains and technology platforms and the integration of a broad array of visualization and analysis tools and pipelines. The Scholar's work will increase the overall value of the CRDC for the cancer research community and ultimately help improve outcomes for cancer patients.

Work Location: Rockville, MD

Work environment: The DATA Scholar will work closely with CRDC program leadership, including the Director of the NCI Center for Biomedical Informatics & Information Technology (CBIIT), and will participate in bi-weekly CRDC team meetings. The DATA Scholar will work most closely with the CBIIT, CRDC, and CDA teams and will have opportunities to participate in NIH-wide working groups and projects. Together with their supervisor and the branch chief, the Scholar will be responsible for determining how to meet their goals, including immediate and long-term planning and evaluation of the results.

To apply to this or other DATA Scholar positions, please see instructions here:

This page last reviewed on January 29, 2020