Implementing Machine Learning Algorithms to Improve Cancer Surveillance Data

Institute or Center: National Cancer Institute (NCI)

Project: Implementing Machine Learning Algorithms to Improve Cancer Surveillance Data

Skills sought:

  • Artificial intelligence (AI) and/or engineering operational research
  • Algorithm development and deployment in a clinical setting
  • Software management or engineering
  • Natural language processing (NLP)
  • Experience as a technical liaison among multiple stakeholders

About the position: NCI seeks an AI operational engineer to develop a framework for deploying and integrating machine learning (ML) algorithms into the workflow of NCI’s cancer registries. This will include:

  • Establishing quality assurance processes for monitoring algorithm performance and procedures for fine-tuning the algorithms to continue meeting accuracy thresholds.
  • Developing iterative feedback loops to allow for continued performance improvements to the algorithms.
  • Determining methods for assessing usability and obtaining feedback from users.
  • Developing and monitoring metrics to assess the impact of the algorithms on production workflows and efficiency.

The Scholar will develop standard operating procedures that the NCI can continue to utilize as it considers how to continually improve the collection of cancer surveillance data using ML techniques. Additionally, the Scholar will oversee the deployment and integration of at least one ML algorithm in the NCI cancer registries, as well as dissemination to an external organization, such as the Department of Veterans Affairs or Centers for Disease Control and Prevention.

About the work: The Scholar will be in the unique position to work with the NCI Surveillance, Epidemiology, and End Results (SEER) cancer surveillance infrastructure and oversee the deployment and integration of ML algorithms in this infrastructure. The Scholar will disseminate algorithms to the larger cancer surveillance community and other federal agencies, and possibly to academic cancer centers. The Scholar will develop novel ML algorithms and deploy them in a clinical setting.

Datasets involved: Cancer surveillance data is collected by cancer registries under public health reporting mandates and is drawn from a variety of data sources, including pathology reports, radiology reports, and biomarker reports. Cancer surveillance data is also constantly expanding to include new data sources, such as claims or pharmacy data, genomic reports, and pathology and radiology images. The de-identified data is made available to the public via the SEER program and is used by researchers as well as policymakers and the public.

Why this project matters: This project directly supports the Precision Cancer Surveillance Pilot, which is a cross-cutting partnership between NCI and the Department of Energy (DOE). The pilot aims to develop scalable NLP and ML tools for deep text comprehension of unstructured clinical text to enable accurate, automated capture of reportable cancer surveillance data elements. This partnership will move cancer surveillance data reporting closer to real time. Translating the ML algorithms into the actual clinical workflow of the NCI cancer registries is imperative, and integrating these ML algorithms in the NCI cancer registries will improve population-based cancer surveillance data.

Work Location: Rockville, MD

Work environment: The Scholar will work as part of a team with wide-ranging domain expertise, as described above, drawn from a mixture of sectors, including internal NCI staff, other government agencies, industry professionals, and extramural researchers. The Scholar will have access to expertise in both the cancer domain (via the NCI) and data and computational science (via the DOE). The NCI-DOE collaboration involves teams of domain experts from both agencies as well as the NCI SEER registries, which collect and report the information on cancer cases. The Scholar will also work with the NCI contractor that acts as an honest broker to securely store the identifiable data and is responsible for data management and developing tools to support cancer registry functions.

To apply to this or other DATA Scholar positions, please see instructions here:

This page last reviewed on February 22, 2021