Application of Data Science to EHR Data from the All of Us Research Program (OD/AoU)

Project Point of Contact: Lew Berman, PhD, MS, Branch Chief (NIH / OD / All of Us Research Program / Division of Technology and Platform Development / Digital Health and Data Branch)

Goals and Objectives: Develop standardized analytic and AI/ML tools to expand the analysis, use, and integration of electronic health record (EHR) data in the All of Us Researcher Workbench.

Provide training and technical assistance, through sample code and documentation, to consortium staff and users of the Researcher Workbench on using these data and tools and, by doing so, help democratize data analysis and usage to accelerate discovery using All of Us data.

Significance: The overarching goal of the All of Us Research Program is to transform our understanding of the factors that contribute to health and disease. All of Us is on a trajectory to enroll 1 million individuals over the next few years, reflecting the rich diversity of the United States. These individuals will vary by age, cultural background, digital access, educational attainment, health status, socioeconomic status, and technical ability. The Division of Technology & Platform Development (DTPD) is responsible for delivering key capabilities that enable the All of Us Research Program mission: to collect, curate, and protect data from engaged, diverse communities and to provide access to those data on a platform with tools, demonstrations, and training. Accordingly, the incumbent will contribute to the division's effort to reach out to the scientific and non-scientific communities by simplifying and streamlining the approach to using All of Us EHR data.

Description: The All of Us data environment presents a complex computational challenge to researchers. Because the secondary use of EHR data at scale for research is relatively new, data quality and its effect on the generalizability and accuracy of conclusions are of the utmost importance. Moreover, new methods in AI/ML are important to advance analytic use of the All of Us EHR data.

The incumbent will need expertise in EHR data and in developing, standardizing, and deploying analytic tools for exploring biomedical, clinical, environmental, and scientific data in a cloud environment. The incumbent will help the program advance EHR phenotype data quality control and quality assurance grounded in transparency and scientifically accurate reporting that appropriately addresses internal and external validity.

Experience with AI/ML methods, with a focus on analyzing EHR data, is required. The tools developed will be incorporated into easily accessible online tutorials, templated for multiple analytical uses, and delivered as Jupyter notebooks supporting Python, R, and SAS. Functionalities of these tools include data analysis, exploration, visualization, and other areas that will advance EHR analyses and computational processing. Accordingly, the incumbent should have some experience with statistical analyses and with developing documentation for research data and tools.

Data set(s) involved: The All of Us Research Program has extensive EHR data captured from three primary sources: EHR systems at enrollment sites, record sharing on Apple smartphones through Apple HealthKit, and "Right of Access (RoA)" methods.

Anticipated outcomes of the project:

  • Templated EHR analytic tools for quality assurance / quality control with explicit annotation and standardization
  • Development of an online tutorial for quality assurance / quality control of All of Us EHR data
  • Development of a manuscript illustrating the application of AI/ML techniques to All of Us EHR data
  • Presentations of analytic work

Required skills of the DATA Scholar: 

  • The incumbent should demonstrate experience using big data for developing and deploying tools for analyzing biomedical, clinical, environmental, and scientific data in a cloud environment using Jupyter notebooks and coding in Python or R. Experience with SAS is not required but may be helpful
  • The incumbent should demonstrate expertise in statistical analysis and analytics for complex biomedical, clinical, environmental, and scientific data
  • The incumbent has experience in applying AI/ML techniques to EHR data
  • The incumbent has demonstrated experience providing documentation, training, and technical assistance using operational and research data and tools
  • The incumbent has demonstrated experience in the visualization of complex health data
  • The incumbent has demonstrated experience reviewing quality assurance and quality control data across multiple biomedical, environmental, and scientific domains
  • The incumbent has demonstrated experience with, and understanding of, the complexity and research use of EHR data
  • The incumbent has an understanding of standardized medical terminology or the Observational Medical Outcomes Partnership (OMOP) Common Data Model

Expected/preferred length of DATA Scholar appointment: 2 years.

Expected/preferred time effort commitment of the DATA Scholar: Full time (100%)

Remote work preference: Hybrid preferred.

ICO support: Several resources will be available to support and mentor the incumbent. Specifically, Lew Berman, Ph.D., M.S., Branch Chief, will be directly responsible for mentoring the incumbent. Also, Ami Ostchega, Ph.D., RN, will be extensively involved in mentoring the incumbent. Other staff within the All of Us program will assist in mentoring, including additional data scientists and clinicians.

Additional activities: The division is extensively involved in innovative ways to increase the visibility and usability of the All of Us data. The incumbent will be involved in meeting with academic and industry partners, participating in the development of manuscripts and data briefs where appropriate, and attending meetings and workshops on a variety of topics related to All of Us and EHR data, among other foci.

Career or professional development opportunities: Efforts related to this work will present the incumbent with a rich opportunity for professional development, including:

  • Exposure to learnings from the operations and deployment of a large-scale longitudinal study, and adherence to the primary principles of inclusion and scientific rigor
  • Understanding of the challenges involved with releasing comprehensive data about personal health into a secure enclave and the regulatory and policy issues surrounding such activities
  • Understanding of how All of Us EHR data are reviewed for quality and how phenotype quality profiles are developed
  • Opportunity to develop AI/ML best practices and procedures that inform individuals on how to use these advanced computational techniques to analyze All of Us EHR data
  • Opportunity to network with staff within the All of Us Research Program, scientific staff at NIH, and All of Us Research Program partners
  • Opportunity to participate in scientific meetings, colloquia, and conferences or workshops where applicable

To apply to this or other DATA Scholar positions, please see instructions here:


This page last reviewed on April 17, 2023