Jumpstart Executive Summary | Data Science at NIH

Progress Towards Developing Data Infrastructure for COVID-19

Executive Summary

In July 2020, the Office of the Director funded $8.45M in demonstration projects and other activities (termed Jumpstart) to help the NIH gain insights from near real-time patient-level data to address the COVID-19 pandemic. The primary purpose of Jumpstart was to accelerate and supplement NIH-wide research and data science efforts to study SARS-CoV-2 infection and its associated COVID-19 disease more effectively. At the beginning of the COVID-19 pandemic, researchers found that COVID-19 related data were scattered across a wide range of places, including public and private institutions and inaccessible electronic health records (EHRs). Jumpstart sought to lower data barriers to research by offering financial supplements to nine pilot efforts focused on making clinical and related COVID-19 data more accessible to researchers studying the pandemic.

The nine pilot projects explored solutions to several key issues related to extracting near real-time patient data, including assessing institutional burden, streamlining data sharing agreements, developing efficient data mapping strategies, and piloting strategies for aggregating COVID-19 data across data repositories. Jumpstart and its affiliated pilot projects set out to evaluate the facets of extracting clinical data from EHRs, clinical warehouses, case report forms, and databases associated with clinical trials. When matched, each of these extraction platforms has the potential to offer insights into observational, clinical trial, and imaging data, as well as into near real-time patient-level data.

All Jumpstart projects aligned with at least one of the programmatic goals: 1) exploring data capture from different clinical institutions in a way that enabled access to near real-time data without unduly burdening the host institutions; 2) mapping data to a common data model; and 3) demonstrating that data resources established by a project team could be effectively used by new, unaffiliated researchers. Most projects funded as part of Jumpstart accomplished their goals within their first year.

The Jumpstart funding achieved significant accomplishments. The All of Us® program created the COVID-19 Participant Experience (COPE) survey to learn more about the way the pandemic is affecting people’s lives, and nearly 100,000 participants responded to one or more of the six repeated COPE surveys. All of Us also piloted a Health Provider Organization (HPO)-Lite recruitment strategy, utilizing three sites from the existing NIH-funded Clinical and Translational Science Awards (CTSA) network, to more easily increase the geographic diversity of the existing All of Us program. Through Jumpstart funding, the National Institute of General Medical Sciences (NIGMS) supported the Clinical and Translational Research (CTR) network’s participation in the National COVID Cohort Collaborative (N3C), expanding tools for data extraction and harmonization into the N3C platform. This effort also supported expanded training in clinical data science for clinical investigators, which resulted in newly published papers. The Jumpstart funding also supported the refinement and validation, across both U.S. and international sites, of an algorithm for identifying COVID-19 patients with severe disease, through expansion of the existing Consortium for Clinical Characterization of COVID-19 by EHR, known as the 4CE initiative. Another notable achievement was the creation of COVID-19 Common Data Elements (CDEs) (see the NIH CDE Repository for COVID-19 and other CDEs), a set of key variables intended to serve as reference variables that improve the consistency of data collected across COVID-related studies. CDEs were foundational to accelerating reproducibility in research and supported integrated data collection efforts in the Rapid Acceleration of Diagnostics (RADx®) program.
Finally, three significant data repositories, All of Us, the N3C, and a COVID clinical study database (BioData Catalyst), piloted an approach to link records from a single individual across multiple sites (such as a hospital record system or clinical trial sites). This pilot demonstrated a privacy-preserving approach based on encryption, tokenization, and probabilistic matching of participant records. Although a performance assessment of both the tokenization and matching processes is still pending, the approach shows promise for integrating participant data across multiple studies in future research.
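
For readers unfamiliar with this style of record linkage, the following is a minimal Python sketch of the general idea, not the method used in the Jumpstart pilot. It assumes that each site replaces identifying fields with keyed-hash tokens (HMAC-SHA256) before sharing, and that records are then compared on tokens alone using a simple weighted agreement score. The field names, the secret key, the weights, and the threshold are all hypothetical values chosen for illustration; a true probabilistic matcher would estimate its weights from data rather than fix them by hand.

```python
import hashlib
import hmac

# Hypothetical shared secret. A real deployment would have the key managed
# by a trusted party and never hard-coded.
SECRET_KEY = b"demo-key-not-for-real-use"

# Illustrative agreement weights per field (assumed, not estimated from data).
WEIGHTS = {"last_name": 3.0, "first_name": 2.0, "dob": 4.0, "zip": 1.0}


def normalize(value: str) -> str:
    """Lowercase and remove all whitespace so trivial differences do not block a match."""
    return "".join(value.lower().split())


def tokenize(record: dict) -> dict:
    """Replace each identifying field with a keyed hash so raw values never leave the site."""
    tokens = {}
    for field in WEIGHTS:
        raw = normalize(record[field]).encode("utf-8")
        tokens[field] = hmac.new(SECRET_KEY, raw, hashlib.sha256).hexdigest()
    return tokens


def match_score(tokens_a: dict, tokens_b: dict) -> float:
    """Sum the weights of the fields whose tokens agree exactly."""
    return sum(w for f, w in WEIGHTS.items() if tokens_a[f] == tokens_b[f])


def is_match(tokens_a: dict, tokens_b: dict, threshold: float = 7.0) -> bool:
    """Declare a link when the agreement score reaches the (assumed) threshold."""
    return match_score(tokens_a, tokens_b) >= threshold


# The same person recorded slightly differently at two sites.
site_a = {"first_name": "Ana", "last_name": "Rivera", "dob": "1980-02-14", "zip": "20892"}
site_b = {"first_name": "Anna", "last_name": " RIVERA ", "dob": "1980-02-14", "zip": "20892"}
other = {"first_name": "Lee", "last_name": "Chen", "dob": "1975-09-30", "zip": "10001"}

tok_a, tok_b, tok_other = tokenize(site_a), tokenize(site_b), tokenize(other)
print(is_match(tok_a, tok_b), is_match(tok_a, tok_other))
```

In this sketch the two site records still link despite the differing first-name spelling, because the remaining tokens agree. Exact-match tokens cannot tolerate typos within a field, which is one reason the performance of any tokenization and matching process needs to be assessed before it is relied upon.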

In addition to the nine pilot projects, Jumpstart established collaboration forums for data scientists, researchers, ethicists, and others through webinars and workshops such as the NIH Workshop on the Policy and Ethics of Record Linkage. The large workshop attendance and ongoing engagement in a monthly webinar series attested to the significant interest in COVID-19 related data science. While the COVID-19 pandemic created a unique sense of urgency to understand the patient experience in near real time, Jumpstart also accelerated broader NIH interests in more efficient data access and data linkage.

The Jumpstart program faced significant challenges in ensuring that sufficient policy and governance support existed for near real-time access to data and for patient consent to data linkage. Despite these challenges, Jumpstart demonstrated significant progress for data science at NIH. Jumpstart’s outcomes not only demonstrate new approaches to data collection, harmonization, and linkage, but also highlight the NIH’s ability to advance existing priorities through innovative investments and to swiftly convene experts to work toward a common goal.