Project Point of Contact: Andrew Weitz, Program Director/Behrouz Shabestari, Division Director
Goals and Objectives: The COVID-19 pandemic spawned a paradigm shift in diagnostic testing. Whereas testing has traditionally been conducted in laboratories and point-of-care (POC) facilities, most people now self-test in the privacy of their homes. Mass adoption of at-home testing has created challenges for the government’s ability to conduct disease surveillance: most individuals choose not to report their results, and those who do often report incomplete metadata. As a result, our nation’s COVID-19 testing data have become sparser and less reliable. Still, the federal government is accumulating a large database of COVID-19 self-test results: more than 16 million as of March 2023 and growing by thousands every day.
The government’s over-the-counter (OTC) testing dataset offers great potential for tracking COVID-19 spread and informing our nation’s pandemic response. For example, it could be analyzed to reveal emerging COVID-19 surges or to identify which areas of the country lack sufficient access to home testing. Deep analysis and modeling of the dataset are needed to understand its true potential. According to a recent publication by CDC (https://www.cdc.gov/mmwr/volumes/71/wr/mm7132a1.htm), “continued development of infrastructure and methods to … analyze self-test data could improve their value for surveillance purposes during future public health emergencies.”
We propose to have a DATA Scholar perform a full analysis of the government’s OTC testing dataset, ultimately leading to a set of recommendations, opportunities, and limitations regarding use of self-test data for public health surveillance. Objectives would include performing QA/QC on the data, assessing the degree to which OTC data can replace or augment the role of laboratory/POC testing data, characterizing human behaviors around self-test reporting, answering health equity questions around self-testing, and building predictive models.
Significance: The shift of diagnostic testing into the home setting has created new opportunities and challenges for public health surveillance. Public health agencies can no longer rely on laboratory and POC tests as a comprehensive source of truth. Although this challenge is currently unique to COVID-19, self-testing is expected to spread to other disease areas in the coming months and years. Companies are already developing at-home, OTC tests for influenza, sexually transmitted infections, and other diseases. As more testing occurs outside of traditional healthcare settings, it is imperative to understand the possibilities and limitations of using self-reported testing data in public health surveillance.
By conducting a comprehensive analysis of our nation’s OTC COVID-19 dataset—the first dataset of its kind—the DATA Scholar can help shape our nation’s approach to leveraging self-reported test results, both for present (COVID-19) and future (other diseases) applications.
Description: Through this project, a DATA Scholar will perform a comprehensive analysis of the federal government’s OTC COVID-19 testing dataset, ultimately leading to a set of recommendations, opportunities, and limitations regarding use of self-test data for public health surveillance. The Scholar will leverage the testing data in the HHS Protect database (https://www.cdc.gov/orr/topics-programs/hhs-protect.html) to address multiple interrelated questions, including:
- To what extent can OTC testing data replace or augment the role that laboratory/POC testing data have traditionally played in disease surveillance and pandemic response?
- Can computational models be built from the OTC testing data and used for forecasting, for example to predict COVID-19 surges in advance?
- Can OTC testing data be combined with other data sources, such as population-level wearable device data, to make more accurate predictions (see https://www.thelancet.com/journals/landig/article/PIIS2589-7500(22)00156-X/fulltext)?
- How do UX/UI design choices in mobile apps impact human behaviors related to test result reporting?
- What correlations exist between the OTC testing data and US census variables? For example, do reporting behaviors correlate with social vulnerability?
- What temporal testing patterns are most common among individuals who have reported multiple test results? Are individuals complying with FDA guidance for serial antigen testing?
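As a rough illustration of the serial-testing question above, the following Python sketch estimates the share of individuals who report a follow-up test within a chosen window after an initial negative result. It is a minimal sketch under stated assumptions: the record layout (a de-identified person key, an ISO timestamp, and a result string) is hypothetical and not taken from the HHS Protect schema, and the 36 to 72 hour window is a placeholder parameter, not a statement of the FDA guidance.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def share_retesting_after_negative(records, min_hours=36.0, max_hours=72.0):
    """Share of individuals whose first reported test was negative and who
    reported another test between min_hours and max_hours afterwards.

    `records` is an iterable of (person_id, iso_timestamp, result) tuples.
    These field names and the default window are illustrative assumptions.
    """
    # Group every reported test by the (hypothetical) de-identified person key.
    by_person = defaultdict(list)
    for person_id, timestamp, result in records:
        by_person[person_id].append((datetime.fromisoformat(timestamp), result))

    eligible = 0  # individuals whose first reported test was negative
    retested = 0  # eligible individuals with a follow-up inside the window
    for tests in by_person.values():
        tests.sort(key=lambda t: t[0])
        first_time, first_result = tests[0]
        if first_result != "negative":
            continue
        eligible += 1
        for later_time, _ in tests[1:]:
            hours = (later_time - first_time) / timedelta(hours=1)
            if min_hours <= hours <= max_hours:
                retested += 1
                break

    return retested / eligible if eligible else float("nan")


# Made-up records for illustration only; not real data.
example_records = [
    ("a", "2023-03-01T08:00:00", "negative"),
    ("a", "2023-03-03T09:00:00", "positive"),  # 49 hours later: inside window
    ("b", "2023-03-01T08:00:00", "negative"),  # never retested
    ("c", "2023-03-01T08:00:00", "positive"),  # first test positive: excluded
    ("d", "2023-03-01T08:00:00", "negative"),
    ("d", "2023-03-01T20:00:00", "negative"),  # 12 hours later: outside window
]

share = share_retesting_after_negative(example_records)
```

In this made-up example, three individuals start with a negative result and one of them retests inside the window. A real analysis would need to set the window from the applicable guidance and account for individuals who test without reporting.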
By conducting a comprehensive analysis of our nation’s OTC COVID-19 dataset—the first dataset of its kind—the DATA Scholar can help shape our nation’s approach to leveraging self-reported test results for disease surveillance. This has far-reaching implications, not just for COVID-19, but also for other diseases that will soon be testable at home (influenza, sexually transmitted infections, etc.).
Data set(s) involved: To receive Emergency Use Authorization (EUA) for an OTC SARS-CoV-2 test, the FDA requires test manufacturers to develop a mobile app or website that individuals can use to voluntarily report their test results to state and federal public health agencies. At the federal level, all reported test results are de-identified and housed in a database called HHS Protect (https://www.cdc.gov/orr/topics-programs/hhs-protect.html), maintained by Palantir. This line-level database currently contains more than 16 million test results and is constantly growing.
NIBIB’s RADx® MARS program (https://www.nibib.nih.gov/covid-19/radx-tech-program/mars) has played a key role in ensuring that the data reaching HHS Protect are clean and interoperable, to the extent possible. This was accomplished by establishing and promoting a common specification for reporting OTC test results. Most of the largest COVID-19 at-home test manufacturers have already adopted or are in the process of adopting this specification. RADx® MARS and its government partners are now interested in analyzing the OTC data being received by the federal government, to see what possibilities they may offer for disease surveillance.
The DATA Scholar will receive access to the de-identified HHS Protect dataset, which contains every OTC COVID-19 test result reported to date. These data are supplemented with rich metadata that include the test type, age and zip code of the individual, presence of symptoms, and additional demographic information. The Palantir environment provides programmatic and graphical interfaces to the data, as well as rich analytical, modeling, and dashboarding tools. Data can also be exported for analysis.
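Because reported metadata are often incomplete, a first QA/QC pass on exported data might simply measure how often each metadata field is filled in. The Python sketch below shows one way to do that. It is an illustration only: the field names (`test_type`, `age`, `zip_code`, `symptomatic`) are hypothetical stand-ins for the metadata described above, not the actual HHS Protect column names, and the rows are made up.

```python
def field_completeness(rows, fields):
    """Return, for each field, the share of rows with a non-missing value.

    A value counts as missing when the key is absent, None, or an empty
    string. `rows` is a list of dicts; field names are illustrative.
    """
    total = len(rows)
    completeness = {}
    for field in fields:
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        completeness[field] = filled / total if total else float("nan")
    return completeness


# Made-up rows for illustration only; not real data.
example_rows = [
    {"test_type": "antigen", "age": 34, "zip_code": "20892", "symptomatic": True},
    {"test_type": "antigen", "age": None, "zip_code": "20892", "symptomatic": False},
    {"test_type": "antigen", "age": 61, "zip_code": "", "symptomatic": None},
    {"test_type": "antigen", "zip_code": "10001", "symptomatic": True},
]

completeness = field_completeness(
    example_rows, ["test_type", "age", "zip_code", "symptomatic"]
)
```

Note that a reported value of `False` for symptoms counts as present here; only absent, `None`, or empty values are treated as missing.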
Anticipated outcomes of the project: This project has the potential to shape our nation’s approach to using self-testing data in public health surveillance and pandemic response. The ultimate outcome will be a set of recommendations for doing so, including known limitations and future opportunities. Other outcomes will include publications and analytical/modeling tools for leveraging OTC testing data. The Scholar will be expected to make several presentations to government partners including CDC, FDA, ASPR, and the White House, all of which are interested in possibilities for leveraging OTC testing data.
Required skills of the DATA Scholar:
- Programming in commonly used languages in data science, such as Python
- Experience with healthcare information exchange standards (HL7v2, FHIR, etc.)
- Proficiency in building analytical and data-driven models, as well as the tools for doing so (e.g., cloud computing)
- Experience analyzing wearable device data (preferred, but not required)
- Interest in public health policy
Expected/preferred length of DATA Scholar appointment: 1 year
Expected/preferred time effort commitment of the DATA Scholar: Full time (100%) if 1 year. Part time (50-75%) if 2 years
Remote work preference: 100% remote allowable
ICO support: The Scholar will work as a senior special project leader within NIBIB and interact closely with senior leadership, as well as a team of experts in digital health, diagnostic testing, and data/computer sciences within the Division of Health Information Technologies (DHIT). This team will provide guidance relevant to navigating the NIH and serve as the NIBIB technical contact.
The Scholar will receive technical mentorship from Andrew Weitz, PhD, DHIT Program Director. Dr. Weitz has extensive expertise in COVID-19 home testing and result reporting, having developed and led the RADx® MARS program. The Scholar will report to Behrouz Shabestari, PhD, Director, NIBIB National Technology Centers Program. Dr. Shabestari is an expert in health informatics with longstanding experience within the NIH and will guide the Scholar on NIH policies and help navigate the NIBIB infrastructure and systems.
Additional activities: The Scholar will attend weekly DHIT division meetings and RADx-Tech meetings, and will participate in general NIBIB matters that broadly involve the Institute’s staff (e.g., Advisory Council). The Scholar will have the opportunity, if desired, to help oversee the RADx® MARS program. This includes working with COVID-19 test manufacturers and app developers to build out mobile device reporting capabilities. In addition, the Scholar will be expected to keep NIBIB’s government partners (CDC, FDA, etc.) updated about the project’s progress.
Career or professional development opportunities: The Scholar will be provided opportunities to take NIH career-development trainings and other courses, such as those offered through FAES. The Scholar will be encouraged to attend 1-2 scientific conferences per year of their choosing.
To apply to this or other DATA Scholar positions, please see instructions here: datascience.nih.gov/data-scholars.