Request for Information (RFI): Proposed Use of Common Data Elements (CDEs) for NIH-Funded Clinical Research and Trials

Notice Number: NOT-OD-24-063

Key Dates

Release Date: February 20, 2024
Response Date: April 20, 2024

Related Announcements

NOT-LM-21-005

Issued by

National Institutes of Health (NIH)

Purpose

The purpose of this Request for Information (RFI) is to solicit public input on 1) a set of minimum core common data elements (CDEs) that would be used across all NIH-funded or NIH-conducted clinical studies/trials and community-based research involving human participants; 2) additional CDEs for social determinants of health (SDoH) and clinical domains including autoimmune diseases and immune-mediated diseases; and 3) technologies, tools, and policies that could facilitate the use of NIH CDEs. NIH CDEs are defined as CDEs “recommended” or “required” by an NIH body, and/or found in the NIH CDE Repository. Responses to this RFI will be used to inform NIH’s continuing guidance on CDE use and to assist in planning adequate resources for CDE implementation.

Background

CDEs are a type of data standard used for collection, comparable analysis, and exchange of data in biomedical research settings. CDEs are standardized, precisely defined questions paired with a set of specific allowable responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection (https://cde.nlm.nih.gov/home). They provide a common “language” for systematic and consistent capture of research data and routinely collected real-world data. CDEs can range from single data elements such as height and weight, to a bundle of questions that evaluate concepts such as depression and quality of life. A glossary of terms relevant to CDEs can be found on the RFI response website (https://datascience.nih.gov/cde-rfi) to provide further background on CDE use within NIH ecosystems.
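
As an illustration only, the definition above (a precisely defined question paired with a set of specific allowable responses) can be sketched as a small data structure. The class name, field names, question wording, and value lists below are assumptions made for this sketch; they are not taken from the NIH CDE Repository.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """Illustrative CDE: a defined question paired with its allowable responses."""
    name: str
    question: str
    permitted_values: Optional[Tuple[str, ...]] = None  # enumerated responses
    unit: Optional[str] = None                          # unit for numeric responses

    def accepts(self, response) -> bool:
        """Return True if the response is allowed by this element."""
        if self.permitted_values is not None:
            return response in self.permitted_values
        # Numeric element in this sketch: any positive int or float (bool excluded).
        is_number = isinstance(response, (int, float)) and not isinstance(response, bool)
        return is_number and response > 0


# Hypothetical elements; names, wording, and value lists are assumptions.
weight = CommonDataElement("weight", "What is the participant's weight?", unit="kg")
consent = CommonDataElement(
    "informed_consent",
    "Has the participant provided informed consent?",
    permitted_values=("Yes", "No"),
)

print(weight.accepts(72.5))      # True
print(consent.accepts("Maybe"))  # False
```

Because every site would validate against the same element definition, a response rejected at one site would be rejected at all of them, which is the consistency the definition aims for.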

Consistency is a key contributor to data interoperability, which is one of the FAIR (Findable, Accessible, Interoperable, Reusable) data principles guiding scientific data management and stewardship. Biomedical data are often collected in different ways for various study purposes, using different data models, which presents significant challenges for collaborative research, meta-analysis, and management/sharing of data. Use of CDEs makes health data “speak the same language” and become interoperable, both structurally and semantically. Because CDEs can be linked across common data models (CDMs) and standard vocabularies/terminologies used in healthcare, such as SNOMED CT, LOINC, RxNorm, and UMLS (among others catalogued in public repositories such as the National Library of Medicine’s Value Set Authority Center), they provide a means to align clinical research studies with real-world data from electronic health records, healthcare coverage claims, patient-generated data streams, and patient-reported outcomes. CDEs can be expressed in machine computable formats (as defined in the Glossary) to enable mapping, transforming, and combining of existing data and, in turn, to create big data resources by readily integrating data across disparate sources. Implementation of CDEs has the potential to accelerate knowledge discovery by harnessing the power of innovative data methods such as machine learning and artificial intelligence.
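
The "mapping, transforming, and combining" step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two sites, their field names (`weight_lb`, `weight_kg`), and the target element are hypothetical, and rounding to one decimal place is an arbitrary choice for the sketch.

```python
# Two hypothetical sites record the same concept in different ways.
site_a = [{"id": "A1", "weight_lb": 150.0}]  # pounds
site_b = [{"id": "B1", "weight_kg": 72.5}]   # kilograms

KG_PER_LB = 0.45359237  # exact definition of the avoirdupois pound


def to_common_weight(record: dict) -> dict:
    """Transform a site-specific record to a shared 'weight_kg' element."""
    if "weight_kg" in record:
        kg = record["weight_kg"]
    elif "weight_lb" in record:
        kg = record["weight_lb"] * KG_PER_LB
    else:
        raise KeyError("record has no recognized weight field")
    return {"participant_id": record["id"], "weight_kg": round(kg, 1)}


# Once both sources conform to the same element, they can be combined directly.
combined = [to_common_weight(r) for r in site_a + site_b]
print(combined)
```

If both sites had collected the shared element from the start, the transformation function would be unnecessary and the records could be pooled as collected.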

Resources established by NIH cross-cutting initiatives, such as the Rapid Acceleration of Diagnostics (RADx) COVID-19 initiative and the NIH CDE Repository, have recently raised general awareness and facilitated use of CDEs in NIH intramural and extramural research communities. The successful adoption of CDEs in NIH institutes’ programs has accelerated the pace of new scientific breakthroughs. These resources also highlight the need to standardize a minimum core set of CDEs across NIH Institutes and Centers.

The NIH Scientific Data Council (SDC), an internal NIH committee made up of senior NIH Institute and Center (IC) leaders and data scientists, has established a governance process to designate CDEs that meet criteria (such as human and machine readability, and semantically clear definitions of the variable, measure, prompt, and response) as “NIH-endorsed” and to publish them in the NIH CDE Repository. However, no minimum core set of CDEs has been established for use across all clinical studies/trials supported or conducted by ICs.

Beyond NIH, a consortium of mental health research funders and journals has launched the Common Measures in Mental Health Science Initiative to identify common measures for mental health conditions that funders and journals can require all researchers to collect, in addition to any other measures they require for their specific study. As another example, mCODE™ (Minimal Common Oncology Data Elements) assembles a core set of structured data elements that allows oncology electronic health records (EHRs) to be exchanged between health systems and enables comparative effectiveness analysis (CEA) of cancer treatments. While the NCI is participating in this initiative in an attempt to harmonize cancer CDEs in EHRs and cancer research, without an effort to standardize a minimum core set of CDEs for use across the NIH, these and other important data initiatives miss the opportunity for data to be more easily integrated and analyzed.

The 21st Century Cures Act highlights “the need for a core set of common data elements and associated value sets.” Development of a core set of CDEs will greatly enhance data interoperability. Recently, the NIH SDC has directed a new CDE working group to provide recommendations on a consistent set of minimum core CDEs that could be utilized across NIH clinical research/trials. The minimum core CDEs would not preclude the use of additional CDEs that are specific for clinical studies/trials. Social determinants of health (SDoH) core CDEs have been identified as priorities, because of increased awareness that social, economic, and environmental factors influence health equity. This RFI seeks feedback on the development and implementation of CDEs including a set of minimal core CDEs across the NIH programs.

Despite these efforts and the progress made, wide adoption of CDEs across various clinical domains is not without challenges. For example, the presence of numerous duplicative CDE sources in some clinical domains costs researchers extra time and effort in selecting the appropriate CDEs for use, especially when looking to integrate responses with real-world data. Technologies and tools are needed to map CDEs, to transform data, and to align CDEs with controlled vocabularies, terminologies, and existing data management systems. Through this RFI, NIH also seeks to understand these challenges and opportunities, in order to inform appropriate NIH guidance and mechanisms that lower the barriers to CDE use and improve the ability to aggregate and integrate CDE-based data.

Note: Direct use of any Personally Identifiable Information or Protected Health Information will be restricted to those interacting with participants (though aggregate-level measures may be derived for use in study datasets). Participants must consent to the use of their data before any patient data can be used in a study.

Information Requested
Specifically, NIH seeks comments on any or all of the following topics:

1. Recommended CDEs for NIH-funded clinical research/trials, including a set of minimal core CDEs. 
Development of CDEs will facilitate data interoperability across NIH programs. Due to the heterogeneous nature of the data collected in various clinical domains, one viable approach to determining a set of recommended CDEs is to use CDEs that are important for identifying cohorts for study and to organize them in categories (akin to Classification Schemes as outlined in International Standard 11179, where questions of a similar nature are grouped together). This approach allows more detailed, study-specific data elements to be added in each category as needed.

NIH is seeking comments on a set of minimum core CDEs in the demographics/personal characteristics category. We are also seeking comments on recommended CDEs in the clinical domains including autoimmune diseases and immune-mediated diseases, and high level (potential screening-purpose) CDEs for the SDoH domain as shown below.
 

  • Minimal core CDEs required for all NIH-funded clinical research/trials 

    Category: Demographics/personal characteristics
    • Age at enrollment (or at consent)
    • Gender
    • Sex assigned at birth
    • Race/ethnicity (based on current OMB definitions)
    • Disability status (CDC/Americans with Disabilities Act definition)
  • CDEs for autoimmune diseases
     
  • CDEs for immune-mediated diseases
     
  • CDEs for clinical and/or research domains in categories 
    Some examples of the categories, beyond those above, are Allergies, Adverse Events, Biospecimens, Clinical Tests, Informed Consent, Demographics, Diagnosis, Enrollment, Equipment, Health Assessments, Vital Status, Genomics, Imaging, Immunizations, Laboratory Tests, Language, Marital Status, Medical History, Medications, Patient/Person, Outcomes (including patient reported), Procedures, Treatment.
     
  • High-level CDEs for SDoH domain
    High-level CDEs are an approach to structuring a question that minimizes the burden of data collection while still capturing vital information about the social and environmental factors recognized as important to assessing SDoH. A high-level CDE bundles a number of factors that might otherwise be asked individually into one question, which is asked with a specific set of permitted responses. Some examples of high-level SDoH CDEs are shown below.
     
    1. Would you say that your life has been impacted adversely (negatively, badly) by any of the following (current or past)?  Please check all that apply.
      • Adverse or traumatic childhood events (negative life events)
      • Racism or any other form of discrimination
      • Stress of poverty
      • Belief you cannot do well because of economic or social stressors or lack of opportunities
      • Lack of support from family and/or friends (loneliness)
      • Toxin or pollution in your neighborhood environment
      • Lack of access to affordable quality health care
      • Violent or unsafe neighborhoods
      • Fear of authorities (police, immigration, employer, landlords, etc.)
      • Physical, mental, or emotional abuse (actual or fear of such abuse)
    2. In the past year, have you – or any family members you live with – been unable to get any of the following when it was really needed? Select all that apply.
      • Childcare
      • Clothing
      • Food
      • Housing
      • Internet/Broadband
      • Phone (e.g., mobile or landline)
      • Transportation (e.g., private or public)
      • Utilities (e.g., gas, electric, propane, natural gas, water/sewer/septic)
      • Medicine or any health care (medical, dental, mental health, vision)
      • Other/specify:  __________________________
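
For illustration, the second example question above can be treated as a multi-select element with a fixed set of permitted responses. This is a minimal sketch, not an NIH-specified encoding: the shortened response labels, the function name, and the rule that "Other" requires accompanying text are assumptions made for this example.

```python
# Permitted responses for example question 2, with labels shortened for the sketch.
UNMET_NEEDS = frozenset({
    "Childcare", "Clothing", "Food", "Housing", "Internet/Broadband",
    "Phone", "Transportation", "Utilities",
    "Medicine or any health care", "Other",
})


def validate_unmet_needs(selected, other_text=""):
    """Return a list of problems; an empty list means the response is valid."""
    problems = []
    unknown = sorted(set(selected) - UNMET_NEEDS)
    if unknown:
        problems.append(f"not permitted: {unknown}")
    # Assumed rule: choosing "Other" requires a free-text specification.
    if "Other" in selected and not other_text.strip():
        problems.append("'Other' selected without a specification")
    return problems


print(validate_unmet_needs(["Food", "Housing"]))  # []
print(validate_unmet_needs(["Food", "Pets"]))     # ["not permitted: ['Pets']"]
```

A single multi-select element of this kind replaces ten separate yes/no questions, which is how the high-level approach reduces collection burden.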

2. Technology standards for using NIH CDEs. NIH seeks broad input on tools and technologies that could enhance the use of NIH CDEs. NIH CDEs are defined as CDEs “recommended” or “required” by an NIH body, and/or found in the NIH CDE Repository.

  • For those who have their own recommended variables and values, please suggest types of support needed to map these data elements, including definitions, and allowed values or vocabularies, to existing NIH CDEs.
  • Specifically, please describe how NIH could facilitate access to authoritative and validated ontologies/crosswalks between commonly used healthcare and terminology standards such as SNOMED CT, ICD, LOINC, and drug terminologies, and how these may align with efforts to make healthcare data readily integrated with CDE-based research through standards such as the Health Level Seven International (HL7®) Fast Healthcare Interoperability Resources (FHIR®) standard.
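
To make the mapping task in the bullets above concrete, the following is a minimal sketch of a crosswalk from a study's local variables and values to target elements. All names here (`sex_birth`, `ht`, the target element names, and the value lists) are hypothetical; a real crosswalk would reference identifiers and permitted values from the NIH CDE Repository or a standard terminology.

```python
# Hypothetical crosswalk: local variable -> (target element, value map).
# A value map of None means values pass through unchanged.
CROSSWALK = {
    "sex_birth": ("sex_assigned_at_birth", {"M": "Male", "F": "Female"}),
    "ht": ("height_cm", None),
}


def map_record(local: dict):
    """Map one local record; return (mapped fields, variables needing review)."""
    mapped, unmapped = {}, []
    for variable, value in local.items():
        if variable not in CROSSWALK:
            unmapped.append(variable)  # no target element defined
            continue
        target, value_map = CROSSWALK[variable]
        if value_map is None:
            mapped[target] = value
        elif value in value_map:
            mapped[target] = value_map[value]
        else:
            unmapped.append(variable)  # local value has no permitted equivalent
    return mapped, unmapped


print(map_record({"sex_birth": "F", "ht": 170, "zip": "20892"}))
```

Returning the unmapped variables separately, rather than dropping them silently, shows where a study would need support: either a new crosswalk entry or a decision that the variable has no corresponding CDE.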

3. NIH policies and governance on CDEs. NIH seeks input on policies and governance that could facilitate and incentivize broader CDE usage in research and in data sharing and management. Please provide your feedback on:

Glossary

Term | Definition | References
Clinical Research

As defined by NIH, clinical research is human subjects research, including:

  • Patient-oriented research — research conducted with human subjects (or on material of human origin such as tissues, specimens, and cognitive phenomena) for which an investigator directly interacts with human subjects
  • Epidemiological and behavioral studies
  • Outcomes research and health services research
https://orwh.od.nih.gov/womens-health-equity-inclusion/clinical-research-and-trials
Bundled Set of Questions, or Bundle

Sometimes multiple questions or variables are used as a set, for research or clinical reasons. Some of these sets are called "bundled sets of questions" or "bundles," and are considered indivisible; that is, the questions/variables in a bundle are not considered valid if used individually.

Example: the PHQ-9 is a validated survey instrument used to measure severity of depression. PHQ-9 comprises 9 questions and a score, to indicate clinically recognized levels of depression.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1495268/ & https://cde.nlm.nih.gov/guides
Classification Scheme

“descriptive information for an arrangement of objects […] into groups based on criteria such as characteristics […] which the objects have in common. Note 1: A classification scheme is a concept system that is used for classifying some objects.” This definition is taken from ISO/IEC standard 11179 on Metadata Registries, which provides a foundation for a conceptual understanding of metadata and metadata registries. It corresponds with ‘Classification’ as used by groups submitting novel CDEs (objects) to the NIH CDE Repository. Classifications help users search for the CDEs they need within the NIH CDE Repository and organize CDEs within collections.
ISO 11179 | https://cde.nlm.nih.gov/guides
Collection

A set of CDEs developed for a particular research project or purpose. Collections may be submitted for consideration for NIH-wide endorsement all at once, or piecemeal, by a recognized NIH body (Institute, Center, Office, or [Program/Project] Committee).
Common Data Element

A standardized, precisely defined question, paired with a set of specific allowable responses, or specified format for responses, used systematically across different sites, studies, or clinical trials to ensure consistent data collection, thus enabling integration and meta-analysis of data across multiple studies or data sources.
https://grants.nih.gov/grants/guide/notice-files/NOT-LM-21-005.html | https://www.nlm.nih.gov/oet/ed/cde/tutorial/index.html | https://cde.nlm.nih.gov/videos
Data Element Concept (DEC)

A concept associated with a CDE. In a controlled vocabulary, a concept is mapped to one or more of the words that convey its meaning.

Example: the UMLS Metathesaurus contains 4.4 million concepts. The link below is to the concept for Age. Similar resources, such as NCI Thesaurus, could be used as well; for researchers whose trials fall under regulatory purview, such resources help map trial variables that are also CDEs to Therapeutic Areas within the Clinical Data Interchange Standards Consortium (CDISC) Terminology standards.
https://uts.nlm.nih.gov/uts/umls/concept/C0001779
Machine readable

Refers to the format of the CDE such that a machine or computer system can process the data. This is particularly relevant to the submission format for CDEs to the NIH CDE Repository, so it can be readily assessed as meeting NIH Scientific Data Council criteria for endorsement NIH-wide by the CDE Governance Committee. Collections of CDEs whose metadata are sufficiently annotated in machine readable formats are thus considered machine computable.
https://cde.nlm.nih.gov/guides
Measure

A standard way of capturing data on a certain characteristic of a study subject.
CDE Repository guidance: https://cde.nlm.nih.gov/guides#nih-endorsement-and-submissions
NCI Thesaurus

NCI Thesaurus (NCIT) provides reference terminology for many NCI and other systems. It covers vocabulary for clinical care, translational and basic research, and public information and administrative activities.
https://ncithesaurus.nci.nih.gov/ncitbrowser/
NIH Common Data Element (CDE) Repository

Launched in 2015 and hosted and maintained by NLM, the NIH Common Data Element (CDE) Repository provides access to structured human- and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and for other purposes.
https://cde.nlm.nih.gov/about
Unified Medical Language System (UMLS)

A set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems. Hosted by NLM.
https://www.nlm.nih.gov/research/umls/index.html
Variable

The underlying construct tied to a given CDE’s defined question or measure, together with the responses to the defined prompt.

Example: the variable itself may be captured more than once within a given study, such as with height of pediatric participants over time. In this example, “What is the patient’s height?” is a question; its allowable response might be, “whole number using centimeters as the unit of measure”.

https://cde.nlm.nih.gov/guides
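
The PHQ-9 example in the Bundle entry above can be sketched in code to show what "indivisible" means in practice. This is an illustrative sketch rather than an official scoring implementation: the function name is hypothetical, the item range (0 to 3) and the severity cutpoints of 5, 10, 15, and 20 follow the publication cited in that entry, and the label "minimal" for totals below 5 is an assumption of this sketch.

```python
def score_phq9(item_scores):
    """Return (total, severity) for a complete PHQ-9 bundle.

    Each of the nine items is scored 0-3, so the total ranges from 0 to 27.
    """
    if len(item_scores) != 9:
        # A bundle is indivisible: partial sets are not scored.
        raise ValueError("PHQ-9 requires all nine item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")

    total = sum(item_scores)
    if total >= 20:
        severity = "severe"
    elif total >= 15:
        severity = "moderately severe"
    elif total >= 10:
        severity = "moderate"
    elif total >= 5:
        severity = "mild"
    else:
        severity = "minimal"
    return total, severity


print(score_phq9([1, 1, 2, 0, 1, 1, 2, 1, 1]))  # (10, 'moderate')
```

Rejecting an incomplete set, instead of scoring the items that are present, reflects the glossary's point that the questions in a bundle are not considered valid when used individually.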

Submitting a Response

Comments should be submitted electronically through this webpage or as a PDF response by email to cde-rfi@od.nih.gov. To ensure consideration, responses must be submitted by 11:59:59 pm (ET) on April 20, 2024. Responses to this RFI are voluntary and may be submitted anonymously. You may voluntarily include your name and contact information with your response. If you choose to provide NIH with this information, NIH will not share your name and contact information outside of NIH unless required by law. Responses from professional organizations are welcome and encouraged.

This RFI is for informational and planning purposes only and is not a solicitation for applications or an obligation on the part of the Government to provide support for any ideas identified in response to it. Please note that the Government will not pay for the preparation of any information submitted or for use of that information.

Responses may be compiled and shared publicly, unedited and in an anonymous manner, after the close of the comment period. Please do not include any proprietary, classified, confidential, or sensitive information in your response. The Government reserves the right to use any non-proprietary technical information on public websites, in reports, in summaries of the state of the science, in any possible resultant solicitation(s), grant(s), or cooperative agreement(s), or in the development of future funding opportunity announcements. The NIH may use information gathered by this RFI to inform development of future guidance and policy directions.

We look forward to your input and hope you will share this RFI with your colleagues.

Inquiries

Please direct all inquiries to:
Belinda Seto, Ph.D.
Office of Data Science Strategy
National Institutes of Health
Email: cde-rfi@od.nih.gov

This page last reviewed on February 20, 2024