NIH Workshop on the Policy and Ethics of Record Linkage: Workshop Summary


The NIH Workshop on the Policy and Ethics of Record Linkage brought together the community's considerations of record linkage for biomedical research, with a specific focus on privacy and security, participant perspectives, and ethical issues. Discussions highlighted benefits, concerns, and issues related to record linkage and called for further development of policies that would protect the rights of individuals and communities while continuing to enable research.

Presentations and recordings from this workshop are available.



The NIH hosted the workshop on June 29-30, 2021, with more than 100 participants. The workshop's goal was to hear researcher, policy, and research participant perspectives on the benefits and challenges of record linkage for biomedical research in the areas of privacy and security, participant perspectives, and related ethical issues.

Susan Gregurick, Ph.D., NIH associate director for data science and director of the Office of Data Science Strategy, opened the meeting by emphasizing the importance of hearing from the community and noting that the workshop discussions, together with a related NIH Request for Information on "Streamlining Access to Controlled Data from NIH Data Repositories," will inform the NIH on technical and policy issues.

Chair Overview: Introduction to Ethical and Policy Considerations in Record Linkage for Biomedical Research

The workshop was chaired by Pilar Ossorio, Ph.D., professor of law and bioethics, University of Wisconsin-Madison Law School, who provided an overview of record linkage.

  1. Background on Record Linkage: Dr. Ossorio explained that record linkage means bringing together information about a person from two or more data sources. Record linkage has been practiced for decades (e.g., linking survey responses to government administrative records and imputing missing data for analyses). Dr. Ossorio noted that data bought and sold by commercial entities are routinely linked, but that this “does not mean we should do it the same way or to the same extent.”
  2. Potential Benefits of Record Linkage: Dr. Ossorio listed potential benefits of record linkage including (1) creating richer data about persons, families, places, and events; (2) improving research quality; (3) better defining sampling frames to allow for validation and imputation; and (4) decreasing research burden on patients, leading to shorter surveys, fewer measures, and fewer study visits.
  3. Concerns Raised on Record Linkage: Dr. Ossorio highlighted participant concerns about record linkage, including loss of autonomy, dignity, sovereignty, and privacy, as well as concerns about researcher trustworthiness. She also discussed how record linkage can raise concerns about individual privacy risks and community-level disclosure risks. Specifically, record linkage can be exploitative and disrespectful to individuals or communities, especially if used without informed consent, appropriate community engagement, and/or appropriate governance.
  4. Points to Consider: Dr. Ossorio noted that data linkage can raise policy and ethical issues, especially in contexts in which people never anticipated that information captured during their clinical care would be used in research. She also noted the need to understand when and how linked data should be used in analyses, and the policy implications of those uses, when making choices about federating data. Finally, she noted that although unique identifiers raise privacy concerns for participants, linkage methods that do not rely on unique identifiers are more error prone.

Dr. Ossorio stated that the pandemic has created a context for justifying certain research actions and that researchers may want to continue many of these actions. She called for linkage standards, reports on linkage approaches, and transparency about linkage mistakes that may raise social justice issues. She concluded by listing several ethical issues, privacy and security concerns, and community engagement considerations for record linkage.

Pilot Projects and Record Linkage Considerations

Three presentations provided examples and a landscape assessment of record linkage.

What is Privacy Preserving Record Linkage?

Alastair Thomson, chief information officer at the National Heart, Lung, and Blood Institute, discussed the importance of record linkage in de-duplicating records and connecting data across repositories. He described record linkage methods, their benefits, and their trade-offs. As an example, he described privacy-preserving record linkage[1] as a series of trade-offs among matching quality (or potential bias), computational complexity, and risk of privacy breach (or trust). He noted that linkage risks introducing biases and that active steps are needed to reduce them.
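The trade-off between matching quality and privacy can be illustrated with a minimal Python sketch. It assumes the data holders share a secret key (`SECRET_KEY` and every function name below are hypothetical) and contrasts two widely used privacy-preserving encodings: an exact keyed hash (HMAC-SHA256) of normalized identifiers, and a Bloom-filter encoding of name bigrams compared with the Dice coefficient. This is an illustration only, not the method of the cited Chicago tool, N3C, or any package discussed at the workshop.

```python
import hashlib
import hmac

# Hypothetical secret shared by the data holders and never given to the linking party.
SECRET_KEY = b"example-shared-secret"


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits, so formatting differences don't matter."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def keyed_token(first: str, last: str, dob: str) -> str:
    """Exact-match token: HMAC-SHA256 over the normalized identifiers."""
    message = "|".join(normalize(v) for v in (first, last, dob)).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def bigrams(value: str) -> set[str]:
    """Character bigrams of the normalized value, padded with '_' at both ends."""
    padded = f"_{normalize(value)}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, size: int = 256, num_hashes: int = 4) -> set[int]:
    """Bloom-filter encoding of the bigrams, returned as the set of bit positions set to 1."""
    bits = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hmac.new(SECRET_KEY, f"{k}:{gram}".encode("utf-8"), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:4], "big") % size)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two encoded filters."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


site_a = ("Katherine", "Smith", "1980-04-12")
site_b = ("Katharine", "Smith", "1980-04-12")  # same person, one-letter typo

# Exact keyed tokens: formatting is normalized away, but a single typo breaks the match.
print(keyed_token(*site_a) == keyed_token(*site_b))
# Bloom-filter similarity degrades gradually instead of failing outright.
print(round(dice(bloom_encode(site_a[0]), bloom_encode(site_b[0])), 2))
```

The exact token reveals little beyond equality but silently misses records with data-entry errors, which can bias who gets linked; the Bloom-filter encoding tolerates such errors but leaks more structure about the underlying names and needs a similarity threshold, which adds computation and tuning.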

Privacy Preserving Record Linkage and National COVID Cohort Collaborative (N3C)

Sam Michael, chief information officer of the National Center for Advancing Translational Sciences, gave an overview of the National COVID Cohort Collaborative (N3C). N3C is a centralized clinical data enclave and analytics platform consisting of records drawn from a subset of electronic health record (EHR) data from millions of patients, along with additional data sets. Michael demonstrated how imaging data from The Cancer Imaging Archive are linked with a subset of the EHR data in N3C, tracing the process from the data access request to the receipt of EHR and imaging data in the researcher's private workspace. Michael stressed the need to “protect the patient privacy, and their integrity.”

Privacy Preserving Patient Record Linkage (P3RL) Software Landscape and Assessment: Evaluation Process and Lessons Learned

Lynne Penberthy, M.D., associate director of the Surveillance Research Program at the National Cancer Institute, presented a landscape analysis and evaluation of P3RL software. She described an assessment of 52 P3RL software packages, each of which was tested on gold-standard data. Lessons learned concerned the methods used by vendors, the vendors' missions, and the ability to partially customize the matching process; another lesson was that the choice of a specific P3RL package may depend on the use case.
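The summary does not say which metrics the assessment used. As an assumption, a common way to score a linkage run against gold-standard data is precision, recall, and F1 over matched record pairs, sketched below in Python with hypothetical record IDs.

```python
Pair = tuple[str, str]  # (record ID in source A, record ID in source B)


def evaluate_linkage(predicted: set[Pair], gold: set[Pair]) -> dict[str, float]:
    """Score predicted matches against a gold standard of known true matches."""
    true_pos = len(predicted & gold)  # predicted pairs that are real matches
    precision = true_pos / len(predicted) if predicted else 0.0  # share of links that are correct
    recall = true_pos / len(gold) if gold else 0.0  # share of true matches that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}  # one false link, two missed matches
print(evaluate_linkage(predicted, gold))
```

Low precision means records from different people were merged, while low recall means true matches were missed; both correspond to the invalid or incorrect linkages that workshop participants worried could disproportionately affect certain populations.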

Thomson, Michael, and Dr. Penberthy then responded to attendees’ questions about data use agreements for different data types and discussed the importance of having an adjudication process with federated systems when linking data.

Day 1: Privacy and Security of Record Linkage Session

Anthony Solomonides, Ph.D., program director for Outcomes Research and Biomedical Informatics at NorthShore University HealthSystem, provided an overview that included the history of record linkage and security. He also posed questions on ethical decision-making for data ownership and control, consent, and data privacy.

Following the presentation, workshop participants met in breakout groups to discuss privacy and security of record linkage. Upon returning to the main session, Dr. Solomonides led the report-out of the breakout group discussions. Key points included:

  • Research participants have the right to know how their data is being used.
  • It is important to notify individuals, especially non-consented individuals, when their data is linked.
  • Apply an equity lens, as not all groups and communities experience privacy and security issues equally.
  • Consider the potential for incorrect and invalid linkages that can disproportionately affect certain populations.
  • There is a need for clear principles and detailed ways for identifying and measuring benefits and risks for different communities, especially those who are underserved.

The day concluded with Dr. Gregurick summarizing the day's key discussion points:

  • Record linkage has many approaches and can create an enriched picture of a person through data from many sources.
  • Benefits and risks of linkage exist and there are key considerations around the approach, purpose, and consequence of record linkage.
  • Actions need to be taken with respect to furthering data reuse, considering human aspects of privacy and security, and deploying linkage to improve characterization of individuals and populations, especially of those often underserved.

Day 2: Participant Perspectives on Record Linkage Session

Dr. Gregurick opened Day 2 of the workshop and introduced Sharon Terry, president and CEO of Genetic Alliance and the speaker for the participant perspectives session.

Terry shared how her personal experience of having her two children diagnosed in 1994 with pseudoxanthoma elasticum led to her involvement in medical research, including creating a system and a participant-owned company through which individuals can choose which data they want linked. She emphasized that “patients have a right to keep their data on a string” and that research incentives are not aligned with participant needs. She asked participants to consider, “How can we recognize that each person is the expert of their own risk-benefit analysis?”

Following this presentation, workshop participants met in breakout groups to discuss participant perspectives on record linkage. Upon returning to the main session, Terry led the breakout group report-out that conveyed the following key points:

  • Provide transparency to participants about the value and risks of data linkage and how their data may be used, including dissemination of information useful to participants.
  • Solicit participant, community, and advocacy group perspectives to understand their specific concerns around linking their data.
  • Give research participants the ability to opt in to and opt out of record linkage.
  • Reconsent research participants, have processes to review linked data, and implement standards for research participants to identify their preferences.
  • Clearly communicate the ethical justifications to perform research without informed consent during a public health emergency.

Day 2: Ethical Issues Related to Record Linkage Session

Richard Sharp, M.D., director of the Biomedical Ethics Program, Center for Individualized Medicine Bioethics Program, and the Clinical and Translational Research Ethics Program at Mayo Clinic, discussed the importance of enabling systems that advance knowledge, cures, and treatments without exposing patients to inappropriate risk. He proposed the following considerations for record linkage: minimizing risks to participants, authorization and distribution of health benefits, alignment of patient goals with data uses, alignment of community interests with data uses, changeable data use restrictions, and transparency.

Following his presentation, workshop participants met in breakout groups to discuss ethical issues related to record linkage. Dr. Sharp led the breakout group report-out that conveyed the following key points:

  • Have transparency, oversight, and governance for stewardship, transactions, and misuse.
  • Articulate the research purposes for data uses and return results to participants.
  • Create relationships and policies that enable data sharing and streamline Institutional Review Board (IRB) requests and agreements.
  • Obtain informed consent for record linkages from the beginning of research participant engagement and perform record linkage in ways that protect privacy.
  • Protect groups and communities that are more vulnerable. Note that disparities will be difficult to identify if biases exist in the collected and linked data.

Panel Discussion

Spero Manson, Ph.D., professor of public health and psychiatry, director of Centers for American Indian and Alaska Native Health, and associate dean of research, Colorado School of Public Health, University of Colorado Denver's Anschutz Medical Campus, moderated a panel with Terry and Drs. Ossorio, Sharp, and Solomonides. The panelists discussed how to sustain an inclusive process, the meaning of consent across constituents, and research participant protections, and they considered coordination, including governance, beyond the workshop.

The following key insights were conveyed by the panelists:

  • People and communities should be engaged in their local settings and should be allowed to determine their own priorities for record linkage (e.g., leveraging existing patient and family advisory councils in healthcare settings).
  • Research participants should be able to modify their personal preferences for record linkage over time.
  • Research questions should use the terms “participant” or “human” rather than “subject.”
  • Record linkage is currently done without the consent and authorization of participants. Research participants should have the right to express their preferences and to have those preferences respected.

Patricia Flatley Brennan, R.N., Ph.D., director of the National Library of Medicine, closed the workshop by emphasizing that it was the start of a conversation that will continue. She highlighted the importance of addressing the benefits of record linkage, of distinguishing the research community's benefits and challenges with record linkage from those in clinical care, and of creating awareness that the commoditization of clinical data and the protectiveness of institutions may be inconsistent with the goals of biomedical research. She noted that workshop participants had been heard and that helping people “understand both the costs and the benefits” of record linkage is important. Dr. Brennan called on the community to develop ethical frameworks for record linkage and on the NIH to adopt a strong model of ethical stewardship. She concluded by stressing that “we need to listen and walk together.”

Three Key Themes

1. The Potential of Record Linkage

The workshop highlighted uses of record linkage and the related policy and ethical issues. Properly deployed, record linkage can create richer data about the human experiences of individuals, families, places, and events, including those often underserved. Although workshop participants did not dispute the value of record linkage, they urged that its use be deliberate and transparent, both about whether the circumstances are appropriate and about how linkage is carried out.

2. Participant and Community Perspectives

Throughout the workshop, participants discussed the need to attend to participant and community perspectives in research. Repeated emphasis was placed on the importance of transparency and clarity, and on research participants understanding how, when, and why their data is used. Workshop participants emphasized the importance of protecting vulnerable and underserved communities and noted that improperly linked data may make disparities harder to identify.

3. Governing Decisions

The workshop highlighted the need for metrics and guidance on record linkage, as well as for community engagement and partnership with research participants, communities, and the institutions that govern research, including IRBs and the NIH. Additionally, several issues should be examined: whether one-time consent is ethical or sustainable, whether research participants should consent to and be notified of record linkage, and the impact of invalid or incorrect linkages.

This workshop was organized on behalf of the Office of Data Science Strategy, the Office of Science Policy, and the National Library of Medicine by Dina Paltoo/NHLBI, Valentina Di Francesco/NHGRI, Taunton Paine/OSP, and Vivian Ota-Wang/ODSS. This effort was supported by the NIH Office of the Director as part of a larger initiative under the NIH’s response to the COVID-19 pandemic.


[1] Kho AN, Cashy JP, Jackson KL, Pah AR, Goel S, Boehnke J, Humphries JE, Kominers SD, Hota BN, Sims SA, Malin BA, French DD, Walunas TL, Meltzer DO, Kaleba EO, Jones RC, Galanter WL. Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. J Am Med Inform Assoc. 2015 Sep;22(5):1072-80. doi: 10.1093/jamia/ocv038. Epub 2015 Jun 23. PMID: 26104741; PMCID: PMC5009931.

This page last reviewed on March 23, 2023