The NIH Data Sharing Policy calls for the expedited translation of NIH-funded research results into knowledge, products, and procedures to improve human health. Indeed, published studies offer useful, free information for researchers, but these data sets can be hard to find or browse. To address this need, the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) launched its Data and Specimen Hub (DASH) in August 2015.
NICHD DASH gives the research community access to data from completed NICHD-funded studies that vary in population, study type, data type, or disease/syndrome. It stores individual-level participant data that have all 18 HIPAA identifiers removed, enabling investigators to browse, access, and analyze information for use in secondary research. DASH not only makes these data freely available in a central repository but also helps NICHD maximize its investments in studies on HIV/AIDS, pregnancy, infant care, child health, and other topics.
However, not all was smooth as we developed this tool. It was a long road that presented several challenges along the way. Below are a few examples of these issues (and what we did to overcome them), which we hope will help inform others as they develop their own data-sharing repositories.
Obtaining IRB approval for data sharing
To share study data through DASH, an Institutional Review Board (IRB) or a Privacy Board must formally attest that sharing the data is consistent with the informed consent obtained from participants in the original study. This is challenging for multi-site studies and for completed studies in which the study IRB is no longer active. To address this, the DASH Committee developed guidance and recommendations, including advice for IRBs to consider on how data are internally managed. For example, IRBs may grant an exemption from IRB review when they consider that DASH data are completely de-identified. All data requesters must execute a Data Use Agreement with NICHD and, if required by the data submitters, provide local IRB approval.
For completed projects in which the IRBs are no longer active, the DASH Committee recommends investigators consider obtaining approval from the NICHD IRB, the NIH Human Research Protections Program, or an external commercial IRB for the necessary reviews or exemptions. On the other hand, for prospective or ongoing studies, the committee encourages investigators to ensure that language for broad data sharing is included when developing informed consent forms. The committee also suggests the principal investigators of the original study work closely with their IRB before the study closes to ensure data sharing is in line with informed consent procedures and data sharing/use agreements.
Ensuring data are organized and ready for meaningful reuse
The concept of data sharing through a centralized repository is still relatively new, and researchers are not necessarily familiar with the process or elements required to facilitate meaningful reuse of their data. Some of the common issues encountered with the data submitted to DASH include missing or incomplete study documentation, missing study schema (data collection schedule and milestones), data stored in a proprietary or unique software format, and incomplete de-identification of the 18 HIPAA identifiers.
The DASH Committee addressed these challenges by providing tools and resources to help researchers prepare data and research documentation for sharing. For example, the committee recommends standard organizational elements for study documentation (such as the study protocol and data dictionary) that are required for accurate interpretation and meaningful reuse of the data. DASH also provides guidance on data organization and formats, data de-identification, and coding, and it has developed an offline data preparation tool that annotates the data (for easy search and discovery) and packages them for submission.
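As an illustration only, the common submission issues described above could be screened for with a short script before data are packaged. The sketch below is not the DASH data preparation tool. The function name, the list of required documents, the set of accepted file extensions, and the suspect column names are all assumptions made for the example, and the column list covers only a few of the 18 HIPAA identifiers.

```python
from pathlib import PurePath

# Assumed for illustration; a real checklist would come from the repository's guidance.
REQUIRED_DOCS = {"protocol", "data_dictionary"}
OPEN_FORMATS = {".csv", ".tsv", ".txt"}
# Partial, hypothetical list of column names that suggest direct identifiers.
SUSPECT_COLUMNS = {"name", "address", "birth_date", "ssn", "phone", "email"}


def check_submission(documents, data_files, column_names):
    """Return a list of human-readable problems found in a draft submission."""
    problems = []
    # Flag required study documentation that has not been supplied.
    for doc in sorted(REQUIRED_DOCS - set(documents)):
        problems.append(f"missing document: {doc}")
    # Flag data files whose extension is not in the assumed open-format set.
    for path in data_files:
        if PurePath(path).suffix.lower() not in OPEN_FORMATS:
            problems.append(f"non-open format: {path}")
    # Flag column names that look like direct identifiers.
    for col in column_names:
        if col.lower() in SUSPECT_COLUMNS:
            problems.append(f"possible identifier column: {col}")
    return problems


issues = check_submission(
    documents={"protocol"},
    data_files=["visits.sas7bdat", "labs.csv"],
    column_names=["study_code", "Birth_Date"],
)
for issue in issues:
    print(issue)
```

A name-based screen like this can only prompt a closer look; it cannot confirm that a data set is fully de-identified.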
Sharing biospecimens linked to data
Linking biospecimens with data is critical for maximizing reuse of biospecimens collected during the study. Unlike data, biospecimens are a finite resource, and sharing them generates additional challenges around participant privacy, specimen inventory management, and meaningful reuse of the specimens.
The challenges and solutions associated with obtaining IRB approval for specimen sharing are similar to those for data, as discussed above, but there are other challenges unique to biospecimens. For example, it is imperative that essential information about the specimen, including the type, date collected, quality, and the amount sent to the repository, is accurately captured in the biospecimen inventory provided through DASH. Furthermore, for completed studies in which the biospecimens were stored in the biorepository while the study was active and/or prior to any plans to share data, linking the biospecimens with the same de-identified code as in the data has been a challenge.
Such linking must be performed retroactively by the investigators or by DASH staff, and the specimens must be relabeled prior to shipping. For active or prospective studies, investigators can ensure that specimens and data are coded with the same unique, de-identified code for sharing purposes prior to storing the specimens in the biorepository.
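The idea of coding specimens and data with the same unique, de-identified code can be sketched as follows. This is a minimal example under stated assumptions, not the procedure DASH or any biorepository actually uses: the function names, the `participant_id` and `study_code` field names, and the code format are hypothetical.

```python
import secrets


def assign_study_codes(participant_ids, prefix="P"):
    """Map each original participant ID to a random code that cannot be derived from it."""
    codes = {}
    used = set()
    for pid in participant_ids:
        if pid in codes:
            continue  # one code per participant, even if the ID appears twice
        code = prefix + secrets.token_hex(4).upper()
        while code in used:  # regenerate in the unlikely event of a collision
            code = prefix + secrets.token_hex(4).upper()
        used.add(code)
        codes[pid] = code
    return codes


def relabel(records, codes, id_field="participant_id"):
    """Return copies of the records with the original ID replaced by the shared code."""
    relabeled = []
    for rec in records:
        new_rec = {k: v for k, v in rec.items() if k != id_field}
        new_rec["study_code"] = codes[rec[id_field]]
        relabeled.append(new_rec)
    return relabeled


# Hypothetical rows from a study data set and a specimen inventory.
data_rows = [{"participant_id": "A-001", "visit": 1},
             {"participant_id": "A-002", "visit": 1}]
specimen_rows = [{"participant_id": "A-002", "type": "plasma"},
                 {"participant_id": "A-001", "type": "serum"}]

codes = assign_study_codes(r["participant_id"] for r in data_rows + specimen_rows)
shared_data = relabel(data_rows, codes)
shared_specimens = relabel(specimen_rows, codes)
```

Because both outputs are relabeled from one mapping, a requester can join specimens to data on `study_code` without ever seeing the original identifier. The mapping itself would need to be kept securely by the investigators or destroyed, depending on the study's consent and data use agreements.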
Raising awareness of and building support for DASH
Adoption of DASH by the research community as a data-sharing mechanism continues to require a concerted effort to ensure that stakeholders are aware of this resource. Promotion of DASH was spearheaded by a governance committee established during the planning phase to oversee the repository's implementation, operations, and performance, especially for data submission and access.
The goal of the governance committee is to ensure the repository is effective, sustainable, and impactful. Promoting adoption of DASH requires engaging the research community early to build consensus around unmet needs. This means identifying the potential benefits and challenges relevant to researchers and developing strategies around them. The DASH stakeholder communications plan includes tactical approaches to ensure researchers understand the purpose of DASH, know how to use it, and realize its benefits, specifically as a mechanism for NICHD-funded investigators to comply with NIH’s Data Sharing Policy.
While we are encouraged by DASH’s initial reception by the research community, we realize there is much more to accomplish in developing this tool. Indeed, sharing data helps derive the greatest benefit from research investments, which in turn boosts their potential impact on public health. Secondary use and analysis of the data collected in NICHD-sponsored studies will result in new discoveries that may not even be imagined at the present time. We are confident NICHD DASH will become even more valuable in the months and years ahead as we add more study data from our broad and diverse research portfolio.
About the Authors:
Dr. Diana Bianchi is the director of the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), part of the National Institutes of Health. In this role, she oversees the institute's research on pediatric health and development, maternal health, reproductive health, intellectual and developmental disabilities, and rehabilitation medicine, among other areas.
Dr. Bianchi completed her residency training in pediatrics at the Children’s Hospital, Boston, and her postdoctoral fellowship training in medical genetics and neonatal-perinatal medicine, both at Harvard. She is board-certified in all three specialties and is a practicing medical geneticist with special expertise in reproductive genetics. Her translational research focuses on two broad themes: prenatal genomics with the goal of advancing noninvasive prenatal DNA screening and diagnosis, and investigating the fetal transcriptome to develop new therapies for genetic disorders that can be given prenatally. She can be reached at email@example.com.
Dr. Rohan Hazra is the chief of the Maternal and Pediatric Infectious Disease Branch at the Eunice Kennedy Shriver National Institute of Child Health and Human Development. He oversees the Pediatric HIV/AIDS Cohort Study, a multicenter U.S.-based program that follows perinatally HIV-infected youth and HIV-exposed/uninfected infants, children, and youth. He is also actively involved in pediatric HIV clinical trials and other observational studies in the United States and abroad.
Dr. Hazra's research interests include studying the long-term impact of HIV and its treatment on children, adolescents, and young adults who were infected with HIV as infants. In addition, he continues to be involved in studies evaluating new antiretroviral medications and treatment strategies in HIV-infected children, especially in resource-limited countries. He can be reached at firstname.lastname@example.org.