A physician-epidemiologist, I arrived at NIH last year after more than 22 years as a professor of population medicine and nutrition at Harvard University. During those years I worked on several large cohort studies and clinical trials that gave me insight into the importance of data quality—and in some cases quantity—for answering meaningful questions about health across the life course.
I was recruited by NIH to direct the nationwide Environmental influences on Child Health Outcomes (ECHO) program, the mission of which is to enhance the health of children for generations to come. In this new initiative, we face many challenges to ensure data quality and quantity—but tackling challenges like those is one of the reasons I love my job.
ECHO’s observational component (ECHO also contains a clinical trials network) is knitting data from 83 existing, ongoing cohort studies of mothers and children into a single ECHO-wide cohort comprising ~50,000 kids. Pooling this many cohorts will allow us to answer important questions about how a broad range of early environmental factors, including geographic, physical, chemical, social, behavioral, and biological ones, affect child health. These cohort studies focus on four key pediatric outcomes that have a high public health impact: pre-, peri-, and postnatal outcomes; upper and lower airway conditions; obesity; and neurodevelopment. There’s also an innovative fifth outcome, positive health, which reflects the positive attributes of healthy growth and development.
In putting together the ECHO-wide cohort, we’re paying attention to the whole life cycle of data, from collection to curation, cleaning, harmonization, management, access, and long-term storage. Our overarching challenge is how to integrate existing and new data from cohorts that started at different times in the life course, in different decades, with different data collection approaches and from different data sources, so that the whole is greater than the sum of the parts. And, yes, that is a complicated process.
One of our first efforts is to have the cohorts collect new data in as standardized a fashion as possible. “As possible” is the operative phrase, given the cohorts’ different populations, foci, and cultures. Therefore, the ECHO-wide Data Collection Protocol contains not only essential and recommended data elements (for each of several life course periods), but also preferred and acceptable measures for each element.
Using the expertise of our cohort investigators and Person-Reported Outcomes Core, ECHO is making every effort on the front end, through existing evidence and validation studies, to ensure that cohorts collect harmonizable data. On the back end, our Data Analysis Center is creating novel approaches, e.g., accounting for missing data, in order to harmonize in the analysis phase.
Employing our guiding principles of teamwork, impact, responsibility, and value, and operating within NIH guidelines, the ECHO investigators also developed and ratified data sharing and biospecimen use policies. These policies will facilitate widespread use of the ECHO-wide cohort data and at the same time ensure security of access so that data analysts preserve privacy and confidentiality.
Working in concert with these two policies is our publications policy, which specifies how investigators propose and approve ideas for analyses, with appropriate attention to equity, validity, and reproducibility. These policies are accompanied by cloud-based tools that limit data to only those needed for a given analysis. Striking the right balance between the benefits of broad usage, on the one hand, and disclosure, conflict of interest, and other risks, on the other, is an ongoing concern, especially given that ECHO is breaking new ground in many of these areas. That’s why we’re committed to regular evaluations of the protocol, policies, and strategies for protection of human subjects, with course corrections as needed.
Speaking of data analyses, we recognize that the main ECHO cohorts address etiologic questions rather than prediction. In other words, “to what extent—and how—does x cause y?,” rather than, “can I predict y using values of a, b, c, …and x?” That means that conceptual models, taking care to distinguish confounders, mediators, and moderators, are crucial.
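To make the distinction concrete, here is a minimal sketch of why a confounder matters for an etiologic question. The counts are entirely hypothetical (they are not ECHO data), and the names `counts`, `crude_risk_difference`, and `standardized_risk_difference` are illustrative only. In this toy table the exposure x has no effect on the outcome y within either level of a confounder c, yet the crude comparison suggests that it does.

```python
# Hypothetical counts, not ECHO data.
# Key: (confounder level c, exposure x) -> (cases of outcome y, number of children)
counts = {
    (0, 1): (10, 100),   # c=0, exposed:   risk 0.10
    (0, 0): (40, 400),   # c=0, unexposed: risk 0.10
    (1, 1): (120, 400),  # c=1, exposed:   risk 0.30
    (1, 0): (30, 100),   # c=1, unexposed: risk 0.30
}


def crude_risk_difference(counts):
    """Risk in exposed minus risk in unexposed, ignoring the confounder."""
    risks = {}
    for x in (0, 1):
        cases = sum(v[0] for (c, xx), v in counts.items() if xx == x)
        total = sum(v[1] for (c, xx), v in counts.items() if xx == x)
        risks[x] = cases / total
    return risks[1] - risks[0]


def standardized_risk_difference(counts):
    """Average the stratum-specific risk differences, weighting each
    stratum of c by its share of the whole population."""
    strata = sorted({c for c, _ in counts})
    population = sum(total for _, total in counts.values())
    result = 0.0
    for c in strata:
        exposed_cases, exposed_n = counts[(c, 1)]
        unexposed_cases, unexposed_n = counts[(c, 0)]
        weight = (exposed_n + unexposed_n) / population
        result += weight * (exposed_cases / exposed_n - unexposed_cases / unexposed_n)
    return result


crude = crude_risk_difference(counts)
adjusted = standardized_risk_difference(counts)
print(f"crude risk difference:    {crude:.2f}")
print(f"adjusted risk difference: {adjusted:.2f}")
```

A prediction model could happily use x here, because x carries information about y through c. An etiologic analysis cannot, which is why the conceptual model has to say which variables are confounders before any adjustment is chosen. Mediators and moderators call for different handling and are not covered by this sketch.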
In addition, data analyses from the ECHO-wide cohort face another threat to validity that researchers rarely have to consider. Every cohort has a different source population and different inclusion/exclusion criteria. That means that selection factors differ across the cohorts, and one can’t be certain that data from one cohort can be readily combined with data from the others. Just another challenge for our crack Data Analysis Center.
In ECHO, we are just starting to amass the large amount of observational data that we anticipate will ultimately inform practices, programs, and policies to improve the health of children and adolescents. We haven’t yet tackled the vital issue of what happens to all of these data down the road.
Given that ECHO is slated for seven years, and we’ve just entered the second year, we have some time to figure that out. But not too much time, as planning ahead—for all phases of the life cycle of a datum—is the name of the game.
I hope that you now appreciate some of the ways that ECHO aims to be a model of how to maximize the value of data from a large, multi-cohort consortium to the scientific and general community. I invite you to learn more about the ECHO program.
About the Author:
Matthew W. Gillman, M.D., joined the National Institutes of Health on July 5, 2016, as the inaugural director of the Environmental influences on Child Health Outcomes (ECHO) Program. As a nationwide research program, ECHO aims to conduct impactful observational and intervention studies to assess the effects of a broad array of early environmental influences on child health and development. Dr. Gillman joined NIH from Harvard Medical School, where he was a professor of population medicine, and he was also a professor of nutrition at Harvard School of Public Health. His background is in the fields of epidemiology, pediatrics, and internal medicine. He has extensive experience with cohort studies, having served as an investigator on several large, high-profile studies such as Project Viva, the Growing Up Today Study, PROBIT, and the Framingham Heart Study.