Since January 2017, I’ve held the title of interim Associate Director of Data Science for NIH. I pronounce this “i-ADDS” (a bit like a five-year-old’s claim of mathematical prowess: “I adds”). Taking over from Phil Bourne, the first NIH ADDS, I am crafting a role in transition, one with a heavy responsibility, great-but-unspecified accountabilities, and no job description.
It’s a tricky spot to be in.
As I move DataScience@NIH forward, I am constantly checking with colleagues and the NIH leadership to ensure I don’t get too far ahead of them in vision or commitment.
I work very closely with DPCPSI (the NIH Division of Program Coordination, Planning, and Strategic Initiatives) as it oversees the BD2K program and the various pilot studies designed to inform future infrastructure investments and research directions for data science at NIH. Together with the Scientific Data Council, Andrea Norris (Chief Information Officer for NIH), and Betsy Wilder (head of the NIH Common Fund), I work to keep the lines of communication open, avoid duplicated effort, and move toward generalizable solutions. And I collaborate with the Institute and Center (IC) Directors across NIH to find solutions to data science challenges and to take advantage of current opportunities in ways that balance institute-specific considerations with enterprise-wide strategies.
I am also working to put into place the tools for our next steps.
I believe we need forecasting tools to help us gauge how long a high-value data set will remain so and at what cost. Through discussions with archivists and data scientists, I am beginning to shape a vision for data storage that is highly distributed and allows for varying investments in curation and maintenance, so that we can do what’s needed to preserve high-value data while making appropriate, if more modest, investments in storing and curating important-but-lesser-valued data sets. I am mentally knitting together the experience of the NLM in indexing, cataloging, and curation with the knowledge gained through the BD2K Centers’ efforts to make data discoverable and interoperable. I am fostering an exploration of the ethical dimensions of data-driven discovery with attention to simultaneously preserving the privacy of individual participant data, ensuring shareability under appropriate constraints, and respecting the rights of investigators to explore ideas without undue surveillance or premature disclosure of their thinking.
Mostly I lead from the side: through conversations, speeches, and social media. I listen to the data science challenges faced by NIH colleagues and scientists around the world. I write this blog every week, so I can share my thoughts with you and be guided by your reactions. I reach out to, or respond to invitations from, those responsible for data science in sister federal agencies such as NOAA or NSF. Over the past 12 weeks I have given more than a dozen talks to groups as diverse as the Alliance of Genome Resources and the Interagency Modeling and Analysis Group. I’ve learned of the advances in curation and discoverability made by some groups and tried to deepen understanding of data science’s potential for discovery without overselling it.
Across my speaking engagements, I’m delivering three key messages:
- Data science is an emerging scientific, methodological, and computational arena that promises to accelerate discovery and make maximum use of data. Note the word “emerging.” No one has the final answer or the final rules. We’re on a journey together.
- We can’t preserve all the data we are generating, so we must make smart choices about what data to preserve, where, and for how long.
- Data science methods aren’t simply statistics on steroids. They arise from a unique but complementary philosophical perspective and a different mathematical foundation, both of which are intended to maximize discovery from data subject to constraints that typical statistical approaches are not designed to handle (e.g., data that are not normally distributed, incomplete data sets, data that stress the central limit theorem beyond its limits).
What do you think an i-ADDS should be doing? Let me know if there are ways we can work together!