DataScience@NIH

Driving Discovery Through Data

Just What Does an iADDS Do?
Patti Brennan / 04.13.17

Since January 2017, I’ve held the title of interim Associate Director of Data Science for NIH. I pronounce this “i-ADDS” (a bit like a five-year-old’s claim of mathematical prowess: “I adds”). Taking over from Phil Bourne, the first NIH ADDS, I am crafting a role in transition, one with a heavy responsibility, great-but-unspecified accountabilities, and no job description. 

It’s a tricky spot to be in.

As I move DataScience@NIH forward, I am constantly checking with colleagues and the NIH leadership to ensure I don’t get too far ahead of them in vision or commitment.

I work very closely with DPCPSI (the NIH Division of Program Coordination, Planning, and Strategic Initiatives) as they oversee the BD2K program and the various pilot studies designed to inform future infrastructure investments and research directions for data science at NIH. Together with the Scientific Data Council, Andrea Norris (Chief Information Officer for NIH), and Betsy Wilder (head of the NIH Common Fund), I work to keep the lines of communication open, avoid duplication of effort, and move toward generalizable solutions. And I collaborate with the IC Directors across NIH to find solutions to data science challenges and to take advantage of current opportunities in ways that balance institute-specific considerations with enterprise-wide strategies.

I am also working to put into place the tools for our next steps.

I believe we need forecasting tools to help us gauge how long a high-value data set will remain so and at what cost. Through discussions with archivists and data scientists, I am beginning to shape a vision for data storage that is highly distributed and allows for varying investments in curation and maintenance, so that we can do what’s needed to preserve high-value data while making appropriate, if more modest, investments in storing and curating important-but-lesser-valued data sets. I am mentally knitting together the experience of the NLM in indexing, cataloging, and curation with the knowledge gained through the BD2K Centers’ efforts to make data discoverable and interoperable. I am fostering an exploration of the ethical dimensions of data-driven discovery with attention to simultaneously preserving the privacy of individual participant data, ensuring shareability under appropriate constraints, and respecting the rights of investigators to explore ideas without undue surveillance or premature disclosure of their thinking.

Mostly I lead from the side—through conversations, speeches, and social media. I listen to the data science challenges faced by NIH colleagues and scientists around the world. I write this blog every week, so I can share my thoughts with you and be guided by your reactions. I reach out to, or respond to invitations from, those responsible for data science in sister federal agencies such as NOAA or NSF. Over the past 12 weeks I have presented over a dozen talks to such diverse groups as the Alliance of Genome Resources and the Interagency Modeling and Analysis Group. I’ve learned of the advances in curation and discoverability made by some groups and tried to increase the understanding of data science’s potential for discovery without overselling it.

Across my speaking engagements, I’m delivering three key messages:

  1. Data Science is an emerging scientific, methodological, and computational arena that promises to accelerate discovery and make maximum use of data. Note the word “emerging.” No one has the final answer or the final rules. We’re on a journey together.
  2. We can’t preserve all the data we are generating, so we must make smart choices about what data to preserve, where, and for how long.
  3. Data science methods aren’t simply statistics on steroids. They arise from a unique, but complementary, philosophical perspective and a different mathematical foundation, both of which are intended to maximize discovery from data subject to constraints that typical statistical approaches are not intended to handle (e.g., data not distributed normally, incomplete data sets, data that stress the central limit theorem beyond its limits).

What do you think an iADDS should be doing? Let me know if there are ways we can work together!

Comments

Perhaps the office could look at the value of data in the context of reproducibility - using the analogy of the seeds, the grain and the bread - once the seed is lost, no bread can be made...
Maybe instead of just consulting the archivists, an alternative framework could be designed to identify the seed and save the recipe to get bread from the grain... too many analogies but I hope my point is made - reproducible science is of higher value than archiving datasets.

PattiBrennan:

Thanks for sharing your insights. I agree that we need a lot of ways to think about data. I like your analogy of seeds, grain, and bread; I also like the models from radiology: “lossy and lossless translation.” We’ll be looking for lots of ways to make these complicated decisions!

Your article addresses different points of global relevance. Among them, the ethical dimension of data-driven research. You pointed out the need for achieving a balance between individual privacy and opportunities for research. Which type of data-driven solutions do you envisage to achieve such a balance? How can data scientists make a significant contribution to this debate? Thank you.

PattiBrennan:

Thank you for your comments. Indeed, this is a multifaceted challenge - that is part of what makes it interesting!

I believe that the solutions (note that there need to be many) will involve both protocol-driven computational solutions and human-mediated decisions.

Data scientists can help shape this solution! We need to know how important provenance is in computational solutions. We also need them to engage in the debate over the overhead trade-offs between permissions and ease of access. Finally, we need data scientists to develop skill in estimating the life cycle of data.

Your thoughts?!

Patti: Looking forward to your ADDS interimship.
Much of what you say applies to practice-generated data, too. The need for indexing data objects, together with well-thought-out data retention policies (and appropriate storage technologies) may even be greater for data generated in practice. But foundational to it all has to be a new and strict approach to provenance metadata; the huge proliferation of data sources we have today – in practice at least – is coupled with a near total loss of provenance, making scientific re-use of “real-world” data an even more distant dream than before.
May I also recommend that, as iADDS, you take a unified, sociotechnical look at the data-generating & data-management infrastructures in research and practice as a whole; a lot of data value is lost at the source. As a related suggestion, perhaps bring in an economist to help analyze how much the nation loses due to discrepancies, inefficiencies, and incompatibilities at the interfaces between patient-level care, reimbursement, research, regulatory science, policy, etc.
You're off to a great start.
