Those of you born before 1980, or inveterate fans of 1960s science fiction, know what tribbles are. These small, soft, furry, purring space aliens appeared on a memorable 1967 episode of “Star Trek.”
The pleasure and comfort they bring to humans come with a significant downside: tribbles do nothing but eat and reproduce. Born pregnant and with no known predators, tribbles multiply exponentially, quickly consuming anything edible. Their care and feeding exhaust all available resources, leaving little reserve for other pursuits and other species.
Sounds a bit like research-generated data, doesn’t it?
Not long ago (but long after the first tribbles appeared on “Star Trek”), a research project typically ended with a resolution to the hypothesis that initiated it. Questions were posed, hypotheses generated, data acquired, analytics applied, and interpretation emerged. Done and done. But since the late 1990s or so, many research projects have also generated data sets that promise, or at least hint at, future discoveries.
Researchers the world over spend substantial funds curating and storing these data sets and ensuring they are FAIR (findable, accessible, interoperable, and reusable). NIH alone spends hundreds of millions of dollars annually to ensure that high-value data sets are securely stored and made available for future use, and to foster among investigators an appreciation of, and a willingness to engage in, data-driven discovery.
All of which leads me to conclude that data science can learn a lot from “Star Trek” and its tribble troubles.
Like tribbles, data are attractive and pleasing to many. Some species, like Klingons, abhor tribbles, and I suspect a few out there abhor data or see data science as another scientific fad. Just as tribbles and their perpetual offspring can place extreme demands on a system, so too can data sets, especially those left to grow without purpose or design, which end up competing for scarce research dollars. (Fortunately, we’re not yet at the point of having to choose between new investigations and data-driven science, but that day may come.) And the solution to the Enterprise’s tribble infestation, transporting them to a nearby Klingon vessel, sounds a bit like the hope that “the cloud” will solve our data science challenges, when such a move solves only one problem and shifts the others to a new environment.
Despite those challenges, some data science zealots advocate replacing experimental science with data-driven discovery, but I recommend a more balanced approach. As the Federation’s Prime Directive makes clear, every species and society should be allowed to follow its normal cultural evolution, and, to me, data science is part of the evolution of scholarly discovery.
I invite you to partner with me and the NIH to ensure a principled approach to data science.
What steps should NIH take to make sure the “data tribbles” don’t crowd out the full range of discovery strategies needed to deliver the greatest benefit to society?