Data, data, everywhere, but not a bit to share…
While a “bit” of an exaggeration, this twist on a well-known line of poetry captures concerns voiced at a recent scientific meeting about how difficult it is to share data. The desire to share the ever-growing number of research digital objects was a key driver behind the meeting, “Managing Digital Research Objects in an Expanding Science Ecosystem,” held November 15, 2017. The National Library of Medicine (NLM) hosted the meeting at its Lister Hill Auditorium. The session was co-sponsored by Commerce, Energy, NASA, Defense Information (CENDI); The National Academies of Sciences, Engineering, and Medicine (NASEM); National Federation of Advanced Information Services (NFAIS); and Research Data Alliance (RDA). It brought together more than 80 participants from academia, government agencies, non-profit organizations, and industry. (Check out the meeting agenda and conference presentations.)
So, what are digital objects? Peter Wittenburg, RDA Director Europe, Max Planck Computing and Data Facility, defined digital objects as “‘meaningful entities’ existing in the digital domain of bits” and “the ‘atoms’ of our digital domain, since it makes sense to associate relevant characteristics with them.” Examples include digital datasets, publications, software, metadata, collections, and queries.
Digital objects are everywhere. For example, digital sensors across the Internet of Things continually generate streams of high-resolution data. Researchers are also generating huge volumes of primary and derived data, as well as models and other objects that ultimately course through the ecosystem of science and scholarship. A common theme echoed throughout the meeting was that, although researchers spend 75-80% of their time managing data, sharing that data and other digital objects is often difficult. Speakers cited several reasons for this difficulty, including:
- lack of infrastructure, tools, and training necessary to deposit and curate digital objects;
- the difficulty and expense of producing and managing consistent metadata; and
- insufficient incentives for sharing data and other digital objects.
Fundamental to discussions of sharing data and other digital objects are the topics of metadata and persistent identifiers (PIDs), which received a great deal of attention at this meeting. Metadata is information about a digital object: it helps describe or locate the object but is not the object itself. Think of a card catalog entry for a book. A PID is a long-lasting reference to a digital object that typically includes a unique identifier and a way to locate the object over time, even if its location changes. Think of a Social Security Number for a person.
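To make the distinction concrete, here is a minimal Python sketch. It is not the API of any real PID service; the names `MetadataRecord` and `PidResolver`, the identifier `example:dataset-001`, and the URLs are all hypothetical placeholders. Real identifiers such as the DOIs that DataCite issues rely on the same basic idea: the identifier stays fixed while a registry tracks where the object currently lives.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Describes a digital object (like a catalog card); it is not the object itself."""
    pid: str          # persistent identifier pointing at the object
    title: str
    creator: str
    object_type: str  # e.g., "dataset", "software", "publication"


class PidResolver:
    """A toy, hypothetical registry mapping each PID to the object's current location."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        # Each PID must be unique, so refuse to register the same one twice.
        if pid in self._locations:
            raise ValueError(f"PID already registered: {pid}")
        self._locations[pid] = url

    def update_location(self, pid: str, new_url: str) -> None:
        # The object moved: only the registry entry changes, never the PID.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        # Look up where the object lives right now.
        return self._locations[pid]


# Hypothetical identifier and placeholder URLs, for illustration only.
resolver = PidResolver()
record = MetadataRecord(
    pid="example:dataset-001",
    title="Sensor readings, 2017",
    creator="Example Lab",
    object_type="dataset",
)
resolver.register(record.pid, "https://old-server.example.org/data.csv")

# The repository migrates the file; anyone citing the PID can still find it.
resolver.update_location(record.pid, "https://new-repo.example.org/data.csv")
print(resolver.resolve(record.pid))  # prints https://new-repo.example.org/data.csv
```

The design point is the indirection: a paper or dataset citation stores only the PID, so when the object moves, one registry entry is updated instead of every reference that points to it.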
Sharing research data is much easier with clearly established metadata and PIDs. Todd Carpenter, executive director of the National Information Standards Organization, stated, “Data is useless unless you can do something with it. That is why metadata is so important.” Patricia Cruse, executive director of DataCite, drew an analogy to current research practices: “We are building roads from A to B, and we are starting over every time.” Thus, researchers, universities, non-profit organizations, industry, and funders need to work collectively to establish common metadata standards, PIDs, and digital infrastructures that other researchers can use in the future.
The meeting also included discussions of data markets, secure transaction of information, smart contracts, FAIR principles, semantic mapping, global digital object clouds, data repositories, data registries, data lakes, digital package objects, and cyberinfrastructure (to name a few topics), each of which deserves its own conversation but would make this blog post too unwieldy.
My key takeaway from the meeting: Funders, non-profit organizations, professional societies, and industry are increasingly thinking about how to leverage digital objects (e.g., research data). This is the wave of the future, and understanding data-sharing concepts will be important for future endeavors. Having a data-sharing expert on your research team, collaborating with others who have data-sharing expertise, or hiring third-party service providers with that expertise would be advantageous as the science ecosystem moves increasingly toward preserving and sharing digital objects.
The NIH is funding an effort to develop an NIH Data Commons and populate it with data. NIH has also established data-sharing policies for genomic research data and has issued its plan for increasing access to the full spectrum of biomedical research data produced with NIH funding. Other large-scale NIH initiatives, such as the All of Us Research Program, are developing ways for patients and the general population to share digital clinical, health, and behavioral information with biomedical researchers. Furthermore, NLM/NIH has long-established policies on public access to digital publications from NIH-funded research. To be sure, creating metadata and PIDs to facilitate the sharing of research data is becoming increasingly important for biomedical researchers to understand.
But, as many speakers noted, the digital world is a complex landscape, and we have a while to go before sharing data and other digital objects becomes the default practice in research.
Much like the desalination of salt water, it will take innovative technology, a clear value proposition/business model, and significant investment to establish and scale the sharing of biomedical research digital objects.
About the Author:
Dr. Audie Atienza is a contractor for the National Library of Medicine and senior fellow at ICF. Dr. Atienza previously served as a program director at the National Cancer Institute, senior advisor to the Chief Technology Officer of the Department of Health and Human Services, and senior advisor to the Associate Director for Data Science of the National Institutes of Health. He led several health technology initiatives for NIH/HHS, including the Apps Against Abuse developer challenge, the U.S. Surgeon General’s Healthy App Challenge, and the Open Science Prize. Dr. Atienza has published extensively on technology and health, real-time data capture, mobile health, and innovative research methods.