Driving Discovery Through Data

A Case Study in NIH Data Science: Open Data and Understanding the Value of Libraries and Information Services in the Patient Care Setting
11.02.17

These are heady times in biomedical data science. Promising research is underway on many topics, including the nexus of open data, libraries, and patient care.

As librarians and researchers, we had a specific question in mind: could we use data from a study not originally intended to examine the use of PubMed/MEDLINE to discover where and how clinicians reported using that database as an information resource affecting a clinical decision?

The high use of PubMed/MEDLINE, the world’s largest biomedical database, is well documented in the literature and in our own use statistics. We and our collaborators, Joanne Marshall, Alumni Distinguished Professor at the School of Information & Library Science at the University of North Carolina at Chapel Hill, and Amber Wells, Program Manager, MDC, Inc., made finding the answer the basis for our study, Examining the role of MEDLINE as a patient care information resource: an analysis of data from the Value of Libraries study.

But how to go about it? We used a public data set, Value of Libraries and Information Services in Patient Care, to identify the use of PubMed/MEDLINE in patient care. The original dataset consisted of 16,122 individual responses from health professionals at 118 hospitals served by 56 health libraries in the United States and Canada.

The dataset included responses to a survey question asking respondents to recount an occasion in the last six months when they looked for information for patient care (beyond what is available in the patient record, EMR system, or lab results) and whether they used one or more of 19 different resources to answer their questions about patient care. One of the findings of the original study focused on the number of resources used by a health professional to aid in patient care: on average 3.5 resources per clinical question.

Our secondary analysis of the data found the two most frequently used resources were journals (print and online) and PubMed/MEDLINE. Of course, the two are related; PubMed/MEDLINE connects people to the journals through citations and abstracts. We also found that using a higher number of information resources significantly correlated with a higher probability of changes made to patient care and the avoidance of adverse events.
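For readers who want to try this kind of tally on an open survey dataset, here is a minimal sketch of the two descriptive steps mentioned above: counting how often each resource was reported and averaging the number of resources per response. It is not the method used in our study. The toy rows, the resource labels other than journals and PubMed/MEDLINE, and the function names are all invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy responses, invented for illustration only.
# Each set holds the resources one respondent reported using.
responses = [
    {"journals", "pubmed_medline", "books"},
    {"journals", "pubmed_medline"},
    {"pubmed_medline", "books", "colleagues", "journals"},
    {"journals"},
]


def resource_counts(rows):
    """Count how many respondents reported each resource."""
    counts = Counter()
    for row in rows:
        counts.update(row)
    return counts


def mean_resources(rows):
    """Average number of resources reported per response."""
    return mean(len(row) for row in rows)


counts = resource_counts(responses)
# The two most frequently reported resources in the toy data.
top_two = [name for name, _ in counts.most_common(2)]
avg = mean_resources(responses)

print(top_two, avg)
```

With real data, each row would come from the survey file rather than a hand-typed list, and any test of association with patient care outcomes would need a proper statistical model.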

What helped

The well-designed dataset was collected to ensure consistency across multiple institutions. The study designers also conceived and obtained the dataset with the goal of making it open and available to others, whether individual participants or additional researchers, and they returned institution-specific data to participating institutions.

But no dataset is perfect, so having two of the original conceptualizers of the study, Joanne Marshall and Kathel Dunn, working on this analysis helped immensely. They were familiar with the data and how and why it was collected, and could correct any misunderstandings or misinterpretations. Study co-author Amber Wells, a strong statistician, was invaluable in explaining statistics clearly, allowing all authors to be confident in the results we reviewed.

We were also challenged (in a good way!) by working with Marshall and Wells. They knew the data as intended for the original study and had to work with us to reconceive its use for another purpose. It was the vision and curiosity of the co-author of this post, Joyce Backus, that pushed us to discover what the data might yield for understanding an NLM product.

We all learned and re-learned that, in post-hoc analysis, you can’t make the dataset say what it doesn’t say. We also saw that, while we were working with data older than we might have wanted, the data pointed in a direction that offered applicable results. 

But none of this would have been possible—or at least readily achieved—without the commitment of the original research team to open data. Their upfront planning for data clarity and consistency, coupled with their follow-through in making the data freely available, ensured our ability to seek new discoveries within them.

As our work shows, reusing data saves time and money, encourages collaboration, and extends the life and visibility of the original work. We encourage other librarians and researchers to see what additional insights can be mined from the Value of Libraries dataset or other existing data.

Dunn K, Marshall JG, Wells AL, Backus JEB. Examining the role of MEDLINE as a patient care information resource: an analysis of data from the Value of Libraries study. J Med Libr Assoc. 2017 Oct;105(4):336-346. doi:10.5195/jmla.2017.87. Epub 2017 Oct 1. PubMed PMID: 28983197; PubMed Central PMCID: PMC5624423.


About the Authors:

Joyce Backus is the associate director for library operations, the division responsible for selecting, acquiring, and indexing medical journals for MEDLINE. She also serves as the executive secretary for the Literature Selection Technical Review Committee, the federal advisors recommending journals for MEDLINE, and the NLM representative to the International Committee of Medical Journal Editors.




Kathel Dunn is the NLM Associate Fellowship Program Coordinator. She was part of the original team of the “value of libraries” study while leading the Regional Medical Library (RML) at NYU Langone Health Sciences Library, and subsequently resumed work on the project in a different capacity while at NLM. Kathel now works with early career librarians to incorporate data science concepts into their careers, whether they choose to fully embrace a role as a data scientist or incorporate data science techniques into their other career interests.
