When people hear I’m a data science librarian, their first question is, “What does that mean?” (I’ll get to that shortly.) After that, they’re curious about my educational background. They figure I’ve got training in science, so they’re surprised to hear my first master’s degree was in English and, before I was a librarian, I was an English professor!
So how does someone go from teaching freshman composition to teaching NIH researchers how to program in R (a statistical programming language) and manage their data? Are there any connections?
My interest in supporting data science started when I was getting my second master’s, this one in library and information science at UCLA. I started the Master of Library and Information Science (MLIS) program with an interest in archives, but I quickly realized this somewhat solitary profession wasn’t a good fit. Instead, I gravitated toward medical librarianship, which seemed like a rewarding way to contribute meaningfully to medical research and to improving health.
A class in medical knowledge representation offered through the biomedical engineering department sparked my interest in data. We spent the semester studying how to design systems that would allow clinicians to make sense of the vast amount of data related to just one specific disease, glioblastoma multiforme (GBM), a dangerous brain cancer. The ideal system would deliver many different types of patient data to clinicians (images, clinical information, physicians’ notes, drug records, and more) and show not just each type on its own, but how all of them progress over time. I enjoyed the challenge of this real-world problem, but what I found most fascinating was the rich potential within the rapidly increasing genomic data available about glioblastoma and other diseases.
Many scientists suspect the answers to why some people develop cancer (or other diseases) and others do not are encoded in our genes. The substantial data we’ve already collected may be the key to discovering those answers if we can figure out how to manage it, analyze it, and extract knowledge from it. After seeing this firsthand in the GBM study, I took classes in data theory and electronic records management so I would have a strong understanding of data management by the time I graduated.
After I received my MLIS, I started working as a health and life sciences librarian at the UCLA Louise M. Darling Biomedical Library. While there, I came across an exciting opportunity to put my interest in data into practice when I became an informationist, supported by an NLM Informationist award. These grant supplements allow researchers to add an informationist to their team to provide specialized information support. I worked with a team researching how to measure corneal swelling using a type of laser, helping them align their data to existing imaging standards so it would be interoperable with other types of data.
The work with the research team was stimulating and challenging, and when the opportunity came to take a position at the National Institutes of Health (NIH) Library, providing this type of support to NIH researchers full-time, I jumped at it. As a Research Data Informationist at the NIH Library, I have worked closely with researchers on a variety of problems related to their data.
One of the things I love most about my job is that it gives me the chance to work with so many different groups on a wide range of problems. One day I’m in the film-reading suite of the NIH Clinical Center’s radiology department, advising them on how to organize their data within the new system they’re designing. The next, I’m helping a postdoctoral fellow working on fruit fly genetics create a visualization of her data in R. The common thread in all of it is that I get to provide advice to researchers on how to make the most of their data, and in my experience, researchers need quite a bit of assistance with many data problems. For all their training and expertise, most researchers do not learn the computational tools and data management skills they’ll need if they want to excel in today’s data-driven research environment.
You may be wondering at this point how someone can go from being an English professor who has never written a line of code in her life to teaching researchers how to use programming languages. Although the subject matter is very different, there are actually quite a few similarities between my two careers. The years I spent in the classroom as a professor have helped me gain teaching skills that I can apply to teaching any topic. Instead of meeting one-on-one with students to discuss their term papers, now I’m meeting with researchers to discuss their data problems, but the ability to discern the issues and clearly convey solutions is crucial in both settings. And when you really think about it, there’s even some overlap between my two subjects. Back then, I taught students how to effectively communicate complex ideas to humans using the English language. Now, I teach researchers how to effectively communicate complex ideas to computers using the R language!
So how did I become fluent in “speaking computer” in the form of R? Is writing code something they teach in library school? In my experience, it’s not, but fortunately, if you’re interested in becoming a data librarian, there are plenty of resources for learning, many of them freely available.
My introduction to R was the Coursera specialization on Data Science, but for me, the most important thing I did to learn R and other data science tools was just to practice with them. Most librarians have some sort of data available to them, whether it’s collections data or information about gate counts or budget, or, in my case, my own research data.
The National Library of Medicine and the National Network of Libraries of Medicine (NNLM) have also begun offering a variety of training opportunities to help librarians expand their skill sets and provide more in-depth data support at their institutions. The NNLM recently announced a groundbreaking opportunity for librarians to participate in a Biomedical and Health Research Data Management training program, which will include online and in-person training and will pair students with mentors to help them complete a practical capstone project.
In my current role on detail with NLM’s Data Science Coordinating Unit as their Data Science Training Coordinator, I’m excited to help NLM determine how to meet the training needs of librarians interested in working in data science. And I’m also gratified to see that, as I examine the “data” of my career path, there’s a nice, logical progression!
About the Author:
Lisa Federer, MLIS, MA, AHIP, is currently on detail to the NLM’s Data Science Coordinating Unit, where she serves as the Data Science Training Coordinator. She is the author of several peer-reviewed articles, as well as the editor of the Medical Library Association Guide to Data Management for Librarians. She holds an MLIS from the University of California-Los Angeles and an MA in English from the University of North Texas, as well as graduate certificates in data science from Georgetown University and data visualization from New York University. She is currently pursuing a doctoral degree in information science from the University of Maryland, focusing on biomedical researchers’ data reuse practices.