Kim Pruitt, Ph.D.
Chief, Information Engineering Branch, National Center for Biotechnology Information
National Library of Medicine
Co-leads the Lifecycle Metrics Working Group, which hosted the Data Metrics Workshop
How do you enable data science at NIH? What roles do you play?
My branch provides several data resources, services, and analysis tools (such as PubMed Central, SRA, GenBank, and BLAST) that support and enable data science research. As a manager, my role is to establish and support efficient and effective work processes, set the right priorities, and support applied data science research carried out in the branch or in collaborations, so that we provide the most useful and reliable data, services, and tools for today’s data science research needs.
Persevere, find a mentor, understand expectations, persevere:
The field of data science holds great promise to deliver new methods, new insights, new knowledge, and new data visualization approaches. My advice to someone entering this field is to persevere, to find an excellent mentor, to go into collaborations with a clear understanding of each member’s role and publication expectations, and to continually look for the “lessons learned” when an analysis strategy fails (cycle back to persevere).
Providing data access in the cloud:
NIH’s STRIDES Initiative is the largest NIH achievement in the past year. Providing access to data on the STRIDES cloud service provider platforms is a prerequisite to supporting and growing the biomedical data science field. Most notable to me personally is the significant achievement of providing the complete Sequence Read Archive data (roughly 40 PB and growing) in two formats, ahead of the planned schedule, on both Amazon Web Services and Google Cloud Platform under the NIH STRIDES Initiative.
Running a start-up before running a branch:
In the early to mid '90s, I created and managed a small software company, Salt City Software, that developed and marketed a suite of database applications designed to track and visualize laboratory inventories. Products included Plasmid Tracker and Genome Tracker, applications that tracked laboratory stocks and provided graphical representations of plasmid clone constructs and multi-generational plant genetic crosses.
Dr. Pruitt holds a Ph.D. in genetics and development. She was featured in a blog post titled "Women in Tech at NIH: Togetherness Enables Transformation," guest-authored by ODSS Director Dr. Susan Gregurick for the NLM's Musings from the Mezzanine in September 2020, and in a lecture Gregurick delivered in March 2021 titled "Women Leading the Way: Stories of the Women (and Men) Making an Impact on Data Science at NIH."