Institute or Center: National Library of Medicine (NLM)
Project: Automating Review and Update of Consumer Health Information
Skills sought:
- Data science and/or analytics background with experience in data management and data quality
- Natural language processing and/or machine learning for analyzing text data
- Knowledge engineering and integration, data harmonization, and interoperability
- Creative approaches to web crawling, digital archives, and/or metadata
- Experience as a technical liaison among multiple stakeholders
- Experience and interest in improving health information for general audiences
About the position: NLM seeks a data scientist to develop automated approaches to review, update, and maintain a large volume of consumer health information (designed for patients and the general public) on MedlinePlus.gov. The scholar will:
- Develop and test data science approaches that automatically flag information and resources on MedlinePlus that need updating, without degrading the quality of content.
- Provide guidance on the most promising techniques and tools for tackling this problem.
- Explore a combined approach that leverages both automation strategies and the library expertise of NLM staff (e.g., information retrieval, quality assurance).
About the work: Providing accurate, up-to-date health and biomedical information is a top priority for NLM. Content maintenance has been a long-standing challenge: regular review and update of content on MedlinePlus requires the time and effort of multiple NLM staff members, and an ever-increasing volume of information has challenged the team’s ability to maintain it effectively. The DATA Scholar will discover and validate automated approaches using machine learning, natural language processing, or other data science techniques to make managing consumer health content more efficient.
Datasets included: MedlinePlus.gov provides trusted information about health and wellness in English and Spanish to more than 1 million daily users. The website presents a large volume of information, covering thousands of topics and comprising almost 28,000 web pages with wide-ranging content about diagnoses, symptoms, drugs, supplements, genetics, and medical tests. The site also links to 35,000 health information resources from NIH, HHS, and other high-quality sources.
Why this project matters: Reviewing and updating consumer health information is both critical and resource intensive. This task is an ongoing challenge for MedlinePlus.gov and other content creators at NIH. Automated approaches to keeping health information up to date represent a little-explored application of data science that could benefit content teams at multiple NIH ICOs and the larger field of consumer health information.
Work Location: Bethesda, MD
Work environment: MedlinePlus is managed by the NLM’s Public Services Division, Reference and Web Services Section, Health Information Products Unit (HIPU). The scholar will work directly with Stephanie Morrison, the HIPU lead; the MedlinePlus team; and other NLM staff who will provide training and guidance, enable access to data, answer questions, and evaluate the project. Professional development opportunities include access to data science experts and researchers at NLM, collaboration with NIH content creators through the NIH Consumer Health Content Community of Practice, and potential to explore a new application of data science with a high-visibility product.
To apply to this or other DATA Scholar positions, please see instructions here: datascience.nih.gov/data-scholars-2022.