May Data Sharing and Reuse Seminar

Friday, May 9, 2025

Bernhard Palsson, Ph.D., will present "What are iModulons?" on May 9, 2025, from 12:00 p.m.–1:00 p.m. ET.

About the Seminar

The first microbial genome sequences appeared in the mid-to-late 1990s. In the 2000s, genome-scale computational biology arose through the reconstruction of metabolic networks based on functional gene annotation. In the late 2000s, the cost of DNA sequencing dropped dramatically, leading to rapidly expanding databases of microbial genome sequences and transcriptomes. These data sets can be knowledge-enriched and decomposed into coherently functioning sets of genes using machine learning methods, and a growing number of other data types can be processed in a similar fashion. Multiple data types can now be made interoperable based on known mechanisms and molecular functions. The 2020s are likely to see an accelerating, fine-grained understanding of microbial physiology.

Analysis of large biological data sets can take place at four levels: level 1, multivariate statistics; level 2, knowledge enrichment of large data sets; level 3, systems biology and computational modeling; and level 4, detailed biophysical modeling. Levels 1 and 4 are well developed in the literature. Genome-scale modeling (level 3) has a history of about 20 years, during which much progress has been made. Level 2 is the least developed; it focuses on knowledge mapping and the use of machine learning and explainable AI.

This talk will focus on progress at level 2 with transcriptomes. Large compendia of high-quality RNA-seq profiles can now be decomposed using independent component analysis (ICA), which identifies independently modulated sets of genes called iModulons. The talk will show how iModulons can be used for metabolic engineering and bioprocess development, including cross-species transfer of iModulons, media composition, expression of heterologous genes, and y-gene discovery.

About the Speaker

Bernhard Palsson, Ph.D., Director/Principal Investigator, Departments of Bioengineering and Pediatrics, University of California, San Diego

About the Seminar Series

The seminar is open to the public and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Allison Hurst at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series to highlight examples of data sharing and reuse on the second Friday of each month at noon ET. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.

April Data Sharing and Reuse Seminar

Friday, April 11, 2025

William Klimke, Ph.D., will present "NCBI Pathogen Detection" on April 11, 2025, from 12:00 p.m.–1:00 p.m. ET.

About the Seminar

NCBI Pathogen Detection integrates bacterial and fungal pathogen genomic sequences from numerous ongoing surveillance and research efforts, with samples drawn from food, environmental sources such as water or production facilities, and patients. Foodborne, hospital-acquired, and other clinically important infectious pathogens are included.

The system provides two major automated, real-time analyses. First, it quickly clusters related pathogen genome sequences to identify potential transmission chains, helping public health scientists investigate disease outbreaks. Second, as part of the National Database of Antibiotic Resistant Organisms (NDARO), NCBI screens genomic sequences with AMRFinderPlus to identify antimicrobial resistance, stress response, and virulence genes in bacterial genomes. This enables scientists to track the spread of resistance genes and to understand the relationships among antimicrobial resistance, stress response, and virulence.

NDARO is a CARB-funded (Combating Antibiotic-Resistant Bacteria) initiative under which more than 2.2 million bacterial genomic sequences from more than 350 species are analyzed, primarily from public health surveillance activities such as the NARMS (National Antimicrobial Resistance Monitoring System) project. The reference data that AMRFinderPlus uses to identify these genes come from ongoing curation of a reference collection in collaboration with academic experts, as well as from an AMR gene allele nomenclature service specifically for beta-lactamases, mobile colistin resistance, and quinolone resistance. This talk will showcase the analysis pipeline and the publicly available resources, and will use several recently published studies to demonstrate how public health and research scientists are using the data.

About the Speaker

William Klimke, Ph.D., Researcher, National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH

Dr. William Klimke is the Product Owner of both the Prokaryotic RefSeq/Genome Annotation and the Pathogen Detection projects at NCBI/NLM/NIH. He received his Ph.D. from the University of Alberta in 2002 and has been at NCBI ever since, where he has worked on numerous projects, including 1) RefSeq prokaryotic genomes and annotation services and 2) the Pathogen Detection resource, which provides real-time genomic cluster analysis of more than 350 bacterial and fungal pathogen species as well as identification of antimicrobial resistance and virulence genes and proteins. Dr. Klimke has received numerous awards for his work on the Pathogen Detection pipeline, which provides real-time cluster analysis of foodborne bacterial pathogens for the FDA, CDC, and USDA.

March Data Sharing and Reuse Seminar

Friday, March 14, 2025

Kasim Allel, Fred Mutisya, Patricia Bradford, and Rebecca Li will present "The Vivli AMR Data Challenge as a Use Case" on March 14, 2025, from 12:00 p.m.–1:00 p.m. ET.

About the Seminar

This webinar will explore the critical role of data sharing and reuse and will demonstrate how effective data sharing can be, using the real-world Vivli AMR Data Challenge as a case study in antimicrobial resistance (AMR). Experts and Global Grand Prize awardees from the 2023 and 2024 data challenges will discuss best practices for overcoming barriers to data accessibility, ensuring responsible reuse, and fostering international collaboration through the mechanism of a data challenge. A key focus will be the AMR Register run by Vivli, a global data-sharing platform designed to facilitate access to antimicrobial resistance data, enhance transparency, and drive innovative solutions in AMR research.

About the Speakers

Kasim Allel, Ph.D., Researcher, University of Oxford, UK

Kasim Allel is a researcher at the Nuffield Department of Primary Care, University of Oxford, specialising in epidemiological, mathematical, and health-economic modelling of antimicrobial resistance (AMR). He holds a PhD in Infectious Diseases from the London School of Hygiene and Tropical Medicine and an MSc in Health Economics from University College London. His research examines how socioeconomic, environmental, and spatial factors drive AMR transmission and its health and economic impacts. He focuses on integrating transmission modelling into health-economic evaluations to strengthen surveillance and intervention strategies for AMR and infectious diseases.

Fred Mutisya, M.D., Field Epidemiology Resident, Kenya

Dr. Fred Mutisya is a registered medical doctor, FELTP resident, and AI developer. He completed his bachelor's degree in medicine and surgery at the University of Nairobi in 2013. He is currently enrolled in a master's degree program in field epidemiology through the CDC-sponsored Field Epidemiology and Laboratory Training Program (FELTP). He is also finalizing a master's degree in AI and data science in Germany, with a thesis on the performance of open-source large language models for clinical decision support during epidemics. He has held multiple AI consultancies, including work on AI-based antibiotic decision support and on retinopathy prediction using computer vision. Before going on study leave, Dr. Mutisya was in charge of a 150-bed hospital in Narok County and supervised 20 dispensaries and health centers in the Maasai Mara ecosystem. He is a member of the Adult & Pediatric HIV committee of experts in Kenya, and for these efforts his team received the 2020 PEPFAR Heroes award. He is a member of the Kenya Medical Association, where he sits on the managed healthcare and ICT committees. Dr. Mutisya enjoys playing jazz guitar as his creative outlet.

Patricia Bradford, Ph.D., Microbiology Consultant, Antimicrobial Development Specialists, LLC

Patricia A. Bradford is the owner of Antimicrobial Development Specialists LLC, a consulting company that focuses on the late-stage development of antibiotics. Previously, she held positions in antibiotic research at AstraZeneca, Novartis, Wyeth Pharmaceuticals, and Lederle Laboratories. Dr. Bradford is a Fellow of the American Academy of Microbiology and has more than 115 publications in peer-reviewed scientific journals. She currently serves on the editorial boards of several society journals. Dr. Bradford received a Ph.D. in Medical Microbiology from Creighton University and completed a postdoctoral fellowship in β-lactamase research.

Rebecca Li, Ph.D., Vivli CEO and Board Member

Rebecca Li, Ph.D., is the CEO of Vivli and a faculty member at the Center for Bioethics at Harvard Medical School. Before her current role, she was the Executive Director of the MRCT Center of Brigham and Women’s Hospital and Harvard for more than five years, and she remains a Senior Advisor to the Center. She has more than 25 years of experience spanning the entire drug development process in biotech, pharma, and CRO environments. She completed a fellowship in the Division of Medical Ethics at Harvard Medical School in 2013 and earned her Ph.D. in Chemical and Biomolecular Engineering from Johns Hopkins University.
