May Data Sharing and Reuse Seminar

Friday, May 9, 2025

Bernhard Palsson, Ph.D., will present "What are iModulons?" on May 9, 2025, from 12:00 p.m.–1:00 p.m. EDT.

About the Seminar

The first microbial genome sequences appeared in the mid-to-late 1990s. In the 2000s, genome-scale computational biology arose through the reconstruction of metabolic networks based on functional gene annotation. In the late 2000s, the cost of DNA sequencing dropped massively, leading to rapidly expanding databases of microbial genome sequences and microbial transcriptomes. These data sets could be knowledge-enriched and decomposed into coherently functioning sets of genes using machine learning methods. A growing number of data types can be processed in a similar fashion. Multiple data types can now be made interoperable based on known mechanisms and molecular functions. The 2020s are likely to see an accelerating fine-grained understanding of microbial physiology.

Analysis of large biological data sets can take place at four levels: multivariate statistics (level 1), knowledge enrichment of large data sets (level 2), systems biology and computational modeling (level 3), and detailed biophysical modeling (level 4). Levels 1 and 4 are well developed in the literature. Genome-scale modeling, level 3, has a history of about 20 years, with much progress made. Level 2 is the least developed and focuses on knowledge mapping and the use of machine learning and explanatory AI.

This talk will focus on progress at level 2 with transcriptomes. Large compendia of high-quality RNA-seq profiles can now be decomposed using Independent Component Analysis (ICA). ICA identifies independently modulated sets of genes, called iModulons. This talk will show the uses of iModulons for metabolic engineering and bioprocess development, including cross-species transfer of iModulons, media composition, expression of heterologous genes, and y-gene discovery.

About the Speaker

Bernhard Palsson, Ph.D., Director/Principal Investigator, Departments of Bioengineering and Pediatrics, University of California, San Diego

About the Seminar Series

The seminar is open to the public and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Allison Hurst at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series to highlight examples of data sharing and reuse on the second Friday of each month at noon ET. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.

April Data Sharing and Reuse Seminar

Friday, April 11, 2025

William Klimke, Ph.D., will present "NCBI Pathogen Detection" on April 11, 2025, from 12:00 p.m.–1:00 p.m. ET.

About the Seminar

NCBI Pathogen Detection integrates bacterial and fungal pathogen genomic sequences from numerous ongoing surveillance and research efforts whose sources include food, environmental sources such as water or production facilities, and patient samples. Foodborne, hospital-acquired, and other clinically infectious pathogens are included.

The system provides two major automated real-time analyses. First, it quickly clusters related pathogen genome sequences to identify potential transmission chains, helping public health scientists investigate disease outbreaks. Second, as part of the National Database of Antibiotic Resistant Organisms (NDARO), NCBI screens genomic sequences using AMRFinderPlus to identify the antimicrobial resistance, stress response, and virulence genes found in bacterial genomic sequences. This screening enables scientists to track the spread of resistance genes and to understand the relationships among antimicrobial resistance, stress response, and virulence. NDARO is an initiative funded by CARB (Combating Antibiotic-Resistant Bacteria), under which more than 2.2 million bacterial genomic sequences across more than 350 species are analyzed, primarily from public health surveillance activities including the National Antimicrobial Resistance Monitoring System (NARMS) project. The reference data used by AMRFinderPlus to identify these genes come from ongoing curation of a reference collection in collaboration with academic experts, as well as from an AMR gene allele nomenclature service specifically for beta-lactamases, mobile colistin resistance, and quinolone resistance. This talk will showcase the analysis pipeline and the publicly available resources and, drawing on several recently published studies, demonstrate how public health and research scientists are using the data.

About the Speaker

William Klimke, Ph.D., Researcher, National Library of Medicine, The National Center for Biotechnology Information Division, NIH

Dr. William Klimke is the Product Owner of both the Prokaryotic RefSeq/Genome Annotation and the Pathogen Detection projects at NCBI/NLM/NIH. He received his Ph.D. from the University of Alberta in 2002 and has been at NCBI since then, where he has been involved in numerous projects, including 1) RefSeq prokaryotic genomes and annotation services and 2) the Pathogen Detection resource, which provides real-time genomic cluster analysis of over 350 bacterial and fungal pathogen species as well as antimicrobial resistance and virulence gene and protein identification. Dr. Klimke has received numerous awards for his work on the Pathogen Detection pipeline, which provides real-time cluster analysis of foodborne bacterial pathogens for FDA, CDC, and USDA.
