June Data Sharing and Reuse Seminar

Friday, June 14, 2024

Dr. Michael Schatz will present BioDIGS: BioDiversity and Informatics for Genomics Scholars on June 14, 2024, at 12 p.m. ET.

About the Seminar

Soil and soil organisms are essential for sustaining life, as they mediate many biological processes we rely on for food, fibers, and planetary health. Surveys of biodiversity identify soil as the single most diverse habitat on Earth and indicate that a single gram of soil may contain hundreds of millions to billions of bacterial, archaeal, and eukaryotic cells. Soil species play critical roles in promoting both healthy (e.g., nutrient and nitrogen transport, probiotics) and dysbiotic (e.g., pathogens, antibiotic resistance) environments, yet the vast majority of species remain uncharacterized, and their biological potential remains unknown.

Addressing this critical need, we have launched BioDIGS as a collaborative soil metagenome project to sample and analyze soil biodiversity with a focus on understanding how such biodiversity affects human health. To reach the broadest range of environments and participation, we partner with the Genomic Data Science Community Network (GDSCN) to complete the sampling and analysis. GDSCN was established in 2020 to improve the diversity and accessibility of genomics research and education. It includes more than 25 faculty members at community colleges, historically Black colleges and universities, Hispanic-serving institutions, Tribal colleges and universities, and related institutions. BioDIGS engages GDSCN faculty and students at all stages, from experimental design and collection through computational analysis. All data and workflows are available in Galaxy and the NHGRI AnVIL, which allows collaborative and scalable analysis for all institutions. Complementing the research, BioDIGS serves as a catalyst for a variety of professional development opportunities, classroom trainings, and curricula spanning the genomic data science life cycle.

Through BioDIGS, we have collected soil from more than 100 sites across the United States, selected to represent a variety of managed (e.g., lawns, fields, public parks) and unmanaged (e.g., dense forest, dense underbrush) areas. In addition to performing short- and long-read DNA sequencing, we submit the samples for heavy metal, pH, and other soil measurements. We further augment our data set with more than 3,000 public soil metagenomes to present one of the most comprehensive studies of soil biodiversity ever attempted. Our results highlight significant associations between metagenome diversity and heavy metal content, especially lead and arsenic, across urban sites. Using long-read sequencing, we have assembled complete genomes and high-quality metagenome-assembled genomes for more than 100 novel species, as well as gigabases of novel gene sequences. Finally, we detect antimicrobial resistance genes and microbial pathways for plant and animal signaling molecules in the soil metagenomes, highlighting the complex relationships across kingdoms.

About the Speaker

Michael Schatz is the Bloomberg Distinguished Professor of Computer Science and Biology at Johns Hopkins University, co-director of the National Human Genome Research Institute (NHGRI) Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL), and co-founder of the Genomic Data Science Community Network. His research is at the intersection of computer science, biology, and biotechnology and focuses on the development of novel algorithms and computing systems for human genetics, comparative genomics, and personalized medicine. For this work, he received the 2015 Alfred P. Sloan Foundation Fellowship and a 2014 National Science Foundation CAREER award. In 2022, he was named by Time magazine as one of the most influential people in the world (TIME100), together with Telomere-to-Telomere Consortium co-leads Adam Phillippy, Karen Miga, and Evan Eichler. Schatz received his Ph.D. and M.S. in Computer Science from the University of Maryland in 2010 and 2008, respectively, and his B.S. in Computer Science from Carnegie Mellon University in 2000. More information is available on his laboratory website: http://schatz-lab.org.

About the Seminar Series

The seminar is open to the public, and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Janiya Peters at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series to highlight exemplars of data sharing and reuse on the second Friday of each month at noon ET. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.

This page last reviewed on June 28, 2024