Arjun Krishnan, Ph.D., will present "Democratize Data-Driven Biology by Tackling Incomplete Data, Unstructured Metadata, and Hidden Curricula" at the monthly Data Sharing and Reuse Seminar on Aug. 13 at noon EDT. Krishnan is an assistant professor in the Department of Computational Mathematics, Science, and Engineering and the Department of Biochemistry and Molecular Biology at Michigan State University.
About the Seminar
There is much enthusiasm about using omics and biomedical data collections to fuel research on complex traits and diseases. However, some well-known fundamental challenges still stand in the way of using these data seamlessly and effectively to drive research. For instance, more than 1.5 million human gene expression profiles are publicly available, but each technology or platform used to record a profile measures a different subset of the genes in the genome. As a result, many of these transcriptomes are missing measurements for thousands of genes.
These gaps in the data are major hurdles for integrative analysis. Critical problems also exist with data descriptions: most of the more than 2 million publicly available omics samples lack structured metadata, such as information about tissue of origin, disease status, and environmental conditions. As a result, finding samples and datasets of interest is not straightforward.
In this seminar, Krishnan will present his group's recent work developing machine learning approaches to address these fundamental challenges. He will also discuss the need to improve advanced research training in biological data analysis by formalizing concepts in statistical procedures, study design, data and code management, critical consumption of data-driven findings, and reproducible research.
About the Seminar Series
The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series on the second Friday of each month at noon ET to highlight exemplars of data sharing and reuse. The monthly series features researchers who have taken existing data and found clever ways to reuse them or to generate new findings. Each month, a different NIH institute or center also shares its data science activities.
The seminar is open to the public, and registration is required each month. Individuals who need reasonable accommodations to participate in this event should contact Erin Walker at 301-827-9655 or the Federal Relay Service at 800-877-8339. Requests should be made at least 10 days before the event.