Announcing the 2023 DataWorks! Prize Winners

Thursday, February 1, 2024

In early 2023, ODSS partnered with the Federation of American Societies for Experimental Biology (FASEB) to launch the second annual DataWorks! Prize to highlight examples of innovative data sharing and reuse.

This year, 39 teams registered for the challenge to demonstrate their accomplishments. The 218 team members came from a wide variety of disciplines, including biochemistry, clinical research, genomics, immunology, molecular biology, neuroscience, and more.

Representatives from the grand prize-winning team will present at the Data Sharing and Reuse Seminar series on Friday, Feb. 9, 2024.

Grand Prize $100,000

CCC19

COVID-19 and Cancer: Catalyzing Collaboration

The COVID-19 and Cancer Consortium (CCC19) is a collaboration that collects data about patients with cancer who have been diagnosed with COVID-19.

Distinguished Achievement Award $50,000

IPop CATS

GeoPIPE: Reusing Open Data and Letting Data Flow

GeoPipe is a pipeline for enriching open data streams with geospatial analyses and natural language processing.

Maryellen Giger’s Team

Sharable Curated, Diverse Medical Images at Scale

MIDRC is a collaboration to create an open, curated, diverse commons for medical imaging AI research and a sequestered one for translation.

Exemplary Achievement Award $25,000

ASAP Discovery Consortium

An Open Pipeline for Antiviral Drug Discovery

To nucleate a global antiviral pipeline to prevent future pandemics, we created a new model for open science accelerated drug discovery.

Karen Yook’s Team

Making Data Useable While Publishing

microPublication Biology re-architects the publishing workflow by including curators to alleviate numerous obstacles in data reusability.

StrokeFAIR

StrokeFAIR: A Public Dataset and Analytical Tools

StrokeFAIR shares FAIR images, metadata, and analytical tools for acute brain stroke, democratizing avenues to perform reproducible, reliable research.

Significant Achievement Award $12,500

Caltech Library

Naming Data Files Descriptively for Easier Reuse

A worksheet for creating file naming conventions to label research data descriptively and consistently.

February Data Sharing and Reuse Seminar

Friday, February 9, 2024

Mr. Alex VanHelene, Dr. Sanjay Mishra, Dr. Michael Rooney, and Dr. Jeremy L. Warner will present COVID-19 and Cancer: Catalyzing Collaboration on February 9, 2024, at 12 p.m.

About the Seminar

To understand and assess the uncertain effects of COVID-19 on people affected by cancer, CCC19 was founded in March 2020 and developed a robust and agile strategy to collect and disseminate prospective, granular, uniformly organized information on patients with cancer diagnosed with COVID-19 — at scale and as rapidly as possible. This systematic data sharing recipe included three key components: data sourcing and acquisition, data management, and data model sharing.

Taking inspiration from existing best practices, CCC19 sought to accelerate clinical research by facilitating data sharing amongst 126 cancer institutions across North America and eventually logged more than 19,000 cases – the largest registry of its kind. Data standardization is managed through existing clinical vocabularies whenever possible. Through continuous quality assurance of contributed data from participating institutions, CCC19 ensures compliance and standardization with registry-based research standards. Our knowledge is publicly accessible through REDCap features that enable local reusability and through open code sharing on GitHub. To further emphasize best practice “recipes” to advance biological and biomedical research activities, all CCC19 publications, the data model, and derived variable code are publicly available. CCC19’s transparent and streamlined approach to data management demonstrates the power of data sharing practices to advance scientific discovery and human health.

About the Speakers

Mr. Alex VanHelene is a Clinical Research Assistant at Rhode Island Hospital. Dr. Sanjay Mishra is the Research Program Manager at Rhode Island Hospital and Coordinator of the COVID-19 and Cancer Consortium. Dr. Michael Rooney is a Radiation Oncology Resident at the University of Texas MD Anderson Cancer Center. Dr. Jeremy L. Warner is a Professor of Medicine at Brown University and Director of the Research Coordination Center of the COVID-19 and Cancer Consortium.

About the Seminar Series

The seminar is open to the public and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Janiya Peters at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series to highlight exemplars of data sharing and reuse on the second Friday of each month at noon ET. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.

January Data Sharing and Reuse Seminar

Friday, January 12, 2024

Dr. Michelle Hribar will present Common Data Models for Ophthalmology Research Collaboration on January 12, 2024, at 12 p.m.

About the Seminar

Large diverse datasets are necessary for building accurate and unbiased AI/ML models in research for vision and eye health, but challenges in data standardization have become a barrier for creating these datasets. Large NIH data generation projects such as All of Us, N3C, and Bridge2AI include minimal ophthalmic clinical or imaging data since these data elements are not yet a part of their underlying common data model (CDM): the OMOP CDM. To address this gap, the Eye Care and Vision Research workgroup was created within the Observational Health Data Science and Informatics (OHDSI) community. As part of her NIH DATA Scholar work, Dr. Hribar co-leads this workgroup. In this talk, she will discuss the standardization efforts and an example research use case that our group has completed as well as the vision for future data models and infrastructure to support research in eye care and vision science. 

About the Speaker

Dr. Hribar is an Assistant Professor of Medical Informatics and Clinical Epidemiology at Oregon Health & Science University, School of Medicine.

Announcing the Launch of the RADx Tribal Data Repository

Friday, December 1, 2023

American Indian and Alaska Native (AI/AN) communities across the nation were — and continue to be — disproportionately impacted by the COVID-19 pandemic. Health disparities among AI/AN communities include an undue burden of infections, lack of access to health care and increased hospitalizations, and higher death rates.

To address the disparities recognized in these communities, the NIH has focused on supporting research projects that can increase our overall understanding of COVID-19 and its effects on AI/AN communities. In response to the May 2020 Tribal Consultation for COVID-19, NIH incorporated Tribal input into the Rapid Acceleration of Diagnostics (RADx) Underserved Populations (RADx-UP) initiative. This initiative aims to accelerate innovation in developing and implementing testing strategies for COVID-19 based on community-engaged research.

This week, three years since the launch of RADx, I am incredibly honored and excited to announce the launch of the RADx Tribal Data Repository: Data for Indigenous Implementations, Interventions, and Innovations (RADx TDR: D4I).

RADx TDR: D4I will establish a data repository consistent with Tribal sovereignty for researchers and their collaborators interested in working with RADx data provided by American Indian and Alaska Native research participants to better understand and address the impact of COVID-19 and other health disparities. Specific activities will include education and training programs on best practices for responsible data sharing and access, and constructing a secure repository to support data storage, access, harmonization, and monitored sharing of data related to COVID-19 testing and vaccination.

In support of American Indian/Alaska Native researchers and other scientists working with those communities, RADx TDR: D4I will fund efforts working toward a better understanding of COVID-19 impact and provide data to allow for data-informed decisions and policy development in addressing the COVID-19 pandemic and potential future pandemics.

The RADx TDR: D4I is supported under an “Other Transaction Agreement” (OTA) managed by ODSS, with six collaborative awards. The awardees include Stanford University, the prime awardee, with Native BioData Consortium as project and research director, as well as the University of Wisconsin-Madison; The Ohio State University; the University of California, Santa Cruz; Arizona State University; and the University of Washington, Seattle.

As ODSS Director and NIH’s Associate Director for Data Science, I want to express my sincere gratitude to everyone who played a part in this project — my colleagues at the NIH Office of the Director, the NIH Tribal Health Research Office (THRO), the National Institute on Minority Health and Health Disparities (NIMHD), and especially the participants of the Tribal consultations for their guidance and collaboration on this trailblazing project.

ODSS is deeply committed to partnering with Tribal nations to support data science activities that improve the health of American Indian and Alaska Native communities. Across NIH, there is a growing number of Tribal health research efforts with an emphasis on trust, respect, and Tribal sovereignty. We look forward to the work of RADx TDR: D4I as we continue to understand and address the impacts of COVID-19 and other health disparities.

To view the NIMHD statement on this announcement, check out the NIMHD Director’s Letter: https://nimhd.nih.gov/about/directors-corner/messages/nih-launches-radx-tribal-data-repository.html

December Data Sharing and Reuse Seminar

Friday, December 8, 2023

Dr. Joaquin M. Espinosa will present Being FAIR in the pan-omics era: lessons from the INCLUDE Project on December 8, 2023, at 12 p.m.

About the Seminar

This presentation will discuss strategies and policies for effective sharing and reuse of large multidimensional datasets. Dr. Espinosa will discuss his experiences as a data generator, data analyst, collaborator, teacher, and mentor through the COVIDome Project, the Human Trisome Project, and the INCLUDE Data Hub. Dr. Espinosa will illustrate the power of sharing data ahead of publication and the need for user-friendly data sharing platforms and intuitive data visualization portals. His presentation will include real-life examples applicable to the study of COVID-19 and Down syndrome. He will also present on the importance of developing training and education opportunities for diverse stakeholders. Lastly, he will discuss the importance of international data collection and sharing.

About the Speaker

Dr. Espinosa is the Executive Director of the Linda Crnic Institute for Down Syndrome and Professor of Pharmacology at the University of Colorado School of Medicine at the Anschutz Medical Campus. Dr. Espinosa received his Bachelor’s degree in Biology from the Universidad Nacional de Mar del Plata, Argentina, in 1994, and a PhD in Biology from the Universidad de Buenos Aires, Argentina, in 1999. Supported by a fellowship from the Pew Charitable Trusts, Dr. Espinosa completed his post-doctoral training at the Salk Institute for Biological Studies in La Jolla, California. In 2004, supported by a fellowship from the Leukemia and Lymphoma Society, he began his independent appointment at the University of Colorado Boulder, in the Department of Molecular, Cellular and Developmental Biology. In 2009, he was appointed to the Howard Hughes Medical Institute as an Early Career Scientist. At the Crnic Institute, Dr. Espinosa directs the Human Trisome Project, a pan-omics cohort study of the population with Down syndrome, which has enabled the design and launch of novel clinical trials to improve health outcomes in Down syndrome. Dr. Espinosa currently serves as the Leader of the Administrative and Outreach Core of the NIH INCLUDE Project Data Coordinating Center, a new data resource that aims to accelerate discoveries into the mechanisms underlying the increased risk of co-occurring medical conditions in people with Down syndrome.

“Todos Somos, Somos Uno: We Are All, We Are One!” ODSS Celebrates Hispanic Heritage Month

Thursday, September 28, 2023

Guest Blog written by Dr. Samson Gebreab, AIM-AHEAD Program Lead

In celebration of the history, culture, and contributions of Hispanics and Latinos, the NIH Office of Data Science Strategy (ODSS) is highlighting one of its flagship initiatives. The Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity (AIM-AHEAD) program, launched in 2021, is increasing diversity in the AI/ML workforce and building a more inclusive research community to address health disparities and advance health equity.

AIM-AHEAD’s overall mission is to bring the benefit of AI/ML to all people of diverse backgrounds, especially those who may have been left out of the AI/ML research enterprise. Many historically underserved communities, including Hispanic and Latino communities, have not been well represented in the AI/ML workforce, datasets, research, and infrastructure development. The lack of representation can contribute to AI bias, leading to inaccurate clinical outcomes that may not reflect these underserved communities' health conditions or lived experiences.

ODSS recognizes that achieving diversity in the AI/ML workforce is critical to addressing the sources of AI bias contributing to health disparities and inequities. The AIM-AHEAD initiative provides a range of training opportunities across the academic continuum to increase the representation of Hispanic, Latino, and other underrepresented researchers in the AI/ML and data science space, including:

These training and fellowship programs include 15 Hispanic individuals. In recognition of Hispanic Heritage Month 2023, ODSS is pleased to share some of their perspectives in the video below.

AIM-AHEAD is also committed to using AI/ML to understand and address the varied factors driving the health disparities of Hispanic and Latino communities, including economic and healthcare access barriers, cultural factors, and lived experiences. In particular, the AIM-AHEAD program promotes community-centered AI/ML research projects that engage, empower, and closely collaborate with Hispanic and Latino community stakeholders when tackling their health challenges and needs:

  • An AIM-AHEAD-supported community-centered research project is a collaboration with ROSAesROJO that makes wellness and cancer prevention accessible to Hispanic/Latina women and their families in the United States. The researchers are working with the Bi-National Center at Texas A&M University and Hospital Mexico Americano in Nuevo Laredo, Mexico, to build a trilateral relationship to collect data and run a racially unbiased AI algorithm trial for breast cancer detection in the Mobile Mammogram vans.
  • AIM-AHEAD researchers, in partnership with Tepeyac Community Health Center and Clinic Chat LLC, are developing an artificially intelligent chatbot to facilitate improved access to cancer screening in English- and Spanish-speaking Hispanic/Latino populations in Colorado, which experience disparities in cancer screening, timely diagnosis, and access to treatment for several cancers in comparison to other demographic groups.

These training and community-centered pilot projects reflect a small sample of AIM-AHEAD program activities focused on Hispanic and Latino researchers and communities. During Hispanic Heritage Month and beyond, we encourage you to visit the AIM-AHEAD website to engage and learn more about how the program is leading the way to advance health equity using AI/ML by bringing together diverse datasets, researchers, and communities.

“Todos Somos, Somos Uno: We Are All, We Are One!”

Health Scientist Administrator - Data Search and Discovery

This position will serve as a Health Scientist Administrator (Program Officer, PO) within the National Institutes of Health (NIH), Office of the Director (OD), Division of Program Coordination, Planning, and Strategic Initiatives (DPCPSI), Office of Data Science Strategy (ODSS), and will work as the Program Officer for Data Search and Discovery in the Integrated Infrastructure and Emerging Technologies (IIET) team.

Scientific Program Analysts

GS9-13 Scientific Program Analysts will contribute to analysis and reporting projects on a diverse portfolio of extramural research investments and supporting operations.

The selected candidate will conduct both portfolio analysis projects and initiatives to improve research operations within the NINDS and in partnership with other ICs and enterprise system owners.

Requirements:

Strong project management skills.
Strong data visualization and presentation skills.
Experience with Qlik, Tableau or RShiny is highly desirable.

October Data Sharing and Reuse Seminar

Friday, October 13, 2023

Zhiyong Lu, Ph.D. will present AI in Medicine: Improving Access to Literature Data for Knowledge Discovery at the monthly Data Sharing and Reuse Seminar on Friday, October 13, 2023, at 12 p.m. EDT.

About the Seminar

AI in Medicine: Improving Access to Literature Data for Knowledge Discovery

The explosion of biomedical big data and information in the past decade or so has created new opportunities for discoveries to improve the treatment and prevention of human diseases. But this large body of knowledge, which mostly exists as free text in journal articles for humans to read, presents a grand new challenge: individual scientists around the world are increasingly finding themselves overwhelmed by the sheer volume of research literature and are struggling to keep up to date and to make sense of this wealth of textual information. Our research aims to break down this barrier and to empower scientists towards accelerated knowledge discovery. This seminar will discuss the development of large-scale, AI-based solutions for better understanding scientific text in the biomedical literature. Moreover, I will demonstrate their uses in some real-world applications such as improving PubMed searches (Fiorini et al., Nature Biotechnology 2018), supporting precision medicine with LitVar (Allot et al., Nature Genetics 2023), and taming the COVID-19 pandemic paper tsunami in LitCovid (Chen et al., Nature 2020).

About the Speaker

Dr. Zhiyong Lu is a (tenured) Senior Investigator at the National Library of Medicine Intramural Research Program, leading research in biomedical text and image processing, information retrieval, and AI/machine learning. In his role as Deputy Director for Literature Search at the National Center for Biotechnology Information (NCBI), Dr. Lu oversees the overall R&D efforts to improve literature search and information access in resources like PubMed and LitCovid, which are used by millions worldwide each day. Dr. Lu also serves as an Associate Editor of Bioinformatics and Organizer of the BioCreative NLP challenge. Over the last 15 years, Dr. Lu has mentored over 60 trainees, many of whom have gone on to become independent faculty members/researchers at academic institutions in the US, Europe, and Asia. With over 300 peer-reviewed publications, Dr. Lu is a highly cited author and a Fellow of the American College of Medical Informatics (ACMI) and the International Academy of Health Sciences Informatics (IAHSI).
