NIH/ODSS Cloud Supplement Program PI Meeting

Wednesday, January 17, 2024

January 17-18, 2024, 11am-5pm EST

This two-day meeting will foster a cohesive NIH cloud computing community by bringing together the Principal Investigators (PIs) of the cloud supplement programs. It will give participants a platform to exchange insights on their projects, celebrate accomplishments, discuss best practices, share lessons learned, and engage in collaborative discussions.

Attendees

Principal Investigators from the following cloud supplement programs have been invited:

  • HVD 20 - FY2020 Request for ODSS Funds to Catalyze Migration to the Cloud via the STRIDES Initiative (also known as FY20 High-Value Datasets program)
  • HVD 21 - FY2021 Request for ODSS Funds to Catalyze Migration to and Usage of the Cloud via the STRIDES Initiative
  • HVD 22 - FY2022 Funding Request Notice for ODSS High-Value Datasets Program
  • HVD 23 - FY2023 Funding Request Notice for Supporting the Exploration of Cloud in NIH Intramural Research and Contracts
  • NOT-OD-23-070 - Notice of Special Interest (NOSI): Administrative Supplements to Support the Exploration of Cloud in NIH-supported Research

Featured Speakers

Dr. Belinda Seto — Deputy Director, NIH/ODSS

Dr. Belinda Seto was appointed deputy director of the ODSS in January 2020. A former deputy director of the National Eye Institute and the National Institute of Biomedical Imaging and Bioengineering, she brings a wealth of experience and knowledge to the position. After earning her Ph.D. in biochemistry from Purdue University, Dr. Seto completed a postdoctoral fellowship in the Stadtman Lab of the National Heart, Lung, and Blood Institute. She researched hepatitis B and vaccine development at the FDA. She oversaw the analysis and reporting of NIH grants data and trends through the Office of Extramural Research. Her experience in database management, analysis, and extramural grants policies led her to serve on the NIH Scientific Data Council and the Scientific Data Policy Council.

Laura Biven, Ph.D. — Lead, Integrated Infrastructure and Emerging Technologies, NIH/ODSS

Since joining NIH in 2020, Dr. Laura Biven has led the Integrated Infrastructure and Emerging Technologies (IIET) branch in ODSS. She is responsible for strategic planning, coordination, and oversight of programs that integrate independently managed cloud data resources across the NIH to advance NIH’s vision for an integrated, FAIR biomedical data ecosystem. She also oversees multidisciplinary NIH-wide programs that focus on integrating computational, mathematical, and biomedical research communities around emerging technologies such as artificial intelligence and machine learning (AI/ML), quantum computing, and digital twins.

Agenda

Day 1
Time | Event
11:00-11:05am ET | WELCOME
Dr. Fenglou Mao, Program Officer, Cloud Computing Programs, NIH/ODSS
11:05-11:30am ET | NIH Data Science Strategic Plan
Dr. Belinda Seto, Deputy Director, NIH/ODSS
Download Slides
11:30am-12:00pm ET | ODSS Data Infrastructure and Cloud Programs Overviews
Dr. Laura Biven, Lead, Integrated Infrastructure and Emerging Technologies, NIH/ODSS
Download Slides
12:00-12:20pm ET | BEGINNING OF THE MEETING POLL
Dr. Fenglou Mao, Program Officer, Cloud Computing Programs, NIH/ODSS
12:20-1:20pm ET

BREAKOUT SESSION 1

TRACK A
Dr. Joseph Marcotrigiano (Moderator), Senior Investigator, NIH/NIAID
Implementation of AWS Cloud Computing for cryoEM Data Processing
Download Slides

Dr. Qian Zhu, Team Lead, NIH/NCATS
Rare Disease Alert System
Download Slides

Dr. Tieming Liu, Professor, Oklahoma State University
Empowering Cloud Computing for Non-image-based Diabetic Retinopathy Screening by Designing an EHR-oriented Incremental Learning Framework
Download Slides

Dr. Robert Schuler, Lead Scientist, USC Information Sciences Institute
Hybrid- and Multi-Cloud Storage Strategies for Cost-effective Deployment of Data Resources
Download Slides

Dr. Jack DiGiovanna, Chief Science Officer, Velsera
Using Seven Bridges’ CAVATICA to Empower Use of the INCLUDE DCC Platform
Download Slides

TRACK B
Dr. Johnny Tam (Moderator), Senior Investigator, NIH/NEI; Dr. Vineeta Das, Postdoc, NIH/NEI; Dr. Jiamin Liu, Staff Scientist, Advanced Imaging and Microscopy (AIM) Resource, NIH
Cloud Computing for Optical Image Restoration and Intramural Training
Download Slides

Mr. William Longabaugh, Senior Software Engineer, Institute for Systems Biology
NCI CRDC Cloud Transfer of TP53 Website and Database
Download Slides

Dr. William Wasswa, Co-PI Admin Supplement, Mbarara University of Science and Technology
MUST Data Science Research Hub (MUDSReH) - Democratized Trusted Research Environment (dTRE)
Download Slides

Dr. Sandra Safo, Assistant Professor, University of Minnesota
MultiViewPortal: Towards a Scalable Web Application for Multiview Learning
Download Slides

1:20-1:30pm ET | BREAK
1:30-2:30pm ET

BREAKOUT SESSION 2

TRACK A
Dr. Kim Pruitt (Moderator), Acting Director, NCBI, NIH/NLM
SRA RNA-seq Precomputed Alignments and Gene Expression Counts
Download Slides

Dr. Lee Cooper, Associate Professor, Northwestern University; Dr. Andinet Enquobahrie, Senior Director of Medical Computing, Kitware Inc.
Cloud Strategies for Improving Cost, Scalability, and Accessibility of a Machine Learning System for Pathology Images
Download Slides

Ms. Kailing Chen, Cloud Architect, CBIIT
DCEG Analytic Tools Suite
Download Slides

Dr. Kelly Crotty, Program Director, NCI; Ms. Kailing Chen, Cloud Architect, CBIIT
COnsortium of METabolomics Studies
Download Slides

Dr. Georgiy Bobashev, Senior Fellow, RTI International
Cloud Computing in Opioid Policy Modeling
Download Slides

TRACK B
Dr. Kenneth Young (Moderator), CIO/Assistant Professor, University of South Florida - Health Informatics Institute
Cloud Migration of Data and Data Analysis Platform of The Environmental Determinants of Diabetes in The Young Study (TEDDY)
Download Slides

Mr. Mark Weston, CEO, Netrias, LLC
Scaling CDE Curation Model Training
Download Slides

Dr. Salvador Dura-Bernal, PI, Assistant Professor, SUNY Downstate
Dissemination of a Tool for Data-driven Multiscale Modeling of Brain Circuits (U24EB028998)
Download Slides

Dr. Adrienne Campbell, Investigator, NIH/NHLBI
Inline Image Reconstruction of Dynamic 3D Data Using a GPU-enabled Cloud Implementation
Download Slides

Dr. Jeffrey Grethe, PI, NIDDK Information Network (dkNET), University of California, San Diego
Migration of Core Applications from the NIDDK Information Network (dkNET)
Download Slides

2:30-2:40pm ET | BREAK
2:40-3:40pm ET

BREAKOUT SESSION 3

TRACK A
Dr. Yanbin Yin (Moderator), Professor, University of Nebraska-Lincoln
Exploration of Cloud Computing for CAZyme Research
Download Slides

Mr. Srinivas Chepuri, Lead Enterprise Architect, ImmPort - NIAID/NIH
ImmPort - NIH STRIDES: Facilitating Access of Immunological Data in ImmPort for Analyses
Download Slides

Dr. Pritam Mukherjee, Staff Scientist, Clinical Center, NIH
Small Bowel Segmentation - Challenges and Directions
Download Slides

Dr. Ben Heavner, Senior Research Scientist, University of Washington
Building a Cross-study Data Set for the PRIMED Consortium
Download Slides

Dr. Alton Bodley, Postdoctoral Researcher, University of the West Indies
Exploration of Cloud Solutions to Enhance Global Infectious Diseases Research Training Program Activities
Download Slides

TRACK B
Dr. Albert Lai (Moderator), Professor, Washington University in St. Louis
Exploration of Cloud-based High Performance Computing
Download Slides

Dr. Kirsten Herrick, Program Director, System Owner, and COR for ASA24, NCI; Mr. Tom Nicholson, Senior Developer, Westat/NCI
Migrating ASA24 Automated Self-Administered 24-hour Dietary Assessment Tool
Download Slides

Dr. Alexander Welsch, Contractor Programmer/Data Manager, NCATS/IFX (Axle)
Public Substance Registration Using the Global Substance Registration System (GSRS)
Download Slides

Dr. Ariana Familiar, Senior Data Scientist, Children's Hospital of Philadelphia
Enhancing Kids First Digital Pathology Datasets Via Scalable, Cloud-based Data Management, Processing, and Analytics
Download Slides

Dr. Hongsuda Tangmunarunkit, Supervising Computer Scientist, University of Southern California
ATLAS-D2K - Exploring Cloud Optimization
Download Slides

3:40-3:50pm ET | BREAK
3:50-4:50pm ET

BREAKOUT SESSION 4

TRACK A
Dr. Vivek Kumar (Moderator), Associate Professor, The Jackson Laboratory
Google Cloud Pipeline for Mouse Behavior and Frailty Assessment for the Aging Research Community
Download Slides

Dr. Alison Motsinger-Reif, Branch Chief, NIH/NIEHS
Genome-wide Analysis Using Cloud Computing in the All of Us Researcher Workbench
Download Slides

Dr. Davide Ortolan, Postdoc, NEI/NIH
REShAPE: A Machine Learning Software for Cell Morphometry Analysis of Epithelial Monolayers
Download Slides

Dr. Ben Hitz, MPI, Stanford University
IGVF Cloud Computing
Download Slides

TRACK B
Dr. Janelle Cortner (Moderator), Director, Data Management and Analysis Program, CBIIT, NCI
Leveraging Intramural Data Platforms for Accelerated Data Sharing
Download Slides

Dr. Michael Nalls, Lead, NIH CARD
Cloud Forward Data Sharing: “Limit Testing” with Long Reads at CARD
Download Slides

Dr. Matt Howe, Assistant Professor, Virginia Tech
Implementation of Cloud Based Computing in a Modern Systems/Behavioral-Neuroscience Laboratory
Download Slides

Dr. Rainer Hilscher, Senior Research Data Scientist, RTI International; Katherine J. Karriker-Jaffe, Director, Community Health & Implementation Research Program, RTI International
Alcohol Use Disorder (AUD) Treatment Simulation
Download Slides

4:50-5:00pm ET | CLOSE
Dr. Fenglou Mao, Program Officer, Cloud Computing Programs, NIH/ODSS
Day 2
Time | Event
11:00-11:10am ET | INTRODUCTION
Dr. Laura Biven, Lead, Integrated Infrastructure and Emerging Technologies, NIH/ODSS
11:10am-12:10pm ET

BREAKOUT SESSION 5

Professor Michael Schatz (Moderator), Bloomberg Distinguished Professor of Computer Science and Biology, Johns Hopkins University
“T2T-omics” at scale: Improving our understanding of human genetic variation using AnVIL
Download Slides

Dr. Shaun Purcell, Associate Professor, Brigham & Women's Hospital, Harvard Medical School
National Sleep Research Resource
Download Slides

Dr. Marcelo Freire, Associate Professor, J. Craig Venter Institute
Cloud-Based Machine Learning and Biomarker Visual Analytics for Salivary Proteomics
Download Slides

Dr. Daniel Veltri, Health Scientist (Data Science), NIH/NIAID
PII-secured AWS Computing Environment (PACE): Some lessons learned working with PII in the STRIDES AWS Environment
Download Slides

12:10-12:20pm ET | BREAK
12:20-1:20pm ET

BREAKOUT SESSION 6

Dr. Zhiyong Lu (Moderator), Deputy Director for Literature Search, NCBI; Senior Investigator, NIH/NLM
Scaling Up Literature Annotations with Cloud Computing in PubTator 3.0
Download Slides

Dr. Christopher Zalewski, Clinical Research Audiologist, NIDCD, NIH
Generation of an NIH-wide Clinical Database of Hearing and Balance Function
Download Slides

Dr. Nathan Salomonis, Associate Professor, Cincinnati Children's Hospital Medical Center
The NHLBI LungMAP Cloud Ecosystem: Connecting Diverse Digital and Lung Biology Resources
Download Slides

Dr. Cody Baker, Sr. Neurodata Scientist; Ms. Urjoshi Sinha, Computer Engineer, Lawrence Berkeley National Laboratory
Evaluation and Optimization of NWB Neurophysiology Software and Data in the Cloud
Download Slides

Dr. Javed Khan, Senior Investigator, CCR, NCI
Migration to Cloud of the Oncogenomics Next Generation Sequencing Pipelines & Databases for CCDI and Other Pediatric Cancers
Download Slides

1:20-1:30pm ET | BREAK
1:30-2:30pm ET

BREAKOUT SESSION 7

Dr. Deborah Duran (Moderator), Senior Advisor, Office of the Director, NIH/NIMHD
ScHARe - Science collaborative for Health disparities and Artificial intelligence bias Reduction
Download Slides

Dr. Govind Bhagavatheeshwaran, Staff Scientist, NINDS/NIH
Medical Image Processing and Structured Storage
Download Slides

Dr. Aparna Gullapalli, Assistant Professor, Mind Research Network
Cloud Based Neuroimaging Analysis for Identifying Traumatic Brain Injuries and Related Changes
Download Slides

Dr. Bing Yu, Associate Professor, University of Texas Health Science Center at Houston
Development of a Cloud-based Analytical Tool for Polygenic Risk Score and its Implication in Heart Failure Research
Download Slides

Dr. Keyvan Farahani, NIH/NHLBI
A Sustainable Medical Imaging Challenge Cloud Infrastructure (MedICCI)
Download Slides

2:30-2:40pm ET | BREAK
2:40-3:40pm ET

BREAKOUT SESSION 8

During this breakout session, we will explore barriers, challenges, and opportunities in cloud computing for biomedical research, including novel ideas and future directions. The session will be led by NIH program officers.

TRACK A
Dr. Fenglou Mao (Moderator), Program Officer, Cloud Computing Programs, NIH/ODSS

TRACK B
Mr. Nick Weber (Moderator), Acting Director, Office of Scientific Computing, Cloud Services Program Manager, STRIDES Initiative Lead, NIH/CIT

TRACK C
Mr. Mike Conway (Moderator), Data Systems Architect/Engineer, Office of Data Science, NIH/NIEHS

3:40-3:50pm ET | BREAK
3:50-4:30pm ET | BREAKOUT SESSION 8 REPORT BACK
Dr. Fenglou Mao, Program Officer, Cloud Computing Programs, NIH/ODSS
4:30-4:50pm ET | END OF THE MEETING POLL
Dr. Fenglou Mao, Program Officer, Cloud Computing Programs, NIH/ODSS
4:50-5:00pm ET | CLOSEOUT & ADJOURN


June Data Sharing and Reuse Seminar

Friday, June 10, 2022

Dr. Rosa Alcazar will present "Creating a Just Genomic Data Science Community by Providing Resources at Community Colleges" at the monthly Data Sharing and Reuse Seminar on June 10, 2022, at 12 p.m. EDT.

View Recording

About the Seminar

Genomic data science has become foundational to modern biology research. To create a just research community, we must provide sufficient resources to places that excel in serving people from underrepresented populations. While community colleges offer pathways for many who wouldn’t otherwise be able to attend college, they often lack the compute infrastructure, curriculum, and professional development necessary for instruction in rapidly changing fields. In partnership with a group of dedicated researchers and educators, we are building the Genomic Data Science Community Network (GDSCN) to provide resources accessible to anyone with an internet connection. Our vision is to share not only data but also training and expertise, irrespective of institutional affiliation, breaking down the barriers that create silos and perpetuate a homogenized research community.

About the Speaker

Dr. Rosa Alcazar is a Latina, first-generation high school graduate who attended community college part-time before transferring to UC Riverside and then earning her doctorate at Johns Hopkins University. She is currently a Biology Instructor at Clovis Community College, where she advocates for institutional changes that remove systemic barriers and create a diverse research community.

About the Seminar Series

The seminar is open to the public, and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Rachel Pisarski at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series to highlight exemplars of data sharing and reuse on the second Friday of each month at noon ET. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.

March Data Sharing and Reuse Seminar

Friday, March 8, 2024

Dr. Kilian Pohl will present "Accelerating Neuroscience Discovery Using Shared Software and Data" on March 8, 2024, at 12 p.m. ET.

About the Seminar

Sharing software and data has led to new discoveries in neuroscience and lowered the barriers to replication. Aggregating and repurposing well-curated data acquired at multiple sites provides the statistical power needed for discovery. Studies based on the NIAAA-funded NCANDA-A consortium exemplify this sharing process. Since 2013, NCANDA-A has been collecting multimodal neuroscience data annually on 831 individuals (baseline age: 12–21 years). The data are uploaded and curated through a data management system called Scalable Informatics for Biomedical Imaging Studies (SIBIS) (https://github.com/sibis-platform). SIBIS relies principally on publicly available software to span the entire life cycle of electronic data (i.e., capture, harmonization, quality control, sharing, and analysis). This talk will review the design of SIBIS, identify the challenges in analyzing public multimodal data with machine-learning technology, and highlight research findings that resulted from overcoming those challenges.

About the Speaker

In 2002, Kilian M. Pohl started sharing machine-learning software for the analysis of neuroscience data as part of his graduate research at the Massachusetts Institute of Technology and Harvard Medical School. Kilian is now a Professor in Psychiatry and Behavioral Sciences and, by courtesy, Electrical Engineering at Stanford University. He is the contact Principal Investigator of the Data Analysis Resource of the National Consortium on Alcohol and Neurodevelopment in Adolescence - Adulthood (NCANDA-A) and of the Computational Neuroscience Laboratory (CNSLAB). For neuroscience studies such as those conducted by NCANDA-A, the CNSLAB manages the data and creates machine-learning models to identify phenotypes that improve the mechanistic understanding, treatment, and prevention of neuropsychiatric disorders.

About the Seminar Series

The seminar is open to the public, and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Janiya Peters at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series to highlight exemplars of data sharing and reuse on the second Friday of each month at noon ET. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.

NIH released RFI on Proposed Use of CDEs for NIH-Funded Clinical Research and Trials

Wednesday, February 21, 2024

Responses due April 20

The National Institutes of Health (NIH) released a Request for Information (RFI) on Proposed Use of Common Data Elements (CDEs) for NIH-Funded Clinical Research and Trials (NOT-OD-24-063). Responses are due April 20, 2024. 

NIH is requesting input on:

  • a set of minimum core CDEs in the demographics/personal characteristics category;
  • recommended CDEs in the clinical domains, including autoimmune diseases and immune-mediated diseases;
  • high-level CDEs for social determinants of health (SDoH) domains;
  • tools and technologies that could enhance the use of NIH CDEs; and
  • policies and governance that could facilitate and incentivize broader CDE usage in research and in data sharing and management.

Through this RFI, NIH also seeks to understand the challenges and opportunities in the use and development of CDEs in research, and to inform appropriate NIH guidance and mechanisms that lower the barriers to CDE use and improve the ability to aggregate and integrate CDE-based data.

Interested parties may find additional information at: https://datascience.nih.gov/cde-rfi

Inquiries for this RFI should be directed to Belinda Seto, Ph.D., at cde-rfi@od.nih.gov.

April Data Sharing and Reuse Seminar

Friday, April 12, 2024

Mr. Andrew Smith will present "ELIXIR: Working Together to Accelerate the Understanding of Life" on April 12, 2024, at 12 p.m. ET.

About the Seminar

ELIXIR is a pan-European research infrastructure for life science data. It recently published its Scientific Programme at https://elixir-europe.org/news/programme-2024-28, setting out its vision for the next 5 years. ELIXIR’s new strategic priorities acknowledge the importance of not only investing in science and technology but also building capacity and increasing participation in ELIXIR member countries. Already present in more than 20 countries across Europe, ELIXIR will work to:

  • Enable scientists across the globe to access and analyse life science data
  • Deliver services to support federated data management and analytics in life science
  • Equip national ELIXIR Nodes for successful long-term operations
  • Develop people and capacity to benefit science and society

About the Speaker

Andrew (Andy) Smith joined ELIXIR in 2011 to help establish the organization and support its progression from preparatory stages to permanence. Andy is serving as Interim Director until June 2024, in addition to his role as Head of External Relations.

As Head of External Relations, Andy manages ELIXIR’s engagement with Member States, funders, and policymakers. He also leads ELIXIR’s engagement with the EU institutions. His team is responsible for developing ELIXIR’s industry strategy and facilitating international collaborations between ELIXIR partners and global collaborators, including those in the United States.

Andy has represented ELIXIR on the Organisation for Economic Co-operation and Development (OECD) and G7 Group of Senior Officials working groups on topics relating to open science and international collaboration. He is the coordinator of the EU-funded ELIXIR-STEERS project, which focuses on software and workflow development best practices.

About the Seminar Series

The seminar is open to the public, and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Janiya Peters at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series to highlight exemplars of data sharing and reuse on the second Friday of each month at noon ET. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.