Data Science at NIH

March Data Sharing and Reuse Seminar

Friday, March 13, 2026

Derek Caetano-Anollés, Ph.D., will present "Sequence Read Archive: Leveraging this petabyte-scale database to drive biomedical discovery" from 12:00 p.m. to 1:00 p.m. ET.

About the Seminar

The Sequence Read Archive (SRA) is the largest publicly available repository of high-throughput sequencing data. With big data come big challenges, and that includes keeping the SRA sustainable while making sure that data is findable, accessible, interoperable, and reusable. Following a brief introduction to the SRA and the expanse of data it holds, we will share best practices for accessing SRA data for your analyses and describe the various formats you may encounter. Finally, we will describe the SRA Lite file format, which is faster to download and has the added advantage of shrinking the overall footprint of the SRA. We will demonstrate the use of the SRA Lite format in NCBI RNA-seq pipelines and related analyses, and point to NCBI resources for learning more and engaging with us.

About the Seminar Series

The seminar is open to the public and registration is required each month. Individuals who need interpreting services and/or other reasonable accommodations to participate in this event should contact Allison Hurst at 301-670-4990. Requests should be made at least five days in advance of the event.

The National Institutes of Health (NIH) Office of Data Science Strategy hosts this seminar series on the second Friday of each month at noon ET to highlight examples of data sharing and reuse. The monthly series highlights researchers who have taken existing data and found clever ways to reuse the data or generate new findings. A different NIH institute or center will also share its data science activities each month.