National Institutes of Health Office of Data Science Strategy
Wednesday, September 25, 2019

Five Petabytes of Sequence Read Archive Data Now in the Cloud

The Sequence Read Archive (SRA) is the largest publicly available repository of raw, next-generation sequence data, and half of it is now available in the cloud.

The National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM) recently moved the five petabytes of public SRA data to the cloud with support from the National Institutes of Health (NIH) Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative. These data include a variety of genomes, gene expression data, and more. Plans are underway to move the other half of the SRA data, which is controlled-access human genomic data.

Having these high-throughput sequence data publicly available in the cloud marks the first time that researchers can compute across the entire 5-petabyte collection. With this move, NIH is accelerating discoveries by giving researchers flexible, scalable access to these data via the cloud.

NCBI Director Jim Ostell talks more about the significance of this milestone in a guest blog post on the NLM director’s blog titled “Biomedical Discovery through SRA and the Cloud.”
