National Institutes of Health, Office of Data Science Strategy

Wednesday, May 20, 2020

New Request for Information Seeks Public Input on Use of Cloud Resources and New File Formats for Sequence Read Archive Data

Submissions Due July 17

SRA Data in the Cloud

The National Institutes of Health’s Office of Data Science Strategy and the National Center for Biotechnology Information (NCBI) at the National Library of Medicine recently issued a Request for Information (RFI), NOT-OD-20-108. The RFI seeks public input on how Sequence Read Archive (SRA) data can be formatted and stored to better facilitate the use, exchange, and scientific impact of the data while maintaining a sustainable, cost-effective footprint that can support continued submissions to the archive.

The SRA is one of NIH's largest and most diverse datasets – a broad collection of experimental DNA and RNA sequences that represent genome diversity across the tree of life. The SRA currently contains more than 36 petabytes of data and is continually growing. In 2019, the SRA was copied to Google Cloud Platform and Amazon Web Services as part of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative. The SRA data also remain accessible from NCBI on-premises (on-prem) storage.

NIH is requesting input on the use of SRA data to understand how best to manage this resource in cloud environments to facilitate its use in research while controlling costs as it grows in size. NIH would like to better understand how the research community currently uses SRA data, how researchers are using or anticipate using cloud computing with SRA data, and which formats of SRA data are most valuable to the research community.

Comments to the RFI should be submitted electronically by July 17, 2020.
