Sequence Read Archive
About the Sequence Read Archive (SRA)
The Sequence Read Archive (SRA) is the National Center for Biotechnology Information (NCBI) database that stores sequence data obtained from next-generation sequencing technology. Released in 2009, the SRA contains 9 million records and 12 petabytes of data. The SRA is a broad collection of experimental DNA and RNA sequences that represent genome diversity across the tree of life. Through this database, researchers can search the metadata associated with those sequences to locate the sequence reads they need for further analysis.
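As a rough illustration of the metadata search described above, the sketch below builds a query against NCBI's public E-utilities ESearch service with the database set to `sra`. This page does not name E-utilities, so treat it as one possible route rather than the recommended one. The function names `build_sra_search_url` and `search_sra` are hypothetical helpers introduced only for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint (an assumption: not named on this page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(term, retmax=20):
    """Return an ESearch URL that queries the SRA database for `term`."""
    params = {
        "db": "sra",        # search the Sequence Read Archive
        "term": term,       # free-text or fielded query string
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return ESEARCH_URL + "?" + urlencode(params)


def search_sra(term, retmax=20):
    """Run the search and return the decoded JSON response (needs network access)."""
    with urlopen(build_sra_search_url(term, retmax)) as response:
        return json.load(response)
```

For example, `build_sra_search_url("SARS-CoV-2", retmax=5)` produces a URL that asks for up to five matching SRA record IDs. Passing the same arguments to `search_sra` would send the request to NCBI.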
SRA in the Cloud
SRA data is available on Google Cloud Platform and Amazon Web Services through the STRIDES Initiative. All publicly available, unassembled read data and authorized-access human data can be accessed and analyzed through these cloud providers. For more information on how to access and work with SRA data in the cloud, please see the NCBI SRA in the Cloud documentation. Information is also available about the formats of SRA data in the cloud and about the costs of accessing that data.
SRA Data Working Group
The Council of Councils advises the NIH Director on matters related to the Division of Program Coordination, Planning, and Strategic Initiatives (DPCPSI). The Council established the SRA Data Working Group in 2019 to provide recommendations to the Council on key factors for storing, managing, and accessing SRA data in cloud service provider environments.
The SRA Data Working Group is currently examining analyses of SRA data access, cost, and usage, as well as other areas. The working group is using these analyses, among other factors and considerations, to evaluate and deliberate on data storage options. The group reports to the Council of Councils and will provide findings and draft recommendations on an ongoing basis.
The working group presented its interim report of draft recommendations at the Council meeting on Jan. 24, 2020. The group provided recommendations that would reduce the overall storage footprint of the SRA data while maintaining access to and use of the data by the research community.
News and Events
- April 3, 2020: SRA cloud sequences hold the promise of additional discoveries related to COVID-19
- Feb. 24, 2020: All SRA data is made available in the cloud
- Jan. 24, 2020: SRA Data Working Group presents at DPCPSI Council of Councils
- Sept. 25, 2019: Five petabytes of SRA data moved to cloud
This page last reviewed on July 20, 2020