
Open-Access Data and Computational Resources to Address COVID-19
COVID-19 open-access data and computational resources are being provided by federal agencies, including NIH, public consortia, and private entities. These resources are freely available to researchers, and this page will be updated as more information becomes available.
The Office of Data Science Strategy seeks to provide the research community with links to open-access data, computational, and supporting resources. These resources are being aggregated and posted for scientific and public health interests. Inclusion of a resource on this list does not mean it has been evaluated or endorsed by NIH.
To suggest a new resource, please send an email with the name of the resource, the website, and a short description to datascience@nih.gov.
See Computational Resources
See Supporting Resources
Resource | Resource Description | Data Type | NIH Funded |
---|---|---|---|
AccessClinicalData@NIAID | The NIAID Clinical Trials Data Repository, AccessClinicalData@NIAID, is an NIAID cloud-based, secure data platform that enables sharing of and access to reports and data sets from NIAID COVID-19 and other sponsored clinical trials for the basic and clinical research community. | clinical studies | Yes |
Amazon Web Services (AWS) data lake for analysis of COVID-19 data | A centralized repository of up-to-date and curated datasets on or related to the spread and characteristics of SARS-CoV-2 and COVID-19. Information on how to best use this resource is available. | dashboards and visualization tools, epidemiology, healthcare resources, literature | |
Broad Terra cloud commons for pathogen surveillance | The Broad Terra cloud workspace for best practices with COVID-19 genomics data | genomics | |
CAS COVID-19 antiviral candidate compounds dataset | The open source dataset of nearly 50,000 chemical substances includes antiviral drugs and related compounds that are structurally similar to known antivirals for use in applications including research, data mining, machine learning and analytics. A COVID-19 Protein Target Thesaurus is also available. CAS is a division of the American Chemical Society. | chemical structure data | |
CDC COVID-19 Cases, Data, and Surveillance | The CDC is providing a variety of data on COVID-19 in the United States. | dashboards and visualization tools, epidemiology, healthcare resources | |
China National Center for Bioinformation's 2019 Novel Coronavirus Resource (2019nCoVR) | Maintained by China National Center for Bioinformation/National Genomics Data Center, 2019nCoVR is a comprehensive resource on COVID-19, combining up-to-date information on all published sequences, mutation analyses, literature, and more. | dashboards and visualization tools, genomics, literature | |
ClinicalTrials.gov COVID-19 related studies | View listed clinical studies related to the coronavirus disease (COVID-19). Studies are submitted in a structured format directly by the sponsors and investigators conducting the studies. Submitted study information is generally posted on ClinicalTrials.gov within 2 days after initial submission and site content is updated daily. Full website content is also available through the API. | clinical studies | Yes |
Collection of 3D Print Models of SARS-CoV-2 virions and proteins | This collection of files contains information for printing 3D physical models of SARS-CoV-2 proteins and is part of the NIH 3D Print Exchange. | chemical structure data | Yes |
CORD-19: COVID-19 Open Research Dataset and AI Challenge | Freely available dataset of 45,000 scholarly articles, including over 33,000 with full text, on COVID-19, SARS-CoV-2, and related coronaviruses. This machine-readable resource is provided to enable the application of natural language processing and other AI techniques. See the CORD-19 Challenge, developed in partnership with Kaggle. Amazon Web Services has a CORD-19 search website. Read the accompanying call to action from the White House Office of Science & Technology Policy and learn more about the creation of CORD-19. | literature | |
Coronavirus3D | This web-based viewer offers 3D visualization and analysis of SARS-CoV-2 protein structures with respect to SARS-CoV-2 mutational patterns. | chemical structure data | Yes |
COVID Digital Pathology Resource (COVID-DPR) | The COVID-DPR provides whole slide images of histopathologic samples relevant to COVID-19, including biopsy samples and autopsy specimens. The current focus of the repository includes tissue from the lungs, heart, liver, and kidney. The repository contains examples of H1N1, SARS, and MERS for comparison. | digital images | Yes |
COVID-19 Datasets on The Cancer Imaging Archive (TCIA) | The NCI Cancer Imaging Program (CIP) is utilizing its Cancer Imaging Archive as a resource for making COVID-19 radiology and digitized histopathology patient image sets publicly available. | digital images | Yes |
COVID-19 Genome Sequence Dataset on Registry of Open Data on AWS | A centralized sequence repository for all strains of novel coronavirus (SARS-CoV-2) submitted to the National Center for Biotechnology Information (NCBI). Included are both the original sequences submitted by the principal investigator and SRA-processed sequences that require the SRA Toolkit for analysis. | genomics | Yes |
Dimensions COVID-19 publications, datasets, and clinical trials | All Dimensions publications, datasets, and clinical trials related to COVID-19, updated daily. Content is exported from the openly accessible Dimensions application at https://covid-19.dimensions.ai/. | literature | |
EMBL-EBI's COVID-19 Data Portal | The European Bioinformatics Institute (EMBL-EBI), part of the European Molecular Biology Laboratory, has a COVID-19 Data Portal to facilitate data sharing and analysis and ultimately contribute to the European COVID-19 Data Platform. EMBL-EBI is part of the International Nucleotide Sequence Database Collaboration (INSDC); the National Center for Biotechnology Information (NCBI) is the U.S. partner of the INSDC. | chemical structure data, genomics, literature, RNA-seq and expression counts | |
European CDC geographic distribution of COVID-19 cases worldwide | The downloadable data file is updated daily and contains the latest available public data on COVID-19. Each row/entry contains the number of new cases reported per day and per country. You may use the data in line with ECDC’s copyright policy. | epidemiology | |
GenBank Nucleotide Sequences | Provides rapid, open, and unrestricted access to virus nucleotide sequences and is the repository being recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query. | genomics | Yes |
GenBank Protein Sequences | Provides rapid, open, and unrestricted access to virus conceptually translated protein sequences and is the repository being recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query. | genomics | Yes |
GEO DataSets | Human transcriptional responses to SARS-CoV-2 infection | RNA-seq and expression counts | Yes |
GISAID | International database of hCoV-19 genome sequences and related clinical and epidemiological data | genomics | |
Google Cloud Platform (GCP) Datasets for COVID-19 Research | GCP is hosting a repository of public datasets and offering free hosting and queries of COVID datasets. Learn more about the free hosting and queries of COVID datasets. | epidemiology, healthcare resources, social sciences | |
iSearch COVID-19 Portfolio | Comprehensive, expert-curated portfolio of COVID-19 publications and preprints that includes peer-reviewed articles from PubMed and preprints from medRxiv, bioRxiv, ChemRxiv, and arXiv. | literature | Yes |
LitCovid | NLM curated literature hub for COVID-19 | literature | Yes |
Modeling Infectious Disease Agents Study (MIDAS) online portal for COVID-19 | NIGMS-funded modeling research. Public-access data collections with documented metadata. | case studies, dashboards and visualization tools | Yes |
National COVID Cohort Collaborative (N3C) | The National COVID Cohort Collaborative (N3C) is one of the largest repositories of longitudinal, deidentified, linked real-world COVID-19 data (clinical, CMS, mortality, and viral variants), hosted in a secure, open-science collaborative analytics environment. More information about N3C and how to get access can be found at https://covid.cd2h.org/onboarding. | chemical structure data, dashboards and visualization tools, healthcare resources | Yes |
NCATS OpenData \| COVID-19 | NCATS is generating a collection of datasets by screening a panel of SARS-CoV-2-related assays against all approved drugs. These datasets, as well as the assay protocols used to generate them, are being made immediately available to the scientific community on this site as these screens are completed. | bioactivity, chemical structure data, dashboards and visualization tools | Yes |
NCBI Virus: SARS-CoV-2 data hub | SARS-CoV-2 focused content from NCBI Virus, including links to related resources. Search, filter, and download the most up-to-date nucleotide and protein sequences from GenBank and RefSeq (taxid 2697049). Generate multiple sequence alignments and phylogenetic trees for sequences of interest. Provides one-click access to the Betacoronavirus BLAST database and relevant literature in PubMed. | genomics | Yes |
Nextstrain COVID-19 genetic epidemiology | Open-source SARS-CoV-2 genome data and analytic and visualization tools | genomics | |
OpenICPSR's COVID-19 Data Repository | The Inter-university Consortium for Political and Social Research (ICPSR) has launched a new repository of data examining the impact of the novel coronavirus global pandemic. This repository is a free, self-publishing option for researchers to share COVID-19 related data. | social sciences | |
outbreak.info | A resource to aggregate data critical to scientific research during outbreaks of emerging diseases, such as COVID-19 | epidemiology | Yes |
PubChem | Small molecule compounds, bioactivity data, biological targets, bioassays, chemical substances, patents, and pathways | bioactivity | Yes |
PubMed Central (PMC) COVID-19 Initiative | On March 13, national science and technology advisors from a dozen countries, including the United States, called on publishers to voluntarily agree to make their COVID-19 and coronavirus-related publications, and the available data supporting them, immediately accessible in PMC and other appropriate public repositories to support the ongoing public health emergency response efforts. The articles added to PMC are distributed through the PMC Open Access Subset and are made available in CORD-19. | literature | Yes |
RCSB Protein Data Bank COVID-19/SARS-CoV-2 Resources | The RCSB Protein Data Bank is offering access to COVID-19 related PDB structures for research and related images and videos for education. | chemical structure data | Yes |
Reactome | Reactome is a free, open-source, curated and peer-reviewed pathway database. The goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education. In response to the COVID-19 pandemic, Reactome is fast-tracking the annotation of human coronavirus infection pathways. | dashboards and visualization tools, genomics | Yes |
SARS-CoV-2 Related Structures | A database of carefully validated SARS-CoV-2 protein structures, including many structural models which have been re-refined or re-processed. The resource is being updated weekly by the Minor Lab at the University of Virginia as new SARS-CoV-2 structures are deposited to the Protein Data Bank. | chemical structure data | Yes |
Sequence Read Archive (SRA) | Provides rapid, open, and unrestricted access to virus nucleotide or metagenomic sequence data and is the repository being recommended by NIAID and CDC for investigator and public health submissions. Due to the scale of data indexing, there may be a delay before new submissions are indexed and retrievable with a term-based query. | genomics | Yes |
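
The European CDC entry in the table above describes a downloadable file in which each row holds the new cases reported for one day and one country. As a rough illustration of working with data in that shape, the sketch below sums daily counts per country using only the Python standard library. The column names (`date`, `country`, `new_cases`) and the sample rows are hypothetical placeholders, not the real ECDC header or real figures; check the actual file before adapting this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: one row per day and per country.
# Column names and numbers are made up for illustration only.
SAMPLE = """date,country,new_cases
2020-05-11,Country A,10
2020-05-12,Country A,15
2020-05-11,Country B,7
2020-05-12,Country B,3
"""


def total_cases_by_country(csv_text):
    """Sum the daily new-case counts for each country."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["country"]] += int(row["new_cases"])
    return dict(totals)


totals = total_cases_by_country(SAMPLE)
for country, cases in sorted(totals.items()):
    print(country, cases)
```

To run the same function on a downloaded file, read the file's text and pass it in, after adjusting the two column names to match the file's header.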
Computational Resources to Address COVID-19
See Data Resources
See Supporting Resources
Resource | Resource Description | NIH Funded |
---|---|---|
Atrio | The Atrio software platform offers easy access to large numbers of freely available, high-performing GPU and CPU resources. Contact support for help creating portable application containers that are performance optimized for these powerful systems. | |
Betacoronavirus BLAST | BLAST database containing sequences from Betacoronavirus (taxid 694002), including the latest SARS-CoV-2 sequences in GenBank and RefSeq. | Yes |
Cloud resources for COVID-19 research | Freely available high-performance computing resources immediately available for COVID-19 research. Provided by Rescale, Google Cloud, and Microsoft Azure. | |
The COVID-19 High Performance Computing (HPC) Consortium | Computing infrastructure: XSEDE provides the portal. Computing resources, updated regularly, include those of DOE National Laboratories, IBM, NSF, NASA, tech companies, and academic computing centers. | |
Supporting Resources
See Data Resources
See Computational Resources
Resource | Resource Description | NIH Funded |
---|---|---|
Data-Against-COVID Team | A group of more than 600 volunteer data scientists, machine learning experts, bioinformaticians, and professional software developers who have joined together to offer their expertise for any data analysis problems that arise in the context of the ongoing coronavirus pandemic. | |
GenBank/SRA SARS-CoV-2 Sequence Submissions | Quickly and easily submit assembled and unassembled SARS-CoV-2 data with help from NCBI if needed. | Yes |
NASEM Standing Committee on Emerging Infectious Diseases and 21st Century Health Threats | This National Academies of Sciences, Engineering, and Medicine (NASEM) standing committee provides rapid expert consultation on data elements and systems design for modeling and decision making for the COVID-19 pandemic. | |
NIAID Overview of Coronaviruses | Information about coronaviruses, including COVID-19, and resources for researchers | Yes |
Research Data Alliance Working Group | Guidelines for data deposition in any common data hub or platform to facilitate data sharing in public health emergencies for scientific research | |
Schema.org | Schema.org 7.0 includes fast-tracked new vocabulary to assist the global response to the Coronavirus outbreak. Schema.org creates, maintains, and promotes schemas for structured data. | |
UC Health clinical data warehouse | Data warehouse using the Observational Medical Outcomes Partnership standard to integrate patient data across University of California health systems | |
Viral Annotation DefineR (VADR) Sequence Annotation Tool | NCBI developed a system called Viral Annotation DefineR (VADR) that validates and annotates viral sequences, including SARS-CoV-2. | Yes |
Virus Outbreak Data Network (VODAN) | CODATA, RDA, WDS, and GO FAIR have created a Virus Outbreak Data Network (VODAN) to make SARS-CoV-2 virus data FAIR (findable, accessible, interoperable, and reusable) by both humans and machines. | |
Virus Pathogen Resource (ViPR) | ViPR is an NIAID-funded resource that supports research on viral pathogens in the NIAID Category A-C Priority Pathogen lists and those causing (re)emerging infectious diseases. It provides a dedicated gateway to SARS-CoV-2 data that integrates data from external sources (GenBank, UniProt, Immune Epitope Database, Protein Data Bank), direct submissions, analysis pipelines, and expert curation, and provides a suite of bioinformatics analysis and visualization tools for virology research. | Yes |
Webinar on Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories | Generalist repositories are supporting the discoverability and reusability of COVID-19 data and associated code in different ways. See the presentations and resources from the webinar held April 24. | |
This page last reviewed on May 13, 2020