This week finally sees the release of the NIH position statement on the use of dbGaP data in the cloud. We’ve been keenly aware of the need to address this issue for some time; however, it’s essential that great care be taken when making changes to the processes for accessing and using sensitive, controlled-access data, such as the data held in dbGaP (database of Genotypes and Phenotypes). For this reason, it’s taken quite some time to think through all the issues and to develop our first steps towards supporting the secondary use of these data in the dynamic world of cloud computing.
The NIH Position Statement on the Use of Cloud Computing Services for the storage and analysis of controlled-access data subject to the NIH Genomic Data Sharing (GDS) policy and the Model Data Use Certification have been posted on the NIH GDS Policy website.
The NIH Best Practices for Controlled-Access Data Subject to the NIH GDS Policy, which includes guidelines for cloud computing, is now available on the dbGaP website.
The new guidelines will permit investigators to request to move dbGaP data associated with a specific dbGaP project to a cloud provider of their choice. dbGaP data can reside within the cloud and be used for analysis until the completion of the project, at which point the dbGaP data must be destroyed. The investigator and their associated institution, not the cloud provider, will assume responsibility for the security of the dbGaP data. As such, NIH has tried to provide as much information as possible for PIs, institutional signing officials and the IT staff who will be supporting these projects, to make sure they understand their responsibilities. We’ve also provided additional information about cloud computing best practices, with links to current best practices from common cloud providers such as Amazon Web Services (AWS) and Google, so that institutions and their staff can make the most informed decisions. We expect to see additional cloud providers added to this list and will include links to their best practices as they become available.
Biomedical big data is primarily digital, and as such has a natural partner in computing technologies for its collection, storage, management and analysis. Cloud computing has gained traction within many biomedical communities because of its ability to support vast quantities of data with a fast, scalable, on-demand and low-cost approach.
We expect to see cloud computing and data science technologies continue to evolve to support biomedical big data, but we also expect to see the NIH improve its understanding of these technologies, their impact on biomedical research and the policies that surround the use of these data.
Bringing about this change has taken a long and concerted effort among NIH OSP (Office of Science Policy), NCBI and the ADDS (Associate Director for Data Science) office.
This is one small step for NIH, one giant leap forward for the community, or maybe it’s the other way round (with apologies to Neil Armstrong).