Wednesday, January 20, 2021
Office of Data Science Strategy: 2020 Year in Review
The National Institutes of Health (NIH) Office of Data Science Strategy (ODSS) began 2020 with plans to announce the Data and Technology Advancement (DATA) National Service Scholars program and new funding opportunities for software development and support for data repositories and knowledgebases. Along with accomplishing these goals, the office added six new teammates and celebrated Dr. Susan Gregurick’s one-year anniversary as Associate Director for Data Science with a special blog post featuring NIH women in technology and data science. The office was also pleased to see all Sequence Read Archive data move to the cloud and the Final NIH Policy for Data Management and Sharing released in 2020.
Alongside these accomplishments and the highlights that follow, ODSS quickly pivoted in the spring to support NIH’s efforts to combat the COVID-19 pandemic. Under Gregurick’s leadership, a team of more than 100 dedicated NIH employees has spent much of 2020 rapidly building the infrastructure needed for COVID-19 data hubs. ODSS created a page of open-access COVID-19 data and computational resources and, in April, co-hosted a webinar with the National Library of Medicine (NLM) that offered tips for researchers to share, discover, and cite COVID-19 data and code through generalist repositories to accelerate discovery. In July, ODSS and NLM co-hosted a virtual workshop on Jumpstarting COVID-19 Clinical Data Access.
Read below for a recap of some of our notable achievements in 2020 and a look ahead at what to expect in 2021.
- ODSS Completes One-Year Pilot of NIH Figshare Instance
- Researcher Auth Service Integrates Single Sign-on for Researchers
- NIH Awards 28 Supplements to Advance Software Tools for Open Science
- ODSS Launches Data and Technology Advancement (DATA) National Service Scholars Program
- Interagency Smart and Connected Health Program Expands to Include Artificial Intelligence, Advanced Data Science
- Two Cohorts of Coding it Forward Civic Digital Fellows Take on New Challenges
- NIH Hosts Workshops on the Role of Generalist Repositories, Data Metrics
- NIH Cloud Platform Interoperability Effort Marks a Year of Progress
- NIH STRIDES Initiative Highlights Benefits of Cloud with Success Stories
- Data Science Activities in 2021
ODSS Completes One-Year Pilot of NIH Figshare Instance
In July 2019, NIH’s Office of Data Science Strategy (ODSS) established the NIH Figshare instance, a one-year pilot with the existing generalist repository Figshare, to determine how biomedical researchers might use a generalist repository to share and reuse NIH-funded data.
NIH’s overarching goal is to support a more seamless repository ecosystem to ensure that data and other digital objects resulting from NIH research can be stored and shared with the research community. While NIH encourages the use of domain-specific or institutional repositories where available, not all datasets have a logical home in one of these repositories. This pilot allowed ODSS to test the need for and utility of a generalist repository to fill these gaps in the biomedical data repository landscape.
Over the course of the one-year pilot, NIH assessed how well the NIH Figshare instance met researchers’ needs and what impact it had on data sharing and discovery.
Researcher Auth Service Integrates Single Sign-on for Researchers
In 2020, NIH launched the first two phases of the Researcher Auth Service (RAS) initiative. RAS is designed to make it easier to find and work with data by enabling single sign-on for researchers working in the NIH data ecosystem. With this early implementation of RAS, NIH staff and extramural researchers can log in to NIH data platforms using their NIH or eRA Commons credentials. Authentication and authorization tokens move with researchers as they navigate among the eight participating platforms, eliminating redundant logins while centrally enforcing policies so that users can access only the data they are authorized to view.
In 2021, the RAS initiative will add new log-in options for extramural researchers, support multifactor authentication, and make more NIH data platforms accessible to researchers.
NIH Awards 28 Supplements to Advance Software Tools for Open Science
In 2020, ODSS and 24 NIH institutes and centers issued a notice of administrative supplements to enhance software tool development for open science (NOT-OD-20-073). Twenty-eight awards were made in fall 2020 to principal investigators at 26 institutions across the United States.
These supplements invest in research software tools that have recognized value in a scientific community, aiming to enhance their impact by applying best practices in software development and advances in cloud computing. They are intended to support collaborations between biomedical scientists and software engineers to improve the design, implementation, and “cloud-readiness” of research software.
ODSS Launches Data and Technology Advancement (DATA) National Service Scholars Program
The DATA Scholars program launched in 2020 to bring expert data scientists, computer scientists, and engineers to NIH to tackle challenging biomedical data problems with the potential for substantial public health impact. Seven Scholars are now on board, working to:
- Unravel the Alzheimer’s disease genome.
- Support cancer knowledge extraction.
- Accelerate the clinical adoption of machine intelligence applications in medical imaging.
- Harness data science for health discovery and innovation in Africa.
- Catalyze mental health and substance abuse research.
- Expand theories of brain circuits.
- Integrate NIH cloud-based platforms for genomics research.
ODSS will begin recruitment for a second cohort of Scholars in February 2021.
Interagency Smart and Connected Health Program Expands to Include Artificial Intelligence, Advanced Data Science
In 2020, NIH announced a new opportunity in artificial intelligence (AI) and advanced data science through Smart Health and Biomedical Research, an interagency program with the National Science Foundation (NOT-OD-21-011). Twenty-two of the 27 NIH institutes and centers have signed on to this expanded initiative, which supports innovative, high-risk, high-reward research with the promise of disruptive transformations in biomedical research.
The first proposal deadline for the new opportunity is Feb. 16, 2021.
Two Cohorts of Coding it Forward Civic Digital Fellows Take on New Challenges
ODSS hosted two cohorts of Coding it Forward fellows across NIH in 2020. These undergraduate and master’s-level fellows spend 10 weeks applying their computational expertise to hands-on, biomedical data-related challenges. Although the fellowship is traditionally held in person, this year’s fellows had the unique distinction of working remotely with their NIH teams. A cohort of 16 students completed a 10-week virtual fellowship in the summer, and 24 students took part in a first-ever fall fellowship, a silver lining among the consequences of COVID-19.
NIH Hosts Workshops on the Role of Generalist Repositories, Data Metrics
The NIH Workshop on the Role of Generalist Repositories to Enhance Data Discoverability and Reuse, held Feb. 11-12, 2020, drew more than 750 in-person and videocast attendees to explore the roles of generalist and institutional data repositories in the biomedical data repository landscape. The workshop highlighted the breadth and depth of these repositories’ activities in supporting biomedical researchers and their data. It had five key goals and supported NIH’s ongoing efforts to provide researchers with appropriate solutions for making their data findable, accessible, interoperable, and reusable (FAIR).
ODSS hosted a virtual workshop on assessing the value and reach of datasets and data resources on Feb. 19, 2020. The goal was to discuss core metrics, use cases, and best practices for better understanding data usage and impact. The workshop focused on two types of data resources, repositories and knowledgebases, and brought together managers of diverse biomedical data resources to discuss community-supported best practices for data metrics.
As a follow-on activity, NIH is seeking input from data resource funders and managers through a short survey on metrics for biomedical data resources.
NIH Cloud Platform Interoperability Effort Marks a Year of Progress
In 2020, the NIH Cloud Platform Interoperability (NCPI) effort marked a year of progress toward creating a federated data ecosystem that improves researchers’ access to data. When researchers obtain data from one platform, there is no guarantee that the data will be readily usable alongside data from another. By focusing on interoperability, the NCPI effort is helping researchers find and integrate data more easily across the participating platforms: the Cancer Research Data Commons, the Kids First Data Resource Center, BioData Catalyst, and AnVIL.
NIH STRIDES Initiative Highlights Benefits of Cloud with Success Stories
The NIH STRIDES Initiative continued to provide researchers with a pathway to industry-leading cloud services and tools. In 2020, we shared how the University of Michigan expanded access to TOPMed genomics data through cloud services and how the University of Wisconsin used the cloud to boost collaboration with its TOPMed program.
Data Science Activities in 2021
ODSS expects 2021 to be a year of continued growth, with greater emphasis on artificial intelligence (AI) and the Fast Healthcare Interoperability Resources® (FHIR®) standard for research. Here is a quick look at expected data science highlights for 2021:
- A new data science seminar series highlighting data sharing and reuse.
- Workshops to engage the community and our stakeholders on advancing the use of FHIR in research and strategies for FAIR (findable, accessible, interoperable, reusable) data sharing.
- New initiatives in workforce development for AI-ready data and in addressing AI challenges related to ethics and bias.
- Added emphasis on data science training and reaching diverse audiences.
- New efforts to increase coopetition among data repositories and programs to promote FAIR data and TRUSTworthy repositories.
- Expanded interagency programs to support biomedical software development and AI.
- Steady progress on infrastructure improvements to make it easier for researchers to access NIH data.
- New COVID-19 data activities focused on sequelae data, and more.