Data Science Community News
SYMPOSIUM AND WEBCAST: Principles for Data-Driven Decision Making
The abundance of large and complex data, coupled with powerful modeling techniques and analytic methods, creates tremendous opportunity for organizations and individuals to base their decisions on empirical evidence. However, to appreciate both the capabilities and limitations of these data and tools, decision makers need some understanding of data science principles. The National Academies of Sciences, Engineering, and Medicine invite you to attend our upcoming symposium and webcast on data-driven decision making that will take place on September 14, 2017 from 9:00am-5:00pm at the Keck Center in Washington, DC. The event will highlight simple principles that can support data-driven decision making and help decision makers learn the right questions to ask when presented with new analyses.
Register here to attend in person or online.
About Math and Statistics at the National Academies
The Board on Mathematical Sciences and Analytics (BMSA) leads activities in the mathematical sciences at the National Academies in topic areas including applied mathematics, scientific computing, and risk analysis.
The Committee on Applied and Theoretical Statistics (CATS) organizes studies and events focusing on the statistical sciences, big data and data science, statistical education, the use of statistics, and issues affecting the field. CATS occupies a pivotal position in the statistical community, providing expertise in methodology and policy formation.
NIH Data Science Week 2017
The NIH Data Science Week is a bi-annual series of talks and workshops focused on data science, hosted by the Data Science and Bioinformatics Scientific Interest Groups.
Monday, September 18th
9 am - 12 pm EDirect workshop at NLM
NCBI staff will offer a workshop on EDirect, NCBI’s suite of programs for easy command line access to literature and biomolecular records. To join the workshop, please register.
1 pm - 3:30 pm Containerization Workshops and Roundtable -- Natcher Balcony A
Details: 1:30 - 2:10 Docker presentation; 2:10 - 2:50 Singularity presentation; 2:50 - 3:30 containerization, HPC, and cloud roundtable.
Tuesday, September 19th
11 am - 12 pm Speaker: Sarah Pendergrass from Geisinger -- NLM Visitor Center, Bldg 38A (Lister Hill Center) Lobby
From Learning Health Care to Genetic Research: Precision Medicine In Action at Geisinger Health System
Learning Health Care is now becoming a reality within Geisinger Health System. The MyCode Community Health Initiative of Geisinger Health System has whole exome sequencing data and whole genome array genotyping for more than 90,000 individuals to date, and is continuing to expand. Geisinger provides primary and specialty care across the life span, and the biorepository of genetic data is linked to de-identified longitudinal health records. With the breadth of data being collected, Geisinger is returning genetic results to patients and engaging in a variety of research to bring additional clinical and genetic findings back to the clinic. This talk will cover return of results at Geisinger, and new research within the Pendergrass Lab.
*** There are still a few spots available to speak directly with Sarah. If interested, please email firstname.lastname@example.org
Thursday, September 21st
11 am - 12 pm Speaker: Jake Lever from UBC -- NLM Visitor Center
PubRunner: Keeping text mining up-to-date with the latest publications
Biologists face a daunting challenge when trying to read all relevant scientific literature for their field. Text mining tools are designed to assist them by aiding search, summarizing the latest research and identifying important patterns in the literature. However, many published tools lie dormant, as code is not public and any results shared become out-of-date as new publications enter the field. Through the NCBI hackathons initiative, we have built PubRunner, a framework for managing download of the latest publications, execution of text mining tools, and sharing of the results. This effort aims to help research groups keep text mining tools alive and make text mining results even more valuable to the biology community.
3 pm - 4 pm Speaker: Imran Haque from Freenome -- NLM Lindberg Room, Bldg 38
Embracing heterogeneity: statistical limitations and opportunities in early detection liquid biopsies
The discovery of tumor-derived circulating cell-free DNA (ctDNA) in cancer patients has ignited interest and investment in developing blood-based assays to detect cancer at early, treatable stages. The existence of many analytical methods (dPCR, BEAMing, UMI-tagged high-depth NGS) to detect mutated tumor-derived material combined with increasing knowledge of the characteristics of tumor genomes has driven an empirical approach of “more is better” to translate assays developed on late-stage cancer patients to the early detection setting. However, there exists a lack of data and analysis on the feasibility of such a translation.
In this presentation, Imran will analyze fundamental statistical challenges in liquid biopsy, including benign somatic heterogeneity, and quantitative limitations in the analysis of patient samples. He will further demonstrate that these limitations arise from upstream statistical assumptions about the nature of the problem, and that relaxing these assumptions admits potential solutions of a different flavor: making use of modern machine learning to integrate both prior data and multi-analyte analysis on individual samples to address the fundamental challenges of liquid biopsy.
*** There are still spots available to speak directly with Imran. If interested, please email email@example.com
Upon approval by presenters, materials or links will be available at https://www.slideshare.net/DataScienceNIH/
Lessons Learned from Funding the International Open Science Prize
Major funding bodies reflect on developing and implementing the Open Science Prize, a novel approach for funding international open science, in an essay publishing August 1 in the open access journal PLOS Biology. The essay by Elizabeth Kittrie of The National Institutes of Health, Philip Bourne of the University of Virginia, and colleagues from the Wellcome Trust, in partnership with the Howard Hughes Medical Institute, provides a series of reflections, addressing topics such as partnership development and sustainability, and the challenges of multiple funders pursuing joint global health technology initiatives.
The Open Science Prize (launched in October 2015) was a global competition designed to encourage innovative solutions in public health and biomedicine using open digital content. Prize competitions have received increased attention within the U.S. federal government with the passage of the America COMPETES Reauthorization Act of 2010. The PLOS Biology essay points to the importance of aligning policies, procedures and regulations of the various funding agencies when engaging in joint prize competitions. The collaboration for this competition led to the inclusion of international participants, a larger purse for winners, and a shared responsibility for the costs of running the challenge.
The grand prize winner, “Real-time Evolutionary Tracking for Pathogen Surveillance and Epidemiological Investigation,” created the nextstrain.org prototype, which uses real-time visualization and viral genome data to track the spread of global pathogens such as Zika and Ebola. Prototypes developed by the six finalists can be accessed here: https://www.openscienceprize.org/
“The Open Science Prize model accelerates team science and exemplifies the force multiplier effect that can occur when funding agencies join forces around a common goal,” said Dr. Patti Flatley Brennan, NIH Interim Associate Director for Data Science, and director, National Library of Medicine. “At times of declining budgets, leveraging resources through partnerships can be a key strategy for promoting innovation.”
Citation: Kittrie E, Atienza AA, Kiley R, Carr D, MacFarlane A, Pai V, et al. (2017) Developing international open science collaborations: Funder reflections on the Open Science Prize. PLoS Biol 15(8): e2002617. https://doi.org/10.1371/journal.pbio.2002617
PLOS Biology is an open-access, peer-reviewed journal published by PLOS, featuring research articles of exceptional significance, originality, and relevance in all areas of biology. For more information visit http://journals.plos.org/plosbiology/, or follow @PLOSBiology on Twitter.
NIH Pi Day Celebration: New Date, New Location!
The National Institutes of Health will hold its third annual Pi Day Celebration on the NIH Main Campus on Pi Day 2.0, Thursday, May 18, 2017. As you may recall, the original Pi Day festivities, on 3.14, were postponed due to inclement weather. The goal of the NIH Pi Day Celebration is to increase awareness across the biomedical science community of the role that the quantitative sciences play in biomedical science.
Pi Day @ NIH will feature the following activities:
- 10:00 AM - 11:00 AM: Data Center Tours, Building 12A, Room 1100 (REGISTRATION REQUIRED)
- 11:00 AM - 12:00 PM: PiCo Lightning Talks by NIH staff, Masur Auditorium, Clinical Center (Building 10), first floor
VIEW VIDEOCAST AT: https://videocast.nih.gov/summary.asp?live=23246&bhcp=1
- 12:00 PM - 1:00 PM: Poster/Demo Session and Networking, FAES Terrace, Clinical Center (Building 10), first floor
- 1:00 PM - 2:00 PM: NIH Data Science Distinguished Seminar Series, Lecture by Simons Professor of Mathematics at MIT, Dr. Bonnie Berger, “The Mathematics of Biomedical Data Science,” Masur Auditorium, Clinical Center (Building 10), first floor
VIEW VIDEOCAST AT: https://videocast.nih.gov/summary.asp?live=23249&bhcp=1/
- 2:30 PM - 4:30 PM: Research Reproducibility Workshop, NIH Library Training Room, Clinical Center (Building 10), first floor, near the South Entrance (REGISTRATION REQUIRED)
NIH campus map: https://www.ors.od.nih.gov/maps/Pages/vis_map.aspx
For more information about the day's events, visit the NIH Pi Day website: http://nihpiday.nih.gov/.
Pi Day is celebrated on March 14th (3/14) around the world and, under normal circumstances, at NIH! The Greek letter Pi is the symbol used in mathematics to represent a constant—the ratio of the circumference of a circle to its diameter—which is approximately 3.14159.
Pi has been calculated to over one trillion digits beyond its decimal point. As an irrational and transcendental number, it will continue infinitely without repetition or pattern. While only a handful of digits are needed for typical calculations, Pi’s infinite nature makes it a fun challenge to memorize, and to computationally calculate more and more digits.
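For readers who would like to try computing digits themselves, here is a minimal sketch in Python. It is only an illustration and is not connected to any NIH Pi Day material: it assumes Machin's formula, pi = 16 arctan(1/5) - 4 arctan(1/239), evaluated with Python's arbitrary-precision integers, and the function names arctan_inverse and pi_digits are made up for this example. Record-setting computations rely on much faster series and specialized software.

```python
def arctan_inverse(x: int, scale: int) -> int:
    """Approximate arctan(1/x) * scale with the alternating Taylor series,
    using only integer arithmetic."""
    power = scale // x          # scale / x^(2k+1), starting at k = 0
    total = power
    x_squared = x * x
    k = 1
    while power:
        power //= x_squared
        term = power // (2 * k + 1)
        # Odd k terms are subtracted, even k terms are added.
        total += -term if k % 2 else term
        k += 1
    return total


def pi_digits(n: int) -> str:
    """Return pi truncated to n digits after the decimal point (n >= 1)."""
    guard = 10                  # extra digits that absorb truncation error
    scale = 10 ** (n + guard)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 16 * arctan_inverse(5, scale) - 4 * arctan_inverse(239, scale)
    digits = str(pi_scaled // 10 ** guard)
    return digits[0] + "." + digits[1:]


print(pi_digits(20))  # prints 3.14159265358979323846
```

The ten guard digits are a simple safety margin chosen for this sketch; each integer division discards a fraction, and the margin keeps those small errors from reaching the digits that are returned.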
NIH Pi Day is a joint effort of multiple ICs, including CIT, NCI, NHGRI, and NLM, and the NIH Office of the Director, including the NIH Library and the Office of Intramural Research. Additional support is provided by the Foundation for Advanced Education in the Sciences (FAES) and the NIH Bioinformatics Special Interest Group.
For all events, sign language interpreters can be provided. Individuals with disabilities who need reasonable accommodation to participate in this event should contact Jacqueline Roberts, Jacqueline.Roberts@nih.gov, 301-594-6747, or the Federal Relay, 800-877-8339.
Open Science Prize announces nextstrain.org as Grand Prize Winner
Congratulations to the nextstrain.org development team led by Trevor Bedford, PhD, of the Fred Hutchinson Cancer Research Center, Seattle, and Richard Neher, PhD, of Biozentrum at the University of Basel, Switzerland, winners of the grand prize of $230,000. Also participating were students from the laboratories of the team leaders; the University of Washington, Seattle; and the University of Auckland in New Zealand.
Read the official NIH press release.
A prototype online platform that uses real-time visualization and viral genome data to track the spread of global pathogens such as Zika and Ebola is the grand prize winner of the Open Science Prize. The international team competition is an initiative by the National Institutes of Health, in collaboration with the Wellcome Trust and the Howard Hughes Medical Institute (HHMI). The winning team, Real-time Evolutionary Tracking for Pathogen Surveillance and Epidemiological Investigation, created its nextstrain.org prototype to pool data from researchers across the globe, perform rapid phylogenetic analysis, and post the results on the platform’s website.
Genome sequences of viral pathogens provide hugely valuable insight into the spread of an epidemic, but to be useful, samples have to be collected and analyzed, and the results disseminated, in near real-time. The statistical analyses behind nextstrain.org can be conducted in minutes and can reveal patterns of geographic spread, establish the timing of introduction events, and connect cases to aid contact tracing efforts. The phylogenetic analyses are posted on the website as interactive and easy to understand visualizations. The team hopes that the platform will be of great use to researchers, public health officials and the public who want a snapshot of an epidemic.
Nextstrain.org placed first out of three top finalists, selected from a pool of 96 multinational, interdisciplinary teams including 450 innovators from 45 countries. This award is the culmination of a year-long process which included development and demonstration of working prototypes and multiple stages of rigorous review by panels of expert Open Science advisors and judges from the Wellcome Trust and NIH. All stages of the competition emphasized open science in both form and process, including public input for the award gathered via a global public voting portal. During the public voting phase, which narrowed the six finalists to three top contenders, nearly 4,000 online votes were cast by members of the public from a total of 76 countries on all six inhabited continents.
The Open Science Prize is a global competition designed to foster innovative solutions in public health and biomedicine using open digital content. As increasing amounts of data are produced by scientists around the world and made openly available through publicly-accessible repositories, a major obstacle to making full use of this health information will be the lack of tools, platforms, and services that enable the sharing and synthesizing of disparate data sources. Development in this area is essential to turning diverse types of health data into usable and actionable knowledge.
The prize, which was launched in October 2015, aims to forge new international collaborations that bring together open science innovators to develop services and tools of benefit to the global research community. All six finalist teams were considered exemplary by the funders and are to be commended for their tenacity in developing creative approaches to applying publicly-accessible data to solve complex biomedical and public health challenges. The topics spanned the breadth of biomedical and public health challenges, ranging from understanding the genetic basis of rare diseases and mapping the human brain to enhancing the sharing of clinical trial information. As evidenced by the six Open Science Prize finalists, public health and biomedical solutions are enriched when data are combined from geographically diverse sources. Final prototypes developed by the six finalists can be accessed on the Open Science Prize website.
NLM Director Dr. Patricia Flatley Brennan Appointed NIH Interim Associate Director for Data Science
ON JANUARY 6, 2017, the National Institutes of Health announced that National Library of Medicine Director Patricia Flatley Brennan, RN, PhD will assume an additional role as NIH Interim Associate Director for Data Science.
The NIH Associate Director for Data Science (ADDS) and team provide input to the overall NIH vision and actions undertaken by each of the 27 Institutes and Centers in support of biomedical research as a digital enterprise. Among other duties, the office oversees the Big Data to Knowledge (BD2K) initiative, stimulating the best developments in the data science community.
This year will see the transition of trans-NIH data science initiatives to NLM, with the operational oversight of the BD2K initiatives being housed within the Common Fund programs in the Division of Program Coordination, Planning and Strategic Initiatives. This change builds on the recommendations of the NLM Working Group Report to the NIH Director and takes concrete steps towards the vision of NLM’s future proclaimed in the Advisory Committee to the NIH Director’s report: that the National Library of Medicine become the “epicenter of data science for the NIH.”
“I believe the future of health and health care rests on data—genomic data, environmental sensor-generated data, electronic health records data, patient-generated data, research collected data,” Dr. Brennan observed. “The data originating from research projects is becoming as important as the answers those research projects are providing.”
“NLM must play a key role in preserving data generated in the course of research, whether conducted by professional scientists or citizen scientists,” she continued. “We know how to purposefully create collections of information and organize them for viewing and use by the public. We can extend this skill set to the curation of research data. We also have the utilities in place to protect the data by making sure only those individuals with permission to access data can actually do so.”
“NLM is well positioned to add these new functions to its research portfolio,” the NLM Director observed. “In this new year and the years to follow, we welcome these exciting opportunities and challenges.”
Big Data to Knowledge Multi-Council Working Group - January 2017
Notice is hereby given of a meeting of the Big Data to Knowledge (BD2K) Multi-Council Working Group.
Name of Working Group: Big Data to Knowledge Multi-Council Working Group
Date: January 9, 2017 - Canceled
This portion of the meeting is open to the public and is being held by teleconference. This is a listen ONLY meeting. Please submit any questions or comments via email to the contact person listed below.
Join WebEx Meeting
Meeting number: 627 298 875
Meeting password: 1234
Open Session: 11:00am - 12:00pm ET
Discussion will review current Big Data to Knowledge (BD2K) activities and newly proposed BD2K initiatives.
- Roll Call and Introduction
- Update from the Associate Director for Data Science
- BD2K All Hands Meeting and Open Data Science Symposium Recap
Closed Session: 12:30pm - 3:00pm ET
Agenda: Discussion will focus on review of proposed FY17 Funding Plans for BD2K Funding Opportunity Announcements and Administrative Supplements.
Individuals who plan to attend and need special assistance, such as sign language interpretation or other reasonable accommodations, should notify Tonya Scott, email: Tonya.Scott@nih.gov, phone: 301-402-9817.
Federal Register Meeting Announcement:
National Institutes of Health, Office of the Director - Notice of Meeting
Public Voting Determines Three Finalists for the Open Science Prize
Public voting for the Open Science Prize is now closed. Thank you to everyone who voted. The 3 prototypes which scored highest and will therefore be going forward to the next stage of review are:
MyGene2: Accelerating Gene Discovery with Radically Open Data Sharing
Real-Time Evolutionary Tracking for Pathogen Surveillance and Epidemiological Investigation
We will now be collecting expert reviews of these three prototypes. We anticipate announcing the Grand Prize winner in early March 2017.
For additional information, contact: Elizabeth.Kittrie@nih.gov.
Need Cloud for Your Research? Calling All NIH Extramural Investigators
The NIH Big Data to Knowledge (BD2K) initiative has partnered with the CMS Alliance to Modernize Healthcare (CAMH), operated by MITRE, to launch and test a new funding paradigm that will provide NIH extramural researchers with access to cloud computing and storage capabilities. This funding model, called the Commons Credits Pilot, will provide extramural biomedical investigators with active NIH grants access to cloud-based environments to network, securely store, and share their work in the form of digital objects.
The first cycle for applications is open now through January 16, 2017.
Successful pilot applicants will receive dollar-denominated “credits” to obtain cloud-based computing and storage resources through an online market environment. Currently, the Commons Credits Pilot environment offers a variety of conformant cloud providers, including IBM, Seven Bridges, and resellers of Google and Amazon. This list will grow as more vendors become available. Investigators will have the flexibility to select their preferred cloud provider from the list and provide feedback to NIH on their experiences. The Commons Credits Pilot is not a grants program; it has shorter application requirements and review times, ensuring that the credits are dispensed rapidly to keep pace with novel research.
An active NIH extramural grant is required for participation in the Commons Credits Pilot. Successful applications will likely complement the current grant(s) to enable novel research that may not have been accomplished or funded through other outlets. NIH expects that requests will not typically exceed $50,000 in dollar-denominated credits.
To date, the NIH Commons Credits Pilot has been shared with researchers at various research institutes and conferences, including the BD2K All-Hands Meeting held November 29-30, 2016. NIH encourages active NIH grant holders to take advantage of this new funding mechanism and we hope that you’ll also share this opportunity with your respective institutes.
Interested researchers should register and apply now at: http://www.commons-credit-portal.org. The Commons Credits Pilot team has created a short instructional video describing the application process within the portal to facilitate participation.
Please share this very exciting announcement with your extramural research communities. For additional information, email the Commons Credits Pilot Team at: firstname.lastname@example.org.
Public Voting for the Open Science Prize is LIVE!
Help shape new directions in biomedical research by VOTING HERE.
Voting will be open December 1, 2016 through January 6, 2017 at 11:59pm PST.
In the spirit of Open Science, we invite you to help decide which of the prototypes competing for the Open Science Prize will be considered for the final grand prize. You will be asked to review 6 prototypes developed by the finalist teams and cast your vote for the most novel and impactful ones. The 3 prototypes receiving the highest number of public votes will advance to a final round of review by a panel of science experts and judges. A single, grand prize winner of $230,000 will be announced in March 2017.
In this competition, the teams were challenged to use open, publicly accessible data to improve human health. Each team produced prototypes that demonstrate how the power of Open Data can be harnessed to address a wide array of human health concerns through crowdsourcing or the development of innovative platforms on which to conduct computational modeling. Each team includes at least one U.S. and one international member with the goal of forging new collaborations with health and technology innovators from across the world, benefiting the global research community and the public in the process.
We invite you to watch the video demonstrations and test drive the prototypes before voting at: https://www.openscienceprize.org/. An archive of the NIH Open Data Science Symposium webcast is available here: http://www.tvworldwide.com/events/bd2k/161129/default.cfm?id=16845&type=flv&test=0&live=0, if you would like to watch the onstage prototype demonstrations or any other presentations from the Big Data to Knowledge (BD2K) All Hands Meeting (November 29-30) or Open Data Science Symposium (December 1).
The winning prototype will be selected by the National Institutes of Health and the Wellcome Trust and publicly announced in March 2017. For additional information, email: Elizabeth.Kittrie@nih.gov.
The Open Science Prize is a collaboration between the National Institutes of Health (Bethesda, MD, USA) and the Wellcome Trust (London, UK), with additional funding provided by the Howard Hughes Medical Institute (Chevy Chase, MD, USA). This opportunity is being funded in part by the NIH Big Data to Knowledge (BD2K) Initiative.
We appreciate your help with getting the word out to your stakeholder communities about this worldwide public voting opportunity. Thank you for voting and helping to support the Open Science Prize.