Data Science Community News

SYMPOSIUM AND WEBCAST: Principles for Data-Driven Decision Making

September 7, 2017

The abundance of large and complex data, coupled with powerful modeling techniques and analytic methods, creates tremendous opportunity for organizations and individuals to base their decisions on empirical evidence. However, to appreciate both the capabilities and limitations of these data and tools, decision makers need some understanding of data science principles. The National Academies of Sciences, Engineering, and Medicine invite you to attend our upcoming symposium and webcast on data-driven decision making that will take place on September 14, 2017 from 9:00am-5:00pm at the Keck Center in Washington, DC. The event will highlight simple principles that can support data-driven decision making and help decision makers learn the right questions to ask when presented with new analyses.

Register here to attend in person or online.

About Math and Statistics at the National Academies

The Board on Mathematical Sciences and Analytics (BMSA) leads activities in the mathematical sciences at the National Academies in topic areas including applied mathematics, scientific computing, and risk analysis.

The Committee on Applied and Theoretical Statistics (CATS) organizes studies and events focusing on the statistical sciences, big data and data science, statistical education, the use of statistics, and issues affecting the field. CATS occupies a pivotal position in the statistical community, providing expertise in methodology and policy formation.

NIH Data Science Week 2017

September 7, 2017

NIH Data Science Week is a bi-annual series of talks and workshops focused on data science, hosted by the Data Science and Bioinformatics Scientific Interest Groups.

Monday, September 18th

9 am - 12 pm EDirect workshop at NLM

NCBI staff will offer a workshop on EDirect, NCBI’s suite of programs for easy command line access to literature and biomolecular records. To join the workshop, please register.

1 pm - 3:30 pm Containerization Workshops and Roundtable -- Natcher Balcony A

Details: 1:30 - 2:10 Docker presentation; 2:10 - 2:50 Singularity presentation; 2:50 - 3:30 containerization, HPC, and cloud roundtable.

Tuesday, September 19th

11 am - 12 pm Speaker: Sarah Pendergrass from Geisinger -- NLM Visitor Center, Bldg 38A (Lister Hill Center) Lobby

From Learning Health Care to Genetic Research: Precision Medicine In Action at Geisinger Health System

Learning Health Care is now becoming a reality within Geisinger Health System. The MyCode Community Health Initiative of Geisinger Health System has whole exome sequencing data and whole genome array genotyping for more than 90,000 individuals to date, and is continuing to expand. Geisinger provides primary and specialty care across the life span, and the biorepository of genetic data is linked to de-identified longitudinal health records. With the breadth of data being collected, Geisinger is returning genetic results to patients and engaging in a variety of research to bring additional clinical and genetic findings back to the clinic. This talk will cover return of results at Geisinger and new research within the Pendergrass Lab.

*** There are still a few spots available to speak directly with Sarah.  If interested, please email

Thursday, September 21st

11 am - 12 pm Speaker: Jake Lever from UBC -- NLM Visitor Center

PubRunner: Keeping text mining up-to-date with the latest publications

Biologists face a daunting challenge when trying to read all relevant scientific literature for their field. Text mining tools are designed to assist them by aiding search, summarizing the latest research, and identifying important patterns in the literature. However, many published tools lie dormant, as code is not public and any results shared become out-of-date as new publications enter the field. Through the NCBI hackathons initiative, we have built PubRunner, a framework for managing download of the latest publications, execution of text mining tools, and sharing of the results. This effort aims to help research groups keep text mining tools alive and make text mining results even more valuable to the biology community.

3 pm - 4 pm Speaker: Imran Haque from Freenome -- NLM Lindberg Room, Bldg 38

Embracing heterogeneity: statistical limitations and opportunities in early detection liquid biopsies

The discovery of tumor-derived circulating cell-free DNA (ctDNA) in cancer patients has ignited interest and investment in developing blood-based assays to detect cancer at early, treatable stages. The existence of many analytical methods (dPCR, BEAMing, UMI-tagged high-depth NGS) to detect mutated tumor-derived material combined with increasing knowledge of the characteristics of tumor genomes has driven an empirical approach of “more is better” to translate assays developed on late-stage cancer patients to the early detection setting. However, there exists a lack of data and analysis on the feasibility of such a translation.

In this presentation, Imran will analyze fundamental statistical challenges in liquid biopsy, including benign somatic heterogeneity and quantitative limitations in the analysis of patient samples. He will further demonstrate that these limitations arise from upstream statistical assumptions about the nature of the problem, and that relaxing these assumptions admits potential solutions of a different flavor: using modern machine learning to integrate both prior data and multi-analyte analysis on individual samples to address the fundamental challenges of liquid biopsy.

*** There are still spots available to speak directly with Imran.  If interested, please email

Upon approval by presenters, materials or links will be available at

Top Stories

Pi Day at NIH 3.1417

NIH invites you to celebrate the intersection between the mathematical and biomedical sciences with a day of events and activities.