Bringing Hope to Untreatable Rare Diseases through Data Science

Institute or Center: National Center for Advancing Translational Sciences (NCATS)

Project: Bringing Hope to Untreatable Rare Diseases through Data Science

Skills sought:

  • Proficiency in Python, Perl, Java, C, and/or C++
  • Experience in graph, NoSQL, and/or RDBMS databases, and semantic data standards (e.g., OWL, RDF)
  • Machine learning and/or deep learning (Bayesian inference, probabilistic programming, natural language processing, etc.)
  • Knowledge engineering and integration, data harmonization, and interoperability
  • Experience with Linux OS, cloud infrastructure (e.g., AWS, Azure, GCP), and R or MATLAB (preferred)

About the position: NCATS seeks a data scientist to play an integral role in evaluating thousands of rare diseases and uncovering new knowledge in a largely unexplored area. The Scholar’s work will advance the research understanding of more than 7,000 rare diseases for which existing data is ready to be integrated and used.

The Scholar will have support from expert NCATS staff who will help translate the Scholar’s work into actionable and scalable solutions. This work will support NCATS’ mission of bringing more treatments to more patients more quickly.

About the work: The Scholar will:

  • Identify rare diseases that may be candidates for repurposing existing approved drugs.
  • Explore publicly available data resources in the rare disease research landscape.
  • Create models that extract knowledge, harmonize data, and validate evidence in order to identify meaningful research targets among heterogeneous groups of diseases whose study is limited by small sample sizes.
  • Disseminate findings by presenting at conferences, submitting manuscripts for publication, and building a GitHub site that captures code to aid in replicating results.

Datasets involved: The DATA Scholar will take advantage of existing, publicly available disease ontologies and research and health data that is currently captured in a graph data warehouse using Neo4j.

Why this project matters: Of the more than 7,000 rare diseases that affect humans, only a few hundred have any treatment. Although each rare disease affects fewer than 200,000 Americans, in total these illnesses affect an estimated 30 million people in the United States. Because rare diseases are often difficult to diagnose, it can take years to obtain an accurate diagnosis. Even after a proper diagnosis, treatment is often unavailable because only about 5% of rare diseases have a treatment approved by the FDA.

This project will bring hope to the millions affected by rare diseases by furthering NCATS’ understanding of these diseases and ultimately helping develop more treatments. A knowledge base and data models can support every stage of translational research, from basic research to therapeutic development.

Work Location: Rockville, MD

Work environment: NCATS values a team science-based approach that is multidisciplinary, supportive, and collaborative. The Scholar will join a team experienced in clinical informatics and bioinformatics, translational research, clinical research in rare diseases, and informatics project management. This team meets regularly, supports open communication, and includes experts from intramural and extramural research programs.

To apply to this or other DATA Scholar positions, please see instructions here:

This page last reviewed on February 22, 2021