A computational framework to identify shared molecular etiology among rare diseases towards drug discovery (NCATS/DPI)

Project Point of Contact: Qian Zhu, PhD (Team lead of Rare Disease Translational Research) / Ewy A. Mathé, PhD (Director of Informatics Core)

Goals and Objectives: Drugs that target molecular pathways shared among multiple diseases can, in principle, be used to treat more than one disease, even if the clinical manifestations of those diseases are very different. We aim to develop a computational framework that integrates biomedical, genomic, and clinical data to identify shared molecular etiology among rare diseases and thereby inform drug repurposing and basket drug trial design.

Significance: Current methods of drug development are costly and inefficient, and they result in more failures than successes. Drug development is even more challenging for rare diseases because of the small cadre of researchers working on each disease, the heterogeneity of clinical manifestations, the limited availability of patient cohorts, and the limited marketability of the drugs even when they are successful. To address these challenges, we aim to identify rare diseases with shared molecular etiology. Doing so would inform recruitment for basket drug trials that encompass a larger patient pool, and it would yield new insights into understudied rare diseases, based on their relationships to better-understood diseases, that enable drug repurposing efforts. This work aligns well with the National Center for Advancing Translational Sciences (NCATS) mission to catalyze the generation of innovative approaches for enhancing the development of diagnostics and therapeutics in diseases, with an emphasis on rare diseases. Further, our approach to building this framework encompasses all diseases, in line with the NCATS focus on finding commonalities among all diseases and the translational process. This work also aligns naturally with the NIH mission to seek fundamental knowledge about living systems and to apply that knowledge to enhance health, because it provides a means to identify shared molecular etiology among rare diseases and so facilitate drug discovery efforts.

Description: Drug development in rare diseases is challenging due to the heterogeneity of clinical manifestations, small sample sizes (sometimes in the single digits), and the limited marketability of the drugs even when they are successful. Thus, there is an urgent need to develop innovative, data-driven approaches that increase the efficiency and success of developing and testing potentially effective products for these rare conditions. Although there are roughly 10,000 rare diseases, the number of underlying disease mechanisms and biochemical pathways is likely to be much smaller, because molecular mechanisms are shared across multiple diseases. For instance, rare monogenic diseases share a common set of etiologies, including premature termination codons, protein misfolding, and abnormal RNA splicing. Identifying such shared molecular etiologies for rare diseases would provide key data to inform preclinical and clinical development efforts. In addition, integrating molecular phenotyping data (genomic, proteomic, etc.) with electronic health records (EHRs) would enhance our understanding of disease pathogenesis and suggest putative drug-targetable avenues. To reach our overarching goal of leveraging molecular etiology for drug discovery in rare diseases, our specific aims are:

Aim 1. We will integrate and analyze biomedical and genomic data pertinent to rare diseases from the NCATS Genetic and Rare Diseases (GARD) program to generate molecular etiology profiles for rare diseases. Next, we will categorize rare diseases based on their shared profiles. This aim will extend the work done by our current 2021 DATA Scholar by using the additional resources listed in the “Data set(s) involved” section.
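
As an illustration of what "categorizing rare diseases based on their shared profiles" could look like, the following is a minimal sketch, not the project's actual method. It assumes each molecular etiology profile is a set of mechanism or pathway labels, compares profiles with Jaccard similarity, and groups diseases that are linked by a similarity at or above a threshold. The disease names, labels, the `categorize` function, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical etiology profiles: disease -> set of mechanism/pathway labels.
# Names and labels are placeholders for illustration, not GARD data.
profiles = {
    "disease_A": {"premature_termination_codon", "pathway_1"},
    "disease_B": {"premature_termination_codon", "pathway_1", "pathway_2"},
    "disease_C": {"protein_misfolding", "pathway_3"},
    "disease_D": {"protein_misfolding", "pathway_3"},
    "disease_E": {"abnormal_rna_splicing"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def categorize(profiles, threshold=0.5):
    """Group diseases connected by pairwise similarity >= threshold.

    Uses a small union-find structure, so a category is a connected
    component of the thresholded similarity graph.
    """
    parent = {d: d for d in profiles}

    def find(d):
        # Follow parent links to the root, compressing the path as we go.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for d1, d2 in combinations(profiles, 2):
        if jaccard(profiles[d1], profiles[d2]) >= threshold:
            parent[find(d1)] = find(d2)

    groups = {}
    for d in profiles:
        groups.setdefault(find(d), set()).add(d)
    return sorted(sorted(g) for g in groups.values())


categories = categorize(profiles)
print(categories)
```

In practice the threshold and the similarity measure would need to be tuned and reviewed with the SMEs; the connected-component grouping is only one simple choice among many.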

Aim 2. Each category of rare diseases will be enriched with clinical data extracted from All of Us, the National COVID Cohort Collaborative (N3C), and the Biomedical Data Translator, to further stratify those categories into subgroups (i.e., clusters) defined by their disease pathophysiology. Given the short timeline, and as a proof of concept, one or two rare disease categories from Aim 1 will be selected, in consultation with our Subject Matter Experts (SMEs), for integration with clinical data.
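
The stratification step in Aim 2 could be prototyped along the following lines. This is a sketch under stated assumptions: the proposal does not name a clustering method, so plain k-means is swapped in as a simple stand-in, and the feature vectors (for example, the fraction of patients with each of three clinical findings) are invented rather than drawn from All of Us, N3C, or the Translator. The `kmeans` function and its farthest-first initialization are illustrative choices.

```python
import math

# Hypothetical clinical feature vectors for the diseases in one Aim 1 category.
# Values are invented for illustration only.
clinical = {
    "disease_A": [0.9, 0.1, 0.0],
    "disease_B": [0.8, 0.2, 0.1],
    "disease_C": [0.1, 0.9, 0.8],
    "disease_D": [0.0, 0.8, 0.9],
}


def kmeans(points, k, iterations=20):
    """Assign each named point to one of k clusters with plain k-means."""
    names = sorted(points)
    # Deterministic farthest-first initialization: start from the first
    # name, then repeatedly add the point farthest from chosen centroids.
    centroids = [points[names[0]]]
    while len(centroids) < k:
        far = max(
            names,
            key=lambda n: min(math.dist(points[n], c) for c in centroids),
        )
        centroids.append(points[far])

    labels = {}
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = {
            n: min(range(k), key=lambda i: math.dist(points[n], centroids[i]))
            for n in names
        }
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for i in range(k):
            members = [points[n] for n in names if labels[n] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(clinical, k=2)
print(labels)
```

With real clinical data, the number of clusters and the feature encoding would have to be chosen with the SMEs, and a method suited to sparse or mixed-type EHR features may be more appropriate than k-means.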

Aim 3. We will evaluate the utility of high-priority disease clusters, in consultation with our SMEs, by investigating whether these clusters can feasibly inform clinical trial design and drug repurposing efforts. Feedback and advice from the SMEs will be used to revise and enhance the proposed approach.

SMEs will be invited from the Division of Rare Disease Research Innovation (DRDRI), the Office of Drug Development Partnership Programs (ODDPP) and the Division of Preclinical Innovation (DPI) at NCATS.

Data set(s) involved: The scholar will work with datasets including, but not limited to, the following:

Biomedical annotations from NCATS resources:

  • Genetic and Rare Diseases (GARD) program maintains a rare disease catalog with around 10,000 rare diseases along with their relevant biomedical data, including genes, phenotypes, drugs, and so on
  • Rare Disease Alert System (RDAS) is a web application that provides access to the latest rare disease-related findings and information from multiple integrative knowledge graphs, which are stored in Neo4j as backend databases. The knowledge graphs include the rare disease-based Clinical Trial Knowledge Graph (rdas_CTKG), the rare disease-based Scientific Annotation Knowledge Graph (rdas_SAKG), and the rare disease-based NIH Grant Funding Knowledge Graph (rdas_GFKG), which contain information and annotations (genes, mutations, chemicals, etc.) extracted from clinical trials, PubMed articles, and NIH grant funding records
  • Pharos is an integrated knowledgebase for the Druggable Genome (DG) that aims to illuminate the uncharacterized and/or poorly annotated portion of the DG, focusing on three of the most commonly drug-targeted protein families: G-protein-coupled receptors (GPCRs), ion channels (ICs), and kinases
  • Inxight Drugs contains information on ingredients in medicinal products, including US FDA-approved drugs, marketed drugs, investigational drugs, and all substances
  • Biomedical Data Translator integrates multiple types of existing data sources, including objective signs and symptoms of disease, drug effects, and intervening types of biological data relevant to understanding pathophysiology
  • Relational database of Metabolomic Pathways (RaMP-DB) integrates annotations on the chemistry, biology, and other ontologies related to human metabolites, genes, and proteins from multiple sources

Genomic Annotations:

  • Human Gene Mutation Database
  • Online Mendelian Inheritance in Man (OMIM)
  • Kyoto Encyclopedia of Genes and Genomes (KEGG)

Clinical data:

  • NIH All of Us program, including genomic and clinical data, additional clinical fields in electronic health records, and additional demographic data from surveys (NCATS has obtained a center-wide DUA to support this research)
  • National COVID Cohort Collaborative (N3C), one of the largest collections of secure and deidentified clinical data in the United States for COVID-19 research
  • Clinical data from NCATS Biomedical Data Translator
  • Clinical data from the NIH Biomedical Translational Research Information System (BTRIS)

Anticipated outcomes: 

  • A prototype of the proposed computational framework will be developed
  • One publication on the computational framework will be submitted to a biomedical informatics journal
  • One presentation will be given at AMIA
  • Internal presentations will be given to our NCATS Data Science group and NCATS Rare Disease Informatics group

Required skills of the DATA Scholar: 

  • Expertise in cloud computing and programming (Python, R, etc.)
  • Expertise in developing artificial intelligence and machine learning models
  • Expertise in knowledge representation methods (semantic networks, knowledge graphs, etc.)
  • Expertise in biomedical and/or clinical data strongly preferred
  • Knowledge of rare disease and molecular biology preferred
  • Strong communication skills for working with Subject Matter Experts (SMEs)
  • Ability to work independently and collaboratively across NCATS and NIH

Expected/preferred length of DATA Scholar appointment: 2 years.

Expected/preferred time effort commitment of the DATA Scholar: Full time (100%)

Remote work preference: 100% remote allowable

ICO support: Our ICO’s culture emphasizes teamwork, collaboration, and inclusion, which will help the scholar grow quickly. Specifically, the scholar will work with a team of data scientists, rare disease specialists, bioinformaticians, program managers, and computer programmers with diverse backgrounds and expertise from both the intramural and extramural components of NCATS. In addition, the mentors (Drs. Zhu and Mathé) have extensive experience in biomedical informatics and mentoring, and will work very closely with the scholar. The scholar will also have the opportunity to mentor a summer intern, thereby strengthening the scholar’s mentorship skills and supplementing the scholar’s work.

Additional activities: At NCATS, the scholar will be encouraged to contribute to or listen in on various research activities based on the scholar's research interests. For example, the scholar could join one or several of the interest groups (IGs) within the Intramural Informatics Core at NCATS or within NIH, including the Clinical Informatics IG, the Bioinformatics IG, etc. Participating in and contributing to those activities will help the scholar stay on top of state-of-the-art methodologies and will inspire new ideas and/or initiate research collaborations.

Career or professional development opportunities: The scholar will be able to network and gain advice on career and professional development by interacting and collaborating with multidisciplinary scientists within and beyond NCATS, including trainees, junior and senior researchers, established investigators, and program officers. The scholar will be supported to attend academic conferences, e.g., the AMIA Annual Symposium and the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), to present the scholar's work and network with external experts.

To apply to this or other DATA Scholar positions, please see instructions here: datascience.nih.gov/data-scholars.

 

This page last reviewed on April 17, 2023