Principal Investigator: Prof. Robert Hoehndorf

Poster Presenter: Azza Althagafi

Lab: BORG | Bio-Ontology Research Group

Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning: Insights from Saudi Patient Data

Abstract

Whole-exome and whole-genome sequencing have become common tools for diagnosing patients with rare diseases. Despite their success, many patients remain undiagnosed. Phenotype-based methods for identifying variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. Although such methods have been successfully applied to prioritizing single nucleotide variants, they rely on known gene-disease associations as training data. In addition, the phenotypes recorded by clinicians often differ from the phenotypes described in public databases, which makes it more challenging to predict the causative variants. We developed the Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method that prioritizes variants involved in genetic diseases by combining genomic information with clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from humans and model organisms about the molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical sites of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and on genomes matched with clinical information.