An Integrated Machine Learning and Functional Analysis Approach for Resolution of Variants of Uncertain Significance (VUS) in STXBP1
Abstract number :
2.317
Submission category :
12. Genetics / 12A. Human Studies
Year :
2021
Submission ID :
1826160
Source :
www.aesnet.org
Presentation date :
12/5/2021 12:00:00 PM
Published date :
Nov 22, 2021, 06:52 AM
Authors :
Jeffrey Calhoun, PhD – Northwestern University; Jonathan Gunti, BS – Northwestern University; Heather Mefford – St. Jude Children's Research Hospital; Santiago Schnell – University of Michigan; Vanessa Aguiar-Pulido – University of Miami; Margaret Ross – Weill Cornell Medical College; Jack Parent – University of Michigan; Lori Isom – University of Michigan; EpiMVP consortium – EpiMVP Consortium; Gemma Carvill – Northwestern University
Rationale: Variants of uncertain significance (VUS) pose a significant challenge for the genetic diagnosis of epilepsy because they cannot be classified as either pathogenic or benign. To address this challenge, we established a highly integrated Epilepsy Multiplatform Variant Predictor (EpiMVP) Center Without Walls to develop precise, single-gene in silico pathogenicity prediction tools (EpiPred) for the most common epilepsy-associated genes. We are using supervised and unsupervised machine learning algorithms to discriminate pathogenic from benign variants. This approach will be bolstered by functional characterization of a subset of variants in multiple cellular and animal models. Functional data will then be incorporated into the machine learning model to improve classification performance iteratively. The first gene for study is STXBP1, a gene implicated in a range of pediatric-onset epilepsies.
Methods: We first curated a robust training set for the development of our classifier. This consisted of known benign or likely benign (BLB) missense STXBP1 variants from the general population (gnomAD; n=128) and known pathogenic or likely pathogenic (PLP) missense STXBP1 variants from multiple sources, including industry partners (Invitae, GeneDx, Geisinger), ClinVar, and the literature (n=93). Variants were annotated with multiple features (n=19) using in silico measures related to evolutionary conservation and protein structure and stability, among others. Our approach employed standard unsupervised and supervised machine learning algorithms to develop an optimal single-gene classifier. Our classifier outputs a score ranging from PLP (high) to BLB (low) and was used to score STXBP1 VUSs (n=133) collated from ClinVar and industry partners.
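The training setup described above can be illustrated with a minimal sketch. The abstract does not name the algorithms used, so plain logistic regression stands in here as one example of a "standard supervised" method. The feature values are randomly generated placeholders, not real variant annotations, and every function name is hypothetical. Only the counts (128 BLB, 93 PLP, 19 features) and the score direction (high for PLP, low for BLB) come from the abstract.

```python
import math
import random

N_FEATURES = 19          # feature count reported in the abstract
N_BLB, N_PLP = 128, 93   # training-set sizes reported in the abstract


def make_synthetic_variants(seed=0):
    """Return (rows, labels). Values are random placeholders, NOT real annotations."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, count in ((0, N_BLB), (1, N_PLP)):   # 0 = BLB, 1 = PLP
        for _ in range(count):
            # Assumed toy signal: PLP features are shifted upward by one unit.
            rows.append([rng.gauss(float(label), 1.0) for _ in range(N_FEATURES)])
            labels.append(label)
    return rows, labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(weights, bias, row):
    """Score in [0, 1]: high suggests PLP, low suggests BLB."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train_logistic_regression(rows, labels, epochs=200, lr=0.1):
    """Fit weights by batch gradient descent on the log-loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, y in zip(rows, labels):
            err = score(weights, bias, row) - y
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


rows, labels = make_synthetic_variants()
weights, bias = train_logistic_regression(rows, labels)
train_scores = [score(weights, bias, r) for r in rows]
```

A VUS annotated with the same 19 features could then be scored with `score(weights, bias, vus_row)`. A real analysis would evaluate the model on held-out variants rather than on the training scores shown here.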
Results: Our classifier successfully discriminates STXBP1 missense BLB and PLP variants (ROC AUC > 0.9). We used four different models to score and then rank each of the 133 VUSs. About half of the VUSs (50.3%) were highly concordant across the four models (SD < 0.01), while a small fraction (9%) were highly discordant across the models (SD > 0.1). We selected eight representative variants (4 concordant, 4 discordant) for functional modeling in multiple cellular and animal models.
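The two quantities reported above can be sketched as follows. This is an illustration under stated assumptions rather than the consortium's code. The abstract does not say whether the SD is a population or sample standard deviation (population SD is assumed here), nor the scale of the scores (0 to 1 is assumed). The function names, variant labels, and score values are all hypothetical. Only the thresholds (SD < 0.01 and SD > 0.1) and the use of four models come from the abstract.

```python
from statistics import pstdev


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random PLP variant (label 1)
    outscores a random BLB variant (label 0); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def concordance(model_scores, low=0.01, high=0.1):
    """Bin one variant by the spread of its scores across models.
    Thresholds follow the abstract; population SD is an assumption."""
    sd = pstdev(model_scores)
    if sd < low:
        return "concordant"
    if sd > high:
        return "discordant"
    return "intermediate"


# Hypothetical scores from four models for three made-up variants.
example_vus = {
    "VUS_A": [0.91, 0.92, 0.91, 0.92],
    "VUS_B": [0.20, 0.45, 0.70, 0.35],
    "VUS_C": [0.50, 0.55, 0.52, 0.48],
}
bins = {name: concordance(s) for name, s in example_vus.items()}
```

With these made-up values, `VUS_A` falls in the concordant bin, `VUS_B` in the discordant bin, and `VUS_C` in between.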
Conclusions: The first iteration of our EpiPred classifier can distinguish between BLB and PLP variants with high specificity. After functional validation in the EpiMVP cellular and animal models, we will improve the predictive power of EpiPred and release this model to the wider scientific and epilepsy community for interpretation of VUS. Our overarching goal is to develop gene-specific classifiers for additional epilepsy-associated genes with an emphasis on non-ion channel genes with the highest volume of VUS and PLP variants identified in routine clinical genetic testing.
Funding: NIH/NINDS Center Without Walls (CWOW): U54NS117170.