Authors:
Presenting Author: Lina Zhang, MS – School of Engineering, UCLA
Richard Jiang, BS – School of Engineering, UCLA
Prateik Sinha, BS – UCLA
Jessica Pasqua, MD – DGSOM, UCLA
Kartik Sharma, MS – School of Engineering, UCLA
Tonmoy Monsoor, PhD – UCLA
Victor Morales, BS – DGSOM, UCLA
Vwani Roychowdhury, PhD – Department of Electrical and Computer Engineering, University of California, Los Angeles
Rajarshi Mazumder, MD – DGSOM, UCLA
Rationale:
While interpretation of seizure semiology is critical for diagnosing epilepsy and localizing seizure onset zones, current video-EEG analysis remains labor-intensive and subject to inter-rater variability. Advances in artificial intelligence hold promise for automating and improving the study of seizure phenomenology. We leverage Multimodal Large Language Models (MLLMs) to generate narrative descriptions of seizure semiology directly from raw video-audio data. Our goals for this study were to: 1) develop an MLLM-based framework for semiological feature extraction based on narrative descriptions generated from seizure videos, and 2) assess the clinical utility of MLLM-extracted semiological features for distinguishing Epileptic Seizures (ES) from Non-Epileptic Seizures (NES).
Methods:
Ninety videos of seizure episodes (45 ES, 45 NES) from adults with epilepsy (age >18 years) who were evaluated at an epilepsy monitoring unit were included. Two epileptologists annotated the presence or absence of 23 semiological features by reviewing the seizure videos. Segmented videos (30 s, 64 frames) and the associated audio were processed by locally deployed open-source Vision-Language Models (InternVL-3, Qwen-2.5-VL) and Audio-Language Models (Qwen-2-Audio) using two prompting strategies: direct question answering, and event description followed by LLM parsing (sketched below). Performance of the MLLM pipeline was compared against expert annotations using accuracy and recall. The extracted features were used to train a K-Nearest Neighbors (KNN) classifier to distinguish ES from NES with Leave-One-Patient-Out (LOPO) cross-validation.
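To make the two prompting strategies concrete, the following minimal Python sketch shows their logic. The `query_mllm` and `query_llm` callables, the `FEATURES` list entries, and the prompt wording are hypothetical placeholders standing in for the locally deployed models; this is an illustrative sketch, not the study's implementation.

```python
# Illustrative sketch of the two prompting strategies. `query_mllm(frames,
# audio, prompt)` and `query_llm(prompt)` are hypothetical helpers that wrap
# locally deployed models (e.g., InternVL-3, Qwen-2.5-VL, Qwen-2-Audio) and
# return text; the feature names shown are 2 of the 23 annotated features.
FEATURES = ["eye blinking", "occurs during sleep"]

def direct_qa(frames, audio, query_mllm):
    """Strategy 1: ask a yes/no question per semiological feature."""
    answers = {}
    for feat in FEATURES:
        prompt = f"Does this seizure video show '{feat}'? Answer yes or no."
        answers[feat] = query_mllm(frames, audio, prompt).strip().lower() == "yes"
    return answers

def describe_then_parse(frames, audio, query_mllm, query_llm):
    """Strategy 2: generate a free-text narrative, then parse it with an LLM."""
    narrative = query_mllm(
        frames, audio, "Describe the events in this seizure video in detail."
    )
    answers = {}
    for feat in FEATURES:
        prompt = (f"Based on this description, is '{feat}' present? "
                  f"Answer yes or no.\n\n{narrative}")
        answers[feat] = query_llm(prompt).strip().lower() == "yes"
    return answers
```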
Results:
The MLLM-based framework extracted the 23 clinical semiological features from seizure episode videos with a mean recall of 0.71 and an overall accuracy of 0.56. The framework performed better on contextual clinical signs, such as whether the episode "occurred during sleep" (recall 0.94), and worse on focal regional clinical signs, such as the presence of "eye blinking" (recall 0.43). The optimal KNN classifier distinguishing ES from NES using the MLLM-extracted semiological features reached an accuracy of 0.76 and an AUC of 0.76 (detailed in Figure 1).
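As a concrete illustration of the classification step, the following minimal Python sketch runs a KNN classifier under LOPO cross-validation and computes accuracy and AUC with scikit-learn. The feature matrix, labels, patient groupings, and the choice of k are placeholder assumptions, not the study's actual data or tuned configuration.

```python
# Minimal sketch: Leave-One-Patient-Out (LOPO) evaluation of a KNN classifier
# on MLLM-extracted semiological features. X (n_videos x 23 binary features),
# y (1 = ES, 0 = NES), and per-video patient IDs `groups` are hypothetical.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

def lopo_knn(X, y, groups, k=5):
    logo = LeaveOneGroupOut()  # each fold holds out all videos of one patient
    y_true, y_pred, y_score = [], [], []
    for train_idx, test_idx in logo.split(X, y, groups):
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X[train_idx], y[train_idx])
        y_pred.extend(clf.predict(X[test_idx]))
        y_score.extend(clf.predict_proba(X[test_idx])[:, 1])  # P(ES)
        y_true.extend(y[test_idx])
    return accuracy_score(y_true, y_pred), roc_auc_score(y_true, y_score)

# Example run on random placeholder data (90 videos, 23 features, 30 patients).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(90, 23))
y = rng.integers(0, 2, size=90)
groups = rng.integers(0, 30, size=90)
acc, auc = lopo_knn(X, y, groups)
print(f"LOPO accuracy={acc:.2f}, AUC={auc:.2f}")
```

LOPO splitting ensures that all videos from a given patient fall entirely in either the training or the test fold, so the reported performance reflects generalization to unseen patients rather than memorization of patient-specific patterns.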
Conclusions:
We demonstrate the first comprehensive MLLM-based framework to successfully extract semiological features from seizure videos by generating clinical narrative descriptions and then to accurately classify ES vs. NES based on these features. Our work establishes the foundation for an MLLM-powered 'virtual epileptologist' to assist clinicians in the interpretation of seizure semiology.
Funding:
Dr. Mazumder is funded by the NIH Fogarty International Center 5K01TW012178.