Abstracts

ChatGPT as Neuropsychologists, Are We There Yet?

Abstract number : 2.022
Submission category : 11. Behavior/Neuropsychology/Language / 11A. Adult
Year : 2024
Submission ID : 623
Source : www.aesnet.org
Presentation date : 12/8/2024 12:00:00 AM
Published date :

Authors :
Presenting Author: Erafat Rehim, MD – University of Pittsburgh, Emory University

Aida Risman, MD – Emory University
Adam Dickey, MD, PhD – Emory University
Daniel Drane, PhD – Emory University

Rationale:

Neuropsychological assessment plays a critical role in the pre-surgical evaluation of epilepsy patients. Lateralization and localization of neuropsychological deficits help define the epileptogenic zone and can affect surgical planning and outcomes. Traditionally, these interpretations are performed by experienced neuropsychologists. Advanced artificial intelligence (AI) models, such as ChatGPT, may be able to assist with this complex evaluation. We explore whether ChatGPT can provide accurate lateralization and localization based on the neuropsychological assessment.


Methods:

A sample of six cases with left-sided language dominance was provided to ChatGPT-4o, each with a snapshot of neuropsychological test results. Test results were modified to ensure patient confidentiality while preserving the overall cognitive profiles. Prompts were iteratively developed to achieve consistency and accuracy. The generated impressions were compared against the impressions of a neuropsychologist. A second neuropsychologist, blinded to the original report, provided an independent interpretation.


Results:
ChatGPT successfully lateralized dysfunction in all but one case. In case #1, ChatGPT’s interpretation was bilateral dysfunction, right worse than left, whereas the first neuropsychologist’s impression was right-sided dysfunction. The second neuropsychologist, however, also concluded bilateral dysfunction, aligning more closely with ChatGPT’s output.




ChatGPT accurately localized temporal lobe dysfunction when it was left-sided. In cases #3 and #4, its impressions matched those of the initial neuropsychologist and were validated by the second neuropsychologist, who additionally noted bilateral dysfunction that was worse on the left side.




ChatGPT reported temporoparietal lobe dysfunction when the dysfunction was right-sided or bilateral, as seen in cases #1, #2, and #6, whereas both neuropsychologists localized it only to the temporal lobe.




ChatGPT provided specific localizations when the initial neuropsychologist's impressions were non-lateralizing and non-localizing. In cases #5 and #6, ChatGPT identified frontotemporal and temporoparietal dysfunction. The second neuropsychologist partially corroborated these interpretations.




Conclusions:

We demonstrated the potential of a large language model (such as ChatGPT) to interpret neuropsychological test results for epilepsy surgery evaluations. With refined prompts, ChatGPT could provide reasonable interpretations.



We identified several limitations in ChatGPT’s performance. It tended to incorrectly attribute deficits to the parietal lobe when the dysfunction was right-sided. It assessed severity based on percentiles, which can be problematic since normative data sets for different cognitive domains carry different weights in final interpretations. Finally, the process of refining the prompts may have led to overfitting to the desired results.

Further research using a full spectrum of neuropsychology test results in larger cohorts would be necessary to define the utility and limitations of using large language models in epilepsy surgery evaluation.






Funding: No funding.
