Using Natural Language Processing to Analyze Electronic Health Records of Women with Epilepsy for Sexual and Reproductive Health Counseling
Abstract number: 2.164
Submission category: 4. Clinical Epilepsy / 4E. Women's Issues
Year: 2022
Submission ID: 2203916
Source: www.aesnet.org
Presentation date: 12/4/2022 12:00:00 PM
Published date: Nov 22, 2022, 05:22 AM
Authors :
Elizabeth Harrison, MD, MS – UPMC Children's Hospital of Pittsburgh; Laura Kirkpatrick, MD – UPMC Children's Hospital of Pittsburgh; Patrick Harrison, MS – Data Theoretic; Traci Kazmerski, MD, MS – UPMC Children's Hospital of Pittsburgh; Yoshimi Sogawa, MD – UPMC Children's Hospital of Pittsburgh; Harry Hochheiser, PhD – Department of Biomedical Informatics – University of Pittsburgh
Rationale: We propose a reproducible methodology for retrospective textual analysis of sexual and reproductive healthcare (SRH) topics discussed with adolescent and young adult women with epilepsy. Information regarding SRH counseling is typically only recorded as natural-language text in clinical notes. A methodology using natural language processing (NLP) to extract and transform these data into a form suitable for conventional analysis could help facilitate studies of SRH and other sensitive healthcare topics relevant to this population.
Methods: (1) Initial text is retrieved from the electronic health record in the form of individual clinic notes. (2) These notes are segmented into sentences using spaCy, an NLP toolkit for the Python programming language. (3) The sentences are exported from Python in CSV format. (4) Subsets of these sentences are labeled for references to SRH counseling across multiple relevant categories (e.g., menstruation, sexual activity, contraception). These labels are created in the application Watchful by applying a combination of regular expressions and manual annotation. (5) The labeled sentences serve as training data to create a machine learning model using spaCy’s textcat, a machine learning architecture for classifying text. (6) This model is then applied to the remaining unlabeled sentences to identify additional sentences with references to SRH counseling. Steps 3 through 6 are repeated until the machine learning model identifies no new relevant sentences. Finally, all labeled sentences can be recombined into notes or further aggregated for analysis. Results can be validated by external subject matter experts, using Cohen’s kappa to compare model output with reviewer sentence classification on a sample of sentences.
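As a rough illustration of steps 3 and 4 and of the validation step, the sketch below labels sentences with regular expressions, exports them as CSV, and computes Cohen's kappa between two sets of labels. It uses only the Python standard library; the spaCy segmentation and textcat steps and the Watchful application are not reproduced. The patterns, function names, and example sentences are hypothetical and synthetic; the abstract does not list the actual expressions or data used in the study.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical keyword patterns for two SRH categories (assumed for illustration;
# not the expressions used by the authors).
PATTERNS = {
    "menstruation": re.compile(r"\b(?:menstrua\w*|menses|menarche)\b", re.I),
    "contraception": re.compile(r"\b(?:contracepti\w*|birth control|iud)\b", re.I),
}


def label_sentence(sentence):
    """Return the sorted list of categories whose pattern matches the sentence."""
    return sorted(cat for cat, pat in PATTERNS.items() if pat.search(sentence))


def to_csv(sentences):
    """Return CSV text with one row per sentence and its regex-derived labels."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["sentence", "labels"])
    for sentence in sentences:
        writer.writerow([sentence, ";".join(label_sentence(sentence))])
    return buf.getvalue()


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters assigned the same single label to every item
    return (observed - expected) / (1 - expected)


# Synthetic example sentences (not taken from any clinical note).
sentences = [
    "Patient reports regular menses.",
    "Discussed birth control options.",
    "Seizure frequency is unchanged.",
]
csv_text = to_csv(sentences)

# Synthetic model output vs. reviewer labels (1 = relevant, 0 = not relevant).
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Keyword patterns like these are deliberately narrow: a broader term such as "period" would also match unrelated phrases common in neurology notes, which is one reason the workflow pairs regular expressions with manual annotation.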
Results: This approach was used successfully to identify references to menstruation in a set of 3663 child neurology notes from clinical encounters with 971 adolescent and young adult women with epilepsy. We illustrate our methodology with examples from this project.
Conclusions: The methodology proposed is both easily reproducible and flexible enough to be adapted for a variety of sensitive healthcare topics.
Funding: American Academy of Neurology Resident Research Grant