Leveraging medical context to recommend semantically similar terms for chart reviews

by Cheng Ye, Bradley A. Malin, Daniel Fabbri

Published in BMC Medical Informatics and Decision Making by Springer Science and Business Media LLC.

2021   Volume 21, Issue 1, p353

Abstract

Background

Information retrieval (IR) helps clinicians answer questions posed to large collections of electronic medical records (EMRs), such as how best to identify a patient's cancer stage. One of the more promising approaches to IR for EMRs is to expand a keyword query with similar terms (e.g., augmenting cancer with mets). However, the range of clinical chart review tasks is so large that a fixed set of similar terms is insufficient. Current language models, such as Bidirectional Encoder Representations from Transformers (BERT) embeddings, do not capture the full non-textual context of a task. In this study, we present new methods that provide similar terms dynamically by adjusting to the context of the chart review task.

Methods

We introduce a medical-context vector space in which each word is represented by a vector that captures the word's usage in different medical contexts (e.g., how frequently cancer is used when ordering a prescription versus describing family history), beyond the context learned from the surrounding text. These vectors are transformed into a vector space that customizes the set of similar terms selected for different chart review tasks. We evaluate the vector space model on multiple chart review tasks, in which supervised machine learning models learn to predict the terms preferred by clinically knowledgeable reviewers. To quantify the usefulness of the predicted similar terms against a baseline of standard word2vec embeddings, we measure (1) the prediction performance of the medical-context vector space model using the area under the receiver operating characteristic curve (AUROC) and (2) the labeling effort required to train the models.

Results

The vector space outperformed the baseline word2vec embeddings in all three chart review tasks, with an average AUROC of 0.80 versus 0.66. Additionally, the medical-context vector space significantly reduced the number of labels required to learn and predict the reviewers' preferred similar terms: in all three tasks, the labeling effort was reduced to 10% of the entire dataset.

Conclusions

The set of preferred similar terms that are relevant to a chart review task can be learned by leveraging the medical context of the task.
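As a rough illustration of the representation described in the Methods, the sketch below builds, for each term, a vector of its usage frequency across EMR contexts and compares terms by cosine similarity. This is a minimal sketch, not the authors' implementation: the context names, toy corpus, and helper functions are assumptions for illustration only.

```python
# Minimal sketch (not the paper's implementation) of a medical-context
# vector space: each term is represented by how often it appears in
# different EMR contexts (e.g., prescription orders vs. family history).
# Context names and the toy corpus below are illustrative assumptions.
import numpy as np

CONTEXTS = ["prescription_order", "family_history", "problem_list", "pathology_note"]

def context_vector(term, docs_by_context):
    """Count the term's occurrences in each context, then L1-normalize
    so the vector is the term's usage distribution across contexts."""
    counts = np.array([
        sum(doc.count(term) for doc in docs_by_context.get(ctx, []))
        for ctx in CONTEXTS
    ], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Toy corpus keyed by context (illustrative only)
docs = {
    "family_history": ["mother had breast cancer", "family history of cancer"],
    "pathology_note": ["cancer with mets to liver", "stage iv cancer, mets present"],
}
sim = cosine(context_vector("cancer", docs), context_vector("mets", docs))
print(f"context similarity(cancer, mets) = {sim:.2f}")
```

Terms that are used in the same mix of contexts (here, cancer and mets both concentrating in pathology notes) end up close in this space, which is the intuition behind recommending them together for a given chart review task.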
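The evaluation described in the abstract — supervised prediction of reviewer-preferred terms scored by AUROC, with training limited to roughly 10% of the labels — might be set up as in the following sketch. The features, labels, and classifier choice here are synthetic assumptions used only to show the shape of the experiment.

```python
# Illustrative sketch of the evaluation setup: a supervised model learns
# reviewers' preferred similar terms from medical-context vectors, is
# trained on a small labeled fraction (~10%), and is scored with AUROC.
# Features, labels, and the logistic-regression choice are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 4))                  # stand-in medical-context vectors
y = (X[:, 1] + X[:, 3] > 1).astype(int)   # stand-in reviewer preferences

# Train on 10% of the terms, mirroring the reduced labeling effort
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.10, random_state=0, stratify=y)

clf = LogisticRegression().fit(X_train, y_train)
auroc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"AUROC = {auroc:.2f}")
```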

Archived Files and Locations

application/pdf  5.1 MB
bmcmedinformdecismak.biomedcentral.com (publisher)
web.archive.org (webarchive)
Type: article-journal
Stage: published
Date: 2021-12-18
Language: en
Container Metadata
Open Access Publication
In DOAJ
In ISSN ROAD
In Keepers Registry
ISSN-L: 1472-6947