Learning Low-Dimensional Representations of Medical Concepts

by Youngduck Choi, Chill Yi-I Chiu, David Sontag

Published 2016, Volume 2016, pp. 41-50


We show how to learn low-dimensional representations (embeddings) of a wide range of concepts in medicine, including diseases (e.g., ICD9 codes), medications, procedures, and laboratory tests. We expect that these embeddings will be useful across medical informatics for tasks such as cohort selection and patient summarization. These embeddings are learned using a technique called neural language modeling from the natural language processing community. However, rather than learning the embeddings solely from text, we show how to learn the embeddings from claims data, which is widely available both to providers and to payers. We also show that with a simple algorithmic adjustment, it is possible to learn medical concept embeddings in a privacy-preserving manner from co-occurrence counts derived from clinical narratives. Finally, we establish a methodological framework, arising from standard medical ontologies such as UMLS, NDF-RT, and CCS, to further investigate the embeddings and precisely characterize their quantitative properties.
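The idea of learning embeddings from co-occurrence counts rather than raw records can be sketched with a standard recipe from the embedding literature: build a shifted positive pointwise mutual information (SPPMI) matrix from the counts and factor it with a truncated SVD. This is a minimal illustration of that general technique, not the paper's exact method; the function name and the toy count matrix below are assumptions for the example.

```python
import numpy as np

def embed_from_cooccurrence(counts, dim=2, k=1.0):
    """Sketch: learn concept embeddings from a symmetric co-occurrence
    count matrix via shifted positive PMI followed by truncated SVD.

    counts : (n, n) array of concept co-occurrence counts.
    dim    : embedding dimensionality.
    k      : PMI shift constant (SPPMI = max(PMI - log k, 0)).
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # marginal counts per concept
    col = counts.sum(axis=0, keepdims=True)
    # PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ); cells with zero counts
    # produce -inf / nan, which we clamp to 0 below.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row @ col))
    pmi[~np.isfinite(pmi)] = 0.0
    sppmi = np.maximum(pmi - np.log(k), 0.0)
    # Truncated SVD; scale singular vectors by sqrt of singular values.
    u, s, _ = np.linalg.svd(sppmi)
    return u[:, :dim] * np.sqrt(s[:dim])

# Hypothetical toy data: two pairs of concepts that only co-occur
# with their partner (e.g., a diagnosis code and its usual lab test).
toy_counts = np.array([[0, 10, 0, 0],
                       [10, 0, 0, 0],
                       [0, 0, 0, 10],
                       [0, 0, 10, 0]], dtype=float)
emb = embed_from_cooccurrence(toy_counts, dim=2)
```

Because only aggregate co-occurrence counts enter the computation, this style of factorization is compatible with the privacy-preserving setting the abstract describes: the counts can be derived and shared without exposing individual clinical narratives.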

Archived Files and Locations

application/pdf  339.2 kB
europepmc.org (repository)
web.archive.org (webarchive)
Type  article-journal
Stage   published
Date   2016-07-20
Language   en
PubMed  27570647
PMC  PMC5001761
Catalog Record
Revision: 9d3f8aa8-9d08-4060-a49f-f3294c815530