Multi-Sense Language Modelling
release_uohwpw4fubgapgystba3vbelnu

by Andrea Lekkas, Peter Schneider-Kamp, Isabelle Augenstein

Released as an article.

2020  

Abstract

The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle the same word form having a plurality of meanings (polysemy). Currently, none of the common language modelling architectures explicitly model polysemy. We propose a language model which not only predicts the next word, but also its sense in context. We argue that this higher prediction granularity may be useful for end tasks such as assistive writing, and allow for a more precise linking of language models with knowledge bases. We find that multi-sense language modelling requires architectures that go beyond standard language models, and here propose a structured prediction framework that decomposes the task into a word prediction task followed by a sense prediction task. For sense prediction, we utilise a Graph Attention Network, which encodes definitions and example uses of word senses. Overall, we find that multi-sense language modelling is a highly challenging task, and suggest that future work focus on the creation of more annotated training datasets.
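The abstract describes a structured prediction framework that first predicts the next word and then its sense in context. The sketch below is a minimal illustration of that decomposition in PyTorch, not the authors' implementation: the LSTM encoder, the dimensions, the flat sense inventory, and the greedy word choice are all assumptions made for the example.

import torch
import torch.nn as nn

class MultiSenseLM(nn.Module):
    """Illustrative two-step model: predict the next word, then its sense."""
    def __init__(self, n_words, n_senses, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(n_words, d_model)
        self.encoder = nn.LSTM(d_model, d_model, batch_first=True)
        # Step 1: word prediction from the context state.
        self.word_head = nn.Linear(d_model, n_words)
        # Step 2: sense prediction, conditioned on the context state
        # plus the embedding of the chosen word.
        self.sense_head = nn.Linear(2 * d_model, n_senses)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        ctx = h[:, -1]                       # context state at the last position
        word_logits = self.word_head(ctx)
        word = word_logits.argmax(dim=-1)    # greedy choice of the next word (assumption)
        sense_in = torch.cat([ctx, self.embed(word)], dim=-1)
        sense_logits = self.sense_head(sense_in)
        return word_logits, sense_logits

model = MultiSenseLM(n_words=10_000, n_senses=30_000)
word_logits, sense_logits = model(torch.randint(0, 10_000, (2, 12)))

In the paper's setup, per the abstract, the sense representations come from a Graph Attention Network that encodes definitions and example uses of word senses; here the sense head is a plain linear layer, and in practice its output would be masked to the senses of the predicted word.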

Archived Files and Locations

application/pdf  463.2 kB  (file_ohjcar4kprgjzo36nsw2zqjyji)
Locations: arxiv.org (repository), web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2020-12-10
Version: v1
Language: en
arXiv: 2012.05776v1
Work Entity: access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 69388fe9-1860-458c-bf6d-617ef4365aad