Diachronic Cross-modal Embeddings
by
David Semedo, João Magalhães
2019
Abstract
Understanding the semantic shifts of multimodal information is only possible
with models that capture cross-modal interactions over time. Under this
paradigm, a new embedding is needed that structures visual-textual interactions
according to the temporal dimension, thus preserving the data's original
temporal organisation. This paper introduces a novel diachronic cross-modal
embedding (DCM), in which cross-modal correlations are represented in the
embedding space across the temporal dimension, preserving semantic similarity
at each instant t. To achieve this, we trained a neural cross-modal
architecture under a novel ranking loss strategy that, for each multimodal
instance, enforces the temporal alignment of neighbouring instances through
subspace structuring constraints based on a temporal alignment window.
Experimental results show that our DCM embedding successfully organises
instances over time. Quantitative experiments confirm that DCM is able to
preserve semantic cross-modal correlations at each instant t while also
providing better alignment capabilities. Qualitative experiments unveil new
ways to browse multimodal content and hint that multimodal understanding tasks
can benefit from this new embedding.
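The abstract describes a ranking loss that, besides matching images with their
captions, structures the embedding space using a temporal alignment window.
The sketch below (PyTorch) illustrates one plausible form of such a loss; it
is not the paper's DCM formulation. The function name cross_modal_ranking_loss,
the default margin and window values, and the specific penalty applied to
temporally distant pairs are all illustrative assumptions.

import torch
import torch.nn.functional as F

def cross_modal_ranking_loss(img_emb, txt_emb, timestamps, margin=0.2, window=1.0):
    """Bidirectional margin ranking loss with an extra term that pushes apart
    pairs whose timestamps fall outside a temporal alignment window.
    (Illustrative sketch, not the authors' implementation.)"""
    # Cosine similarity between every image and every caption in the batch.
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    sim = img @ txt.t()                     # (B, B) similarity matrix
    pos = sim.diag().unsqueeze(1)           # similarity of matched pairs

    # Standard bidirectional hinge: off-diagonal entries act as negatives.
    B = sim.size(0)
    mask = ~torch.eye(B, dtype=torch.bool, device=sim.device)
    loss_i2t = F.relu(margin - pos + sim)[mask].mean()
    loss_t2i = F.relu(margin - pos.t() + sim)[mask].mean()

    # Temporal structuring term (assumed form): pairs further apart in time
    # than `window` are penalised if they stay too similar in the embedding.
    dt = (timestamps.unsqueeze(1) - timestamps.unsqueeze(0)).abs()
    far = (dt > window) & mask
    if far.any():
        temporal = F.relu(sim - (pos - margin))[far].mean()
    else:
        temporal = sim.new_zeros(())

    return loss_i2t + loss_t2i + temporal

# Toy usage with random embeddings and timestamps in arbitrary units.
B, d = 8, 256
imgs = torch.randn(B, d, requires_grad=True)
txts = torch.randn(B, d, requires_grad=True)
ts = torch.rand(B) * 10.0
loss = cross_modal_ranking_loss(imgs, txts, ts, margin=0.2, window=2.0)
loss.backward()

In a real model the image and text embeddings would come from the two branches
of a cross-modal network, and the temporal term would be tuned (window size,
weighting) to the granularity of the collection's timestamps.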
Archived Files and Locations
application/pdf, 6.5 MB — arXiv preprint 1909.13689v1, available from arxiv.org (repository) and web.archive.org (webarchive)