Semantically Invariant Text-to-Image Generation
by
Shagan Sah, Dheeraj Peri, Ameya Shringi, Chi Zhang, Miguel Dominguez,
Andreas Savakis, Ray Ptucha
2018
Abstract
Image captioning has demonstrated models that are capable of generating
plausible text given input images or videos. Further, recent work in image
generation has shown significant improvements in image quality when text is
used as a prior. Our work ties these concepts together by creating an
architecture that can enable bidirectional generation of images and text. We
call this network Multi-Modal Vector Representation (MMVR). Along with MMVR, we
propose two improvements to text-conditioned image generation. First, an
n-gram metric-based cost function is introduced that generalizes the caption
with respect to the image. Second, multiple semantically similar sentences
are shown to help generate better images. Qualitative and quantitative
evaluations demonstrate that MMVR improves upon existing text-conditioned image
generation results by over 20%, while integrating visual and text modalities.
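The n-gram cost function mentioned in the abstract can be illustrated with a minimal sketch of a clipped n-gram precision score in the style of BLEU. This is an assumption-laden illustration, not the paper's actual formulation: the tokenization (whitespace split), the averaging over n, and the function names are all hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in a token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_overlap(candidate, reference, max_n=4):
    # Average clipped n-gram precision of a candidate caption against a
    # reference caption, BLEU-style. Illustrative only; the metric used in
    # the paper's cost function may differ in its details.
    cand, ref = candidate.split(), reference.split()
    scores = []
    for n in range(1, max_n + 1):
        c_counts = ngrams(cand, n)
        r_counts = ngrams(ref, n)
        total = sum(c_counts.values())
        if total == 0:
            continue  # candidate too short for this n
        matched = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        scores.append(matched / total)
    return sum(scores) / len(scores) if scores else 0.0
```

A score of 1.0 means every n-gram of the candidate appears in the reference; partial overlap yields a value between 0 and 1, which could then be folded into a training loss.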
Archived Files and Locations
application/pdf 2.5 MB
arxiv.org (repository), web.archive.org (webarchive)
arXiv identifier: 1809.10274v1