You Only Need Adversarial Supervision for Semantic Image Synthesis
by
Vadim Sushko, Edgar Schönfeld, Dan Zhang, Juergen Gall, Bernt Schiele, Anna Khoreva
2021
Abstract
Despite their recent successes, GAN models for semantic image synthesis still
suffer from poor image quality when trained with only adversarial supervision.
Historically, additionally employing the VGG-based perceptual loss has helped
to overcome this issue, significantly improving the synthesis quality, but at
the same time limiting the progress of GAN models for semantic image synthesis.
In this work, we propose a novel, simplified GAN model, which needs only
adversarial supervision to achieve high quality results. We re-design the
discriminator as a semantic segmentation network, directly using the given
semantic label maps as the ground truth for training. By providing stronger
supervision to the discriminator as well as to the generator through spatially-
and semantically-aware discriminator feedback, we are able to synthesize images
of higher fidelity with better alignment to their input label maps, making the
use of the perceptual loss superfluous. Moreover, we enable high-quality
multi-modal image synthesis through global and local sampling of a 3D noise
tensor injected into the generator, which allows complete or partial image
change. We show that images synthesized by our model are more diverse and
follow the color and texture distributions of real images more closely. We
achieve an average improvement of 6 FID and 5 mIoU points over the state of
the art across different datasets using only adversarial supervision.
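The two mechanisms the abstract describes, a discriminator that classifies every real pixel into its semantic class plus one extra "fake" class, and a 3D noise tensor that can be resampled globally or only inside a local region, can be sketched as follows. This is an illustrative numpy toy, not the paper's implementation; the function names, shapes, and the plain cross-entropy loss are assumptions for exposition.

```python
import numpy as np

def segmentation_d_loss(d_logits_real, d_logits_fake, label_map):
    """Toy sketch of a segmentation-based adversarial discriminator loss.

    d_logits_*: (N+1, H, W) per-pixel logits; classes 0..N-1 are the
    semantic classes, class N is the extra "fake" class.
    label_map: (H, W) integer semantic labels of the real image.
    """
    n_plus_1, H, W = d_logits_real.shape
    fake_class = n_plus_1 - 1

    def pixel_ce(logits, targets):
        # Numerically stable per-pixel softmax cross-entropy, mean over pixels.
        z = logits - logits.max(axis=0, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
        rows = np.arange(H)[:, None]
        cols = np.arange(W)[None, :]
        return -log_probs[targets, rows, cols].mean()

    # Real pixels are supervised with the given semantic label map...
    loss_real = pixel_ce(d_logits_real, label_map)
    # ...while every generated pixel should be classified as "fake".
    loss_fake = pixel_ce(d_logits_fake, np.full((H, W), fake_class))
    return loss_real + loss_fake

def mix_3d_noise(z_global, z_local, mask):
    """Combine a globally sampled 3D noise tensor (z_dim, H, W) with a
    locally resampled one inside a binary (H, W) mask, so the generator
    output changes only in the masked region (partial image change)."""
    return np.where(mask[None], z_local, z_global)
```

With confident, correct logits this loss is near zero, and it grows as the discriminator's per-pixel predictions degrade; the 3D noise mixing simply swaps noise channels pixel-wise under the mask.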
Archived Files and Locations
application/pdf (23.6 MB): arxiv.org (repository), web.archive.org (webarchive), arXiv:2012.04781v2