Towards Diverse and Accurate Image Captions via Reinforcing
Determinantal Point Process
by
Qingzhong Wang, Antoni B. Chan
2019
Abstract
Although significant progress has been made in the field of automatic image
captioning, it remains a challenging task. Previous work typically focuses on
improving the quality of generated captions but ignores their diversity. In
this paper, we combine determinantal point processes (DPPs) with reinforcement
learning (RL) and propose a novel reinforcing DPP (R-DPP) approach that
generates a set of captions with high quality and diversity
for an image. We show that R-DPP achieves better accuracy and diversity than
methods that use noise as a control signal (GANs, VAEs). Moreover, R-DPP
preserves the modes of the learned distribution. Hence, the beam search
algorithm can be applied to generate a single accurate caption, which performs
better than other RL-based models.
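The abstract does not spell out the DPP kernel, so the following is only a minimal sketch of how a DPP can score a set for both quality and diversity. It assumes the standard quality-diversity decomposition L = diag(q) S diag(q), where the log-determinant of L serves as an unnormalized set score. The function name `dpp_log_score`, the toy quality values, and the 2-D feature vectors are hypothetical illustrations, not the authors' caption features or training objective.

```python
import numpy as np

def dpp_log_score(quality, features):
    """Unnormalized log-score of a set under an L-ensemble DPP.

    quality  : (k,) positive per-item quality scores (hypothetical inputs)
    features : (k, d) per-item feature vectors; normalized to unit length below
    """
    q = np.asarray(quality, dtype=float)
    f = np.asarray(features, dtype=float)
    # Unit-normalize so that f @ f.T is a cosine-similarity matrix.
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    similarity = f @ f.T
    # Quality-diversity kernel: L_ij = q_i * S_ij * q_j.
    kernel = q[:, None] * similarity * q[None, :]
    sign, logdet = np.linalg.slogdet(kernel)
    # A non-positive determinant means the set is degenerate (fully redundant).
    return logdet if sign > 0 else -np.inf

# Two toy "caption sets" with equal quality but different diversity.
quality = [0.9, 0.9]
diverse = [[1.0, 0.0], [0.0, 1.0]]       # orthogonal feature vectors
redundant = [[1.0, 0.0], [0.99, 0.14]]   # nearly parallel feature vectors

diverse_score = dpp_log_score(quality, diverse)
redundant_score = dpp_log_score(quality, redundant)
print(diverse_score > redundant_score)
```

With equal quality scores, the set with orthogonal features receives the higher score, which is the property that lets a DPP-based reward favor caption sets that are both accurate and mutually distinct.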
arXiv: 1908.04919v1