Motion-Attentive Transition for Zero-Shot Video Object Segmentation
release_uezd57hdija4vjy2ixjzwofbma
by
Tianfei Zhou, Shunzhou Wang, Yi Zhou, Yazhou Yao, Jianwu Li, Ling Shao
2020
Abstract
In this paper, we present a novel Motion-Attentive Transition Network
(MATNet) for zero-shot video object segmentation, which provides a new way of
leveraging motion information to reinforce spatio-temporal object
representation. An asymmetric attention block, called Motion-Attentive
Transition (MAT), is designed within a two-stream encoder, which transforms
appearance features into motion-attentive representations at each convolutional
stage. In this way, the encoder becomes deeply interleaved, allowing for
close hierarchical interactions between object motion and appearance. This is
superior to the typical two-stream architecture, which treats motion and
appearance separately in each stream and often suffers from overfitting to
appearance information. Additionally, a bridge network is proposed to obtain a
compact, discriminative and scale-sensitive representation for multi-level
encoder features, which is further fed into a decoder to achieve segmentation
results. Extensive experiments on three challenging public benchmarks (i.e.,
DAVIS-16, FBMS and YouTube-Objects) show that our model achieves compelling
performance against state-of-the-art methods.
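
The abstract describes the MAT block only at a high level, so the following is a minimal sketch of the general idea of one-directional (asymmetric) motion-to-appearance attention, not the authors' implementation. It assumes features are stored as a list of channels, each a flat list of spatial positions, and that attention is a softmax over the channel-averaged motion magnitude; the function names `spatial_attention` and `motion_attentive` and the residual gating rule are hypothetical choices made for illustration.

```python
import math


def spatial_attention(motion):
    """Softmax over spatial positions of the channel-averaged motion magnitude.

    Assumption for illustration: `motion` is a list of channels, each a flat
    list with one value per spatial position.
    """
    n_channels = len(motion)
    n_positions = len(motion[0])
    # Average absolute motion response at each spatial position.
    energy = [
        sum(abs(motion[c][p]) for c in range(n_channels)) / n_channels
        for p in range(n_positions)
    ]
    peak = max(energy)  # subtract the maximum for numerical stability
    exps = [math.exp(e - peak) for e in energy]
    total = sum(exps)
    return [e / total for e in exps]


def motion_attentive(appearance, motion):
    """Reweight appearance features by motion attention (hypothetical rule).

    The flow of information is one-way: motion modulates appearance, while
    the motion features themselves are left unchanged. The `1.0 + a` term
    keeps the original appearance signal as a residual.
    """
    attn = spatial_attention(motion)
    return [[v * (1.0 + a) for v, a in zip(channel, attn)]
            for channel in appearance]


# Toy input: two channels, four spatial positions, motion only at position 2.
appearance = [[1.0, 1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0, 2.0]]
motion = [[0.0, 0.0, 3.0, 0.0],
          [0.0, 0.0, 3.0, 0.0]]
attended = motion_attentive(appearance, motion)
```

In this toy case the appearance response at the moving position is amplified relative to the static positions, which is the qualitative effect the abstract attributes to motion-attentive representations. In a two-stream encoder, a step of this kind would be applied at each convolutional stage.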
Archived Files and Locations
application/pdf 9.7 MB
file_qa4g5ppoqfb2jd6rq47bcm264y
arxiv.org (repository), web.archive.org (webarchive)
2003.04253v2