Learning Features by Watching Objects Move
by Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, Bharath Hariharan
2016
Abstract
This paper presents a novel yet intuitive approach to unsupervised feature
learning. Inspired by the human visual system, we explore whether low-level
motion-based grouping cues can be used to learn an effective visual
representation. Specifically, we use unsupervised motion-based segmentation on
videos to obtain segments, which we use as 'pseudo ground truth' to train a
convolutional network to segment objects from a single frame. Given the
extensive evidence that motion plays a key role in the development of the human
visual system, we hope that this straightforward approach to unsupervised
learning will be more effective than cleverly designed 'pretext' tasks studied
in the literature. Indeed, our extensive experiments show that this is the
case. When used for transfer learning on object detection, our representation
significantly outperforms previous unsupervised approaches across multiple
settings, especially when training data for the target task is scarce.
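To make the training setup described above concrete, the following is a minimal sketch (not the authors' released code) of learning to segment from a single frame using motion-derived pseudo ground truth: a small fully convolutional network predicts a per-pixel foreground mask and is trained against masks that, in the real pipeline, would come from unsupervised motion segmentation of video. The architecture, loss choice, and the `load_frames_and_motion_masks` helper are illustrative assumptions, with random tensors standing in for real data so the example runs end to end.

```python
import torch
import torch.nn as nn

class SingleFrameSegmenter(nn.Module):
    """Tiny fully convolutional encoder-decoder; a stand-in for the
    larger backbone a real implementation would use."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),  # per-pixel logit
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def load_frames_and_motion_masks(batch_size=8, size=128):
    """Hypothetical loader: in the actual approach, the masks are produced by
    unsupervised motion-based segmentation of video frames; here they are
    random binary tensors so the sketch is self-contained."""
    frames = torch.rand(batch_size, 3, size, size)
    pseudo_masks = (torch.rand(batch_size, 1, size, size) > 0.5).float()
    return frames, pseudo_masks

model = SingleFrameSegmenter()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.BCEWithLogitsLoss()  # per-pixel loss against the pseudo ground truth

for step in range(10):  # toy loop; real training runs for many iterations
    frames, pseudo_masks = load_frames_and_motion_masks()
    logits = model(frames)              # predict segmentation from a single frame
    loss = criterion(logits, pseudo_masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The encoder learned this way is what would then be transferred to a target task such as object detection, which is the setting in which the paper evaluates the representation.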
arXiv preprint: 1612.06370v1 (arxiv.org)