Synthesizing Training Data for Object Detection in Indoor Scenes
by
Georgios Georgakis, Arsalan Mousavian, Alexander C. Berg, Jana Kosecka
2017
Abstract
Detection of objects in cluttered indoor environments is one of the key
enabling functionalities for service robots. The best-performing object
detection approaches in computer vision exploit deep Convolutional Neural
Networks (CNNs) to simultaneously detect and categorize the objects of interest
in cluttered scenes. Training such models typically requires large amounts
of annotated data, which are time-consuming and costly to obtain. In this
work we explore the use of synthetically generated composite images for
training state-of-the-art object detectors, with a focus on object instance
detection. We superimpose 2D images of textured object models onto images of
real environments at a variety of locations and scales. Our experiments
evaluate different superimposition strategies, ranging from purely image-based
blending all the way to depth- and semantics-informed positioning of the object
models into real scenes. We demonstrate the effectiveness of these object
detector training strategies on two publicly available datasets, the
GMU-Kitchens and the Washington RGB-D Scenes v2. Notably, augmenting a small
amount of hand-labeled training data with synthetic examples carefully
composited into scenes yields object detectors whose performance is comparable
to detectors trained on much more hand-labeled data. Broadly, this work charts
new opportunities for training
detectors for new objects by exploiting existing object model repositories in
either a purely automatic fashion or with only a very small number of
human-annotated examples.
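To make the compositing step concrete, here is a minimal Python sketch (not the authors' released code; the function names, the RGBA-crop input format, and the OpenCV/NumPy usage are illustrative assumptions). It pastes a masked object crop onto a real scene at a chosen location and scale with simple alpha blending, the purely image-based end of the superimposition spectrum described above, and returns the induced bounding box as a detector training label:

    import numpy as np
    import cv2

    def composite(background, obj_rgba, center, scale):
        """Alpha-blend an RGBA object crop onto a BGR background image,
        centered at `center` (x, y) and resized by `scale`; returns the
        updated image and the pasted bounding box (or None if off-frame)."""
        h, w = obj_rgba.shape[:2]
        obj = cv2.resize(obj_rgba,
                         (max(1, int(w * scale)), max(1, int(h * scale))),
                         interpolation=cv2.INTER_AREA)
        oh, ow = obj.shape[:2]
        x0, y0 = int(center[0] - ow / 2), int(center[1] - oh / 2)
        bh, bw = background.shape[:2]
        # Clip the paste region to the background bounds.
        x1, y1 = max(x0, 0), max(y0, 0)
        x2, y2 = min(x0 + ow, bw), min(y0 + oh, bh)
        if x1 >= x2 or y1 >= y2:
            return background, None
        crop = obj[y1 - y0:y2 - y0, x1 - x0:x2 - x0]
        alpha = crop[:, :, 3:4].astype(np.float32) / 255.0   # object mask
        region = background[y1:y2, x1:x2].astype(np.float32)
        blended = alpha * crop[:, :, :3] + (1.0 - alpha) * region
        background[y1:y2, x1:x2] = blended.astype(np.uint8)
        return background, (x1, y1, x2, y2)

    def scale_from_depth(depth_m, ref_depth_m=1.0):
        """Depth-informed variant (an assumption, not the paper's exact
        rule): under a pinhole camera, apparent size falls off as 1/depth,
        so a crop captured at ref_depth_m is rescaled by this ratio when
        placed at a surface point whose depth is depth_m."""
        return ref_depth_m / depth_m

A purely image-based pipeline would sample `center` and `scale` at random (file names below are placeholders):

    bg = cv2.imread("kitchen.png")                       # real indoor scene
    obj = cv2.imread("mug.png", cv2.IMREAD_UNCHANGED)    # RGBA crop with mask
    rng = np.random.default_rng(0)
    center = rng.integers([0, 0], [bg.shape[1], bg.shape[0]])
    bg, box = composite(bg, obj, center, scale=float(rng.uniform(0.5, 1.5)))

whereas the depth- and semantics-informed strategies restrict `center` to points on supporting surfaces (tables, counters) and derive `scale` from the scene depth at that point.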
Archived Files and Locations
application/pdf, 4.7 MB: arxiv.org (repository); web.archive.org (webarchive)
arXiv: 1702.07836v1