Biasing Like Human: A Cognitive Bias Framework for Scene Graph Generation
by
Xiaoguang Chang, Teng Wang, Changyin Sun, Wenzhe Cai
2022
Abstract
Scene graph generation is a sophisticated task because relations follow no
single recognition pattern: for example, "looking at" and "near" show no
conspicuous visual difference, while "near" can hold between entities of very
different morphology. As a result, some scene graph generation methods, misled
by capricious visual features and trivial dataset annotations, collapse into
predicting the most frequent relations. Recent works have therefore emphasized
"unbiased" approaches that balance predictions to yield a more informative
scene graph. However, humans' quick and accurate judgments about relations
among numerous objects should be attributed to "bias" (i.e., experience and
linguistic knowledge) rather than to pure vision. Inspired by this cognitive
bias mechanism, we propose a novel three-paradigm framework that simulates how
humans incorporate label linguistic features as guidance for vision-based
representations, so as to better mine hidden relation patterns and alleviate
noisy visual propagation. Our framework is model-agnostic and can be applied
to any scene graph model. Comprehensive experiments show that our framework
outperforms baseline modules on several metrics with a minimal parameter
increment, and achieves new state-of-the-art performance on the Visual Genome
dataset.
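
To make the core idea concrete, below is a minimal, illustrative PyTorch-style
sketch of a relation head in which linguistic embeddings of the predicted
subject and object labels gate the vision-based features before relation
classification. The module name, layer sizes, and the sigmoid-gating scheme
are assumptions made for illustration only; they are not the paper's actual
three-paradigm architecture.

import torch
import torch.nn as nn

class LinguisticallyGuidedRelationHead(nn.Module):
    """Illustrative relation classifier that biases vision-based
    features with label (linguistic) embeddings, in the spirit of the
    cognitive-bias idea in the abstract. All names, sizes, and the
    gating scheme are assumptions, not the authors' design."""

    def __init__(self, num_obj_classes, num_rel_classes,
                 vis_dim=512, emb_dim=200):
        super().__init__()
        # Learnable label embeddings stand in for pretrained word
        # vectors (e.g., GloVe) of the entity class names.
        self.label_emb = nn.Embedding(num_obj_classes, emb_dim)
        # Map the concatenated subject/object embeddings to a gate
        # that modulates the noisy visual representation.
        self.gate = nn.Sequential(
            nn.Linear(2 * emb_dim, vis_dim),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(vis_dim, num_rel_classes)

    def forward(self, vis_feat, subj_labels, obj_labels):
        # vis_feat: (B, vis_dim) union-region visual features
        # subj_labels / obj_labels: (B,) predicted entity class ids
        ling = torch.cat([self.label_emb(subj_labels),
                          self.label_emb(obj_labels)], dim=-1)
        # The linguistic "bias" gates the visual features, damping
        # channels the label pair deems uninformative.
        gated = vis_feat * self.gate(ling)
        return self.classifier(gated)

# Usage sketch with toy shapes (150 object and 50 relation classes,
# roughly the Visual Genome splits commonly used in this literature).
head = LinguisticallyGuidedRelationHead(num_obj_classes=150,
                                        num_rel_classes=50)
logits = head(torch.randn(4, 512),
              torch.tensor([1, 2, 3, 4]),
              torch.tensor([5, 6, 7, 8]))
print(logits.shape)  # torch.Size([4, 50])

Because the gate consumes only the label pair, a wrapper of this kind can sit
on top of an arbitrary backbone's relation features, which is one plausible
reading of the abstract's claim that the framework is model-agnostic.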
Archived Files and Locations
application/pdf, 5.5 MB: arxiv.org (repository), web.archive.org (webarchive)
arXiv: 2203.09160v1