Biasing Like Human: A Cognitive Bias Framework for Scene Graph Generation

by Xiaoguang Chang, Teng Wang, Changyin Sun, Wenzhe Cai

Released as an article.

2022  

Abstract

Scene graph generation is a sophisticated task because there is no specific recognition pattern (e.g., "looking at" and "near" show no conspicuous visual difference, while "near" can occur between entities of entirely different morphology). As a result, some scene graph generation methods are trapped into predicting the most frequent relations, driven by capricious visual features and trivial dataset annotations. Recent works have therefore emphasized "unbiased" approaches that balance predictions to yield a more informative scene graph. However, humans' quick and accurate judgments of relations between numerous objects should be attributed to "bias" (i.e., experience and linguistic knowledge) rather than pure vision. To enhance model capability, inspired by the "cognitive bias" mechanism, we propose a novel three-paradigm framework that simulates how humans incorporate label linguistic features as guidance for vision-based representations, to better mine hidden relation patterns and alleviate noisy visual propagation. Our framework is model-agnostic and applicable to any scene graph model. Comprehensive experiments show that our framework outperforms baseline modules on several metrics with a minimal parameter increment, and achieves new SOTA performance on the Visual Genome dataset.
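To make the guidance idea concrete, below is a minimal sketch of how label linguistic embeddings might gate vision-based relation features. This is not the authors' implementation; the module name (LinguisticGate), dimensions, and initialization are all illustrative assumptions.

```python
# Hypothetical sketch: using label linguistic embeddings as guidance for
# visual relation features, in the spirit of a learned "cognitive bias".
# NOT the paper's implementation; all names and sizes are assumptions.
import torch
import torch.nn as nn

class LinguisticGate(nn.Module):
    """Gates visual pair features with subject/object label embeddings."""
    def __init__(self, num_classes: int, embed_dim: int = 200, vis_dim: int = 512):
        super().__init__()
        # In practice, label embeddings would likely be initialized from
        # word vectors (e.g., GloVe); random init keeps the sketch simple.
        self.label_embed = nn.Embedding(num_classes, embed_dim)
        self.gate = nn.Sequential(
            nn.Linear(2 * embed_dim, vis_dim),
            nn.Sigmoid(),  # element-wise gate over visual dimensions
        )

    def forward(self, vis_feat, subj_labels, obj_labels):
        # vis_feat: (B, vis_dim) visual features, one per subject-object pair
        ling = torch.cat([self.label_embed(subj_labels),
                          self.label_embed(obj_labels)], dim=-1)
        # Linguistic context decides which visual dimensions to trust,
        # suppressing noisy visual propagation.
        return vis_feat * self.gate(ling)

# Usage: gate visual pair features with predicted entity labels.
gate = LinguisticGate(num_classes=151)  # e.g., 150 VG classes + background
vis = torch.randn(8, 512)
subj = torch.randint(0, 151, (8,))
obj = torch.randint(0, 151, (8,))
guided = gate(vis, subj, obj)  # (8, 512) linguistically guided features
```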

Archived Files and Locations

application/pdf  5.5 MB
file_dgkzh5essva6pj6bpc7o7pzg3i
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2022-03-17
Version: v1
Language: en
arXiv: 2203.09160v1
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 0ffc7aa7-c60b-49a9-9180-62ac0159ed17