MarioQA: Answering Questions by Watching Gameplay Videos

by Jonghwan Mun, Paul Hongsuck Seo, Ilchae Jung, Bohyung Han

Released as an article.

2017  

Abstract

We present a framework to analyze various aspects of models for video question answering (VideoQA) using customizable synthetic datasets, which are constructed automatically from gameplay videos. Our work is motivated by the fact that existing models are often tested only on datasets that either require excessively high-level reasoning or mostly contain instances accessible through single-frame inference. Hence, it is difficult to measure the capacity and flexibility of trained models, and existing techniques often rely on ad-hoc implementations of deep neural networks without clear insight into datasets and models. We are particularly interested in understanding temporal relationships between video events to solve VideoQA problems, because reasoning about temporal dependency is one of the characteristics that most clearly distinguishes videos from images. To this end, we automatically generate a customized synthetic VideoQA dataset from Super Mario Bros. gameplay videos so that it contains events with different levels of reasoning complexity. Using the dataset, we show that properly constructed datasets with events at various complexity levels are critical for learning effective models and improving overall performance.
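The abstract does not detail the generation pipeline, but the following Python sketch illustrates the general idea of template-based QA generation from logged game events, where question difficulty follows from whether a template references a single event or the temporal relation between two events. Everything here (the EVENTS log format, the event vocabulary, the templates) is a hypothetical assumption for illustration, not the paper's actual implementation.

    # Hypothetical event log extracted from a gameplay session: each entry is
    # (timestamp_in_frames, event_type, argument). The format and vocabulary
    # are illustrative assumptions, not the paper's actual representation.
    EVENTS = [
        (120, "kill", "goomba"),
        (340, "eat", "mushroom"),
        (560, "kill", "koopa"),
        (810, "hit", "brick"),
    ]

    PAST_TENSE = {"kill": "killed", "eat": "ate", "hit": "hit"}


    def event_centric_qa(events):
        """Questions answerable from a single moment in the video."""
        return [
            (f"What did Mario {verb} at frame {t}?", arg)
            for t, verb, arg in events
        ]


    def temporal_qa(events):
        """Questions requiring reasoning about the order of two events."""
        pairs = zip(events, events[1:])
        return [
            (f"What did Mario {v2} after he {PAST_TENSE[v1]} the {a1}?", a2)
            for (t1, v1, a1), (t2, v2, a2) in pairs
        ]


    if __name__ == "__main__":
        for question, answer in event_centric_qa(EVENTS) + temporal_qa(EVENTS):
            print(f"Q: {question}  A: {answer}")

Running the sketch yields pairs such as "Q: What did Mario eat after he killed the goomba? A: mushroom", i.e., a question that cannot be answered from any single frame, which is the kind of temporal-reasoning instance the dataset is designed to control for.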

Archived Files and Locations

application/pdf  2.3 MB
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2017-08-13
Version: v2
Language: en
arXiv: 1612.01669v2
Work Entity
Access all versions, variants, and formats of this work (e.g., pre-prints).
Catalog Record
Revision: bfb596ce-f95e-4dca-bdee-bffae840733d