360-Degree Gaze Estimation in the Wild Using Multiple Zoom Scales
by Ashesh, Chu-Song Chen, Hsuan-Tien Lin (2021)
Abstract
Gaze estimation involves predicting where a person is looking within an
image or video. Technically, gaze information can be inferred at two
different magnification levels: face orientation and eye orientation.
Inference from the eyes is not always feasible for gaze estimation in the
wild, given the lack of clear eye patches under conditions like extreme
left/right gazes or occlusions. In this work, we design a model that mimics
the human ability to estimate gaze by aggregating focused looks, each at a
different magnification level of the face area. The model avoids the need
to extract clear eye patches and, at the same time, addresses another
important issue for gaze estimation in the wild: face-scale variation. We
further extend the model to handle the challenging task of 360-degree gaze
estimation by encoding backward gazes in a polar representation along with
a robust averaging scheme. Experimental results on the ETH-XGaze dataset,
which does not contain scale-varying faces, demonstrate the model's
effectiveness in assimilating information from multiple scales. On other
benchmark datasets with many scale-varying faces (Gaze360 and RT-GENE), the
proposed model achieves state-of-the-art performance for gaze estimation
using either images or videos. Our code and pretrained models can be
accessed at https://github.com/ashesh-0/MultiZoomGaze.
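
As a rough illustration of the multiple-zoom idea in the abstract, the
following PyTorch sketch crops the same face at several magnification
levels, encodes each crop with a shared backbone, and fuses the per-scale
features before regressing the gaze angles. This is a minimal sketch under
assumed zoom factors and a toy backbone, not the released MultiZoomGaze
implementation; the names MultiZoomGazeSketch, backbone, and head are
illustrative placeholders.

    # Minimal sketch of multi-zoom gaze aggregation (illustrative only).
    import torch
    import torch.nn as nn
    import torchvision.transforms.functional as TF

    class MultiZoomGazeSketch(nn.Module):
        def __init__(self, zooms=(1.0, 1.6, 2.4), feat_dim=128):
            super().__init__()
            self.zooms = zooms
            # One shared CNN encoder applied to every zoom level.
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )
            # Fuse the concatenated per-scale features into (pitch, yaw).
            self.head = nn.Linear(feat_dim * len(zooms), 2)

        def forward(self, face):
            # face: (B, 3, H, W) batch of loose face crops. Each zoom level
            # takes a tighter center crop, mimicking a closer "focused look",
            # then resizes back to a common resolution.
            feats = []
            for z in self.zooms:
                h, w = int(face.shape[-2] / z), int(face.shape[-1] / z)
                crop = TF.center_crop(face, [h, w])
                crop = TF.resize(crop, [112, 112], antialias=True)
                feats.append(self.backbone(crop))
            return self.head(torch.cat(feats, dim=1))  # (B, 2): pitch, yaw

    model = MultiZoomGazeSketch()
    gaze = model(torch.randn(4, 3, 224, 224))  # -> shape (4, 2)

Sharing one backbone across all zoom levels keeps the parameter count
independent of the number of scales, which is one natural way to realize
the "focused looks" described above; the actual architecture is in the
linked repository.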
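The need for a polar representation and robust averaging for backward
gazes can also be seen with a small numeric example: yaw wraps around at
+/-180 degrees, so naively averaging two estimates of a backward gaze
cancels them out. The circular mean below is a simple stand-in for the
paper's averaging scheme, not the scheme itself.

    # Why backward (near +/-180 degree yaw) gazes need careful averaging.
    import numpy as np

    def circular_mean_deg(yaws):
        """Average angles on the unit circle instead of the real line."""
        rad = np.deg2rad(np.asarray(yaws))
        return np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean()))

    preds = [179.0, -179.0]          # two estimates of the same backward gaze
    print(np.mean(preds))            # 0.0   -> wrongly points forward
    print(circular_mean_deg(preds))  # ~180  -> correctly points backward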
arXiv: 2009.06924v3