Attention-Set based Metric Learning for Video Face Recognition release_bgbb2hnkjbacjccnnsbxcjvhoi

by Yibo Hu, Xiang Wu, Ran He

Released as a article .

2017  

Abstract

Face recognition has made great progress with the development of deep learning. However, video face recognition (VFR) is still an ongoing task due to various illumination, low-resolution, pose variations and motion blur. Most existing CNN-based VFR methods only obtain a feature vector from a single image and simply aggregate the features in a video, which less consider the correlations of face images in one video. In this paper, we propose a novel Attention-Set based Metric Learning (ASML) method to measure the statistical characteristics of image sets. It is a promising and generalized extension of Maximum Mean Discrepancy with memory attention weighting. First, we define an effective distance metric on image sets, which explicitly minimizes the intra-set distance and maximizes the inter-set distance simultaneously. Second, inspired by Neural Turing Machine, a Memory Attention Weighting is proposed to adapt set-aware global contents. Then ASML is naturally integrated into CNNs, resulting in an end-to-end learning scheme. Our method achieves state-of-the-art performance for the task of video face recognition on the three widely used benchmarks including YouTubeFace, YouTube Celebrities and Celebrity-1000.
In text/plain format

Archived Files and Locations

application/pdf  915.8 kB
file_fwmt7hcvhffsboe5ffxkj5wml4
arxiv.org (repository)
web.archive.org (webarchive)
Read Archived PDF
Preserved and Accessible
Type  article
Stage   submitted
Date   2017-04-19
Version   v2
Language   en ?
arXiv  1704.03805v2
Work Entity
access all versions, variants, and formats of this works (eg, pre-prints)
Catalog Record
Revision: 6594c02e-0d1c-412b-9673-e74856d732c8
API URL: JSON