General Facial Representation Learning in a Visual-Linguistic Manner

by Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen

Released as an article.

2021  

Abstract

How can we learn a universal facial representation that boosts all face analysis tasks? This paper takes one step toward this goal. We study the transfer performance of pre-trained models on face analysis tasks and introduce a framework, called FaRL, for general Facial Representation Learning in a visual-linguistic manner. On one hand, the framework involves a contrastive loss to learn high-level semantic meaning from image-text pairs. On the other hand, we propose to simultaneously exploit low-level information to further enhance the face representation, by adding a masked image modeling objective. We perform pre-training on LAION-FACE, a dataset containing a large number of face image-text pairs, and evaluate the representation capability on multiple downstream tasks. We show that FaRL achieves better transfer performance than previous pre-trained models, and we verify its superiority in the low-data regime. More importantly, our model surpasses state-of-the-art methods on face analysis tasks including face parsing and face alignment.
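
The abstract names two training signals: an image-text contrastive loss for high-level semantics and a masked image modeling (MIM) loss for low-level detail. The PyTorch sketch below illustrates how such a combined objective might be wired together; the symmetric InfoNCE form, the token-classification MIM head, and the mim_weight coefficient are illustrative assumptions, not details taken from the paper itself.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs.

    image_emb, text_emb: (B, D) embeddings; row i of each is a matched pair.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the image-to-text and text-to-image classification losses.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

def mim_loss(pred_logits, target_tokens, mask):
    """Cross-entropy on masked patch positions only.

    pred_logits: (B, N, V) per-patch token logits; target_tokens: (B, N) long;
    mask: (B, N) bool, True where a patch was masked. BEiT-style discrete
    token targets are an assumption here; the paper's MIM head may differ.
    """
    return F.cross_entropy(pred_logits[mask], target_tokens[mask])

def combined_loss(image_emb, text_emb, pred_logits, target_tokens, mask,
                  mim_weight=1.0):
    # Hypothetical weighting between the two objectives.
    return (contrastive_loss(image_emb, text_emb)
            + mim_weight * mim_loss(pred_logits, target_tokens, mask))
```

Note that the contrastive branch treats the other captions in the batch as negatives, so its effectiveness depends on batch size; this is a general property of InfoNCE-style pre-training rather than a claim specific to FaRL.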

Archived Files and Locations

application/pdf  814.0 kB
file_a5uem2v3xney3kjlslfr6etdsy
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2021-12-06
Version   v1
Language   en
arXiv  2112.03109v1
Work Entity
access all versions, variants, and formats of this work (e.g., pre-prints)
Catalog Record
Revision: 2c4fff66-cec4-4e9a-bead-7f63203128fa