General Facial Representation Learning in a Visual-Linguistic Manner
by
Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen
2021
Abstract
How to learn a universal facial representation that boosts all face analysis
tasks? This paper takes one step toward this goal. We study the transfer
performance of pre-trained models on face analysis tasks and introduce a
framework, called FaRL, for general Facial Representation Learning in a
visual-linguistic manner. On one hand, the framework involves a contrastive
loss to learn high-level semantic meaning from image-text pairs. On the other
hand, we propose to simultaneously exploit low-level information to further
enhance the face representation by adding a masked image modeling objective. We
perform pre-training on LAION-FACE, a dataset containing a large amount of face
image-text pairs, and evaluate the representation capability on multiple
downstream tasks. We show that FaRL achieves better transfer performance
compared with previous pre-trained models. We also verify its superiority in
the low-data regime. More importantly, our model surpasses the state-of-the-art
methods on face analysis tasks including face parsing and face alignment.
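
To make the two training objectives concrete, below is a minimal PyTorch
sketch of the combined loss, not the authors' implementation: the function
name farl_style_loss, all argument names, the temperature value, and the
assumption that masked patches are predicted as discrete visual tokens
(a BEiT-style target) are illustrative choices, not details confirmed by
the abstract.

    import torch
    import torch.nn.functional as F

    def farl_style_loss(image_emb, text_emb, mim_logits, token_targets,
                        masked_pos, temperature=0.07, mim_weight=1.0):
        """Sketch of a combined objective: CLIP-style image-text
        contrastive loss plus a masked image modeling (MIM) term.

        image_emb:     (B, D) pooled image embeddings
        text_emb:      (B, D) pooled text embeddings
        mim_logits:    (B, N, V) per-patch predictions over a visual vocabulary
        token_targets: (B, N) discrete patch targets (hypothetical tokenizer)
        masked_pos:    (B, N) bool mask, True where a patch was masked
        """
        # Contrastive part: symmetric InfoNCE over the in-batch pairs,
        # matching each image to its own caption and vice versa.
        image_emb = F.normalize(image_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = image_emb @ text_emb.t() / temperature  # (B, B)
        labels = torch.arange(logits.size(0), device=logits.device)
        contrastive = 0.5 * (F.cross_entropy(logits, labels)
                             + F.cross_entropy(logits.t(), labels))

        # MIM part: cross-entropy on the masked patch positions only,
        # pushing the encoder to also capture low-level structure.
        mim = F.cross_entropy(mim_logits[masked_pos],
                              token_targets[masked_pos])

        return contrastive + mim_weight * mim

The key design point the abstract describes is that the two terms are
optimized jointly, so the same image encoder is shaped by high-level
semantics (contrastive) and low-level appearance (MIM) at once.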
Archived Files and Locations
application/pdf, 814.0 kB
arXiv:2112.03109v1, available from arxiv.org (repository) and web.archive.org (webarchive)