Song, et al. CLIP Models Are Few-shot Learners: Empirical Studies on VQA and Visual Entailment. 14 Mar. 2022.