Wang, et al. Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-language Tasks. 28 Apr. 2022.