Qin et al. TVDIM: Enhancing Image Self-supervised Pretraining via Noisy Text Data. 13 June 2021.