Lacalle, et al. Evaluating Multimodal Representations on Visual Semantic Textual Similarity. 4 Apr. 2020.