Tewel, et al. Zerocap: Zero-shot Image-to-text Generation for Visual-semantic Arithmetic. 31 Mar. 2022.