Soricut. Understanding Image and Text Simultaneously: A Dual Vision-language Machine Comprehension Task. 22 Dec. 2016.