Yu, Zeiler, Kolossa, 2021. Fusing information streams in end-to-end audio-visual speech recognition.