Ao, Wang, Zhou, Liu, Ren, Wu, Ko, Li, Zhang, Wei, Qian, Li, Wei, 2021. SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing.