Zhang, B., Hu, H., Sha, F., 2018. Cross-Modal and Hierarchical Modeling of Video and Text. Springer International Publishing. https://doi.org/10.1007/978-3-030-01261-8_23