[1] Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S. and Zhang, L.: Bottom-up and top-down attention for image captioning and visual question answering, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.6077-6086 (2018).
[2] Bahdanau, D., Cho, K. and Bengio, Y.: Neural machine translation by jointly learning to align and translate, Bengio, Y. and LeCun, Y. (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings (2015).
[3] Barrault, L., Bougares, F., Specia, L., Lala, C., Elliott, D. and Frank, S.: Findings of the third shared task on multimodal machine translation, Proc. 3rd Conference on Machine Translation: Shared Task Papers, pp.304-323, Association for Computational Linguistics (2018).
[4] Barrault, L., Bougares, F., Specia, L., Lala, C., Elliott, D. and Frank, S.: Findings of the third shared task on multimodal machine translation, Bojar, O., Chatterjee, R., Federmann, C., Fishel, M., Graham, Y., Haddow, B., Huck, M., Jimeno-Yepes, A., Koehn, P., Monz, C., Negri, M., Névéol, A., Neves, M.L., Post, M., Specia, L., Turchi, M. and Verspoor, K. (Eds.), Proc. 3rd Conference on Machine Translation: Shared Task Papers, WMT 2018, pp.304-323, Association for Computational Linguistics (2018).
[5] Bird, S. and Loper, E.: NLTK: The natural language toolkit, Proc. ACL Interactive Poster and Demonstration Sessions, pp.214-217, Association for Computational Linguistics (2004).