1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M. (2016). TensorFlow: A system for large-scale machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16), (pp. 265–283).
2. Anne Hendricks, L., Wang, O., Shechtman, E., Sivic, J., Darrell, T., & Russell, B. (2017). Localizing moments in video with natural language. In Proceedings of the IEEE international conference on computer vision (ICCV), (pp. 5803–5812).
3. Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
4. Cer, D., Yang, Y., Kong, S. Y., Hua, N., Limtiaco, N., John, R. S., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., & Sung, Y. H. (2018). Universal sentence encoder. arXiv preprint arXiv:1803.11175.
5. Chen, J., Chen, X., Ma, L., Jie, Z., & Chua, T. S. (2018a). Temporally grounding natural sentence in video. In Proceedings of the 2018 conference on empirical methods in natural language processing, (pp. 162–171).