1. Bahdanau, D., Cho, K. H., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR).
2. Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks.
3. Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics.
4. Cheng, Y., Liu, Y., Yang, Q., Sun, M., & Xu, W. (2016). Neural machine translation with pivot languages. arXiv preprint arXiv:1611.04928.
5. Cheng, Y., Shen, S., He, Z., He, W., Wu, H., Sun, M., & Liu, Y. (2016). Agreement-based joint training for bidirectional attention-based neural machine translation. In International Joint Conference on Artificial Intelligence (IJCAI).