1. Towards VQA models that can read;Singh,2019
2. VizWiz: nearly real-time answers to visual questions;Bigham,2010
3. Visual question answering: Which investigated applications?;Barra;Pattern Recognit. Lett.,2021
4. Attention is all you need;Vaswani;NIPS,2017
5. PaLI-X: On scaling up a multilingual vision and language model;Chen,2023