1. Textcaps: A dataset for image captioning with reading comprehension;Sidorov,2020
2. Confidence-aware non-repetitive multimodal transformers for textcaps;Wang,2021
3. Improving ocr-based image captioning by incorporating geometrical relationship;Wang,2021
4. J. Wang, J. Tang, J. Luo, Multimodal attention with image text spatial relationship for ocr-based image captioning, in: The 28th ACM International Conference on Multimedia, New York, NY, USA, 2020, pp. 4337–4345, http://dx.doi.org/10.1145/3394171.3413753.
5. Context-aware transformer for image captioning;Yang;Neurocomputing,2023