1. Language and Visual Perception Associations: Meta-Analytic Connectivity Modeling of Brodmann Area 37
2. CPTR: full transformer network for image captioning;Liu;CoRR,2021
3. An image is worth 16x16 words: Transformers for image recognition at scale;Dosovitskiy;CoRR,2020
4. M2: Meshed-memory transformer for image captioning;Cornia;CoRR,2019
5. Multimodal transformer with multi-view visual representation for image captioning;Yu;CoRR,2019