Language with vision: A study on grounded word and sentence embeddings

Published: 2023-12-19
ISSN: 1554-3528
Container-title: Behavior Research Methods
Language: en
Short-container-title: Behav Res

Author:

Shahmohammadi Hassan, Heitmeier Maria, Shafaei-Bajestan Elnaz, Lensch Hendrik P. A., Baayen R. Harald

Abstract

Grounding language in vision is an active field of research seeking to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many attempts at language grounding, achieving an optimal equilibrium between textual representations of the language and our embodied experiences remains an open field. Some common concerns are the following. Is visual grounding advantageous for abstract words, or is its effectiveness restricted to concrete words? What is the optimal way of bridging the gap between text and vision? To what extent is perceptual knowledge from images advantageous for acquiring high-quality embeddings? Leveraging the current advances in machine learning and natural language processing, the present study addresses these questions by proposing a simple yet very effective computational grounding model for pre-trained word embeddings. Our model effectively balances the interplay between language and vision by aligning textual embeddings with visual information while simultaneously preserving the distributional statistics that characterize word usage in text corpora. By applying a learned alignment, we are able to indirectly ground unseen words, including abstract words. A series of evaluations on a range of behavioral datasets shows that visual grounding is beneficial not only for concrete words but also for abstract words, lending support to the indirect theory of abstract concepts. Moreover, our approach offers advantages for contextualized embeddings, such as those generated by BERT (Devlin et al., 2018), but only when trained on corpora of modest, cognitively plausible sizes. Code and grounded embeddings for English are available at https://github.com/Hazel1994/Visually_Grounded_Word_Embeddings_2.

Funder

Cluster of Excellence

German Federal Ministry of Education and Research (BMBF)

Publisher

Springer Science and Business Media LLC

Subject

General Psychology, Psychology (miscellaneous), Arts and Humanities (miscellaneous), Developmental and Educational Psychology, Experimental and Cognitive Psychology

Link

https://link.springer.com/content/pdf/10.3758/s13428-023-02294-z.pdf

Cited by 1 article.

1. How direct is the link between words and images? The Mental Lexicon, 2024-01-11.