Authors
Ilker Yildirim, Mario Belledonne, Winrich Freiwald, Joshua Tenenbaum
Abstract
Vision must not only recognize and localize objects, but perform richer inferences about the underlying causes in the world that give rise to sensory data. How the brain performs these inferences remains unknown: theoretical proposals based on inverting generative models (or “analysis-by-synthesis”) have a long history, but their mechanistic implementations have typically been too slow to support online perception, and their mapping to neural circuits is unclear. Here we present a neurally plausible model for efficiently inverting generative models of images and test it as an account of one high-level visual capacity, the perception of faces. The model is based on a deep neural network that learns to invert a three-dimensional (3D) face graphics program in a single fast feedforward pass. It explains both human behavioral data and multiple levels of neural processing in non-human primates, as well as a classic illusion, the “hollow face” effect. The model fits qualitatively better than state-of-the-art computer vision models, and suggests an interpretable reverse-engineering account of how images are transformed into percepts in the ventral stream.
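The core idea in the abstract — amortizing inference by training a feedforward "recognition" model to invert a known generative graphics program — can be sketched in miniature. Everything below is an illustrative stand-in, not the paper's model: the "renderer" is a fixed random linear map rather than a 3D face graphics engine, and linear regression stands in for the deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "graphics program": maps latent scene parameters z
# (in the paper: face shape, texture, pose, lighting) to an image x.
# Here, a fixed random linear render from 8 latents to 64 "pixels".
W = rng.normal(size=(64, 8))

def render(z):
    return W @ z

# Analysis-by-synthesis, amortized: sample (latent, image) pairs from
# the generative model, then fit a feedforward inverse so that
# inference at test time is a single fast forward pass (here,
# least squares stands in for training a deep network).
Z_train = rng.normal(size=(1000, 8))
X_train = Z_train @ W.T
A, *_ = np.linalg.lstsq(X_train, Z_train, rcond=None)  # learned inverse map

def infer(x):
    # one feedforward pass: image -> estimated latent scene parameters
    return x @ A

# The learned inverse recovers the true latents from a rendered image.
z_true = rng.normal(size=8)
z_hat = infer(render(z_true))
err = np.linalg.norm(z_hat - z_true)
```

Because the linear renderer is exactly invertible on its 8-dimensional latent space, `z_hat` matches `z_true` up to numerical precision; the paper's contribution is making this amortized-inversion scheme work for a full 3D face graphics program and mapping it onto ventral-stream processing.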
Publisher
Cold Spring Harbor Laboratory
Cited by
5 articles.