1. An image is worth 16x16 words: Transformers for image recognition at scale;dosovitskiy;ICLR,0
2. FILIP: Fine-grained interactive language-image pre-training;yao;ICLR,0
3. PaLM: Scaling language modeling with pathways;chowdhery;arXiv preprint,2022
4. Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia
5. Link the Head to the "Beak": Zero Shot Learning from Noisy Text Description at Part Precision