Shaping Visual Representations with Language for Few-shot Classification

Abstract

Language is designed to convey useful information about the world, thus serving as a scaffold for efficient human learning. How can we let language guide representation learning in machine learning models? We explore this question in the setting of few-shot visual classification, proposing models which learn to perform visual classification while jointly predicting natural language task descriptions at train time. At test time, with no language available, we find that these language-influenced visual representations are more generalizable, compared to meta-learning baselines and approaches that explicitly use language as a bottleneck for classification.

URL

https://arxiv.org/abs/1911.02683

PDF

https://arxiv.org/pdf/1911.02683.pdf