Learning to Learn Words from Visual Scenes
Author: | Dave Epstein, Dídac Surís, Carl Vondrick, Shih-Fu Chang, Heng Ji |
---|---|
Year of publication: | 2020 |
Subject: | Computer science; Learning to learn; Engineering and technology; Environmental sciences; Language acquisition; Natural sciences; Electrical engineering, electronic engineering, information engineering; Leverage (statistics); Artificial intelligence & image processing; Artificial intelligence; Natural language processing; Earth and related environmental sciences |
Source: | Computer Vision – ECCV 2020; ISBN: 9783030585259; ECCV (29) |
DOI: | 10.1007/978-3-030-58526-6_26 |
Description: | Language acquisition is the process of learning words from the surrounding scene. We introduce a meta-learning framework that learns how to learn word representations from unconstrained scenes. We leverage the natural compositional structure of language to create training episodes that cause a meta-learner to learn strong policies for language acquisition. Experiments on two datasets show that our approach acquires novel words more rapidly and generalizes more robustly to unseen compositions, significantly outperforming established baselines. A key advantage of our approach is that it is data efficient, allowing representations to be learned from scratch without language pre-training. Visualizations and analysis suggest that visual information helps our approach learn a rich cross-modal representation from minimal examples. |
Database: | OpenAIRE |