Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Author: | Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut |
---|---|
Year of publication: | 2021 |
Subject: |
FOS: Computer and information sciences; Computer Science - Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computation and Language (cs.CL); Closed captioning; Question answering; Data collection; Knowledge extraction; Pipeline (software); Benchmark (computing); Visualization; Artificial intelligence; Natural language processing |
Source: | CVPR |
Description: | The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training. However, these datasets are often collected with overly restrictive requirements inherited from their original target tasks (e.g., image caption generation), which limit the resulting dataset scale and diversity. We take a step further in pushing the limits of vision-and-language pre-training data by relaxing the data collection pipeline used in Conceptual Captions 3M (CC3M) [Sharma et al. 2018] and introduce Conceptual 12M (CC12M), a dataset with 12 million image-text pairs specifically meant to be used for vision-and-language pre-training. We perform an analysis of this dataset and benchmark its effectiveness against CC3M on multiple downstream tasks, with an emphasis on long-tail visual recognition. Our results clearly illustrate the benefit of scaling up pre-training data for vision-and-language tasks, as indicated by the new state-of-the-art results on both the nocaps and Conceptual Captions benchmarks. Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2021). Our dataset is available at https://github.com/google-research-datasets/conceptual-12m |
Database: | OpenAIRE |
External link: |
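The linked GitHub repository distributes the dataset as a tab-separated file of image-text pairs rather than the images themselves. Below is a minimal sketch of iterating over those pairs; the file name `cc12m.tsv` and the two-column layout (image URL, then caption) reflect the repository's release format, but should be treated as assumptions if the release changes.

```python
# Minimal sketch: stream (image_url, caption) pairs from a CC12M-style TSV.
# Assumes two tab-separated columns per line: image URL and caption.
import csv
from typing import Iterator, Tuple


def iter_cc12m_pairs(tsv_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (image_url, caption) pairs from a CC12M-style TSV file."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 2:
                continue  # skip malformed lines
            image_url, caption = row
            yield image_url, caption


if __name__ == "__main__":
    # Print the first pair as a sanity check.
    for url, caption in iter_cc12m_pairs("cc12m.tsv"):
        print(url, "->", caption)
        break
```

Streaming the file line by line avoids loading all 12 million pairs into memory; the image bytes themselves must be fetched separately from each URL.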