Show, Translate and Tell

Author:	Raymond Ptucha, Shagan Sah, Dheeraj Peri
Language:	English
Year of publication:	2019
Subject:	FOS: Computer and information sciences; Closed captioning; Artificial neural network; Computer science; Process (engineering); business.industry; Computer Vision and Pattern Recognition (cs.CV); Feature extraction; Computer Science - Computer Vision and Pattern Recognition; 020207 software engineering; 02 engineering and technology; Semantics; computer.software_genre; 0202 electrical engineering, electronic engineering, information engineering; Task analysis; 020201 artificial intelligence & image processing; Artificial intelligence; State (computer science); business; computer; Natural language processing; Sentence
Source:	ICIP
Description:	Humans have a remarkable ability to process and understand information from multiple sources such as images, video, text, and speech. The recent success of deep neural networks has enabled algorithms that give machines the ability to understand and interpret this information. There is a need both to broaden their applicability and to develop methods that correlate visual information with semantic content. We propose a unified model that trains jointly on images and captions and learns to generate new captions given either an image or a caption query. We evaluate our model on three tasks: cross-modal retrieval, image captioning, and sentence paraphrasing. Our model gains insight into cross-modal vector embeddings, generalizes well across multiple tasks, and is competitive with state-of-the-art methods on retrieval.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4086c24a70fb1f7763082725e16db7e7 http://arxiv.org/abs/1903.06275