Char2char Generation with Reranking for the E2E NLG Challenge

Authors:	Shubham Agarwal, Marc Dymetman, Eric Gaussier
Language:	English
Year of publication:	2018
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Computer Science - Computation and Language; Computer science; Character (computing); business.industry; 05 social sciences; Lexical analysis; Natural language generation; Contrast (statistics); 010501 environmental sciences; computer.software_genre; 01 natural sciences; Machine Learning (cs.LG); Simple (abstract algebra); 0502 economics and business; Artificial intelligence; 050207 economics; business; computer; Computation and Language (cs.CL); Natural language processing; 0105 earth and related environmental sciences
Source:	INLG
Description:	This paper describes our submission to the E2E NLG Challenge. Recently, neural seq2seq approaches have become mainstream in NLG, often resorting to pre- (respectively post-) processing delexicalization (relexicalization) steps at the word-level to handle rare words. By contrast, we train a simple character level seq2seq model, which requires no pre/post-processing (delexicalization, tokenization or even lowercasing), with surprisingly good results. For further improvement, we explore two re-ranking approaches for scoring candidates. We also introduce a synthetic dataset creation procedure, which opens up a new way of creating artificial datasets for Natural Language Generation.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bc2234c23c1eed9b1f9c5fb5f2beb676 http://arxiv.org/abs/1811.05826