Char2char Generation with Reranking for the E2E NLG Challenge
Authors: | Shubham Agarwal, Marc Dymetman, Eric Gaussier |
---|---|
Language: | English |
Publication year: | 2018 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; Computer Science - Computation and Language; Computer science; Character (computing); business.industry; 05 social sciences; Lexical analysis; Natural language generation; Contrast (statistics); 010501 environmental sciences; computer.software_genre; 01 natural sciences; Machine Learning (cs.LG); Simple (abstract algebra); 0502 economics and business; Artificial intelligence; 050207 economics; business; computer; Computation and Language (cs.CL); Natural language processing; 0105 earth and related environmental sciences |
Source: | INLG |
Description: | This paper describes our submission to the E2E NLG Challenge. Recently, neural seq2seq approaches have become mainstream in NLG, often resorting to pre- (respectively post-) processing delexicalization (relexicalization) steps at the word level to handle rare words. By contrast, we train a simple character-level seq2seq model, which requires no pre/post-processing (delexicalization, tokenization or even lowercasing), with surprisingly good results. For further improvement, we explore two re-ranking approaches for scoring candidates. We also introduce a synthetic dataset creation procedure, which opens up a new way of creating artificial datasets for Natural Language Generation. |
Database: | OpenAIRE |