A Human Evaluation of AMR-to-English Generation Systems
Author: Shira Wein, Nathan Schneider, Emma Manning
Year of publication: 2020
Subject: FOS: Computer and information sciences; Computation and Language (cs.CL); Natural language generation; Natural language processing; Artificial intelligence; Fluency; Categorization; BLEU; Meaning (linguistics)
Source: COLING
DOI: 10.48550/arxiv.2004.06814
Description: Most current state-of-the-art systems for generating English text from Abstract Meaning Representation (AMR) have been evaluated only using automated metrics, such as BLEU, which are known to be problematic for natural language generation. In this work, we present the results of a new human evaluation which collects fluency and adequacy scores, as well as categorization of error types, for several recent AMR generation systems. We discuss the relative quality of these systems and how our results compare to those of automatic metrics, finding that while the metrics are mostly successful in ranking systems overall, collecting human judgments allows for more nuanced comparisons. We also analyze common errors made by these systems.
Comment: COLING 2020
Database: OpenAIRE
External link:
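
The automatic metric the abstract singles out, BLEU, scores a system by n-gram overlap between its output and human reference sentences. Below is a minimal sketch of corpus-level BLEU scoring using the sacrebleu library; the library choice and the example sentences are assumptions for illustration, as the record does not specify an implementation.

```python
# A minimal sketch of corpus-level BLEU scoring with sacrebleu
# (an assumed implementation; the paper does not specify one).
# The hypothesis and reference sentences are hypothetical.
import sacrebleu

# System outputs: one generated English sentence per AMR graph.
hypotheses = [
    "The boy wants to go to New York.",
    "She gave the book to her friend.",
]

# Human references; sacrebleu expects one list per reference stream,
# aligned index-by-index with the hypotheses.
references = [
    [
        "The boy wants to travel to New York.",
        "She gave her friend the book.",
    ]
]

# Corpus-level BLEU, the kind of automatic score the paper compares
# against human fluency and adequacy judgments.
bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```

Because BLEU only measures surface n-gram overlap, two outputs with the same score can differ sharply in fluency or adequacy, which is the gap the paper's human evaluation is designed to expose.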