Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories

Author: David Wilmot, Frank Keller
Year of publication: 2021
Subject:
Source: Wilmot, D & Keller, F 2021, Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA, pp. 851-865, 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7/11/21. https://doi.org/10.18653/v1/2021.emnlp-main.65
DOI: 10.48550/arxiv.2109.03754
Description: Measuring event salience is essential to understanding stories. This paper takes a recent unsupervised method for salience detection, derived from Barthes' Cardinal Functions and theories of surprise, and applies it to longer narrative forms. We improve the standard transformer language model by incorporating an external knowledge base (derived from Retrieval-Augmented Generation) and adding a memory mechanism to enhance performance on longer works. We use a novel approach to derive salience annotations from chapter-aligned summaries of classic literary works in the Shmoop corpus. Our evaluation against this data demonstrates that our salience detection model outperforms a language model without knowledge-base and memory augmentation, and that both components are crucial to this improvement.
Comment: Accepted to the EMNLP 2021 Conference as a long paper; 9 pages (15 pages with appendices and references), 2 figures, 4 tables
Database: OpenAIRE