Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words.

Authors: Tae-Seok Lee, Hyun-Young Lee, Seung-Shik Kang
Source: Journal of Information Processing Systems; Jun2022, Vol. 18 Issue 3, p344-358, 15p
Abstract: Text summarization is the task of producing a shorter version of a long document while accurately preserving the main contents of the original text. Abstractive summarization generates novel words and phrases using a language generation method through text transformation and prior-embedded word information. However, newly coined words and other out-of-vocabulary words decrease the performance of automatic summarization because they are not pre-trained in the machine learning process. In this study, we demonstrated an improvement in summarization quality through the contextualized embedding of BERT with out-of-vocabulary masking. In addition, by explicitly providing precise pointing and an optional copy instruction along with the BERT embedding, we achieved higher accuracy than the baseline model. The recall-based word-generation metric ROUGE-1 score was 55.11 and the word-order-based ROUGE-L score was 39.65. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
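
The abstract describes masking out-of-vocabulary tokens so that BERT produces contextualized embeddings for them rather than a generic unknown-word vector. Below is a minimal sketch of that idea, not the authors' released code: it uses the Hugging Face `transformers` library with the `bert-base-uncased` checkpoint (both assumptions, chosen for illustration) and replaces any position the tokenizer maps to `[UNK]` with `[MASK]` before encoding.

```python
# Sketch of OOV masking for BERT contextualized embeddings.
# Assumption: Hugging Face transformers + bert-base-uncased; the paper's
# actual model, data, and pointer/copy decoder are not reproduced here.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Hypothetical input; WordPiece may split rare words into subwords, but any
# position that still maps to [UNK] is treated as out-of-vocabulary here.
text = "The newly coined word glorbix confuses the summarizer."
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

# Replace every [UNK] position with [MASK] so the encoder infers a
# contextualized representation from the surrounding tokens.
input_ids = torch.where(
    input_ids == tokenizer.unk_token_id,
    torch.tensor(tokenizer.mask_token_id),
    input_ids,
)

with torch.no_grad():
    # (batch, seq_len, hidden) embeddings; masked OOV positions now carry
    # context-derived vectors instead of a shared unknown-word embedding.
    embeddings = model(input_ids=input_ids).last_hidden_state
```

In the paper's full system, these embeddings would feed a summarization decoder with an explicit pointing and copy mechanism; that decoder is omitted from this sketch.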