Countering the Effects of Lead Bias in News Summarization via Multi-Stage Training and Auxiliary Losses
Author: | Jackie Chi Kit Cheung, Yue Dong, Annie Louis, Matt Grenander |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
Computer Science - Machine Learning; Computer Science - Computation and Language; Machine Learning (cs.LG); Computation and Language (cs.CL); Computer science; Artificial intelligence; Machine learning; Automatic summarization; Reinforcement learning; Feature (machine learning); Sentence; Training (meteorology); Lead (geology); Key (cryptography); 01 natural sciences; 0105 earth and related environmental sciences; 010501 environmental sciences; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing |
Source: | EMNLP/IJCNLP (1) |
Description: | Sentence position is a strong feature for news summarization, since the lead often (but not always) summarizes the key points of the article. In this paper, we show that recent neural systems excessively exploit this trend, which, although effective for many inputs, is detrimental when summarizing documents whose important content lies in later parts of the article. We propose two techniques to make systems sensitive to the importance of content in different parts of the article. The first technique pretrains the model on 'unbiased' data, i.e., source documents whose sentences have been randomly shuffled. The second technique uses an auxiliary ROUGE-based loss that encourages the model to distribute importance scores throughout a document by mimicking sentence-level ROUGE scores on the training data. We show that these techniques significantly improve the performance of a competitive reinforcement-learning-based extractive system, with the auxiliary loss being more powerful than pretraining. Comment: 5 pages, accepted at EMNLP 2019. |
Database: | OpenAIRE |
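
The two techniques named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the record does not give the exact formulation. The function names, the use of a softmax over sentence scores, and the KL-divergence form of the loss are all assumptions made for illustration, and `unigram_f1` is a simplified stand-in for a real sentence-level ROUGE score.

```python
import math
import random
from collections import Counter


def shuffle_sentences(sentences, seed=None):
    """Return a copy of the document with its sentence order randomized.

    Illustrates the 'unbiased' pretraining data: position no longer
    carries any information about importance.
    """
    rng = random.Random(seed)
    shuffled = list(sentences)  # copy, so the input is left untouched
    rng.shuffle(shuffled)
    return shuffled


def unigram_f1(candidate, reference):
    """Simplified stand-in for ROUGE-1 F1 (assumption): clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def auxiliary_loss(model_scores, sentences, reference_summary):
    """KL(target || predicted) over sentences (assumed form of the loss).

    target:    softmax of each sentence's overlap with the reference summary
    predicted: softmax of the model's per-sentence importance scores
    """
    target = softmax([unigram_f1(s, reference_summary) for s in sentences])
    predicted = softmax(model_scores)
    return sum(t * math.log(t / p) for t, p in zip(target, predicted))
```

Under these assumptions, a model that puts all its score on the first sentence is penalized more than one that scores the sentence overlapping most with the reference summary, regardless of where that sentence sits in the document. In a real system this term would be added to the main training objective with a weighting coefficient.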