The Effect of Pretraining on Extractive Summarization for Scientific Documents

Authors: Shikha Bordia, Pawan Sasanka Ammanamanchi, Maneesh Singh, Ramakanth Pasunuru, Manish Shrivastava, Mohit Bansal, Arjun Manoharan, Deepak Mittal, Preethi Jyothi, Yash Kumar Gupta
Year of publication: 2021
Source: Proceedings of the Second Workshop on Scholarly Document Processing
DOI: 10.18653/v1/2021.sdp-1.9
Description: Large pretrained models have seen enormous success in extractive summarization tasks. In this work, we investigate the influence of pretraining on a BERT-based extractive summarization system for scientific documents. We derive significant performance improvements using an intermediate pretraining step that leverages existing summarization datasets, and we report state-of-the-art results on a recently released scientific summarization dataset, SciTLDR. We systematically analyze the intermediate pretraining step by varying the size and domain of the pretraining corpus, changing the length of the input sequence in the target task, and varying the target tasks themselves. We also investigate how intermediate pretraining interacts with contextualized word embeddings trained on different domains.
Database: OpenAIRE
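
Below is a minimal sketch (PyTorch with Hugging Face transformers, not the authors' code) of the two-stage recipe the description outlines: a BERT-based extractive summarizer is first fine-tuned on an existing summarization corpus (the intermediate pretraining step), then fine-tuned again on the target dataset, SciTLDR. The BERTSUM-style sentence-scoring head, the choice of CNN/DailyMail as the intermediate corpus, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast


class ExtractiveScorer(nn.Module):
    """Scores each sentence of a document for inclusion in the extract.

    Assumption: sentences are marked with [CLS] tokens (BERTSUM-style),
    and the encoder's hidden state at each [CLS] position feeds a binary
    classification head.
    """

    def __init__(self, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(encoder_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                          # (batch, seq_len, hidden)
        batch_idx = torch.arange(hidden.size(0)).unsqueeze(1)
        sent_vecs = hidden[batch_idx, cls_positions]  # (batch, n_sents, hidden)
        return self.classifier(sent_vecs).squeeze(-1)  # one logit per sentence


def encode_document(tokenizer, sentences, max_len=512):
    """Flatten sentences into one token sequence, recording [CLS] positions.

    Truncation to max_len is where the input-sequence-length ablation the
    description mentions would be controlled.
    """
    ids, cls_positions = [], []
    for sent in sentences:
        cls_positions.append(len(ids))
        ids.append(tokenizer.cls_token_id)
        ids.extend(tokenizer.encode(sent, add_special_tokens=False))
        ids.append(tokenizer.sep_token_id)
    ids = ids[:max_len]
    cls_positions = [p for p in cls_positions if p < max_len]
    return ids, cls_positions


def fine_tune(model, batches, epochs=1, lr=2e-5):
    """One shared loop, run once per stage (intermediate, then target)."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    model.train()
    for _ in range(epochs):
        for input_ids, attention_mask, cls_positions, labels in batches:
            logits = model(input_ids, attention_mask, cls_positions)
            loss = loss_fn(logits, labels.float())
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()


# Hypothetical usage: `cnn_dm_batches` and `scitldr_batches` stand in for
# pre-collated, oracle-labeled batches built with encode_document.
#   tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
#   model = ExtractiveScorer()
#   fine_tune(model, cnn_dm_batches)   # stage 1: intermediate pretraining
#   fine_tune(model, scitldr_batches)  # stage 2: target-task fine-tuning
```

Under these assumptions, the ablations the description lists map onto single knobs in the sketch: which corpus feeds stage 1 and how much of it is used (size and domain of the pretraining corpus), the max_len truncation (input sequence length), the stage-2 dataset (target task), and the encoder_name, e.g. a domain-specific BERT variant (contextualized embeddings trained on different domains).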