GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

Author:	Liu, Ran; Liu, Ming; Yu, Min; Jiang, Jianguo; Li, Gang; Zhang, Dan; Li, Jingyuan; Meng, Xiang; Huang, Weiqing
Year of publication:	2024
Subject:	Computer Science - Computation and Language
Document type:	Working Paper
Description:	Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach. It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts, thereby improving intra-cluster correlation and the fluency of generated sentences. Finally, it summarizes clusters into natural sentences. Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches. Furthermore, it surpasses state-of-the-art pre-trained multi-document summarization models (e.g., PEGASUS and PRIMERA) under zero-shot settings in terms of ROUGE scores. Additionally, human evaluations indicate that summaries generated by GLIMMER achieve high readability and informativeness scores. Our code is available at https://github.com/Oswald1997/GLIMMER.
Comment:	19 pages, 7 figures. Accepted by ECAI 2024
Database:	arXiv
External link:	http://arxiv.org/abs/2408.10115
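The description outlines a pipeline that builds a sentence graph and then groups sentences into clusters using lexical features. The following is a generic illustration of that idea only, not GLIMMER's actual algorithm: it assumes Jaccard word overlap as the lexical feature, a hypothetical similarity threshold of 0.3 and a toy stopword list, and takes connected components of the graph as clusters. The final step of turning each cluster into a natural sentence is not shown; the authors' real implementation is in the repository cited in the description.

```python
import re
from itertools import combinations

# Hypothetical toy stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for"}


def tokens(sentence: str) -> set[str]:
    """Lowercased content words of a sentence, used as a simple lexical feature."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity in [0, 1]; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_sentence_graph(sentences: list[str], threshold: float = 0.3) -> dict[int, set[int]]:
    """Adjacency sets: link two sentences when their lexical overlap reaches the threshold."""
    toks = [tokens(s) for s in sentences]
    graph = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def clusters(graph: dict[int, set[int]]) -> list[list[int]]:
    """Connected components of the sentence graph, found by iterative depth-first search."""
    seen: set[int] = set()
    result: list[list[int]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(sorted(component))
    return result


# Toy input: sentences pooled from several source documents.
docs = [
    "The storm hit the coast on Monday.",
    "A storm hit the northern coast Monday night.",
    "Officials reported heavy rain and flooding.",
    "Flooding followed the heavy rain, officials said.",
]
print(clusters(build_sentence_graph(docs)))  # [[0, 1], [2, 3]]
```

Here the two storm sentences share four of six content words and the two flooding sentences share four of seven, so each pair is linked while no edge crosses the topics. Connected components are the simplest possible clustering choice for such a graph; a method that mines richer features would replace both the similarity function and the grouping step.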