GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

Author: Liu, Ran, Liu, Ming, Yu, Min, Jiang, Jianguo, Li, Gang, Zhang, Dan, Li, Jingyuan, Meng, Xiang, Huang, Weiqing
Year of publication: 2024
Subject:
Document type: Working Paper
Description: Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach. It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts, thereby improving intra-cluster correlation and the fluency of generated sentences. Finally, it summarizes the clusters into natural sentences. Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches. Furthermore, it surpasses state-of-the-art pre-trained multi-document summarization models (e.g., PEGASUS and PRIMERA) under zero-shot settings in terms of ROUGE scores. Additionally, human evaluations indicate that summaries generated by GLIMMER achieve high readability and informativeness scores. Our code is available at https://github.com/Oswald1997/GLIMMER.
Comment: 19 pages, 7 figures. Accepted by ECAI 2024
Database: arXiv
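The description names the first two stages of the pipeline (build a sentence graph, then find semantic clusters from low-level features) but does not say how either is done. As a rough illustration only, the sketch below assumes word-level Jaccard overlap as the edge criterion and plain connected components as the clustering step. These choices, the 0.2 threshold, and all function names are hypothetical stand-ins, not GLIMMER's actual algorithm; the linked repository is the authoritative source.

```python
import re
from itertools import combinations


def tokenize(sentence: str) -> set[str]:
    # Hypothetical "low-level lexical feature": the set of lowercase word tokens.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def build_sentence_graph(sentences: list[str], threshold: float = 0.2) -> dict[int, set[int]]:
    # Undirected graph as an adjacency list: sentence index -> neighbor indices.
    # An edge joins two sentences whose Jaccard token overlap is >= threshold.
    tokens = [tokenize(s) for s in sentences]
    graph: dict[int, set[int]] = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        union = tokens[i] | tokens[j]
        if not union:
            continue  # two empty sentences: no basis for an edge
        jaccard = len(tokens[i] & tokens[j]) / len(union)
        if jaccard >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_components(graph: dict[int, set[int]]) -> list[list[int]]:
    # Iterative depth-first search; each component is one candidate cluster.
    seen: set[int] = set()
    clusters: list[list[int]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


sentences = [
    "The storm hit the coast on Monday.",
    "On Monday the storm hit the northern coast.",
    "Officials announced new tax rules.",
    "New tax rules were announced by officials today.",
]
clusters = connected_components(build_sentence_graph(sentences))
print(clusters)  # [[0, 1], [2, 3]]
```

Connected components over a thresholded graph is about the simplest grouping one can write; a practical system would more plausibly weight the edges and use a dedicated community-detection or clustering method, and would still need the final step of fusing each cluster into a natural sentence, which this sketch does not attempt.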