Optimizing word set coverage for multi-event summarization

Authors: Wenliang Cheng, Aoying Zhou, Jihong Yan, Chengyu Wang, Jun Liu, Ming Gao
Year of publication: 2015
Subject:
Source: Journal of Combinatorial Optimization. 30:996-1015
ISSN: 1573-2886
1382-6905
DOI: 10.1007/s10878-015-9855-0
Description: We have witnessed the proliferation of the Internet over the past few decades, and a large amount of textual information is now generated on the Web. It is impossible for individuals to locate and digest all the latest updates available there. Text summarization provides an efficient way to generate short, concise abstracts from massive document collections. These collections involve many events, which are hard for a summarization procedure to identify directly. We propose a novel methodology that identifies events in these text corpora and creates a summary for each event. We employ a probabilistic topic model to learn the latent topics of the documents and then discover events from the topic distributions of the documents. To target the summarization, we define the word set coverage problem (WSCP), which captures the most representative sentences for summarizing an event. To solve the WSCP, we propose an approximation algorithm for this optimization problem. We conduct a set of experiments to evaluate the proposed approach on two real datasets: Sina news and Johnson & Johnson medical news. On both datasets, the proposed method outperforms competitive baselines as measured by the harmonic mean of coverage and conciseness.
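The abstract does not spell out the WSCP formulation or the approximation algorithm, so the following is only a minimal sketch of the general idea, assuming a standard greedy set-cover heuristic: repeatedly pick the sentence that covers the most still-uncovered target words. The names `tokenize`, `greedy_cover`, and `harmonic_mean` are hypothetical, and the paper's actual definitions of coverage and conciseness may differ from anything suggested here.

```python
from typing import List, Set


def tokenize(sentence: str) -> Set[str]:
    """Lowercase a sentence and split it into a set of words (naive, assumption)."""
    words = {w.strip(".,;:!?\"'()").lower() for w in sentence.split()}
    return words - {""}


def greedy_cover(sentences: List[str], target_words: Set[str],
                 max_sentences: int) -> List[int]:
    """Greedy set-cover heuristic (an assumption, not the paper's algorithm).

    Returns the indices of the chosen sentences in the order they were picked.
    """
    word_sets = [tokenize(s) for s in sentences]
    uncovered = set(target_words)
    chosen: List[int] = []
    while uncovered and len(chosen) < max_sentences:
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        if not candidates:
            break
        # Pick the sentence that covers the most still-uncovered words.
        best = max(candidates, key=lambda i: len(word_sets[i] & uncovered))
        if not word_sets[best] & uncovered:
            break  # no remaining sentence adds any coverage
        chosen.append(best)
        uncovered -= word_sets[best]
    return chosen


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores, e.g. a coverage and a conciseness score."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


example_sentences = [
    "The quake struck the coast",
    "Rescue teams reached the coast",
    "Markets fell",
]
example_targets = {"quake", "coast", "rescue", "teams"}
example_pick = greedy_cover(example_sentences, example_targets, 3)  # [1, 0]
```

The greedy rule is a common choice for coverage-style objectives because it is simple and carries a known approximation guarantee for classical set cover; whether the same guarantee applies to the WSCP as defined in the paper is not stated in the abstract.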
Database: OpenAIRE