Optimizing word set coverage for multi-event summarization
Authors: | Wenliang Cheng, Aoying Zhou, Jihong Yan, Chengyu Wang, Jun Liu, Ming Gao |
---|---|
Year of publication: | 2015 |
Subject: | Text corpus; Topic model; Control and Optimization; Information retrieval; Computer science; Event (computing); business.industry; Applied Mathematics; Probabilistic logic; Automatic summarization; Computer Science Applications; Set (abstract data type); Computational Theory and Mathematics; Multi-document summarization; Discrete Mathematics and Combinatorics; The Internet; business |
Source: | Journal of Combinatorial Optimization. 30:996-1015 |
ISSN: | 1573-2886, 1382-6905 |
DOI: | 10.1007/s10878-015-9855-0 |
Description: | The Internet has proliferated over the past few decades, and a large amount of textual information is generated on the Web. Individuals cannot locate and digest all the latest updates available there. Text summarization offers an efficient way to generate short, concise abstracts from massive document collections. These collections involve many events, which are hard for a summarization procedure to identify directly. We propose a novel methodology that identifies events in these text corpora and creates a summary for each event. We employ a probabilistic topic model to learn the latent topics of the documents and then discover events from the documents' topic distributions. For summarization, we define the word set coverage problem (WSCP), which captures the most representative sentences for summarizing an event, and we propose an approximate algorithm to solve this optimization problem. We evaluate the approach experimentally on two real datasets: Sina news and Johnson & Johnson medical news. On both datasets, the proposed method outperforms competitive baselines as measured by the harmonic mean of coverage and conciseness. |
Database: | OpenAIRE |
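
The abstract describes selecting representative sentences so that they cover a set of words, but this record does not give the formal definition of the WSCP or the authors' algorithm. The following is only a minimal sketch of a generic greedy coverage heuristic under assumed inputs: the function name `greedy_word_cover`, the whitespace tokenization, and the `max_sentences` budget are all hypothetical choices made for illustration, not details taken from the paper.

```python
def greedy_word_cover(sentences, target_words, max_sentences):
    """Greedily pick sentence indices that cover as many target words as possible.

    Assumption: this is a generic greedy heuristic, not the paper's algorithm.
    Returns (chosen_indices, still_uncovered_words).
    """
    uncovered = set(target_words)
    # Naive lowercase whitespace tokenization (an assumption for this sketch).
    token_sets = [set(s.lower().split()) for s in sentences]
    chosen = []
    while uncovered and len(chosen) < max_sentences:
        # Pick the sentence that covers the most still-uncovered words.
        best = max(range(len(sentences)),
                   key=lambda i: len(token_sets[i] & uncovered))
        gain = token_sets[best] & uncovered
        if not gain:
            break  # No remaining sentence adds coverage.
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered
```

Under these assumptions, the target word set for an event might come from the top words of its topic distribution, and the sentence budget trades coverage against conciseness; how the paper actually balances the two is not stated in this record.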