Improving Multi-Document Summary Method Based on Sentence Distribution
Author: | Diana Purwitasari, Agus Zainal Arifin, Aminul Wahib |
---|---|
Year of publication: | 2016 |
Subject: | Balanced sentence; Computer science; Automatic summarization; Weighting; Similarity (network science); Histogram; Selection (linguistics); Artificial intelligence; Electrical and Electronic Engineering; Cluster analysis; Natural language processing; Sentence |
Source: | TELKOMNIKA (Telecommunication Computing Electronics and Control). Vol. 14, p. 286 |
ISSN: | 2302-9293 1693-6930 |
DOI: | 10.12928/telkomnika.v14i1.2330 |
Description: | Methods for automatic multi-document summarization have been developed by many researchers. The method used to select sentences from the source documents determines the quality of the resulting summary. One of the most popular ways to weight sentences is to count how often the words forming each sentence occur. However, selecting sentences this way can yield sentences that do not represent the content of the source documents optimally, because a sentence's weight is measured only by the number of occurrences of its words. This study proposes a new sentence-weighting strategy based on sentence distribution, which selects the most important sentences by paying close attention to the elements of each sentence, treated as a distribution of words. This sentence-distribution method enables the extraction of important sentences in multi-document summarization and serves as a strategy to improve summary quality. Three concepts are used in this study: (1) clustering sentences with similarity-based histogram clustering, (2) ordering clusters by cluster importance, and (3) selecting important sentences by sentence distribution. Experimental results show that the proposed method performs better than the SIDeKiCK and LIGI methods. On ROUGE-1, the proposed method scores 3% higher than SIDeKiCK and 5.1% higher than LIGI. On ROUGE-2, it scores 13.7% higher than SIDeKiCK and 14.4% higher than LIGI. |
Database: | OpenAIRE |
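The three stages named in the description (clustering, cluster ordering, sentence selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The clustering step follows the general similarity-histogram idea: a sentence joins a cluster only if the cluster's share of highly similar sentence pairs does not drop too far. The cluster-importance and sentence-distribution scores are hypothetical stand-ins, because the record gives no formulas. All function names and the thresholds `sim_threshold`, `hr_min`, `epsilon`, and `min_freq` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased alphanumeric tokens; a real system would also drop stopwords and stem.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def histogram_ratio(vectors, sim_threshold):
    # Fraction of pairwise similarities at or above sim_threshold.
    # Assumption: clusters with fewer than two members count as fully coherent.
    n = len(vectors)
    if n < 2:
        return 1.0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    good = sum(1 for i, j in pairs if cosine(vectors[i], vectors[j]) >= sim_threshold)
    return good / len(pairs)


def shc_cluster(vectors, sim_threshold=0.3, hr_min=0.5, epsilon=0.1):
    # Incremental histogram-based clustering: a sentence joins the first cluster whose
    # histogram ratio does not degrade too much; otherwise it starts a new cluster.
    clusters = []  # each cluster is a list of sentence indices
    for idx, vec in enumerate(vectors):
        for cluster in clusters:
            members = [vectors[i] for i in cluster]
            hr_old = histogram_ratio(members, sim_threshold)
            hr_new = histogram_ratio(members + [vec], sim_threshold)
            if hr_new >= hr_old or (hr_new > hr_min and hr_old - hr_new < epsilon):
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def cluster_importance(cluster, vectors, global_freq, min_freq=2):
    # Hypothetical score: sum of global frequencies of the "frequent" words in the cluster.
    words = set()
    for i in cluster:
        words.update(vectors[i])
    return sum(global_freq[w] for w in words if global_freq[w] >= min_freq)


def distribution_score(vec, cluster_dist, cluster_total):
    # Hypothetical sentence-distribution score: mean probability of the sentence's
    # distinct words under the cluster's word distribution.
    if not vec:
        return 0.0
    return sum(cluster_dist[w] / cluster_total for w in vec) / len(vec)


def summarize(sentences, max_sentences=3):
    vectors = [Counter(tokenize(s)) for s in sentences]
    global_freq = Counter()
    for v in vectors:
        global_freq.update(v)
    # (1) cluster sentences, (2) order clusters by importance, (3) pick one sentence per cluster.
    clusters = shc_cluster(vectors)
    clusters.sort(key=lambda c: cluster_importance(c, vectors, global_freq), reverse=True)
    summary = []
    for cluster in clusters:
        if len(summary) >= max_sentences:
            break
        dist = Counter()
        for i in cluster:
            dist.update(vectors[i])
        total = sum(dist.values())
        best = max(cluster, key=lambda i: distribution_score(vectors[i], dist, total))
        summary.append(sentences[best])
    return summary


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat.",
        "A cat was sitting on the mat.",
        "Stock markets fell sharply today.",
        "Markets fell as stock prices dropped.",
    ]
    print(summarize(docs, max_sentences=2))
```

With the four sample sentences, the sketch puts the two sentences about the cat and the two about the markets into separate clusters and returns one sentence from each. Taking one sentence per cluster, in cluster-importance order, is one simple way to keep near-duplicate sentences from different source documents out of the same summary.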