An extractive text summarization technique for Bengali document(s) using K-means clustering algorithm

Authors: Md. Palash Uddin, Aysa Siddika Asa, Masud Ibn Afjal, Md. Delowar Hossain, Shikhor Kumer Roy, Sumya Akter
Year of publication: 2017
Subject:
Source: 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR).
DOI: 10.1109/icivpr.2017.7890883
Description: Text summarization, a field of data mining, is important for many real-life applications. Many techniques have been developed for summarizing English text, but only a few attempts have been made for Bengali text because of its multifaceted structure. This paper presents a text summarization method that extracts important sentences from one or more Bengali documents. The input documents are first pre-processed by tokenization, stemming, and similar operations. Word scores are then calculated by Term Frequency/Inverse Document Frequency (TF/IDF), and each sentence's score is determined by summing the scores of its constituent words together with its position. Cue and skeleton words are also considered when calculating the sentence score. For both single and multiple documents, the K-means clustering algorithm is applied to produce the final summary. The experimental results show satisfactory output in comparison with existing approaches, with linear run-time complexity.
Database: OpenAIRE
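
Illustration: the pipeline named in the abstract (tokenization, TF/IDF word scores, position-aware sentence scores, K-means selection) can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. Every function name and parameter here (`split_sentences`, `sentence_scores`, `kmeans_1d`, `summarize`, `position_weight`) is hypothetical, and several choices are assumptions the abstract does not confirm: each sentence is treated as a "document" for the IDF term, the position bonus is `position_weight / (index + 1)`, K-means runs on the one-dimensional sentence scores with k = 2, and the cluster with the highest centroid becomes the summary. Stemming and cue/skeleton words are not modeled.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Split on the Bengali danda (।) as well as '.', '?' and '!'.
    parts = re.split(r"[।.?!]+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_scores(sentences, position_weight=1.0):
    # Whitespace tokenization only; a real system would also stem.
    tokenized = [s.split() for s in sentences]
    n = len(tokenized)
    # Assumption: each sentence counts as one "document" for IDF.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for index, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        # Smoothed IDF so that words found in every sentence keep a weight.
        word_sum = sum(
            count * (math.log((1 + n) / (1 + df[word])) + 1.0)
            for word, count in tf.items()
        )
        # Assumption: earlier sentences receive a larger position bonus.
        scores.append(word_sum + position_weight / (index + 1))
    return scores


def kmeans_1d(values, k=2, iterations=100):
    # Plain K-means on scalars; requires k >= 2.
    # Deterministic start: centroids spread evenly between min and max.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def summarize(text, k=2):
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return sentences
    scores = sentence_scores(sentences)
    labels, centroids = kmeans_1d(scores, k=k)
    best = max(range(k), key=lambda c: centroids[c])
    # Keep the selected sentences in their original order.
    return [s for s, lab in zip(sentences, labels) if lab == best]
```

Calling `summarize(text)` on a Bengali passage returns the sentences that fall in the highest-scoring cluster. Because the clustering is done on scalar scores with a fixed iteration cap, each pass is linear in the number of sentences, which is one plausible reading of the run-time claim in the abstract.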