An extractive text summarization technique for Bengali document(s) using K-means clustering algorithm
Author: | Md. Palash Uddin, Aysa Siddika Asa, Masud Ibn Afjal, Md. Delowar Hossain, Shikhor Kumer Roy, Sumya Akter |
---|---|
Year of publication: | 2017 |
Subject: | Computer science, Speech recognition, Lexical analysis, k-means clustering, Automatic summarization, Bengali, Text mining, Artificial intelligence, tf–idf, Cluster analysis, Natural language processing, Sentence |
Source: | 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR). |
DOI: | 10.1109/icivpr.2017.7890883 |
Description: | Text summarization, a field of data mining, is important for developing many real-life applications. Many techniques have been developed for summarizing English text, but only a few attempts have been made for Bengali text because of its multifaceted structure. This paper presents a text summarization method that extracts important sentences from a single Bengali document or from multiple Bengali documents. The input document(s) are first pre-processed by tokenization, stemming, and similar operations. Word scores are then calculated with Term Frequency–Inverse Document Frequency (TF-IDF), and each sentence's score is determined by summing the scores of its constituent words together with a score for its position. Cue words and skeleton words are also considered when calculating the sentence score. For both single and multiple documents, the K-means clustering algorithm is applied to produce the final summary. Experimental results show satisfactory output in comparison with existing approaches, with linear run-time complexity. |
Database: | OpenAIRE |