An extractive text summarization technique for Bengali document(s) using K-means clustering algorithm

Authors:	Md. Palash Uddin, Aysa Siddika Asa, Masud Ibn Afjal, Md. Delowar Hossain, Shikhor Kumer Roy, Sumya Akter
Year of publication:	2017
Subject:	Computer science; Speech recognition; Lexical analysis; k-means clustering; Automatic summarization; Bengali; Text mining; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Artificial intelligence; tf–idf; Cluster analysis; Natural language processing; Sentence
Source:	2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR).
DOI:	10.1109/icivpr.2017.7890883
Description:	Text summarization, a field of data mining, is important for developing various real-life applications. Many techniques have been developed for summarizing English text, but few attempts have been made for Bengali text because of its multifaceted structure. This paper presents a text summarization method that extracts important sentences from one or more Bengali documents. The input documents are first pre-processed by tokenization, stemming, and related operations. Word scores are then calculated by Term Frequency/Inverse Document Frequency (TF/IDF), and each sentence's score is determined by summing the scores of its constituent words together with its position. Cue and skeleton words are also considered in calculating the sentence score. For both single and multiple documents, the K-means clustering algorithm is applied to produce the final summary. The experimental results show satisfactory output in comparison with existing approaches, with linear run-time complexity.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::c0a05ae21aba6bcb387f27eeeee70bd4 https://doi.org/10.1109/icivpr.2017.7890883
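
The pipeline outlined in the abstract above (tokenize, score words by TF/IDF, score sentences by word scores plus position, then cluster with K-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their formulas, so the smoothed IDF, the `position_weight / (i + 1)` bonus, the `cue_words` and `cue_bonus` parameters, the one-dimensional K-means over sentence scores, and the rule of keeping the highest-scoring cluster are all assumptions made for illustration. Stemming and skeleton words are omitted.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Bengali sentences usually end with the danda (U+0964); '.', '?', '!' are accepted too.
    return [s.strip() for s in re.split(r"[\u0964.?!]+", text) if s.strip()]


def tokenize(sentence):
    # Plain whitespace/punctuation split; a real system would also stem each word.
    return [w for w in re.split(r"[\s,;:!?\u0964.]+", sentence.lower()) if w]


def sentence_scores(sentences, position_weight=1.0, cue_words=(), cue_bonus=0.5):
    # Assumption: each sentence acts as a "document" when computing IDF.
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    tf = Counter(w for t in tokens for w in t)
    total = sum(tf.values())
    df = Counter(w for t in tokens for w in set(t))
    word_score = {
        w: (tf[w] / total) * (math.log((1 + n) / (1 + df[w])) + 1) for w in tf
    }
    cues = set(cue_words)
    scores = []
    for i, t in enumerate(tokens):
        score = sum(word_score[w] for w in t)      # sum of constituent word scores
        score += position_weight / (i + 1)         # assumed bonus: earlier is better
        score += cue_bonus * sum(1 for w in t if w in cues)  # assumed cue-word bonus
        scores.append(score)
    return scores


def kmeans_1d(values, k, iterations=100):
    # K-means on scalar values with a deterministic, evenly spread initialisation.
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]

    def assign(cs):
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(iterations):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(centroids), centroids


def summarize(text, k=2):
    # Assumption: the summary is the cluster with the highest mean sentence score,
    # returned in original sentence order.
    sentences = split_sentences(text)
    if not sentences:
        return []
    k = max(1, min(k, len(sentences)))
    labels, centroids = kmeans_1d(sentence_scores(sentences), k)
    best = max(range(k), key=lambda c: centroids[c])
    return [s for s, lab in zip(sentences, labels) if lab == best]
```

Calling `summarize(text, k=2)` on the text of one document, or on several documents concatenated, returns the selected sentences as a list. A larger `k` generally yields a shorter summary, because only the top-scoring cluster is kept.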