Classification of compressed and uncompressed text documents

Authors:	S. N. Bharath Bhushan, Ajit Danti
Year of publication:	2018
Subject:	Computer Networks and Communications; Computer science; Sentiment analysis; Closeness; 02 engineering and technology; Similarity measure; 01 natural sciences; Measure (mathematics); Text mining; Similarity (network science); Hardware and Architecture; 0103 physical sciences; Metric (mathematics); 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; Data mining; 010306 general physics; Cluster analysis; Software
Source:	Future Generation Computer Systems, vol. 88, pp. 614-623
ISSN:	0167-739X
DOI:	10.1016/j.future.2018.04.054
Description:	Computing the degree of closeness (similarity) between two sets of text documents is a core operation in many text mining applications, such as text classification, clustering and sentiment analysis. The efficiency of such applications depends mainly on three factors: the choice of representation model, the choice of similarity metric and the choice of learning algorithm. Of these, the choice of similarity measure is especially important, since it affects the efficiency of most text mining applications. In this article, an efficient similarity measure is proposed for computing the closeness between two sets of text documents. The proposed measure can take into account different real-world situations, such as the presence or absence of a feature, when computing the degree of similarity between documents. In addition, a similarity measure based on compression modeling is proposed for text documents. Two sets of experiments are conducted to validate the efficacy of the proposed similarity measures. Experimental results show that the proposed similarity metric achieves a higher F-measure score than existing state-of-the-art techniques.
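Note on compression-based similarity (illustrative only): the general idea behind measuring text similarity through compression can be sketched with the well-known Normalized Compression Distance (NCD) of Cilibrasi and Vitányi. It is not the measure proposed in this article. The sketch assumes zlib as the compressor, and the function names ncd and ncd_similarity are hypothetical names chosen for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data, used as a proxy for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance (hypothetical helper name).

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    Values near 0 suggest similar texts; values near 1 suggest unrelated texts.
    With a real compressor the result can fall slightly outside [0, 1].
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_similarity(x: str, y: str) -> float:
    """Turn the distance into a similarity score (assumption: 1 - NCD)."""
    return 1.0 - ncd(x, y)


if __name__ == "__main__":
    a = "text mining uses similarity measures to compare documents " * 5
    b = "text mining uses similarity measures to cluster documents " * 5
    c = "the quick brown fox jumps over the lazy dog near the river bank " * 5
    print(f"NCD(a, b) = {ncd(a, b):.3f}")  # related texts: smaller distance
    print(f"NCD(a, c) = {ncd(a, c):.3f}")  # unrelated texts: larger distance
```

The idea is that a compressor can reuse patterns from the first text when compressing the concatenation, so related documents add little to the compressed size. The choice of compressor (zlib here) strongly influences the scores.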
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::3afa5479a225a0e8939998ceb50408e3 https://doi.org/10.1016/j.future.2018.04.054