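Identifikasi Kemiripan Teks Menggunakan Class Indexing Based dan Cosine Similarity Untuk Klasifikasi Dokumen Pengaduan [Text Similarity Identification Using Class Indexing Based and Cosine Similarity for Complaint Document Classification]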

Authors:	Muhammad Aziz Muslim, Harry Soekotjo Dachlan, Syahroni Wahyu Iriananda
Publication year:	2019
Subject:	Data flow diagram; Similarity (network science); Computer science; Search engine indexing; Cosine similarity; Feature (machine learning); Pattern recognition; Artificial intelligence; Test data; Weighting; k-nearest neighbors algorithm
Source:	MATICS. 10:30
ISSN:	2477-2550 1978-161X
DOI:	10.18860/mat.v10i2.5327
Description:	Report handling in the "LAPOR!" system depends on a system administrator who manually reads every incoming report [3]. Manual reading can lead to errors in handling complaints [4]; when the data flow is very large and grows rapidly, handling can take at least three days and is prone to inconsistencies [3]. In this study, the authors propose a computerized model that measures the similarity between a Query (incoming report) and a Document (archived report). The authors employ the Class-Based Indexing term weighting scheme together with Cosine Similarity to analyze document similarity. The CoSimTFIDF, CoSimTFICF, and CoSimTFIDFICF values are used as feature sets for text classification with the K-Nearest Neighbor (K-NN) method. The best evaluation results were obtained when preprocessing included stemming. Across all features, the best result, an accuracy of 84%, was achieved by the CoSimTFIDF feature with a 75% training and 25% test data split. With k = 5, accuracy reached 84.12%.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::890c1d899dcb987c47a63e5a904f8595 https://doi.org/10.18860/mat.v10i2.5327
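
The description in this record names three term weightings (TF-IDF, TF-ICF, TF-IDF-ICF), cosine similarity, and K-NN, but gives no formulas. The following is a minimal Python sketch of how these pieces could fit together, not the authors' implementation. The smoothed logarithmic forms of IDF and ICF, the function names, the toy archive, and the choice to rank archive documents by cosine similarity and take a majority vote among the top k are all assumptions made for illustration. Tokens are assumed to be already preprocessed (for example, stemmed).

```python
import math
from collections import Counter


def build_weights(docs, labels):
    """Compute per-term IDF and ICF (assumed smoothed log forms)."""
    n_docs = len(docs)
    n_classes = len(set(labels))
    doc_freq = Counter()   # number of documents containing each term
    term_classes = {}      # set of classes in which each term occurs
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            term_classes.setdefault(term, set()).add(label)
    idf = {t: 1.0 + math.log(n_docs / doc_freq[t]) for t in doc_freq}
    icf = {t: 1.0 + math.log(n_classes / len(term_classes[t])) for t in doc_freq}
    return idf, icf


def vectorize(tokens, idf, icf, scheme):
    """Turn a token list into a sparse weighted vector (dict term -> weight)."""
    vec = {}
    for term, count in Counter(tokens).items():
        if term not in idf:
            continue  # terms unseen in the archive carry no weight
        if scheme == "tfidf":
            weight = idf[term]
        elif scheme == "tficf":
            weight = icf[term]
        elif scheme == "tfidficf":
            weight = idf[term] * icf[term]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vec[term] = count * weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, docs, labels, scheme="tfidf", k=5):
    """Majority vote among the k archive documents most similar to the query."""
    idf, icf = build_weights(docs, labels)
    q_vec = vectorize(query, idf, icf, scheme)
    scored = [
        (cosine(q_vec, vectorize(tokens, idf, icf, scheme)), label)
        for tokens, label in zip(docs, labels)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy archive; the class names are invented for this example.
archive_docs = [
    ["road", "damage", "hole"],
    ["road", "flood", "drain"],
    ["bridge", "damage", "road"],
    ["teacher", "school", "fee"],
    ["school", "fee", "illegal"],
    ["school", "building", "damage"],
]
archive_labels = ["infrastructure"] * 3 + ["education"] * 3

if __name__ == "__main__":
    incoming = ["road", "hole", "damage"]
    for scheme in ("tfidf", "tficf", "tfidficf"):
        print(scheme, knn_classify(incoming, archive_docs, archive_labels, scheme, k=3))
```

In this sketch the ICF factor down-weights terms that occur in many classes (such as "damage" in the toy archive), which is the intuition behind class-based indexing; a real evaluation would also need a train/test split such as the 75%/25% ratio mentioned in the description.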