Text Similarity Identification Using Class Indexing Based and Cosine Similarity for the Classification of Complaint Documents

Authors: Muhammad Aziz Muslim, Harry Soekotjo Dachlan, Syahroni Wahyu Iriananda
Year of publication: 2019
Subject:
Source: MATICS. 10:30
ISSN: 2477-2550
1978-161X
DOI: 10.18860/mat.v10i2.5327
Description: Report handling in the "LAPOR!" system depends on the system administrator, who manually reads every incoming report [3]. Manual reading can lead to errors in handling complaints [4]; when the data flow is very large and grows rapidly, handling can take at least three days and is prone to inconsistencies [3]. In this study, the authors propose a computerized model that measures and identifies the similarity between an incoming report (Query) and archived reports (Document). The authors employ the Class-Based Indexing term weighting scheme and Cosine Similarity to analyze document similarity. The CoSimTFIDF, CoSimTFICF, and CoSimTFIDFICF values are defined as feature sets for text classification with the K-Nearest Neighbor (K-NN) method. The best evaluation results were obtained with preprocessing that uses stemming; across all features, the best result, 84%, was achieved by the CoSimTFIDF feature with a 75% training and 25% test data split. A value of k = 5 yields a high accuracy of 84.12%.
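The pipeline described in the abstract (term weighting, cosine similarity between an incoming query and archived documents, then K-NN) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting formulas, so the sketch assumes the common forms idf = log(N / df) and icf = log(C / cf), and it simplifies the classifier by ranking neighbors directly by cosine similarity rather than using the similarity scores as a feature set. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Assumed IDF form: log(N / df), with df = number of documents containing the term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def icf_weights(docs, labels):
    """Assumed ICF form: log(C / cf), with cf = number of classes containing the term."""
    classes = set(labels)
    cf = Counter()
    for cls in classes:
        cf.update({t for doc, lab in zip(docs, labels) if lab == cls for t in doc})
    return {term: math.log(len(classes) / count) for term, count in cf.items()}


def weigh(tokens, weights):
    """Build a sparse vector of tf * weight; terms unseen in the archive are dropped."""
    return {t: c * weights[t] for t, c in Counter(tokens).items() if t in weights}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query_vec, archive_vecs, labels, k=5):
    """Majority vote among the k archived documents most similar to the query."""
    ranked = sorted(
        range(len(archive_vecs)),
        key=lambda i: cosine(query_vec, archive_vecs[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy archive (hypothetical data, already tokenized and stemmed).
archive = [
    ["road", "damaged", "holes"],
    ["road", "flooded", "drain"],
    ["teacher", "absent", "school"],
    ["school", "fees", "expensive"],
]
archive_labels = ["infrastructure", "infrastructure", "education", "education"]

idf = idf_weights(archive)
archive_vecs = [weigh(doc, idf) for doc in archive]
query_vec = weigh(["road", "holes", "repair"], idf)
predicted = knn_classify(query_vec, archive_vecs, archive_labels, k=3)
```

Passing `icf_weights(archive, archive_labels)` to `weigh` instead of `idf` gives a TF-ICF variant under the same assumptions, and multiplying the two weights per term gives a TF-IDF-ICF variant.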
Database: OpenAIRE