Çizge Benzerliği Yöntemi ile Doküman Sınıflandırma (Document Classification with the Graph Similarity Method)
Authors: | Taner Uçkan, Faruk Ayata, Ali Karci, Cengiz Hark, Ebubekir Seyyarer |
---|---|
Year of publication: | 2018 |
Subject: | 0209 industrial biotechnology; Jaccard index; business.industry; Computer science; Cosine similarity; Hamming distance; 02 engineering and technology; computer.software_genre; Variety (linguistics); 020901 industrial engineering & automation; Similarity (network science); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Word (computer architecture); Natural language processing; Sentence |
Source: | 2018 International Conference on Artificial Intelligence and Data Processing (IDAP). |
DOI: | 10.1109/idap.2018.8620926 |
Description: | Document classification is among the most extensively studied topics today. Text similarity is used in many areas, such as checking whether quoted passages appear elsewhere and returning search-engine results quickly and accurately. A variety of methods are used to find similarities between documents. When several documents are compared, similarity is measured by two basic approaches: word-based and sentence-based. Word-based similarity measurement relies on measures such as the Jaccard index, the Dice coefficient, and cosine similarity. In this study, the paragraphs of different documents are split into sentences and represented as graphs, and the documents are classified using Hamming distances computed by XORing the adjacency matrices obtained from these documents. |
Database: | OpenAIRE |
External link: |
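
The description mentions comparing documents by XORing their adjacency matrices and taking the Hamming distance. The record does not say how the graphs are built, so the following is only a minimal sketch under stated assumptions: nodes are words from a vocabulary shared by both documents, and an edge links two words that occur in the same sentence. The names `sentence_graph` and `hamming_distance` are hypothetical and are not taken from the paper.

```python
import re
from itertools import combinations


def sentence_graph(text, vocab):
    """Build a symmetric 0/1 adjacency matrix over `vocab`.

    Assumption (not from the paper): two words are adjacent when
    they appear together in at least one sentence of `text`.
    """
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    adj = [[0] * n for _ in range(n)]
    # Naive sentence split on ., ! and ? for illustration only.
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sorted({w for w in re.findall(r"\w+", sentence) if w in index})
        for a, b in combinations(words, 2):
            adj[index[a]][index[b]] = 1
            adj[index[b]][index[a]] = 1
    return adj


def hamming_distance(adj_a, adj_b):
    """Count the matrix entries that differ, i.e. the ones in A XOR B."""
    return sum(x ^ y for row_a, row_b in zip(adj_a, adj_b)
               for x, y in zip(row_a, row_b))


doc_a = "Graphs model text. Text has sentences."
doc_b = "Graphs model text. Sentences have words."

# Shared vocabulary so both matrices have the same shape and node order.
vocab = sorted(set(re.findall(r"\w+", (doc_a + " " + doc_b).lower())))

graph_a = sentence_graph(doc_a, vocab)
graph_b = sentence_graph(doc_b, vocab)
distance = hamming_distance(graph_a, graph_b)
print(distance)
```

Under these assumptions a smaller distance means the two sentence graphs share more edges; because the matrices are symmetric, each differing edge is counted twice.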