OCR for historical Kannada documents using clustering methods
Author: | Y H Sharathkumar, P. Ravi, C. Naveena |
---|---|
Year of publication: | 2020 |
Subject: | Multidisciplinary; Point (typography), Computer science, k-means clustering, Scale-invariant feature transform, Pattern recognition, Optical character recognition, Hierarchical clustering, Data set, Similarity (network science), Artificial intelligence, Cluster analysis |
Source: | Indian Journal of Science and Technology. 13:3652-3663 |
ISSN: | 0974-5645, 0974-6846 |
DOI: | 10.17485/ijst/v13i35.1287 |
Description: | Motivation: Kannada is an ancient language and the official language of Karnataka State, India. Studying ancient Kannada scripts preserved on stone carvings, leaves, metal, cloth, paper and other media deepens our knowledge of the traditions and culture practiced in Karnataka. Because of their poor quality, variability and low contrast, extracting information from these ancient scripts and recognizing their characters is very challenging. Objectives: To design a suitable Optical Character Recognition (OCR) technique for reading ancient Kannada scripts. Method: Clustering by fast search and find of density peaks is a state-of-the-art density-based clustering algorithm that can effectively find clusters of arbitrary shape. However, it must calculate the distances between all pairs of points in a data set to determine the density and separation of each point, so its computational cost is extremely high for large-scale data sets. In this work, the input document is first preprocessed; features such as SIFT and SURF are then extracted and clustered using K-means clustering, and similarity is computed using different measures. Findings: Classification accuracy was studied under different clustering methods (K-means, agglomerative and density-based clustering) combined with distance measures such as Euclidean and Manhattan. To evaluate the performance of the proposed method, we created our own database of Ashok, Kadamba, Hoysala and Mysuru scripts, and experiments were conducted on this 4-class database using 70, 50 and 30 training samples per class. Novelty: We propose K-means clustering of SIFT and SURF features for ancient Kannada manuscripts, and experiments on our own database validate the performance of the presented system. Keywords: Historical Kannada; Karnataka; SIFT; SURF; K-Means |
Database: | OpenAIRE |
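The abstract says only that SIFT/SURF descriptors are clustered with K-means and that similarity is compared under Euclidean and Manhattan distances; it does not say how cluster assignments become a classifier. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a bag-of-visual-words scheme (each image becomes a normalised histogram over K-means centroids), deterministic farthest-first seeding for K-means, and 1-nearest-neighbour classification. The helper names (`kmeans`, `bovw_histogram`, `classify`) are hypothetical, and the 2-D descriptors are synthetic stand-ins for real 128-D SIFT or 64-D SURF vectors; the script labels are used purely for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """L1 (city-block) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[List[float]]:
    """Lloyd's K-means; farthest-first seeding is an assumption, not from the paper."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed with the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            clusters[nearest].append(p)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids


def bovw_histogram(descriptors: List[Vector], vocab: List[List[float]]) -> List[float]:
    """Normalised histogram of nearest-centroid ("visual word") counts for one image."""
    hist = [0.0] * len(vocab)
    for d in descriptors:
        hist[min(range(len(vocab)), key=lambda j: euclidean(d, vocab[j]))] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def classify(query: List[float], train: List[Tuple[str, List[float]]], dist) -> str:
    """Label of the training histogram nearest to the query under `dist` (1-NN)."""
    return min(train, key=lambda item: dist(query, item[1]))[0]


# Synthetic 2-D descriptors: two images per hypothetical script class.
train_images = {
    "Kadamba": [[(0, 0), (0.5, 0.2), (10, 0), (10.3, 0.4)],
                [(0.2, 0.1), (9.8, 0.2), (10.1, -0.3), (0.1, 0.4)]],
    "Hoysala": [[(0, 10), (0.3, 9.8), (10, 10), (9.7, 10.2)],
                [(0.2, 10.1), (10.2, 9.9), (9.9, 10.3), (-0.1, 9.9)]],
}
all_desc = [d for imgs in train_images.values() for img in imgs for d in img]
vocab = kmeans(all_desc, k=4)
train = [(label, bovw_histogram(img, vocab))
         for label, imgs in train_images.items() for img in imgs]
query = bovw_histogram([(0.1, 0.0), (9.9, 0.1), (0.3, 0.3), (10.2, 0.0)], vocab)
label_l2 = classify(query, train, euclidean)
label_l1 = classify(query, train, manhattan)
print(label_l2, label_l1)
```

Swapping `euclidean` for `manhattan` in `classify` is the only change needed to compare the two similarity measures the abstract mentions. In practice the descriptors would come from a feature extractor such as OpenCV's SIFT, and a library K-means (with several random restarts) would replace the hand-rolled loop.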