Popis: |
Classical scholars relied mostly on the topics discussed in the verses of the Quran to identify their places of revelation. Lacking today’s computing facilities, they could not efficiently use numeric properties such as word length, word count, and letter count to classify verses. To the best of our knowledge, these properties have not been fully exploited for this purpose to date. In this work, we set out to fill this gap. We wrote a Java program that analyzes the contents of each verse and builds a feature matrix of 64 features per verse. We split this feature matrix into a training set and a testing set, trained seven classifiers in WEKA on the training set, and then applied the trained models to the testing set. With both 80% and 90% splits, as well as with 10-fold cross-validation, the Decision Table classifier performed best (accuracy of around 98%) for both vectorization methods we employed. When we removed the chapter-number attribute from the feature matrix, the highest accuracy dropped to 80-82% for each (split, vectorization-method) combination, and Random Forest became the best performer. When we ignored both chapter number and verse number, Random Forest remained the best performer in almost every case, but its accuracy dropped slightly to 78-80%. We also applied five clustering algorithms to the same task and found that the Canopy algorithm achieved the highest accuracy (around 74%) in every experiment. Each classification and clustering algorithm took less than two seconds to run. Together, these results suggest that verses of the Quran can be classified efficiently and quite accurately using only their constituent characters and other features, and that the place of revelation of a verse is highly correlated with the numeric properties of its constituents. |