Author: |
Omolayo. Abegunde, Abayomi O. Agbeyangi, Safiriyu Eludiora |
Year of publication: |
2020 |
Subject: |
|
Source: |
2020 International Conference in Mathematics, Computer Engineering and Computer Science (ICMCECS). |
Description: |
The task of determining whether two or more documents were written by the same author is known as authorship verification. N-grams are sequences of consecutive elements in a text; the elements can be words, POS tags, characters, or any other units that occur one after another. Authorship verification is challenging because it focuses on whether the style of the text in question closely matches that of the target author. This paper presents an authorship verification task on Yoruba blog posts. N-gram features were extracted from the corpus, and inductive learning techniques were applied to build feature-based models for automatic author identification. The K-means clustering algorithm was used because supervised algorithms cannot be applied to the one-class classification of the dataset. The clustering was evaluated with the Silhouette Coefficient, which is suited to evaluating unlabeled data. The coefficient obtained is positive, which indicates that the data points have a strong relationship with the dataset. This is interpreted as a "yes" relationship between the posts, that is, the posts were written by the same author. |
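The two building blocks named in the description, n-gram extraction and the Silhouette Coefficient, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`ngrams`, `char_ngram_profile`, `silhouette_coefficient`), the choice of Euclidean distance, and the toy points are assumptions made for illustration only, and the K-means step itself is left out (the cluster labels are simply given).

```python
from collections import Counter
from math import sqrt


def ngrams(items, n):
    """Return all contiguous n-grams (as tuples) of a sequence.

    `items` may be a string (character n-grams) or a list of tokens
    such as words or POS tags.
    """
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def char_ngram_profile(text, n=3):
    """Count the character n-grams of a text (hypothetical helper)."""
    return Counter("".join(gram) for gram in ngrams(text, n))


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points, in the range [-1, 1].

    Assumes Euclidean distance and at least two clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum)
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Word bigrams from a token list
print(ngrams("a b c".split(), 2))  # -> [('a', 'b'), ('b', 'c')]

# Toy 2-D feature vectors: two tight, well-separated groups
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_coefficient(toy_points, [0, 0, 1, 1]) > 0)  # -> True
```

A value close to 1 means the points sit well inside their assigned clusters, a value near 0 means the clusters overlap, and a negative value suggests points assigned to the wrong cluster. In practice the feature vectors would be n-gram counts per post rather than the toy coordinates used here.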
Database: |
OpenAIRE |
External link: |
|