Detection of Arabic Non-Referential Pronouns using Self-Training Method and Similarity Measures
Author: | Chiraz Ben Othmane Zribi, Saoussen Mathlouthi Bouzid |
---|---|
Year of publication: | 2019 |
Subject: | Computer science, Semi-supervised learning, Support vector machine, Set (abstract data type), Identification (information), ComputingMethodologies_PATTERNRECOGNITION, Similarity (psychology), Selection (linguistics), Relevance (information retrieval), Artificial intelligence, Natural language processing, Test data |
Source: | AICCSA |
Description: | Classifying pronouns as referential or non-referential is necessary for many NLP tasks. However, few studies address this problem for the Arabic language. In this paper, we present a semi-supervised machine learning approach based on a Self-Training SVM method for identifying non-referential pronouns in Arabic texts. A set of pattern-based and linguistic-based information is used as classification features in our machine learning system. The proposed Self-Training SVM algorithm includes three steps: training, prediction, and selection. It trains an SVM classifier on a small set of labeled data, predicts labels for the unlabeled data, selects the most accurate and most informative newly labeled data, and adds them to the training dataset. The selection step uses several geometric measures, and we analyze their relevance to classification performance. The evaluation of our approach on the training and test data yields good results, with precision reaching up to 96.85%. |
Database: | OpenAIRE |
External link: |
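
The three steps named in the description (training, prediction, selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not specify their features or their geometric selection measures, so this sketch assumes a toy 2-D dataset, a small linear SVM trained by subgradient descent on the hinge loss, and distance from the separating hyperplane as a stand-in selection measure. All names (`train_svm`, `decision`, `self_train`, `make_points`, `per_round`) are hypothetical.

```python
import random


def train_svm(X, y, epochs=50, eta=0.1, lam=0.01, seed=0):
    """Training step: linear SVM via subgradient descent on the hinge loss.

    Labels are +1 / -1. The bias is folded in as an extra constant
    feature, so it is regularized too (a simplification of this sketch).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last weight plays the role of the bias
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi = X[i] + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1:  # hinge loss is active: move toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def decision(w, x):
    """Signed score w.x + b; its sign is the predicted label."""
    return sum(wj * xj for wj, xj in zip(w, x + [1.0]))


def self_train(X_lab, y_lab, X_unl, per_round=5, max_rounds=20):
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    w = train_svm(X_lab, y_lab)
    for _ in range(max_rounds):
        if not pool:
            break
        # Prediction step: score every unlabeled sample.
        scored = [(decision(w, x), x) for x in pool]
        # Selection step (assumed measure): keep the samples farthest from
        # the hyperplane. Ranking by |score| equals ranking by geometric
        # distance |score| / ||w||, since ||w|| is fixed within a round.
        scored.sort(key=lambda s: abs(s[0]), reverse=True)
        for score, x in scored[:per_round]:
            X_lab.append(x)
            y_lab.append(1 if score >= 0 else -1)
        pool = [x for _, x in scored[per_round:]]
        w = train_svm(X_lab, y_lab)  # retrain on the enlarged set
    return w, X_lab, y_lab


def make_points(n, sign, rng):
    """Toy data: class +1 lies in [1, 3]^2, class -1 in [-3, -1]^2."""
    return [[sign * rng.uniform(1, 3), sign * rng.uniform(1, 3)]
            for _ in range(n)]


rng = random.Random(0)
X_lab = make_points(3, 1, rng) + make_points(3, -1, rng)
y_lab = [1, 1, 1, -1, -1, -1]
X_unl = make_points(20, 1, rng) + make_points(20, -1, rng)

w, X_all, y_all = self_train(X_lab, y_lab, X_unl)
print(len(X_all), "training samples after self-training")
```

Selecting only the highest-margin predictions limits the risk of adding wrongly labeled samples, but it covers only the "most accurate" half of the criterion in the description; an "informativeness" measure, such as the similarity measures mentioned in the title, would need to be added on top of this.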