A Novel Approach for Ontology-Based Dimensionality Reduction for Web Text Document Classification

Authors: Goda Ismaeel Salama, Khaled M. Badran, Mohamed K. Elhadad
Year of publication: 2017
Subject:
Source: ICIS
ISSN: 2166-7179, 2166-7160
DOI: 10.4018/ijsi.2017100104
Description: Dimensionality reduction of the feature vector plays a vital role in enhancing text processing capabilities; it aims to reduce the size of the feature vector used in mining tasks (classification, clustering, etc.). This paper proposes an efficient approach for reducing the size of the feature vector in the web text document classification process. The approach uses the WordNet ontology, taking advantage of its hierarchical structure, to eliminate from the generated feature vector those words that have no relation to any of the WordNet lexical categories; this reduces the feature vector size without losing information about the text. For mining tasks, the Vector Space Model (VSM) is used to represent text documents, and Term Frequency-Inverse Document Frequency (TF-IDF) is used as the term weighting method. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach in several experiments. The experimental results show that the authors' proposed approach is more effective than other traditional approaches, achieving better classification accuracy, F-measure, precision, and recall.
Database: OpenAIRE
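
The pipeline summarized in the description (drop words with no lexical category, then represent documents in the VSM with TF-IDF weights) might look roughly like the sketch below. This is not the authors' implementation. `TOY_LEXICON` is a hypothetical stand-in for a real WordNet lookup, its category labels are illustrative only, and the function names and the particular TF-IDF variant (relative term frequency times log inverse document frequency) are assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup (assumption, not real WordNet data):
# maps a word to an illustrative lexical category label.
TOY_LEXICON = {
    "football": "noun.act",
    "match": "noun.event",
    "team": "noun.group",
    "election": "noun.act",
    "vote": "noun.act",
    "win": "verb.competition",
}


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def filter_terms(tokens, lexicon):
    """Keep only tokens that map to some lexical category in the lexicon."""
    return [t for t in tokens if t in lexicon]


def tfidf_vectors(docs_tokens):
    """Build a sorted vocabulary and one TF-IDF vector per tokenized document."""
    vocab = sorted({t for doc in docs_tokens for t in doc})
    n_docs = len(docs_tokens)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs_tokens for t in set(doc))
    vectors = []
    for doc in docs_tokens:
        tf = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            [(tf[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


docs = [
    "the team will win the football match xyzzy",
    "the election vote is tomorrow qwerty",
]
raw = [tokenize(d) for d in docs]
filtered = [filter_terms(tokens, TOY_LEXICON) for tokens in raw]

full_vocab, _ = tfidf_vectors(raw)
reduced_vocab, reduced_vectors = tfidf_vectors(filtered)

print(len(full_vocab), "terms before filtering")
print(len(reduced_vocab), "terms after filtering:", reduced_vocab)
```

In this toy corpus the filter removes function words and unknown tokens, so each document vector has fewer dimensions before any classifier is trained. A real system would replace `TOY_LEXICON` with actual WordNet queries.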