A Novel Approach for Ontology-Based Dimensionality Reduction for Web Text Document Classification
Authors: | Goda Ismaeel Salama, Khaled M. Badran, Mohamed K. Elhadad |
---|---|
Year of publication: | 2017 |
Subject: | 0301 basic medicine; Computer Networks and Communications; Computer science; Feature vector; Feature extraction; WordNet; Linear classifier; 02 engineering and technology; Ontology (information science); Machine learning; computer.software_genre; 03 medical and health sciences; Text processing; Artificial Intelligence; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; tf–idf; business.industry; Dimensionality reduction; Document clustering; Computer Graphics and Computer-Aided Design; Computer Science Applications; Statistical classification; 030104 developmental biology; Vector space model; 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; business; computer; Software |
Source: | ICIS |
ISSN: | 2166-7179, 2166-7160 |
DOI: | 10.4018/ijsi.2017100104 |
Description: | Reducing the size of the feature vector plays a vital role in enhancing text processing capabilities; it aims to shrink the feature vector used in mining tasks (classification, clustering, etc.). This paper proposes an efficient approach for reducing the size of the feature vector in the web text document classification process. The approach is based on the WordNet ontology and exploits its hierarchical structure to eliminate from the generated feature vector the words that have no relation to any of WordNet's lexical categories; this reduces the feature vector size without losing information about the text. For the mining tasks, the Vector Space Model (VSM) is used to represent text documents, and Term Frequency–Inverse Document Frequency (TF-IDF) is used as the term weighting method. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach in several experiments. The experimental results show the effectiveness of the authors' proposed approach compared with other traditional approaches, achieving better classification accuracy, F-measure, precision, and recall. |
Database: | OpenAIRE |
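
The pipeline summarized in the description (VSM with TF-IDF weighting, plus an ontology filter that drops terms unrelated to any WordNet lexical category) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The dictionary `LEXNAMES` is a hypothetical, hand-made stand-in for WordNet's lexicographer categories, and its labels are illustrative rather than verified WordNet output. With NLTK and its WordNet corpus installed, real categories could be read from each `Synset`'s `lexname()` method. The tokens `xq7` and `zzqx` are invented placeholders for terms with no WordNet entry. The weighting is assumed to be raw term count times ln(N/df), because the abstract does not name a TF-IDF variant. Reading "no relation with any lexical category" as "maps to no lexicographer category" is also an assumption.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for WordNet lexical categories (lexicographer files);
# labels are illustrative only. With NLTK's WordNet corpus installed, a real
# lookup could be {s.lexname() for s in wordnet.synsets(word)}.
LEXNAMES = {
    "stock": {"noun.possession", "noun.artifact"},
    "market": {"noun.group", "verb.possession"},
    "price": {"noun.possession"},
    "team": {"noun.group"},
    "match": {"noun.event", "verb.stative"},
    "goal": {"noun.event", "noun.cognition"},
}


def lexical_categories(word):
    """Return the stand-in lexical categories of `word` (empty set if unknown)."""
    return LEXNAMES.get(word, set())


def build_vocabulary(docs, ontology_filter=False):
    """Sorted distinct terms over all documents; with `ontology_filter`,
    terms that map to no lexical category are dropped."""
    vocab = sorted({term for doc in docs for term in doc})
    if ontology_filter:
        vocab = [term for term in vocab if lexical_categories(term)]
    return vocab


def tfidf_vectors(docs, vocab):
    """Vector Space Model: one TF-IDF vector per document over `vocab`.
    Assumed weighting: raw term count * ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([
            tf[t] * math.log(n_docs / df[t]) if df[t] else 0.0
            for t in vocab
        ])
    return vectors


# Toy corpus of already-tokenized documents; "xq7" and "zzqx" are invented
# tokens standing in for terms without any WordNet entry.
docs = [
    ["stock", "market", "price", "xq7"],
    ["team", "match", "goal", "zzqx"],
    ["stock", "price", "goal", "xq7"],
]

full_vocab = build_vocabulary(docs)
reduced_vocab = build_vocabulary(docs, ontology_filter=True)
vectors = tfidf_vectors(docs, reduced_vocab)

print(len(full_vocab), "->", len(reduced_vocab))  # 8 -> 6
print(reduced_vocab)
```

In this toy run the vocabulary shrinks from 8 to 6 terms, so every document vector loses two dimensions before classification. Unlike PCA, which projects vectors onto new axes that mix the original terms, a filter of this kind keeps the surviving dimensions as the original, interpretable words.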