A study of web pages classification based on image and text features

Author: Chun-Te Ho, 何俊德
Year of publication: 2004
Document type: thesis
Description: 92
In this thesis, we propose a new web page grading method based on image and text content analysis to address the Internet content grading problem. The method is intended to detect unsuitable information on the Internet effectively. Image and text features are extracted from web page content, and the system merges the two feature sets to assign each web page a grade. By analyzing both the text and the images of a web page, the method improves on previous methods that focused on text analysis only. It uses machine-learning techniques to achieve automatic classification. To analyze web pages in real time, we also propose an acceleration algorithm that makes online web page analysis possible. Our method classifies efficiently and can be deployed in a regular Internet environment. In our experiments, text classification accuracy is about 95% and image classification accuracy is about 84%. In addition, the accuracy of web page classification with merged text and image features is 91.8%. In summary, our research classifies information on the Internet, with the aim of building a clean Internet environment and resolving the information grading problem.
Database: Networked Digital Library of Theses & Dissertations