An Improved Optimized Web Page Classification using Firefly Algorithm with NB Classifier (WPCNB)
Author: | Anju Singh, Divakar Singh, Khushboo Bhatt |
---|---|
Publication year: | 2016 |
Subject: |
Computer science; Feature vector; Engineering and technology; Crawling; Machine learning; Natural sciences; Search engine; Naive Bayes classifier; Web query classification; Information systems; Web page; Electrical engineering, electronic engineering, information engineering; Firefly algorithm; Mathematics; Cluster analysis; General mathematics; Web content; Hypertext; Data mining; Artificial intelligence; Classifier (UML); XML |
Source: | International Journal of Computer Applications. 146:15-21 |
ISSN: | 0975-8887 |
DOI: | 10.5120/ijca2016910668 |
Description: | The Web is a huge repository of information, which calls for accurate automated Web page classifiers to maintain Web directories and to improve search engines' performance. In the Web page classification problem, each term in each HTML/XML tag of each Web page can be taken as a feature, so an efficient method is needed to select the best features and reduce the feature space; such a method is derived here. Classification of Web page content is essential to many tasks in Web information retrieval, such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. The work reviews earlier research on Web page classification, notes the importance of these Web-specific features and algorithms, describes state-of-the-art practices, and tracks the underlying assumptions behind the use of information from neighboring pages. The aim of this work is to optimize the selection of the best features for the Web page classification problem. The Firefly Algorithm (FA) is a recent nature-inspired optimization algorithm that simulates the flash pattern and characteristics of fireflies. Clustering is a popular data analysis technique for identifying homogeneous groups of objects based on the values of their attributes. Here FA is used for clustering on benchmark problems, where it is found to be more suitable than Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), and nine other methods. The web page optimization using Naive Bayes classifier (WPCNB) is an improved optimized web page classification using the firefly algorithm with an NB classifier. The work is tested on a research banking data set, where the firefly algorithm is used for web optimization and the Naive Bayes (NB) classifier is used to classify pages against the pages selected with reference to different fireflies. The proposed work is found to perform better than existing key approaches in terms of feature measure (FM), accuracy, precision, and other parameters. It is also a search optimization approach and could be enhanced in future by using different genetic algorithm (GA) based classifiers. |
Database: | OpenAIRE |
External link: |
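
The description above combines firefly-based feature selection with a Naive Bayes classifier but gives no algorithmic details. The following Python sketch illustrates one way such a combination could look, and it is not the authors' implementation. Everything in it is an assumption made for illustration: the toy term-count data and its six-term vocabulary, the 0.5 threshold that turns a firefly position into a feature mask, the leave-one-out accuracy with a per-feature penalty used as brightness, and the parameter names `beta0`, `gamma`, and `alpha` taken from the standard firefly update rule.

```python
import math
import random

def mask_of(position):
    """Threshold a real-valued firefly position into a binary feature mask."""
    mask = [x > 0.5 for x in position]
    if not any(mask):  # assumption: always keep at least one feature
        mask[position.index(max(position))] = True
    return mask

def nb_predict(train, mask, doc):
    """Multinomial Naive Bayes with Laplace smoothing on the selected features."""
    feats = [k for k, keep in enumerate(mask) if keep]
    labels = sorted({y for _, y in train})
    best_label, best_score = None, -math.inf
    for label in labels:
        docs = [x for x, y in train if y == label]
        totals = [sum(x[k] for x in docs) for k in feats]
        denom = sum(totals) + len(feats)
        score = math.log(len(docs) / len(train))  # log prior
        for k, total in zip(feats, totals):
            score += doc[k] * math.log((total + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(data, mask):
    """Leave-one-out accuracy of the NB classifier under a feature mask."""
    hits = 0
    for i, (doc, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += nb_predict(train, mask, doc) == label
    return hits / len(data)

def brightness(data, position, penalty=0.01):
    """Assumed fitness: accuracy minus a small cost per selected feature."""
    mask = mask_of(position)
    return accuracy(data, mask) - penalty * sum(mask)

def firefly_select(data, n_fireflies=8, n_iter=20,
                   beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Move dimmer fireflies toward brighter ones; return the best mask found."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    light = [brightness(data, p) for p in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    swarm[i] = [
                        min(1.0, max(0.0, a + beta * (b - a)
                                     + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    light[i] = brightness(data, swarm[i])
    best = max(range(n_fireflies), key=lambda k: light[k])
    return mask_of(swarm[best])

# Hypothetical term counts over an assumed vocabulary:
# ["loan", "account", "interest", "match", "goal", "team"]
DATA = [
    ([3, 2, 1, 0, 0, 1], "banking"),
    ([2, 3, 2, 0, 1, 0], "banking"),
    ([1, 2, 3, 1, 0, 0], "banking"),
    ([0, 0, 1, 3, 2, 2], "sports"),
    ([0, 1, 0, 2, 3, 2], "sports"),
    ([1, 0, 0, 2, 2, 3], "sports"),
]

mask = firefly_select(DATA)
print(mask, accuracy(DATA, mask))
```

The penalty term is a design choice that nudges the swarm toward smaller feature subsets when accuracy is tied; a real evaluation would use a held-out set and a far larger vocabulary than this toy example.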