Use Word Cloud Image Of Web Page Text Content On Convolutional Neural Network (CNN) For Classification Of Web Pages.

Authors: Apandi, Siti Hawa; Sallim, Jamaludin; Mohamed, Rozlina
Subject:
Source: International Journal of Computing & Digital Systems; Feb 2024, Vol. 15, Issue 1, pp. 347-358 (12 pp.)
Abstract: In the modern digital era, people readily access the internet to find information by visiting websites. Many individuals are attracted to web pages featuring games and video content. Prolonged exposure to such pages can result in internet addiction, with negative consequences. To address this issue, it is crucial to restrict access to websites offering gaming and streaming content, which in turn requires a tool that classifies web pages by their content. In the proposed categorization process, the text content of each web page is first extracted. Because conventional matrix representations are poorly suited to lengthy web page text, this study uses an innovative alternative: after data pre-processing, the words extracted from the page text are represented visually as a word cloud image. The words that appear most frequently in the page text are displayed in larger fonts near the center of the word cloud image, reflecting the subject matter of the page. A Convolutional Neural Network (CNN) is then used to identify patterns in the central part of the word cloud image, which allows web pages to be categorized by their content. The proposed web page classification model achieves an accuracy of 0.86, which the authors report as a significant improvement in web page categorization accuracy. Using insights gained from web page classification, authorities can proactively monitor online user behavior, identify individuals struggling with internet addiction, and offer help if needed. [ABSTRACT FROM AUTHOR]
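The abstract describes a pipeline in which word frequency drives font size, the most frequent words end up near the center of the word cloud, and the CNN looks at that central region. The sketch below illustrates only those two ideas (frequency-to-font-size scaling and a center crop) using the Python standard library; it is not the authors' implementation. The stop-word list, the linear scaling rule, the crop size, and the function names preprocess, font_sizes and center_crop are all hypothetical choices made for illustration, and rendering the image and training the CNN are omitted.

```python
import re
from collections import Counter

# Hypothetical stop-word list (assumption): the paper's actual
# pre-processing rules are not given in the abstract.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
              "for", "is", "are", "with", "now", "more"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words
    and tokens shorter than three letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def font_sizes(tokens, max_words=50, min_size=10, max_size=80):
    """Map the most frequent words to font sizes that grow linearly with
    frequency (an assumed scaling rule); the top word gets max_size."""
    counts = Counter(tokens).most_common(max_words)
    if not counts:
        return {}
    top_count = counts[0][1]
    return {word: min_size + (max_size - min_size) * count / top_count
            for word, count in counts}


def center_crop(image, crop_h, crop_w):
    """Return the central crop_h x crop_w region of a 2-D image stored as
    a list of rows, i.e. the area where the largest words would sit."""
    top = (len(image) - crop_h) // 2
    left = (len(image[0]) - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


# Usage on a made-up snippet of page text.
page_text = ("Play free online games now. Online games for kids: "
             "puzzle games, racing games and more.")
sizes = font_sizes(preprocess(page_text))
print(sizes)

# A toy 6x6 "image" whose pixel values encode row * 10 + column.
image = [[r * 10 + c for c in range(6)] for r in range(6)]
print(center_crop(image, 2, 2))  # -> [[22, 23], [32, 33]]
```

In this toy example "games" occurs four times and receives the largest font size, so a word cloud generator that places large words first would put it near the center, which is the region a center crop keeps for the classifier. The image library and CNN architecture the authors used are not stated in this record.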
Database: Complementary Index