Abstract: |
Web crawlers gather and analyze the large amounts of data available online to obtain specific kinds of information, such as news. They are becoming increasingly important as big data is applied across many sectors and the volume of web data grows dramatically each year. However, when analyzing large volumes of information and making rapid decisions, organizations frequently rely on only a small fraction of the available data, which leads to inefficient choices. In this paper, minibatch stochastic gradient descent (SGD) optimization combined with a radial basis function (RBF) kernel SVM is proposed to help organizations perform targeted crawling of relevant online artifacts and semantically match them against internal big data for better strategic decisions. The proposed method has been applied and extensively evaluated in the e-procurement field, and it has gradually been extended to further fields such as robot programming and cloud hosting. Existing methods, namely the web crawler for pharmacokinetics (WCPK), an automatic extraction method, malicious-website detection techniques, and a BERT model with a softmax layer, are used as baselines to demonstrate the effectiveness of the proposed minibatch SGD optimization and RBF-kernel SVM method. The proposed method achieves better precision, recall, and F1-measure than the existing methods: 99.25%, 98.91%, and 99.57% on the DMOZ dataset and 96.23%, 94.71%, and 97.53% on a synthesized dataset.
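The classifier named in the abstract, an RBF-kernel SVM trained with minibatch SGD, can be sketched using a standard kernel approximation. The random-Fourier-feature trick, the toy two-blob data standing in for crawled-page feature vectors, and every hyperparameter below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for crawled-page feature vectors: two Gaussian blobs,
# labeled -1 (irrelevant page) and +1 (relevant page).
n, d, D = 400, 5, 100                      # samples, input dim, feature dim
X = np.vstack([rng.normal(-1.0, 1.0, (n // 2, d)),
               rng.normal(1.0, 1.0, (n // 2, d))])
y = np.hstack([-np.ones(n // 2), np.ones(n // 2)])

# Random Fourier features approximating an RBF kernel:
# k(x, x') = exp(-gamma * ||x - x'||^2) ≈ z(x) · z(x').
gamma = 0.5                                # assumed RBF bandwidth
W = rng.normal(0.0, np.sqrt(2.0 * gamma), (d, D))
b = rng.uniform(0.0, 2.0 * np.pi, D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Minibatch SGD on the L2-regularized hinge loss: a linear SVM in the
# random feature space, i.e. an approximate RBF-kernel SVM.
w = np.zeros(D)
lam, lr, batch = 1e-3, 0.1, 32             # assumed hyperparameters
for epoch in range(20):
    order = rng.permutation(n)
    for start in range(0, n, batch):
        B = order[start:start + batch]
        viol = y[B] * (Z[B] @ w) < 1.0     # margin violations in the batch
        grad = lam * w - (y[B][viol, None] * Z[B][viol]).sum(axis=0) / len(B)
        w -= lr * grad

accuracy = (np.sign(Z @ w) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

In a focused crawler, a trained classifier of this kind would score each candidate URL's feature vector so that only pages predicted relevant are fetched and matched against the internal data.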