Feature Selection using Machine Learning Techniques Based on Search Engine Parameters

Authors: Yujian Li, Bonzou A. Kouassi, Willy K. Portier
Publication year: 2020
Subject:
Source: SPML
DOI: 10.1145/3432291.3432308
Description: Over the last two decades, Internet visibility has become essential for any company seeking exposure and revenue. Among the many ways to be visible on the Internet, one of the most important is to appear at the top of search engine results for keywords related to the company's business. This is the art of Search Engine Optimization (SEO), a collection of techniques for getting more traffic from a search engine. The better a website is optimized for SEO, the higher search engines rank it on their results pages, giving it maximal exposure. Google, with a 90% market share worldwide, is the main search engine outside China (Baidu) and Russia (Yandex), and its algorithm is a black box that all marketers want to understand. Google claims to use more than 200 features in its algorithm to rank results for queries among billions of pages. This article applies different machine learning methods to determine the most important parameters, using a selection of 30 features in a dataset of around 28,000 observations. A binary classification approach was used to detect whether or not a keyword appears in the top 10 search engine results. During the simulation, feature importance was computed to identify the parameters that matter most in building the search results. The results indicate that three kinds of parameters influence how Google ranks web pages: editorial features, notoriety features, and technical features. In addition, a few features of minimal importance were found, for example the use of the "https" protocol by a web resource.
Database: OpenAIRE
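
The description above outlines a binary classification task (is a keyword in the top 10 results or not) followed by a ranking of feature importance. The record does not say which models or importance measures the authors used, so the following is only a minimal sketch of the general idea, assuming a plain logistic regression and permutation importance as stand-ins. The data is synthetic, and the feature names (`backlinks`, `keyword_in_title`, `uses_https`) are hypothetical labels chosen to echo the notoriety, editorial, and technical families; they are not taken from the paper's dataset.

```python
import math
import random

# Hypothetical feature names (assumption), one per feature family in the abstract.
FEATURES = ["backlinks", "keyword_in_title", "uses_https"]

def make_data(n, rng):
    """Synthetic stand-in data: the label depends on features 0 and 1 only."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in FEATURES]
        score = 2.0 * row[0] + 1.0 * row[1] + rng.gauss(0, 0.5)
        X.append(row)
        y.append(1 if score > 0 else 0)  # 1 = "in top 10", 0 = "not in top 10"
    return X, y

def predict_proba(w, b, row):
    z = sum(wi * xi for wi, xi in zip(w, row)) + b
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, epochs=200, lr=0.1):
    """Full-batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = predict_proba(w, b, row) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, X, y):
    hits = sum((predict_proba(w, b, row) > 0.5) == (label == 1)
               for row, label in zip(X, y))
    return hits / len(y)

def permutation_importance(w, b, X, y, rng, repeats=5):
    """Mean drop in accuracy when one feature column is shuffled."""
    base = accuracy(w, b, X, y)
    result = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(w, b, X_perm, y))
        result.append(sum(drops) / repeats)
    return result

rng = random.Random(0)  # fixed seed so the sketch is reproducible
X, y = make_data(600, rng)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]
w, b = fit_logistic(X_train, y_train)
importances = permutation_importance(w, b, X_test, y_test, rng)
ranking = sorted(zip(FEATURES, importances), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the noise feature should score near zero, which mirrors in spirit the abstract's remark that some features, such as "https", carry little importance. Importance is measured on held-out rows so that it reflects what the model relies on for prediction rather than what it memorized.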