USING MACHINE LEARNING METHODS TO CREATE A REAL ESTATE ANALYTICS PLATFORM IN UKRAINE.

Authors: Глибовець, А. М., Мухопад, О. О.
Source: NaUKMA Research Papers. Computer Science; 2019, Vol. 2, pp. 32-37, 6 pp.
Abstract: Machine learning methods have recently been applied to a wide range of problems. The real estate market is one of the areas where these modern approaches are being put into practice. The most reputable real estate and development companies use state-of-the-art data analytics to optimize business processes, choose the most advantageous sites for construction, target their audience, and much more. This paper therefore describes the development of a machine learning model that uses regression algorithms to predict rental prices of apartments in Kyiv. We describe how the model for real estate valuation was built: finding and preparing a dataset, selecting features, training the machine learning model, and optimizing it with Apache Spark, an open-source framework for distributed computing. Any analytics work first requires a sufficient amount of training data. Unfortunately, we found no ready-made real estate datasets in open access, so we decided to build our own. Feature selection and data preparation are important steps for successful experiments. First, the "city" and "area" columns are removed, since our dataset contains only apartments in Kyiv. Because the "with furniture", "with heating", "for repair", "with a balcony" and "with a jacuzzi" columns are optional, they contain many missing values, so they are also removed from the training sample. In addition, the data show a direct relationship between the branch / subway station and the area / number of rooms. Since this gives our model no new information, these attributes are removed as well. We analyzed the following ML algorithms: linear regression, Random Forest, and Gradient Boosting Trees. Based on the nature of the training dataset and our goals, we chose Gradient Boosting Trees.
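The column-removal rules described above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the authors' code (their pipeline ran on Apache Spark). It assumes each listing is a dict, missing values are `None`, the target column is called `price`, and "many missing values" means more than half the rows; all of these names and the 0.5 threshold are hypothetical. Because the abstract does not say exactly which of the redundant location/size attributes were dropped, those are passed by hand through `extra_drop`.

```python
from typing import Any, Dict, Iterable, List, Optional

# A row is one apartment listing; missing values are assumed to be None.
Row = Dict[str, Optional[Any]]


def missing_share(rows: List[Row], column: str) -> float:
    """Fraction of rows in which `column` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def is_constant(rows: List[Row], column: str) -> bool:
    """True if `column` has at most one distinct non-missing value
    (e.g. "city" when every listing is in Kyiv)."""
    values = {row.get(column) for row in rows if row.get(column) is not None}
    return len(values) <= 1


def select_columns(
    rows: List[Row],
    target: str = "price",           # hypothetical name of the rental-price column
    max_missing: float = 0.5,        # hypothetical threshold for "many missing values"
    extra_drop: Iterable[str] = (),  # redundant attributes chosen by hand
) -> List[Row]:
    """Drop constant columns, mostly-missing columns and hand-picked redundant ones.
    The target column is always kept."""
    dropped = set(extra_drop)
    columns = sorted({key for row in rows for key in row})
    keep = [
        c for c in columns
        if c == target
        or (c not in dropped
            and not is_constant(rows, c)
            and missing_share(rows, c) <= max_missing)
    ]
    return [{c: row.get(c) for c in keep} for row in rows]


if __name__ == "__main__":
    listings = [  # toy rows for illustration only, not data from the paper
        {"city": "Kyiv", "rooms": 1, "with_jacuzzi": None, "price": 100},
        {"city": "Kyiv", "rooms": 2, "with_jacuzzi": None, "price": 150},
        {"city": "Kyiv", "rooms": 3, "with_jacuzzi": True, "price": 210},
    ]
    print(select_columns(listings))
    # [{'price': 100, 'rooms': 1}, {'price': 150, 'rooms': 2}, {'price': 210, 'rooms': 3}]
```

Detecting constant and mostly-missing columns from the data, rather than hard-coding their names, keeps the rule valid if the scraped listings change; only the domain-specific redundancy judgement is left to `extra_drop`.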
The dataset was built using the "dom.ria" API. The data were cleaned and normalized, and the Gradient Boosting Trees model was then trained using grid search and cross-validation. The coefficient of variation was chosen as the quality metric for the predictions; it reached 11.404%. [ABSTRACT FROM AUTHOR]
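Training a gradient-boosted model with grid search and k-fold cross-validation, scored by the coefficient of variation, can be illustrated with the small dependency-free Python sketch below. It is not the authors' Spark pipeline (Spark MLlib offers `GBTRegressor`, `ParamGridBuilder` and `CrossValidator` for that purpose). Several choices here are assumptions: the boosted trees are depth-1 stumps with squared loss, the grid keys `n_trees` and `learning_rate` and the shuffled k-fold split are hypothetical, and the metric uses the common CV(RMSE) definition, 100 × RMSE / mean of the true prices, because the abstract does not state the exact formula.

```python
import math
import random
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def cv_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of variation of the RMSE, in percent (assumed definition)."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return 100.0 * rmse / (sum(y_true) / n)


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Depth-1 regression tree minimising squared error on the residuals."""
    mean = sum(residuals) / len(residuals)
    best, best_sse = (0, math.inf, mean, mean), math.inf  # fallback: predict the mean
    for j in range(len(X[0])):
        # Every distinct value except the largest keeps both sides non-empty.
        for threshold in sorted({x[j] for x in X})[:-1]:
            left = [r for x, r in zip(X, residuals) if x[j] <= threshold]
            right = [r for x, r in zip(X, residuals) if x[j] > threshold]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = (j, threshold, lm, rm), sse
    return best


def predict_stump(stump: Stump, x: Sequence[float]) -> float:
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value


def fit_gbt(X, y, n_trees: int, learning_rate: float) -> Callable[[Sequence[float]], float]:
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        # For squared loss the negative gradient is simply y - prediction.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, X)]
    return lambda x: base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def k_folds(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the row indices once, then deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def grid_search_cv(X, y, grid: Dict[str, list], k: int = 3):
    """Return the parameter combination with the lowest mean CV(RMSE) over k folds."""
    names = sorted(grid)
    folds = k_folds(len(X), k)
    best_params, best_score = None, math.inf
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            model = fit_gbt([X[i] for i in train], [y[i] for i in train], **params)
            scores.append(cv_rmse([y[i] for i in fold], [model(X[i]) for i in fold]))
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

For example, `grid_search_cv(X, y, {"n_trees": [10, 50], "learning_rate": [0.1, 0.5]})` evaluates all four combinations and returns the best one with its mean score; with real listings, `X` would hold the selected numeric features and `y` the rental prices. Scoring each fold on held-out rows is what keeps the selected hyperparameters from simply memorising the training data.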
Database: Complementary Index