Discovering the same job ads expressed with the different sentences by using hybrid clustering algorithms

Authors: Feriştah Dalkılıç, Uygar Takazoğlu, Yunus Doğan, Kemal Can Kara, Recep Alp Kut
Language: English
Year of publication: 2020
Subject:
Source: Volume: 8, Issue: 3, 76-84
International Journal of Applied Mathematics Electronics and Computers
ISSN: 2147-8228
Description: Text mining studies on job ads have become widespread in recent years as a way to determine the qualifications required for each position. While a large pool of resources exists for English, research on Turkish remains limited. Kariyer.Net is the largest job-advertisement company in Turkey, and 99% of its ads are in Turkish. There is therefore a need to develop novel Natural Language Processing (NLP) models for Turkish in order to analyze this large database. In this study, the job ads of Kariyer.Net are analyzed, and a hybrid clustering algorithm is used to discover hidden associations in this big-data set. First, all ads, which are stored as HTML code, are transformed into regular sentences by extracting the inner text from the HTML. Then, these inner texts, which contain the core ads, are converted into sub-ads by traditional methods. After these NLP steps, hybrid clustering algorithms are applied, making it possible to detect the same ads expressed with different sentences. The analysis focuses on 57 positions in the Information Technology sector, covering 6,897 ad texts. The results suggest that the clusters obtained contain useful outcomes and that the proposed model can be used to discover the common and unique ads for each position.
Database: OpenAIRE
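
The steps named in the description (HTML to inner text, inner text to sub-ads, then clustering of similar sub-ads) can be illustrated with a minimal Python sketch. It is not the authors' method: the record does not say which algorithms make up the hybrid clustering, so the sketch substitutes a plain single-link grouping over Jaccard token similarity. The function names, the punctuation-based splitting rule, the 0.5 threshold, and the English sample text are all assumptions made for illustration only.

```python
import re
from html.parser import HTMLParser


class InnerText(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the inner text of an HTML fragment with whitespace normalized."""
    parser = InnerText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def split_sub_ads(text):
    """Split an ad into sub-ads (assumed rule: break on . ! ? ;)."""
    return [s.strip() for s in re.split(r"[.!?;]+", text) if s.strip()]


def tokens(sentence):
    """Lowercased set of word tokens; \\w is Unicode-aware in Python 3."""
    return frozenset(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(sentences, threshold=0.5):
    """Group sentences whose token sets overlap (single-link, union-find).

    This is a stand-in for the unspecified hybrid clustering step.
    """
    parent = list(range(len(sentences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    sets = [tokens(s) for s in sentences]
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, sentence in enumerate(sentences):
        groups.setdefault(find(i), []).append(sentence)
    return list(groups.values())


# Hypothetical sample ad (English here; the study's data is Turkish).
sample_html = (
    "<ul><li>Knowledge of Java and SQL.</li>"
    "<li>Good knowledge of Java and SQL.</li>"
    "<li>Fluent in English.</li></ul>"
)
for group in cluster(split_sub_ads(html_to_text(sample_html))):
    print(group)
```

On the sample, the two Java/SQL requirements fall into one group and the language requirement stays on its own. A real pipeline for Turkish text would likely need stemming or morphological analysis before comparison, which this sketch omits.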