Discovering the same job ads expressed with the different sentences by using hybrid clustering algorithms
Author: | Feriştah Dalkılıç, Uygar Takazoğlu, Yunus Doğan, Kemal Can Kara, Recep Alp Kut |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: | Turkish; Computer science; Big data; Mühendislik (Engineering); Information technology; General Medicine; English language; Character encodings in HTML; Core (game theory); Engineering; Artificial intelligence; Resource pool; Cluster analysis; Hybrid clustering algorithms; Job ads; Machine learning; Natural language processing |
Source: | International Journal of Applied Mathematics Electronics and Computers, Volume 8, Issue 3, pp. 76-84 |
ISSN: | 2147-8228 |
Description: | Text mining studies on job ads have become widespread in recent years as a way to determine the qualifications required for each position. Research on Turkish is limited, whereas a large pool of resources exists for English. Kariyer.Net is the largest job ad company in Turkey, and 99% of its ads are in Turkish. There is therefore a need to develop novel Natural Language Processing (NLP) models for Turkish to analyze this large database. In this study, the job ads of Kariyer.Net were analyzed, and the hidden associations in this big dataset were discovered using a hybrid clustering algorithm. First, all ads, which were stored as HTML, were transformed into regular sentences by extracting the inner text from the HTML markup. Then, these inner texts containing the core ads were converted into sub ads by traditional methods. After these NLP steps, hybrid clustering algorithms were applied, and the same ads expressed with different sentences were detected. The analysis focused on 57 positions in Information Technology sectors, with 6,897 ad texts. The results suggest that the clusters obtained contain useful outcomes and that the proposed model can be used to discover common and unique ads for each position. |
Database: | OpenAIRE |
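
The first preprocessing step named in the description, turning an ad stored as HTML into its inner text, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which tools they used, and the names `InnerTextExtractor` and `html_to_inner_text` are hypothetical, as is the choice of Python's standard-library `html.parser`. The later steps (splitting into sub ads and the hybrid clustering itself) are not specified in the record and are not sketched here.

```python
from html.parser import HTMLParser


class InnerTextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible text, with runs of whitespace collapsed
        if self._skip_depth == 0:
            text = " ".join(data.split())
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def html_to_inner_text(html):
    """Return the visible inner text of an HTML fragment as one string."""
    parser = InnerTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `html_to_inner_text` on an ad's HTML body yields a single whitespace-normalized string that later steps could split into sentences. Real ad markup may need extra handling (for example, treating list items or line breaks as sentence boundaries), which this sketch leaves out.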