Profiling Twitter users using autogenerated features invariant to data distribution: Notebook for PAN at CLEF 2019

Authors: Fagni T., Tesconi M.
Language: English
Year of publication: 2019
Source: CLEF (Working Notes), Lugano, Switzerland, 2019
Description: With the spread of the Web and social media, automatic user-profiling classifiers applied to digital content have become extremely important in application contexts related to social and forensic studies. In many research papers on this topic, a significant part of the work is devoted to a costly manual "feature engineering" phase, in which semantic, syntactic, and often language-dependent features must be carefully chosen to be relevant to the profiling task. In contrast to this approach, in this work we propose a Twitter user-profiling classifier that exploits deep learning techniques to automatically generate user features that are a) optimal for the profiling task, and b) able to counter the covariate shift problem caused by differences in data distribution between the training and test sets. In the best configuration found, the system achieves promising accuracy on both English and Spanish, with an average final accuracy of more than 0.83.
Database: OpenAIRE