A Deep Learning Approach for Author Profiling using Word Embeddings

Authors: Dr. T. Raghunadha Reddy, B. Madhubala, G. Varshini, S. K. Fayaz
Publication year: 2023
Subject:
Source: International Journal for Research in Applied Science and Engineering Technology. 11:1553-1558
ISSN: 2321-9653
DOI: 10.22214/ijraset.2023.51765
Description: Author profiling is the task of predicting characteristics of an author, such as age, gender, native language, and personality traits, from their writing style. The PAN2013 shared task focused on author profiling in social media, where participants were tasked with predicting the gender and age of Twitter users based on their tweets. In recent years, deep learning approaches have become popular for author profiling. Two models widely used by researchers to generate word embeddings are GloVe and FastText. GloVe represents words as vectors in a high-dimensional space, while FastText also takes subword information into account when representing words. Both models have been shown to be effective for various natural language processing tasks. For the PAN2013 task, participants used various deep learning models with GloVe and FastText embeddings to predict the age and gender of Twitter users. Some approaches combined multiple models to improve performance. In this article, we focus on improving the accuracy of age and gender classification on the PAN2013 dataset, a benchmark corpus for author profiling. We use Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) classifiers to classify authors by age and gender, with pre-trained FastText and GloVe word embeddings representing the text. The LSTM model achieved an accuracy of 57.53% for age classification and 60.48% for gender classification, while the CNN model achieved 59.32% for age classification and 52.21% for gender classification. These models have been shown to be effective for various natural language processing tasks and can be applied to other author profiling tasks as well.
Database: OpenAIRE
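
Note on subword information: the abstract says FastText takes subword information into account. A minimal sketch of that idea follows, assuming the usual scheme in which a word is wrapped in boundary markers and split into character n-grams. The function name `char_ngrams` and its default range of 3 to 6 characters are illustrative assumptions, not taken from the article, and this is not the authors' code.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of `word`, in the style used by subword embeddings.

    Hypothetical helper for illustration only; the n-gram range is an assumption.
    """
    # Boundary markers let the model tell prefixes and suffixes apart
    # from the same letters occurring inside a word.
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


# Trigrams only, for a short example word.
print(char_ngrams("where", 3, 3))
```

In a subword model, a word's vector is built from the vectors of such n-grams, which is why rare or misspelled words, common in social media text, can still receive a usable representation.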