Learning Age and Gender Using Co-occurrence of Non-dictionary Words from Stylistic Variations.

Author: Prasath, R. Rajendra
Source: Rough Sets & Current Trends in Computing (9783642135286); 2010, p544-550, 7p
Abstract: This work reports stylistic differences in blogging across gender and age groups using slang word co-occurrences. We focus mainly on the co-occurrence of non-dictionary words across bloggers of different genders and age groups. For this analysis, we use slang words as the feature for studying stylistic variation among bloggers across age groups and gender. We model the co-occurrences of slang words used by bloggers as a graph in which nodes are slang words and edges represent the number of co-occurrences, and we study how this model varies in predicting age group and gender. We use the demographically tagged blog corpus from the ICWSM Spinner dataset for these experiments, with a Naive Bayes classifier and 10-fold cross-validation. Preliminary results show that the co-occurrence of slang words could be a better choice for predicting age and gender. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
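
Note: The graph model described in the abstract (nodes are slang words, edges count co-occurrences) can be illustrated with a minimal sketch. This is not the author's implementation. It assumes that a "slang word" is any whitespace-separated token missing from a reference dictionary and that two words co-occur when they appear in the same post. The function name cooccurrence_graph, the toy dictionary, and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(posts, dictionary):
    """Map each unordered pair of slang words to the number of posts containing both."""
    edges = Counter()
    for post in posts:
        # Assumption: any token absent from the dictionary counts as slang.
        slang = sorted({tok for tok in post.lower().split() if tok not in dictionary})
        # Each pair of distinct slang words in the same post adds 1 to that edge.
        for pair in combinations(slang, 2):
            edges[pair] += 1
    return edges


# Hypothetical toy data, not taken from the corpus named in the abstract.
dictionary = {"that", "was", "so", "funny", "see", "you", "later"}
posts = [
    "lol that was so funny omg",
    "omg lol see you later",
    "gr8 lol see you later",
]
graph = cooccurrence_graph(posts, dictionary)
```

Under these assumptions, the edge weights could then be turned into per-blogger feature vectors for a classifier such as Naive Bayes; how the original work derives its features from the graph is not stated in the abstract.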