Research on social data by means of cluster analysis

Authors: Camila Maione, Donald R. Nelson, Rommel Melgaço Barbosa
Language: English
Year of publication: 2019
Subject:
Source: Applied Computing and Informatics, Vol 15, Iss 2, Pp 153-162 (2019)
Document type: article
ISSN: 2210-8327
DOI: 10.1016/j.aci.2018.02.003
Description: This paper presents a data mining study and cluster analysis of social data obtained on small producers and family farmers from six country cities in Ceará state, northeast Brazil. The analyzed data involve demographic, economic, agricultural and food insecurity information. The goal of the study is to establish profiles for the small producer families that reside in the region and to identify relevant features which differentiate these profiles. Moreover, we provide an efficient data mining methodology for the analysis of social data sets which is capable of handling their natural challenges, such as mixed variables and an abundance of null values. We use the Silhouette method to estimate the best number of natural groups within the data, along with the Partitioning Around Medoids clustering algorithm to compute the profiles. The Correlation-Based Feature Selection method is used to identify which social criteria are the most important to differentiate the families from each profile. Classification models based on support vector machines, multilayer perceptron and decision trees were developed to predict which of the identified clusters an arbitrary family would best fit. We obtained a good separation of the families into two clusters, and a multilayer perceptron model with approximately 93.5% prediction accuracy.
Keywords: Clustering, Social data, Classification, Pam, Data mining
Database: Directory of Open Access Journals
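
Illustrative sketch: the description names the Silhouette method and Partitioning Around Medoids (PAM) for data with mixed variables and null values, but the record does not say how dissimilarities were computed or which software was used. The Python sketch below is therefore only one plausible reading, not the authors' implementation. It assumes a Gower-style dissimilarity that skips null values, a simplified PAM that starts from the first k records (instead of PAM's BUILD step) and then applies greedy medoid swaps, and a small made-up table of six households. All function names, the features and the data are hypothetical.

```python
def feature_ranges(rows, numeric):
    """Range of each numeric feature (None for categorical ones), ignoring nulls."""
    out = []
    for f, is_num in enumerate(numeric):
        vals = [r[f] for r in rows if r[f] is not None]
        out.append(max(vals) - min(vals) if is_num else None)
    return out

def gower(a, b, ranges):
    """Gower-style dissimilarity (assumed here); None marks a null value."""
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue                      # skip features missing in either record
        if r is None:                     # categorical: simple mismatch
            total += 0.0 if x == y else 1.0
        else:                             # numeric: range-normalised difference
            total += abs(x - y) / r if r else 0.0
        used += 1
    return total / used if used else 1.0

def cost(D, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def pam(D, k):
    """Simplified PAM: naive initial medoids, then greedy swap improvement."""
    n = len(D)
    medoids = list(range(k))              # assumption: not PAM's BUILD phase
    best = cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(D, trial)
                if c < best - 1e-12:      # accept only strict improvements
                    medoids, best, improved = trial, c, True
    labels = [min(range(k), key=lambda j: D[i][medoids[j]]) for i in range(n)]
    return medoids, labels

def silhouette(D, labels):
    """Mean silhouette width computed from a precomputed distance matrix."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)            # convention for singleton clusters
            continue
        a = sum(D[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(D[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical households: (age of head, monthly income, land tenure).
records = [
    (34, 800.0, "owner"), (36, None, "owner"), (31, 900.0, "owner"),
    (62, 300.0, "tenant"), (65, 250.0, None), (59, 350.0, "tenant"),
]
ranges = feature_ranges(records, numeric=(True, True, False))
D = [[gower(a, b, ranges) for b in records] for a in records]

# Choose the number of clusters with the highest mean silhouette width.
best_k, best_score, best_labels = None, -2.0, None
for k in (2, 3, 4):
    _, labels = pam(D, k)
    score = silhouette(D, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(best_k, best_labels)
```

On this toy table the silhouette criterion selects two clusters, which mirrors the shape of the result reported in the description but says nothing about the real data. For a data set of realistic size, an established PAM implementation would be preferable to this quadratic-cost sketch.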