Research on social data by means of cluster analysis
Authors: | Rommel Melgaço Barbosa, Camila Maione, Donald R. Nelson |
---|---|
Year of publication: | 2019 |
Subject: | 0106 biological sciences; lcsh:T58.5-58.64; lcsh:Information technology; Computer science; Decision tree; Feature selection; 02 engineering and technology; computer.software_genre; 01 natural sciences; Medoid; Computer Science Applications; Silhouette; Support vector machine; Null (SQL); Multilayer perceptron; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; Cluster analysis; computer; Software; 010606 plant biology & botany; Information Systems |
Source: | Applied Computing and Informatics, Vol 15, Iss 2, Pp 153-162 (2019) |
ISSN: | 2210-8327 |
Description: | This paper presents a data mining study and cluster analysis of social data on small producers and family farmers from six country cities in Ceará state, northeast Brazil. The analyzed data cover demographic, economic, agricultural and food insecurity information. The goal of the study is to establish profiles for the small producer families that reside in the region and to identify the relevant features that differentiate these profiles. Moreover, we provide an efficient data mining methodology for the analysis of social data sets that is capable of handling their inherent challenges, such as mixed variables and an abundance of null values. We use the Silhouette method to estimate the best number of natural groups within the data, along with the Partitioning Around Medoids (PAM) clustering algorithm to compute the profiles. The Correlation-Based Feature Selection method is used to identify which social criteria are the most important for differentiating the families in each profile. Classification models based on support vector machines, multilayer perceptrons and decision trees were developed to predict which of the identified clusters an arbitrary family would best fit. We obtained a good separation of the families into two clusters, and a multilayer perceptron model with approximately 93.5% prediction accuracy. Keywords: Clustering, Social data, Classification, PAM, Data mining |
Database: | OpenAIRE |
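
The abstract pairs the Silhouette method with Partitioning Around Medoids to choose the number of clusters. The following is a minimal sketch of that idea, not the authors' implementation: it assumes purely numeric synthetic data and Euclidean distances, whereas the study's data contain mixed variables and null values and would need a suitable dissimilarity in place of `D`. The function names `pam` and `mean_silhouette`, the greedy initialisation, and the candidate range of 2 to 5 clusters are all illustrative assumptions.

```python
import numpy as np

def pam(D, k, max_iter=100):
    """Swap-based k-medoids on a precomputed n x n distance matrix D."""
    n = D.shape[0]
    # Greedy initialisation: most central point first, then repeatedly add
    # the candidate that lowers the total distance to the nearest medoid most.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        current = D[:, medoids].min(axis=1)
        candidates = [(np.minimum(current, D[:, c]).sum(), c)
                      for c in range(n) if c not in medoids]
        medoids.append(min(candidates)[1])
    cost = D[:, medoids].min(axis=1).sum()
    # Swap phase: apply the best medoid/non-medoid swap until none improves.
    for _ in range(max_iter):
        best = (cost, None, None)
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = c
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < best[0]:
                    best = (trial_cost, i, c)
        if best[1] is None:
            break
        cost = best[0]
        medoids[best[1]] = best[2]
    labels = np.argmin(D[:, medoids], axis=1)
    return np.array(medoids), labels

def mean_silhouette(D, labels):
    """Average silhouette width computed from a distance matrix."""
    n = D.shape[0]
    clusters = np.unique(labels)
    scores = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: silhouette is taken as 0
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return scores.mean()

# Hypothetical data: two well-separated groups of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
               rng.normal(10.0, 1.0, size=(30, 2))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# Score each candidate number of clusters and keep the best one.
scores = {}
for k in range(2, 6):
    _, labels = pam(D, k)
    scores[k] = mean_silhouette(D, labels)
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Because `pam` works on a distance matrix rather than on raw coordinates, swapping in a different dissimilarity only changes how `D` is built.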