Knowledge discovery in sociological databases: An application on general society survey dataset
Authors: | Lianjun Dai, Zhiwen Pan, Jesus Pacheco, Jiangtian Li, Jun Zhang, Yiqiang Chen |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: | Structure (mathematical logic); Technology; Information retrieval; Data management; knowledge discovery; Feature selection; data mining; Engineering (General). Civil engineering (General); Data set; General Social Survey; crowdsourced big data and analytics; Knowledge extraction; Computer Science (miscellaneous); Business Management and Accounting (miscellaneous); Decision Sciences (miscellaneous); Quality (business); data management; TA1-2040; Cluster analysis; business |
Source: | International Journal of Crowd Science, Vol 3, Iss 3, Pp 315-332 (2019) |
ISSN: | 2398-7294 |
DOI: | 10.1108/IJCS-09-2019-0023 |
Description: | Purpose: The General Society Survey (GSS) is a government-funded survey that examines the socio-economic status, quality of life and structure of contemporary society. The GSS data set is regarded as one of the authoritative sources that government and organizational practitioners use to make data-driven policy. Previous analytic approaches for GSS data combine expert knowledge with simple statistics. Using emerging data mining algorithms, we propose a comprehensive data management and data mining approach for GSS data sets. Design/methodology/approach: The approach operates in two phases: a data management phase, which improves the quality of GSS data through attribute pre-processing and filter-based attribute selection; and a data mining phase, which extracts hidden knowledge from the data set through prediction, classification, association and clustering analysis. Findings: The experimental evaluation yields the following findings: performing attribute selection on the GSS data set improves the performance of both classification and clustering analysis; all of the data mining analyses effectively extract hidden knowledge from the GSS data set; and the knowledge produced by the different analyses can, to some extent, cross-validate one another. Originality/value: By leveraging data mining techniques, the proposed approach can explore knowledge in a fine-grained manner with minimal human intervention. Experiments on the Chinese General Social Survey data set evaluate the performance of the approach. |
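The abstract names filter-based attribute selection but does not say which filter criterion the authors use. As a minimal sketch, assuming mutual information as the filter score and categorical survey attributes, a filter ranks each attribute by how much it reduces uncertainty about a target variable, independently of any classifier, and keeps the top k. The function names and the toy attributes (`education`, `region`, `happy`) are hypothetical and do not come from the paper or from the GSS codebook.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(feature, label):
    """I(X;Y) = H(Y) - H(Y|X) for two equal-length discrete sequences."""
    n = len(label)
    h_y_given_x = 0.0
    for x_value, count in Counter(feature).items():
        # Label values observed for this attribute value.
        subset = [y for x, y in zip(feature, label) if x == x_value]
        h_y_given_x += (count / n) * entropy(subset)
    return entropy(label) - h_y_given_x


def filter_select(records, label_key, k):
    """Hypothetical helper: rank attributes by mutual information with the
    label and keep the top k. Scores are computed without training any model,
    which is what makes this a filter rather than a wrapper method."""
    label = [r[label_key] for r in records]
    attributes = [a for a in records[0] if a != label_key]
    scores = {a: mutual_information([r[a] for r in records], label)
              for a in attributes}
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda a: (-scores[a], a))
    return ranked[:k], scores


# Toy, made-up records: education fully determines the label, region does not.
records = [
    {"education": "college", "region": "east", "happy": "yes"},
    {"education": "college", "region": "west", "happy": "yes"},
    {"education": "primary", "region": "east", "happy": "no"},
    {"education": "primary", "region": "west", "happy": "no"},
]
selected, scores = filter_select(records, "happy", k=1)
print(selected)  # ['education']
```

Because a filter score is computed once per attribute before any model is trained, it stays cheap across the hundreds of attributes typical of a survey data set; a wrapper method, by contrast, retrains a classifier for every candidate subset.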
Database: | OpenAIRE |