Behaviour analysis of internet survey completion using decision trees
Authors: | Lung-Hsing Kuo, Hung-Jen Yang, Che-Chern Lin |
---|---|
Year of publication: | 2009 |
Subject: |
Incremental decision tree
business.industry; Computer science; Decision tree learning; Exploratory research; Decision tree; Library and Information Sciences; Machine learning; computer.software_genre; Computer Science Applications; The Internet; Artificial intelligence; business; Decision model; computer; Information Systems |
Source: | Online Information Review. 33:117-134 |
ISSN: | 1468-4527 |
DOI: | 10.1108/14684520910944427 |
Description: | Purpose: This paper explores teachers' behaviours in completing an internet survey using decision trees. To reduce the complexity of the decision trees, a statistical technique was used to decrease the number of input variables. Design/methodology/approach: A dataset of 47,647 samples, collected from an internet survey of teachers in Taiwan, was used to build the decision trees. The output of the decision trees was the answering time (the time taken to complete the internet questionnaire). Eight variables were selected as the inputs. Two techniques were employed to build the decision trees: the exhaustive chi-squared automatic interaction detector (ECHAID) and classification and regression tree (CRT) analysis. To reduce the complexity of the decision models, factor analysis was used to decrease the data dimensions (the number of input variables) and to obtain a simplified decision model. One-way ANOVA was used to validate the effects of the dimension reduction. Findings: Based on the results of the factor analysis, a simplified decision tree is recommended that uses four input variables: teaching years, school level, sex and area. The classification accuracy of the simplified model is statistically equivalent to that of the original one, which used eight input variables. Originality/value: The complexity of decision trees theoretically depends on the number of input variables. This study used a statistical technique to decrease the number of input variables and thereby reduce the complexity of the decision trees. A statistical test was then used to confirm that the classification accuracy of the simplified decision model does not differ significantly from that of the original. The decision models proposed in this paper can be applied to estimate the answering time for completing a questionnaire in an internet survey. |
Database: | OpenAIRE |
External link: |
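
The validation step in the description (a one-way ANOVA comparing the accuracy of the eight-input and four-input models) can be illustrated with a minimal sketch. The record does not say how the groups for the ANOVA were formed, so the per-fold accuracy lists below are hypothetical values invented for illustration, and the function name `one_way_anova_f` is likewise an assumption rather than anything taken from the paper.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of the group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical per-fold accuracies, invented for illustration only.
acc_full = [0.62, 0.60, 0.63, 0.61, 0.62]     # tree with eight inputs
acc_reduced = [0.61, 0.60, 0.62, 0.61, 0.63]  # tree with four inputs

f_stat = one_way_anova_f(acc_full, acc_reduced)
print(f"F = {f_stat:.3f}")
```

An F statistic well below the critical value for the chosen significance level is consistent with no detectable difference in accuracy between the two models, which is the kind of result the description reports.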