Description: |
Intelligent systems are increasingly required to predict or imitate human perception and behavior. Feature-based Machine Learning (ML) models remain common for such tasks, since collecting sufficient training data from human subjects for data-hungry Deep Learning models is costly. Considerable effort goes into ensuring data quality, particularly on crowd-annotation platforms (e.g., Amazon MTurk), where the fees of top workers can be several times the median. It is commonly assumed that higher-quality input data yields better ML models, yet quantitative estimates of this effect are rare. In our study, we investigate how the quality of labeled data affects the accuracy of models that predict users' subjective impressions on the scales of Complexity, Aesthetics, and Orderliness, as assessed by 70 subjects. The material, about 500 web page screenshots, was additionally labeled by 11 workers of varying diligence, whose work quality was validated by another 20 verifiers. Unexpectedly, we found significant negative correlations between the workers' precision and the models' R² values for two of the three scales (r₁₁ = −0.768 for Aesthetics, r₁₁ = −0.644 for Orderliness). We speculate that this counterintuitive effect might be explained by a bias in the indiligent labelers' output that corresponds to the subjectivity of human perception of visual objects.