Popis: |
Deep learning has dramatically improved the performance of automated systems on a range of tasks including spoken language assessment. One of the issues with these deep learning approaches is that they tend to be overconfident in the decisions that they make, with potentially serious implications for deployment of systems for high-stakes examinations. This paper examines the use of ensemble approaches to improve both the reliability of the scores that are generated, and the ability to detect where the system has made predictions beyond acceptable errors. In this work assessment is treated as a regression problem. Deep density networks, and ensembles of these models, are used as the predictive models. Given an ensemble of models measures of uncertainty, for example the variance of the predicted distributions, can be obtained and used for detecting outlier predictions. However, these ensemble approaches increase the computational and memory requirements of the system. To address this problem the ensemble is distilled into a single mixture density network. The performance of the systems is evaluated on a free speaking prompt-response style spoken language assessment test. Experiments show that the ensembles and the distilled model yield performance gains over a single model, and have the ability to detect outliers. |