Description: |
Probabilistic models of competence assessment combine the benefits of automation with human judgment. We start this paper by replicating two existing probabilistic models of peer assessment (PG1-bias and PAAS). Although both rely on probability theory, their approaches are radically different. While PG1-bias is purely Bayesian, PAAS models the evaluation process in a classroom as a multiagent system, where each actor relies on the judgment of others as long as their opinions coincide. To reconcile the benefits of Bayesian inference with the concept of trust posed in PAAS, we propose a third peer evaluation model, PG-bivariate, that considers the correlations between any pair of peers who have evaluated someone in common. The rest of the paper is devoted to a comparison of these three models on synthetic data. We show that PG1-bias produces predictions with lower root mean squared error (RMSE) than PG-bivariate. However, both models behave similarly when choosing the next assignment to be graded by a peer, with an “RMSE decreasing policy” yielding better results than a random policy. Fair comparisons among the three models show that PG1-bias achieves the lowest error when ground truths are scarce. Nevertheless, once nearly 20% of the teacher’s assessments are introduced, PAAS sometimes exceeds the quality of PG1-bias’ predictions by following an entropy minimization heuristic. PG-bivariate, our new proposal to reconcile PAAS’ trust-based approach with PG1-bias’ theoretical background, obtains error percentages similar to those of the original models. Future work includes applying the models to real experimental data and exploring new heuristics to determine which teacher’s grade should be obtained next to minimize the overall error. |