Description: |
Equating methods are designed to adjust between alternate versions of assessments targeting the same content at the same level, with the aim that scores from the different versions can be used interchangeably. The statistical processes used in equating have, however, been extended to statistically “link” assessments that differ, such as assessments of the same qualification type that assess different subjects. Despite careful debate on statistical linking in the literature, it can be tempting to apply equating methods and conclude that they have provided a definitive answer on whether one qualification is harder or easier than another. This article offers a novel demonstration of some limits of statistical equating by exploring how accurately various equating methods were able to equate between identical assessments. To do this, we made use of pairs of live assessments that are “cover sheet” versions of each other, that is, identical assessments with different assessment codes. The results showed that equating errors with real-world impact (e.g., an increase of 5–10 per cent in the proportion of students achieving a grade A) occurred even where equating conditions were apparently favourable. No single method consistently produced more accurate results than the others. The results emphasise the importance of considering multiple sources of information to make final grade boundary decisions. More broadly, the results are a reminder that, if applied uncritically, equating methods can lead to incorrect conclusions about the relative difficulty of assessments.
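The sketch below is a minimal, hypothetical illustration of the study's core idea: when two candidate groups sit literally the same assessment, the true equating function is the identity, so any deviation produced by an equating method is pure equating error. It uses a simple equipercentile equating applied to two simulated samples; the binomial score model, sample sizes, 60-mark scale, notional grade boundary, and all function names are assumptions for illustration and are not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two candidate groups sitting the *same* 60-mark
# assessment under different codes (a "cover sheet" pair), so the true
# equating function is y = x. Scores are simulated from a binomial model.
population = rng.binomial(60, 0.55, size=100_000)   # notional score population
form_x = rng.choice(population, size=2_000)         # candidates on code X
form_y = rng.choice(population, size=2_000)         # candidates on code Y

def percentile_rank(scores, x):
    """Mid-percentile rank of integer score x within a sample of scores."""
    below = np.mean(scores < x)
    at = np.mean(scores == x)
    return below + at / 2

def equipercentile(x, from_scores, to_scores, max_mark=60):
    """Map score x on one form to the equally-ranked score on the other."""
    p = percentile_rank(from_scores, x)
    # Invert the percentile-rank function on the target form by
    # linear interpolation over the integer score scale.
    grid = np.arange(0, max_mark + 1)
    ranks = np.array([percentile_rank(to_scores, g) for g in grid])
    return np.interp(p, ranks, grid)

# Because the two forms are identical, any gap between the equated score
# and the original score at a notional grade boundary is equating error.
boundary = 40
equated = equipercentile(boundary, form_x, form_y)
print(f"score {boundary} on code X equates to {equated:.2f} on code Y")
print(f"equating error at the boundary: {equated - boundary:+.2f} marks")
```

Even in this idealised setting, sampling variability alone shifts the equated boundary by a fraction of a mark, and smaller or less comparable candidate groups would shift it further; around a steep part of the score distribution, a one-mark shift can move several per cent of candidates across a grade boundary, which is the scale of error the abstract describes.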