Effects of Differentially Time-Consuming Tests on Computer-Adaptive Test Scores

Author: Brent Bridgeman, Frederick Cline
Year of publication: 2004
Subject:
Source: Journal of Educational Measurement. 41:137-148
ISSN: 1745-3984, 0022-0655
DOI: 10.1111/j.1745-3984.2004.tb01111.x
Description: Time limits on some computer-adaptive tests (CATs) are such that many examinees have difficulty finishing, and some examinees may be administered tests with more time-consuming items than others. Results from over 100,000 examinees suggested that about half of the examinees had to guess on the final six questions of the analytical section of the Graduate Record Examination in order to finish before time expired. At higher ability levels, even more guessing was required because the questions administered to higher-ability examinees were typically more time consuming. Because the scoring model is not designed to cope with extended strings of guesses, substantial errors in ability estimates can be introduced when CATs have strict time limits. Furthermore, examinees who are administered tests with a disproportionate number of time-consuming items appear to get lower scores than examinees of comparable ability who are administered tests containing items that can be answered more quickly, although the issue is complicated by the relationship between time and difficulty and by the multidimensionality of the test.

The Graduate Record Examination General Test (GRE) is a computer-adaptive test (CAT) of verbal, quantitative, and analytical reasoning skills. Unlike some CATs, the GRE has a fixed number of questions and strict time limits on each section. According to the GRE Technical Manual (Briel, O'Neill, & Scheuneman, 1993), "GRE General and Subject Tests are not intended to be speeded" (p. 32). When the CAT version of the GRE was first introduced, "time limits were set with the intention that almost all examinees would have sufficient time to answer all items" (Schaeffer et al., 1995, p. 18). Nevertheless, fairly strict time limits were imposed in order to maintain comparability with the existing paper-and-pencil forms (and a linear computer-based test), which were somewhat speeded.
Research on the linear computer-administered analytical section of the GRE General Test (GRE-A) suggested that, although completion rates were reasonably high, many students had to make random guesses at the end in order to finish (Schnipke & Scrams, 1997). Because the GRE is a CAT, different examinees receive different sets of questions. The three-parameter logistic scoring model accounts for the difficulty differences among these questions: examinees who get difficult questions are not disadvantaged relative to examinees who get easier questions. However, unidimensional scoring models do not take into account differences in the amount of time it takes to respond to different questions (Hambleton & Swaminathan, 1985). A fair assessment on a speeded test would seem to require that no examinee should, by chance, receive a set of items that takes longer to answer than the items given to another examinee. Bridgeman and Cline (2000) presented evidence that some questions on the quantitative and analytical sections of the GRE CAT could be answered more quickly than others. Much of
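The interaction between the three-parameter logistic (3PL) scoring model and end-of-test guessing can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the operational GRE scoring procedure: the item parameters, the five-option guessing rate of 0.2, and the grid-search maximum-likelihood estimator below are all illustrative choices made for this example.

```python
import math
import random

def p3pl(theta, a, b, c):
    """3PL probability of a correct response: c + (1 - c) / (1 + exp(-a(theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, items):
    """Maximum-likelihood ability estimate via a simple grid search over [-4, 4]."""
    best_theta, best_ll = None, -float("inf")
    for i in range(-400, 401):
        theta = i / 100.0
        ll = 0.0
        for x, (a, b, c) in zip(responses, items):
            p = p3pl(theta, a, b, c)
            ll += math.log(p) if x == 1 else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

random.seed(0)
true_theta = 1.5  # a high-ability examinee
# 24 items targeted near the examinee's ability, as an adaptive test would select
items = [(1.2, true_theta + random.uniform(-0.5, 0.5), 0.2) for _ in range(24)]

# Responses when the examinee has time to attempt every item
honest = [1 if random.random() < p3pl(true_theta, a, b, c) else 0
          for (a, b, c) in items]
# Same examinee, but time runs short and the final six answers are random
# guesses on five-option items (success probability 0.2)
rushed = honest[:18] + [1 if random.random() < 0.2 else 0 for _ in range(6)]

print("theta-hat, all items attempted:", mle_theta(honest, items))
print("theta-hat, final six guessed:  ", mle_theta(rushed, items))
```

Because the adaptive algorithm routes high-ability examinees to hard items on which chance-level responding is far below their expected performance, a string of terminal guesses typically pulls the estimate well below the estimate obtained when every item is attempted, which is the distortion the abstract describes.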
Database: OpenAIRE