A Comparison of Reliability Coefficients for Ordinal Rating Scales
Author: | Alexandra de Raadt, Matthijs J. Warrens, Henk A.L. Kiers, Roel Bosker |
---|---|
Contributors: | Research and Evaluation of Educational Effectiveness, Psychometrics and Statistics |
Language: | English |
Year of publication: | 2021 |
Subject: | Cohen's kappa, Linearly weighted kappa, Quadratically weighted kappa, Kappa, Intraclass correlation, Pearson's correlation, Spearman's rho, Kendall's tau-b, Correlation coefficient, Inter-rater reliability, Reliability (statistics), Rating scale, Interval scale |
Source: | Journal of Classification, 38(3), 519-543. SPRINGER |
ISSN: | 0176-4268 |
Description: | Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen's kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are the intraclass correlation ICC(3,1), Pearson's correlation, Spearman's rho, and Kendall's tau-b. The primary goal is to provide a thorough understanding of these coefficients so that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of coefficient matters. Using analytic methods as well as simulated and empirical data, we studied to what extent the different coefficients lead to the same conclusions about inter-rater reliability, and to what extent they measure agreement in a similar way. Analytically, it is shown that differences between quadratically weighted kappa and the Pearson and intraclass correlations increase as agreement becomes larger; differences between the three coefficients are generally small if differences between rater means and variances are small. Furthermore, using simulated and empirical data, it is shown that differences between all reliability coefficients tend to increase as agreement between the raters increases. Moreover, for the data in this study, the four correlation coefficients led to the same conclusion about inter-rater reliability in virtually all cases, and quadratically weighted kappa led to a similar conclusion as the correlation coefficients in a great number of cases. Hence, for the data in this study, it does not really matter which of these five coefficients is used. Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study. |
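To make the compared coefficients concrete, the following is a minimal sketch in pure Python of three of them: Cohen's unweighted kappa, quadratically weighted kappa, and Pearson's correlation. The rating data are invented for illustration and are not from the paper; the formulas are the standard ones (weighted kappa as one minus the ratio of observed to expected disagreement weight). On ratings where disagreements are small on the ordinal scale, quadratically weighted kappa typically lands much closer to Pearson's correlation than to unweighted kappa, which is consistent with the pattern the abstract describes.

```python
from collections import Counter

def weighted_kappa(x, y, weight=lambda a, b: (a - b) ** 2):
    """Weighted kappa for two raters, quadratic disagreement weights by default.
    Passing 0/1 identity weights recovers Cohen's unweighted kappa."""
    n = len(x)
    cats = sorted(set(x) | set(y))
    obs = Counter(zip(x, y))          # joint counts of (rater1, rater2) pairs
    px, py = Counter(x), Counter(y)   # marginal counts per rater
    observed = sum(weight(a, b) * obs[(a, b)] / n for a in cats for b in cats)
    expected = sum(weight(a, b) * px[a] * py[b] / n ** 2
                   for a in cats for b in cats)
    return 1.0 - observed / expected

def pearson(x, y):
    """Pearson product-moment correlation, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical ratings by two raters on a 5-point ordinal scale
r1 = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
r2 = [1, 2, 3, 3, 3, 4, 4, 4, 5, 4]

cohen = weighted_kappa(r1, r2, weight=lambda a, b: 0 if a == b else 1)
quad = weighted_kappa(r1, r2)
print(f"Cohen's kappa:          {cohen:.3f}")
print(f"Quadratic kappa:        {quad:.3f}")
print(f"Pearson's correlation:  {pearson(r1, r2):.3f}")
```

On this toy sample, every disagreement is one scale point, so quadratically weighted kappa comes out close to Pearson's correlation while unweighted kappa is noticeably lower.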
Database: | OpenAIRE |
External link: |