A Comparison of Reliability Coefficients for Ordinal Rating Scales

Alexandra de Raadt, Matthijs J. Warrens*, Roel J. Bosker, Henk A. L. Kiers

*Corresponding author for this work

Research output: Article, Academic, peer-reviewed

68 Citations (Scopus)
577 Downloads (Pure)

Abstract

Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen’s kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are intraclass correlation ICC(3,1), Pearson’s correlation, Spearman’s rho, and Kendall’s tau-b. The primary goal is to provide a thorough understanding of these coefficients such that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of the coefficient matters. We studied to what extent we reach the same conclusions about inter-rater reliability with different coefficients, and to what extent the coefficients measure agreement in a similar way, using analytic methods, and simulated and empirical data. Using analytical methods, it is shown that differences between quadratic kappa and the Pearson and intraclass correlations increase if agreement becomes larger. Differences between the three coefficients are generally small if differences between rater means and variances are small. Furthermore, using simulated and empirical data, it is shown that differences between all reliability coefficients tend to increase if agreement between the raters increases. Moreover, for the data in this study, the same conclusion about inter-rater reliability was reached in virtually all cases with the four correlation coefficients. In addition, using quadratically weighted kappa, we reached a similar conclusion as with any correlation coefficient a great number of times. Hence, for the data in this study, it does not really matter which of these five coefficients is used. 
Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
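
As a rough illustration of how the three kappa coefficients and two of the correlation coefficients named above relate for two raters, the following sketch computes them from their standard definitions. It is not the authors' code; the function names (`weighted_kappa`, `moments`, `pearson`, `icc31`) and the ratings are hypothetical, and the closed form used for ICC(3,1) assumes exactly two raters.

```python
from collections import Counter
from math import sqrt


def weighted_kappa(r1, r2, weights="unweighted"):
    """Kappa for two raters; weights is 'unweighted', 'linear' or 'quadratic'.

    Categories must be coded as numbers (e.g. 1, 2, 3). The result is undefined
    (division by zero) if both raters use one and the same single category.
    """
    n = len(r1)

    def disagreement(a, b):
        if weights == "unweighted":
            return 0.0 if a == b else 1.0
        d = abs(a - b)
        return float(d) if weights == "linear" else float(d * d)

    # Observed mean disagreement over the rated subjects.
    observed = sum(disagreement(a, b) for a, b in zip(r1, r2)) / n
    # Expected disagreement if the raters were independent with the same marginals.
    m1, m2 = Counter(r1), Counter(r2)
    expected = sum(m1[a] * m2[b] * disagreement(a, b) for a in m1 for b in m2) / (n * n)
    return 1.0 - observed / expected


def moments(x, y):
    """Means, variances and covariance, all with denominator n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def pearson(x, y):
    _, _, sxx, syy, sxy = moments(x, y)
    return sxy / sqrt(sxx * syy)


def icc31(x, y):
    """ICC(3,1) for exactly two raters: 2 * cov / (var_x + var_y)."""
    _, _, sxx, syy, sxy = moments(x, y)
    return 2.0 * sxy / (sxx + syy)


# Hypothetical ratings of ten subjects on a three-category ordinal scale.
rater_a = [1, 2, 3, 1, 2, 3, 1, 2, 3, 2]
rater_b = [1, 2, 3, 1, 2, 3, 2, 3, 3, 2]

for name in ("unweighted", "linear", "quadratic"):
    print(name, "kappa:", weighted_kappa(rater_a, rater_b, name))
print("ICC(3,1):", icc31(rater_a, rater_b))
print("Pearson:", pearson(rater_a, rater_b))
```

With two raters, quadratically weighted kappa equals 2·cov / (var_x + var_y + (mean_x − mean_y)²), so it differs from ICC(3,1) only through the difference in rater means, and ICC(3,1) differs from Pearson's correlation only through the difference in rater variances. This is consistent with the abstract's remark that the three coefficients are close when rater means and variances are similar.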

Original language: English
Pages (from-to): 519-543
Number of pages: 25
Journal: Journal of Classification
Volume: 38
Issue number: 3
Early online date: 22 Apr 2021
Status: Published - Oct 2021
