A Comparison of Reliability Coefficients for Ordinal Rating Scales

Alexandra de Raadt, Matthijs J. Warrens*, Roel J. Bosker, Henk A. L. Kiers

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

66 Citations (Scopus)
557 Downloads (Pure)

Abstract

Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen’s kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are the intraclass correlation ICC(3,1), Pearson’s correlation, Spearman’s rho, and Kendall’s tau-b. The primary goal is to provide a thorough understanding of these coefficients so that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of coefficient matters. We studied to what extent we reached the same conclusions about inter-rater reliability with different coefficients, and to what extent the coefficients measure agreement in a similar way, using analytical methods as well as simulated and empirical data. Using analytical methods, it is shown that differences between quadratic kappa and the Pearson and intraclass correlations increase as agreement becomes larger. Differences between the three coefficients are generally small if differences between rater means and variances are small. Furthermore, using simulated and empirical data, it is shown that differences between all reliability coefficients tend to increase as agreement between the raters increases. Moreover, for the data in this study, the same conclusion about inter-rater reliability was reached in virtually all cases with the four correlation coefficients. In addition, using quadratically weighted kappa, we reached a similar conclusion as with the correlation coefficients in a great number of cases. Hence, for the data in this study, it does not really matter which of these five coefficients is used. Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
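As an illustration of the seven coefficients compared in the abstract, the sketch below computes each of them for two raters scoring the same subjects on a 5-point ordinal scale. This is not the authors' code: the library choices (scikit-learn and SciPy), the simulated data, and the variable names are assumptions made here for illustration; ICC(3,1) is computed from the standard Shrout–Fleiss two-way ANOVA formula.

```python
# Minimal illustrative sketch (not the authors' code): the seven reliability
# coefficients for two raters on an ordinal scale. Data are simulated here
# purely as an example.
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau
from sklearn.metrics import cohen_kappa_score


def icc_3_1(ratings: np.ndarray) -> float:
    """ICC(3,1): two-way mixed effects, consistency, single rater
    (Shrout & Fleiss), from the usual ANOVA mean squares.
    `ratings` is an (n subjects x k raters) array."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_total = ((ratings - grand) ** 2).sum()
    ss_subjects = k * ((ratings.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((ratings.mean(axis=0) - grand) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_raters
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


rng = np.random.default_rng(0)
# Hypothetical data: 200 subjects rated on a 1-5 ordinal scale; rater 2
# mostly agrees with rater 1, with occasional +/-1 discrepancies.
r1 = rng.integers(1, 6, size=200)
shift = np.where(rng.random(200) < 0.3, rng.integers(-1, 2, size=200), 0)
r2 = np.clip(r1 + shift, 1, 5)

print("Cohen's kappa:            ", cohen_kappa_score(r1, r2))
print("Linearly weighted kappa:  ", cohen_kappa_score(r1, r2, weights="linear"))
print("Quadratically weighted kappa:", cohen_kappa_score(r1, r2, weights="quadratic"))
print("ICC(3,1):                 ", icc_3_1(np.column_stack([r1, r2])))
print("Pearson's correlation:    ", pearsonr(r1, r2)[0])
print("Spearman's rho:           ", spearmanr(r1, r2)[0])
print("Kendall's tau-b:          ", kendalltau(r1, r2)[0])
```

With high simulated agreement, the quadratically weighted kappa and the four correlation coefficients typically yield very similar values, while unweighted Cohen's kappa is noticeably lower, which is consistent with the pattern described in the abstract.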

Original language: English
Pages (from-to): 519-543
Number of pages: 25
Journal: Journal of Classification
Volume: 38
Issue number: 3
Early online date: 22-Apr-2021
DOIs
Publication status: Published - Oct-2021

Keywords

  • Cohen’s kappa
  • Inter-rater reliability
  • Intraclass correlation
  • Kendall’s tau-b
  • Linearly weighted kappa
  • Pearson’s correlation
  • Quadratically weighted kappa
  • Spearman’s rho
