Intraclass Correlations

10/07/2022

Most measurements in the behavioral sciences involve measurement error, but judgments made by humans are especially plagued by this problem. Since measurement error can seriously affect statistical analysis and interpretation, it is important to assess the amount of such error by calculating a reliability index. Many of the available reliability indices can be viewed as versions of the intraclass correlation, typically the ratio of the variance of interest to the sum of the variance of interest and the error variance (Bartko, 1966; Ebel, 1951; Haggard, 1958). There are numerous versions of the intraclass correlation coefficient (ICC), and they can give quite different results when applied to the same data. Unfortunately, many researchers are not aware of the differences between the forms, and those who are aware often fail to report which form they used. Each form is appropriate for specific situations defined by the experimental design and the conceptual intent of the study.
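The ratio described above can be written compactly. This is only a schematic form; the symbols are chosen here for illustration, and what counts as "error" depends on which ICC form is used.

```latex
\mathrm{ICC} = \frac{\sigma^2_{\text{interest}}}{\sigma^2_{\text{interest}} + \sigma^2_{\text{error}}}
```

When the error variance is small relative to the variance of interest, the ratio approaches 1; when error dominates, it approaches 0.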

Intraclass vs. Interclass correlations.

To measure the bivariate relation between variables representing different measurement classes, one must use an interclass correlation coefficient, of which there is only one in common use, the Pearson r. An example is the correlation between LDL and SBP: the two variables belong to different classes and share neither metric nor variance. When one is instead interested in the relationship among variables of a common class, meaning variables that share both their metric and their variance, intraclass correlation coefficients (ICCs) are the appropriate statistics for measuring homogeneity, both for pairs of measurements and for larger sets of measurements. For example, one may want to see how three medical devices score the same patients in order to estimate how consistent the devices are. The ICC can be used to assess reliability (test-retest) and also stability, that is, how the device performs over longer time periods. The ICC usually takes values from 0 to 1 (sample estimates can occasionally fall below 0), so a value above 0.80, say, indicates excellent reliability. The ICC is often categorized as Poor (<=0.40), Fair (0.41-0.60), Good (0.61-0.80) or Excellent (0.81-1.0). These categories depend on the situation, since human scoring is often less precise than scoring by medical devices, and devices would therefore be expected to produce higher ICC values.
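As a minimal sketch of the device example, the Python code below computes one of the simplest ICC forms (a one-way random-effects ICC for single measurements) and maps it to the categories listed above. The function names and the readings are hypothetical, made up for illustration only, and the one-way form is just one of several possible choices.

```python
def icc_oneway(scores):
    """One-way random-effects ICC for single measurements.

    scores: list of rows, one row per patient, one column per device.
    """
    n = len(scores)      # number of patients
    k = len(scores[0])   # number of measurements per patient
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    # Mean squares from a one-way ANOVA with patient as the grouping factor
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def reliability_label(icc):
    """Map an ICC value to the categories quoted in the text."""
    if icc <= 0.40:
        return "Poor"
    if icc <= 0.60:
        return "Fair"
    if icc <= 0.80:
        return "Good"
    return "Excellent"


# Hypothetical readings: five patients, each measured by three devices
readings = [
    [120, 122, 121],
    [135, 134, 136],
    [110, 111, 109],
    [150, 148, 151],
    [128, 129, 127],
]
icc = icc_oneway(readings)
print(round(icc, 3), reliability_label(icc))
```

In these made-up data the devices agree closely relative to the spread between patients, so the coefficient lands in the top category.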

To choose the appropriate ICC, it is advisable to consult a statistician, since there are up to 10 different ICCs based on one- or two-way ANOVA models, with different assumptions about which factors are treated as random and how far the results are meant to generalize. Hans Fagertun can assist in the planning and analysis of such studies.
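To illustrate how much the choice of form can matter, the sketch below computes three single-measurement ICCs from the same table using the widely used Shrout and Fleiss mean-square formulas. The function name, dictionary keys and data are assumptions made for this example; the data are constructed so that the second rater always scores exactly 10 points higher than the first.

```python
def icc_forms(scores):
    """Three single-measurement ICC forms from one table of scores.

    scores: list of rows, one row per subject, one column per rater.
    """
    n = len(scores)      # subjects
    k = len(scores[0])   # raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for the two-way layout (subjects x raters)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    wms = (ss_cols + ss_err) / (n * (k - 1))  # within-subjects mean square

    return {
        # one-way random effects
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        # two-way random effects, absolute agreement
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        # two-way mixed effects, consistency
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }


# Hypothetical data: rater B is always exactly 10 points above rater A
scores = [[1, 11], [2, 12], [3, 13], [4, 14]]
for name, value in icc_forms(scores).items():
    print(name, round(value, 3))
```

The consistency form ignores the constant offset between the raters, while the one-way and absolute-agreement forms penalize it heavily, which is why the form used should always be reported.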


Capturo is a Norwegian contract research organisation (CRO) serving the pharmaceutical industry. Our expertise is focused on biostatistical services and data management of clinical studies, and we have 25 years of experience. Pharmaceutical and biotech companies are among our clients. We are centrally located at Skjetten, next to Olavsgård hotel, between Oslo Airport and Oslo City.

