How to compute inter-rater reliability metrics (Cohen’s Kappa, Fleiss’ Kappa, Cronbach’s Alpha, Krippendorff’s Alpha, Scott’s Pi, Intra-class correlation) in Python

Recently, I was involved in an annotation process with two coders and needed to compute inter-rater reliability scores. There are multiple measures for calculating agreement between two or more coders/annotators. If you are wondering which measure to use in your case, I would suggest reading (Hayes & Krippendorff, 2007) …
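
To give a flavour of the two-coder case before going metric by metric, here is a minimal sketch of computing Cohen’s Kappa with scikit-learn’s `cohen_kappa_score`. The two label lists are made-up example annotations (not real data), and the only assumption is that both coders labelled the same items in the same order.

```python
# Minimal sketch: Cohen's Kappa for two coders using scikit-learn.
# The label lists below are hypothetical example annotations; in practice,
# load your coders' labels so that index i in both lists refers to item i.
from sklearn.metrics import cohen_kappa_score

coder_1 = ["yes", "no", "yes", "yes", "no", "no", "yes", "no"]
coder_2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]

kappa = cohen_kappa_score(coder_1, coder_2)
print(f"Cohen's Kappa: {kappa:.3f}")
```

For more than two coders, or for the other measures in the title, libraries such as `statsmodels` (Fleiss’ Kappa), `nltk` (Scott’s Pi, Krippendorff’s Alpha via `AnnotationTask`), the `krippendorff` package, and `pingouin` (intra-class correlation) cover the remaining cases.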