Understanding the Adjusted Rand Index and Other Partition Comparison Indices Based on Counting Object Pairs

Published: 2022-07-22 Issue: 3 Volume: 39 Page: 487-509
ISSN: 0176-4268
Container-title: Journal of Classification
Language: en
Short-container-title: J Classif

Author:

Warrens Matthijs J., van der Hoef Hanneke

Abstract

In unsupervised machine learning, agreement between partitions is commonly assessed with so-called external validity indices. Researchers tend to use and report indices that quantify agreement between two partitions for all clusters simultaneously. Commonly used examples are the Rand index and the adjusted Rand index. Since these overall measures give a general notion of what is going on, their values are usually hard to interpret. The goal of this study is to provide a thorough understanding of the adjusted Rand index as well as many other partition comparison indices based on counting object pairs. It is shown that many overall indices based on the pair-counting approach can be decomposed into indices that reflect the degree of agreement on the level of individual clusters. The decompositions (1) show that the overall indices can be interpreted as summary statistics of the agreement on the cluster level, (2) specify how these overall indices are related to the indices for individual clusters, and (3) show that the overall indices are affected by cluster size imbalance: if cluster sizes are unbalanced these overall measures will primarily reflect the degree of agreement between the partitions on the large clusters, and will provide much less information on the agreement on smaller clusters. Furthermore, the value of Rand-like indices is determined to a large extent by the number of pairs of objects that are not joined in either of the partitions.
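
For readers unfamiliar with the pair-counting approach mentioned in the abstract, the following is a minimal illustrative sketch, not code from the paper. It uses the standard textbook definitions of the Rand index and the Hubert-Arabie adjusted Rand index; the function names and the toy label vectors are assumptions made for this example only.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Count object pairs by how the two partitions treat them.

    Returns (n11, n10, n01, n00):
      n11: pairs joined in both partitions
      n10: pairs joined in A only
      n01: pairs joined in B only
      n00: pairs joined in neither partition
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if n < 2:
        raise ValueError("need at least two objects to form a pair")

    joint = Counter(zip(labels_a, labels_b))  # contingency table cells
    joined_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    joined_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    n11 = sum(comb(c, 2) for c in joint.values())
    n10 = joined_a - n11
    n01 = joined_b - n11
    n00 = comb(n, 2) - n11 - n10 - n01
    return n11, n10, n01, n00


def rand_index(labels_a, labels_b):
    """Proportion of pairs on which the two partitions agree."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance (Hubert-Arabie form)."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    total = n11 + n10 + n01 + n00
    joined_a, joined_b = n11 + n10, n11 + n01
    expected = joined_a * joined_b / total
    maximum = (joined_a + joined_b) / 2
    if maximum == expected:  # degenerate case, e.g. all singletons
        return 1.0
    return (n11 - expected) / (maximum - expected)


# Toy example (hypothetical data): six objects, two partitions.
a = [0, 0, 0, 1, 1, 1]
b = [0, 0, 1, 1, 2, 2]
print(pair_counts(a, b))                     # (2, 4, 1, 8)
print(round(rand_index(a, b), 3))            # 0.667
print(round(adjusted_rand_index(a, b), 3))   # 0.242
```

In this toy case 8 of the 10 agreeing pairs are pairs that neither partition joins, which is consistent with the abstract's remark that Rand-like indices are largely driven by such pairs; the chance-corrected value is much lower.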

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Psychology (miscellaneous); Mathematics (miscellaneous)

Link

https://link.springer.com/content/pdf/10.1007/s00357-022-09413-z.pdf

References: 77 articles.

1. Albatineh, A.N., & Niewiadomska-Bugaj, M. (2011a). Correcting Jaccard and other similarity indices for chance agreement in cluster analysis. Advances in Data Analysis and Classification, 5(3), 179–200.

2. Albatineh, A.N., & Niewiadomska-Bugaj, M. (2011b). MCS: A method for finding the number of clusters. Journal of Classification, 28, 184–209.

3. Albatineh, A.N., Niewiadomska-Bugaj, M., & Mihalko, D. (2006). On similarity indices and correction for chance agreement. Journal of Classification, 23(2), 301–313.

4. Alok, A.K., Saha, S., & Ekbal, A. (2014). Development of an external cluster validity index using probabilistic approach and min-max distance. International Journal of Computer Information Systems and Industrial Management Applications, 6, 494–504.

5. Anderson, D.T., Bezdek, J.C., Popescu, M., & Keller, J.M. (2010). Comparing fuzzy, probabilistic, and possibilistic partitions. IEEE Transactions on Fuzzy Systems, 18, 906–917.

Cited by 30 articles.

1. Development of a novel representation of drug 3D structures and enhancement of the TSR-based method for probing drug and target interactions;Computational Biology and Chemistry;2024-10

2. A review of model evaluation metrics for machine learning in genetics and genomics;Frontiers in Bioinformatics;2024-09-10

3. Block-diagonal idiosyncratic covariance estimation in high-dimensional factor models for financial time series;Journal of Computational Science;2024-09

4. Proposal of a workplace classification model for heart attack accidents from the field of occupational safety and health engineering;Heliyon;2024-09

5. CHAI: consensus clustering through similarity matrix integration for cell-type identification;BRIEF BIOINFORM;2024