Assessing clustering optimality with the instability index

Many clustering algorithms require the number of clusters to be specified before the model is fitted. This requirement can seem contradictory in an unsupervised scenario; however, in many real-world scenarios, the data scientist often already has an idea of a reasonable range for the number of clusters. Unfortunately, this doesn’t mean the optimal number can be determined without further analysis. Cluster instability analysis is a powerful tool for assessing the optimality of a specific algorithm or configuration. The idea, developed in [1], is quite simple and can be easily understood through a mechanical analogy. Suppose we have some balls on a table and want to group them so that small perturbations (movements) cannot alter the initial cluster assignment. For example, if two groups are very close, a perturbation can push a ball into another cluster, as shown in the following figure: In this case, we can say […]
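The ball-and-table analogy can be tried out with a few lines of plain Python. The following is only a toy sketch and not the procedure from [1]: it assumes one-dimensional balls, two fixed cluster centers, nearest-center assignment, and Gaussian noise as the perturbation. The names `assign`, `instability`, and `make_groups` are hypothetical helpers introduced purely for illustration. The score is the average fraction of balls that end up in a different cluster after being nudged.

```python
import random


def assign(points, centers):
    """Return, for each 1-D point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: abs(p - centers[k]))
        for p in points
    ]


def instability(points, centers, noise=0.5, n_trials=200, seed=0):
    """Average fraction of points whose cluster changes after a random shift.

    Assumption: the perturbation is Gaussian noise with standard
    deviation `noise`, and the centers stay fixed (no refitting).
    """
    rng = random.Random(seed)
    base = assign(points, centers)
    changed = 0
    for _ in range(n_trials):
        # Nudge every ball a little and assign it again.
        moved = [p + rng.gauss(0.0, noise) for p in points]
        labels = assign(moved, centers)
        changed += sum(a != b for a, b in zip(base, labels))
    return changed / (n_trials * len(points))


def make_groups(gap, n=50, spread=0.3, seed=1):
    """Two groups of n balls each, centered at -gap/2 and +gap/2."""
    rng = random.Random(seed)
    left = [rng.gauss(-gap / 2, spread) for _ in range(n)]
    right = [rng.gauss(gap / 2, spread) for _ in range(n)]
    return left + right, [-gap / 2, gap / 2]


# Two groups that almost touch versus two groups that are far apart.
close_points, close_centers = make_groups(gap=1.0)
far_points, far_centers = make_groups(gap=6.0)

close_score = instability(close_points, close_centers)
far_score = instability(far_points, far_centers)

print(f"close groups: {close_score:.3f}")
print(f"far groups:   {far_score:.3f}")
```

With these settings the well-separated groups should score much lower than the nearly touching ones, which is the intuition behind preferring configurations whose assignments survive small perturbations. A realistic analysis would refit the clustering algorithm on each perturbed dataset and compare the resulting partitions, rather than keeping the centers fixed as this sketch does.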