Clustering Example: 4 Steps You Should Know

This article walks through a k-means clustering example and provides a step-by-step guide to conducting a cluster analysis on a real data set using R.

We’ll mainly use two R packages: cluster (for computing the cluster analyses) and factoextra (for visualizing the results).

Install these packages as follows:

install.packages(c("cluster", "factoextra"))

A rigorous cluster analysis can be conducted in the 5 steps listed below:

  1. Data preparation
  2. Assessing clustering tendency (i.e., the clusterability of the data)
  3. Defining the optimal number of clusters
  4. Computing partitioning cluster analyses (e.g., k-means, PAM) or hierarchical clustering
  5. Validating the clustering results (e.g., with a silhouette plot)

Here, we provide quick R scripts to perform all these steps.
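
As a compact preview, the five steps might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive workflow: it assumes both packages are already installed, uses the built-in USArrests data set as a stand-in for a real data set, and fixes k = 4 purely as a placeholder that should be replaced by whatever step 3 suggests for your data.

```r
# Assumptions: USArrests stands in for "a real data set"; k = 4 is illustrative.
library(cluster)     # silhouette()
library(factoextra)  # get_clust_tendency(), fviz_nbclust(), fviz_silhouette()

# 1. Data preparation: drop rows with missing values, then standardize
df <- scale(na.omit(USArrests))

# 2. Clustering tendency: Hopkins statistic on n sampled points
set.seed(123)
tendency <- get_clust_tendency(df, n = 40, graph = FALSE)
print(tendency$hopkins_stat)

# 3. Optimal number of clusters: average silhouette width for each k
print(fviz_nbclust(df, kmeans, method = "silhouette"))

# 4. Partitioning: k-means with several random starts
set.seed(123)
km <- kmeans(df, centers = 4, nstart = 25)

# 5. Validation: silhouette widths of the k-means partition
sil <- silhouette(km$cluster, dist(df))
print(fviz_silhouette(sil))
```

Standardizing first keeps variables measured on large scales from dominating the distance computations, and setting a seed makes the random sampling in step 2 and the random starts in step 4 reproducible.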