Unsupervised Learning: K-Means Clustering

Hemendra singh
2 min read · Oct 16, 2021

The goal is to find the right value of K and partition the data into K clusters.

The steps below use the basic Euclidean distance metric.

Let X = {x1, x2, x3, …, xn} be the set of n data points and V = {v1, v2, …, vc} be the set of c cluster centers (c is the number of clusters, i.e. K).
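To make the notation concrete, here is a minimal sketch that stores X and V as NumPy arrays and computes the Euclidean distance between every point and every center. The toy values and the variable name `distances` are assumptions chosen for illustration, not data from this article.

```python
import numpy as np

# Hypothetical toy data: n = 4 points with 2 features each
X = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])
# Hypothetical centers: c = 2
V = np.array([[1.0, 1.0], [8.0, 8.0]])

# Broadcasting builds an (n, c, features) array of differences;
# distances[i, j] is the Euclidean distance between point x_i and center v_j
distances = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)

print(distances.shape)  # (4, 2): one row per point, one column per center
```

Each row of `distances` holds one point's distance to every center, which is the quantity the assignment step compares.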

  1. Select ‘c’ cluster centers randomly.
  2. Calculate the distance between each data point and each cluster center using the Euclidean distance metric:
d(x_i, v_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - v_{jk})^2}, where m is the number of features.

3. Assign each point to the center at the minimum calculated distance (points assigned to the same center form a cluster).

4. Recalculate each cluster center as the mean of the points assigned to it.

5. These recalculated centers become the new cluster centers; reassign each data point to its nearest center.

6. If no data point was reassigned, stop; otherwise repeat steps 3 to 5.

7. Calculate WCSS (Within-Cluster Sum of Squares) for each value of c. WCSS is the sum of the squared distances between each point and the centroid of its cluster.

WCSS vs. number of clusters for a given data set. Select 3 clusters: beyond 3, WCSS no longer decreases much.
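Steps 1 to 7 can be sketched in plain NumPy as follows. This is an illustrative sketch, not the scikit-learn implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, the choice to draw the initial centers from the data points, and the toy data are all assumptions.

```python
import numpy as np

def kmeans(X, c, max_iter=100, seed=0):
    """Plain k-means following steps 1-7; returns (centers, labels, wcss)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick c distinct data points as the initial centers (assumption)
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 2: Euclidean distance from every point to every center
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Steps 3 and 5: assign each point to its nearest center
        new_labels = distances.argmin(axis=1)
        # Step 6: stop when no point changed cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its assigned points
        for j in range(c):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Step 7: WCSS = sum of squared distances to the assigned center
    wcss = float(((X - centers[labels]) ** 2).sum())
    return centers, labels, wcss

# Hypothetical toy data: two well-separated groups of three points
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
centers, labels, wcss = kmeans(X, c=2)
print(labels, wcss)
```

On this toy data the first three points should end up in one cluster and the last three in the other; which cluster gets label 0 depends on the random initial centers.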

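Create Model
from sklearn.cluster import KMeans
# n_clusters is the number of clusters (K) to fit
model = KMeans(n_clusters=4)

Train Model
# raw_data: array-like of shape (n_samples, n_features)
model.fit(raw_data)
# inertia_ holds the WCSS of the fitted model
wcss = model.inertia_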

Select the number of clusters based on WCSS (elbow method).
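One way to apply the elbow method is to fit a model for each candidate number of clusters and record its WCSS. The sketch below assumes synthetic data in place of `raw_data` (three generated blobs), a candidate range of 1 to 7, and fixed `n_init` and `random_state` values for repeatability; none of these come from the article.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for raw_data: three well-separated 2-D blobs
rng = np.random.default_rng(0)
raw_data = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in ([0, 0], [5, 5], [0, 5])
])

# Fit one model per candidate K and store its WCSS (inertia_)
wcss_by_k = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(raw_data)
    wcss_by_k[k] = model.inertia_

for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 1))
```

Plotting `wcss_by_k` against K and looking for the point where the curve flattens gives the elbow. On data like this, the drop from 2 to 3 clusters should be much larger than the drop from 3 to 4.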

Reference: K-means with Three different Distance Metrics (ijcaonline.org)
