# Unsupervised Learning: K-Means Clustering


The goal is to find a suitable value of K and partition the data into K clusters.

The steps below use the basic **Euclidean distance** metric.

Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, ..., vc} be the set of cluster centers.

1. Select `c` cluster centers randomly.

2. Calculate the distance between each data point and every cluster center using the Euclidean distance metric, $d(x_i, v_j) = \sqrt{\sum_{k} (x_{ik} - v_{jk})^2}$, where $k$ runs over the features.

3. Assign each point to the center for which the calculated distance is smallest (the points assigned to the same center form a cluster).

4. Recalculate each cluster center as the mean of the points assigned to it.

5. The recalculated means become the new cluster centers; reassign each data point to its nearest new center.

6. If no data point was reassigned, stop; otherwise repeat steps 3 to 5.

7. Calculate the WCSS (Within-Cluster Sum of Squares) for each candidate value of `c`. WCSS is the sum of the squared distances between every point and the centroid of the cluster it belongs to.
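The steps above can be sketched in NumPy. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters (`max_iter`, `seed`), and the two-blob sample data are assumptions made for this example, and the initial centers are simply drawn from the data points themselves.

```python
import numpy as np

def kmeans(X, c, max_iter=100, seed=0):
    """Plain k-means with Euclidean distance; returns (centers, labels, wcss)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick c distinct data points as the initial centers.
    centers = X[rng.choice(len(X), size=c, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: Euclidean distance from every point to every center, shape (n, c).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Steps 3 and 5: assign each point to its nearest center.
        new_labels = dists.argmin(axis=1)
        # Step 6: stop when no point changed cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(c)
        ])
    # Step 7: WCSS is the sum of squared distances to the assigned center.
    wcss = float(((X - centers[labels]) ** 2).sum())
    return centers, labels, wcss

# Hypothetical sample data: two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels, wcss = kmeans(X, c=2)
print(centers.shape, labels.shape, wcss)
```

Because the starting centers are random, different seeds can converge to different clusterings on harder data; library implementations usually rerun the algorithm several times and keep the run with the lowest WCSS.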

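Create the model:

```python
from sklearn.cluster import KMeans

# n_clusters is the number of clusters (c in the steps above)
model = KMeans(n_clusters=4)
```

Train the model on the data and read the WCSS:

```python
# raw_data is the feature matrix, shape (n_samples, n_features)
model.fit(raw_data)
wcss = model.inertia_  # within-cluster sum of squares for this fit
```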

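Select the number of clusters from the WCSS values using the elbow method: fit the model for a range of cluster counts, plot WCSS against the number of clusters, and pick the point where the curve stops dropping sharply.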
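A rough sketch of the elbow method with scikit-learn could look like this. The synthetic data from `make_blobs` stands in for `raw_data`, and the range of 1 to 10 clusters, `n_init=10`, and `random_state=0` are all assumptions chosen for the illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for raw_data: 300 points drawn around 4 centers.
raw_data, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

k_values = range(1, 11)
wcss = []
for k in k_values:
    # n_init=10 reruns k-means from 10 random starts and keeps the best fit.
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(raw_data)
    wcss.append(model.inertia_)  # WCSS for this number of clusters

for k, w in zip(k_values, wcss):
    print(f"k={k:2d}  WCSS={w:.1f}")
```

Plotting `wcss` against `k_values` (for example with matplotlib) makes the elbow easier to see than reading the printed numbers.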

Reference: K-means with Three different Distance Metrics (ijcaonline.org)