Unsupervised Learning: K-Means Clustering
The goal is to choose a suitable value of K and partition the data into K clusters.
The steps below use the basic Euclidean distance metric.
Let X = {x1, x2, x3, …, xn} be the set of data points and V = {v1, v2, …, vc} be the set of cluster centers, where c is the number of clusters (the K above).
1. Select c cluster centers randomly.
2. Calculate the distance between each data point and each cluster center using the Euclidean distance metric: d(xi, vj) = sqrt((xi1 - vj1)^2 + … + (xim - vjm)^2), where m is the number of features.
3. Assign each point to the center at minimum distance (the points assigned to the same center form a cluster).
4. Recalculate each cluster center as the mean of the points assigned to it.
5. These means become the new cluster centers; reassign each data point to its nearest center.
6. If no data point was reassigned, stop; otherwise repeat steps 3 to 5.
7. Calculate the WCSS (Within-Cluster Sum of Squares) for each value of c. WCSS is the sum of the squared distances between each point and the centroid of its cluster.
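The steps above can be sketched in a few lines of NumPy. This is only a minimal illustration, not how scikit-learn implements K-means: the function name `kmeans`, the `max_iter` and `seed` parameters, the choice to draw the initial centers from the data points, and the sample array `X` are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, c, max_iter=100, seed=0):
    """Plain k-means with Euclidean distance; returns (centers, labels, wcss)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick c distinct data points as the initial centers (an assumption).
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Step 2: Euclidean distance from every point to every center, shape (n, c).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Steps 3 and 5: assign each point to its nearest center.
        new_labels = dists.argmin(axis=1)
        # Step 6: stop once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        for j in range(c):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Step 7: WCSS is the sum of squared distances to the assigned center.
    wcss = float(((X - centers[labels]) ** 2).sum())
    return centers, labels, wcss

# Hypothetical toy data: two small groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
centers, labels, wcss = kmeans(X, c=2)
print(centers, labels, wcss)
```

Because the starting centers are random, different seeds can give different clusterings, which is one reason library implementations usually run several initializations and keep the best result.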
Create Model
from sklearn.cluster import KMeans
model = KMeans(n_clusters=4)
Train data
model.fit(raw_data)
wcss = model.inertia_
Select the cluster size based on WCSS [Elbow method]
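One way to apply the elbow method is to fit a model for each candidate K, record its WCSS, and look for the K after which the WCSS stops dropping sharply. The sketch below assumes synthetic data: `raw_data` here is a hypothetical stand-in made of three generated blobs, and the range of K values, `n_init=10`, and `random_state=0` are illustrative choices rather than recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for raw_data: three synthetic 2-D blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
raw_data = np.vstack(
    [rng.normal(loc=m, scale=0.5, size=(50, 2)) for m in blob_centers]
)

# Fit one model per candidate K and record its WCSS (inertia_).
k_values = range(1, 9)
wcss_per_k = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(raw_data)
    wcss_per_k.append(model.inertia_)

# Print the curve; the "elbow" is where the decrease levels off.
for k, w in zip(k_values, wcss_per_k):
    print(f"K={k}: WCSS={w:.1f}")
```

Plotting `wcss_per_k` against `k_values` usually makes the elbow easier to see than the printed numbers. WCSS keeps shrinking as K grows, so the point of interest is where the curve bends, not where it is lowest.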
Reference: K-means with Three different Distance Metrics (ijcaonline.org)