How To Work Out K-means Clustering Algorithm
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.Distance measure here used is Euclidean, there are many other Distance measures used for k-means clustering.
Lets workout k-means using a simple example.
Problem 1:
Lets take a medicine example.
Here we have 4 medicine as our training data. each medicine has 2 coordinates, which represent the coordinates of the object.These 4 medicines belong to two cluster.We need to identify which medicine belongs to cluster 1 and cluster 2.
k-means clustering starts with a k value where k is user defined ie how many clusters . here we need to classify the medicines into 2 clusters so k = 2.
Next we need to identify initial centroids. lets selects the first 2 points as our initial centroids.
k = 2 Initial Centroids = (1,1) and (2,1)
Iteration 1:
For each feature we need to calculate the distance to the cluster and find the minimum distance.
For Feature (1,1)
C1 = (1,1) C2 = (2,1)
= sqrt((1-1)^2 (1-1)^2) = sqrt((2-1)^2 (1-1)^2)
= 0 = 1
For Feature (2,1)
C1 = (1,1) C2 = (2,1)
= sqrt((1-2)^2 (1-1)^2) = sqrt((2-2)^2 (1-1)^2)
= 1 = 0
For Feature (4,3)
C1 = (1,1) C2 = (2,1)
= sqrt((1-4)^2 (1-3)^2) = sqrt((2-4)^2 (1-3)^2)
= 3.6 = 2.82
For Feature (5,4)
C1 = (1,1) C2 = (2,1)
= sqrt((1-5)^2 (1-4)^2) = sqrt((2-5)^2 (1-4)^2)
= 5 = 4.24
Next we need to identify the cluster having minimum distance for each feature.
C1 = (1,1)
C2 = (2,1),(4,3),(5,4)
New Centroids :
Continue the iteration.for all features with new centroids.
C1 remains the same.
For C2 the new centroid will be the average of the above listed points.
ie (2+4+5/3 , 1+3+4/3)
C1 = (1,1)
C2 = (3.66,2.66)
Iteration 2:
For each feature we need to calculate the distance to the cluster and find the minimum distanceIdentify the cluster having minimum distance for each feature.
C1 = (1,1) , (2,1)
C2 = (4,3),(5,4)
New Centroids :
C1 = (1.5,1)
C2 = (4.5,3.5)
Check if the points converges.
How to check: Compare the old centroid with new one.If both are equal then k-means converges else continue the iteration.
Old Centroid : (1,1) and (3.66,2.66)
New Centroid: (1.5,1) and (4.5,3.5)
Old Centroid != New Centroid
Iteration continues.
Iteration 3:
For each feature we need to calculate the distance to the cluster and find the minimum distance.
Identify the cluster having minimum distance for each feature.
Check if the points converges.
Old Centroid : (1.5,1) and (4.5,3.5)
New Centroid: (1.5,1) and (4.5,3.5)
Old Centroid = New Centroid
CONVERGES.
Therefore the algorithm stops.
Medicine A and B belongs to Cluster 1
Medicine C and D belongs to Cluster 2
hiii
ReplyDeletegood evening sreevani..kindly post the code in java for k-means algorithm in java
where can i run?? is it on mahout or normal hadoop/mapreduce
Siva you can find many source code related to kmeans over internet. Let me know if you need any help.
Deleteit not working
Delete..i searched a lot
please send the code if u have
sivanagaraju46@gmail.com