Hello,
I am (name) from LearnVern.
In the previous tutorial of our Machine Learning course, we studied distance-based learning, that is, the distance-based metrics that we use in clustering.
In that, we saw K-means, FP-growth and hierarchical clustering. K-means and hierarchical clustering are algorithms with which we can do clustering, or grouping, while FP-growth is a frequent-pattern mining algorithm used for association rules.
So, let us begin with K-means clustering.
The name K-means comes from the mean. In statistics we studied the mean, that is, the average, and computing averages is exactly what K-means does. Here, k is the number of means, that is, the number of clusters. If I say 2-means clustering, the algorithm finds two centre points and forms the clusters around them; if I say 3-means, it chooses three centre points and forms the clusters on that basis.
This is an iterative algorithm: it applies the same logic to the data again and again until its stopping condition is met, that is, until the clusters stop changing. At that point, the learning of the algorithm is complete.
So, let's understand this in more detail.
So, K-means clustering is a form of unsupervised learning: the dataset we have here is not labelled, and we have to find the groups within that data itself.
It applies to numerical, continuous data, because it works with distances and averages.
It is a fast algorithm, since each iteration only computes the distance from every point to the k centre points,
and it is also easy to understand, as you will see in this session.
It is used in fraud detection in banking and insurance, in image segmentation and in customer segmentation.
Now, let me explain how it works.
So, let us go to the sheet. Here I will take a few data points: 2, 3, 50 and 51.
So, you can see I have 4 data points.
We use only these 4, because with 4 lakh (400,000) points it would be really complex to do by hand.
So, I have these 4 data points and I want to run K-means on them.
To perform K-means, I first need to know the number of groups to create.
For now, I take k = 2, meaning we have to create 2 clusters. To do that, I pick any two of these data points as the starting centre points; these centre points are called centroids.
So, let's do it this way.
Here I took 2 as one centroid and 3 as the other: C1 (centroid 1) is 2 and C2 (centroid 2) is 3.
First, I will find the distance of every data point from C1, that is, from 2.
So, I write the centroid values, 2 and 3, here in the sheet.
Now let's find the distance: type =, then this data point minus the centroid, and press Enter. The distance of 2 from C1 comes out as 0, and in the same way we fill in the distance for every point.
But something seems wrong here, let me check… when I fill the formula down, the reference moves with it: for A5 it is correct, but for A6 it no longer points at the centroid.
What has happened is that the centroid reference has shifted from C4 to C5. So we have to freeze C4: we put a dollar sign before the C and another before the 4, making it $C$4, and then fill in the rest.
Ok!
So, this column is correct now. In the same way, for C2 we subtract 3 from each data point, again freezing the centroid cell: we select this cell, minus this one, and put a dollar sign on both the column and the row.
Like this.
So, we have got the correct values.
Some of these differences are negative, but a distance cannot be negative, so we want the absolute value, which simply drops the minus sign. And remember what C1 and C2 are here:
C1 is 2 and C2 is 3, and all these calculations are based on them.
Now, let's move ahead.
To take the absolute value, we wrap the entire formula in the ABS function: type =ABS( before the subtraction and close the bracket at the end.
This gives the absolute value, and filling the formula down gives the absolute value for every point.
So, we took the absolute values for C1.
Let us find the absolute values here for C2 as well…
So, now we have the absolute distances from both centroids.
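For anyone who prefers code to the sheet, here is a minimal Python sketch of the same distance table. It assumes the four points and the starting centroids 2 and 3 used above; the names `data`, `centroids` and `distances` are illustrative, not from the session. `abs(x - c)` plays the role of the `=ABS(...)` formula.

```python
data = [2, 3, 50, 51]
centroids = [2, 3]  # C1 = 2, C2 = 3

# One row per data point, one column per centroid: |point - centroid|,
# the same value the sheet computes with =ABS(point - centroid).
distances = [[abs(x - c) for c in centroids] for x in data]

for x, row in zip(data, distances):
    print(x, row)
# 2 [0, 1]
# 3 [1, 0]
# 50 [48, 47]
# 51 [49, 48]
```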
Now, we compare these values, which are the distances, and see which one is smaller for each point.
We set up C1 and C2 columns again. For the point 2, the distances are 0 from C1 and 1 from C2; 0 is smaller, so 2 goes into cluster C1.
For 3, the distances are 1 and 0, so 3 goes into C2. For 50, the distances are 48 and 47, so it goes into C2, and for 51 they are 49 and 48, so it also goes into C2.
Press Enter.
So, in this way we get our first grouping: C1 = {2} and C2 = {3, 50, 51}.
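The assignment step can be sketched in Python as follows, under the same assumptions (points 2, 3, 50, 51 and centroids 2 and 3). `assign` is a hypothetical helper name; breaking ties in favour of the first centroid is one common convention, not something the session specifies.

```python
data = [2, 3, 50, 51]
centroids = [2, 3]  # C1 and C2

def assign(points, centers):
    """Return, for each point, the index of its nearest centre (0 for C1, 1 for C2, ...).

    Ties go to the lower index, because list.index() returns the first match.
    """
    labels = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

labels = assign(data, centroids)
# Group the points by their label to get the clusters.
clusters = [[x for x, lab in zip(data, labels) if lab == j] for j in range(len(centroids))]
print(labels)    # [0, 1, 1, 1]
print(clusters)  # [[2], [3, 50, 51]]
```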
Next, we have to compute new centroids from these groups. The old centroids are of no further use, so we just leave them as they are…
Ok!
So, from these groups we will find the new centroids.
How are we going to do that?
Cluster C1 contains only the value 2, so when I click Sum for it, we get 2.
In the same way, when we click Sum for cluster C2, which contains 3, 50 and 51, we get 104.
So, we got these two sums.
For C1, 2 itself is the centre value, because there is only one element.
For C2, we divide the sum 104 by the number of elements, 3, which gives the average: 104 / 3, which is about 34.67.
So, we have these two averages.
These become our new centres: 2 and about 34.67.
So, make one centroid 2, and the other 34.67.
Now, with these, we find the distances again.
We keep repeating these iterations until the cluster assignments stay the same.
So, with the new centroids, we subtract each centroid from the original values:
this value minus this centroid, and press Enter.
Then we freeze the centroid cell by putting a dollar sign before its column letter and before its row number.
And we wrap it in the ABS function, like this.
We fill this down to the eighth row, and we do the same thing in the other column:
type =ABS(, then this value minus this centroid, close the bracket, press Enter, and then freeze the centroid cell.
Dollar, dollar, so we have frozen it.
After this, we fill it down to the end.
Now, let's see how to make the clusters.
So, we have the new C1 and C2,
and those are 2 and 34.67.
Now see: 2 is closer to C1, so it goes to C1; 3 is also closer to C1 (a distance of 1 versus about 31.67), so it goes to C1 as well; and 50 and 51 are closer to C2, so they go to C2.
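Here is a Python sketch of the update step (each new centroid is the mean of its cluster) followed by the second assignment. It starts from the first grouping, C1 = {2} and C2 = {3, 50, 51}; the variable names are illustrative only.

```python
data = [2, 3, 50, 51]
clusters = [[2], [3, 50, 51]]  # grouping from the first assignment step

# Update step: each new centroid is the average (mean) of its cluster.
new_centroids = [sum(group) / len(group) for group in clusters]
print([round(c, 2) for c in new_centroids])  # [2.0, 34.67]

# Second assignment step: nearest new centroid for every point.
labels = [
    min(range(len(new_centroids)), key=lambda j: abs(x - new_centroids[j]))
    for x in data
]
print(labels)  # [0, 0, 1, 1] -> clusters {2, 3} and {50, 51}
```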
So, this is how it's going to be.
So, these are our new clusters.
Here, 2 and 3 form one cluster, and 50 and 51 form the second cluster.
So, we got two clusters here,
and we got them on the basis of distance.
Again, take their averages: the average of the first cluster is 2.5, and the average of the second is 50.5.
Ok!
Now, if you perform one more iteration with these centroids, you will get exactly the same clusters: one of 2 and 3, and another of 50 and 51.
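A quick Python check of that extra iteration, as a sketch with illustrative names: recompute the averages, reassign every point, and compare the result with the previous grouping.

```python
data = [2, 3, 50, 51]
clusters = [[2, 3], [50, 51]]  # grouping after the second iteration

# Update step: averages of the two clusters.
centroids = [sum(g) / len(g) for g in clusters]
print(centroids)  # [2.5, 50.5]

# One more assignment step with these centroids.
labels = [
    min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
    for x in data
]
new_clusters = [
    [x for x, lab in zip(data, labels) if lab == j] for j in range(len(centroids))
]
print(new_clusters == clusters)  # True: same clusters, so we can stop
```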
When the clusters repeat, you stop there, because it means the algorithm has converged and the groups are final.
Otherwise, you continue the process.
So, this was the simple process with which you can run the K-means algorithm.
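Putting the steps together, here is a minimal Python sketch of one-dimensional K-means: the assign-then-average loop, often called Lloyd's algorithm. It assumes the absolute difference as the distance (for one-dimensional data this is the same as Euclidean distance) and stops when the assignments repeat, as described above. `kmeans_1d` and `max_iter` are hypothetical names; for real projects, libraries such as scikit-learn provide production implementations.

```python
def kmeans_1d(points, initial_centroids, max_iter=100):
    """Minimal 1-D K-means; max_iter caps the loop if it never converges."""
    centroids = list(initial_centroids)
    labels = None
    clusters = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point
        # (ties go to the lower index because min() returns the first minimum).
        new_labels = [
            min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            for x in points
        ]
        clusters = [
            [x for x, lab in zip(points, new_labels) if lab == j]
            for j in range(len(centroids))
        ]
        print(f"iteration {iteration}: centroids={[round(c, 2) for c in centroids]}, clusters={clusters}")
        # Stopping condition: the clusters repeated, so the algorithm has converged.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its cluster;
        # an empty cluster simply keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(clusters, centroids)]
    return centroids, clusters

final_centroids, final_clusters = kmeans_1d([2, 3, 50, 51], [2, 3])
print(final_centroids, final_clusters)  # [2.5, 50.5] [[2, 3], [50, 51]]
```

Keeping an empty cluster's old centroid is just one simple choice; implementations handle empty clusters in different ways.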
So, friends, let's conclude here for today.
We will stop today's session here, and we will see its further parts in the upcoming sessions.
So, keep learning and remain motivated.
Thank you.
If you have any queries or comments, click the discussion button below the video and post them there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to solve your query.