Hello,
I am (name) from LearnVern.
Today we will continue from our previous session of Machine Learning.
In today's session of Machine Learning, we will look at Hierarchical Clustering.
We will see how to cluster, or group, data when it is unlabeled.
This method builds a hierarchy of clusters, which can be drawn as a tree diagram called a dendrogram, so we can inspect the grouping graphically.
We saw K-means earlier. It also uses a distance metric, such as Euclidean distance, to form the groups, but it does not produce a hierarchy that we can draw, and we have to choose the number of clusters before running it.
Here, we will see the clusters form hierarchically, and only after looking at that hierarchy will we decide how many clusters to keep.
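To make "hierarchical" concrete, here is a minimal from-scratch sketch of the bottom-up (agglomerative) idea: every point starts as its own cluster, and the two closest clusters are merged again and again. It is an illustration only, assuming single linkage (distance between the closest pair of points) because that rule is the easiest to read; the session itself uses Ward linkage through SciPy and scikit-learn, and this naive loop is far slower than those libraries. The function names are hypothetical.

```python
import math
from itertools import combinations


def single_linkage_distance(a, b):
    # Single linkage: distance between the closest pair of points across two clusters
    return min(math.dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters):
    # Start with every point in its own cluster (the leaves of the hierarchy)
    clusters = [[p] for p in points]
    merges = []  # (merge height, size of the new cluster), recorded bottom-up
    while len(clusters) > n_clusters:
        # Find the pair of clusters that are closest to each other
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        height = single_linkage_distance(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        merges.append((height, len(merged)))
        # combinations yields i < j, so deleting j first keeps index i valid
        del clusters[j]
        clusters[i] = merged
    return clusters, merges


clusters, merges = agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)], n_clusters=2)
print(clusters)  # [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
```

Stopping the loop at different values of n_clusters corresponds to cutting the hierarchy at different heights, which is exactly what the dendrogram later lets us do visually.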
So, let us begin now.
So, first I will load the dataset. We will use the mall customers dataset for this,
and we will explore it.
I am copying its path from here, but first we will do the imports:
first, import numpy as np.
Second, import matplotlib.pyplot as plt.
Third, import pandas as pd.
With the help of these three libraries we will do most of the implementation.
So, firstly let's load the dataset.
dataset = pd.read_csv(), with the path we copied inside the brackets.
This loads our dataset into a DataFrame.
And with dataset.head() we can see what kind of data we have:
Customer ID,
Gender,
Age,
Annual income,
Spending score,
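As a rough sketch of the loading step, the snippet below reads a CSV with pd.read_csv and previews it with head(). The real file path was pasted in during the video and is not reproduced here, so an in-memory string stands in for the file. The exact column headers are an assumption (they follow the five columns listed above), and the three rows are invented placeholder values, not real customer data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file. Header names are assumed;
# the numeric values are invented placeholders, not real data.
csv_text = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,25,20,70
2,Female,40,60,30
3,Female,31,85,90
"""

# With the real file you would pass its path instead, e.g. pd.read_csv("path/to/file.csv")
# (that path is a hypothetical placeholder).
dataset = pd.read_csv(io.StringIO(csv_text))

# head() shows the first rows (up to 5 by default)
print(dataset.head())
```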
Now, on the basis of annual income and spending score, we will group the customers, that is, segment or cluster them.
Notice that we don't have any y here, that is, an output variable
like the ones we used in the supervised approach.
So, let's extract our x from this.
So, our x will be dataset.iloc, taking all the records but only the third and fourth columns, counting from zero.
So, this is column zero (Customer ID),
first (Gender),
second (Age),
third (Annual income), and
fourth (Spending score). So we take only these two columns: annual income and spending score.
Then we add .values to get a NumPy array. Now you can see that this is our x, with the two columns annual income and spending score.
With x.shape, let's check how many records we have.
We have a total of 200 records (rows), and for each of them we have 2 columns.
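The column selection above can be sketched like this. A small hypothetical DataFrame with the same five-column layout stands in for the real dataset (the values are invented placeholders), so the shape here is (4, 2) rather than (200, 2).

```python
import pandas as pd

# Toy DataFrame with the same column order as the mall customers data (invented values)
dataset = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [25, 40, 31, 52],
    "Annual Income (k$)": [20, 60, 85, 40],
    "Spending Score (1-100)": [70, 30, 90, 15],
})

# All rows (:), and only the columns at 0-based positions 3 and 4
x = dataset.iloc[:, [3, 4]].values

print(x.shape)  # (4, 2): 4 rows, 2 feature columns
```

The slice dataset.iloc[:, 3:5] would select the same two columns; the explicit list [3, 4] simply makes the chosen positions obvious.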
This is the data on which we will train our algorithm, and the clusters will be formed from it.
So, let's move ahead and import the hierarchical clustering model.
Our model comes from
sklearn.cluster, so: from sklearn.cluster import AgglomerativeClustering.
So, we import AgglomerativeClustering.
Now, with AgglomerativeClustering we have to tell it how many clusters we want to make.
But we don't know that yet, so first we will create a dendrogram and use it to find the number of clusters we should make in the agglomerative approach.
So, let's make a dendrogram first and find out how many clusters this dataset should have.
So, we will import scipy.cluster.hierarchy as sch.
Now, with the help of this we will create a dendrogram.
So, dendrogram = sch.dendrogram(...). We use its dendrogram function, and inside it we pass sch.linkage, which computes the links between clusters, that is, which clusters get merged and at what distance. Inside sch.linkage we pass our data x and set the method. The default method is "single", but we will use method="ward", which at each step merges the two clusters whose merge increases the total within-cluster variance the least. So the full line is dendrogram = sch.dendrogram(sch.linkage(x, method="ward")).
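Before drawing anything, it can help to look at what sch.linkage actually returns. The sketch below runs it on four hypothetical 2-D points (not the mall data): the result is an (n - 1) x 4 matrix with one row per merge, holding the two cluster indices being joined, the merge height, and the size of the new cluster. Indices of n or more refer to clusters created by earlier rows.

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Four hypothetical points forming two tight pairs
x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])

Z = sch.linkage(x, method="ward")

# Each row: [cluster a, cluster b, merge height, number of points in the new cluster]
print(Z)
```

For two single points, the Ward merge height equals their Euclidean distance (1.0 here); the last row joins the two pairs into one cluster of all 4 points. The dendrogram is simply a drawing of this matrix.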
That is all the setup for my dendrogram, so let's move forward.
Now I will plot it and see how the clusters are formed in the dendrogram.
Here you can see the different clusters that are formed.
To make it a little clearer, I will add a title with plt.title("Dendrogram"). After that, plt.xlabel("Customers"), and then plt.ylabel, where we put the distance,
so here we write "Euclidean distances", since the vertical axis shows the distance at which clusters are merged.
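A self-contained sketch of the dendrogram step with the labels described above. Since the real dataset is not reproduced here, two loose groups of synthetic points stand in for x; the non-interactive "Agg" backend is an assumption so the snippet also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend (assumption) so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(0)
# Synthetic stand-in for the income/score matrix: two loose groups of 10 points each
x = np.vstack([rng.normal(20, 3, size=(10, 2)), rng.normal(70, 3, size=(10, 2))])

plt.figure()
# sch.dendrogram draws the tree and also returns a dict describing it
dendrogram = sch.dendrogram(sch.linkage(x, method="ward"))
plt.title("Dendrogram")
plt.xlabel("Customers")
plt.ylabel("Euclidean distances")
# In a notebook or with an interactive backend, plt.show() would display the figure
```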
Ok!
Let's display our diagram again.
So, this is our dendrogram. If we look at the very top, there is only one cluster. Just below that, there is one green cluster and one blue cluster, so two clusters. If I go further down, there is one green cluster, one red cluster, and one light blue cluster, so three clusters are formed this way.
If I go even further down, you will see four: the first green, the second green, the third red, and the fourth blue.
So, from this we can identify how many clusters we should form.
A common way to read it is to cut across the tallest vertical lines, the ones not crossed by any horizontal merge line. Here that cut crosses five major vertical lines, so we will form five clusters.
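Reading the dendrogram by eye works, but the "cut across the tallest vertical lines" rule of thumb can also be sketched in code: find the largest jump between consecutive merge heights and cut inside it. This is a heuristic, not necessarily the procedure used in the video, and it is shown here on three synthetic, well-separated groups rather than the mall data. sch.fcluster then turns the chosen count into labels (numbered from 1).

```python
import numpy as np
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(42)
# Synthetic data: three well-separated groups of 10 points each
centers = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
x = np.vstack([c + rng.normal(0, 0.5, size=(10, 2)) for c in centers])

Z = sch.linkage(x, method="ward")
heights = Z[:, 2]  # merge heights, non-decreasing for Ward linkage

# Largest jump between consecutive merges = tallest uncrossed vertical stretch
gaps = np.diff(heights)
i = int(np.argmax(gaps))

# Cutting after merge i means i + 1 merges have happened, leaving n - (i + 1) clusters
k = len(x) - (i + 1)

# Assign each point to one of k flat clusters (labels start at 1)
labels = sch.fcluster(Z, t=k, criterion="maxclust")
print(k)
```

On real data the largest gap is not always clear-cut, so it is worth comparing the suggested count with what the dendrogram shows.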
So, let's move ahead and create an object for our agglomerative model.
For hierarchical clustering: hc = AgglomerativeClustering(...). As I said earlier, we have to pass the number of clusters, so we set n_clusters=5.
After that, affinity (the distance metric) is "euclidean" by default, so we keep it; in newer scikit-learn versions this parameter is called metric. Linkage is "ward" by default, so we keep ward as well.
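A minimal sketch of building the model object. Because the name of the distance parameter differs across scikit-learn versions (affinity in older releases, metric in newer ones), the sketch leaves it at its Euclidean default instead of passing it by name.

```python
from sklearn.cluster import AgglomerativeClustering

# n_clusters comes from reading the dendrogram; linkage="ward" is also the default.
# The distance parameter is left at its default (Euclidean) to avoid the
# affinity/metric naming difference between scikit-learn versions.
hc = AgglomerativeClustering(n_clusters=5, linkage="ward")

params = hc.get_params()
print(params["n_clusters"], params["linkage"])  # 5 ward
```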
So, now our hc model is ready.
Now we will train, or fit, this model and get the cluster assignments. AgglomerativeClustering has no separate predict method: we can call fit and then read the labels_ attribute, or do both in one step with fit_predict.
So, one way is hc.fit, and the other is fit_predict. This time I am showing you fit_predict.
Most scikit-learn clustering estimators provide fit_predict.
So, I pass x to fit_predict and run it.
Here you can see the labels 0, 1, 2, 3 and 4, so from 0 to 4, which means it has made 5 clusters.
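The two ways of getting the labels can be compared directly. This sketch uses five synthetic groups of points as a stand-in for the income/score data; it checks that fit_predict returns the same labels that fit stores in labels_, and that they run from 0 to 4.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
# Synthetic stand-in: five groups of 20 points around hypothetical centres
centers = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]], dtype=float)
x = np.vstack([c + rng.normal(0, 3, size=(20, 2)) for c in centers])

# One step: fit the model and return the label of every row
hc = AgglomerativeClustering(n_clusters=5, linkage="ward")
y_hc = hc.fit_predict(x)

# Two steps: fit, then read the labels_ attribute
hc_separate = AgglomerativeClustering(n_clusters=5, linkage="ward").fit(x)

print(np.unique(y_hc))  # [0 1 2 3 4]
print(np.array_equal(y_hc, hc_separate.labels_))  # True
```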
Now we will store this output in a variable. I first called it y_clusters,
but that is quite a long name,
so I will give it a shorter one: y_hc = hc.fit_predict(x).
Ok!
So, y_hc holds the output of hierarchical clustering: the cluster label of each customer.
Now, let's look at these clusters through visualisation.
For that, we use plt.scatter(x[y_hc == 0, 0], x[y_hc == 0, 1], ...). The first argument takes column 0 (annual income) of the customers whose label is 0, and the second takes column 1 (spending score) of the same customers.
The next parameter is the size, s, which we set to 100.
After that, we give the colour: c="red" for the first one. We'll also give it a label, so that we can easily tell the clusters apart,
so label="Cluster 1".
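The expression x[y_hc == 0, 0] combines a boolean mask with a column index. This tiny sketch, using invented placeholder values, shows what each piece selects.

```python
import numpy as np

# Invented placeholder rows: [annual income, spending score]
x = np.array([[20, 70], [60, 30], [25, 80], [65, 20]])
y_hc = np.array([0, 1, 0, 1])  # hypothetical cluster labels

mask = y_hc == 0  # True for rows that belong to cluster 0

print(x[mask, 0])  # [20 25]: annual income of cluster-0 rows (x coordinates)
print(x[mask, 1])  # [70 80]: spending score of cluster-0 rows (y coordinates)
```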
Let's run this now so that we can see the result as we go.
Through plt.scatter you can see Cluster 1 here.
Now, let's do the second one.
Everything stays the same, so I copy it as it is.
In plt.scatter, the x coordinate was x[y_hc == 0, 0]; it becomes x[y_hc == 1, 0].
And on the y coordinate, x[y_hc == 0, 1] becomes x[y_hc == 1, 1].
We keep the size the same, but change the colour to blue and name it "Cluster 2".
So, here our Cluster 2 will be formed.
Now you can see 2 clusters: Cluster 1 in red and Cluster 2 in blue.
Similarly, we paste this again and configure Cluster 3.
For Cluster 3 we use green, since red and blue are already taken; the size stays the same. For the x coordinate, x[y_hc == 1, 0] becomes x[y_hc == 2, 0], and for the y coordinate, x[y_hc == 1, 1] becomes x[y_hc == 2, 1].
Everything else stays as it is.
Let me run it again.
Now you can see the third cluster plotted as well.
Next is Cluster 4. We have already used red, blue and green, so for this one we use cyan.
For the y coordinate we have used labels 0, 1 and 2 so far, so this one is x[y_hc == 3, 1].
For the x coordinate, likewise, it is x[y_hc == 3, 0].
Let me run it.
Now we have Cluster 4 as well, in light blue (cyan).
Finally, Cluster 5, our last cluster. We have used red, blue, green and cyan so far, so we use "magenta". The y coordinate is x[y_hc == 4, 1] and the x coordinate is x[y_hc == 4, 0]. Let me run this too.
So, now you can see that we have 5 clusters in total:
1, 2, 3, 4 and 5.
They are clearly separated from each other, so the clustering looks good.
Now let's add some labels. First the title: plt.title("Clusters of customers").
Then the x label, which is annual income: plt.xlabel("Annual Income").
And the y label is spending score: plt.ylabel("Spending Score").
Ok!
After these two, we add plt.legend(), which shows which colour belongs to which cluster.
Finally, we end with plt.show().
Now you can see the legend with the names Cluster 1, Cluster 2, Cluster 3, Cluster 4, and Cluster 5.
On the x axis we have annual income and on the y axis spending score, and these are the customer clusters formed on that basis.
The grouping looks quite reasonable.
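The five copy-pasted plt.scatter calls can also be written as one loop over the cluster labels. This sketch assumes synthetic data in place of the mall dataset and a headless "Agg" backend; the colours, sizes, labels, and axis titles follow the ones used above.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend (assumption) so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(1)
# Synthetic stand-in for the income/score matrix: five groups of 20 points
centers = np.array([[20, 20], [20, 80], [55, 50], [90, 20], [90, 80]], dtype=float)
x = np.vstack([c + rng.normal(0, 3, size=(20, 2)) for c in centers])

y_hc = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(x)

colors = ["red", "blue", "green", "cyan", "magenta"]
plt.figure()
for label, color in enumerate(colors):
    in_cluster = y_hc == label  # boolean mask for one cluster
    plt.scatter(x[in_cluster, 0], x[in_cluster, 1], s=100, c=color,
                label=f"Cluster {label + 1}")

plt.title("Clusters of customers")
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.legend()
# plt.show() would display the figure in a notebook or interactive session
```

The loop keeps the label, colour, and mask in step, so adding or removing a cluster only means changing the colours list and n_clusters.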
So, this was hierarchical clustering, which we performed on the mall customer dataset. Practise it more on different datasets.
So, friends, let's conclude here for today and now we will continue in the next session.
And keep watching and remain motivated.
Thank you very much.
If you have any questions or comments related to this course,
you can click the discussion button below this video and post them there.
That way, you can discuss this course with many other learners like you.
Share a personalized message with your friends.