Hello,
I am (name) from LearnVern.
This tutorial is the continuation of the previous session, so let’s begin.
In the hierarchical clustering practical, we will see another practical, which is on dendrograms. A dendrogram helps us look at the clustering visually, and we as human beings understand visual forms instantly, so there is more clarity in understanding.
So, let’s start.
We are doing it in continuation of the practice that we did earlier, so this is an extension of it.
So, I am importing here the necessary libraries.
So here, “import numpy as np”.
The second library we will need: “from matplotlib import pyplot as plt”. OK!
The next library that we will need is “scipy.cluster.hierarchy”; from here we will import “dendrogram”.
So we imported the dendrogram.
After this we will type “from sklearn.datasets import load_iris”, so here we import the iris dataset.
Next, we will also need a clustering class from sklearn.
So, I will type “from sklearn.cluster import AgglomerativeClustering”.
So, these are the packages we have imported, and now we will create a function that builds a linkage matrix, and after that we will do our plotting.
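Written out, the imports dictated above would look roughly like this. It is only a transcription of the spoken lines, assuming NumPy, Matplotlib, SciPy and scikit-learn are installed.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering
```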
So, here, “def plot_dendrogram”, and into it we will pass our model, and then any further arguments that we want, so “**kwargs”.
OK!
So, first we will create a linkage matrix, right? And after that we will plot the dendrogram. OK!
Now, here, we want to see the count of samples under each node.
So, how many samples there are under a node in our graph, whether 2 samples or 3 samples, that is what we will get to see for each cluster.
So, here, “counts = np.zeros(model.children_.shape[0])”. OK! This initialises the counts array with one entry for each merge.
Next is n_samples: “n_samples = len(model.labels)”, so the number of labels produced is the number of samples.
Now, “for i, merge in enumerate(model.children_)”.
Now, “current_count = 0”, so initially the count will be zero. Then “for child_idx in merge”, and inside that, “if child_idx < n_samples:”.
Then what to do next?
In this particular case, “current_count += 1”. OK.
This would be a leaf node.
So, every leaf node adds one to the current count. OK.
And otherwise, what to do? In the “else”, we will write
“current_count += counts[child_idx - n_samples]”. OK, fine.
Here the child is itself an earlier merge, so I index counts with the child index minus n_samples and add the count already stored for it. OK!
This much is done!
Then I come back out of the inner loop, and here “counts[i] = current_count”.
So, this much is done.
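To see what this counting loop does, here is a minimal sketch on a tiny hand-made tree. The `children` array is a hypothetical stand-in for `model.children_` (it is not taken from the video), chosen so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical stand-in for model.children_: 4 samples (indices 0-3), 3 merges.
# An index >= n_samples refers to the merge stored in row (index - n_samples).
children = np.array([
    [0, 1],  # node 4: leaf 0 + leaf 1
    [2, 3],  # node 5: leaf 2 + leaf 3
    [4, 5],  # node 6: node 4 + node 5
])
n_samples = 4

counts = np.zeros(children.shape[0])
for i, merge in enumerate(children):
    current_count = 0
    for child_idx in merge:
        if child_idx < n_samples:
            current_count += 1  # leaf node: one original sample
        else:
            # earlier merge: reuse the count already stored for it
            current_count += counts[child_idx - n_samples]
    counts[i] = current_count

print(counts)  # [2. 2. 4.]
```

The last merge joins two clusters of two samples each, so its count is 4, which is all the samples.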
Now, from this I will create the linkage matrix. I am still inside this for loop, so let me go back out of it. So, “linkage_matrix = np.column_stack([model.children_, model.distances_, counts])”: we create a column stack, and inside it I pass a list with model.children_, then model.distances_, and then counts.
So, these are the things I am passing here.
We will keep the data type as float,
so “.astype(float)” goes at the end.
So, this is how we build the linkage matrix, and finally this is done. OK!
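As a small illustration of the column stack, the sketch below uses hypothetical toy arrays in place of the fitted model's `children_` and `distances_` attributes. Each row of the result has four values: the two merged nodes, the merge distance, and the sample count.

```python
import numpy as np

# Hypothetical toy values standing in for the fitted model's attributes.
children = np.array([[0, 1], [2, 3], [4, 5]])  # like model.children_
distances = np.array([1.0, 2.0, 3.0])          # like model.distances_
counts = np.array([2, 2, 4])                   # samples under each merge

# 2-D input keeps its two columns; each 1-D input becomes one extra column.
linkage_matrix = np.column_stack([children, distances, counts]).astype(float)

print(linkage_matrix.shape)  # (3, 4)
print(linkage_matrix[2])     # [4. 5. 3. 4.]
```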
So, after this we call “dendrogram”, as we have imported it: I pass the linkage matrix, and I pass the arguments “**kwargs”. Let's execute this. OK.
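Assembled from the steps dictated so far, the function might look like the sketch below. It uses the scikit-learn attribute names with their trailing underscores (`children_`, `distances_`, `labels_`). The five-point array and the `no_plot=True` argument are assumptions added only so the sketch can be run without opening a figure window.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def plot_dendrogram(model, **kwargs):
    # Count the samples under each merge node.
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    # One row per merge: child a, child b, distance, sample count.
    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)

    # Extra keyword arguments are forwarded to SciPy's dendrogram.
    dendrogram(linkage_matrix, **kwargs)


# Hypothetical toy data (not from the video): five points on a line.
X_toy = np.array([[0.0], [1.0], [5.0], [6.0], [20.0]])
toy_model = AgglomerativeClustering(distance_threshold=0, n_clusters=None).fit(X_toy)
plot_dendrogram(toy_model, no_plot=True)  # builds the tree without drawing it
```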
Now, moving further. Yes, this is done.
“iris = load_iris()”, and so the data is loaded.
“X = iris.data”.
Now, let's create a model, so “model = AgglomerativeClustering(distance_threshold=0, n_clusters=None)”. With the help of this we create the model: I am keeping distance_threshold as zero, and n_clusters as None. Fine!
Now, “model.fit(X)”: we just pass X, and we don't pass y, because this is an unsupervised approach.
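The loading and fitting steps can be sketched as follows. The printed shapes are a quick check added here, not something shown in the video: the iris data has 150 samples, so the fitted tree has 149 merges. In scikit-learn, `distances_` is only available when `distance_threshold` is set (or `compute_distances=True`), which is one reason the threshold is given here.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering

iris = load_iris()
X = iris.data  # features only; the target y is not used

# distance_threshold=0 with n_clusters=None builds the full merge tree.
model = AgglomerativeClustering(distance_threshold=0, n_clusters=None)
model.fit(X)

print(X.shape)                 # (150, 4)
print(model.children_.shape)   # (149, 2)
print(model.distances_.shape)  # (149,)
```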
Next, we will call our function, which is plot_dendrogram. So, “plot_dendrogram(model, truncate_model="level", p=3)”: here I pass the model, and along with the model I pass other arguments, one being truncate_model equal to "level", and then p equal to 3. OK, fine!
Let me add the label also, so “plt.xlabel”, and I will pass the x label text here. On the x axis we will be able to see the number of points in a node; that is something that will be visible to us. OK!
Or we will be able to see the index of a point, so let me add that to the label: "index of point if there is no parenthesis".
So, if a node holds multiple points, then we will see a number in parentheses, and that tells us the number of elements. OK!
Otherwise, when there are no parentheses around a value, we are just seeing the index value of that particular point.
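The meaning of the parentheses can be checked without drawing anything. The sketch below is an illustration under assumptions: it rebuilds the linkage matrix as described earlier, passes `no_plot=True` (a SciPy option not used in the video), and reads the leaf labels from the `"ivl"` entry of the returned dictionary.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering

X = load_iris().data
model = AgglomerativeClustering(distance_threshold=0, n_clusters=None).fit(X)

# Build the linkage matrix the same way as in the function described earlier.
n_samples = len(model.labels_)
counts = np.zeros(model.children_.shape[0])
for i, merge in enumerate(model.children_):
    counts[i] = sum(
        1 if child_idx < n_samples else counts[child_idx - n_samples]
        for child_idx in merge
    )
linkage_matrix = np.column_stack(
    [model.children_, model.distances_, counts]
).astype(float)

# Truncate to the top three levels and collect the leaf labels only.
result = dendrogram(linkage_matrix, truncate_mode="level", p=3, no_plot=True)
leaf_labels = result["ivl"]
print(leaf_labels)

# "(k)" means k points sit under that leaf; a bare number is a single point's index.
total = sum(
    int(label.strip("()")) if label.startswith("(") else 1
    for label in leaf_labels
)
print(total)  # 150
```

Adding up the parenthesised counts, plus one for every bare index, gives back all 150 iris samples.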
So, “plt.show()”. Hopefully now everything is good and we will be able to see the plot. But here it says there is no attribute “labels”, so which line is this referring to? Let me just check.
So, it is referring to the length of labels, so let us go back and check there.
So, at this particular point, length of labels: if it is not “labels”, then it must be “labels_”, with the underscore. Let me just check. So, “labels_”. OK!
Let me execute all of these, and let us wait and check.
So, that error has gone. Now: "local variable current count referenced before assignment".
So, where is this? The current count is again in the function.
So, current count is here, and it has been initialised as zero, so the error must be referring to some other place. Let me check: here it says “current_count += 1”, so is there any difference in the spelling? Yes, here it should be a double r!
So, there was a slight typing mistake, and we can always go through the code and cross-check for these.
Now, let us run it again.
So, 'truncate_model' is an unexpected keyword argument, so it is not accepting this at the moment. I believed the spelling was correct! But the error is that it is not “model” but actually “mode”, so it should be “truncate_mode”.
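This kind of error can be reproduced in isolation. The sketch below is an illustration on a hypothetical four-sample linkage matrix (not the iris data): SciPy's dendrogram has no `truncate_model` parameter, so the misspelt keyword raises a TypeError, while `truncate_mode` is accepted.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram

# Hypothetical linkage matrix: child a, child b, distance, sample count.
Z = np.array([
    [0, 1, 1.0, 2],
    [2, 3, 2.0, 2],
    [4, 5, 3.0, 4],
], dtype=float)

message = None
try:
    dendrogram(Z, truncate_model="level", p=3, no_plot=True)  # misspelt keyword
except TypeError as err:
    message = str(err)
    print(message)

# The correct parameter name is truncate_mode.
result = dendrogram(Z, truncate_mode="level", p=3, no_plot=True)
print(result["ivl"])
```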
So, let me execute this once again,
Yes!
So, now you can see the plot, after we have corrected these errors.
So, here you can see we have different clusters.
Here this blue line shows that everything comes together in one cluster. Below it, the green and the red depict two separate clusters, and below them, if we look, this is a separate cluster, and these also.
So, wherever you cut this tree, clusters will be formed at that level.
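One way to try out such a cut is the `distance_threshold` parameter: merges at or above that linkage distance are not performed. The two thresholds below are arbitrary illustrative heights, not values read from the plot in the video, so the exact cluster counts are not stated here.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering

X = load_iris().data

# Arbitrary illustrative cut heights (assumptions, not taken from the video).
high_cut = AgglomerativeClustering(distance_threshold=10.0, n_clusters=None).fit(X)
low_cut = AgglomerativeClustering(distance_threshold=2.0, n_clusters=None).fit(X)

# A lower cut can only keep the same number of clusters or produce more.
print(high_cut.n_clusters_, low_cut.n_clusters_)
```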
Here, at the bottom, you will see the number of points in each node: 7, then 8; here 41 is a single element which does not have any other points with it; here it is 5, and in this one it is 10. So at the lowest level also the model has done the clustering, as you can very well see, and it is grouping the points and giving us the output.
So, this 41 over here is the index value of a point which sits alone in its cluster.
So, this way we saw how we can create dendrograms, in hierarchical clustering itself.
So, friends, we will stop this session here, and its further parts we will continue in the next session.
If you have any questions or comments related to this course,
then you can click on the discussion button below this video and post them there.
So, in this way you can discuss this course with many other learners of your kind.