Hello,
I am (name) from LearnVern.
We are continuing from the previous session of our Machine Learning course.
In today's tutorial we are looking at K-means.
We know that K-means is an unsupervised clustering algorithm.
So, here we will have the input data but no labels.
Today we will take the example of mall customers, meaning a dataset about the customers who visit a mall.
On that dataset we will perform clustering.
So, first I will load the dataset and explore it, to show you which features and columns we have.
While the dataset is loading, I will import the libraries.
The first library we import is numpy: import numpy as np.
The second library is matplotlib: import matplotlib.pyplot as plt. After that comes pandas: import pandas as pd.
These three libraries will help us. Looking at the mall customer data once again, you can see that it has now finished uploading.
So, we copy the path, and now we read our dataset:
dataset = pd.read_csv, and inside it I paste the path.
With dataset.head() we look at the first few rows.
So, this is our dataset.
In it we have customer ID, Gender, Age, Annual Income, and Spending Score.
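Here is a minimal sketch of the import and loading steps. The file path from the session is not shown in the transcript, so this sketch reads a few made-up rows from an in-memory string instead; the exact column names and all the values are assumptions for illustration only.

```python
import io

import numpy as np               # used later in the session
import matplotlib.pyplot as plt  # used later in the session
import pandas as pd

# Hypothetical stand-in for the mall customers file. The rows are made up,
# and the column names are an assumption based on the five columns named
# in the session. With the real file, pass the copied path to pd.read_csv.
csv_text = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
"""

dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.head())   # first rows of the data
print(dataset.shape)    # (rows, columns)
```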
Wherever we need customer segmentation or grouping, this kind of data can be used.
1, 2, 3, 4, 5: we have 5 columns.
There can be more columns, depending on the use case.
Now, from this data we will extract the specific columns on which we will perform clustering.
The 5 columns are indexed 0, 1, 2, 3, 4.
From these we will leave out 0, 1 and 2.
We will group the customers on the basis of the third and fourth columns, meaning annual income and spending score.
That means how much they earn and how much of that income they spend.
The income is in thousands, and the spending score runs from 1 to 100.
Ok!
So, let's move ahead.
Now, x and y.
In X we will have dataset.iloc, taking all the records and then only columns 3 and 4, and I add .values at the end to get the underlying array.
Now, with x.shape, I will show you its shape.
The shape of x is (200, 2).
So, there are 200 rows and 2 columns.
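The column selection can be sketched like this. The small DataFrame below is a made-up stand-in for the mall data (hypothetical values and column names), so its shape is (4, 2) rather than the (200, 2) seen in the session.

```python
import pandas as pd

# Made-up rows standing in for the mall customers data (hypothetical values).
dataset = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 20, 23],
    "Annual Income (k$)": [15, 15, 16, 16],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

# ":" selects all rows; [3, 4] selects the columns at positions 3 and 4.
# .values turns the selection into a NumPy array.
X = dataset.iloc[:, [3, 4]].values

print(X.shape)
```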
Now, we have no need for y, so I am not taking y.
For the same reason we will skip the steps we used for classification.
Earlier we were splitting the data into train and test, and we are not going to do that here.
So, next, if we want, we can scale the data directly and then perform K-means on it.
The two columns here do not differ much in scale, but purely for learning purposes I will scale them.
So, I will do the scaling and show you.
from sklearn.preprocessing import StandardScaler.
So, we import StandardScaler here.
Now, after this we will create an object:
sc = StandardScaler().
After creating the object, we scale our x with it.
So, x = sc.fit_transform(x). With fit_transform we can scale the data; here we pass x and execute this.
Now you will see that the shape of x is the same, but because it has been scaled, each column now has zero mean and unit variance, so you can see here that all the values fall in a small, comparable range.
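A small sketch of the scaling step follows. The array here is synthetic random data standing in for the income and spending score columns (an assumption for illustration), and the name X_scaled is chosen for this sketch. It shows that the shape is unchanged while each column ends up with mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the (annual income, spending score) array.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(15, 140, size=200),  # pretend income in thousands
    rng.uniform(1, 100, size=200),   # pretend spending score
])

sc = StandardScaler()
X_scaled = sc.fit_transform(X)  # learn mean/std per column, then standardize

print(X_scaled.shape)         # same shape as X
print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```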
Now, let's move ahead.
After scaling, it is time to apply K-means.
For K-means we will make an object, but first we import it:
from sklearn.cluster import KMeans.
Ok!
So, we have imported this.
Now, after importing, the biggest challenge is this:
K means the number of clusters, and deciding that number is the biggest challenge.
We don't know much about the data, so we don't know how many groups should be created.
So, to decide how many groups to create, we will use the elbow method, and for that we will use WCSS.
WCSS will be an empty list at the start.
After this, what are we going to do?
We write for i in range, and here we give the range of cluster counts that we want to try.
For instance, I will write for i in range(1, 11).
That tries cluster counts from 1 to 10, because range stops before 11.
Now, I will create an object of KMeans, which we imported above.
So, we created an object.
Next, we have to give the number of clusters, so I set n_clusters to i.
There is a reason to use i: it will be 1 first, then 2, and so on, so the number of clusters varies on each pass.
After this, how will we initialise it?
For init we use k-means++, which is already the default in KMeans, and for random_state we pass zero.
So, in this way we created our model.
So, how many times will this model run?
It runs once for each value of i in range(1, 11), so ten times.
Now, after this, kmeans.fit, and here I fit the entire data, x.
So, here we are letting our model learn.
After it is fitted, the model has an inertia, so we append kmeans.inertia_ to the WCSS list.
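The loop described here can be sketched as follows. The data is a synthetic random array standing in for the scaled x (an assumption), and n_init=10 is set explicitly only so the sketch behaves the same across scikit-learn versions; it is not mentioned in the session.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled (income, spending score) array.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

wcss = []  # one inertia value per cluster count

# range(1, 11) yields 1, 2, ..., 10 (11 itself is excluded).
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init="k-means++", n_init=10, random_state=0)
    kmeans.fit(X)                 # let the model learn the clusters
    wcss.append(kmeans.inertia_)  # within-cluster sum of squares for this i

print(len(wcss))  # one entry for each of the ten cluster counts
```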
Now, what do we gain by appending it to WCSS?
When the value of i is one, the model creates only one cluster, and after we train it, it has an inertia.
Then i becomes 2, and that model also produces an inertia.
WCSS stands for within-cluster sum of squares, meaning the sum of the squared distances between each point and the centre of its own cluster.
So, as the number of clusters increases, the sum of squares decreases.
As you can see here, as the number of clusters increases, the sum of squares decreases.
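To make the meaning of WCSS concrete, here is a small check, on synthetic stand-in data, that the inertia reported by KMeans matches a sum of squared distances computed by hand. The choice of 3 clusters and the variable names are assumptions made only for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data (not the mall dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)

# For every point, look up the centre of the cluster it was assigned to.
assigned_centres = kmeans.cluster_centers_[kmeans.labels_]

# Sum of squared distances from each point to its own cluster centre.
manual_wcss = ((X - assigned_centres) ** 2).sum()

print(manual_wcss)
print(kmeans.inertia_)  # should agree with the manual value up to rounding
```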
So when we plot WCSS against the number of clusters, the curve takes the shape of an elbow, and that helps us decide how many clusters to keep.
So, we keep appending the inertia to the WCSS list.
Now, we will execute this.
And after executing it, we will plot.
So, plt.plot, and inside it range(1, 11), because I used the same range in the loop above, together with the corresponding WCSS values.
Now, plt.title, so that we give it a title: Elbow Method.
In plt.xlabel we put Number of clusters, and after that plt.ylabel, so the y label is WCSS.
Now we show it with plt.show(). You can see that our diagram is created and that it has taken the shape of an elbow, so from this we will decide how many clusters there should be.
You can see one bend here and a second one here. We choose the point where the curve bends most sharply, which lies between four and six, so we will take five, ok!
So, we kept this as five.
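The elbow plot can be sketched like this. The data comes from make_blobs with five centres, a synthetic stand-in chosen for this sketch, so the curve you get will not be identical to the one in the video.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with five groups (not the mall dataset).
X, _ = make_blobs(n_samples=200, centers=5, random_state=0)

cluster_counts = range(1, 11)  # 1 to 10 clusters
wcss = []
for k in cluster_counts:
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    wcss.append(model.fit(X).inertia_)

# WCSS against the number of clusters; look for the sharpest bend.
plt.plot(cluster_counts, wcss, marker="o")
plt.title("Elbow Method")
plt.xlabel("Number of clusters")
plt.ylabel("WCSS")
plt.show()
```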
Now we create the model: kmeans = KMeans, and inside it we set n_clusters to the number we found. The rest is the same, and we just need to set random_state to zero.
So, this is the final model that I have created.
So, how did I create this model? First I decided through the elbow method how many clusters I should have, and 5 looked like the right choice, so that is how I found n_clusters.
Now, we will train this.
To train it we call kmeans.fit with x. After training, y_cluster is equal to kmeans dot, and here we could use fit_predict, but I have already called fit, so I will just use predict. This is the predict function, and we can pass whatever input we want, so for now I will pass the entire x.
So, now you can see the values in y_cluster.
These are the predictions, ok!
So, how many clusters are formed?
0, 1, 2, 3, 4: five clusters are formed.
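A minimal sketch of the final model follows, again on synthetic make_blobs data standing in for the scaled mall data (an assumption). Setting n_init=10 is an addition for consistent behaviour across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with five groups (not the mall dataset).
X, _ = make_blobs(n_samples=200, centers=5, random_state=0)

# Final model with the cluster count chosen from the elbow plot.
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0)
kmeans.fit(X)

# The model is already fitted, so predict is enough here;
# kmeans.fit_predict(X) would do both steps in one call.
y_cluster = kmeans.predict(X)

print(y_cluster[:10])        # cluster label of the first ten points
print(np.unique(y_cluster))  # the distinct labels that were assigned
```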
Now if you want to see them, I will show you those clusters.
So, plt.scatter, and here we need the columns of our x.
So, let's display our x first.
These are the values in our x.
I started to write x.iloc with all the records and column zero, but x is not a pandas DataFrame.
It is a NumPy array, so we can fetch the columns directly as x[:, 0] and x[:, 1].
That means all the records and the zeroth column, and likewise the first column.
So, we don't need iloc here, and we index directly like this.
I will call the first column small x1 and, in the same way, the second column x2, which we could also call y. After x1 and x2, the colour comes from the prediction that we got, y_cluster, so we pass y_cluster as the colour. Now we get to see how the grouping is done.
We can see visually that there are groups 1, 2, 3, 4 and 5, and that they are clearly separated.
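The scatter plot can be sketched as below. The data is synthetic (make_blobs) rather than the mall dataset, and the axis labels are added for this sketch only.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with five groups (not the mall dataset).
X, _ = make_blobs(n_samples=200, centers=5, random_state=0)
y_cluster = KMeans(
    n_clusters=5, init="k-means++", n_init=10, random_state=0
).fit_predict(X)

# X is a NumPy array, so columns are taken by slicing, not with iloc.
x1 = X[:, 0]  # all rows, column 0
x2 = X[:, 1]  # all rows, column 1

# c=y_cluster colours each point by its predicted cluster label.
plt.scatter(x1, x2, c=y_cluster)
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()
```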
So, here we saw how the elbow method helped us choose the number of clusters, that is, k.
We can also see in the diagram that the groups it has formed are distinct.
So, we will go only this far in this video, and you can practice this on many more datasets.
So, friends, we will stop this session here, and we will continue in the next session.
If you have any queries or comments, click the discussion button below the video and post there. This way, you will be able to connect with fellow learners and discuss the course. Our team will also try to resolve your query.