Hello I am (name) from LearnVern.
This tutorial is the continuation of our previous tutorial, so let us continue from there.
So, today we are going to study SVM, or, if we call it a classifier, then SVC, that is, the Support Vector Classifier.
So this is an interesting concept where we see support vectors.
Now, what are the support vectors?
These are basically some points; with the help of these points, we identify the class that a data point should belong to.
Suppose we have a point over here (if you are not able to see this point, I will make a pipe symbol here), and the other point is here. All the dash data points are on the right side of this line, and all the equals data points are on the left side. Among them, one dash data point is close to this line, and one equals data point is close to this line, so these are the points nearest to this central line. These nearest points, on whose basis the central line is drawn, are called support vectors, and this line helps in classifying which point belongs to which class, either the dash class or the equals class. The space between these lines is called the margin: this is the margin between these two, and this is the margin from here. So this is called the margin.
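The idea above can be sketched in a few lines with scikit-learn. The points and labels here are made up purely for illustration; `support_vectors_` exposes the nearest boundary points the margin rests on.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable 2-D points purely for illustration.
X = np.array([[1, 1], [2, 1], [1, 2],    # one class (the "equals" side)
              [4, 4], [5, 4], [4, 5]])   # other class (the "dash" side)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
# The nearest boundary points, on whose basis the line is drawn:
print(clf.support_vectors_)
```

Only the points closest to the separating line show up in `support_vectors_`; the rest of the data does not affect where the line sits.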
So let's implement this and see.
For this we will use the data set of social media ads.
We will upload this. See, it has been uploaded, and I have copied its path.
We will make this into a comment so that there are no errors.
So, the first step is to load this data set, so import pandas as pd.
Next, after this, dataset equals pd dot read_csv; with the help of read_csv we will read the data set, and our data set will be converted into a data frame.
So our data set over here is now formed as a data frame.
So we have the data set now.
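As a minimal sketch of this loading step: the rows and column names below are a made-up stand-in for the social media ads file, so the snippet runs on its own; with the real file you would pass the copied path to `read_csv` instead.

```python
import io
import pandas as pd

# A small stand-in for the ads file (made-up rows, assumed column layout).
csv_text = """User ID,Gender,Age,EstimatedSalary,Purchased
15624510,Male,19,19000,0
15810944,Male,35,20000,0
15668575,Female,26,43000,0
"""
# With the real file: dataset = pd.read_csv("<copied path>")
dataset = pd.read_csv(io.StringIO(csv_text))
print(dataset.shape)  # (3, 5): three rows, five columns
```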
In this, we will keep the Purchased column as the output, and the Age and EstimatedSalary columns will be kept as the input.
Ok! so, now we will extract our input for data preprocessing.
X equals dataset dot iloc; with the help of the iloc function, we will take all the rows, and for columns we will take the second and the third.
So in X you can see that we have got our two columns: Age and EstimatedSalary.
And in Y, with the help of the dataset's iloc function, we will take all the rows but only the fourth column.
Now, you can see I have just taken the last column that is purchased.
So, this is my data that has been divided between input and output.
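The extraction step can be sketched like this, assuming the column layout of the ads file; the three rows below are made up so the snippet is self-contained.

```python
import pandas as pd

# Made-up stand-in for the loaded data frame, same assumed column order:
# [User ID, Gender, Age, EstimatedSalary, Purchased]
dataset = pd.DataFrame({
    "User ID": [1, 2, 3], "Gender": ["M", "F", "M"],
    "Age": [19, 35, 26], "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 1],
})

X = dataset.iloc[:, [2, 3]].values  # columns 2 and 3: Age, EstimatedSalary
y = dataset.iloc[:, 4].values       # column 4: Purchased
print(X.shape, y.shape)  # (3, 2) (3,)
```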
Now, the next step is that we have to keep some data for training and some data for testing.
So, for that, from sklearn dot model_selection we will import train_test_split.
Now, here we will have X underscore train and X underscore test, and Y underscore train and Y underscore test.
So, we have four data sets here, and these are pairs: X train with Y train, and X test with Y test.
So, we have a pair of input and output for training, and a pair of input and output for testing.
Now, we are taking the help of a train test split.
And with the help of this, we will split them.
I will pass all the data, X and Y, and I will also give it a test size of 0.25, that is, 25 percent.
And I will give a random state of zero, so that it generates the same split in the same pattern every time.
Now, our data is divided between X train and Y train and X test and Y test.
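A self-contained sketch of this split, with made-up arrays standing in for the real data; `test_size=0.25` holds out a quarter of the rows, and `random_state=0` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up arrays standing in for the ads data (20 rows here).
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # 25% held out, reproducible
print(len(X_train), len(X_test))  # 15 5
```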
So, I will show you the X train and Y train here.
Age and estimated salary are here, and from here we have Y.
So, in all, we have a length of 300 in both X train and Y train.
Similarly, we will see X test and Y test.
So here we have 100 rows, and here also we have 100 rows.
So, in this way we also have X test and Y test.
So, let's proceed ahead and scale this down.
To do that, from sklearn dot preprocessing, we will import StandardScaler.
Now, we will make an object of this: sc equals StandardScaler.
Now, we will scale down X train, so we will use sc dot fit_transform for it, and here we will put X underscore train.
Now, you can see our X has been scaled down.
Similarly, we will do this for X underscore test; here we should use sc dot transform (not fit_transform again), so that the test data is scaled with the mean and standard deviation learned from the training data.
So, this is also scaled down.
So, our preprocessing part is fully completed.
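A sketch of the scaling step with made-up numbers. A common convention, shown below, is to fit the scaler only on the training data and merely transform the test data with the same statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up age/salary rows standing in for X_train and X_test.
X_train = np.array([[20, 20000], [40, 60000], [60, 100000]], dtype=float)
X_test = np.array([[30, 40000]], dtype=float)

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = sc.transform(X_test)        # reuse those same statistics
print(X_train_scaled.mean(axis=0))          # each column now has mean ~0
```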
Now is the time for model creation.
And this time, we will create an SVC model, that is Support Vector Classifier…
Ok! So why are we calling it a classifier?...
Because we have two classes, one meaning car purchased and zero meaning car is not purchased.
So, we have two classes so we will use, 'Support Vector classifier'.
From sklearn dot svm, we will import SVC, because we want the Support Vector Classifier, so we will use SVC.
Now, after we have imported SVC, we need to make an object of it, so we will give it a name in small letters. It is okay to give the object a lowercase name even if the class name is in capitals; it will work. I took all the rest of the parameters by default.
So, I will also show you these defaults once: we have C equal to 1.0, kernel 'rbf', and gamma 'scale'.
And we will give a random state also.
So, our SVC object is created.
Now, we have to train this SVC, so svc dot fit; with that we will train it, and to train it we will pass the X underscore train data and the Y underscore train data.
So, with this it will get the training…
So, our training is completed now.
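A runnable sketch of creating and training the classifier. A synthetic two-class problem from `make_classification` stands in for the ads data, and the defaults mentioned above are written out explicitly.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class, two-feature data standing in for the scaled ads data.
X_train, y_train = make_classification(
    n_samples=100, n_features=2, n_redundant=0, random_state=0)

# Defaults shown explicitly: C=1.0, kernel='rbf', gamma='scale'.
svc = SVC(C=1.0, kernel="rbf", gamma="scale", random_state=0)
svc.fit(X_train, y_train)                # training happens here
print(svc.score(X_train, y_train))       # accuracy on the training data
```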
After completing the training, now is the time for predictions.
So, for that, first we will take a variable Y underscore pred, and in this we will take the help of svc dot predict, and to predict we will pass X underscore test.
So, you see what we have got in Y underscore pred.
Also, see what the output is in Y underscore test dot values.
So, here we have both the outputs.
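The prediction step, sketched end to end on synthetic data (the generated points are a stand-in for the real ads rows):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data, split the same way as in the tutorial.
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

svc = SVC(random_state=0).fit(X_train, y_train)
y_pred = svc.predict(X_test)        # predictions for the held-out rows
print(y_pred[:5], y_test[:5])       # compare predicted vs. observed
```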
Now, after this, we will have to check the performance, as to how the model is performing, so for that we will compute its accuracy: from sklearn dot metrics, import accuracy underscore score and, along with that, confusion underscore matrix.
Now, here we will print and see the accuracy. So we will use accuracy underscore score and pass Y underscore test and Y underscore pred.
So, the accuracy that we get is 93 percent, compared to the decision tree with ID3, which gave 90 percent.
So, now we will compute the confusion matrix also; to the confusion matrix we will pass Y test and Y pred, and then print cf.
So, you can see it has given three fewer wrong classifications than our previous test with ID3 (you can watch that video), and this is the reason its accuracy has increased.
So, let us see the test once, where Y underscore pred is our prediction and Y underscore test is our observed values.
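A sketch of the evaluation step with made-up labels. Note that both metrics take the observed `y_test` and the predicted `y_pred`, in that order.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up observed labels and predictions, purely for illustration.
y_test = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0])

# Pass y_test and y_pred (not y_train): 9 of 10 match, so 0.9 here.
print(accuracy_score(y_test, y_pred))

cf = confusion_matrix(y_test, y_pred)
print(cf)  # rows = actual class, columns = predicted class
```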
So, let's see how Visualisation is.
So, for X I will take all the records of the zeroth column from X underscore test, and for Y, also from X underscore test, all the records and only the first column.
And the colouring we will do from Y underscore test.
We will keep the same format for pred also, and only the colouring we will change to come from Y underscore pred.
Now, we will Import our matplotlib with whose help we are going to do visualisation.
From matplotlib, import pyplot as plt.
So, from here, plt dot scatter; we will put the values of X, Y, and C.
So, this is our graph formed which is a true graph.
In the same way here also, plt dot scatter with X and Y, and C equal to C, which is from pred.
Now, you can see the graph.
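The two scatter plots can be sketched like this. The points, labels, and the single deliberate mistake below are made up, and the Agg backend is used so the snippet renders off-screen without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the snippet runs anywhere
from matplotlib import pyplot as plt

# Made-up scaled test data and labels standing in for X_test / y_test / y_pred.
rng = np.random.default_rng(0)
X_test = rng.normal(size=(20, 2))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
y_pred = y_test.copy()
y_pred[0] = 1 - y_pred[0]  # one deliberate "mistake" to spot visually

x, y = X_test[:, 0], X_test[:, 1]  # zeroth column as X, first column as Y
plt.scatter(x, y, c=y_test)        # true graph: colour by observed labels
plt.savefig("true.png")
plt.figure()
plt.scatter(x, y, c=y_pred)        # predicted graph: colour by predictions
plt.savefig("pred.png")
```

Points whose colour differs between the two plots are the misclassifications the tutorial points out on screen.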
So here it has made only a few mistakes, like this one over here, this, and this.
So, it has made only the logical mistakes which I was telling you about earlier, so it has performed better than the previous times.
So, this is how we are able to interpret this through output and visualisation.
If you have any queries or comments, click the discussion button below the video and post there. This way, you will be able to connect to fellow learners and discuss the course. Also, Our Team will try to solve your query.
Let’s conclude our today’s topic here. We will see the next parts in the next video.
Share a personalized message with your friends.