Today we continue from our previous session on machine learning.
In today's Machine Learning session we will look at the decision tree regressor.
From the name itself you must have understood that this is a regressor.
A regressor means that our y, which is our output, is in, yes, you guessed it right, continuous values.
For example, if I want to score well, I will have to study for a longer duration and also study with quality.
And if I want a promotion or a salary hike, my performance should be good enough for my salary to be hiked and for me to get promoted.
Here too, if you see, salary is a continuous variable, and score is also a continuous variable.
So in regression, the output is a continuous variable.
And even if the input features are categorical, we can still use them: with an encoding technique we convert, or transform, them into numbers.
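As a rough illustration of the encoding idea, here is a minimal sketch using one-hot encoding with pandas. The column names and values are made up for this example and are not from the session's dataset; other encoders would work as well.

```python
import pandas as pd

# Hypothetical toy data: "department" is a categorical input feature.
df = pd.DataFrame({
    "department": ["sales", "tech", "sales", "hr"],
    "experience_years": [1, 3, 5, 2],
})

# One-hot encoding: each category becomes its own 0/1 column.
encoded = pd.get_dummies(df, columns=["department"], dtype=int)
print(encoded)
```

After this step every column is numeric, so it can be passed to a regressor.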
So let's go. We will load our dataset, and while loading it we will also try to understand what we have to predict.
This dataset is the position salary dataset, meaning your salary is decided according to your position or designation. In the industry this is also the general trend.
Some exceptions can be there, based on your skill set.
But the general trend is that the more the experience, the higher the level, and the more your salary keeps increasing.
So let's begin by importing the libraries.
First I will import numpy as np.
Thereafter I will import pandas as pd.
After that I will import matplotlib.pyplot as plt.
So these are the three libraries that I have imported.
First we will use pandas, and with its help we will create a dataframe.
So dataset is equal to pd.read_csv, and here we will mention the path of our data.
And see, our dataset is loaded now.
With dataset.head we can view the data, so our data can be seen here.
So we have position, level and salary.
These are our three columns. Salary depends on the level, and by looking at the data you can easily understand it: a new joinee gets a salary of 45 thousand per month, the second level gets 50 thousand, the third level 60 thousand, the fourth level 80 thousand, and the fifth level 1 lakh 10 thousand (110,000).
So as the level increases, so does the salary, which means there is a correlation between salary and level.
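The loading step can be sketched as below. Because the file path is not given in the session, this example reads the CSV from an in-memory string instead. Only the five salaries quoted above are used; the position names and the exact column spellings are placeholders.

```python
import io
import pandas as pd

# Stand-in for the CSV file. The five salaries are the ones quoted in the session;
# the position names are placeholders, not the real designations.
csv_text = """Position,Level,Salary
Position A,1,45000
Position B,2,50000
Position C,3,60000
Position D,4,80000
Position E,5,110000
"""

dataset = pd.read_csv(io.StringIO(csv_text))
print(dataset.head())

# Pearson correlation between level and salary on these five rows.
corr = dataset["Level"].corr(dataset["Salary"])
print(round(corr, 3))
```

With a real file, the path would go where `io.StringIO(csv_text)` is used here.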
So let's now extract the level column as input and salary as output.
Here x will be our input, so dataset.iloc, and here we will select all the rows and the column at index 1, which is the level column.
So we write 1:2. It will pick only index 1 and leave out 2, because the end of a slice gets skipped.
With .values we will convert this into numpy array format.
Now you can see, this is my x.
In the same way, y should be my last column. So y is equal to dataset.iloc, and here we will have all the records but only the last column. The columns are indexed zero, one and two, so the index of the last column is 2, and here also .values.
And now you see our y.
Y looks like this; y holds the values that denote the salary.
Now we have both x and y.
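Here is a small sketch of that slicing, using a stand-in dataframe with the five salaries quoted in the session (position names are placeholders). It shows why 1:2 is used for x: the slice keeps x two-dimensional, which is the shape scikit-learn expects for inputs.

```python
import pandas as pd

# Stand-in dataframe; position names are placeholders.
dataset = pd.DataFrame({
    "Position": ["Position A", "Position B", "Position C", "Position D", "Position E"],
    "Level": [1, 2, 3, 4, 5],
    "Salary": [45000, 50000, 60000, 80000, 110000],
})

# 1:2 selects column index 1 only (index 2 is excluded) and keeps a 2-D shape.
x = dataset.iloc[:, 1:2].values
# A single integer index returns a 1-D array.
y = dataset.iloc[:, 2].values

print(x.shape, y.shape)
```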
Now, what do we have to do after this?
We have to divide the data with train test split, so let's split it.
Here we will use from sklearn.model_selection, and from this we will import train_test_split.
So we import train_test_split this way.
After this we will create x_train, x_test, y_train and y_test.
Then we divide the data with train_test_split. Here we will keep test_size as 0.2, that is 20 percent, and next comes random_state, which we will keep as zero.
So let's execute this.
Here you can see.
Here I will show you x_train and y_train.
So this is our x_train and y_train. In x_train we have 1, ... 8 records; 80 percent of 10 is 8, so it is 8, and these are their respective outputs.
So this is level and this is salary.
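The split can be sketched as follows. The session's dataset has ten rows; here the salaries for levels 1 to 5 are the ones quoted earlier, while the values for levels 6 to 10 are illustrative placeholders, since the session does not read them out.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten levels as a 2-D input array.
x = np.arange(1, 11).reshape(-1, 1)
# Levels 1-5: values quoted in the session. Levels 6-10: placeholders.
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# test_size=0.2 holds out 20 percent of the rows; random_state fixes the shuffle.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(len(x_train), len(x_test))
```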
So, let's move ahead.
Now we can scale our x and y with the help of a standard scaler.
So here, from sklearn.preprocessing, because this is part of preprocessing, import StandardScaler.
Now I will create an object sc_x for x, so StandardScaler.
And in the same way I will also create one for y: sc_y is equal to StandardScaler.
So I have created two objects.
One for x and the other for y.
Now I will transform the data.
x_train is equal to sc_x.fit_transform, and inside fit_transform we will pass x_train again.
So this is our x_train, now scaled.
And now we will transform x_test also: x_test is equal to sc_x.fit_transform, and here we will put x_train.
Here we made a mistake; we will put x_test.
So our x_train and x_test are scaled, and in the same way we will do it for y.
So y_train is equal to sc_y.fit_transform, and here we will put y_train.
At the time of transforming we also have to reshape it, because it is one-dimensional, so we will reshape it and execute again.
So our y_train is transformed now.
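A minimal sketch of the scaling step is given below, on placeholder arrays. One deliberate difference from the session: the test inputs are scaled with transform, which reuses the mean and standard deviation learned from the training data, instead of calling fit_transform again. That is the more common practice, but it is a choice made for this example, not what was typed in the session.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder arrays standing in for the result of the train/test split.
x_train = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
x_test = np.array([[9], [10]], dtype=float)
y_train = np.array([45000, 50000, 60000, 80000,
                    110000, 150000, 200000, 300000], dtype=float)

sc_x = StandardScaler()
sc_y = StandardScaler()

# Learn mean and standard deviation from the training inputs, then scale them.
x_train_scaled = sc_x.fit_transform(x_train)
# Assumption for this sketch: reuse the training statistics on the test inputs.
x_test_scaled = sc_x.transform(x_test)

# StandardScaler expects 2-D input, so the 1-D y is reshaped to one column.
y_train_scaled = sc_y.fit_transform(y_train.reshape(-1, 1))

print(x_train_scaled.ravel())
print(y_train_scaled.shape)
```

The reshape(-1, 1) call is the reshaping mentioned above: it turns an array of shape (8,) into shape (8, 1).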
Now, what do we have to do next?
We have to create a model. We are talking about a decision tree, so from sklearn.tree we will import DecisionTreeRegressor.
So here we have imported the decision tree regressor.
Now we will create an object of it by the name dtr: dtr is equal to DecisionTreeRegressor.
Through this decision tree object we will train the model.
So dtr.fit, and here we will pass x_train, y_train; this is the data that goes into it.
Now I have passed the data.
With this the learning is completed.
So the dtr model is now a trained one.
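The fitting step can be sketched like this. To keep the example short it uses five unscaled rows (the salaries quoted earlier) rather than the scaled training split from the session, and random_state=0 is added only to make the run repeatable.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Small unscaled stand-in for the training data.
x_train = np.array([[1], [2], [3], [4], [5]], dtype=float)
y_train = np.array([45000, 50000, 60000, 80000, 110000], dtype=float)

# Create the regressor and fit it; with default settings the tree grows
# until every training row sits in its own leaf.
dtr = DecisionTreeRegressor(random_state=0)
dtr.fit(x_train, y_train)

print(dtr.get_n_leaves())
print(dtr.predict(x_train))
```

Because an unrestricted tree memorises the training rows, predicting on the training inputs returns the training salaries exactly.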
After this we will create y_pred; this variable name has become the popular one for predictions.
So y_pred is equal to dtr.predict, and here we will pass one data point.
Here I am simply passing 6.5.
So I have passed this now.
And execute this.
Here, at the time of passing it, I should have given it as an array.
So let's take it as an array: np.array, and inside it we put 6.5.
So here we passed the data as an array.
Now let's understand the output that we received in y_pred. You can see that a very small value has come, because the model was trained on the scaled y.
So y_pred is equal to sc_y.inverse_transform, and into inverse_transform we will pass y_pred.
At the time of passing y_pred we will reshape it as well and then pass it, so we reshaped it.
Now let's see, what is the value of y_pred?
Now y_pred's value is 10 lakh rupees (1,000,000).
So our prediction tells us that at position level 6.5, the individual will get a salary of 10 lakh rupees.
So this is what the prediction says.
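A hedged sketch of the predict-then-inverse-transform flow follows. It differs from the session in two ways that are choices made for this example: the model is fitted on all ten rows for brevity, and the query 6.5 is scaled with sc_x before it is passed to predict, since a model fitted on scaled inputs expects scaled inputs. The salaries for levels 6 to 10 are placeholders.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

x = np.arange(1, 11, dtype=float).reshape(-1, 1)
# Levels 1-5: values quoted in the session. Levels 6-10: placeholders.
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

sc_x, sc_y = StandardScaler(), StandardScaler()
x_scaled = sc_x.fit_transform(x)
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

dtr = DecisionTreeRegressor(random_state=0)
dtr.fit(x_scaled, y_scaled)

# predict expects a 2-D array, so the single value is wrapped in [[...]].
query = np.array([[6.5]])
query_scaled = sc_x.transform(query)

# The raw prediction is in scaled units; inverse_transform maps it back to salary.
y_pred_scaled = dtr.predict(query_scaled)
y_pred = sc_y.inverse_transform(y_pred_scaled.reshape(-1, 1))
print(y_pred)
```

A decision tree returns the value stored in the leaf the query falls into, so the result is always one of the training salaries rather than an interpolated number.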
So in this way we trained our model and also obtained predictions from it.
So let's plot this and see.
To visualise it, we will start with plt.scatter. In the scatter plot I will put x, y, so we are inserting the original data here, and for the color we are putting red.
So this way our first plot is ready.
This plot is of the original dataset.
Now we will use the plot function, so plt.plot. In this we will keep x, and then for our regressor, which is dtr, we will use dtr.predict. On what will we run the prediction? It will be run on x itself.
So this will draw the model's predictions over the same points.
We are running the prediction on x because we want predictions for exactly the inputs that are displayed there.
After this, what are we going to do?
As we have given red color for the above plot, here we can give color is equal to blue.
So our plt.plot is completed for this also.
So you can see here.
This line is a straight line.
That means there is something wrong.
Here we have put x and here also we have put x, and the line still goes straight, which means that our model did not learn properly.
This dataset is small, and because of this the model did not learn it properly.
So in this way we are able to get the graph out of it.
Now, in addition, I will give a title and labels for x and y.
So plt.title, and in the title we can write truth or bluff.
The next thing is plt.xlabel. What will that be? It is our position level.
And in the same way, what is plt.ylabel? The y label is our salary.
So in this way my graph plot is ready.
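The plotting steps can be put together as in the sketch below. It is an illustration under stated assumptions, not the session's exact notebook: the model is fitted on unscaled values so that the inputs given to predict are on the same scale as the inputs used for training, and the salaries for levels 6 to 10 are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

x = np.arange(1, 11, dtype=float).reshape(-1, 1)
# Levels 1-5: values quoted in the session. Levels 6-10: placeholders.
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000], dtype=float)

dtr = DecisionTreeRegressor(random_state=0)
dtr.fit(x, y)

# Predictions for the same inputs that are shown in the scatter plot.
y_line = dtr.predict(x)

plt.scatter(x.ravel(), y, color="red")      # original data
plt.plot(x.ravel(), y_line, color="blue")   # model predictions
plt.title("Truth or Bluff")
plt.xlabel("Position level")
plt.ylabel("Salary")
# Call plt.show() in an interactive session to display the figure.
```

If the model were fitted on scaled data instead, both x and the predictions would need to go through the matching scaler before plotting, otherwise the curve and the points end up on different scales.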
You can try this on other datasets also, and after trying, tweak and see. It is not necessary that you always apply the standard scaler; I applied it for learning purposes. I also split the data for training and testing even though the dataset was small, and after that I let the model train.
So you will have to follow all the steps performed here, and then tweak them and see where our algorithm performs better.
So, friends, we will end this session here, and we will continue in the next session.
Till then keep learning and remain motivated.