Today we continue from our previous Machine Learning session.
In today's Machine Learning tutorial, we are going to look at a simple linear regression model.
So, let's quickly see what regression is.
Then we will move ahead to the practical.
So, to understand the meaning of regression, let me take an example.
Suppose I walk at great speed,
so my speed is fast.
If my speed is fast, will I be able to cover more distance?
The answer is yes, I will be able to cover more distance.
If the speed is higher, then the distance will also be greater.
Let me write this in the following manner,
in the form of columns:
one column for speed, and the other column for distance.
Suppose my speed is 10 km per hour; in that case, I cover 10 km in one hour.
If my speed becomes 20 km per hour, then I cover 20 km in one hour.
If my speed is 30 km per hour, then I cover 30 km in one hour.
So, as my speed increases, the distance I cover per hour increases in the same way.
This kind of relationship is what regression deals with:
when one variable changes, another variable changes with it in a consistent way, for example increasing as it increases or decreasing as it decreases, and regression estimates that relationship so that we can predict one variable from the other.
So, this is the idea behind regression.
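To make the speed and distance example concrete, here is a minimal Python sketch, assuming NumPy is installed, that fits a straight line to the three rows of the table with `numpy.polyfit`. It is not part of the tutorial's practical; it only shows that this table corresponds to a line with slope 1 and intercept 0.

```python
import numpy as np

# Speed (km/h) and distance covered in one hour (km), taken from the example table
speed = np.array([10.0, 20.0, 30.0])
distance = np.array([10.0, 20.0, 30.0])

# Fit a degree-1 polynomial, i.e. the straight line distance = m * speed + c;
# np.polyfit returns coefficients from the highest degree down: [m, c]
m, c = np.polyfit(speed, distance, 1)
print("slope m:", round(m, 3), "intercept c:", round(c, 3))

# Use the fitted line to estimate the distance for a speed that is not in the table
predicted = m * 25 + c
print("estimated distance at 25 km/h:", round(predicted, 3))
```

Because the three points lie exactly on a line, the fit recovers m = 1 and c = 0 up to floating-point rounding.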
So, let's understand this in the form of an equation.
The equation is y = mx + c.
There is one more way to write this: y = β₁x + β₀ (read "beta one x plus beta naught"), so you can write it this way also.
This is known as the regression line, or the equation of a line.
It depicts the relationship between two variables, here x and y.
So, as x changes, y changes according to m and c, where c is a constant called the intercept (the value of y when x is 0), and m is the slope (how much y changes when x increases by 1).
So, both of these determine the output.
So, let's import it now. We can do this from sklearn, which has a linear_model module from which we can import the LinearRegression class.
That class will learn the values of m and c in this formula from our data.
After that, we will learn about multiple linear regression, where we will not have just one input x, but many inputs, like input 1 and input 2,
meaning x1, x2, and so on.
With more inputs, the equation gets expanded, which we will cover during that practical.
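As a brief preview, the expanded equation is commonly written as below, with one β coefficient per input plus the intercept β₀. This is the standard textbook form; the notation used in the later practical may differ.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
```

With a single input (n = 1) it reduces to y = β₀ + β₁x, which is the same line as y = mx + c with m = β₁ and c = β₀.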
Now, let me introduce you to the dataset.
The dataset for today's practical is the salary dataset.
First, I will import pandas as pd.
Next, we need to upload our dataset.
From here, I will upload the dataset.
So, our salary dataset is getting uploaded.
Now, we will copy the file path from here,
and load our data as a DataFrame.
So, we call pd.read_csv and pass the path of our dataset in single quotes inside the parentheses.
So, now you will be able to see the dataset.
So, our data is loaded in this way.
So, it has only two columns: years of experience and salary.
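Here is a minimal sketch of the loading step, assuming pandas is installed. Because the uploaded file and its exact path are not shown, the sketch reads a few made-up rows from an in-memory string instead of a file; the column names `YearsExperience` and `Salary` are assumptions, so check them against your own CSV.

```python
import io

import pandas as pd

# Made-up rows standing in for the uploaded salary CSV (illustration only);
# the real file's name, column names and values may differ.
csv_text = """YearsExperience,Salary
1.1,39000
2.0,43500
3.2,54400
4.5,61100
5.9,81300
"""

# With a real file you would pass its copied path instead, for example:
# dataset = pd.read_csv('path/copied/from/the/file/browser.csv')  # hypothetical path
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.head())          # first rows of the DataFrame
print(dataset.shape)           # (rows, columns)
print(list(dataset.columns))   # the two columns: input and output
```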
We just discussed that our equation has the form y = mx + c.
In it, we have one y variable and one x variable.
Here, salary is y: the output we want to predict.
The variable that we want to predict, or get as an output, is called y.
We also call this variable the dependent variable.
You can also see that salary is continuous. Earlier, in classification problems, our output variable used to be categorical, with two classes, three classes, or more.
But here the data is continuous, and if we tried to divide it into classes, the complexity would increase.
So, we don't divide it into categories; we treat it as continuous.
That is why we call these problems regression problems.
An easy way to remember this: if the output y is continuous, it is a regression problem, and we solve it with a regression algorithm such as simple linear regression or multiple linear regression.
So, this was our dataset.
Now, in this dataset we have to decide the input and the output.
As we discussed, salary is our output, so it becomes y.
And years of experience is our input, so it becomes X.
So, let's extract them separately.
Here, I will extract X and y separately.
X is the dataset's years-of-experience column, so this becomes my X.
Note that sklearn expects X to be two-dimensional (rows × columns) even when there is only one input column, which is why x_test.shape later shows (9, 1).
You can see that X is displayed with only years of experience.
Now, let's work on y.
In the same way, y is the dataset's salary column, so this is our y.
So, we get an array for y.
Now our data is split into X and y.
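A minimal sketch of this step, assuming pandas and NumPy. To stay self-contained it builds a synthetic 30-row stand-in for the salary dataset (made-up numbers, not the tutorial's data); the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the salary dataset (made-up numbers, 30 rows);
# the column names are assumptions, so check them in your own CSV.
rng = np.random.default_rng(0)
years = np.round(rng.uniform(1, 10, size=30), 1)
dataset = pd.DataFrame({
    "YearsExperience": years,
    "Salary": np.round(25000 + 9000 * years + rng.normal(0, 4000, size=30)),
})

# Double brackets return a one-column DataFrame, so X is 2-D: (n_samples, 1)
X = dataset[["YearsExperience"]]
# Single brackets plus .values give a 1-D NumPy array for the target
y = dataset["Salary"].values

print(X.shape, y.shape)
```

A common equivalent, assuming salary is the last column, is `X = dataset.iloc[:, :-1].values` and `y = dataset.iloc[:, -1].values`.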
Now, we should import all the libraries that we need.
So, import numpy as np.
This is an important library.
Then, import matplotlib.pyplot as plt.
So, we will import this also.
If we need anything else later, we will import it at that point.
So, before that, we will plot our data once and see what x and y look like.
So, we call plt.plot and pass X and y, along with one more parameter, the format string 'rx': r means we want red colour, and x means we want a cross marker.
Now, you can see our data plotted in this format, with X and y displayed.
Again, let's recall our original X and y: one is years of experience and the other is salary.
We know the general trend: as experience increases, salary also increases.
That is what this plot shows: as experience increases, salary increases too.
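A minimal sketch of this plot, assuming matplotlib and NumPy, on synthetic stand-in data (made-up numbers). The `matplotlib.use("Agg")` line only lets the sketch run as a headless script; drop it in a notebook and call `plt.show()` to display the figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (made-up numbers, not the tutorial's dataset)
rng = np.random.default_rng(0)
X = np.round(rng.uniform(1, 10, size=30), 1).reshape(-1, 1)
y = 25000 + 9000 * X.ravel() + rng.normal(0, 4000, size=30)

plt.figure()
# 'rx' = red ('r') cross markers ('x') with no connecting line
lines = plt.plot(X, y, "rx")
plt.xlabel("Years of experience")
plt.ylabel("Salary")
# In a notebook, plt.show() displays the figure here
```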
So, let's move ahead.
Now, we should split our X and y for training and testing.
So, let's split this into train and test.
For that, we will use from sklearn.model_selection import train_test_split.
OK! So we will use train_test_split.
Now, we write x_train, x_test, y_train, y_test = train_test_split(...).
And in this, we pass the complete X and the complete y.
We also mention the test size we want: we could keep it as 0.25, but let's keep it as 0.30, that is, 30 percent of the data.
And here, we initialise random_state as zero.
random_state fixes how the rows are shuffled before splitting, so we get the same split every time we run the code; any integer works, and 0 is just a common choice.
So, in this way our data is split into training and testing sets.
Now, if we want to check this,
we can do x_train.shape, and we see that 21 records have gone into training.
Then, x_test.shape:
here we have (9, 1), so we have 9 records in the test set.
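A minimal sketch of the split, assuming scikit-learn and the same kind of synthetic 30-row stand-in data (made-up numbers). With 30 rows and `test_size=0.30`, scikit-learn puts 9 rows in the test set and 21 in the training set, matching the shapes above; the last lines show that reusing `random_state=0` reproduces the same split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic 30-row stand-in for the salary dataset (made-up numbers)
rng = np.random.default_rng(0)
years = np.round(rng.uniform(1, 10, size=30), 1)
dataset = pd.DataFrame({"YearsExperience": years,
                        "Salary": 25000 + 9000 * years + rng.normal(0, 4000, size=30)})
X = dataset[["YearsExperience"]]
y = dataset["Salary"].values

# Hold out 30% of the rows for testing; random_state=0 makes the shuffle repeatable
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
print(x_train.shape, x_test.shape)  # 30 rows -> 21 for training, 9 for testing

# Same random_state -> the same rows land in the training set on every run
x_train2, _, _, _ = train_test_split(X, y, test_size=0.30, random_state=0)
print(x_train.index.equals(x_train2.index))
```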
So, now let's move ahead and work on the model.
Here, we want a linear regression model.
We also get that model from sklearn: from sklearn.linear_model import LinearRegression.
So, we will import LinearRegression.
Now, we will create an object of LinearRegression.
So, linear_regressor = LinearRegression().
In this way, we make one object for it.
Now, through this object, we will train our model.
For training, we call linear_regressor.fit, the fit function.
The fit function accepts our data,
and here we pass x_train and y_train.
With this, our model learns the values of m and c.
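A minimal sketch of creating and training the model, assuming scikit-learn. The data is synthetic and made up so that the true slope is about 9000 and the true intercept about 25000; with real data you would not know these in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (made-up): salary ~ 25000 + 9000 * years + noise
rng = np.random.default_rng(0)
X = np.round(rng.uniform(1, 10, size=30), 1).reshape(-1, 1)  # 2-D: (30, 1)
y = 25000 + 9000 * X.ravel() + rng.normal(0, 4000, size=30)

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Create the model object, then let fit() learn m and c from the training data
linear_regressor = LinearRegression()
linear_regressor.fit(x_train, y_train)

print("learned slope m:", linear_regressor.coef_[0])
print("learned intercept c:", linear_regressor.intercept_)
```

Because of the noise, the learned slope will be close to 9000 but not exactly equal to it.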
Now, after learning, the next step is to make predictions.
Predictions will be made with linear_regressor.predict and stored in y_pred.
But before that, we will find the intercept and the coefficient,
because, as we already discussed, our equation is y = mx + c.
So, let's find what m is and what c is.
Now, I will print them and show you.
print(linear_regressor.intercept_). Note the trailing underscore: sklearn uses it for attributes that are learned during fit.
When I show you this intercept, you can see the value is 26777; that is the value of the intercept.
So, the intercept is your c.
Now, I will print and show you the value of the coefficient.
print(linear_regressor.coef_) for the coefficient.
The coefficient is our β₁, meaning it is our m. coef_ is an array with one entry per input; here we have only one input, so it holds one value.
So, in this way we got our two values.
The main task of regression is to find these values,
because they determine the output: whatever x is, y is computed from these two values.
OK! So, this is their importance.
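LinearRegression fits the line by ordinary least squares: it chooses m and c so that the sum of squared differences between the predicted and actual y values is as small as possible. For a single input, those values have a closed form. The sketch below, assuming NumPy and scikit-learn and using synthetic made-up data, computes them directly and checks that they agree with what sklearn learns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (made-up numbers)
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=21)
y = 25000 + 9000 * x + rng.normal(0, 4000, size=21)

# Ordinary least squares for one input:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
#   c = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

# sklearn's LinearRegression minimises the same squared error, so it should agree
model = LinearRegression().fit(x.reshape(-1, 1), y)
print("closed form:", m, c)
print("sklearn:    ", model.coef_[0], model.intercept_)
```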
Now, let's move on and make our predictions.
For prediction, y_pred = linear_regressor.predict(x_test), so we pass x_test to linear_regressor.predict.
So, we get our predictions in y_pred.
Let me show you.
The corresponding actual values, our observations, are in y_test.
So, this is our y_pred and this is our y_test.
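A minimal sketch of the prediction step, assuming scikit-learn, pandas and synthetic made-up data. It shows that `predict` simply evaluates y = mx + c with the learned values, and puts the predictions next to the actual observations for comparison.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (made-up numbers)
rng = np.random.default_rng(0)
X = np.round(rng.uniform(1, 10, size=30), 1).reshape(-1, 1)
y = 25000 + 9000 * X.ravel() + rng.normal(0, 4000, size=30)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

linear_regressor = LinearRegression().fit(x_train, y_train)
y_pred = linear_regressor.predict(x_test)

# predict() just evaluates y = m * x + c with the learned m and c
manual = linear_regressor.coef_[0] * x_test.ravel() + linear_regressor.intercept_
print(np.allclose(y_pred, manual))

# Put predictions next to the actual observations for comparison
comparison = pd.DataFrame({"years": x_test.ravel(), "y_test": y_test, "y_pred": y_pred})
print(comparison.round(0))
```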
Now, using these two, we will draw the predictions as a regression line and the observations as a scatter plot,
which lets us see how our data is spread around the line.
So, let's do it.
First, we call plt.plot and pass x_test and y_test, and plot these too as red crosses, 'rx'.
Next, we call plt.plot with x_test and y_pred, and mention the colour we want, so c='black'.
And let's execute.
So, now you can see the graph we get from x_test, y_test and y_pred.
The line passes neatly through the middle of the points.
This line, formed from the predictions, is called the regression line.
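A minimal sketch of this plot, assuming matplotlib, scikit-learn and synthetic made-up data: the test observations as red crosses and the predictions as a black line. As before, the `Agg` line is only for running headless; drop it in a notebook and call `plt.show()`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (made-up numbers)
rng = np.random.default_rng(0)
X = np.round(rng.uniform(1, 10, size=30), 1).reshape(-1, 1)
y = 25000 + 9000 * X.ravel() + rng.normal(0, 4000, size=30)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
y_pred = LinearRegression().fit(x_train, y_train).predict(x_test)

plt.figure()
# Observations as red crosses, predictions as a black line (the regression line)
plt.plot(x_test, y_test, "rx")
plt.plot(x_test, y_pred, c="black")
plt.xlabel("Years of experience")
plt.ylabel("Salary")
lines = plt.gca().get_lines()
```

The predicted points all lie on one straight line, so the black line looks straight even though x_test is not sorted.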
We assume that future data will follow the same trend as this line.
So, when new inputs come, we will be able to estimate the value of y.
Or, given y, we could rearrange the equation as x = (y - c)/m to estimate x, although the line was fitted for predicting y from x, so this is only a rough estimate.
So, that's the benefit of the regression line: we saw how to make predictions (we predicted the salaries), and we also saw how to plot the results and interpret them.
Now, the next part is evaluation.
To evaluate, we import metrics from sklearn: from sklearn import metrics.
From metrics, we will compute the mean absolute error, so print(metrics.mean_absolute_error(...)).
To compute the mean absolute error, I pass y_test and y_pred.
OK, I have passed these; now let's execute it.
Here we get 3737.
For these error metrics, a value close to zero is good, because it means the predictions are close to the actual values.
The further the value is from zero, the worse the model is.
So, an error close to zero is good.
But the mean absolute error is measured in the same units as the output, here salary, so whether 3737 is a big value should be judged against the typical salary in the dataset.
So, in this way we have computed the mean absolute error.
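A minimal sketch of what mean absolute error computes, assuming scikit-learn and a handful of made-up actual and predicted salaries. It calculates the average absolute error by hand, compares it with `metrics.mean_absolute_error`, and divides by the average salary to put the number in context.

```python
import numpy as np
from sklearn import metrics

# Hypothetical actual and predicted salaries (made-up numbers)
y_test = np.array([40000.0, 55000.0, 70000.0, 90000.0])
y_pred = np.array([42000.0, 52000.0, 71000.0, 86000.0])

# MAE = average of |actual - predicted|, in the same units as the output (salary)
mae_manual = np.mean(np.abs(y_test - y_pred))
mae_sklearn = metrics.mean_absolute_error(y_test, y_pred)
print(mae_manual, mae_sklearn)  # both 2500.0

# Relative to the average salary, to judge whether the error is "big"
print(mae_manual / y_test.mean())  # about 0.04, roughly 4% of the average salary
```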
Similarly, we can compute the mean squared error.
From sklearn.metrics,
this time we will import mean_squared_error.
So, print(mean_squared_error(y_test, y_pred)): we pass both variables, y_test and y_pred.
Here, you can see that we get a mean squared error of very high magnitude.
That is partly because the errors are squared, so the MSE is in squared salary units; its square root, the root mean squared error, is back in salary units. As with MAE, lower is better.
So, in this way we can evaluate the model and understand how it is performing.
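A minimal sketch of mean squared error and its square root, assuming scikit-learn and the same made-up salaries as a toy example. It shows why the MSE looks so large: squaring the errors also squares the units, while the RMSE brings the value back to salary units.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted salaries (made-up numbers)
y_test = np.array([40000.0, 55000.0, 70000.0, 90000.0])
y_pred = np.array([42000.0, 52000.0, 71000.0, 86000.0])

# MSE = average of squared errors, so it is in squared salary units
mse = mean_squared_error(y_test, y_pred)
mse_manual = np.mean((y_test - y_pred) ** 2)

# RMSE = square root of MSE, back in salary units and comparable with MAE
rmse = np.sqrt(mse)
print(mse, mse_manual, rmse)
```

Here the MSE is 7,500,000 while the RMSE is about 2739, on the same scale as the salaries themselves.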
So, keep watching and remain active.