In the last unit, we saw what simple linear regression is.
But many times it so happens that we don't have only one important variable to build a good model or to make accurate predictions; in that case, we use multiple linear regression.
So, simply put, when is multiple linear regression used? When we have one dependent variable, which means when we have to predict one y value based on two or more independent variables.
Based on different parameters and different values of x, we have several predictors and we have to predict one value.
At that time, we use multiple linear regression.
Also, if we want to find how strong the relationship between the variables is, even then we will use multiple linear regression.
One very important point is this: if we wish to find y's value and we have been given different x values, then in what way do we use multiple linear regression? We will cover all of this in this unit.
Now, we have seen the simple linear regression formula, which was very easy: y = beta_0 + beta_1 x + error.
Here, since other parameters also come in, that is, other different x values, my formula gets extended: y = beta_0 + beta_1 x_1 + beta_2 x_2 + ... + beta_n x_n + e, where x_1 is my first independent variable.
In the same way we have x_2, x_3, and so on.
We can have n number of variables, and for them we can form one linear regression equation.
So, what is beta naught here? Beta naught is simply my y-intercept, which means when all my x parameters' values become zero, whatever y's value is, that is called beta naught.
Beta_1 is simply the regression coefficient of the first independent variable, which is x_1, and beta_n is the regression coefficient of the last independent variable, x_n.
E is called my model error.
Just as we understood it properly with an example in the simple linear regression case, in the same way we will also see in this module how multiple linear regression works.
Let's say there is a trainer and he wants to see the effect of different training sessions on his players' performance, like yoga sessions and weight training sessions; that is, he wants to see how many points the players will score if they do different trainings.
Here, points scored is the value that we have to find.
That will be our response, which means my dependent variable y will be points, and yoga sessions and weightlifting sessions will define the x1 and x2 values.
So, my multiple linear regression equation becomes: points scored = beta_0 + beta_1 multiplied by yoga sessions (my x1), plus beta_2 multiplied by weightlifting sessions (my x2).
So, in this way, by simply using the multiple linear regression equation, we can put our problem into the formula.
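To make this concrete, here is a small sketch of fitting that trainer equation with numpy. The session counts and points below are made-up numbers purely for illustration, not real data from the course.

```python
import numpy as np

# Hypothetical training data: weekly yoga sessions (x1),
# weekly weightlifting sessions (x2), and points scored (y).
yoga = np.array([1, 2, 2, 3, 4, 4, 5, 6])
weights = np.array([2, 1, 3, 2, 3, 5, 4, 5])
points = np.array([12, 13, 17, 17, 21, 26, 25, 29])

# Design matrix with a column of ones for the intercept (beta_0).
X = np.column_stack([np.ones_like(yoga), yoga, weights])

# Ordinary least squares fit: solves min ||X b - y||^2.
beta, *_ = np.linalg.lstsq(X, points, rcond=None)
b0, b1, b2 = beta
print(f"points = {b0:.2f} + {b1:.2f}*yoga + {b2:.2f}*weights")
```

The fitted b1 and b2 are exactly the coefficients the trainer would inspect to decide which training matters more.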
Now, let's understand what values like beta naught, beta one, and beta two define, or what their significance is. My coefficient beta naught represents the expected points scored by a player when he attends zero yoga sessions and zero weightlifting sessions; which means, if both my x values become zero, then whatever the points scored are, that is defined by beta naught.
What is beta one? Beta one represents what happens when my weekly yoga sessions increase by one. Let's say a player does one more yoga session; then what will be the effect on points scored, keeping in mind that his weightlifting sessions have not changed, which means weightlifting sessions, or x2, are held constant.
Then whatever effect x1 has on points scored, that is described by my beta 1.
In the same way, beta 2 corresponds to my weightlifting sessions.
It is the coefficient from which we get to know the average change in points when the player does one more weightlifting session without changing his yoga sessions.
Which means when my yoga sessions are constant and the weightlifting sessions' value increases by one, whatever the change in my points is, that is defined by beta 2.
So, in this way, depending on the values of beta 1 and beta 2, the trainer can conclude which training the players should do more and which training they should do less, so that their points can be increased.
So, in this way you saw how we used two different x values in our multiple linear regression, and how widely used this concept is; it is very important for you to know this.
So, we have seen simple linear regression and multiple linear regression.
Now there are a few things which are similar in both and a few things that are different.
Let's see those.
So, simply, if we were talking about simple linear regression, then we had two variables x and y, and my data was in a 2D plane where we would simply draw a line and find the best fit line.
In multiple linear regression, instead of a line I get a hyperplane; a hyperplane is a plane that can be fit in three or more dimensions.
Coefficients are still found by the same method, which is minimising the sum of squared errors, using the ordinary least squares criterion.
The assumptions that we had learned in simple linear regression are also valid for multiple linear regression: the error terms have zero mean, are independent, and are normally distributed with constant variance.
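The ordinary least squares criterion mentioned above also has a closed-form solution, beta = (X'X)^(-1) X'y. A minimal sketch, on made-up data, showing that the closed form and numpy's least squares routine agree:

```python
import numpy as np

# Small made-up dataset: intercept plus two predictors, one response.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=20)

# Closed-form OLS: solve (X'X) beta = X'y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The numerically stable way to compute the same fit.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)  # close to the true coefficients [1.0, 2.0, -0.5]
```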
But there are a few concepts that are different from simple linear regression when we move towards multiple linear regression.
The concepts that I'm going to tell you here, you will learn in detail in the machine learning module.
Here I just want to give you a basic overview of how your model gets affected if you use multiple linear regression in place of simple linear regression.
All four things that we are seeing here are very important, and you will learn in machine learning, in extreme detail, how all four of them play out in multiple linear regression.
So, the first thing is overfitting.
Overfitting arises as soon as you add variables in multiple linear regression; in simple linear regression I had only one variable, but in multiple linear regression I have more than one.
Let's assume that I have used 10 variables in my model to predict my one variable, which means my model becomes extremely complex.
Because of this, with the data you feed to your model, the model essentially memorizes all the data points.
It remembers those things easily, but when you throw an unknown value at it, that is, when you give the model a new value to predict, the result will not be accurate.
Why? Because for the values it has already seen, it already knew the result, but when you put in some unknown value, the model will not be able to give exact or accurate results.
We call this concept overfitting.
This is a very widely used concept; it comes up a lot in machine learning.
I recommend that you learn this topic well in machine learning.
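A rough sketch of this idea, with invented data: y truly depends on only one variable, but we fit a model with 10 extra noise columns. The fit looks excellent on the training points it has memorized and much worse on fresh test points.

```python
import numpy as np

rng = np.random.default_rng(1)

def r_squared(y, yhat):
    # Standard R^2: 1 - SS_residual / SS_total.
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# y truly depends on ONE variable; the other 10 columns are pure noise.
n_train, n_test = 15, 200
x_train = rng.normal(size=(n_train, 11))
x_test = rng.normal(size=(n_test, 11))
y_train = 3 * x_train[:, 0] + rng.normal(size=n_train)
y_test = 3 * x_test[:, 0] + rng.normal(size=n_test)

# Fit with all 11 predictors plus intercept: 12 parameters for 15 points.
X_train = np.column_stack([np.ones(n_train), x_train])
X_test = np.column_stack([np.ones(n_test), x_test])
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

r2_train = r_squared(y_train, X_train @ beta)
r2_test = r_squared(y_test, X_test @ beta)
print(f"train R^2 = {r2_train:.3f}, test R^2 = {r2_test:.3f}")
```

The training R^2 is near perfect while the test R^2 drops: the model has memorized the training data rather than learned the real pattern.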
The second thing that comes up is multicollinearity, which appears when we move from simple linear regression to multiple linear regression.
Multi means many and collinearity means relationship.
This means that among the independent variables we use, that is, the different x values, it is possible that some of them are related to each other.
For example, suppose I have to predict whether it will rain tomorrow or not.
I have different variables for it, like temperature, pressure, and humidity: humidity at 4am, humidity at 6am, humidity at 9am.
If you look at these variables, I have three variables for humidity.
It is possible that all three carry nearly the same value, or that they have a strong correlation with each other.
What happens then is that in your model, or in your equations, these three together exert a lot of influence, and the model latches on to these values.
With this, our model's predictions get affected; this concept comes up heavily in multiple linear regression and in machine learning.
So, to solve it, what we do is drop or delete the redundant variables, the repeated ones that carry the same information.
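One simple way to spot such redundant variables is to look at the pairwise correlations between the predictors. A sketch with invented weather-style data, where the 0.9 threshold is just an illustrative cutoff:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical weather data: humidity at 4am, 6am, and 9am is nearly
# the same reading three times, while temperature is unrelated.
humidity_4am = rng.uniform(40, 90, size=100)
humidity_6am = humidity_4am + rng.normal(scale=1.0, size=100)
humidity_9am = humidity_4am + rng.normal(scale=1.5, size=100)
temperature = rng.uniform(10, 35, size=100)

X = np.column_stack([humidity_4am, humidity_6am, humidity_9am, temperature])
names = ["humidity_4am", "humidity_6am", "humidity_9am", "temperature"]

# Pairwise correlation matrix of the predictors (columns as variables).
corr = np.corrcoef(X, rowvar=False)

# Flag pairs whose |correlation| exceeds an illustrative 0.9 threshold.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) > 0.9:
            print(f"{names[i]} and {names[j]} are highly correlated "
                  f"(r = {corr[i, j]:.2f}) -> consider dropping one")
```

The three humidity columns show up as near-duplicates, so we would keep just one of them.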
The third important thing that creates a difference between simple and multiple linear regression is feature selection, as its name suggests.
We have so many different variables, but it is not necessary that all of them are useful for predicting y.
In that case, we have to remove the redundant variables and select the important ones.
We call this concept feature selection.
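As a taste of the idea, here is one very crude "filter" style of feature selection on invented data: score each variable by its absolute correlation with y and keep only the ones above a threshold. Real feature selection methods are more sophisticated; this is only an illustration, and the 0.3 cutoff is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: y depends on columns 0 and 2; columns 1 and 3 are noise.
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

# Score each variable by |correlation with y|; keep those above a threshold.
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
selected = [j for j, s in enumerate(scores) if s > 0.3]
print("correlation scores:", np.round(scores, 2))
print("selected variables:", selected)
```

With this data, the two truly relevant columns get high scores and the noise columns get scores near zero, so only the useful variables survive.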
Next, and finally, an important thing: in simple linear regression we calculate R squared if we want to find the best fit model, but in multiple linear regression we use the adjusted R squared concept.
Adjusted R squared addresses the problem that when we start adding variables to our model, our R squared will constantly increase; because of this, we won't understand which variable is actually making the difference in our model's predictions. So we modify our R squared formula a little.
We read the modified formula this way: adjusted R squared = 1 - (1 - R squared) * (n - 1) / (n - p - 1).
Here, R squared is simply the R squared value that we have already found.
n is my sample size, or the number of rows in your data set.
p is my number of variables; that is, however many x's you have used. If you have used five variables, then your p value becomes five.
In this way you put your values into the adjusted R squared formula, and you get the corrected R squared value.
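That formula is easy to compute directly. A small sketch, with made-up numbers for R squared, n, and p:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Example with made-up numbers: R^2 = 0.9, 50 rows, 5 predictors.
print(round(adjusted_r2(0.9, 50, 5), 4))  # 0.8886
```

Notice that for the same R squared, adding more variables (a larger p) makes the adjusted value smaller; that penalty is exactly what stops R squared from rising for free as variables are added.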
So, we have covered a lot of concepts: overfitting, multicollinearity, feature selection, and adjusted R squared.
All these concepts are important, and they form a base for learning machine learning.
In fact, if you want to learn machine learning, the first algorithm to learn is linear regression.
I hope you have understood all these concepts; you will study them in detail in the machine learning module.
This brings us to the end of the last module of our course, which is fundamentals of regression analysis.
In this module we learned a lot of important things, like what regression analysis is, when we use simple linear regression, and when we use multiple linear regression.
We learned about all these topics in great detail.
In simple linear regression we saw how, by using a dependent and an independent variable, we find the best fit line, and then we covered the assumptions of simple linear regression.
From an interview point of view, this is a very important question, so make sure you remember what the assumptions of simple linear regression are. Then we covered when multiple linear regression is used.
When I have more than one independent variable and I have to predict one dependent variable in my regression analysis, at that time we use multiple linear regression.
We saw through the formula how the intercept and slopes play their important roles, and in this way we learned a lot of things in the last module of our course. With this, I wish you all the best.
If you have any comments or questions related to this course, you can click on the discussion button below this video and post them there.
In this way, you can connect with other learners like you and discuss with them.