Come, let’s start our sixth module, in which we'll discuss the fundamentals of linear regression in detail.

So, first of all, what is regression analysis? Regression analysis is a tool with which we investigate how two or more variables are related to each other.

Let's say I have a graph with the size of a house on the x-axis and its price on the y-axis.

On the x-axis we have different house sizes; for a small house, you might have to pay a price of $70,000.

And as you go further along, as the size of the house increases, the price you pay increases accordingly.

So, if I want to relate how the size of my house affects its price, in that situation we use linear regression.

Let's take one more example: suppose we want to know whether there is a relationship between a person's height and weight.

In all such situations, what we normally do is collect our data points; that is, we record the relationship in the form of data points.

Then we plot those data points on a graph, which we call a scatter plot, and we analyse how the curve relates the variables: that is, how my x variable is related to my y variable.

So, the x variable that we have here is called the predictor variable.

This is a general term; it is widely used in statistics for data science.

Regression analysis is itself one of the most important algorithms in machine learning.

So, as we were saying, we call the x-axis variable a predictor variable; it is also called an explanatory variable.

That is, it is a variable that helps us predict: if I know the size of the house, then we can predict what the value of Y, the price, will be.

One more important thing: X is mostly called an independent variable. Why? Because it does not depend on anything else; it is itself what gives us the result.

Y, on the other hand, is called a response variable, because if x is predicting something, then Y gives me its response.

Y is also commonly called the variable of interest. Why, and how is it found? We will see that further on.

We also call it the dependent variable. In this particular example, the size of the house is my predictor variable (independent variable), and the price is my response variable (dependent variable).

Now, based on the number of independent variables, regression is of two types.

In the current example, I had only one predictor variable: from the size of the house alone, I was predicting what the price of the house would be.

We call that scenario simple linear regression: a regression where I have only one predictor variable, one independent variable, with which we calculate our response variable. But in some scenarios there is not just one predictor variable; we have multiple predictor variables. We call that technique multiple linear regression.

So, in this module, we will first cover the basics of simple linear regression.

Simple linear regression is a regression technique in which there is one dependent and one independent variable. The dependent variable, that is, the value of y (here, the price of the house), must be in numerical form.

When and why do we use simple linear regression? Let's see.

If I have to find a relationship between two quantitative variables, that is, one dependent and one independent variable where both are numerical quantities, then we use simple linear regression.

We also use it to find how strong the relationship between those two variables is: whether they are tightly correlated or loosely correlated.

Third, we use it if we have to find the value of the dependent variable, that is, Y's value, based on given values of x.

For example, if I already have the size of the house, which is x, and I am asked what the price of the house would be, then since we have a linear relationship, we can predict that value from our curve or plot.

Simple linear regression explains the relation between the dependent and independent variables through a straight line.

So, first of all we will revise our straight-line concepts.

The equation of a straight line, which we learned in our primary maths classes, is always y = mx + b.

Here y is the variable whose values lie on the y-axis, and x is the value that lies on the x-axis.

We generally call b the intercept: on the curve, it is the value of y when x is 0.

m is a very important term, which will form the basis of linear regression. We call it the slope.

So how will you calculate the values of m and b? If the x and y values are given, then finding b is very easy: wherever my line cuts the y-axis, that value of y is my b.

m, the slope, tells us: if I make any change in x, what will be the corresponding change in y?

As I said earlier, the slope plays a very important role in describing any linear relationship.
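As a quick sketch of these straight-line ideas, here is a small Python example; the slope and intercept values are made up purely for illustration:

```python
# A minimal sketch of the straight-line equation y = m*x + b.
# The values of m and b below are illustrative, not from the lecture.

def straight_line(x, m, b):
    """Return y for a given x, slope m, and intercept b."""
    return m * x + b

m = 2.0  # slope: y changes by 2 for every 1-unit change in x
b = 3.0  # intercept: the value of y when x = 0

print(straight_line(0, m, b))  # 3.0 (the intercept, since x = 0)
print(straight_line(1, m, b))  # 5.0
print(straight_line(2, m, b))  # 7.0
```

Note how the printed values increase by m for each unit step in x; that is exactly what the slope describes.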

Suppose you are riding a cycle and you have been asked to ride it up a mountain. In that case my slope will be positive: you are taking the cycle in the upward direction.

So, a positive slope means there is a positive linear relationship: if one value is increasing, then the second value is also increasing. That is, as the mountain's height increases steeply, you are also moving upward in the same way.

If my slope is negative, we can understand from this that you are coming down from the hill, which forms a negative linear relationship: as one value increases, the other value decreases.

If my slope is 0, we can think of a normal situation in which we are walking on a road: an extremely flat surface, so the slope is said to be zero. Slope equal to zero means that if one value increases, the other value remains constant.

Now that we have revised the straight-line concepts, let's see how the simple linear regression formula is derived from the straight line.

This is a very generic form; you will find this equation everywhere in this form. Even when you learn linear regression in machine learning, this equation is very widely used.

The equation of linear regression is Y = β0 + β1X + ε, that is, beta naught plus beta one X plus the error term epsilon.

Y is my response variable, or dependent variable.

β0 is my intercept: what we were calling b earlier is called β0 here. That is, if my x value is zero, then what will be the value of y?

β1 is called the regression coefficient here, or it is simply our slope. The significance of β1 is that as I increase the value of X, what change do we expect in Y? That is defined by β1.

ε here is my error of the estimate: for any equation that you fit, it is not possible that we will get no error at all.

Like we see in this particular curve, with an x-axis and a y-axis. Corresponding to them, you can see a few dots, which are my data points: for each particular x, one particular value of y is drawn. The line you can see in yellow is my linear regression line.

When we see a linear relationship between any two variables, it is very rare that our data points fall exactly on that line. For that reason, some error or other is left in our prediction.

Let's say we take the value xi on the x-axis; the blue dot corresponding to it is the observed value of y. But on the straight line there is a different point corresponding to the same xi, which we call the predicted value of y for that value of xi. The difference between my predicted value and my observed value is called the error, or the random error.

In the same curve, β0 is seen on the left side, in the form of the intercept. And the slope β1 is this: for my straight line, if we take a change in x and the corresponding change in y, that ratio is the slope.
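To make the observed-versus-predicted distinction concrete, here is a small Python sketch; the intercept, slope, and data values are assumed purely for illustration:

```python
# Hypothetical coefficients for the fitted line y = beta0 + beta1 * x.
beta0, beta1 = 1.0, 0.5

x_i = 4.0
y_observed = 3.5                    # the blue dot: the actual data point
y_predicted = beta0 + beta1 * x_i   # the point on the straight line
error = y_observed - y_predicted    # the random error for this point

print(y_predicted)  # 3.0
print(error)        # 0.5
```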

Now let's see an example of how we use this regression formula.

Businesses generally use linear regression when they want to understand: compared to the money I have invested in ads to grow my business, how much revenue do I get? That is, suppose I have a business and I have invested in an ad. So that information about the business spreads among the maximum population, companies invest money in ads that run on TV or the internet.

That money will be called my advertising spending, and whatever growth comes to the business through that ad will be called my revenue.

So, if we want to find the relationship between ads and revenue, that basically forms my linear relationship: the more money you invest, the more revenue you will get.

So, let's suppose the advertising spending is your independent variable; on that basis, you want to predict how much revenue you will get. In this case, the revenue becomes my dependent variable.

The linear regression equation that is formed will be: revenue = β0 + β1 × ad spending.

So, in this case, this is my linear regression equation. Now let's see how the different terms play their roles.

The β0 coefficient represents this: if I keep x's value at zero, that is, if I have not invested any money in ads, then what will be my expected revenue? Normally, the value of y when x equals zero is called my β0.

What is the β1 coefficient? It is the average change in total revenue when I increase my ad spending by 1 unit, say by $1 or by 1000 rupees; you can choose the units of ad spending as you wish.

So, let's say I have increased my ad spending by 1 unit. The total average change that we then see in the revenue is explained to us by β1.

Now, since β1 is my slope, it can be negative, positive, or even zero.

If β1 is negative, it means you have spent a lot of money on ads but your revenue has decreased. If β1 is zero, it means the money you spent on ads had no effect on the revenue. If β1 is positive, it means that the money you have invested in ads is increasing your revenue correspondingly.

So, depending on β1's value, any business can decide whether to increase or reduce its ad spending, so that the business stays profitable.
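As a sketch of how such a fit could be computed, here is ordinary least squares written out by hand in Python; the spending and revenue numbers are invented purely for illustration:

```python
# Fit revenue = beta0 + beta1 * ad_spending by ordinary least squares.
# The data below is made up; a real business would use its own records.

def ols_fit(x, y):
    """Return (beta0, beta1) minimizing the residual sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

ad_spending = [1, 2, 3, 4, 5]   # e.g. in lakhs
revenue = [2, 4, 6, 8, 10]      # e.g. in crores (perfectly linear here)

beta0, beta1 = ols_fit(ad_spending, revenue)
print(beta0, beta1)  # 0.0 2.0: each extra unit of ad spend adds 2 of revenue
```

With a positive β1 like this, the business would read the fit as: more ad spending, more revenue.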

So, in this way, we use linear regression on different examples, and in different ways we see how we can increase the profit of a business.

Linear regression will always form a line around the given data points, and that line proves to be the best fit line.

Like in this same example.

If on the x axis I've got my ad spending in lakhs and on the y axis I've got my revenue in crores.

I have a certain data point which is fulfilling the X’s equation, that is, y is equal to beta knot plus beta one. So, in this situation, this particular line, which is there, that is fitting all the data points, that should be my best fit line.

But before understanding the best fit line, let's understand what residuals are.

Residuals are basically the errors between the predicted value and the observed value.

Like we saw in a particular example just now, or as we will see in this example: let's say my x's value is 2, and the corresponding actual, or observed, value of y is 7. But if we draw the best fit line that linear regression gives us and find the corresponding value of y when x is 2, that value comes out as 4.

So, simply, 7 minus 4 gives my residual error for x = 2. This vertical distance is called my residual, or the error.

So, the main goal of whichever best fit line linear regression produces is to keep the error as low as possible: the lower the error, the better your model and the better the best fit line you get, so that your linear relationship is easily defined by it.
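The residual arithmetic from this example can be written out directly; the observed and predicted values are the ones used above:

```python
def residual(y_observed, y_predicted):
    """Observed minus predicted: the vertical distance to the fitted line."""
    return y_observed - y_predicted

# For x = 2 the observed y is 7 and the best fit line predicts 4.
print(residual(7, 4))  # 3
```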

Now the next question comes: how do we find the best fit line? To find the best fit line, we use a method called ordinary least squares, or OLS.

So, what do we do in this method? Let's say y = β0 + β1x; then, corresponding to each x, our error is ei = yi − y_pred.

Here i changes for different data points: if we are talking about the first data point, then ei becomes e1 and yi becomes y1.

So, now I have an error, and I have y's predicted value.

Now, what does the ordinary least squares method simply do? It squares all the errors and sums them over every data point: if I take the squares e1², e2², up to en², and sum them up, then that particular value is what we calculate in ordinary least squares, and we call the entire term RSS, the residual sum of squares. Why? Because it is a sum of my residuals, and we have squared them.

The main intention of RSS is to reduce the error. If into e1², e2², and so on I substitute values like (y1 − y_pred)², then I get one equation where yi is my data-point value, which is already given to me.

How will you calculate y_pred? Normally, if you substitute your value into β0 + β1xi, that becomes your y_pred value. So we substitute y_pred = β0 + β1xi, and put that entire expression for e into the RSS formula.

When we substitute in this way, we get the formula of RSS: the sum from i = 1 to n of (yi − β0 − β1xi)², where β0 + β1xi is my y_pred.
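This RSS formula translates directly into code; the data and coefficient values below are illustrative only:

```python
# RSS = sum over i of (y_i - beta0 - beta1 * x_i)^2.

def rss(x, y, beta0, beta1):
    """Residual sum of squares for the line y = beta0 + beta1 * x."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3]
y = [2, 4, 7]           # the third point lies 1 unit above the line
print(rss(x, y, 0, 2))  # 1: only the third point contributes an error
```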

So, in this way we found the RSS, to understand how we can reduce the error in the data.

But RSS is an absolute quantity: if I change the scale of my y, let's say from dollars to rupees, then that will change my whole data and the equation of the line. So, we particularly want a value that is a relative quantity, one that is not affected by the scale.

What happens in this situation? We use a different metric, which we call R square.

So, we have now seen RSS. Before we get to the R square concept: there are a few critical questions that RSS does not normally answer, such as how well our best fit line represents our scatter plot, that is, how strong the relationship between my two variables is. If I want to find that, then R square comes into the picture.

So, first of all, let's see how we can find how strong the relationship between the two variables is. To find that strength, we use R square.

R square is also simply called the coefficient of determination. R square's simple formula is 1 minus RSS upon TSS.

We have seen what RSS is; now let's simply see what TSS, the total sum of squares, is.

Let's assume for a second that we have no idea about linear regression: we don't know how to draw a best fit line or how to represent our data points. The easiest thing we can do, once we have plotted the data points, is to pass a line through the mean of all the points, that is, a horizontal line at y bar.

This can obviously be the worst possible approximation for us.

Why? Because all our points cannot lie on that line, except perhaps a very few. But since we have to measure deviations to judge the best fit line, we need a comparison value, a worst-case approximation. That is why we use TSS.

What does TSS simply do? It gives the deviation of all the points from the mean line. The way RSS gives the deviation of the predicted values from the observed points, in the same way, if we find the deviation of all the data points from the mean line, that gives me the total sum of squares.

So, simply: RSS, the residual sum of squares, is the sum of the squares of the residuals for each data point; it is a measure of the difference between the expected and the actual output. If we take y minus y_pred for each point, square it, and sum, that is my RSS. If my RSS value is small, it indicates that there is very little error around our best fit line.

We saw RSS's formula above. My TSS is the sum of the squared deviations of the data points from the mean of the response variable: if we take the whole square of (yi − y bar) and sum it, that gives me TSS.

So, by finding the RSS and TSS values and putting them into the R square formula, we can tell how properly my model, that is, my best fit line, is explaining my regression.

So, simply, what does R square tell us? How well my best fit line fits.
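Putting the two pieces together, R square can be computed straight from its definition; the data points and coefficients below are assumed purely for illustration:

```python
# R^2 = 1 - RSS / TSS, using the line y = beta0 + beta1 * x.
# The data values are invented for illustration.

def r_squared(x, y, beta0, beta1):
    """Coefficient of determination for the given line and data."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss

x = [1, 2, 3, 4]
y = [2, 4, 7, 7]
print(r_squared(x, y, 0, 2))  # about 0.89: a fairly strong linear fit
```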

To understand that, note that R square's value lies between zero and one. This is a very important point. What significance do the different values of R square carry? Let's see.

Let's say my R square value is one: the extreme value, which is considered your best value. If your model's R square is 1, it means that all your data points lie on the best fit line itself, which means we see a very strong relationship between x and y.

Let's see.

Now we reduce the R square value a little, to R square = 0.70, which means somewhere around 70% of the variation between the two variables is explained. In this case you can see on the line that very few data points lie exactly on it, and the other data points lie around it.

If R square's value is reduced further, say to 0.36, then around 36% of the variance, that is, the variation in my values, is explained. If you look at the curve, the data points appear quite scattered, but the line with the least overall error is shown as the red line.

If R square equals 0.05, just 5% of the relationship between the two variables is captured, which is why the data points are spread all over; we are not able to see a very good relationship.

So, in this topic we have covered how linear regression, specifically simple linear regression, is done; how we can find the RSS if we wish to measure and reduce the error; and how R square is found if we want to measure the strength of the relationship.

If you have any comments or questions related to this course, you can click on the discussion button below this video and post them there. In this way, you can connect with other learners like you and discuss with them.
