Hello,
I am (name) from LearnVern,
In our previous Machine Learning session, we looked at Evaluating Classification Models Performance.
There we saw the different methods available for evaluating the performance or accuracy of the classification algorithm that we are using.
Now, we are moving ahead to learn about Regression.
Regression means predicting a continuous value. Suppose I go running: I ran 10 km today, 12 km the next day, and 15 km the day after that.
So, if I keep up this pace, how much will I be able to cover the next day?
Going by the trend from 10 to 12 to 15, the next day I could reach about 17 or 18 km.
These values, 10, 12, 15, 17, 18, are continuous values.
So, in situations where we have to predict continuous values, we use a regression method.
Now, we are going to learn about simple linear regression.
So, let's understand this concept.
So, in simple linear regression we have a dependent variable and an independent variable.
Here, dependent means the variable that needs to be predicted.
For example, if there is a flat in XYZ Location, what will be the price of that particular flat?
And if the same flat is located in some other location, ABC, then the price of that flat will be different.
Here, price is the dependent variable, because price is what needs to be predicted.
And our independent variable is location, because we know that the price varies with location.
So, in this way we have one dependent variable and one independent variable.
That is simple linear regression: one dependent and one independent variable, with a linear relationship between them.
Linear basically means that as one variable changes, the other changes at a steady rate, so the points follow a straight line.
So, let's take one more example, to understand it better.
For example, a person is running, and his speed is 2 km per hour.
At that speed, he might reach his destination in 2.5 hrs.
Now, if he increases his speed, the time taken to reach the destination will decrease, maybe to around 1 hr 50 min.
If he increases his speed even more, the time taken will decrease even further.
So, as the speed increases the time decreases, and over a small range of speeds this can be approximated by a straight line, which is what we call a linear relationship.
Or, as the speed increases, the distance covered in a given time increases.
So, in a linear relationship, if one variable increases then the other variable also increases or decreases along with it, at a steady rate.
So, this is known as a linear relationship.
So let's move ahead, and we will understand this with the help of a diagram.
So, this diagram has X variable units on one axis and Y variable units on the other.
And these orange blocks depict the data points.
Then we have drawn a line through these data points so that it fits them as closely as possible.
This line is known as the regression line.
The basic benefit of fitting this regression line is that, apart from the known data points, we can also predict for any new, unknown data points, because we can extend the line upward and downward.
So, if a new value comes in on the X axis, the point on the line at that X gives us the corresponding Y value, which is our predicted output.
This is the basic benefit of simple linear regression.
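To make this concrete, here is a minimal sketch in Python that fits a regression line to the running distances from earlier (10, 12 and 15 km) and extends it to the next day. Treating the day number as X, and the helper name fit_line, are assumptions made only for this illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


days = [1, 2, 3]             # known X values (day number, an assumption)
distance_km = [10, 12, 15]   # known Y values from the running example

intercept, slope = fit_line(days, distance_km)

# Extend the line to a new X value that was not in the data.
day_4_prediction = intercept + slope * 4
print(round(day_4_prediction, 2))  # 17.33
```

The predicted value of about 17.3 km agrees with the "17 or 18 km" estimated by eye from the trend.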
Now we will move ahead.
So here is a straight line drawn to fit the data.
We have tried to fit a line through the data points so that it captures their pattern.
In this case, the line does not follow the data points closely, and such a kind of fitting is known as underfitting.
The accuracy of an underfitted line would be very low, and it will give a lot of wrong predictions.
Next, we will see appropriate fitting.
Here, based on how the data points are arranged, if the regression line needs to take the form of a curve, then that is how we should draw it.
Such a regression line is an appropriate regression line, which fits the data points correctly.
And this is overfitting. In overfitting, the model gives correct outputs only for the inputs that are present here and is not able to predict well for any new inputs.
For new inputs it gives wrong outputs, essentially guesswork, because it has no experience of those cases.
So, we should avoid both overfitting and underfitting.
And we should choose appropriate fitting.
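The three cases can be compared with a small sketch. It assumes hypothetical data that follows a curve (Y = X squared) with a little noise added, and it uses polynomial degree as a stand-in for how flexible the fitted line is: degree 1 for underfitting, degree 2 for appropriate fitting, and degree 5 (a curve forced through all six points) for overfitting. The helper names are made up for this illustration.

```python
from fractions import Fraction


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Augmented normal equations [A | b] with A[i][j] = sum(x^(i+j)) and
    # b[i] = sum(y * x^i). Fractions keep the arithmetic exact.
    rows = [
        [Fraction(sum(x ** (i + j) for x in xs)) for j in range(n)]
        + [Fraction(sum(y * x ** i for x, y in zip(xs, ys)))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [value / pivot_value for value in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** power for power, c in enumerate(coeffs))


def mean_squared_error(coeffs, xs, ys):
    """Average squared gap between the fitted curve and the known points."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical known data: x**2 with alternating +1 / -1 noise.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 0, 5, 8, 17, 24]
# A new input that was not in the data, and its noise-free value.
new_x, true_y = 6, 36

results = {}
for name, degree in [("underfit", 1), ("appropriate", 2), ("overfit", 5)]:
    coeffs = fit_polynomial(xs, ys, degree)
    error_on_known = float(mean_squared_error(coeffs, xs, ys))
    error_on_new = float(abs(predict(coeffs, new_x) - true_y))
    results[name] = (error_on_known, error_on_new)
    print(f"{name:12s} known-data error {error_on_known:8.3f}   "
          f"new-point error {error_on_new:8.3f}")
```

In this sketch the overfitted curve has zero error on the known points but misses the new point by 63, while the appropriate fit misses it by less than 1.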
So, let's move ahead.
So, as we discussed, simple linear regression has only one independent variable.
Its formula is:
Y = B0 + B1*X1 + E
Here, Y is the dependent variable, or the target, whose value we have to find out.
X1 is the independent variable,
B0 is the intercept,
B1 is the coefficient of X1, and
E is the error.
So, this is the formula of simple linear regression.
We also know this formula in another form, Y = MX + C, where M is the slope (our B1) and C is the intercept (our B0).
So, using this formula we can compute the output.
And our output is a prediction.
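Here is a short sketch of how each term of the formula can be computed with the usual least-squares method. The X and Y values are hypothetical and chosen only so that the numbers come out simply.

```python
xs = [1, 2, 3, 4, 5]   # hypothetical X1 values
ys = [2, 4, 5, 4, 5]   # hypothetical observed Y values

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# B1: the coefficient of X1 (the slope, M in Y = MX + C).
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
# B0: the intercept (C in Y = MX + C).
b0 = mean_y - b1 * mean_x

# The prediction is the part of the formula without E.
predictions = [b0 + b1 * x for x in xs]
# E: what is left over between each observed Y and its prediction.
errors = [y - p for y, p in zip(ys, predictions)]

print(f"B0 = {b0:.2f}, B1 = {b1:.2f}")  # B0 = 2.20, B1 = 0.60
for x, y, p, e in zip(xs, ys, predictions, errors):
    print(f"X1 = {x}: Y = {y}, prediction = {p:.2f}, E = {e:+.2f}")
```

For every known point, B0 + B1*X1 + E adds back up to the observed Y, and with this method the errors sum to zero.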
Now, this linear relationship can be positive.
Here, in the diagram you can see that as X increases, Y also increases.
So, this is a positive linear relationship, a growth-oriented relationship.
Next is the negative linear relationship.
Our example of running faster to reduce the time taken is a negative relationship.
There, X, the speed, is increasing, but Y, the time taken, decreases.
So, this is a negative relationship, where X increases but Y decreases.
Whereas in a positive relationship, both increase.
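The direction of the relationship shows up as the sign of the slope of the fitted line. The sketch below uses hypothetical numbers: made-up hours and marks for the positive case, and for the negative case the time needed to cover an assumed 5 km route at different speeds (2 km per hour giving 2.5 hrs, as in the running example). Time against speed is really a curve, so here the line's slope is used only to show the direction.

```python
def slope(xs, ys):
    """Slope of the least-squares line through the points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )


def direction(value):
    """Name the kind of relationship from the sign of the slope."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "none"


# Positive case: hypothetical hours studied (X) and marks scored (Y).
hours = [1, 2, 3, 4]
marks = [40, 50, 60, 70]

# Negative case: speed in km per hour (X) and hours taken for 5 km (Y).
speeds = [2.0, 3.0, 4.0, 5.0]
times_h = [5.0 / s for s in speeds]

print(direction(slope(hours, marks)))     # positive
print(direction(slope(speeds, times_h)))  # negative
```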
Now, we will look at applications.
The first example is predicting the marks scored by students based on the number of hours they have studied.
Here, marks scored is the dependent variable and the number of hours studied is the independent variable.
Marks scored means the number of marks a student is going to achieve based on the number of hours he has studied.
So, we can use simple linear regression for such applications.
The second example is predicting the salary of a person, where salary is the dependent variable and experience is the independent variable.
So, in this way, based on experience, the salary of a person can be estimated.
So, this is also an example of simple linear regression.
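As a sketch of how such an application might look in code, the marks example can be wrapped in a small class that is first fitted on known students and then asked about a new one. The class name, its fit and predict methods, and all the numbers are hypothetical. The same pattern would apply to salary and experience.

```python
class SimpleLinearRegression:
    """One independent variable, one dependent variable (illustrative only)."""

    def fit(self, xs, ys):
        """Learn the intercept and slope from known (x, y) pairs."""
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        self.slope = sum(
            (x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)
        ) / sum((x - mean_x) ** 2 for x in xs)
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, x):
        """Read the Y value off the fitted line for a given x."""
        return self.intercept + self.slope * x


hours_studied = [1, 2, 3, 4, 5]      # independent variable (made-up data)
marks_scored = [35, 45, 50, 62, 68]  # dependent variable (made-up data)

model = SimpleLinearRegression().fit(hours_studied, marks_scored)
predicted_marks = model.predict(6)   # a student who studies 6 hours
print(round(predicted_marks, 1))     # 76.9
```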
If you have any queries or comments, click the discussion button below the video and post there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to solve your query.