Before we look at the different types of hypothesis testing, let us see what covariance and correlation are.
Suppose I have two variables, or two features, and I want to find the linear relationship between them: is there a relationship between the two features at all, and what is its direction? In other words, do the two variables increase together, or does one decrease as the other increases? For that we use covariance.
We have already covered variance.
Covariance extends that idea to two variables: it measures how they vary together.
So covariance gives us the direction of the linear relationship between two variables.
If we increase the value of one variable, that can have a positive or a negative effect on the value of the other.
Covariance can take any value between negative infinity and positive infinity.
Its sign tells us the direction in which the two variables are related.
There is a simple formula for this. Suppose I have two variables, one denoted by x and the other by y.
For each data point I subtract the mean of x from the x value and the mean of y from the y value, multiply the two differences, and then take the summation of those products. Said in words it sounds like a big thing, but it becomes very easy once we work through an example.
If I have population data, we divide that sum by n; if we are calculating the sample covariance, we divide it by n minus one.
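Written out, the two versions of the formula just described look roughly like this. The index i and the bar notation for the means are notation introduced here for illustration; the lecture itself only describes the formula in words.

```latex
\operatorname{cov}_{\text{population}}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\qquad
\operatorname{cov}_{\text{sample}}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```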
Now I will apply this formula to a data set and see what value the covariance gives.
Suppose I have two sample data sets, where the x values are 2.1, 2.5, 3.6 and 4.0, and their mean is 3.05, or about 3.1.
Y is another data set whose values are 8, 10, 12 and 14, and its mean is 11.
Now suppose I have been asked to find the covariance of these two.
What we do is subtract the mean of x from each x value, and subtract the mean of y from each y value.
We multiply the two differences, sum all of those products, and divide by n minus one, because this is sample data.
If I start putting in the values, 2.1 - 3.1 is my first difference, and I multiply it by y's first data point, 8, minus 11, which is y's mean.
In the same way I add the next term, which is 2.5 - 3.1 multiplied by 10 - 11.
I take the summation of all four products and divide it by 4 - 1, because I have 4 sample points.
After doing the calculation I get a covariance of 2.267.
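The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name sample_covariance is made up for illustration, and it uses plain Python so that each step of the formula stays visible.

```python
# Sample data from the worked example above.
x = [2.1, 2.5, 3.6, 4.0]
y = [8, 10, 12, 14]


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return total / (n - 1)


cov_xy = sample_covariance(x, y)
print(round(cov_xy, 3))  # 2.267
```

The code uses the unrounded mean of x, 3.05. Using the rounded 3.1 gives the same sum here, because the y differences add up to zero.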
The covariance value is positive, which means the two variables are positively related to each other.
So this tells us the direction of the relationship. If we look at it on a plot: when the covariance between X and Y is less than zero, the plot looks like the one on the left, sloping downward.
If there is no covariance between the two variables, the values are simply clustered together with no upward or downward trend, and the plot looks like the second one.
If my covariance is positive, X and Y increase together, as in the third graph.
So covariance is useful when we need to know the direction of the relationship between the two variables.
But what is the magnitude? That is, how strong is the relationship? Covariance doesn't tell us that.
For that we will see how we use correlation instead of covariance.
There is one more disadvantage of covariance: it depends on the units of the two variables, so if you change the units of either one even a little, the covariance value changes.
To overcome these two drawbacks, we use the correlation coefficient.
What is the correlation coefficient? It is a single number which lies between -1 and +1.
If you remember, covariance could be anywhere between minus infinity and plus infinity, which means any number can be my covariance, but the correlation coefficient always lies between -1 and +1.
It also overcomes the drawbacks of covariance.
That is, the correlation coefficient tells me the strength along with the direction of the relationship between the two variables: how strongly they are related to each other. So let's see the significance of three values of the correlation coefficient, 1, 0 and -1, and how we use them.
If my correlation coefficient is 1, the two variables have a perfect positive correlation.
What does this mean? If one variable changes, the other variable changes in the same direction.
If the correlation coefficient is zero, there is no linear correlation, which we call zero correlation.
Third, if my correlation coefficient is -1, it is a perfect negative correlation.
It means that if one variable changes, the other variable changes in the opposite direction.
In this way, from the value of the correlation coefficient we can tell how well the data is correlated. As you can see in the perfect positive correlation plot, x and y can take any values, but all the data points lie on the line.
What do we see with zero correlation? All the data points are scattered all over, which means there is no correlation between them.
Perfect negative correlation means my slope is negative.
That is, as x's value increases, the corresponding y value decreases.
So using the correlation coefficient, we can easily visualise how strong the relationship between two variables is.
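Here is a small Python sketch of these three cases. The data values and the names rising, falling and no_trend are made up for illustration, and the helper uses the standard definition of the correlation coefficient: the sample covariance divided by the product of the two sample standard deviations.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # every point lies on a line with positive slope
falling = [-3 * v + 5 for v in x]  # every point lies on a line with negative slope
no_trend = [4, 1, 0, 1, 4]         # no overall upward or downward trend

print(round(correlation(x, rising), 6))    # 1.0
print(round(correlation(x, falling), 6))   # -1.0
print(round(correlation(x, no_trend), 6))  # 0.0
```

The third data set is a U shape chosen so that the coefficient comes out exactly zero. It is a reminder that a coefficient of 0 only says there is no linear relationship.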
Now let us see, for linear correlation, what different values of the coefficient signify. The correlation coefficient tells us how closely the data fits a straight line.
If you look at the plots of perfect positive correlation and perfect negative correlation, all my data points lie on the line.
With high positive correlation and high negative correlation, my data points lie close around the line, which means the two variables are strongly related.
With low positive correlation and low negative correlation, you will see that the data points are farther away from the line, so we can look at the plot and say that the correlation is not that strong.
So, we have seen what covariance and correlation are.
Now, there is a particular relationship between them: how do we calculate correlation from covariance? We will see it with this simple formula.
If I divide the covariance of x and y by the product of x's standard deviation and y's standard deviation, the value I get is the correlation coefficient.
Here you will notice one thing: the covariance in the numerator and the product of the standard deviations in the denominator carry the same units, so the units cancel each other out.
So the correlation coefficient is dimensionless, which means it is a unitless coefficient.
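As a quick check, here is that formula applied to the same x and y values used in the covariance example. It is only a sketch in plain Python; the variable names are chosen for illustration, and the sample versions (n minus one) are used for both the covariance and the standard deviations.

```python
from math import sqrt

x = [2.1, 2.5, 3.6, 4.0]
y = [8, 10, 12, 14]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Numerator: sample covariance of x and y.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Denominator: product of the two sample standard deviations.
sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

r = cov_xy / (sd_x * sd_y)
print(round(r, 3))  # 0.979
```

A value close to +1 matches what we saw earlier: the covariance was positive, and now we also know the relationship is strong.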
So we have now seen how correlation overcomes the drawbacks of covariance.
Now let us see what the differences between the two are and where each is used.
Covariance tells us only the direction of the linear relationship between two variables.
Correlation tells us the strength of the relationship along with its direction.
Covariance lies between minus infinity and plus infinity and correlation lies between -1 and +1.
Covariance is affected a lot by a change of scale: if I convert X and Y from kilograms to grams, the final covariance value changes and increases, but correlation is not affected by the scale.
That's why correlation is a dimensionless quantity, a coefficient which is unit free.
Covariance carries units: the product of the units of the two variables.
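The effect of changing the scale can be seen in a short Python sketch. Treating the earlier x and y values as kilograms is purely an assumption made for illustration, and the helper names sample_cov and corr are made up.

```python
from math import sqrt


def sample_cov(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)


def corr(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


x_kg = [2.1, 2.5, 3.6, 4.0]  # assumed to be kilograms, for illustration only
y_kg = [8, 10, 12, 14]
x_g = [v * 1000 for v in x_kg]  # the same measurements in grams
y_g = [v * 1000 for v in y_kg]

print(sample_cov(x_kg, y_kg), sample_cov(x_g, y_g))  # covariance changes with the units
print(corr(x_kg, y_kg), corr(x_g, y_g))              # correlation stays the same
```

Because both variables are multiplied by 1,000, the covariance grows by a factor of 1,000,000, while the correlation is unchanged apart from floating-point rounding.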
For independent variables, my covariance and correlation are both zero.
Why? Because those two variables don't have any relationship with each other.
So it makes sense that the covariance and the correlation coefficient are both zero.
So now we have seen what covariance and correlation are.
If you have any comments or questions related to this course, you can click on the discussion button below this video and post them there.
In this way, you can connect with other learners like you and discuss with them.