You have probably heard the term variance many times since your school days, usually alongside standard deviation.
So today we will see what exactly it means and how it is used in statistics.
Variance is a statistical measurement that tells us about the degree of spread.
What does the degree of spread mean? It means how spread out the data points are across our entire distribution.
How is it calculated? It is calculated by taking the average of the squared deviations from the mean.
First comes the mean, which we already know how to calculate.
Then we take the average of the squared deviations from that mean.
How do we do it? We will go through it step by step.
A high variance indicates that the data points are widely spread.
A small variance indicates that the data points are closer to the mean of the data set.
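To make the idea of spread concrete, here is a small sketch in Python. The two lists are made-up values (not from the lecture) that share the same mean of 50 but differ in how far the points sit from it, and `pvariance` from the standard library's `statistics` module is used only as a convenient way to compute the variance.

```python
from statistics import pvariance

# Hypothetical data: both lists have a mean of 50.
tight = [48, 49, 50, 51, 52]   # points close to the mean
wide = [10, 30, 50, 70, 90]    # points far from the mean

var_tight = pvariance(tight)
var_wide = pvariance(wide)

# The widely spread list gives the much larger variance.
print("tight:", var_tight)
print("wide: ", var_wide)
```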
What is the formula for variance? It is the sum of (x - x̄)² over all the values, divided by n. So, how do we calculate all this? Let's work through it with one data set.
Suppose I have this particular data set. What is the first step? Let's find the mean.
In this data set we find the mean, which is x̄. We take the sum of all the values and divide it by 6, because the total number of data points that I have is 6.
My mean comes out as 50.
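The data set itself is shown on screen and is not reproduced in this transcript. As a rough sketch, the six values below are an assumption, picked only because they agree with the figures quoted in the lecture (six points, mean 50).

```python
# Hypothetical data set: the lecture's actual values are not in the transcript.
# These six numbers are assumed because they match the quoted mean of 50.
data = [46, 69, 32, 60, 52, 41]

n = len(data)            # total number of data points
mean = sum(data) / n     # x bar: sum of all values divided by n

print(n)     # 6
print(mean)  # 50.0
```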
The next step is to find each score's deviation from the mean.
This is just like what we did for mean absolute deviation.
But what did we do in that case? We took the deviations and converted them into absolute values.
Here we will not convert them into absolute values.
Whatever value comes out, positive or negative, I leave it as it is.
How do you take the deviation from the mean? Whatever your score is, whatever your value is, you subtract the mean from it.
Note down whatever value you get.
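The deviation step might look like this in Python. The data values are again an assumption (the lecture's data set is not in the transcript), chosen so that the first two deviations are the -4 and 19 mentioned in the lecture. The sketch also shows that the signed deviations add up to zero, which is why they need to be squared, or made absolute, before averaging.

```python
# Hypothetical data set, assumed to match the lecture's quoted figures.
data = [46, 69, 32, 60, 52, 41]
mean = sum(data) / len(data)

# Deviation = value minus mean, keeping the sign.
deviations = [x - mean for x in data]

print(deviations)       # [-4.0, 19.0, -18.0, 10.0, 2.0, -9.0]
print(sum(deviations))  # 0.0, positives and negatives cancel out
```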
My third step is to square every deviation. Say the first deviation that came out is -4.
The square of -4 is 16. My second value is 19, and its square is 361.
In the same way you square all the values.
The next step is to add up all the squares that we have calculated.
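Squaring and summing can be sketched as follows. Only the first two deviations (-4 and 19) are stated in the lecture; the remaining four are assumed values for illustration.

```python
# Deviations from the mean. Only -4 and 19 come from the lecture;
# the other four are hypothetical.
deviations = [-4, 19, -18, 10, 2, -9]

# Square every deviation, then add the squares together.
squared = [d ** 2 for d in deviations]
sum_of_squares = sum(squared)

print(squared)         # [16, 361, 324, 100, 4, 81]
print(sum_of_squares)  # 886
```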
In the final step, we divide that total by either n - 1 or n.
Now, what are n and n - 1? Here n is simply the number of values: we divide by n when the data covers the whole population, and by n - 1 when the data is a sample.
We will look at the reason for this further on.
In this particular scenario we are treating our six values as a sample, so we divide by n - 1, which is five, and our variance comes out to be 177.2.
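Putting all the steps together, a minimal end-to-end sketch could look like the one below. The six data values are an assumption, since the transcript does not list them; they were chosen so that the result matches the 177.2 quoted above. The call to `statistics.variance` is only a cross-check, as that function also divides by n - 1.

```python
from statistics import variance

# Hypothetical data set, assumed to match the lecture's quoted figures.
data = [46, 69, 32, 60, 52, 41]

n = len(data)
mean = sum(data) / n                                  # step 1: mean
sum_of_squares = sum((x - mean) ** 2 for x in data)   # steps 2-4
sample_variance = sum_of_squares / (n - 1)            # step 5: divide by n - 1

print(sample_variance)   # 177.2
print(variance(data))    # standard library cross-check
```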
Now we will see the major difference between population variance and sample variance.
When we have collected data from every member of the population, we calculate the population variance.
How is the population variance denoted? It is denoted by sigma squared (σ²), which is the sum of (X - μ)² divided by capital N.
Let's go through each term: σ² is our population variance.
The summation sign (Σ) means "sum of", and X is each data point, that is, each score's value that we collected.
μ (mu) is the population mean and capital N is the number of values in the population.
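The population formula translates almost term by term into code. This is only a sketch: the function name `population_variance` and the eight values are made up for illustration and are not from the lecture.

```python
from statistics import pvariance

def population_variance(values):
    """sigma squared = sum of (X - mu)^2, divided by N."""
    N = len(values)                  # number of values in the population
    mu = sum(values) / N             # population mean
    return sum((x - mu) ** 2 for x in values) / N

# Hypothetical population of eight values with mean 5.
population = [2, 4, 4, 4, 5, 5, 7, 9]
sigma_squared = population_variance(population)

print(sigma_squared)          # 4.0
print(pvariance(population))  # standard library cross-check
```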
Now we will see sample variance.
Suppose you have collected data for only a sample of the population.
We use the sample variance to estimate the population variance.
In other words, we infer the population variance from our sample data.
The formula for sample variance looks like this.
s squared (s²) is equal to the sum of (x - x̄)² divided by n - 1.
You will have noticed the two different denominators, n - 1 here and N in the population formula, so why is n - 1 used here? We will get to that in a moment.
Here s² is called the sample variance.
x is each data point, x̄ is the sample mean, and n is the number of values in the sample.
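The sample formula can be sketched the same way; the only change from the population version is the n - 1 in the denominator. The function name `sample_variance` and the eight values are hypothetical, used here just to show the calculation.

```python
from statistics import variance

def sample_variance(values):
    """s squared = sum of (x - x bar)^2, divided by n - 1."""
    n = len(values)                  # number of values in the sample
    x_bar = sum(values) / n          # sample mean
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Hypothetical sample of eight values with mean 5.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
s_squared = sample_variance(sample)

print(s_squared)          # sum of squares is 32, so this is 32 / 7
print(variance(sample))   # standard library cross-check
```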
Now let us know one very interesting thing.
Why did I use n - 1 in the sample formula, whereas the population formula simply had capital N?
If I consider the entire population, my result is exact, because I have the complete data of the population.
But in the case of a sample, we work with only a small fraction of the population, which means my answer is only an estimate.
The sample represents my population, but it does not reproduce it perfectly.
Mathematically, n - 1 is a smaller number than n, right?
When you divide any positive value by a smaller number, the result is a larger number.
This is basic arithmetic you already know, so revise it once if you need to.
Which means that after dividing by n - 1, the sample variance I get will be a larger value than if I had divided by n.
Why do I want that? The deviations are measured from the sample mean, which sits closer to the sample's own values than the true population mean does, so the squared deviations tend to come out a little too small.
A slightly larger sample variance makes up for that shortfall.
Which means I get an unbiased estimate: it will not be exact for any single sample, but on average it neither overestimates nor underestimates the population variance.
But suppose I divide by n and not by n - 1.
Then my sample variance would be biased.
Remember, we are trying to reveal information about the population.
How? By calculating the variance of a sample.
So, we don't want to underestimate the variance.
But if we divide by n, we will systematically underestimate it.
So, this is the reason we divide by n - 1 in the sample formula.
That way we get unbiased results in the long run.
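One way to see the "long run" claim is a small simulation. The sketch below is an illustration under assumptions, not part of the lecture: it builds a made-up population of 1,000 values, repeatedly draws samples of size 5 with replacement, and averages the two estimates over many samples. The population, the sample size, the number of trials, and the random seed are all arbitrary choices.

```python
import random
from statistics import pvariance

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population; its true variance is known because we have all of it.
population = [rng.gauss(50, 10) for _ in range(1000)]
true_var = pvariance(population)

n, trials = 5, 20000
total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(trials):
    sample = rng.choices(population, k=n)   # draw n values with replacement
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)
    total_div_n += ss / n                   # divide by n
    total_div_n_minus_1 += ss / (n - 1)     # divide by n - 1

avg_div_n = total_div_n / trials
avg_div_n_minus_1 = total_div_n_minus_1 / trials

print("true population variance:", true_var)
print("average when dividing by n:    ", avg_div_n)
print("average when dividing by n - 1:", avg_div_n_minus_1)
```

Averaged over many samples, the divide-by-n estimate should settle noticeably below the true variance, while the divide-by-(n - 1) estimate should settle close to it.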
If you have any queries or comments, click the discussion button below the video and post there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to solve your query.