The third measure of dispersion is standard deviation.
Standard deviation is simply the square root of the variance.
Whatever value you have calculated for the variance, take its square root, and the result is the standard deviation.
It measures how dispersed the data is relative to its mean.
The farther the data points are from the mean, the larger the deviation.
The closer they are to the mean, the smaller the deviation.
The formula is just the square root of the variance formula.
As with variance, there is a population standard deviation and a sample standard deviation.
The population standard deviation is denoted by sigma, whereas the population variance was denoted by sigma squared.
Why? Because if I take the square root of sigma squared, I get sigma.
The formula is the same as before with a square root on top: the sum of (x minus mu) squared, divided by capital N, and then the square root of that whole quantity.
The sample standard deviation works the same way.
In variance it was denoted by S squared.
In standard deviation it is denoted by S.
The rest of the formula is the same as the sample variance formula, except that we take the square root of it.
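Written out, the two formulas described in words here would look roughly like this. The symbols x_i, x-bar and small n are standard notation that I am assuming; the lecture itself only names sigma, mu, capital N and S. I am also assuming the sample variance uses the usual n - 1 denominator.

```latex
% Population standard deviation (square root of the population variance)
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}

% Sample standard deviation (square root of the sample variance; n - 1 is assumed)
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```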
Now, you must be thinking: if I already had variance, what was the need for standard deviation?
Standard deviation is a statistical measure that describes how spread out the numbers in a data set are.
Variance describes the same spread, but it “gives an actual value to how much the numbers in a data set vary from the mean” by averaging the squared deviations.
So both measure the same spread; the practical difference is in how the number reads, and that comes down to units.
“Standard deviation is provided in the same units as the actual data in the data set”.
Take the wages example, where we had salaries: if I calculate the standard deviation of that data set, its unit is rupees, but the variance is in squared units, that is, rupees squared.
That is because variance is built from squared deviations. Its value is larger than the standard deviation whenever the standard deviation is greater than 1, and smaller when the standard deviation is less than 1.
That's why we take the square root of the variance: the standard deviation restores the original units and is easier to interpret.
So, in industry generally, standard deviation is used more, because it is in the same units as the data and is easy to interpret.
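To make the units point concrete, here is a minimal sketch in Python. The wage figures are hypothetical values I made up for illustration, not the ones from the earlier wages example, and it uses Python's built-in statistics module.

```python
import math
import statistics

# Hypothetical monthly wages in rupees (illustrative values only)
wages = [20000, 25000, 30000, 35000, 40000]

# Population versions: divide by N
pop_variance = statistics.pvariance(wages)   # unit: rupees squared
pop_std = statistics.pstdev(wages)           # unit: rupees

# Sample versions: divide by n - 1
sample_variance = statistics.variance(wages)  # unit: rupees squared
sample_std = statistics.stdev(wages)          # unit: rupees

print("Population variance:", pop_variance, "rupees^2")
print("Population standard deviation:", round(pop_std, 2), "rupees")
print("Sample variance:", sample_variance, "rupees^2")
print("Sample standard deviation:", round(sample_std, 2), "rupees")

# Standard deviation is just the square root of the variance
print(math.isclose(pop_std, math.sqrt(pop_variance)))
```

The variance comes out in the tens of millions of "rupees squared", which is hard to read against salaries, while the standard deviation is a few thousand rupees and can be compared directly with the wages themselves.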
Let's move ahead: if I have different values of standard deviation, what do they signify?
If I say the standard deviation is 5, or 10, or 20, what exactly does each one mean? We will compare them and see.
So, standard deviation is a very useful measure of spread for normal distributions.
Now, what is a normal distribution? It is a symmetrical, bell-shaped curve.
Suppose I take the three values I mentioned: standard deviations of 5, 10 and 20.
If I draw the curve for each of them, they look like these three curves.
The larger the standard deviation, the more widely spread the data is, and the flatter the curve.
In this scenario, the curve with standard deviation 20 is the widest.
The lower the standard deviation, the higher the peak and the smaller the spread.
So, across these three, you will see that the green curve is the flattest and its data is the most widespread.
The blue curve has the highest peak, and it has the smallest standard deviation.
The one in the middle is the red curve, which has a standard deviation of 10 and lies between the other two.
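If you want to check the "flatter curve, lower peak" idea with numbers, here is a small sketch using Python's built-in statistics.NormalDist. The mean of 50 is my assumption for illustration; the peak height depends only on the standard deviation, not on the mean.

```python
from statistics import NormalDist

mean = 50  # assumed centre for all three curves (illustrative)

# Height of each normal curve at its centre, for the three standard deviations
peaks = {}
for sd in (5, 10, 20):
    curve = NormalDist(mu=mean, sigma=sd)
    peaks[sd] = curve.pdf(mean)  # the curve is highest at the mean
    print(f"standard deviation {sd:>2}: peak height {peaks[sd]:.4f}")
```

Each time the standard deviation doubles, the peak height halves, because the same total area under the curve is stretched over a wider range.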
One very interesting property of standard deviation is the empirical rule.
If you understand standard deviation, you can do a lot in statistics, machine learning and data science.
Here is one particular use case: the empirical rule.
What does this rule say? The standard deviation and the mean together can tell you where most of the values in a distribution lie, provided they follow a normal distribution.
In other words, if I have a normal distribution curve, I can easily describe how many values lie within a given distance of the mean, and how they are spread, as long as I know both the standard deviation and the mean.
The empirical rule is also called the 68-95-99.7 rule.
Now, what do these three values stand for?
Suppose I have this curve, which is a normal distribution.
Here I have taken the mean as 50, which you can see in the centre, and my analysis has given me a standard deviation of 10.
Now, what does a standard deviation of 10 mean? It means that if I start at the mean of 50 and move 10 units to the right, I have moved one standard deviation.
If I move 10 units to the left of the mean, I have again moved one standard deviation.
So the 60 you see after 50, and the 40 you see before it, mark the first standard deviation.
60 is the mean plus one standard deviation.
On the left, 40 is the mean minus one standard deviation.
The area under the curve within one standard deviation of the mean, between 40 and 60, contains about 68% of the scores.
This means that about 68% of my data lies within one standard deviation of the mean.
Okay, now let's consider two standard deviations.
What does two standard deviations mean? I move two times the standard deviation away from 50.
That is 50 + 2 × 10, which is 70.
On the other side, the mean minus two standard deviations is 50 - 2 × 10, which is 30.
The range from 30 to 70, two standard deviations on either side of the mean, covers about 95% of the scores in the distribution.
In the same way, three standard deviations on either side, from 20 to 80, cover about 99.7% of the distribution.
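These three percentages can be checked with a short sketch, assuming the same mean of 50 and standard deviation of 10 as in this example. It uses Python's built-in statistics.NormalDist; the variable names are just my own choices for illustration.

```python
from statistics import NormalDist

# Normal distribution from the example: mean 50, standard deviation 10
dist = NormalDist(mu=50, sigma=10)

# Share of the distribution within k standard deviations of the mean
coverage = {}
for k in (1, 2, 3):
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    coverage[k] = dist.cdf(high) - dist.cdf(low)
    print(f"within {k} standard deviation(s), {low:.0f} to {high:.0f}: "
          f"{coverage[k] * 100:.2f}%")
```

The exact values are slightly different from the rounded 68, 95 and 99.7, which is why I say "about" 68% and "about" 95%.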
This is a very important property of standard deviation.
It is going to be of a lot of use to us later on, and it is very practical.
Why? Because it tells you how a normal distribution is divided into regions around the mean.
So, we saw that about 68% of the scores fall within one standard deviation of the mean, 95% within two, and 99.7% within three.
That is why we call it the 68-95-99.7 rule.
So this is the empirical rule of standard deviation, and it is important to remember it.
Let's come to the fourth and final measure of variability, which is the range.
The range also tells us about the spread of the data, but it does so using only the lowest and the highest values in the distribution.
It is a commonly used measure, and it is the easiest one to calculate.
How do we calculate the range? I take the maximum value and the minimum value in my data set.
When I subtract the lowest value from the highest, the result is the range.
A large range means there is more variability in the distribution, and a small range means there is less.
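As a quick illustration, here is how the range could be computed in Python. The scores below are hypothetical values chosen only for this example.

```python
# Hypothetical scores (illustrative values only)
scores = [12, 7, 25, 18, 3, 30]

highest = max(scores)
lowest = min(scores)

# Range = highest value minus lowest value
data_range = highest - lowest

print("Highest:", highest)
print("Lowest:", lowest)
print("Range:", data_range)
```

Notice that only two values, the highest and the lowest, decide the range, so a single extreme value can change it a lot.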
So, in this chapter, we have covered four different measures of variability: absolute deviation, variance, standard deviation and range.
Next, we will see what the measures of position are.
So, thank you.