In this chapter, we will learn about Measures of Variability.
First, what is variability? “Variability describes how far apart data points lie from each other and from the centre of the distribution.”
So, suppose I have a data set, and I have defined a centre for it.
Variability tells me how far all my data points are from that centre.
What did we see in Central Tendency? We saw where most of the data points lie.
Okay, but what will variability tell us? It tells us how far the data points lie from the centre, and also how far they lie from each other.
Variability, also called dispersion, is important because it tells us whether the data points are clustered around the centre or widely spread out, okay?
Suppose I have low variability.
Low variability is ideal because it means that “you can better predict information about the population based on the sample data”.
If you have low variability, it means that the data is clustered close together. If you have high variability, it means that the values are less consistent: they lie far apart from each other, and we cannot make predictions as easily.
Let's take an example: we look at the amount of time spent on the phone in three different groups of people.
So, in a day, how long does one person use the phone? We have checked this for three different groups.
Suppose the first sample is of high school students, the second sample is of college students, and the third sample is of adults in full-time employment, that is, corporate employees.
For this scenario, I have drawn a curve for each of the three samples. The blue curve that you can see is sample “A”, and sample “B” is shown in green.
The yellow curve is sample “C”.
The plot is probability versus minutes used on the phone, okay.
So I've got three different curves.
Now you can see that all three curves are not alike.
But there is one thing that is common in all three curves.
What is that? Their average: if you look at the mean, which lies at the centre, it is the same in all three curves.
It is 195 minutes, so on average each person has used the phone for a little over 3 hours (3 hours 15 minutes).
But all three have different spreads.
And what do we see? “A” is a highly variable curve, and “C” has the lowest variability.
So, in this way, from the spread of the different curves, we can get to know which of my samples is better.
Which sample is better for making predictions, and which sample is not that good? Okay.
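To make this comparison concrete, here is a minimal sketch with made-up numbers. These are not the data behind the three curves; the sample values and variable names are assumptions chosen only so that every sample averages 195 minutes while the spread differs.

```python
from statistics import mean

# Hypothetical daily phone usage in minutes, invented for illustration.
# NOT the data plotted in the lecture. Each sample averages 195 minutes.
sample_a = [75, 135, 195, 255, 315]   # widest spread (like curve A)
sample_b = [135, 165, 195, 225, 255]  # medium spread (like curve B)
sample_c = [175, 185, 195, 205, 215]  # narrowest spread (like curve C)

for name, sample in [("A", sample_a), ("B", sample_b), ("C", sample_c)]:
    spread = max(sample) - min(sample)  # range: largest value minus smallest value
    print(f"Sample {name}: mean = {mean(sample)}, range = {spread}")
```

The means come out identical, so the mean alone cannot tell the samples apart. Only a measure of spread, here the range, shows that A varies the most and C the least.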
Now, we will come to the different types of measures of variability.
We have seen in the last chapter that there are four measures of variability.
They are the mean absolute deviation, variance, standard deviation and range.
We will look at all four one by one.
First is the Mean Absolute Deviation.
Mean absolute deviation is made up of three different terms.
We just saw what Mean means. Absolute means this: suppose I have any particular value.
If you apply mod (the absolute value) to it, the sign is dropped and the value is never negative. That is called absolute.
Suppose you apply mod to the value -4.
The converted value becomes 4, that is, plus 4.
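As a quick check, the mod operation described here corresponds to Python's built-in `abs()` function. This is a small sketch, not part of the lecture itself.

```python
# abs() drops the sign, so the result is never negative.
negative_value = -4
print(abs(negative_value))  # prints 4
print(abs(4))               # prints 4 (a positive value is unchanged)
```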
Deviation, which we were looking at just now, asks: How much is the difference? How much is the dispersion? How much is the variability?
Combining all three terms together, what will be the definition? “Mean Absolute Deviation of a data set is the average distance between each data point and the mean”.
So, we take the distance of each data point from the mean,
and the average of those distances is our Mean Absolute Deviation.
It gives us an idea of the variability in the data set.
How can it be calculated? It can be calculated like this. You first calculate the mean, then you subtract it from each data point, then you take the mod of each difference, add these values up, and divide the sum by the number of data points.
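Written as a formula, this calculation looks as follows. The notation is our own assumption, since the transcript does not show the formula itself: x_i is a data point, x-bar is the mean, and n is the number of data points.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \bar{x} \right|
```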
Let's describe this formula through one example.
Suppose I have data of how many likes you have got on the six pictures that you have uploaded on Instagram.
Suppose the first picture got 10 likes.
The second picture got 15 likes.
The third picture got 15 likes.
The fourth picture got 17 likes.
The fifth picture got 18 likes and the sixth picture got 21 likes.
So, what is the first step that you will take to find the mean absolute deviation? First of all, we will find the mean. How will we arrive at the mean?
The mean is the sum of likes divided by the total number of pictures, which is 96 divided by 6, which is 16.
My second step is to calculate the distance of each data point from the mean.
What does this mean? How is the distance from the mean calculated? The mean was 16. For the data point 10, I subtract it from 16 and take the absolute value, which is 6.
In the same way, I calculate the other values.
For 15, it is 15 minus 16, which is equal to -1.
I take its mod, and the value that I get is 1.
Next, I add up all these distances from the mean: 6 + 1 + 1 + 1 + 2 + 5.
My final sum is 16.
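The steps so far can be sketched in Python using the six like counts from the example. The variable names are our own choice.

```python
# Likes on the six Instagram pictures from the example.
likes = [10, 15, 15, 17, 18, 21]

# Step 1: the mean is the sum of likes divided by the number of pictures.
mean_likes = sum(likes) / len(likes)  # 96 / 6

# Step 2: absolute distance of each data point from the mean.
abs_deviations = [abs(x - mean_likes) for x in likes]

# Step 3: add up all the distances.
total_deviation = sum(abs_deviations)

print(mean_likes)       # 16.0
print(abs_deviations)   # [6.0, 1.0, 1.0, 1.0, 2.0, 5.0]
print(total_deviation)  # 16.0
```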
Now the last step: to calculate the Mean Absolute Deviation, I take the total sum that we have calculated, 16, and divide it by the total number of data points, which is 6. That comes to 2.67 likes.
So, through the mean absolute deviation, we can say that, on average, every picture is about 2.67 likes, or roughly three likes, away from the mean.
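Putting the whole calculation together, here is a minimal sketch of a reusable function. The name `mean_absolute_deviation` is our own and is not taken from any library.

```python
def mean_absolute_deviation(values):
    """Return the average absolute distance of each value from the mean."""
    if not values:
        raise ValueError("values must not be empty")
    center = sum(values) / len(values)                  # the mean
    distances = [abs(v - center) for v in values]       # mod of each difference
    return sum(distances) / len(values)                 # average distance


likes = [10, 15, 15, 17, 18, 21]
mad = mean_absolute_deviation(likes)
print(round(mad, 2))  # 2.67
```

The function raises an error on an empty list instead of dividing by zero. That is a design choice made for this sketch, not something the lecture prescribes.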
So, if you have any comment or question related to this course, you can post it by clicking on the discussion button given at the end of the video.
In this way, you can connect and discuss with more learners like you.