Today we are going to cover measures of shape.
First of all, we will see what measures of shape are.
A measure of shape describes the distribution, or pattern, of the data within a data set.
What does that mean? Suppose I have a data set with years and, for each year, the demand in thousands.
Suppose in 2015, my demand was 10.
In 2016, the demand was 18, in 2017 it was 12.
I have drawn this entire data set as a plot.
I drew a year versus demand curve.
From it I can easily read off the patterns.
In 2015, demand was lower than in 2016.
In 2017, it was again lower than in 2016.
So, in this way, measures of shape give me information about patterns and distributions.
These are of three types: first skewness, second kurtosis, and third the box-and-whisker plot.
Now, we will go through them one by one.
First comes skewness, which we met when we were covering the measures of central tendency: mean, median, and mode.
We saw there that if my data is skewed, I should use the median.
But what does skewness mean? Skewness is a measure of symmetry, or more precisely, of the lack of symmetry.
Suppose I have a particular distribution, like the one we can see on the right.
I divide it in half and fold one half over the other.
If the two halves overlap exactly, the distribution is called symmetrical.
Why? Because the two halves match each other, so there is no skewness in it.
Skewness is basically a measure of symmetry, and a symmetrical distribution has zero skewness.
A symmetrical distribution has one very important property: the mean equals the median, and for a single-peaked distribution it also equals the mode.
So, basically, all three measures of central tendency are equal for a symmetrical distribution.
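To make this property concrete, here is a minimal sketch using Python's standard `statistics` module. The data set is hypothetical and was chosen only because its values mirror each other around 5.

```python
from statistics import mean, median, mode

# Hypothetical symmetric data set: the values mirror each other around 5.
data = [3, 4, 4, 5, 5, 5, 6, 6, 7]

# For a symmetric, single-peaked data set the three measures coincide.
print("mean  :", mean(data))
print("median:", median(data))
print("mode  :", mode(data))
```

All three printed values should be the same, which is exactly the property described above.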
Skewness is defined in terms of the third central moment.
Why is it called the third central moment? Look at its formula.
There, the deviations from the mean are raised to the third power.
In other words, they are cubed.
If you remember, the variance that we saw in the last chapter used the square.
That is why the variance is called the second central moment. The “S” that we can see in the formula is the sample standard deviation.
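The formula itself is on the slide and not in this transcript, so the sketch below assumes one common textbook form: the sum of cubed deviations from the mean, divided by (n - 1) times the cube of the sample standard deviation S. The function name `sample_skewness` and the data sets are hypothetical, and other courses and libraries use slightly different scaling.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Skewness as sum((x - mean)^3) / ((n - 1) * s^3).

    Assumption: this is one common textbook convention; the exact
    formula used on the lecture slide is not given in the transcript.
    """
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    third_moment_sum = sum((x - x_bar) ** 3 for x in values)
    return third_moment_sum / ((n - 1) * s ** 3)

# Hypothetical data sets for illustration.
symmetric = [3, 4, 4, 5, 5, 5, 6, 6, 7]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]

print(sample_skewness(symmetric))     # zero for symmetric data
print(sample_skewness(right_tailed))  # positive: long tail on the right
```

Because the deviations are cubed, they keep their sign, so a long tail on one side pushes the result in that direction.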
Now, skewness is of two types: positive skewness and negative skewness.
First, what do we mean by positive skewness? Positive skewness means that the majority of my data lies to the left and only a small part of it lies to the right.
We simply call this a positively skewed, or right-skewed, distribution.
There is an easy way to identify it: check whether the tail of your distribution is longer on the right side than on the left side.
If it is, that is positive skewness.
It has one very important property, which is going to play a big role when we learn data science.
In positive skewness, the mean is bigger than the median and the mode.
This is one very important property of it.
We must always remember this: since the mean takes every value into account, the few large values in the right tail pull it up, so the median ends up smaller than the mean in the case of positive skewness.
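A small sketch of why the mean is pulled up while the median barely moves. The numbers are made up for illustration: six ordinary values plus one extreme value in the right tail.

```python
from statistics import mean, median, mode

# Hypothetical values: six ordinary ones and one extreme value on the right.
values = [30, 30, 35, 40, 45, 50, 1000]
without_extreme = values[:-1]

# With the extreme value, the mean jumps far above the median.
print("with extreme value   :", mean(values), median(values), mode(values))

# Without it, the mean and the median are close together again.
print("without extreme value:", mean(without_extreme), median(without_extreme))
```

The single large value moves the mean a long way, but the median only shifts slightly, which is why the median is preferred for skewed data.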
We will understand positive skewness with some real-life examples that we come across in our day-to-day life.
Suppose that there is one difficult exam.
We have taken that exam, and since it is difficult, there will be very few kids who got good marks.
Most of the kids will get low marks.
So, if I draw it on a plot,
the curve will be positively skewed.
Basically, the distribution of scores on a difficult exam is a positively skewed distribution.
The second example is a very nice one.
Let's consider the salaries that people earn.
Let's compare them with the earnings of Elon Musk, Bill Gates, or Ambani.
Those people are among the ones who earn the most.
They are the outliers because their income is extreme.
But most salaries lie in the normal range.
Since a few people earn so much more than everyone else,
the distribution of individual income is an example of positive skewness.
Let's understand it with one more example: movie tickets. Most movies sell an ordinary number of tickets.
But suppose an Avengers or a Spiderman movie is released.
These are blockbuster movies, and they earn a lot and sell a lot of tickets.
So ticket sales are also a case of positive skewness.
So, basically, we saw three examples:
the distribution of scores in a difficult exam, the distribution of individual income, and the distribution of movie tickets sold.
In the same way, you can find many more examples of positive skewness in your day-to-day life.
Second is negative skewness.
Negative skewness is simply the opposite of positive skewness.
So, if the majority of my data lies to the right
and only a small part of it lies on the left, I call it a negatively skewed, or left-skewed, distribution.
The simple way to identify it: if the tail of your curve is longer on the left side than on the right side, I call the data negatively skewed.
Its property is that the mean is smaller than the median and the mode.
The relationship between mean, median, and mode is that the mean is less than the median, which is less than the mode.
We can see this in this particular curve.
Let's understand it with a few examples.
Earlier we took the example of a difficult exam.
Now suppose my exam is easy.
In that easy exam, almost every kid in the class does well.
Suppose most of them are scoring in the 90s.
Some are even getting a score of 100.
There would be very few kids in the class who got very low marks.
So this would be my example of negative skewness.
Why? Because most of the marks lie in the range of 60 to 100, since the exam was easy, and only a few kids got something like 20 marks.
When I draw the curve for this, I get a negatively skewed curve.
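The easy-exam case can be checked with a few lines of Python. The marks below are invented to match the description (most between 60 and 100, one student at 20), and the sign check uses only the sum of cubed deviations, which has the same sign as the skewness.

```python
from statistics import mean, median, mode

# Hypothetical marks on an easy exam: most are high, one is very low.
marks = [20, 60, 75, 85, 90, 90, 95, 95, 95, 100]

m = mean(marks)
print("mean  :", m)
print("median:", median(marks))
print("mode  :", mode(marks))

# The sum of cubed deviations carries the sign of the skewness.
cubed_deviation_sum = sum((x - m) ** 3 for x in marks)
print("negatively skewed:", cubed_deviation_sum < 0)
```

Here the low mark of 20 drags the mean below the median, and the median sits below the mode, which is the ordering described for negative skewness.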
Another very common example is the distribution of age at death.
Most people die at around the age of 60 to 70.
Correct? That is more or less the human lifespan, but a few people die at an early age because of accidents or illness.
This also forms a negatively skewed curve.
Why? Because the long, thin tail is on the left, at the early ages.
And most of the data is on the right.
Let's see one more everyday example: the distribution of daily stock market returns.
It rarely happens that the stock market crashes.
If the stock market does crash,
my returns will go strongly negative.
But most of the time, since the market is up,
I get positive returns.
So, in that scenario too, the distribution of daily stock market returns is an example of negative skewness.
In the same way, you can relate more examples from your daily life and understand it.
Now we have seen skewness, but what is its importance and why do we use it? Because it helps us understand the pattern of a distribution.
The most important use of skewness is that it tells us the direction of the outliers.
In which way? The extreme values lie on the side where the tail is longer, so that is where most of the outliers will be.
But it is also important to note that skewness tells me only the direction of the outliers.
It doesn't tell me the number of outliers.
Skewness plays a very important role in the field of data science.
Later, when we learn how exploratory data analysis, feature selection, and feature extraction are done, we will make a lot of use of skewness in all of these methods.
To deal with skewness, we will use different transformation techniques, like the power transformation, log transformation, and exponential transformation, which bring positively and negatively skewed data closer to a normal distribution.
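As a rough illustration of the log transformation, the sketch below takes a made-up positively skewed data set, applies the natural log, and compares the skewness before and after. The `sample_skewness` helper is hypothetical and assumes the same textbook convention as before (cubed deviations divided by (n - 1) times s cubed); the log transformation only works on positive values.

```python
import math
from statistics import mean, stdev

def sample_skewness(values):
    # Assumed convention: sum((x - mean)^3) / ((n - 1) * s^3).
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation
    return sum((x - x_bar) ** 3 for x in values) / ((n - 1) * s ** 3)

# Hypothetical positively skewed data (all values must be > 0 for log).
raw = [1, 2, 2, 3, 4, 5, 8, 12, 20, 50]
logged = [math.log(x) for x in raw]

skew_before = sample_skewness(raw)
skew_after = sample_skewness(logged)

print("skewness before log:", round(skew_before, 2))
print("skewness after log :", round(skew_after, 2))
```

The log compresses the large values much more than the small ones, so the right tail shrinks and the skewness moves toward zero. It does not make the data exactly normal.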
So this completely covers the idea of skewness.