Kurtosis is a measure that identifies whether the tails of my distribution contain extreme values compared to a normal distribution.
In other words, kurtosis tells me about the heaviness of the tails.
Suppose my data is light-tailed.
That means very few values sit far out in the tails of my distribution.
Its simple meaning is that outliers are rare.
If the data is heavy-tailed, it means that a lot of extreme values are present, and so a lot of outliers are present as well.
Here I have different curves with me, and in the green curve there are almost no tails.
It means that this is a practically outlier-free graph.
My red and black curves, on the other hand, have heavy tails with extreme values.
So we can easily tell that there are outliers present in them.
We had seen the second and third central moments.
Now, kurtosis is built on the fourth central moment.
Why? Because you will see in its formula that the deviations from the mean are raised to the fourth power and divided by the fourth power of S, where S is the sample standard deviation.
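To make the fourth-power idea concrete, here is a minimal sketch. The function name `kurtosis` and the sample data are made up for illustration. It divides by n (the population form); a formula written with the sample standard deviation S divides by n - 1 instead, which changes the value slightly.

```python
from statistics import mean

def kurtosis(values):
    """Fourth central moment divided by the squared second central moment."""
    n = len(values)
    m = mean(values)
    # Second and fourth central moments, both dividing by n (an assumption;
    # a sample-based formula would divide by n - 1 for the variance).
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2

# Hypothetical data, only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
k = kurtosis(data)
print(round(k, 5))
```

For this small data set the value comes out just under 3, the value of a normal distribution.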
Here is the important thing to note.
We had seen skewness and now we are seeing kurtosis.
What is the major difference between the two?
Skewness measures the symmetry of the distribution.
Kurtosis explains the heaviness of the distribution's tails.
Let's take a real-world use case: where does kurtosis play a very important role? In finance, it is used as a measure of financial risk.
What does this mean? If my kurtosis is large, that is, if the value of k is large, its simple meaning is that the investment carries high risk.
Why? Because there is a high probability that I will get extremely large or extremely small returns.
So, in that scenario, banks and stock market investors plan their investments accordingly.
Now suppose the kurtosis is small.
That is an indication that there is only a moderate level of risk in that scenario.
Why? Because the probability of extreme returns is low.
So, we use kurtosis very extensively in finance.
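As a rough illustration of the finance point, the sketch below compares two invented series of returns (in percent). Both series and the helper name `kurtosis` are assumptions made up for this example, not real market data. The series with occasional extreme returns should show the larger kurtosis.

```python
from statistics import mean

def kurtosis(values):
    """Population-form kurtosis: fourth central moment over squared variance."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2

# Invented returns in percent: one steady series, one with rare big swings.
steady_returns = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
jumpy_returns = [-9, 0, 0, 0, 0, 0, 0, 0, 0, 9]

k_steady = kurtosis(steady_returns)
k_jumpy = kurtosis(jumpy_returns)
print("steady:", round(k_steady, 2))
print("jumpy: ", round(k_jumpy, 2))
```

The steady series lands below 3 and the jumpy one well above it, which matches the reading of large kurtosis as a sign of extreme returns.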
There are three types of kurtosis.
Mesokurtic, leptokurtic and platykurtic.
We will explain these three next.
First is mesokurtic: here the kurtosis is 3, which is the value for a normal distribution.
The normal distribution is my symmetrical, bell-shaped distribution.
In a mesokurtic distribution the width and the breadth of the curve are moderate.
And the curve keeps a medium-peaked height.
What would be leptokurtic? Leptokurtic is an extreme curve in which there are heavy tails on either side.
This means that it is a curve in which I have a large number of outliers.
In a leptokurtic curve, kurtosis is higher than that of a normal distribution.
The curve is usually drawn with a peak higher than the normal distribution's, but it is the heavy tails that make its kurtosis larger.
If my k is greater than 3, then the data is said to have high kurtosis.
It simply means that more of my data lies out in the tails than in a normal distribution.
A leptokurtic curve is the one that signals heavy risk in an investment.
Why? Because when extreme returns occur often, I get a leptokurtic curve.
Third is platykurtic.
Platykurtic simply means a curve that has light tails, so we have fewer outliers in the distribution.
For this curve, the kurtosis is less than that of the normal distribution.
That is, if my k is less than 3, it is called low kurtosis.
And less of my data lies in the tails.
In finance, this is the desirable curve for investors, because we have a smaller probability of extreme returns.
So, we look for this curve when we want to make safe investments.
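The three types can be sketched as a small helper that compares k with 3. The function name `kurtosis_type` and the `tolerance` parameter are hypothetical; real data almost never gives exactly 3, so the width of the band treated as "about 3" is a judgment call.

```python
def kurtosis_type(k, tolerance=0.1):
    """Label a kurtosis value relative to the normal distribution's value of 3."""
    # 'tolerance' is an assumed cut-off for "close enough to 3".
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    if k > 3:
        return "leptokurtic"  # heavier tails than normal
    return "platykurtic"      # lighter tails than normal

for k in (3.0, 5.0, 1.8):
    print(k, kurtosis_type(k))
```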
Now comes box and whisker plots.
We also call box and whisker plots simply box plots.
It is a convenient way of drawing the data visually so that we can easily see its quartiles.
A box plot summarises the data with a five-number summary.
And we have already seen those 5 numbers.
Now, what are they? First is the lowest value, which is the minimum; then Q1, which is the 25th percentile; then the median, which is the 50th percentile; then Q3, which is the 75th percentile; and finally the highest value, which is the maximum.
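The five numbers can be computed with Python's standard library. This is a minimal sketch: the helper name `five_number_summary` and the data are made up, and `method="inclusive"` is just one of several quartile conventions, so other tools may give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical data, deliberately unsorted.
data = [60, 20, 100, 40, 80, 30, 90, 50, 70]
summary = five_number_summary(data)
print(summary)
```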
So, suppose this is my plot.
The vertical lines in this plot mark Q1, the median and Q3.
Those are highlighted, along with the minimum and maximum values.
The lines that extend out to these extreme values are called whiskers.
The distance between Q1 and Q3 is called the interquartile range.
That range forms the box, and the lines out to the extreme values form the whiskers.
That's why we also call it a box and whisker plot.
In the box plot, the width of the box tells us how much dispersion there is.
If the width is small, there is less dispersion.
If the width is large, there is more dispersion.
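The width of the box is the interquartile range, Q3 minus Q1. As a small sketch with two invented data sets (the helper name `iqr` and the quartile method are assumptions), the tightly packed data gives a narrow box and the spread-out data a wide one.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: the width of the box, Q3 - Q1."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Invented data: same centre, very different dispersion.
tight = [48, 49, 49, 50, 50, 50, 51, 51, 52]
spread = [10, 25, 35, 45, 50, 55, 65, 75, 90]

print("tight box width: ", iqr(tight))
print("spread box width:", iqr(spread))
```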
The box and whisker plot is a very important plot that we use heavily in data science, and it also tells us the direction of the skewness.
How will we get to know about it? Suppose my box is closer to the right side.
This means that most of the values are on the right side, with a long whisker stretching out to the left.
So, this is a negatively skewed distribution, like we can see in this curve.
In the box plot of a negatively skewed distribution, the extreme values lie far out on the left.
Similarly, if my box is closer to the left side, then that points to a positively skewed distribution.
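One rough way to read the direction off the five numbers is to compare the two whiskers: a longer left whisker means the box sits toward the right. The sketch below is only a heuristic under that assumption, and the function name `skew_direction` and the data are invented for illustration.

```python
from statistics import quantiles

def skew_direction(values):
    """Guess the skew direction from whisker lengths (a rough heuristic)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    left_whisker = q1 - min(values)
    right_whisker = max(values) - q3
    if left_whisker > right_whisker:
        return "negative"  # box closer to the right side
    if right_whisker > left_whisker:
        return "positive"  # box closer to the left side
    return "roughly symmetric"

# Invented data sets.
print(skew_direction([1, 12, 16, 17, 18, 18, 19, 19, 20]))
print(skew_direction([1, 2, 2, 3, 3, 4, 5, 9, 20]))
print(skew_direction([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```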
Now, what is the importance of the box plot? Let's go through that.
There are a few observations that we can easily make through box plots.
Like what is my median? What is my 25th percentile? We can easily read off measures of central tendency and measures of dispersion from a box plot.
We can get to know whether there are outliers in our distribution and what their values are, very easily.
Whether my data is symmetrical or not.
Box plots also tell us easily about the skewness.
How well grouped my data is, how tightly packed it is, even that we can get to know by simply looking at the box.
If my data is skewed, then in which direction it is skewed.
All this we can get to know from the box plot.
So, the box plot plays a very important role, and when we see it practically it has a highly important role in data science as well.
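For the outlier check, a common convention (not stated in this lecture, so treat it as an assumption) is to flag values lying more than 1.5 times the interquartile range beyond Q1 or Q3. A minimal sketch, with the hypothetical helper `find_outliers` and invented data:

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box_width = q3 - q1
    # k = 1.5 is the usual convention, assumed here.
    lower = q1 - k * box_width
    upper = q3 + k * box_width
    return [x for x in values if x < lower or x > upper]

# Invented data with one value far away from the rest.
data = [12, 14, 15, 15, 16, 17, 18, 19, 60]
outliers = find_outliers(data)
print(outliers)
```

Many plotting tools draw the whiskers only as far as these limits and show anything beyond them as separate points.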