First, I want to congratulate you all on completing this course.
We have learned a lot in this course, and much of it we were seeing for the first time.
Now that the course is complete, it is time to see how we can use this knowledge in interviews.
How should we present ourselves in an interview, and how do we impress the interviewer? We will see that in this module.
I have curated a few questions for you that are asked a lot in interviews.
Whether you are going to a data science interview or a statistics interview, there are a few very important questions that are almost always asked, and you need to know them.
I have prepared those questions along with their answers.
We will also see how to present each answer in a well-formulated way and how to deliver it confidently.
We will cover all this in this interview module.
Technically, you already know these things. What we will see in this module is how to answer in a structured and concise way.
So, first of all, here is one question.
No matter which statistics interview you go to, you will almost certainly be asked this question.
It is the base of statistics.
I have personally been asked this in my interviews, so pay attention to this first question.
The first question is: what is the central limit theorem? It is a vast topic in itself, which is why it comes up so often in interviews.
Now, let's see how to present it in front of the interviewer.
First, we will bring in the sampling distribution, which forms the base of the central limit theorem.
So, we will answer in this way.
The sampling distribution is the distribution of sample means from a population. That is, if I draw many samples from one population and take the mean of each sample, the distribution of those means is my sampling distribution.
We have discussed it in detail in the module, so here I will just tell you how to present it to the interviewer.
The sampling distribution has some interesting properties, which are collectively called the central limit theorem.
It states that no matter how the original population is distributed, the distribution of sample means will follow three properties. My population distribution can be of any shape; it can be a uniform curve or a normal curve.
The three properties are as follows.
The first property is that the mean of the sampling distribution is equal to the population mean.
The second property is that the standard deviation of the sampling distribution, which is also called the standard error, is given by sigma divided by the square root of n, that is, σ/√n.
Here σ is my population standard deviation and n is my sample size.
The third property, which is very important, is that for n greater than 30, as a rule of thumb, the sampling distribution becomes approximately a normal distribution.
If you give the answer to an interviewer in this way, they will follow your whole explanation clearly.
They will understand exactly what you are trying to say: what the three properties of the sampling distribution are that form the base of the central limit theorem.
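If you want to convince yourself of these three properties before the interview, a small simulation can help. The following is only a rough sketch using the Python standard library. The exponential population, the sample size of 40 and the number of trials are assumptions chosen for illustration; they are not from the course material.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mean = 1 and sigma = 1.
# It is strongly right skewed, i.e. clearly not a normal curve.
POP_MEAN = 1.0
POP_SIGMA = 1.0


def sample_means(n, trials=20000):
    """Draw `trials` samples of size n and return the mean of each sample."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


n = 40
means = sample_means(n)

# Property 1: the mean of the sampling distribution vs the population mean.
mean_of_means = statistics.fmean(means)

# Property 2: the observed standard error vs sigma / sqrt(n).
standard_error = statistics.stdev(means)
expected_se = POP_SIGMA / n ** 0.5

# Property 3: share of sample means within one standard error of the
# population mean. For a normal distribution this share is about 0.68.
within_one_se = sum(abs(m - POP_MEAN) <= expected_se for m in means) / len(means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"standard error: {standard_error:.3f} (sigma/sqrt(n) = {expected_se:.3f})")
print(f"share within one standard error: {within_one_se:.3f}")
```

Even though the population itself is skewed, the three printed numbers should land close to the population mean, to σ/√n and to roughly 0.68 respectively.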
So, before going to any interview, definitely revise this, and if at any point you have a doubt, you can refer to the inferential statistics module for revision.
Now, the second most important question, which is asked a lot and is also widely used: what is the difference between type one and type two errors?
We have learned a lot about this through many examples.
Here I will tell you how to present it to the interviewer in a simple way.
First, we will give the interviewer the definitions of type one and type two errors, and then we will conclude with an example.
A type one error occurs when a true null hypothesis is rejected.
That is, the null hypothesis is actually correct, but we still reject it.
It is also known as a false positive, and we tell the interviewer that we know this name.
Why? Because we are rejecting the null hypothesis when it is in fact true.
The third point is that the risk of committing this error is the significance level. That is, the probability of making a type one error in your test is denoted by alpha, which we call the significance level.
A type two error occurs when we fail to reject a false null hypothesis. That is, the null hypothesis is wrong, but we are not able to reject it.
Failing to reject a wrong null hypothesis is also called a false negative, so we will mention that name as well.
The third point is that the probability of a type two error is denoted by beta.
So, we have explained the definitions to the interviewer in a crisp and clear way.
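To see alpha and beta as actual frequencies, you can simulate many tests. This is a minimal sketch, not a standard recipe. The two-sided z-test with a known standard deviation, the sample size of 25 and the true mean of 0.5 in the second scenario are all assumptions made up for illustration.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

ALPHA = 0.05   # significance level
SIGMA = 1.0    # assumed known population standard deviation
N = 25         # assumed sample size
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value


def rejects_null(true_mean):
    """Test H0: mean = 0 on one simulated sample; True means H0 is rejected."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = fmean(sample) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRIT


TRIALS = 10000

# H0 is true (the mean really is 0): every rejection is a type one error.
type_one_rate = sum(rejects_null(0.0) for _ in range(TRIALS)) / TRIALS

# H0 is false (the mean is really 0.5): every failure to reject is a
# type two error.
type_two_rate = sum(not rejects_null(0.5) for _ in range(TRIALS)) / TRIALS

print(f"type one error rate: {type_one_rate:.3f} (alpha = {ALPHA})")
print(f"type two error rate: {type_two_rate:.3f} (this is beta for mean 0.5)")
```

The type one rate should come out close to alpha, while beta depends on how far the true mean is from the null value and on the sample size.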
Now, we will present this to our interviewer through an example.
Let's say that in a criminal trial, the jury has to decide whether the defendant is innocent or guilty. Here the null hypothesis is that the defendant is innocent.
In this criminal trial example, you will define the type one and type two errors.
The type one error is when the defendant is innocent of the murder but is still convicted and given the death penalty.
That is, the defendant is innocent, but you have still found him guilty and sentenced him.
This is very serious, and it is wrong: you have declared an innocent person guilty.
The type two error in this situation is when the defendant is guilty but the jury acquits him. That is, the person who should be behind bars has been set free.
This is also a very difficult and risky situation, because the person you believe is innocent may commit more murders in the future.
In this way, you first state the definitions of type one and type two errors, and then you present the example to the interviewer.
Your interviewer will be impressed and will see that you understand this concept fully.
My third question is how to convert a normal distribution to a standard normal distribution.
We have learned a lot about the normal distribution and also about the standard normal distribution.
But how do we present it to the interviewer in easy language and in limited words? Let's see.
The standard normal distribution, also called the Z distribution, is a special normal distribution whose mean is zero and whose standard deviation is one.
So, to convert a normal distribution into a standard normal distribution, we shift and scale it so that its mean becomes zero and its standard deviation becomes one.
Or we can simply say that we standardise a normal distribution by converting its values into Z scores.
For that I use the Z score formula, which is x minus mu, divided by sigma: z = (x - μ) / σ.
Here x is the individual value, μ is the mean and σ is the standard deviation.
If you put every value through this formula, the normal distribution gets converted into a standard normal distribution, whose mean is 0 and standard deviation is 1.
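The Z score formula is easy to check by hand or in a few lines of code. In this small sketch the six values are hypothetical, made up only to show the arithmetic.

```python
from statistics import fmean, pstdev

# Hypothetical values, made up for illustration.
values = [52.0, 61.0, 58.0, 70.0, 49.0, 64.0]

mu = fmean(values)       # mean of the values
sigma = pstdev(values)   # population standard deviation of the values

# z = (x - mu) / sigma for every individual value x
z_scores = [(x - mu) / sigma for x in values]

for x, z in zip(values, z_scores):
    print(f"x = {x:5.1f} -> z = {z:+.3f}")

# After standardising, the mean is 0 and the standard deviation is 1,
# up to floating-point rounding.
print(f"mean of z scores: {fmean(z_scores):.6f}")
print(f"standard deviation of z scores: {pstdev(z_scores):.6f}")
```

Standardising only shifts and rescales the values, so the shape of the distribution stays the same. It becomes a standard normal distribution only if the original values were normally distributed.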
And now we will come to our fourth question, which is asked a lot.
It is very important to know this in a practical sense.
The question is: what is the difference between a left skewed distribution and a right skewed distribution?
When you get a question like this, where you have to explain a difference, it is very important to back up whatever you tell the interviewer with an example.
We will start with the definitions.
If the majority of the data is on the right and less data is on the left, it is called a left skewed distribution.
We also call it a negatively skewed distribution.
In a right skewed distribution, on the other hand, the majority of the data lies on the left and less data lies on the right.
We also call it a positively skewed distribution.
So, in the first point, you have given the definitions in a simple way.
In the second point, we will give a way to tell the skew simply by looking at any distribution.
When the tail on the left side of the curve is longer than the tail on the right side, that is a left skewed distribution.
Conversely, when the tail is longer on the right side than on the left side, that is a right skewed distribution.
You have now covered two points.
Third comes a very important point: what is the relationship between mean, median and mode in both cases?
In a left skewed distribution, the mean is typically less than the median, which is less than the mode.
In a right skewed distribution, the mean is typically the highest, then the median, and then the mode.
So, in three points, you have given the interviewer a complete overview and shown that you know the theory.
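You can check the mean, median and mode ordering on small data sets. The numbers below are hypothetical ages at death and incomes, invented for illustration. Keep in mind that this ordering is a rule of thumb for typical skewed data, so it is a sketch of the usual pattern rather than a guarantee.

```python
from statistics import mean, median, mode

# Hypothetical data, made up for illustration.
# Left skewed: a few very low values form a long left tail.
ages_at_death = [2, 35, 60, 65, 68, 70, 72, 72, 72, 75]
# Right skewed: one very high value forms a long right tail.
incomes = [20, 25, 25, 25, 30, 30, 35, 40, 60, 200]


def summarise(label, data):
    """Print and return the mean, median and mode of `data`."""
    m, med, mo = mean(data), median(data), mode(data)
    print(f"{label}: mean = {m:.1f}, median = {med:.1f}, mode = {mo}")
    return m, med, mo


left = summarise("left skewed ages", ages_at_death)
right = summarise("right skewed incomes", incomes)

print("left skewed, mean < median < mode:", left[0] < left[1] < left[2])
print("right skewed, mean > median > mode:", right[0] > right[1] > right[2])
```

The long tail pulls the mean toward it, the median moves less, and the mode stays at the peak, which is why the mean ends up on the tail side.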
In the final point, you can present an example of which kind of distribution is called left skewed and which is called right skewed.
Let's take an example: age at death. Under normal conditions, most people live to around 60 to 75 years, and very few people die at an early age for some reason.
So, this case forms a left skewed distribution.
The early deaths form a long tail on the left, while most of the values lie between 60 and 75.
Okay, so this is my example of left skewed.
For right skewed, you can take the distribution of income.
If I consider average middle-class families, or ordinary people, their income lies within a particular range.
Most people lie in that range.
But millionaires and the ultra rich, like Elon Musk, Jeff Bezos or Ambani, are a very few people with a lot of money, and they make the distribution right skewed, because their values lie far out on the right side.
They act as outliers.
So, in this way you have covered the definitions of left and right skew, the relationship between mean, median and mode, and an example, and you have easily shown the interviewer that you know this.
Now, we will see one more very important question: when do we use a T-test versus a Z-test?
This question is asked a lot.
Interviewers generally like to ask it to find out how much you know about hypothesis testing.
Let's see how you will answer this question.
First, we will say what the Z-test is.
The Z-test is used for hypothesis testing in statistics with a normal distribution.
That is, when my test statistic follows a normal distribution and I know the population standard deviation, I will use the Z-test.
If you have to explain why the T-test came into the picture, keep this graph in mind and memorise it properly.
You can simply say that the T-test is used when the population parameters are unknown, in particular when you don't know the population standard deviation, and my sample size is less than 30.
In those situations, we use the T-test. That is, whenever you have a small sample, we use the T-test.
As soon as the sample gets bigger, greater than 30, the central limit theorem tells us that the sampling distribution of the mean is approximately normal, and for a normal distribution we use the Z-test.
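A quick simulation shows why the choice matters for small samples. This is only a sketch under assumptions: the sample size of 5 is made up, the population is simulated as standard normal with the null hypothesis true, and the value 2.776 is the two-sided 5% critical value for 4 degrees of freedom taken from a standard t-table.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(2)  # fixed seed so the run is repeatable

N = 5                                  # assumed small sample size
Z_CRIT = NormalDist().inv_cdf(0.975)   # two-sided 5% z critical value, about 1.96
T_CRIT = 2.776                         # two-sided 5% t critical value, 4 degrees of freedom


def studentized_mean():
    """Statistic for H0: mean = 0, using the sample standard deviation."""
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    return fmean(sample) / (stdev(sample) / N ** 0.5)


TRIALS = 20000
stats = [abs(studentized_mean()) for _ in range(TRIALS)]

# H0 is true in every trial, so every rejection is a false alarm.
false_alarm_z = sum(s > Z_CRIT for s in stats) / TRIALS
false_alarm_t = sum(s > T_CRIT for s in stats) / TRIALS

print(f"rejection rate with z critical value: {false_alarm_z:.3f}")
print(f"rejection rate with t critical value: {false_alarm_t:.3f}")
```

With only five observations and an estimated standard deviation, the z critical value rejects a true null hypothesis noticeably more often than the intended 5%, while the t critical value keeps the rate close to 5%. As the sample size grows, the two critical values move closer together.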
So, in this way we saw how to present different questions in front of the interviewer, and how to explain a topic in simple words and in very few words.
In this module, we have covered a lot of things.
We discussed a lot of important questions. If you feel that you are still not able to understand some of them, watch the videos again.
Wherever a concept is missing, rewatching the video will help you see that these concepts are easy and that we can use them widely in the data science field.
If you have any queries or comments then click the discussion button below the video and post there. This way you will be able to connect with the fellow learners and discuss the course. Also our team will try to solve your queries.