In this module we will see what hypothesis testing is and why we use it in data science.
So, first of all, before understanding hypothesis testing, what is a hypothesis? When we perform any descriptive or inferential analysis on our population or sample data, we get certain information from it.
We arrive at some claims about the entire population.
We call those claims or assumptions hypotheses. For example, earlier we looked at the time an employee takes to go from home to the office, which we called the mean commute time.
So, say we claim that the mean commute time for the entire population is 35 minutes.
That is, for everyone who goes to the office, the mean commute time is 35 minutes.
So, this was my claim, an assumption we arrived at using inferential statistics.
This claim is called a hypothesis.
And we are not 100% sure about this claim.
To check whether the data actually support this claim, we perform hypothesis testing.
So, what is hypothesis testing? Sometimes we start with an assumption. In the last example we used the central limit theorem to estimate the population mean. But in some cases we already have an assumed value for the population mean, and to check that value, which means checking our claim, we perform certain tests, which we call hypothesis testing.
The main goal of hypothesis testing is to decide whether the data give us enough evidence to support our claim.
Let's understand hypothesis testing through an example.
There is a food manufacturing company called Nestle, which makes Maggi noodles.
A while back they were in the news, when it was claimed that Maggi contained unwanted substances, such as lead, that are unsafe for consumption and harmful to health.
So, now what did the government do? It sent in some of its agencies.
The government's agencies sent a few of their agents to certain markets to pick up Maggi products and perform tests on them.
Now, the government says that the lead content in an acceptable pack of Maggi should be less than 2.5 ppm, which means parts per million.
That is, if Nestle wants to sell Maggi in the market, the government has told them that it must contain less than 2.5 ppm of lead.
If it is more than that, then it is harmful and legal action will be taken.
So, this is what the government said: the average lead content should be less than 2.5 ppm.
So, this is the claim, which means the government has created a hypothesis.
Now, to find out how much lead is actually in Maggi, we perform hypothesis testing.
How do we do that? We will see further.
But the basic idea is that if more than 1 million packs of Maggi are produced every day, then we cannot check every one.
In that case we have to draw a sample from the population.
And on that sample we perform the hypothesis testing.
So, in this way, we can carry out hypothesis testing using different tests and different methods.
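To make the sampling idea concrete, here is a minimal sketch in Python. The numbers are assumptions invented for illustration, not real Maggi measurements: a simulated population of 100,000 packs (smaller than the 1 million in the example, to keep it fast) with lead content around 2.3 ppm, and a sample size of 50.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: lead content (ppm) of 100,000 packs.
# The centre 2.3 and spread 0.4 are invented for illustration only.
population = [rng.gauss(2.3, 0.4) for _ in range(100_000)]

# We cannot test every pack, so we draw a simple random sample of 50.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)  # unknown in real life
sample_mean = statistics.mean(sample)          # what the agents can measure

print(f"population mean: {population_mean:.3f} ppm")
print(f"sample mean:     {sample_mean:.3f} ppm")
```

The sample mean lands close to the population mean, but not exactly on it. Hypothesis testing is how we account for that gap before drawing a conclusion.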
Now, in the last module we saw what inferential statistics is.
In this module we are looking at hypothesis testing. So what is the relationship between the two, and what is the difference between them? Let's look at that once.
When is inferential statistics used? When we want to draw inferences about the population.
We drew samples from the population and built the sampling distribution.
And based on that, we estimated quantities like the population mean, using confidence intervals.
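As a reminder of what that looks like, here is a rough sketch of a confidence interval for the mean commute time. The 40 commute times are simulated, the helper name `mean_confidence_interval` is made up for this example, and it uses the normal approximation, which is only reasonable for fairly large samples.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the population mean, normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


rng = random.Random(7)
# Hypothetical sample: commute times (minutes) of 40 employees.
commute_times = [rng.gauss(35, 8) for _ in range(40)]

low, high = mean_confidence_interval(commute_times)
print(f"95% confidence interval: {low:.1f} to {high:.1f} minutes")
```

Here we start from the sample and end up with a range for the population mean. Hypothesis testing goes the other way: we start from a claimed value and ask whether the sample is consistent with it.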
When do we use hypothesis testing? Suppose I already have an assumed value for the population mean, obtained through any method, maybe from inferential statistics. I want to check whether that claim holds up. That means collecting enough evidence, from a sample that represents my population, to say with a stated level of confidence whether the claim is supported. A test never gives a 100% guarantee.
So, to check our claim against the evidence, we use hypothesis testing.
So, hypothesis testing follows a few steps, which are as follows.
First of all, before performing any test, we have to state a null hypothesis and an alternative hypothesis.
Second, we perform a test on the hypotheses we have stated.
What are those tests? The t-test, the z-test and ANOVA. We will learn about them in detail later.
We will also see what the null and alternative hypotheses are, and how we form them, in future chapters.
After completing both steps, that is, after stating the null and alternative hypotheses and performing the test,
in the third step we make a decision: either we reject the null hypothesis or we fail to reject it.
How we do that, we will see in detail going ahead.
Lastly, we state our conclusions on the basis of the findings.
So, through these steps we will see how hypothesis testing is done.
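As a preview, the four steps can be sketched in Python for the Maggi example. Everything specific here is an assumption for illustration: the 36 lead readings are invented, the function name `one_sample_z_test` is made up, and the hypotheses are framed in one possible way (null: mean lead content is 2.5 ppm; alternative: it is less than 2.5 ppm). It uses a z-test with the sample standard deviation, which is only an approximation for reasonably large samples.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_z_test(sample, mu0, alpha=0.05):
    """Left-tailed z-test. H0: mean == mu0, H1: mean < mu0."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_value = NormalDist().cdf(z)  # chance of a z this low if H0 is true
    return z, p_value, p_value < alpha


# Step 1: state the hypotheses (assumed framing, see above).
mu0 = 2.5  # ppm, the limit from the example

# Step 2: perform the test on a hypothetical sample of 36 packs.
lead_ppm = [1.9, 2.9, 2.3, 1.7, 3.0, 2.3] * 6
z, p_value, reject = one_sample_z_test(lead_ppm, mu0)

# Step 3: decide whether to reject or fail to reject the null hypothesis.
decision = "reject H0" if reject else "fail to reject H0"

# Step 4: report the conclusion.
print(f"z = {z:.2f}, p-value = {p_value:.3f}, decision: {decision}")
```

Notice that the decision is worded as "reject" or "fail to reject". The test tells us how strong the evidence is. It does not prove the claim.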
If you have any comments or questions related to this course then you can click on the discussion button below this video and you can post them over there.
In this way, you can connect with other learners like you and you can discuss with them.