Hypothesis Testing in Data Science

Overview

In this module, we will see what hypothesis testing is and why we use it in data science.

Before understanding hypothesis testing, let us first see what a hypothesis is. When we perform any descriptive or inferential analysis on our population or sample data, we get certain information from it.

We get some claims about the entire population.

We call those claims or assumptions hypotheses. For example, we earlier looked at the time an employee takes to go from home to the office, which we called the mean commute time.

Suppose we say that the mean commute time for the entire population is 35 minutes.

That is, for everyone in the population who goes to the office, the mean commute time is 35 minutes.

This was my claim, an assumption that we arrived at through inferential statistics.

This claim is called a hypothesis.

And we are not 100% sure about this claim.

To check whether the data gives enough evidence for this claim, we perform hypothesis testing.

So, what is hypothesis testing? Sometimes we have a starting assumption. In the last example, we used the central limit theorem to estimate the population mean. In some cases, though, we already have an assumed value for the population mean, and to check that value, that is, to check our claim, we perform certain tests. We call this hypothesis testing.

The main purpose of hypothesis testing is to decide whether the data gives us enough evidence to support our claim.

Let's understand hypothesis testing through an example.

There is a food manufacturing company called Nestle, which makes Maggi Noodles.

Some time ago, they were in the news, where it was claimed that Maggi contained unwanted materials such as lead, which are unsafe for consumption and harmful to health.

So, what did the government do? It sent in its agencies.

The government's agencies sent some of their agents to certain markets to pick up Maggi products and perform tests on them.

Now, the government says that a pack of Maggi should ideally contain less than 2.5 ppm (parts per million) of lead.

That is, the government has told Nestle that if they want to sell Maggi in the market, it must contain less than 2.5 ppm of lead.

If it is more than that, then it is harmful and legal action will be taken.

So, this is what the government said: the average lead content should be less than 2.5 ppm.

This is the claim, which means the government has created a hypothesis.

To find out how much lead is actually in Maggi, we perform hypothesis testing.

How do we do that? We will see further.

But the basic idea is that if more than 1 million packs of Maggi are produced every day, we cannot check every product.

In that case, we have to draw a sample from the population.

And we perform the hypothesis testing on that sample.
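
The sampling idea can be sketched in Python. This is only an illustration, not the agencies' actual procedure: the population is simulated, and the numbers used (a mean of 2.3 ppm, a standard deviation of 0.4 ppm, a sample of 50 packs) are assumptions made up for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: lead content (ppm) of one day's production.
# The mean (2.3) and standard deviation (0.4) are assumed values.
POPULATION_SIZE = 1_000_000
population = [random.gauss(2.3, 0.4) for _ in range(POPULATION_SIZE)]

# We cannot test every pack, so we draw a simple random sample.
SAMPLE_SIZE = 50
sample = random.sample(population, SAMPLE_SIZE)

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)

print(f"Packs tested: {len(sample)} out of {len(population)}")
print(f"Sample mean lead content: {sample_mean:.3f} ppm")
print(f"Sample standard deviation: {sample_sd:.3f} ppm")
```

The sample mean and standard deviation computed here are the inputs that a statistical test would later compare against the 2.5 ppm limit.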

So, in this way, by using different tests and different approaches, we can carry out hypothesis testing.

In the last module, we saw what inferential statistics is.

In this module, we are going to see what hypothesis testing is. So what is the relation between the two, and what is the difference? Let's look at that once.

When is inferential statistics used? When we want to draw inferences about our population.

We drew samples from the population and built the sampling distribution.

Based on that, we estimated quantities such as the population mean, using confidence intervals.
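
As a reminder of what that estimation step looks like, here is a minimal sketch of a 95% confidence interval for a mean. The commute times are simulated (40 values around 35 minutes, both numbers assumed for illustration), and the interval uses the normal critical value, which is an approximation for a sample of this size.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical sample of 40 commute times in minutes (simulated values).
commute_times = [random.gauss(35, 8) for _ in range(40)]

n = len(commute_times)
sample_mean = statistics.mean(commute_times)
standard_error = statistics.stdev(commute_times) / math.sqrt(n)

# Critical value for a 95% interval from the standard normal (about 1.96).
z_critical = NormalDist().inv_cdf(0.975)

lower = sample_mean - z_critical * standard_error
upper = sample_mean + z_critical * standard_error

print(f"Estimated mean commute time: {sample_mean:.1f} minutes")
print(f"95% confidence interval: ({lower:.1f}, {upper:.1f}) minutes")
```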

When do we use hypothesis testing? Suppose I already have an assumed population mean, obtained through any method, maybe through inferential statistics. I want to check whether that claim holds up. That means collecting enough evidence from a sample that represents my population to say, with a stated level of confidence, whether the claim is supported.

So, to test our claim against the data, we use hypothesis testing.

We follow a few steps in hypothesis testing, which are as follows.

First of all, before performing any test, we have to state one null hypothesis and one alternative hypothesis.

Second, we perform a test on the hypotheses we have stated.

What are those tests? For example, the t-test, the z-test, and the ANOVA test. We will learn about them in detail later.

We will also see what the null and alternative hypotheses are, and how we form them, in future chapters.

After completing both steps, we have stated the null and alternative hypotheses and we have performed the test.

Then, in the third step, we make a decision: either we reject the null hypothesis or we fail to reject it.

How we do that, we will see in detail going ahead.

Lastly, we report our findings and conclusions on the basis of that decision.

So, using these steps, we will see how hypothesis testing is done.
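
The four steps can be sketched with the Maggi example. This is a rough illustration under stated assumptions, not the method covered later in the course: the sample summary (50 packs, mean 2.38 ppm, standard deviation 0.40 ppm) is made up, the framing of the hypotheses is one possible choice, and a left-tailed one-sample z-test with a 0.05 significance level is assumed.

```python
import math
from statistics import NormalDist

# Step 1: state the hypotheses (one possible framing, assumed here).
#   H0: mean lead content >= 2.5 ppm
#   H1: mean lead content <  2.5 ppm
LIMIT_PPM = 2.5
ALPHA = 0.05  # significance level, a conventional choice

# Hypothetical summary of a sample of packs (made-up numbers).
n = 50
sample_mean = 2.38
sample_sd = 0.40

# Step 2: perform the test (a left-tailed one-sample z-test).
standard_error = sample_sd / math.sqrt(n)
z_score = (sample_mean - LIMIT_PPM) / standard_error
p_value = NormalDist().cdf(z_score)  # probability of a z this low under H0

# Step 3: make the decision.
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"

# Step 4: report the conclusion.
print(f"z = {z_score:.2f}, p-value = {p_value:.4f}")
print(f"Decision: {decision}")
```

With these made-up numbers the p-value falls below 0.05, so the sketch rejects the null hypothesis. With different sample values the decision could go the other way.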

