
Two Sample Proportion Test

Overview

In this module, we will see why a two-sample proportion test is performed.

Many times, the sample observations we have are in categorical form.

What does categorical form mean? The responses in the data may fall into categories such as true or false, 1 or 0, yes or no, or male or female.

It is also possible that the data gives us results in the form of success and failure.

In such situations, the analysis is based not on a total value but on a proportion: the fraction of observations that fall into one category. When we want to compare such proportions between two samples, we use the two-sample proportion test.

To understand this, let's take an example.

Suppose a researcher or a doctor wants to find out how effective an anti-stress pill is: he has made a new medicine that relieves stress.

To measure its effectiveness, he compared his medicine with the existing standard medicine. These two groups form our two samples.

But how have we counted the proportions here?

The test was performed on 32 people in each group.

The result was that 15 out of the 32 people taking the new medicine reported stress symptoms, while 20 out of the 32 people taking the standard medicine reported stress symptoms. These counts give us the proportions in the current scenario.

If we call the first proportion p1, it is 15/32, because out of 32 people, 15 still had stress even after taking the new medicine. That is about 0.469, or roughly 47% of people.

Similarly, p2 is 20/32 = 0.625, so 62.5% of the people taking the standard tablet still had stress after taking it.
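The two proportions above are simply counts divided by sample sizes. A minimal Python sketch using the counts from this example (the variable names are our own choice, not from the lesson):

```python
# Counts from the example: people who still reported stress symptoms
x1, n1 = 15, 32  # new medicine
x2, n2 = 20, 32  # standard medicine

p1 = x1 / n1  # sample proportion for the new medicine
p2 = x2 / n2  # sample proportion for the standard medicine

print(f"p1 = {p1:.4f} ({p1:.1%})")
print(f"p2 = {p2:.4f} ({p2:.1%})")
```

Here p1 is 0.46875 and p2 is 0.625, matching the hand calculation.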

Looking at these proportions, it may seem that the standard tablet (p2) fails more often.

But we have to determine through our analysis and test which of the two medicines is actually better: the standard one or the new one we have tested.

In this situation, what will our null and alternate hypotheses be?

The null hypothesis is that the two proportions p1 and p2 are equal (H0: p1 = p2). Why? Because in a two-sample proportion test we are checking whether two proportions differ.

The alternate hypothesis is that p1 is not equal to p2 (H1: p1 ≠ p2).

In this way, we have set up the null and alternate hypotheses.

What is the next step? We calculate the test statistic using a formula.

Since we have two proportions, we generally calculate a Z value, using the following formula:

z = (p1 - p2) / sqrt( P*(1 - P*)(1/n1 + 1/n2) )

Here, p1 and p2 are the sample proportions of sample 1 and sample 2.

P* is the combined (pooled) proportion of both samples, calculated as (x1 + x2) / (n1 + n2).

x1 and x2 are the number of observations in each sample that fall into the category of interest (here, people who still had symptoms).

n1 and n2 are the sizes of the first and second samples.
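Plugging the example's counts (x1 = 15, x2 = 20, n1 = n2 = 32) into the pooled two-proportion formula gives a worked value. This is a hand calculation for illustration, not the Excel output itself:

```latex
P^* = \frac{x_1 + x_2}{n_1 + n_2} = \frac{15 + 20}{32 + 32} = \frac{35}{64} \approx 0.5469

z = \frac{p_1 - p_2}{\sqrt{P^*(1 - P^*)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
  = \frac{0.46875 - 0.625}{\sqrt{0.5469 \times 0.4531 \times \left(\frac{1}{32} + \frac{1}{32}\right)}}
  \approx \frac{-0.15625}{0.1244} \approx -1.26
```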

When we put these values into the formula, we get a single value, which is our test statistic.

We then compare it with the critical value from the Z table and reach a decision.
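The whole procedure, from counts to decision, fits in a few lines. This is a minimal sketch assuming the pooled proportion P* described above and a two-tailed test at 5%; the function name `two_proportion_z_test` is hypothetical, and the critical value comes from Python's standard-library `statistics.NormalDist` instead of a printed Z table. Other tools may offer an unpooled variance option, so their reported Z can differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Pooled two-sample proportion z-test, two-tailed (H1: p1 != p2)."""
    p1 = x1 / n1
    p2 = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # P*, the combined proportion
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed: alpha is split equally between both tails
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z) > z_crit
    return z, z_crit, reject


# Medicine example: 15/32 (new) vs 20/32 (standard)
z, z_crit, reject = two_proportion_z_test(15, 32, 20, 32)
print(f"z = {z:.4f}, critical value = ±{z_crit:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

For the medicine data, |z| is about 1.26, below the critical value of about 1.96, so the sketch fails to reject H0.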

This is the general procedure we follow for a two-sample proportion test.

Now let's see how easy the same analysis becomes if we perform it in an Excel sheet. You have seen which formulas the two-sample proportion test uses and how the result is obtained.

Since the calculations involve several steps, and we have already worked through two or three tests in earlier modules (and will see more in the coming ones), here we will go directly to Excel to perform the two-sample proportion test.

Suppose our data covers 32 people in each group.

The column on the left is the new medicine's effectiveness, that is, the effect of the new medicine, and the data is already in categorical form: people who still had symptoms after taking the tablet have the value 1, and people who did not have symptoms have the value 0.

In this way, I have data for 32 people in total, and the column sums to 15, which means 15 people did not benefit from the tablet.

Similarly, the second column is the standard medicine's effectiveness, which also has 32 values, and its sum is 20. This means 20 people still had symptoms even after taking the standard tablet.
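The same 0/1 encoding makes the proportion easy to get in code: the column's sum is the count x, and the sum divided by the number of rows (the mean of the column) is the proportion. A sketch with hypothetical columns that only match the lesson's totals; the row order is made up:

```python
# Hypothetical 0/1 columns mirroring the Excel sheet:
# 1 = still had stress symptoms after the tablet, 0 = no symptoms.
# Only the totals (15 and 20 out of 32) come from the lesson; row order is made up.
new_medicine = [1] * 15 + [0] * 17
standard_medicine = [1] * 20 + [0] * 12

x1, n1 = sum(new_medicine), len(new_medicine)  # like Excel's SUM and COUNT
x2, n2 = sum(standard_medicine), len(standard_medicine)

p1 = x1 / n1  # the mean of a 0/1 column is the proportion of 1s
p2 = x2 / n2
print(x1, n1, p1)
print(x2, n2, p2)
```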

We have already calculated p1 and p2: p1 is 15/32 and p2 is 20/32.

So what are the null and alternate hypotheses here? The null hypothesis is p1 = p2, and the alternate hypothesis is p1 ≠ p2, which means the two proportions are not the same.

However, Excel's built-in Data Analysis tool cannot perform a two-sample proportion test.

We have to download a separate add-in, XLSTAT. If you simply search Google for "XLSTAT download", you can get a free trial from its site, which gives you 14 days free if you need it.

If you're a Windows user, you can click the Windows download directly, and if you're a Mac user, you can click "Download XLSTAT for Mac".

You will get an installer file; run it. Once you run the XLSTAT installer, the XLSTAT add-in appears in your Excel sheet.

When you click on XLSTAT, you will see a lot of options.

Since we are performing a hypothesis test, we click on "Test a hypothesis".

We then go to the parametric tests, where we find the test for two proportions.

When I click on the test for two proportions, it asks for the data format: frequencies or proportions.

Since we know the counts for both of our variables, we choose frequency.

Here, frequency 1 is 15, because in the first sample there are 15 people who still have symptoms, and the sample size of sample 1 is 32.

Frequency 2 is 20, and sample size 2 is 32.

It also asks for an output range, that is, where the results should be placed. We simply select any empty range where we want the values.

Next, we go to the options.

In the options, you choose the alternate hypothesis that proportion 1 and proportion 2 are not equal, that is, the difference between proportion 1 and proportion 2 is not equal to the hypothesised difference. This is our alternate hypothesis that p1 and p2 are not the same.

Our hypothesised difference is zero.

The significance level we use is 5%.

As soon as we fill in all these values and press OK, we get the entire analysis for the test for two proportions.

It lists the values we entered: frequency 1, frequency 2, and the sample sizes of the first and second samples, along with the hypothesised difference of 0 and the significance level of 5%.

From all of these, it computes the Z statistic for the proportion test.

The test is two-tailed. Why? Because the alternate hypothesis is "not equal to" (p1 ≠ p2), a difference in either direction counts as evidence against the null hypothesis, so the rejection region is split between both tails.
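To see why the two-tailed choice matters, this sketch (using Python's standard-library `statistics.NormalDist`) compares the critical value at a 5% significance level when alpha is split across two tails versus placed in a single tail. The one-tailed case is shown only for contrast; the test in this lesson is two-tailed.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test (H1: p1 != p2): alpha/2 in each tail, reject if |z| > crit
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)

# One-tailed test: all of alpha in one tail.
# For a lower-tailed H1 such as p1 < p2, reject if z < -one_tailed_crit.
one_tailed_crit = std_normal.inv_cdf(1 - alpha)

print(f"two-tailed critical value: ±{two_tailed_crit:.3f}")
print(f"one-tailed critical value: {one_tailed_crit:.3f}")
```

The two-tailed cut-off (about 1.96) is further out than the one-tailed one (about 1.645), because each tail only gets half of alpha.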

So, for the two-proportion test, we get the entire analysis in one simple click.

In it, you can see that alpha is 0.05.

The output also shows the difference between the two proportions.

Now compare the observed Z value with the critical value: the critical value is greater than the absolute value of the observed Z.

Since the observed value is smaller than the critical value in absolute terms, our observation lies in the acceptance region, which means we fail to reject the null hypothesis.
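An equivalent check is the p-value method: for a two-tailed test, reject the null hypothesis only if the p-value is below alpha. This sketch assumes an observed z of about -1.2555, which is what the pooled formula gives for 15/32 vs 20/32 (a tool's reported value may differ slightly if it uses a different variance estimate); the helper name `two_tailed_p_value` is our own.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
z_obs = -1.2555  # observed z for the medicine example (pooled formula, by hand)
p_value = two_tailed_p_value(z_obs)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Both rules lead to the same decision
print(f"|z| = {abs(z_obs):.4f} vs critical value {z_crit:.4f}")
print(f"p-value = {p_value:.4f} vs alpha {alpha}")
print("Fail to reject H0" if p_value > alpha else "Reject H0")
```

The p-value here is roughly 0.21, well above 0.05, which agrees with the critical-value comparison.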

So, with a simple tool, we performed the two-sample proportion test in a single click.

Since we cannot reject the null hypothesis, we conclude that the data does not show a significant difference between the two proportions. In other words, at the 5% level there is not enough evidence that the new tablet is more effective than the standard one. (Failing to reject does not prove that the two proportions are exactly equal.)

In this way, we saw how the two-sample proportion test works with the formula, and how to calculate its Z value directly in Excel.

Now, one very important point: the two-sample proportion test you have just studied has a basic and important application in industry, which is A/B testing.

A/B testing is a test that compares the proportions of two samples. It is widely used in the e-commerce industry and in promotions and marketing.

How does it work? Let's consider a website that we have.

We create two versions of it: one with a "Buy Now" button and one with a "Shop Now" button. The rest of the site and page template are the same.

In the A/B test, half of the audience sees the "Buy Now" version, and the other 50% sees the "Shop Now" version.
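The lesson does not say how visitors are split in half. One common approach, shown here as an assumption rather than the lesson's method, is to hash each user's ID so that every visitor lands in a fixed half and always sees the same button. The function name `assign_variant` and the experiment label are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "buy_vs_shop") -> str:
    """Deterministically assign a user to one of two variants (roughly 50/50)."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 2  # 0 or 1
    return "Buy Now" if bucket == 0 else "Shop Now"


# Simulated visitors: the split comes out close to half and half
users = [f"user{i}" for i in range(10_000)]
assignments = [assign_variant(u) for u in users]
print(assignments.count("Buy Now"), assignments.count("Shop Now"))
```

Hashing instead of a fresh random draw means a returning visitor is not shown a different button on every visit, which would mix the two samples.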

You may wonder why we divide the audience in half. The reason is that A/B testing splits the population into separate samples, shows each sample a different version, and collects their responses so that the two groups can be compared.

What happened when we ran the test? More people clicked "Buy Now" than "Shop Now".

Why might that be? Perhaps "Buy Now" makes buying feel more direct than "Shop Now".

From this analysis, we found that "Buy Now" received 17% more clicks than "Shop Now".
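A figure like "17% more clicks" is a relative lift, and whether such a lift is real or just noise is exactly what the two-sample proportion test checks. The lesson gives no visitor counts, so this sketch uses hypothetical counts chosen only so that the lift comes out at 17%; the function name `ab_test` is our own.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Relative lift of B over A plus a pooled two-proportion z-test (two-tailed)."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    lift = (rate_b - rate_a) / rate_a  # "x% more clicks" is a relative change
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return lift, z, p_value


# Hypothetical counts for illustration only (not from the lesson):
# A = "Shop Now", B = "Buy Now"
lift, z, p = ab_test(clicks_a=600, visitors_a=10_000, clicks_b=702, visitors_b=10_000)
print(f"lift = {lift:.1%}, z = {z:.2f}, p-value = {p:.4f}")
```

With these made-up counts, z is about 2.9 and the difference is significant at 5%; with far fewer visitors, the same 17% lift could fail to reach significance.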

Many e-commerce sites use this testing method. It is very widely used, generally to improve the user experience.

Let's see one more example.

Suppose I have two different page designs. The one on the left first shows a promotional banner ("Pre-order and get xyz"), with a ticket-booking option below it.

On the right, we removed the promotional banner and kept only the ticket-booking option.

When these versions were tested on different samples, the version without the promotional banner got roughly 42 to 43% more checkouts; people chose the right-hand version far more often than the left.

So, we perform A/B testing to compare the proportions of two different samples: we show each version to part of the audience and collect their responses, so that the business can focus on what works.

It is used to compare two different populations.

It is widely used on e-commerce websites, for example when they want to change the shape of a button or other elements of the site.

In short, A/B testing is heavily used to improve the user experience, and it is a widely used application of the two-sample proportion test.

In this way, we have covered the different types of tests for comparing two samples.

If you have any comments or questions about this course, click the discussion button below this video and post them there.

This way, you can connect and discuss with other learners like you.
