
Central Limit Theorem


Overview

In the last unit, we saw how the central limit theorem follows from the properties of the sampling distribution.

There are three important properties. First, the mean of the sampling distribution, mu x bar, is equal to the population mean, mu.

Second, the standard deviation of the sampling distribution, which we call the standard error, is the population standard deviation divided by the square root of the sample size, which means sigma / √n.

Third, if the sample size is greater than 30, the sampling distribution is approximately normal, so we can use the properties of the normal distribution in our analysis.
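The three properties above can be checked with a small simulation. This is a minimal sketch, not something from the lesson: the skewed population, the seed, and the number of repeated samples are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately non-normal population of 30,000 values
# (exponential with mean 10). These numbers are not from the lesson.
population = [random.expovariate(1 / 10) for _ in range(30_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 100  # sample size, greater than 30

# Draw many random samples of size n and record each sample mean.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)  # property 1: close to pop_mean
observed_se = statistics.stdev(sample_means)    # property 2: close to sigma / sqrt(n)
expected_se = pop_sd / n ** 0.5

print(f"population mean      : {pop_mean:.2f}")
print(f"mean of sample means : {mean_of_means:.2f}")
print(f"observed std. error  : {observed_se:.2f}")
print(f"sigma / sqrt(n)      : {expected_se:.2f}")
```

Plotting a histogram of `sample_means` would also show the roughly bell-shaped curve that the third property describes, even though the population itself is skewed.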

Now, let's see where and why the central limit theorem is used.

We know that to find the characteristics of any population, we work with samples, because gathering the data of every individual value in the population is a time-consuming task.

If we instead take a sample, find its characteristics, and infer the population's characteristics from them, we save time, and the properties of the central limit theorem let us perform that analysis.

One very important role of the central limit theorem is this: if the population mean is not given to me, and I have only the sample's mean and the sample's standard deviation, I can still estimate the population mean.

Let's see through one example how we can estimate the population mean by using the central limit theorem.

Let's imagine that I have 30,000 employees in the office.

Those 30,000 employees commute daily from home to office.

How much time do they take on average to travel between home and office? We call this the mean commute time: we take the time each employee spends commuting and average it over all the employees.

So these 30,000 employees are my population, and I have to find the population mean.

Now, in this situation I cannot gather the data of all 30,000 employees.

So we take a sample instead. Suppose that from the 30,000 employees I randomly select 100, record their commute times, and calculate their mean.

Let's say the sample mean, x bar, comes to 36.6 minutes.

And the sample's standard deviation, which we denote as s, comes to 10 minutes.

So we have the sample mean and the sample's standard deviation, and from these we have to find the population mean.

In such situations, we can apply the central limit theorem.

In which way? The first property of the central limit theorem tells us that the mean of the sampling distribution is equal to the population mean.

What we have to find is the population mean, and the other properties of the central limit theorem help us estimate it.

Second, the standard error, which is the standard deviation of the sampling distribution, is given by sigma / √n.

But since the population's standard deviation is not given here, I use the sample's standard deviation in its place, so the formula becomes s / √n.

Here my sample size is 100 and the value of s is 10, so my standard error comes to 10 / √100 = 1 minute.
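The standard error calculation above can be written out as a short sketch. The values of s and n are the ones from the commute example; the variable names are only illustrative.

```python
import math

s = 10.0  # sample standard deviation in minutes (from the example)
n = 100   # sample size (from the example)

# Standard error estimated from the sample: s / sqrt(n)
standard_error = s / math.sqrt(n)

print(standard_error)  # 1.0
```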

Since n is 100, which is greater than 30, we can say that my sampling distribution is approximately normal, and by using the properties of the normal distribution we can estimate our population mean.

But here we are inferring the population mean directly from the sample mean, and it is possible that the sampling technique we used introduced some error.

We denote the error committed while sampling with one term, which is the “margin of error”.

So it is absolutely possible that the mean commute time of those 30,000 employees is not exactly 36.6 minutes. It could be 36.6 + 1 minutes, 36.6 + 3 minutes, or 36.6 + 10 minutes.

In this way we put some margin of error, plus or minus, in the sample mean.

So, let's suppose my margin of error is plus or minus 2 minutes.

This means the population mean, mu, that I have to estimate should lie between 36.6 - 2 and 36.6 + 2.

So, in this particular scenario we will see how the normal distribution helps us.

You might remember the 1-2-3 rule of the normal distribution.

It says that values within 2 standard deviations behind or ahead of the mean have a probability of 95.4%, which means that much area under the curve is equal to 95.4%.

So, consider the probability of lying between the mean minus 2 standard deviations and the mean plus 2 standard deviations.

Here 36.6 is my mean and the standard error is 1 minute, so going 2 standard deviations ahead or 2 standard deviations behind means going 2 minutes either way.

That range has a probability equal to 95.4%.
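The 1-2-3 rule figures can be reproduced from the standard normal curve. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_within` is an assumption made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```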

In the same example, if I say that my population mean is between 36.6 - 2 and 36.6 + 2, I can use this normal distribution property and attach a value of 95.4% to that statement.

In other words, I am 95.4% confident that my population mean lies in this range.

So this is my confidence level, which means 95.4% is my confidence level.

The maximum error that we have allowed is what we call the margin of error. Here, plus or minus 2 minutes is my margin of error.

One important thing: the interval that was created here, from 36.6 minus 2 to 36.6 plus 2, is what we call the confidence interval.

Basically, I am 95.4% confident that my population mean commute time lies between 34.6 and 38.6 minutes.

So, we are 95.4% sure, with a margin of error of two minutes.
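The link between the 2-minute margin of error and the 95.4% confidence level can be sketched as follows. The numbers come from the commute example; expressing the margin in units of the standard error (the variable `z`) is the step the sketch makes explicit.

```python
from statistics import NormalDist

x_bar = 36.6           # sample mean in minutes (from the example)
standard_error = 1.0   # s / sqrt(n) = 10 / sqrt(100) (from the example)
margin_of_error = 2.0  # minutes (from the example)

# Express the margin of error in units of the standard error.
z = margin_of_error / standard_error

# Area under the standard normal curve between -z and +z.
confidence_level = NormalDist().cdf(z) - NormalDist().cdf(-z)

interval = (x_bar - margin_of_error, x_bar + margin_of_error)
print(f"{confidence_level:.1%} confident: "
      f"{interval[0]:.1f} to {interval[1]:.1f} minutes")
# 95.4% confident: 34.6 to 38.6 minutes
```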

In this way, across different scenarios and use cases, we can use the central limit theorem to estimate the population mean.

Now let's go beyond this example and see how we can use general formulas to find confidence levels and confidence intervals.

Let's assume that I have a sample whose size is n, mean is x bar, and standard deviation is s.

If I have to derive a confidence interval for a confidence level of y percent, the formula that I will use is x bar ± Z star × s / √n.

Here x bar is simply my sample’s mean.

Z star is a new term, so let's see what it is.

Z star is the Z score associated with a y percent confidence level. I have different confidence levels, such as 90%, 95%, and 99%, and the corresponding value that I look up in the Z table is called Z star.

For some common confidence levels, the Z star values are as follows: if my confidence level is 90%, which means I'm 90% confident about my population mean, Z star is 1.65; for 95% it is 1.96; and for 99% it is 2.58.

We find these values directly from the Z table, because Z star is a Z score.
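Instead of reading a printed Z table, the same Z star values can be computed from the inverse of the standard normal CDF. This is a sketch using the standard-library `statistics.NormalDist`; the function name `z_star` is an assumption for illustration. The results match the table values once rounded to two decimals.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Two-sided critical value: leaves (1 - confidence_level) / 2 in each tail."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z star = {z_star(level):.3f}")
# 90%: Z star = 1.645
# 95%: Z star = 1.960
# 99%: Z star = 2.576
```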

Now, we have seen the formula for the confidence interval.

In it, the term Z star × s / √n denotes my margin of error, which means it captures how far the sample mean may be from the population mean.

That was the general case, so let's apply it.

Suppose I have to find the 90% confidence interval for the commute time, where x bar is 36.6 minutes and s, the sample's standard deviation, is 10 minutes.

The sample size is 100, and the confidence level I have been given is 90%.

A 90% confidence level means we need the corresponding Z score, which is Z star, and we have seen in the table that for 90% the Z star value is 1.65.

I have all the values, so I put them into the formula: 36.6 ± 1.65 × 10 / √100 = 36.6 ± 1.65.

That gives me a range for my population mean: I'm 90% confident that my population mean is between 34.95 minutes and 38.25 minutes.
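The general formula can be wrapped in a small helper and applied to the commute example. This is a minimal sketch: the function name `confidence_interval` and its parameter names are illustrative, and Z star is passed in as the table value 1.65 used in the lesson.

```python
import math


def confidence_interval(x_bar, s, n, z_star):
    """Return (low, high) for x_bar ± z_star * s / sqrt(n)."""
    margin_of_error = z_star * s / math.sqrt(n)
    return x_bar - margin_of_error, x_bar + margin_of_error


# Commute example: x bar = 36.6, s = 10, n = 100, 90% level (Z star = 1.65)
low, high = confidence_interval(x_bar=36.6, s=10, n=100, z_star=1.65)
print(f"{low:.2f} to {high:.2f} minutes")  # 34.95 to 38.25 minutes
```

Calling the same helper with Z star set to 1.96 or 2.58 gives the wider 95% and 99% intervals for the same sample.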

So, the confidence level denotes how confident you are when stating your result.

Confidence levels and confidence intervals will be extremely useful in the coming modules.

For example, if you're working in a pharmaceutical company, which means a company where medicines are made, there is an immense need for confidence levels and confidence intervals.

Why? Because saying that I'm only 50% confident that a medicine will work is very different from saying that I'm 99% confident that it will work.

That difference affects a lot of analysis. We will cover this in hypothesis testing, where we apply the central limit theorem to test claims about the mean.

So, in this module, we have covered how the sampling distribution creates a base for the central limit theorem.

In the coming modules we'll be covering what is hypothesis testing and how we use different types of tests.


If you have any comments or questions related to this course then you can click on the discussion button below this video and you can post them over there.

In this way, you can connect with other learners like you and you can discuss with them.



