
Sampling Distribution



Before we get to the central limit theorem, let's look at the different types of distributions.

We have already covered some of these, and others will be new.

There are three types of distributions.

The first is the population distribution, the second is the sample distribution, and the third is the sampling distribution.

The population distribution, as its name suggests, describes the population: the entire group about which we want to draw conclusions and whose characteristics we want to know.

We saw an example earlier: suppose I want the average height of everyone in India.

In that case, the heights of everyone in India form the population distribution.

We can define a mean and a variance from the population distribution.

We covered this in earlier chapters: if x1, x2, x3, up to xN are the values in the population, we denote the population size by capital N.

N is the total number of items in the population.

How do we find the population mean? We denote it by μ (mu), and its formula is the same as for an ordinary average.

We sum xi for i from 1 to N and divide by the population size N: μ = (x1 + x2 + ... + xN) / N.

We denote the population variance by σ² (sigma squared).

Its formula is the sum of (xi - μ)² for i from 1 to N, divided by N: σ² = Σ (xi - μ)² / N.
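As a quick check of these two formulas, here is a minimal Python sketch. The five height values are made up for illustration and do not come from the lecture, and `statistics.pvariance` from the standard library is used only to cross-check the hand-written formula.

```python
import statistics

# Hypothetical population of N = 5 heights in cm (made-up values).
population = [150, 160, 170, 180, 190]
N = len(population)

# Population mean: sum of all values divided by the population size N.
mu = sum(population) / N

# Population variance: mean squared distance from mu, with N in the denominator.
sigma_sq = sum((x - mu) ** 2 for x in population) / N

print(mu, sigma_sq)  # 170.0 200.0

# Cross-check against the standard library's population variance.
assert sigma_sq == statistics.pvariance(population)
```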

Next is the sample distribution. What is a sample? It is simply a subset taken from the population, that is, a small part of the population.

Generally, we take a sample from the population so that we can draw inferences about the population. For example, suppose the population is all the people in India.

If I select from them only the people who are below the poverty line, that group becomes my sample.

Why? Because we applied one type of sampling there: we imposed the below-poverty-line condition and created our sample.

The sample distribution is defined so that we can compute the sample mean and the sample variance from it.

If I take a few values from the population as a sample, x1, x2, x3, up to xn, we denote the sample size by small n.

So the population size is capital N and the sample size is small n.

We denote the sample mean by x̄ (x bar) and calculate it with the same formula as any mean: x̄ = (x1 + x2 + ... + xn) / n.

We denote the sample variance by s² (s squared). We calculate it the same way as the population variance, except that, as you may remember from earlier chapters, the denominator is n - 1 instead of n: s² = Σ (xi - x̄)² / (n - 1).

Dividing by n - 1 makes the sample variance a better estimate of the population variance.
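To make the n - 1 denominator concrete, here is a small Python sketch. The three sample values are made up for illustration, and `statistics.variance` (which also divides by n - 1) is used only as a cross-check.

```python
import statistics

# Hypothetical sample of n = 3 heights in cm (made-up values).
sample = [160, 170, 180]
n = len(sample)

# Sample mean: same formula as any average.
x_bar = sum(sample) / n

# Sample variance: squared distances from x_bar, divided by n - 1.
s_sq = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# For comparison only: dividing by n gives a smaller value.
divided_by_n = sum((x - x_bar) ** 2 for x in sample) / n

print(x_bar, s_sq)  # 170.0 100.0

# The standard library's sample variance uses the same n - 1 denominator.
assert s_sq == statistics.variance(sample)
```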

We have now seen the population distribution and the sample distribution.

Next we will look at a very important distribution, the one on which the central limit theorem is based.

That is the sampling distribution.

Before defining the sampling distribution, let's work through an example of how one is created.

Suppose I have a population of 100 students, and I am asked to select five students at random.

For my first sample, I randomly select any five students.

I then calculate the mean for those five students.

Suppose the mean of the first sample comes to 497.

Now I randomly select five students again to create a second sample.

I calculate the mean of the second sample as well.

Suppose that comes to 509.92.

In the same way, I randomly select five students for the third sample.

The third sample's mean comes to 506.02.

I create the fourth sample the same way.

I randomly select five students from the population of 100 and calculate their mean.

So what have we done here? First, we fixed a sample size for drawing random samples from the population.

What is the sample size here? Five.

Why? Because in every sample I randomly select five students.

So the sample size is 5, and we performed this draw 4 times, which means I created 4 samples.

For each of those 4 samples, I calculated the mean.

When I plot each of these means on one graph, the result is my sampling distribution curve.

I calculated the mean of sample one and plotted it on the graph, as you can see here.

I calculated the mean of the second sample and plotted that too.

I did the same for the third and fourth samples.

So how do we create a sampling distribution? From a population, I draw several samples.

All of those samples must have the same fixed sample size.

We calculate the mean of each sample, and the distribution of those means is the sampling distribution.

The curve shown here is the sampling distribution curve: the sample means are plotted on the x axis.

In short, when we plot the means of different samples, the curve we get is called the sampling distribution.

There is one thing to keep in mind here.

Sample size and number of samples are two different terms.

In the example we just saw, the sample size was 5.

Why? Because every sample contains only five students.

But the number of samples is four.

Why? Because I repeated the draw four times.
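The procedure described here can be sketched in a few lines of Python. This is only an illustration: the population of 100 scores is randomly generated (the lecture does not give the underlying data), and the names `sample_size` and `number_of_samples` are chosen here to keep the two terms apart.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 100 student scores (made-up values).
population = [rng.randint(400, 600) for _ in range(100)]

sample_size = 5        # n: how many students are in each sample
number_of_samples = 4  # how many times the draw is repeated

sample_means = []
for _ in range(number_of_samples):
    # Randomly pick 5 distinct students from the population.
    sample = rng.sample(population, sample_size)
    sample_means.append(statistics.mean(sample))

# These means are the points that would be plotted on the x axis.
print(sample_means)
```

With only four samples the plot is just four points. Raising `number_of_samples` while keeping `sample_size` fixed fills in the shape of the sampling distribution.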

So now we have seen what a sampling distribution is.

Next, let's see how the sampling distribution is affected when the sample size n increases or decreases. To understand this, we'll take an example with a fixed population and several different sample sizes.

Suppose I have a population made up only of the numbers 0 and 1.

That is, every value in the population is either 0 or 1.

Half of the values in the population are 0 and half are 1.

This means that the population mean is 0.5.

If I take n = 1, each sample contains just one value drawn from the 0s and 1s.

So each sample mean is either 0 or 1, while the population mean is 0.5.

Now if I take n = 5, the sample mean can take the values 0, 0.2, 0.4, 0.6, 0.8 and 1.

We calculate the sample means for n = 5 and plot them.

Next I take n = 10, which means I keep increasing my sample size.

With n = 10, the sample mean can take the values 0, 0.1, 0.2, up to 1.

Again we calculate the sample means and plot them.

Then we increase n further, to n = 20.

We calculate the sample means for that case and plot them too.

We ran this experiment and drew all the curves. What we are doing here is plotting the sample means, because we want to see how the sampling distribution curve changes as n keeps increasing.

For n = 1 there are only two possibilities, 0 or 1.

Both are equally likely, so the two bars in the graph, shown in yellow, are of equal height.

As n increased, the curve started to look a little more like a normal distribution.

That is, the curve takes smaller values at the lower and upper ends, and the values in the middle increase.

You can see the same thing in the curve for n = 10, shown in blue: the values at the edges shrink and the values in the centre grow.

The curve for n = 20 looks very much like a bell curve.

This shows a very interesting property of the sampling distribution: the larger the sample size, the closer the sampling distribution curve is to a normal, bell-shaped curve.

It is useful for us when the curve is a normal distribution, because then we can use the properties of the normal distribution in our analysis.
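The shrinking edges can be checked exactly with a short Python sketch. It assumes that each value in a sample is drawn independently, with 0 and 1 equally likely; under that assumption the sample mean equals k / n with probability C(n, k) / 2^n. The function name `sample_mean_distribution` is made up for this example.

```python
from math import comb

def sample_mean_distribution(n):
    """Exact distribution of the sample mean for n independent draws
    from a population that is half 0s and half 1s."""
    # k is the number of 1s in the sample, so the sample mean is k / n.
    return {k / n: comb(n, k) / 2 ** n for k in range(n + 1)}

for n in (1, 5, 10, 20):
    dist = sample_mean_distribution(n)
    # Probability that the sample mean lands on an extreme value (0 or 1).
    extremes = dist[0.0] + dist[1.0]
    print(f"n={n:2d}  possible means={len(dist):2d}  P(mean is 0 or 1)={extremes:.6f}")
```

The probability of an extreme sample mean falls from 1 at n = 1 to 1/16 at n = 5 and keeps shrinking as n grows, which is the behaviour of the edges of the curves described in the lecture.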

So we have seen one very important property of the sampling distribution.

Another property of the sampling distribution is that it lets us find the standard error of the mean.

There is no need to worry about the term standard error of the mean.

It is simply the standard deviation of the sampling distribution. If we know the population standard deviation, we can use it to calculate the standard deviation of the sampling distribution directly.

We call it the standard error of the mean.

If I have the population standard deviation σ and a sample size n, then the standard deviation of the sampling distribution, which we denote σ_x̄ (sigma x bar), is σ divided by the square root of n: σ_x̄ = σ / √n. This is a very important formula.

It will be used in many places later on.

So we need to remember exactly what each symbol means.

σ_x̄ is the standard deviation of the sampling distribution.

σ is the population standard deviation and n is the sample size.

We call this formula the standard error of the mean.

In many situations the population standard deviation is not known.

In that case we use the sample standard deviation in place of σ to estimate the standard deviation of the sampling distribution.

So far we have seen two properties of the sampling distribution.

The standard deviation of the sampling distribution, also called the standard error, is σ / √n.

We also denote it by SE.
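As a rough check of the standard error formula, the Python sketch below compares σ / √n with the standard deviation of many simulated sample means. Everything in it is an assumption made for illustration: the population is 10,000 made-up values, the sample size is 25, and samples are drawn with replacement using `random.choices`.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values (made-up data).
population = [rng.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

n = 25  # sample size
standard_error = sigma / math.sqrt(n)  # SE = sigma / sqrt(n)

# Build a sampling distribution from 5,000 samples of size n.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(5_000)
]
empirical_sd = statistics.stdev(sample_means)

# When sigma is unknown, the sample standard deviation s stands in for it.
one_sample = rng.choices(population, k=n)
estimated_se = statistics.stdev(one_sample) / math.sqrt(n)

print(round(standard_error, 3), round(empirical_sd, 3), round(estimated_se, 3))
```

The first two printed numbers should be close to each other. The third is an estimate from a single sample, so it can differ more.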

One more important property of the sampling distribution concerns its mean.

We have seen the standard deviation of the sampling distribution.

Now let's look at its mean.

Suppose I already know the population mean, which we denote by μ, and I want to find μ_x̄ (mu x bar), the mean of the sampling distribution.

This is another interesting property of the sampling distribution.

μ_x̄ is equal to μ: if we take the means of different samples and build a sampling distribution from them, the mean of that sampling distribution equals the population mean.
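This property can be verified exactly on a tiny example. The Python sketch below uses a made-up population of five values and lists every possible sample of size 2, so no randomness is involved. Enumerating all samples is a choice made for this illustration, not something the lecture does.

```python
from itertools import combinations

# Hypothetical population of N = 5 values (made-up data).
population = [2, 4, 6, 8, 10]
mu = sum(population) / len(population)  # population mean

n = 2  # sample size

# Every possible sample of size n, and the mean of each one.
sample_means = [sum(sample) / n for sample in combinations(population, n)]

# Mean of the sampling distribution (mu x bar).
mu_x_bar = sum(sample_means) / len(sample_means)

print(mu, mu_x_bar)  # 6.0 6.0
```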

So we have seen two important properties here.

One is how to calculate the standard error, and the other is the mean of the sampling distribution.

Now suppose I have a population that is normally distributed, as shown in this curve.

I draw the sampling distribution curve for different sample sizes.

When n = 5 and the population is normally distributed, the sampling distribution is also a normal distribution.

If I increase n from 5 to 30, the curve is still normally distributed.

If I overlay all three curves, the only difference I see is that the smaller n is, the flatter the curve is. The sampling distribution is normally distributed whenever the population distribution is normally distributed.
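The flatter curve for smaller n corresponds to a larger spread of the sample means. Here is a minimal simulation sketch in Python, assuming a normal population with mean 100 and standard deviation 10 (both values are made up) and 2,000 samples per sample size.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

MU, SIGMA = 100.0, 10.0  # hypothetical normal population
NUM_SAMPLES = 2_000      # number of samples drawn for each sample size

def spread_of_sample_means(n):
    """Standard deviation of the means of NUM_SAMPLES samples of size n."""
    means = [
        statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(NUM_SAMPLES)
    ]
    return statistics.stdev(means)

spread = {n: spread_of_sample_means(n) for n in (5, 30)}

for n, sd in spread.items():
    # Compare the simulated spread with sigma / sqrt(n).
    print(n, round(sd, 2), round(SIGMA / math.sqrt(n), 2))
```

The spread for n = 5 comes out larger than for n = 30, so the n = 5 curve is wider and flatter, while both stay centred on the population mean.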

This brings us to a very important property of the sampling distribution, one that leads to the central limit theorem. We have just seen that when the population distribution is normal, the sampling distribution is also normal.

But what if the population distribution is not normal?

Whether or not the population is normally distributed, the sampling distribution is approximately normal once n is at least 30.

That is, when n is at least 30, the sampling distribution follows an approximately normal distribution irrespective of the population distribution.

Let's look at three examples.

In the first, the population is uniformly distributed. The sampling distribution for n = 5 is a curve that does not look normally distributed, but the sampling distribution for n = 30 does look normally distributed.

In the second case the population has an arbitrary, irregular distribution.

When samples are drawn with n = 5, the sampling distribution looks like an irregular curve, and nothing definite can be said about it.

But when I plot the sampling distribution for n = 30, it is approximately normally distributed.

The third case is the same, with a right-skewed population distribution.

For n = 5 the sampling distribution looks only somewhat like a normal distribution.

But once n becomes 30, the sampling distribution looks like a normal distribution.

So the sampling distribution has a very important property: whether or not the population is normally distributed, once the sample size is 30 or more, the sampling distribution approximately follows a normal distribution.
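The right-skewed case can be explored with a small simulation. The Python sketch below is an illustration under stated assumptions: the population is an exponential distribution with mean 1 (chosen here as a convenient right-skewed example, not taken from the lecture), and skewness, which is about 0 for a symmetric bell curve, is used as a simple measure of how far the sampling distribution is from normal.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable
NUM_SAMPLES = 10_000       # number of samples drawn for each sample size

def skewness(values):
    """Skewness of a list of values: about 0 for a symmetric curve."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(n):
    """Means of NUM_SAMPLES samples of size n from a right-skewed population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(NUM_SAMPLES)
    ]

skew = {n: skewness(sample_means(n)) for n in (5, 30)}

for n, value in skew.items():
    print(n, round(value, 2))
```

The skewness for n = 30 comes out clearly smaller than for n = 5 but not exactly zero, which is why the result is best described as approximately normal.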

If you have any comments or questions related to this course then you can click on the discussion button below this video and you can post them over there.

In this way, you can connect with other learners like you and you can discuss with them.
