In this chapter we will look at a very important topic in statistics: the central limit theorem.
But before we get to the central limit theorem, there are a few concepts that we need to understand first.
Why do we do sampling? What are the different types of sampling methods? What is a sampling distribution? After that, we will cover what the central limit theorem is.
So, first of all, let's see what sampling is and why we need it. As we have seen so far, whenever we have to draw inferences or conclusions about a population, we take a small sample from that population.
And through that sample, we draw our conclusions.
Why do we do it this way? It saves time, and usually we cannot study the entire population, so we study a small sample and use it to describe the characteristics of the population.
Now we know why we do sampling, but we must also know how to select the sample so that it is representative of the entire group.
Which means the sample that I select should reflect my population as closely as possible.
So, we have two ways of sampling: one is probability sampling and the second is non-probability sampling.
Now, what is probability sampling? It is random selection, which means that when I draw a sample from my population data, I pick the values at random. Probability sampling is what we generally use for statistical inference.
Why? Because data selected at random tends to be representative of the entire population.
We do non-probability sampling when we want to work as per our convenience.
Which means my main focus is on collecting the data easily, with as little effort as possible.
So, the sampling there is non-random: we define some criterion or approach that is easy for us and select the sample that way.
But in this module, since we are learning inferential statistics, we will look at the different types of probability sampling and the ways we can do the sampling, and then we will build on that for the central limit theorem.
So, let's look at the sampling methods that we have.
Probability sampling can be done in four ways.
First is simple random sampling, second is stratified sampling, third is systematic sampling and fourth is cluster sampling.
We will see them all in detail going ahead.
There are also four types of non-probability sampling: convenience sampling, voluntary response sampling, purposive sampling and snowball sampling.
We will not cover non-probability sampling in this unit. Why? Because it is mainly about collecting data easily, and it does not support the inferential statistics we are studying here.
So, come, let's see the different types of probability sampling.
My first type is simple random sampling.
As the name suggests, this is the simplest method of sampling, and it is also widely used.
When we select a sample, we mostly use simple random sampling.
Its major characteristic is this: suppose I have N population members.
Which means I have an entire population whose size is denoted by capital N.
In simple random sampling, every member of that population has an equal chance of being selected.
So, let's take an example. I have 1000 employees in total in a company, which means my population size, capital N, is 1000.
Somebody tells me to select any 100 employees from them, which means I want a sample of size small n equal to 100.
So, in this particular case, from all the 1000 employees we will pick people at random until we have our sample of 100.
So, what happens with this? Every member of the population has an equal chance of getting selected, so the selection itself introduces no bias.
And we use random sampling a lot.
Why? Because there is less chance of a systematic error here.
And my sample is likely to be representative of the entire population.
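The employee example can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the employee IDs 1 to 1000, the function name `simple_random_sample` and the seed are assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    return rng.sample(population, n)

# Hypothetical employee IDs 1..1000, so N = 1000
employees = list(range(1, 1001))
sample = simple_random_sample(employees, 100, seed=42)

print(len(sample))       # 100
print(len(set(sample)))  # 100, no employee is picked twice
```

Because the draw is without replacement, the same employee can never appear twice in the sample of 100.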
Second, we will see stratified sampling.
A stratum is basically a subgroup, which means that from my population I make different small groups, which we call subgroups or strata.
How can we divide the population into these groups? We can do it on the basis of gender, on the basis of age or on the basis of income.
So, based on such characteristics, we divide our population into small subgroups.
And it is necessary to remember that these small groups should be non-overlapping.
Which means if I split the population into male and female, then male and female are two non-overlapping groups, because no one belongs to both.
Once those groups are created, we perform random sampling within each of them.
From each group we randomly pick people and create the sample.
For example, suppose I have 800 females and 200 males in my company.
And someone tells me to maintain the gender balance, which means that the 100 people I select from this population of 1000 should reflect the same mix of females and males as the population.
So, I divide my population into two groups based on gender: one group of 800 females and one group of 200 males.
These are the two strata that we have created. On each of them we perform random sampling: from the 800 females we select 80, and from the 200 males we select 20.
So, in this way the sample of 100 keeps the same 80:20 proportion as the entire population.
Which means it will be representative.
This is how we perform stratified sampling.
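The 800 and 200 example can be sketched as proportional allocation in Python. Treat this as a rough sketch under assumptions: the labels such as `F0` and `M0`, the function name `stratified_sample` and the seed are hypothetical and only there to make the idea concrete.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Share of the sample owed to this stratum, rounded to whole people
        k = round(total_n * len(members) / population_size)
        # Simple random sampling inside the stratum
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical member labels for the two non-overlapping strata
strata = {
    "female": [f"F{i}" for i in range(800)],
    "male": [f"M{i}" for i in range(200)],
}
picked = stratified_sample(strata, 100, seed=0)
print({name: len(members) for name, members in picked.items()})
# {'female': 80, 'male': 20}
```

Here the shares come out as whole numbers. When they do not, the rounded counts may not add up exactly to the requested sample size, so a real implementation would need a rule for the leftover.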
Third is systematic sampling.
Systematic sampling is very similar to random sampling.
But it is easier for us because we assign a number to every member of the population.
Suppose we have to do a survey in a mall.
We go to the mall and we are doing a Covid survey there.
So, what we do is give every person a number. Suppose there are 500 people in the mall at that moment.
So, I number every person as 1, 2, 3, 4 and so on, up to 500.
And I define my systematic sampling rule so that I pick a person at a regular interval, which means I say that I have to pick every 10th person from my population.
So, that will be my systematic sampling.
Which means that when we create a sample by picking members of a population at a regular interval, that is systematic sampling.
In this example, if I pick every 10th person out of 500, I get a sample of 50 people.
And this is easy to do. Why? Because I just have to pick every 10th member without giving it a thought.
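The mall example can be sketched in Python as follows. The function name `systematic_sample` is made up for the illustration, and the random starting point is an assumption that the lecture does not mention, although it is a common way to do it. Note that with 500 people and an interval of 10, the sample has 500 / 10 = 50 members.

```python
import random

def systematic_sample(population, step, start=None, seed=None):
    """Pick every `step`-th member, beginning at index `start`."""
    if start is None:
        # Assumption: choose the starting position at random within the first interval
        start = random.Random(seed).randrange(step)
    return population[start::step]

# People in the mall numbered 1..500
people = list(range(1, 501))

# Fixed start at index 9, i.e. persons 10, 20, ..., 500
sample = systematic_sample(people, step=10, start=9)
print(len(sample))  # 50
print(sample[:3])   # [10, 20, 30]

# Random start: still 50 people, evenly spaced 10 apart
random_start_sample = systematic_sample(people, step=10, seed=7)
print(len(random_start_sample))  # 50
```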
Next, my fourth sampling method is cluster sampling.
What do we do in cluster sampling? We again divide our population into different subgroups.
But these subgroups should have the property that they are all similar to one another.
How was stratified sampling different? There too we divided the population into subgroups, but the strata had different properties from each other.
In cluster sampling, whichever clusters we define should have the same properties as each other.
Now, instead of selecting individual members, which means instead of picking each person from a subgroup, we select entire subgroups.
Suppose that we have 10 offices of Learnvern across the country.
Let's assume that the same number of employees work in each of those 10 offices and that they have almost similar roles.
Which means all 10 Learnvern offices function in the same way.
Which means they have the same kinds of employees.
Now suppose someone tells me to collect data about the offices. What will we do in that case?
We don't have the capacity to go to each office and select individual employees.
Since we know that all the offices have the same characteristics, we will select any three of them, which means we will select three clusters and take our sample from those.
Cluster sampling is used a lot where we have a large and dispersed population, which means a population where collecting data takes a lot of time and is not easy.
But since we are not selecting individuals and are selecting entire clusters instead, the risk increases here.
So, with this method there is more chance of sampling error.
Why? Because whichever clusters we take, we cannot guarantee that they represent the entire population.
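The office example can be sketched in Python like this. It is only an illustration under assumptions: the office names, the figure of 50 employees per office and the function name `cluster_sample` are all hypothetical, since the lecture only says that the offices have the same number of employees.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every member of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical data: 10 offices with 50 employees each
offices = {
    f"office_{i}": [f"office_{i}_emp_{j}" for j in range(50)]
    for i in range(1, 11)
}

chosen, members = cluster_sample(offices, 3, seed=1)
print(len(chosen))   # 3
print(len(members))  # 150
```

The randomness here is only in which offices get picked. Inside a chosen office nobody is left out, which is why an unusual office can pull the whole sample away from the population.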
So, in probability sampling we have covered four methods. Of these, simple random sampling is the most widely used, and beyond that we choose the sampling method at our discretion.
We pick the sampling method as per the use case.
So, we have now covered the different sampling methods.