Measures of Variability - Variance

00:03 - 0:17 (music)

Next, we have variance.

You have probably heard the term variance many times; most of us have been learning about variance and standard deviation since our school days.

So today we will see what exactly variance means and how it is used in statistics.

Variance is a statistical measure that tells us about the degree of spread.

What does the degree of spread mean? It means how spread out the data points are across the distribution.

How is it calculated? By taking the average of the squared deviations from the mean.

First, the mean: we already know how to calculate it.

Then we take the average of the squared deviations from that mean.

How do we do that? We will see shortly.

A high variance indicates that the data points are widely spread out around the mean.

A small variance indicates that the data points are clustered close to the mean of the data set.

What is the formula for variance? Take each value x, subtract the mean x̄ (x bar), square the result, add up these squares for all the values, and divide by n: variance = Σ(x - x̄)² / n. So, how do we calculate all of this? Let's work through it with one data set.

Suppose I have this particular data set. What is the first step of the calculation? Let's find the mean.

For this data set we find the mean, x̄, by taking the sum of all the values and dividing it by 6, because the total number of data points I have is 6.

My mean comes out to 50.

The next step is to find each score's deviation from the mean.

This is just like what we did for the mean absolute deviation.

But what did we do in that case? We computed the deviations and converted them into absolute values.

Here we will not convert them into absolute values.

Whatever value comes out, positive or negative, I will leave it as it is.

How do you take the deviation from the mean? Whatever your score or value is, you subtract the mean from it.

Note down each value you get.

2:18

My third step is to square every deviation. Say the first deviation that came out is -4.

The square of -4 is 16. My second deviation is 19, and its square is 361.

In the same way, you square all the deviations.

The next step is to add up all the squared deviations.

In the final step, we divide that sum by n - 1 or by n.

Now, what are n and n - 1? Here n is the sample size or the population size, and which divisor we use depends on whether the data is a sample or the whole population.

We will see more about this shortly.

In this particular scenario, as we had six data points in our sample, we divide by n - 1, which is five, and our variance comes out to 177.2.
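
The five steps above can be sketched in a few lines of Python. Note that the transcript does not list the values shown on screen, so the data set below is hypothetical: it is chosen only because it agrees with the numbers quoted in the lesson (six values, a mean of 50, first two deviations of -4 and 19, and a variance of 177.2).

```python
# Hypothetical data set (an assumption): the values on the lesson's slide are
# not in the transcript. These six values merely reproduce the quoted numbers.
data = [46, 69, 32, 60, 52, 41]

n = len(data)

# Step 1: find the mean.
mean = sum(data) / n

# Step 2: find each value's deviation from the mean (value minus mean),
# keeping the sign instead of taking absolute values.
deviations = [x - mean for x in data]

# Step 3: square every deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 4: add up the squared deviations.
sum_of_squares = sum(squared_deviations)

# Step 5: divide by n - 1, because the data is treated as a sample.
sample_variance = sum_of_squares / (n - 1)

print("mean:", mean)
print("deviations:", deviations)
print("squared deviations:", squared_deviations)
print("sum of squares:", sum_of_squares)
print("sample variance:", sample_variance)
```

With these assumed values, the sum of squares is 886 and dividing by 5 gives 177.2, matching the result in the lesson.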

Now, let's look at the main difference between population variance and sample variance.

When we have collected data from every member of the population, we calculate the population variance.

How is the population variance denoted? It is denoted by sigma squared (σ²), which is the sum of the squared deviations (x - μ)² divided by capital N: σ² = Σ(x - μ)² / N.

Let's go through each term. σ² is our population variance.

Σ means "sum of", and x is each data point, that is, each score in the data.

μ (mu) is the population mean, and capital N is the number of values in the population.

Now, let's look at sample variance.

Suppose you have collected data for a sample only.

We use the sample variance to estimate the population variance.

In other words, we infer the population variance from our sample data.

The formula here is slightly different.

s² = Σ(x - x̄)² / (n - 1), that is, the sum of the squared deviations from the sample mean x̄, divided by n minus one.

You will have noticed the two divisors, capital N and n - 1. Why is n - 1 used here? We will come to that in a moment.

Here s² is called the sample variance.

x is each data point, x̄ is the sample mean, and n is the number of values in the sample.
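
To see the two formulas side by side, here is a minimal sketch using Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by n - 1. The data set is again hypothetical (the lesson's slide values are not in the transcript); it was picked to match the figures quoted earlier.

```python
import statistics

# Hypothetical data set (an assumption, not taken from the lesson's slide).
data = [46, 69, 32, 60, 52, 41]

# Population variance: sum of squared deviations divided by N.
population_variance = statistics.pvariance(data)

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = statistics.variance(data)

print("population variance (divide by N):  ", population_variance)
print("sample variance (divide by n - 1):  ", sample_variance)
```

On the same data, the sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.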

Now let us look at one very interesting thing.

4:57

Why did I use n - 1 in the sample formula, whereas the population formula simply had capital N?

If I consider the entire population, my result is exact, because I have the complete data for the population.

But in the case of a sample, we work on only a small fraction of the population, which means my answer is only an estimate of the population value.

The sample gives me data that represents my population, but not perfectly.

Mathematically, n - 1 is a smaller number than n, right?

When you divide any positive value by a smaller number, the result is a larger number.

This is a basic mathematical fact you will already know; revise it once if you need to.

This means that when I divide by n - 1 instead of n, the sample variance I get is a slightly larger value.

Why does that help? The deviations in a sample are measured from the sample mean, which always sits at least as close to the sample's own data points as the true population mean does, so the sum of squared deviations tends to come out too small. Dividing by the smaller number n - 1 compensates for this.

This means I get an unbiased estimate: averaged over many samples, the sample variance equals the true population variance. Any single estimate will still not be exact.

But suppose I divide by n and not by n - 1.

Then my sample variance would be biased.

We are trying to learn about the population. How? By calculating the variance of a sample.

So we don't want to underestimate the variance.

But if we divide by n, we will systematically underestimate it.

This is the reason we divide by n - 1 in the sample formula: so that we get unbiased results in the long run.
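
The "unbiased in the long run" idea can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with variance 1, and the sample size, number of trials, and random seed are arbitrary choices made for illustration; none of them come from the lesson.

```python
import random

def variance_divide_by_n(sample):
    """Sum of squared deviations from the sample mean, divided by n."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def variance_divide_by_n_minus_1(sample):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

# Assumed setup: normal population with mean 0 and standard deviation 1,
# so the true population variance is 1.
TRUE_VARIANCE = 1.0
SAMPLE_SIZE = 5
TRIALS = 20_000
rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

total_n = 0.0
total_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    total_n += variance_divide_by_n(sample)
    total_n_minus_1 += variance_divide_by_n_minus_1(sample)

# Long-run averages of the two estimators.
avg_biased = total_n / TRIALS
avg_unbiased = total_n_minus_1 / TRIALS

print("true variance:            ", TRUE_VARIANCE)
print("average when dividing by n:    ", avg_biased)
print("average when dividing by n - 1:", avg_unbiased)
```

Under these assumptions, the average of the divide-by-n estimates settles near (n - 1)/n of the true variance, which is 0.8 for samples of five, while the average of the divide-by-(n - 1) estimates settles near the true variance of 1.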
