Measures of Central Tendency - Mean

Overview

In this chapter, we will learn what the measures of central tendency are.

First, we should know what central tendency is. Central tendency is information about the centre, or middle part, of a group of numbers.

Suppose I have a group of numbers and I want a single value that represents my entire data. That single value is my measure of central tendency.

It is described in three ways: mean, median and mode.

We are going to learn about these three now.

We have studied them a lot in the primary classes, but how are they used in data science, and how important are these three measures? We will see that today.

Let's come first to the mean.

The mean, or more precisely the arithmetic mean, is the ratio of the sum of all the observations in the data to the total number of observations.

Suppose I have data from a factory about the salaries its staff receive, and I have the data for eight staff members.

Suppose the first staff member gets a salary of 15,000 rupees, the second gets 18,000, the third gets 16,000, the fourth gets 14,000 and the fifth gets 15,000.

In the same way, the other three members also get a salary.

If I wish to calculate the mean, I sum up all the salaries and divide by 8.

Why 8? Because the total number of staff members I have is eight. With this, my mean comes to 15.25 thousand.

What does that mean? It means the average salary is 15,250 rupees, a single value that represents the complete data set.

So, the mean is a number around which my entire data is spread out. That's the reason we call it a measure of central tendency.
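As a minimal sketch, the salary calculation can be reproduced in Python. The lesson states only the first five salaries; the last three values below (12,000, 17,000 and 15,000) are hypothetical, chosen only so that the result matches the stated mean of 15,250 rupees.

```python
# The first five salaries come from the lesson. The last three are
# hypothetical values picked so that the mean matches the stated 15,250.
salaries = [15_000, 18_000, 16_000, 14_000, 15_000, 12_000, 17_000, 15_000]

# Mean = sum of all observations / total number of observations
mean_salary = sum(salaries) / len(salaries)

print(mean_salary)  # 15250.0
```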

Now, we can have a population mean, and we can have a sample mean.

For the population mean, suppose you have data representing the entire population, with capital N elements.

Then your mean is represented by μ (mu), which is nothing but the summation of all the data points divided by capital N.

In the same way, what is the sample mean? Suppose you have small n elements in a particular sample.

Then x̄ (x bar), the sample mean, is the sum of all the elements in the sample divided by the total number of elements in the sample, which is small n.

Now, you will notice one thing in these formulas: between sample and population there is not much of a difference.

The only difference is the number of elements, which is small n in the case of the sample mean and capital N in the case of the population mean.

Keep in mind that we generally use the sample mean, because we don't have population data available all the time.

So the average we use in daily life is what we call a sample mean.
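To make the two formulas concrete, here is a small sketch that computes both means. The population is made up for illustration: 1,000 random salaries between 12,000 and 18,000, from which a sample of 50 is drawn. The names `population`, `sample`, `mu` and `x_bar` are my own choices, not from the lesson.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: salaries of N = 1,000 staff members
population = [random.randint(12_000, 18_000) for _ in range(1_000)]
N = len(population)
mu = sum(population) / N        # population mean: sum of all points / N

# A sample of n = 50 members, drawn without replacement
sample = random.sample(population, 50)
n = len(sample)
x_bar = sum(sample) / n         # sample mean: sum of sample points / n

print(f"population mean (mu) = {mu:.2f}")
print(f"sample mean (x bar)  = {x_bar:.2f}")
```

The two results are usually close but not identical, which is why the sample mean is treated as an estimate of the population mean.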

Next, before going ahead, I would like to talk about distributions. You must have read a lot about distribution curves.

You must have read about the normal distribution, which is also called the Gaussian distribution.

We will see them in more detail in future chapters: how the curves are formed and what the different types of distributions are. But to understand mean, median and mode, it is very important to understand what exactly is meant by a symmetrical distribution and what an asymmetrical distribution is.

A symmetrical distribution is one whose mean lies in the centre. For a symmetrical curve with a single peak, the mean, median and mode all lie in the centre and are equal.

Around that centre, my data on the left and my data on the right should mirror each other.

But what can an asymmetrical curve be? It can be anything; it can be positively skewed or negatively skewed.

We will see this case later in measures of shape, but for now it is important to understand that skewed, asymmetrical data means I don't have a similar shape to my left and to my right.

My data is more on one of the sides.
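A quick way to see the difference is to compare the three measures on a tiny symmetrical data set and on a skewed one. The numbers below are invented for illustration, and the sketch uses Python's standard `statistics` module.

```python
import statistics

symmetric = [1, 2, 3, 3, 3, 4, 5]       # mirror image around 3
right_skewed = [1, 2, 3, 3, 3, 4, 40]   # one long tail to the right

sym_mean = statistics.mean(symmetric)        # 3
sym_median = statistics.median(symmetric)    # 3
sym_mode = statistics.mode(symmetric)        # 3

skew_mean = statistics.mean(right_skewed)      # 8, pulled toward the tail
skew_median = statistics.median(right_skewed)  # 3
skew_mode = statistics.mode(right_skewed)      # 3

print("symmetric:   ", sym_mean, sym_median, sym_mode)
print("right skewed:", skew_mean, skew_median, skew_mode)
```

In the symmetrical case all three measures agree. In the skewed case the mean moves toward the long tail, while the median and mode stay where most of the data is.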

There is one more important thing that is used in statistics and in machine learning, and it has a lot of practical applications.

That is the outlier.

Now, what is an outlier? An outlier is an extreme or unusual value which, instead of representing our data set, is completely different from it. It can be much smaller or much larger than the rest.

Okay, take the earlier salary data set, where I got a mean of about 15.25K.

Suppose I add to that data set the salaries of two staff members who earn 90,000 and 95,000.

That means my mean salary for these 10 employees is now about 30,000 (30,700, to be exact) instead of about 15,000.

Okay.

So you can see that just by adding the salaries of two people, my mean doubled to about 30,000.

This means that 90,000 and 95,000 were extreme values.

The major value range in my distribution was between 12,000 and 18,000.

These extreme values that I have added are what we call outliers.

Detecting them in any distribution is very important.

Otherwise, the entire result of our analysis changes.
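The jump in the mean can be checked using only the figures stated in the lesson: eight salaries with a mean of 15,250, plus the two added salaries. The variable names are my own.

```python
n_old = 8
old_mean = 15_250                # rupees, from the salary example
old_total = n_old * old_mean     # 122,000 rupees in total

outliers = [90_000, 95_000]      # the two added salaries

# New mean = (old total + outliers) / new number of employees
new_mean = (old_total + sum(outliers)) / (n_old + len(outliers))

print(new_mean)                  # 30700.0
print(new_mean / old_mean)       # a little over 2
```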

This means that, in this scenario, my mean is skewed.

Skewed means that our distribution became asymmetrical: because of just two values, my curve is no longer evenly distributed around the centre.

Because of these effects, we use another measure of central tendency, the median. Going ahead, we will see how the median largely resists the outliers' effect, whereas the mean is highly affected by outliers, the way we just saw.
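As a preview, the sketch below compares how the mean and the median react to the two added salaries. Only the first five salaries are given in the lesson; the last three (12,000, 17,000 and 15,000) are hypothetical values chosen to agree with the stated mean of 15,250.

```python
import statistics

# The first five salaries come from the lesson. The last three are
# hypothetical, chosen so that the mean matches the stated 15,250.
salaries = [15_000, 18_000, 16_000, 14_000, 15_000, 12_000, 17_000, 15_000]
with_outliers = salaries + [90_000, 95_000]

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outliers)
median_before = statistics.median(salaries)
median_after = statistics.median(with_outliers)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")      # 15,250 -> 30,700
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")  # 15,000 -> 15,500
```

With these assumed values the mean roughly doubles, while the median moves by only 500 rupees.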

Now, let's see the advantages of the mean.

We just saw that we were using the mean on numerical data, correct? So we can use the mean on both continuous and discrete numeric data.

Either form of data works, whether it is continuous or discrete.

What is the disadvantage? The disadvantage is that we cannot use it for categorical data.

Why? Because we cannot sum categories in any meaningful way.

You cannot sum a category. Suppose I have the categories women and men.

With these two categories, I will not be able to take an average, because the values are not in numerical form.
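The same limitation shows up directly in code. In this sketch the `genders` list is a made-up example. Trying to average it fails, while counting how often each category occurs still works.

```python
from collections import Counter

genders = ["women", "men", "women", "women", "men"]  # hypothetical data

try:
    # Summing strings is not defined, so this raises TypeError
    average = sum(genders) / len(genders)
except TypeError as err:
    average = None
    print("cannot average categories:", err)

# What we can do with categories is count them
counts = Counter(genders)
print(counts["women"], counts["men"])  # 3 2
```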

The other major disadvantage, which we saw, is the effect of outliers, right? That is because the mean considers every value in the distribution.

This means that if there are outliers, they will also be counted, and that is not good for our result.

The mean is also influenced by a skewed distribution, which simply means that if there are extreme values, my data will be highly skewed and the mean will not represent the typical value well.
