In today's module, we will learn what descriptive statistics actually is.
Of the two types of statistics that we saw earlier, we will focus on descriptive statistics and why it is important to learn it.
After that, we will see what univariate, bivariate, and multivariate descriptive statistics are and how they differ, and then we will read about the different types of descriptive statistics.
So, let's start.
As we defined it in the first chapter, descriptive statistics is about summarising and organising the data collected from a group.
So, whatever data I have collected, we will summarise and organise that data.
How? Through measures of central tendency or measures of dispersion, which we will discuss further.
We're going to discuss the mean, median, and mode in detail.
Now, what counts as data? It can be any collection of responses or observations, from a sample or from the entire population.
In any quantitative research, we first collect the data, and after that, we perform our statistical analysis so that we can properly describe the responses, for example, through the average of a set of numbers.
Or maybe I want to find a relationship between variables, say between age and creativity.
We will look at all of this in detail going ahead.
Let's consider one example to understand descriptive statistics.
Suppose I have conducted a study in which I asked people of different genders which activities they like to do in their spare time. Let's say the activities include drawing, watching series on Netflix, and scrolling through Instagram.
In the survey, we asked the participants how many times in the previous year they had performed each of these activities.
In this scenario, my data set would be the responses I received from the survey.
Now we can perform descriptive analysis on this data set, through which we can find the overall frequency of each activity.
We can take the average for each activity, or we can look at the spread of the responses, which is described by measures of variability.
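To make this concrete, here is a minimal Python sketch using made-up survey numbers (the participants and counts are hypothetical, not results from the study described above). For each activity it computes the overall frequency (the total count), the average per participant, and one variability measure, the population standard deviation, using Python's standard `statistics` module. The helper name `describe_activity` is illustrative only.

```python
from statistics import mean, pstdev

# Hypothetical survey data (made-up numbers): how many times each
# participant did each activity in the previous year.
responses = [
    {"drawing": 12, "netflix": 150, "instagram": 300},
    {"drawing": 0, "netflix": 200, "instagram": 365},
    {"drawing": 40, "netflix": 90, "instagram": 120},
    {"drawing": 5, "netflix": 180, "instagram": 250},
]


def describe_activity(data, activity):
    """Summarise one activity column of the survey."""
    values = [row[activity] for row in data]
    return {
        "total": sum(values),       # overall frequency across all participants
        "mean": mean(values),       # average count per participant
        "std_dev": pstdev(values),  # spread: population standard deviation
    }


for activity in ("drawing", "netflix", "instagram"):
    print(activity, describe_activity(responses, activity))
```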
This example should give you an idea of how we perform descriptive statistics and why we perform it.
Now we will see the different types of descriptive statistics, based on how many variables we analyse.
Let's imagine that I have one particular variable with me.
Say it is the variable of watching series on Netflix.
If I focus on this one variable and describe it through different measures, like the mean, median, and mode, I am doing univariate descriptive statistics.
We will learn the definitions of these measures, and where to use them, in detail when we cover the types of descriptive statistics.
The main goal of univariate descriptive statistics is to describe and examine each variable separately.
We can easily use tools like Excel and Python to perform univariate descriptive statistics.
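As a minimal sketch of univariate description in Python, assuming hypothetical watch counts for the Netflix variable, the standard library's `statistics` module can compute the three measures mentioned above. The function name `univariate_summary` is just an illustrative helper, not a library API.

```python
import statistics

# Hypothetical single variable (made-up data): how many times each of ten
# participants watched a series on Netflix last year.
netflix_watch_counts = [4, 7, 7, 2, 9, 7, 3, 5, 10, 6]


def univariate_summary(values):
    """Describe one variable on its own with the mean, median, and mode."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


print(univariate_summary(netflix_watch_counts))
```

For these values the mean is 6, the median is 6.5 (the average of the two middle values, 6 and 7, since there is an even number of values), and the mode is 7.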
Next is Bivariate descriptive statistics.
“Bi” (pronunciation: bai), as the name suggests, means two.
Suppose you have data where you need to find the relationship between two variables.
In that scenario, we perform bivariate statistics to understand the relationship between the two variables: whether they are related to each other or not.
Let's go back to the example we just saw.
Say we want to see whether there is a relationship between how often people paint and how often the same people watch series on Netflix.
To relate the two, I have drawn a scatter plot.
What is a scatter plot? In a scatter plot, the two variables are plotted point by point on one graph, so that I can see whether they are related and roughly how strongly.
Coming to the third type, which we call multivariate descriptive statistics.
As you saw, univariate descriptive statistics means analysing one variable, and bivariate means analysing two variables. Multivariate means that when I have more than two variables, I look at how they relate to one another, and I can often see this visually.
Let's take an example: suppose we have figured out the top smartphone ratings.
I have a data set containing these ratings.
What can those ratings be? Bad, average, and good; my ratings are of these three types.
Now, I have three variables.
The first is the model.
Suppose it's a Samsung Galaxy, an Apple iPhone, or a Nokia Lumia.
These would be my different models.
The second variable is the features: what the phone offers, such as how good the camera is, which phone has a better battery backup, and which phone is the cheapest.
My third variable is the rating, which can be bad, average, or good.
So, with this table, I want to see the relationship among these three variables to find which of these phones would be best for me.
It should be cheap, it should have a good battery backup, I want a good camera, and the processor should also be good.
So, I want a good-quality phone, one with a good rating.
For this, we have different types of charts and graphs; for example, to describe a multivariate analysis, we can use a heat map.
In this heat map, I have plotted the models against their features.
The models, like the Apple iPhone or Nokia Lumia, are on the Y axis.
Their features, like screen size, price, and battery backup, are on the X axis.
Now you might be thinking that this is a 2D graph, so how will we be able to show the third variable on it?
The advantage of a heat map is that the third variable, the rating, can be shown through colour coding in the chart, and from the colour we can tell what the rating is.
As you can see in this graph, the cells shown in green are my good ratings, meaning ratings above nine, and the cells in yellow are my average ratings, which are between seven and eight.
In this way we can do multivariate analysis and describe our data.
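A heat map needs a plotting library to show colours, but the underlying idea, a model-by-feature grid whose cells carry a third variable, can be sketched in plain Python. The scores and the `rating_category` thresholds below are assumptions: the lecture gives above nine as good and roughly seven to eight as average, but does not say where bad begins or how scores between eight and nine are treated, so this sketch treats anything from 7 up to 9 as average and anything below 7 as bad.

```python
# Hypothetical 0-10 scores for each model on each feature (made-up data).
scores = {
    "Samsung Galaxy": {"screen size": 9.2, "price": 7.5, "battery backup": 8.8},
    "Apple iPhone": {"screen size": 8.9, "price": 6.0, "battery backup": 7.8},
    "Nokia Lumia": {"screen size": 7.1, "price": 9.4, "battery backup": 6.5},
}
features = ["screen size", "price", "battery backup"]


def rating_category(score):
    """Map a score to a rating. ASSUMED thresholds: >9 good, 7-9 average, <7 bad."""
    if score > 9:
        return "good"
    if score >= 7:
        return "average"
    return "bad"


# Text version of the heat map: models on the rows (Y axis), features on the
# columns (X axis), and each cell labelled with its rating instead of a colour.
print(f"{'model':<16}" + "".join(f"{name:<16}" for name in features))
for model, row in scores.items():
    cells = "".join(f"{rating_category(row[name]):<16}" for name in features)
    print(f"{model:<16}" + cells)
```

With matplotlib, for instance, the same grid of numbers could be passed to `plt.imshow` to get an actual colour-coded heat map.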
Coming to the next topic: the types of descriptive statistics.
Earlier, you heard me mention terms like frequency, mean, and median again and again.
Now we will see how descriptive statistics can be divided into categories.
So, based on the properties of the data, we divide descriptive statistics into five categories.
First, measures of frequency; second, measures of central tendency; third, measures of dispersion, also called variation; fourth, measures of position; and fifth, measures of shape.
First, let's look at measures of frequency. As the name suggests, frequency means how many times something repeats itself; basically, frequency is a count of the different outcomes in a data set.
In a given data set, the number of times a value is repeated is called its frequency.
In this module, we'll learn about grouped and ungrouped frequency distributions.
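A minimal Python sketch of both kinds of frequency distribution, using hypothetical drawing counts. The ungrouped version counts each distinct value with `collections.Counter`; the grouped version counts how many values fall into each class interval. The class width of 5 and the helper name `grouped_frequency` are illustrative choices, not anything prescribed in the lecture.

```python
from collections import Counter

# Hypothetical data (made-up): how many times ten participants drew last year.
drawing = [0, 2, 2, 5, 7, 7, 7, 12, 15, 18]

# Ungrouped frequency distribution: the count of each distinct value.
ungrouped = Counter(drawing)


def grouped_frequency(values, width):
    """Count values per class interval of the given width.

    Labels like "0-4" assume whole-number data; classes that contain
    no values are not listed.
    """
    groups = Counter((v // width) * width for v in values)
    return {f"{start}-{start + width - 1}": groups[start] for start in sorted(groups)}


print(sorted(ungrouped.items()))
print(grouped_frequency(drawing, 5))
```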
Second are the measures of central tendency.
Central tendency means a single value that gives an estimate of the centre, or average, of the data.
We generally use it when we want a typical or average response, and its three measures are the mean, median, and mode.
So basically, we use measures of central tendency when we want the centre or average of our data.
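A small worked example with hypothetical values, x = (2, 3, 3, 5, 7), shows how the three measures are computed. Here $x_{(3)}$ denotes the third value once the data are sorted; for an even number of values, the median would instead be the average of the two middle values.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{2 + 3 + 3 + 5 + 7}{5} = \frac{20}{5} = 4 \\
\text{median} &= x_{(3)} = 3 \quad \text{(middle of the sorted values, since } n = 5 \text{ is odd)} \\
\text{mode} &= 3 \quad \text{(the value that occurs most often)}
\end{aligned}
```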
Third come the measures of dispersion, or variation. As the name suggests, they describe how dispersed, or varied, the data is.
They give us a sense of how spread out the data is, how far the values deviate from the centre (for example, the mean), and how much variation there is.
Each measure reflects a different aspect of the spread, and going ahead, we'll learn about variance, standard deviation, range, and mean absolute deviation (the average absolute deviation from the mean).
All four of these topics will be discussed in detail in the coming chapters.
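A minimal sketch of the four dispersion measures in Python, on made-up weekly viewing hours. It uses population formulas (dividing by n) via `statistics.pvariance` and `statistics.pstdev`; the mean absolute deviation is computed by hand because the `statistics` module does not provide a function for it.

```python
import statistics

# Hypothetical data (made-up): hours of Netflix watched per week by six people.
hours = [2, 4, 4, 5, 7, 8]

data_range = max(hours) - min(hours)    # range: largest minus smallest value
variance = statistics.pvariance(hours)  # population variance (divides by n)
std_dev = statistics.pstdev(hours)      # population standard deviation
mean_value = statistics.mean(hours)
# Mean absolute deviation: average distance of each value from the mean.
mean_abs_dev = sum(abs(x - mean_value) for x in hours) / len(hours)

print(data_range, variance, std_dev, round(mean_abs_dev, 2))
```

For these values the range is 6, the variance is 4, the standard deviation is 2, and the mean absolute deviation is 10/6, about 1.67. For sample data, `statistics.variance` and `statistics.stdev` divide by n - 1 instead and give slightly larger values.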
The fourth category is measures of position.
As the name suggests, these describe where a score stands relative to the other scores.
We use them when we have to compare normalised or standardised scores.
Don't worry about normalised and standardised scores for now.
We will cover them in detail in the coming chapters.
Measures of position are basically of two types: percentiles and quartiles.
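A minimal sketch of quartiles and percentiles with Python's `statistics.quantiles` (Python 3.8+), using made-up exam scores. `quantiles(data, n=4)` returns the three quartile cut points, and `n=100` returns the 99 percentile cut points. Different tools interpolate differently (this function defaults to `method='exclusive'`, with `'inclusive'` as the alternative, much as Excel offers PERCENTILE.EXC and PERCENTILE.INC), so values may differ slightly between software.

```python
import statistics

# Hypothetical exam scores for twelve students (made-up data).
scores = [45, 52, 58, 61, 64, 67, 70, 73, 78, 82, 88, 95]

# Quartiles: three cut points Q1, Q2 (the median), and Q3 that split
# the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Percentiles: 99 cut points that split the data into 100 parts;
# index 89 holds the 90th percentile.
percentiles = statistics.quantiles(scores, n=100)
p90 = percentiles[89]

print(q1, q2, q3, p90)
```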
The fifth category of descriptive statistics is measures of shape.
Measures of shape describe the distribution, or pattern, of the values in our data set.
You may already have read about distributions such as the normal distribution, which is also called the Gaussian (pronunciation: gaws-ee-uhn) distribution.
We will be covering distributions in future chapters.
Measures of shape are divided into three parts: skewness, kurtosis, and the box-and-whisker plot.
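The Python standard library has no skewness or kurtosis function, so the sketch below computes the moment-based (population) versions directly, on hypothetical data with one large value that stretches the right tail. It also computes the five-number summary (minimum, Q1, median, Q3, maximum) that a box-and-whisker plot is built from. The helper names are illustrative; note also that many box plots draw the whiskers only to 1.5 times the interquartile range and mark points beyond that as outliers, whereas this sketch simply uses the minimum and maximum.

```python
import statistics

# Hypothetical data (made-up) with one large value stretching the right tail.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 12]


def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    mu = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (about 0 for normal data)."""
    mu = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers behind a box-and-whisker plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


print(skewness(data), excess_kurtosis(data), five_number_summary(data))
```

A positive skewness means the longer tail is on the right, and an excess kurtosis above 0 means heavier tails than a normal distribution.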
So, these are the different types of descriptive statistics. We will see all these topics in detail in future chapters, and we will also see how to implement them in Python notebooks.
In today's chapter, we covered the different types of descriptive statistics.
If you have any queries or comments, click the discussion button below the video and post them there. This way, you will be able to connect with fellow learners and discuss the course. Our team will also try to resolve your queries.