Hello learners, welcome to Learnvern.
Come, let's start today's first chapter, which is “Introduction to Statistics”.
In today’s module, we will learn what statistics is, what the different types of statistics are, and what the different stages of statistics are.
So, let's start with the definition of statistics.
It is a branch of applied mathematics that involves the collection, description, presentation, analysis and interpretation of numerical data.
You're seeing different terms here, like collection of data and description of data.
We will learn about all these going ahead.
Before that, we must know what the different types of data are.
Data is found in different forms, like structured data and unstructured data.
Structured data is data that is available in a human-understandable format, arranged in a fixed layout.
For example, the tabular data you see in Excel sheets.
There, the rows and columns are arranged in a structured way.
Our phone records are also a type of structured data.
In contrast, unstructured data is data that is not readily available in such an organised, human-understandable format.
We use so many audio and video files in our daily lives.
We use images and text files like emails, Word documents and PowerPoint presentations.
All this data is available in unstructured form.
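To make the difference concrete, here is a small Python sketch. The phone records and the email text are made up for illustration; they are not real data from this course. It only shows that structured data can be read field by field, while unstructured text has no columns to look up.

```python
import csv
import io

# Structured data: hypothetical phone records laid out in rows and columns.
structured_text = """caller,receiver,duration_sec
Asha,Ravi,120
Ravi,Meena,45
Meena,Asha,300
"""

# Every row follows the same columns, so each field can be read by name.
records = list(csv.DictReader(io.StringIO(structured_text)))
total_duration = sum(int(row["duration_sec"]) for row in records)
print("Number of calls:", len(records))
print("Total duration in seconds:", total_duration)

# Unstructured data: a hypothetical email body with no fixed rows or columns.
unstructured_text = "Hi Ravi, I called you yesterday for about two minutes. Call me back!"

# There is no column to look up here; directly, we can only treat it as raw text.
word_count = len(unstructured_text.split())
print("Words in the email:", word_count)
```

Getting the call duration out of the email would need extra work, such as text processing, which is why unstructured data is harder to analyse directly.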
Now we will move on to the different stages of statistics.
The first stage is collection of data.
Collection of data, as its name suggests, means that we are collecting the data.
Where will we collect the data from? We can collect it through surveys, or we can collect data available on the internet, in different structured or unstructured forms.
We can collect any type of data.
There is data present in the stock market; we can use that and perform our analysis.
So, this is the first stage, where we collect the data based on our needs.
Now, let's come to the second stage, which is organising the collected data.
What does this mean? This means that we will arrange the data that we have collected in a meaningful manner.
All the data is made easier to understand.
Basically, we will organise all the collected data so that it can be easily understood.
Then comes the third stage of statistics, which is presentation of the data.
Presentation, like the name suggests, means we will be presenting the data in the form of graphs, diagrams or tables, so that our data looks visually pleasing.
And we can explain it in a simplified manner.
Now see for yourself: suppose we have some data with us and we have to interact with a business.
Let's say a data scientist has to interact with a business, and the business says, “I want this information from the data, I want these updates.”
If you hand over the raw data, you will not get any results; but if you give them reports, dashboards and charts in a simplified form when you pitch to any business, anyone will be able to understand it easily, and the data will be in a presentable format.
Next comes the fourth stage of statistics, which is analysis of data.
What do we mean by analysis? In the coming modules, or if you want to go ahead in the field of data science, analysis is an important term which you will be using continuously.
So, analysis of the data means that you will apply different methods of statistics to whatever data you have collected.
Like the measures of central tendency: mean, mode and median.
We will study all of these going ahead.
We will be applying different types of correlations, and different integrations. We will be using all these methods of statistics to make sense of our data, so that our data gives us a particular result which is meaningful at the end.
So, we call this process analysis.
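As a small taste of analysis, here is a minimal Python sketch that computes the mean, median and mode with Python's built-in statistics module. The list of marks is a made-up example, not data from this course.

```python
import statistics

# Hypothetical marks of seven students (made up for illustration).
marks = [40, 50, 50, 60, 75, 85, 95]

mean_marks = statistics.mean(marks)      # sum of the values divided by their count
median_marks = statistics.median(marks)  # middle value once the marks are sorted
mode_marks = statistics.mode(marks)      # value that occurs most often

print("Mean:", mean_marks)
print("Median:", median_marks)
print("Mode:", mode_marks)
```

Notice that the three measures give three different answers for the same data, which is why we study each of them separately.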
Now, interpretation of the data: this simply means taking whatever results have come out of your data.
You use them to interpret the data in a usable format and present it in a result-oriented form.
Suppose we have weather data.
We have weather data from different localities.
We have to predict whether it will rain tomorrow or not.
What you will do first is collect the data.
You can collect the data from government sites, or you can collect it through different surveys.
Then you will organise that data in a meaningful form because, let's say, the data that you have received will not be clear; it will not be clean data.
You will make the data meaningful by applying different analyses to it, and put it in a presentable form.
We will apply different machine learning and data science algorithms, and we will convert it into an interpretable result.
We will be seeing all these details going ahead in the chapter…
Okay, after this, let's see the different types of statistics.
Basically, there are two types of statistics.
One is descriptive, and the other is inferential statistics; but before learning them in depth, I want to concentrate on two very important terms of statistics, which are population and sample.
We will concentrate on them, and learn about them in depth.
What is a population? Like its name suggests, the population is the entire group that you want to draw conclusions about.
What does this mean? This means that the complete group we are interested in, with all of its data, is what we call the population.
For example, if you take the different countries in the entire world, all the countries in the world constitute a population.
Or say that I am interested in different industries: it can be the textile industry, the finance industry, or the marketing industry.
All those together make up my population, which is industries as a group.
Now coming to the sample: the sample is the specific group that you will collect data from.
Which means, you have a population.
From that population, you have taken a specific group, and you are doing your analysis on that specific group.
We call that particular small group a sample.
For example, say I have a population which is all residents of the country, which means all the people staying in the country are my total population.
From that, if I choose the people who live below the poverty line, those people will make up my sample, because they represent a specific group: they are below the poverty line.
It is important for us to know the difference between the two, and to always keep in mind that the sample size will be smaller than the population size.
Because we take samples from the population.
Now, you must be wondering why we take samples when we already have a population.
Sometimes it so happens that we are not able to reach all the data points in the population, or the population itself becomes such a big data set that we cannot carry out our study on it.
For that purpose, what we do is pick samples from our population which are representative of the total population, and then we perform our analysis on those samples.
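Picking a sample from a population can be sketched in Python as follows. This uses simple random sampling, which is one common way to pick a sample; the lecture does not prescribe a particular method. The population of 5,000 resident IDs and the sample size of 100 are assumptions made only for this illustration.

```python
import random

# Hypothetical population: ID numbers for 5,000 residents.
population = list(range(1, 5001))

random.seed(42)  # fixed seed so that repeated runs draw the same sample
sample = random.sample(population, k=100)  # draw 100 IDs without replacement

print("Population size:", len(population))
print("Sample size:", len(sample))
```

Every member of the sample comes from the population, and the sample is much smaller than the population, so it is cheaper to study.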
Okay, now that we know the difference between population and sample, we will see the different types of statistics…
Let’s come to descriptive statistics.
Descriptive statistics means, like its name suggests, that we describe the data.
In which form do we describe it? Basically, we summarise or organise the data of the same group; pay attention to these four words, “of the same group”.
Through different techniques like mean, mode and median, we describe or summarise our data in a particular form.
But it should be of the same group.
For example, imagine that I have the data of 300 kids of one particular class.
Those 300 represent one group.
It is possible that I have their marks, their names and their ages.
So, now we can describe different things in it.
We may find that, in physics, a certain percentage of kids have scored the highest marks. Or someone might ask us to describe the mean age of this particular class.
So, in this way, if you do the analysis on one particular set of data and you describe it, that is what we call descriptive statistics.
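Here is a minimal Python sketch of the class example. To keep it short, six made-up records stand in for the 300 kids; the names, ages and physics marks are all hypothetical.

```python
import statistics

# Hypothetical records for one class (six kids stand in for the 300).
students = [
    {"name": "A", "age": 14, "physics": 92},
    {"name": "B", "age": 15, "physics": 78},
    {"name": "C", "age": 14, "physics": 92},
    {"name": "D", "age": 16, "physics": 65},
    {"name": "E", "age": 15, "physics": 81},
    {"name": "F", "age": 16, "physics": 92},
]

# Describe the group: mean age of the class.
mean_age = statistics.mean([s["age"] for s in students])

# Describe the group: what percentage of kids scored the highest physics mark?
highest_mark = max(s["physics"] for s in students)
top_scorers = [s for s in students if s["physics"] == highest_mark]
percent_with_highest = 100 * len(top_scorers) / len(students)

print("Mean age:", mean_age)
print("Highest physics mark:", highest_mark)
print("Percent of kids with the highest mark:", percent_with_highest)
```

All of these numbers only describe this one group; we are not yet drawing any conclusion about kids outside the class.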
There is one more definition.
Descriptive statistics basically describes the properties of the sample and the population.
Pay attention to this!
It describes the properties of the sample and the population.
Going ahead, we will see how inferential statistics differs: whether it also describes the properties of the sample and the population, or does something else.
Now you have collected your data and, with descriptive analysis, you have properly summarised it; but suppose you want to infer a result from that data.
To infer means to draw a conclusion from your results.
We call that inferential statistics.
It uses the properties of the sample and the population to test hypotheses and then draw conclusions.
You must have noticed: descriptive statistics describes the properties of the sample and of the population, but inferential statistics uses those properties.
It uses the properties that you have described, so that we can test certain hypotheses and draw certain conclusions.
We will see this in detail in the coming chapters.
We will see inferential statistics through one example.
Suppose we have a total population of five lakh (500,000) people, and we are doing a survey on the number of 2-wheelers in the city.
That is, I am surveying the total number of 2-wheelers in my city, and that city has a total population of five lakh people.
Suppose, from that population, I take 1000 people who represent my total population as sample data.
I did my analysis on that sample data.
And I got the result that 800 people out of 1000 use 2-wheelers.
Which means 80% of the people in the sample are using 2-wheelers.
So, I can infer that in a city with a population of five lakh, four lakh people are using 2-wheelers, because 80% of five lakh is four lakh.
So, in this way we create samples from the population, apply some analysis to those samples, and infer our results.
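The arithmetic of this example can be written as a short Python sketch. The numbers are the ones from the example; the variable names are just illustrative, and the estimate assumes that the 1000 people surveyed really do represent the whole city.

```python
population_size = 500_000          # five lakh people in the city
sample_size = 1_000                # people surveyed
two_wheeler_users_in_sample = 800  # survey result

# Share of the sample that uses 2-wheelers, as a percentage.
sample_percent = 100 * two_wheeler_users_in_sample / sample_size

# Scale the sample result up to the whole city (integer arithmetic).
# Assumption: the sample is representative of the population.
estimated_users = two_wheeler_users_in_sample * population_size // sample_size

print("Percent of sample using 2-wheelers:", sample_percent)
print("Estimated 2-wheeler users in the city:", estimated_users)
```

The result is an estimate, not an exact count; how close it is to the truth depends on how well the sample represents the city.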
You must have noticed that the sample data with us is small.
It won't be time-consuming to do analysis on that small data.
We will save cost, we can do the analysis easily, and we can use that result to draw inferences about the total population.
Congratulations everyone.
You have successfully completed your first module.
We will give you the summary of the module and of what we have learned.
First of all, we saw in unit one what statistics is and what its different stages are: collection, organising, presenting, analysing and interpreting of data. Next, we saw the different types of statistics, where we defined descriptive and inferential statistics.
Next, we saw the difference between sample and population, and their importance.
After that, in the second unit, we saw the importance of statistics, how many fields it is used in, and what its application areas are. We covered all of this in unit two, where we saw that statistics is used in all big fields, from healthcare, data science and information technology to banking and finance.
Lastly, we saw why we have to learn statistics for data science: learning statistics is as important as machine learning and the Python programming language for becoming a good data scientist.
In the next module, we will see what descriptive statistics is. Thank you.
If you have any queries or comments, click the discussion button below the video and post there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to solve your queries.