Hello,
I am (name) from LearnVern.
Until now, in our sessions, we have seen how to handle missing data.
We first learned it conceptually, and then we handled missing values in practice.
So, let's go ahead.
Our next topic is “Encoding Categorical Data”.
So, let's begin by seeing what categorical data is.
You must have heard the word "Categories".
For instance, suppose we have these categories:
A,
B, and
C.
So, these are three categories.
You understand, right?
In a similar way, categorical data divides things into groups.
For example, we can divide people by age into a kids group, a middle-age group, and a seniors group.
So, here we have divided our data into three groups according to age.
In the same way, you will find many examples of grouping in your day-to-day life.
For instance: would you like sweets or salted chips?
This is also grouping.
Data of this type is called categorical data.
Now, let's see what its types are.
It has two types:
nominal and ordinal.
An example of nominal data is male or female. The values are labels, not numbers, so they carry no quantitative value and no natural order.
Can we add or subtract male and female?
No, we cannot!
Hence, this kind of data is called nominal data.
Next up, we will look at ordinal data.
In ordinal data, the categories follow an order.
As you can see,
Strongly agree
Neutral
Strongly Disagree
Here, the opinions follow an order: agreeing with something, being neutral with no opinion, and disagreeing.
An order is being followed.
Another example of an order is:
first rank,
second rank, and
third rank.
This is also an order.
Data whose categories follow such an order is known as ordinal data.
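The order in ordinal data can be made concrete with a small plain-Python sketch. It assumes we rank the opinion scale from least to most agreement; the names `LIKERT_ORDER` and `ordinal_encode` are hypothetical and only for illustration (libraries such as scikit-learn provide an `OrdinalEncoder` class for this job).

```python
# Assumed order for illustration: from least agreement to most agreement.
LIKERT_ORDER = ["Strongly Disagree", "Neutral", "Strongly agree"]

def ordinal_encode(values, order):
    """Map each category to its position in `order`, so the numbers keep the order.

    Raises KeyError for a value that does not appear in `order`.
    """
    rank = {category: position for position, category in enumerate(order)}
    return [rank[v] for v in values]

responses = ["Neutral", "Strongly agree", "Strongly Disagree", "Neutral"]
codes = ordinal_encode(responses, LIKERT_ORDER)
print(codes)  # [1, 2, 0, 1]

# Because the codes follow the order, comparing them is meaningful:
print(codes[1] > codes[0])  # True: "Strongly agree" ranks above "Neutral"
```

With nominal data such as male or female there is no order to choose, which is exactly why this kind of comparison only makes sense for ordinal categories.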
Now, let's see how to encode this data, that is, how to convert these categories into numbers.
I will show you two techniques here:
the first is “One Hot Encoding” and the second is “Label Encoding”.
Apart from these, there are many other techniques as well.
For example, alongside label encoding, there is also ordinal encoding.
We will learn about them later.
Let us see how these two actually work and how they help us.
First, we will begin with one hot encoding.
In one hot encoding, suppose we have categories like Red and Yellow.
We create a separate new variable for Red and another separate variable for Yellow. The next slide will make it clear how these variables are created.
Each category is mapped to its own binary variable.
Each new variable we create is a binary variable.
Binary means it takes only the values zero and one.
If I write zero for Red, then red is absent.
And if I write one for Red, then red is present.
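The idea of one binary variable per category can be sketched in plain Python for the Red/Yellow example. The flag names `is_Red` and `is_Yellow` and the function `to_binary_flags` are hypothetical, chosen only for illustration.

```python
# Categories from the Red/Yellow example.
CATEGORIES = ["Red", "Yellow"]

def to_binary_flags(value, categories):
    """Return one 0/1 flag per category: 1 if `value` is that category, else 0."""
    return {f"is_{c}": int(value == c) for c in categories}

print(to_binary_flags("Red", CATEGORIES))     # {'is_Red': 1, 'is_Yellow': 0}
print(to_binary_flags("Yellow", CATEGORIES))  # {'is_Red': 0, 'is_Yellow': 1}
```

A 1 in a flag means that color is present for the row, and a 0 means it is absent.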
Let's understand this directly with an example.
Here, you can see a small table with two columns: Index and Animal.
In the first table, we have a cat, then a dog, and then a horse.
We have to convert it using one hot encoding.
Look, I have kept the Index column, 1, 2 and 3, as it is, and I have converted Cat, Dog and Horse into separate columns. Cat is present at index 1, so I have written 1 under Cat; dog and horse are absent at index 1, so I have written 0 under each of them.
In this way, we get the binary pattern 100, which represents cat only.
Similarly, dog is present at index 2, so in the Dog column I have written 1 at index 2; cat and horse are not present at index 2, so I have written 0 for them.
In this way, 010 becomes the unique binary pattern for dog.
In the same way, horse is present at index 3, so the Horse column has 1 there.
We have kept the rest, that is, dog and cat, as 0.
So, the binary pattern for horse is 001.
In general, if we have N categories, one hot encoding creates N binary columns, so each category gets an N-digit binary pattern containing exactly one 1.
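The Cat/Dog/Horse table can be reproduced with a minimal plain-Python sketch of one hot encoding. The function `one_hot_encode` is a hypothetical helper; in practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do this, and they may order the columns differently (this sketch keeps the order in which categories first appear).

```python
def one_hot_encode(values):
    """One hot encode a list of category labels.

    Returns the column names (one per distinct category, in order of first
    appearance) and one row of 0/1 values per input label.
    """
    columns = list(dict.fromkeys(values))  # distinct categories, first-seen order
    rows = [[1 if value == column else 0 for column in columns] for value in values]
    return columns, rows

animals = ["Cat", "Dog", "Horse"]  # the Animal column at index 1, 2, 3
columns, rows = one_hot_encode(animals)
print(columns)  # ['Cat', 'Dog', 'Horse']
for index, (animal, row) in enumerate(zip(animals, rows), start=1):
    print(index, animal, "".join(map(str, row)))
# 1 Cat 100
# 2 Dog 010
# 3 Horse 001
```

With N distinct categories, `columns` has N entries and every row contains exactly one 1.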
Now, let's move ahead and look at “Label Encoding”.
In label encoding, the labels we were just looking at are converted into numerical values.
This is even simpler than one hot encoding.
Let's see it with this example.
We have the same table.
I have encoded it using label encoding.
What will label encoding do?
It will give cat a numerical value,
a different numerical value to dog,
and a third numerical value to horse.
Here, you can see that cat is given 0,
dog is given 1,
and horse is given 2.
In this way, we can also encode our categories using label encoding.
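Label encoding of the same table can be sketched in plain Python. The helper `label_encode` is hypothetical; it assigns codes in sorted (alphabetical) order of the labels, which is also how scikit-learn's `LabelEncoder` orders its classes, and for Cat, Dog and Horse this gives 0, 1 and 2.

```python
def label_encode(values):
    """Give each distinct label an integer code, assigning codes in sorted order."""
    classes = sorted(set(values))  # distinct labels, alphabetical
    code_of = {label: code for code, label in enumerate(classes)}
    return [code_of[v] for v in values], classes

animals = ["Cat", "Dog", "Horse"]
codes, classes = label_encode(animals)
print(codes)    # [0, 1, 2]
print(classes)  # ['Cat', 'Dog', 'Horse']
```

One design point to keep in mind: the integers imply an order (Cat < Dog < Horse) that nominal data does not really have, while one hot encoding avoids that by giving each category its own column.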
So, these are the two techniques we saw,
and next we will see them in practice.
If you have any queries or comments, click the discussion button below the video and post them there. This way, you will be able to connect with fellow learners and discuss the course. Our team will also try to resolve your query.
We will cover the remaining parts in the next session.
Until then, keep learning and stay motivated.
Thank you.