Hello,
I am (name) from LearnVern.
In the previous Machine Learning session, we studied the K Nearest Neighbour algorithm.
There, we saw how KNN performs classification.
Now we are moving ahead to learn about Decision Trees.
Decision is a word we use all the time in our lives.
For instance,
Would you like to have tea or coffee?
I hope you are able to understand me.
Here too, we decide between tea and coffee, as to what we would like to drink.
In the same way, with the help of decision trees, we can make decisions about everything from the simplest things to the most complex.
A decision tree is used for classification as well as regression.
And, as the name 'Tree' suggests, it uses the tree data structure to work through and make decisions.
So, let's understand this in detail.
Now, you can see a tree structure in front of you on your screen.
This tree structure is the opposite of a real tree's structure.
The root of a real tree is at the bottom, but here you can see it is placed at the very top.
This is because we start from the root, so it is kept at the top.
Next, after the root, you can see decision nodes 1, 2 and 3.
This means that branches come out of the root, and at these branches we have decision nodes.
And where there is no possibility of a branch growing any further, that end point is called a leaf node.
Here, we need to understand that each decision node holds a condition.
So, after evaluating one condition we reach another condition, and on the basis of that condition we proceed until the final decision, that is, until a leaf node.
For example, there is one condition here and another condition there; if the condition is fulfilled we go one way (yes), and if it is not fulfilled, we go the other way (no).
This is the way we get the final output of the decision tree.
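To make this concrete, here is a tiny Python sketch, going back to the tea-or-coffee idea: each if statement plays the role of a decision node, and each return value is a leaf node. The drink-choice conditions here are made up purely for illustration.

```python
# Hypothetical sketch: each `if` is a decision node, each return a leaf node.
def drink_decision(wants_caffeine: bool, is_morning: bool) -> str:
    if wants_caffeine:       # decision node 1
        if is_morning:       # decision node 2
            return "coffee"  # leaf node
        return "tea"         # leaf node
    return "water"           # leaf node

print(drink_decision(wants_caffeine=True, is_morning=False))  # tea
```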
Let's understand this furthermore.
For decision trees, there are two famous algorithms.
Although there are many different variants,
the famous ones are CART and ID3.
CART stands for Classification And Regression Tree; it uses a formula, or concept, called the Gini index.
The Gini index is used as a metric, meaning the calculations done through it are what guide the split decisions.
And ID3, the Iterative Dichotomiser 3, uses entropy; with the help of the entropy formula we compute information gain.
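For reference, the standard formulas are Gini = 1 - sum(p_i^2) and entropy = -sum(p_i * log2(p_i)), where p_i is the proportion of class i. Here is a minimal Python sketch of both; the function names are my own.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: negative sum of p * log2(p) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))     # 0.5 -> maximally impure for 2 classes
print(entropy(["yes", "yes", "no", "no"]))  # 1.0 -> maximum entropy for 2 classes
```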
So, entropy and Gini are both impurity measures.
Meaning they measure how much impurity the data has.
Now, you might be thinking what is this impurity?
I will explain it to you.
Suppose we have some data like this; I will take you to the 'paint window'. See here, we have the data arranged this way: I have one column for rain, the next column for movie, and the third column for the decision.
I hope you are able to understand me.
Here, rain is yes,
the movie is a comedy,
so, what is our decision?
Our decision is that we love comedy movies, so yes, we will go.
Again,
Rain is yes,
Movie is horror,
Decision is No.
Again,
Rain is yes,
Movie again is horror,
Decision is No.
Now, you can observe one thing: whenever it is raining and the movie is a horror film, the decision is always no.
Similarly, if it continues the same way, whenever it is raining and the movie is a comedy, the decision is yes.
So you will notice that whenever rain is a yes and the movie turns out to be a comedy, the decision is always positive.
Similarly it happened over here as well.
So, when a decision pattern always gives us the same result,
such data is pure.
Pure means that under a particular condition, say XYZ, we always get a yes.
This is known as a pure classification.
And if, suppose, we had got a no here, then this data would have become impure.
Because first, when it was raining and the movie was a comedy, the decision was yes.
Now it is raining and the movie is a comedy, but the decision is no.
How can such a thing happen?
So, the data becomes impure.
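Using the gini and entropy helpers sketched earlier, we can check this pure-versus-impure idea on the decision column of those (rain = yes, movie = comedy) rows; the rows themselves come from the example above.

```python
# Decisions for the (rain = yes, movie = comedy) rows in the example.
pure_group   = ["yes", "yes", "yes"]  # same outcome every time -> pure
impure_group = ["yes", "yes", "no"]   # mixed outcomes -> impure

print(gini(pure_group))       # 0.0    -> perfectly pure
print(gini(impure_group))     # ~0.444 -> impurity has crept in
print(entropy(impure_group))  # ~0.918
```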
So, in this presentation we were learning that Gini and entropy find out how much purity or impurity the data has.
If the data is pure, it is very easy for the algorithm to make decisions, but if there is impurity, the complexity increases for the algorithm.
And we will also be learning this practically.
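As a small preview of the practical side, here is one way the rain-and-movie table could be handed to a library decision tree; the scikit-learn usage is standard, but the numeric encoding of the columns is my own assumption for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical encoding of the example table:
# rain (1 = yes, 0 = no), movie (1 = comedy, 0 = horror),
# decision (1 = go, 0 = do not go).
X = [[1, 1], [1, 0], [1, 0], [1, 1]]
y = [1, 0, 0, 1]

# criterion="gini" is the CART-style measure;
# criterion="entropy" mimics ID3's measure instead.
clf = DecisionTreeClassifier(criterion="gini").fit(X, y)
print(clf.predict([[1, 1]]))  # [1] -> yes, we will go
```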
So, friends, let's conclude here for today.
Today's session ends here, and we will study its further parts in our upcoming sessions.
So, keep learning and remain motivated.
Thank you.
If you have any questions or comments related to this course,
then you can click on the discussion button below this video and post them there.
In this way, you can discuss this course with many other learners like you.
Share a personalized message with your friends.