Hello, I am (name) from LearnVern.
In the previous tutorial we saw how a supervised algorithm, that is, the Naive Bayes algorithm, works based on probability.
Now, in today's topic, we will learn about “evaluating classification model performance”. Here we will see, for the algorithms that we have studied up till now and are also implementing, how we can run evaluation metrics to check how accurately they perform their classification task.
So, we will learn how to apply evaluation metrics to know an algorithm's accuracy, that is, how correctly it is able to do classification.
So, let's see this.
“After building it, a predictive classification model needs to be evaluated.”
So performance evaluation is very important because if we create a model and do not evaluate its performance, then such a model does not carry any relevance or meaning.
Now, let's see what metrics are available to us for performing these evaluations.
So, we have confusion matrix,
Model accuracy, precision and recall.
Along with that we have the F1 Score, the ROC Curve, and AUC, that is, the Area Under the Curve.
So, we have these metrics and we will understand them one by one,
So, let's move ahead
First is Confusion Matrix
Here, you can see it is written true positive, false positive, false negative and true negative.
So, let's understand this a little as to what this is.
Here, in our output label we have two things, positive and negative; P represents positive and N represents negative.
So, understand this, the actual dataset that is presented to us has labels as P and N.
And the prediction that the Algorithm has made is also P and N.
But, here we need to understand that if the actual data says a record is P and the algorithm's prediction for that record is also P, then we say that the output is a true positive.
But if my data says it's positive and the algorithm says it's negative, then it is a false negative, because in actuality it is positive but the algorithm does not recognise it as positive.
Similarly, if our algorithm says it's positive but our actual data says it's negative, then it is a false positive.
In the same way there is the true negative, where both the algorithm and the actual data denote it as negative, so that is called a true negative.
So, this is a confusion matrix.
It shows us, in numbers, how many true positives, false positives, true negatives and false negatives we got.
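Just to make this concrete, here is a minimal sketch in Python (assuming scikit-learn is installed; the actual and predicted labels here are made up purely for illustration) of how a confusion matrix can be computed:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical example: 1 = Positive (P), 0 = Negative (N)
y_actual    = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # labels in the dataset
y_predicted = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]   # labels predicted by the algorithm

# Rows are the actual classes, columns are the predicted classes
cm = confusion_matrix(y_actual, y_predicted, labels=[1, 0])
tp, fn, fp, tn = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
```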
On this basis, some other metrics are built, such as model accuracy.
Here, accuracy is based on true positives plus true negatives.
True positive means those records that are actually positive and that the algorithm also calls positive.
True negative means those records that are actually negative and that the algorithm also calls negative.
So, it is true positives plus true negatives, divided by true positives plus true negatives plus false positives plus false negatives, that is, (TP + TN) / (TP + TN + FP + FN), where the denominator is the total number of records.
So, in short, this is the number of correctly predicted records divided by the total number of records; this is how accuracy is calculated.
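As a small sketch of that calculation, with hypothetical counts for TP, TN, FP and FN:

```python
# Hypothetical counts from a confusion matrix
tp, tn, fp, fn = 40, 45, 5, 10

# Accuracy = correctly predicted records / total records
accuracy = (tp + tn) / (tp + tn + fp + fn)
print("Accuracy:", accuracy)   # 85 correct out of 100 -> 0.85
```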
Now, next is precision, which tells us, out of the records that are predicted as positive, how many are actually positive.
For instance, if the algorithm predicts 10 records as positive but only 8 of them are actually positive, then precision is 8 divided by TP plus FP, that is, the total number of predicted positives.
So, that becomes 8/10, which is 0.8.
So, the precision comes out to 80 percent.
So, this is the second formula.
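A quick sketch of the same 8-out-of-10 example in Python (the counts are hypothetical):

```python
# Hypothetical example: the algorithm predicted 10 records as positive,
# of which 8 are actually positive (TP) and 2 are not (FP)
tp, fp = 8, 2

# Precision = TP / (TP + FP)
precision = tp / (tp + fp)
print("Precision:", precision)   # 8 / 10 = 0.8, i.e. 80 percent
```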
Now, the third formula is recall; it means the proportion of the positive class that is correctly classified.
That is, out of the actually positive records, how many are also predicted as positive.
So, it is true positive divided by true positive plus false negative.
So, this gives the proportion of the positive class that is correctly classified, which is TP divided by (TP + FN).
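A similar sketch for recall, again with hypothetical counts:

```python
# Hypothetical example: the dataset has 10 actually positive records,
# of which the algorithm correctly finds 7 (TP) and misses 3 (FN)
tp, fn = 7, 3

# Recall = TP / (TP + FN)
recall = tp / (tp + fn)
print("Recall:", recall)   # 7 / 10 = 0.7, i.e. 70 percent
```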
Next, we have an F1 Score.
Now, as we have seen, precision and recall are two different things.
So, what does F1 do? It combines precision and recall into a single score.
So, you can see it is 2 into recall into precision, divided by recall plus precision.
So, what comes out of this is a combination of both the scores, precision and recall.
In this way the F1 Score gives us a single metric with which you can judge the model; otherwise you would have to look at precision and recall individually.
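A small sketch of the F1 computation, using hypothetical precision and recall values:

```python
# Hypothetical precision and recall values
precision, recall = 0.8, 0.7

# F1 Score = 2 * (precision * recall) / (precision + recall)
f1 = 2 * (precision * recall) / (precision + recall)
print("F1 Score:", round(f1, 3))   # approximately 0.747
```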
Now, next is the receiver operating characteristic curve, also known as the ROC Curve.
So, this is helpful for binary classification problems.
You can see here that it has 0 and 1, and there are true positive rates and false positive rates.
And here the true positive rate is also called sensitivity.
The false positive rate is equal to 1 minus the specificity.
So, the ROC curve is formed with the help of the true positive rate and the false positive rate.
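As an illustration, here is a minimal sketch (assuming scikit-learn; the labels and predicted scores are made up) of how the points of an ROC curve can be obtained:

```python
from sklearn.metrics import roc_curve

# Hypothetical actual labels and predicted probabilities for the positive class
y_actual = [0, 0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3]

# fpr = false positive rate, tpr = true positive rate, at different thresholds
fpr, tpr, thresholds = roc_curve(y_actual, y_scores)
print("FPR:", fpr)
print("TPR:", tpr)
```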
Ok! Now, after the ROC curve, we learn about a measure that comes from it, that is AUC, the Area Under the Curve.
This tells us how well the model can distinguish which item belongs to which class.
Here you can see the true positive rate on one axis, and here we have the false positive rate.
So, here the false positive rate basically reflects how many negative records are wrongly being classified as positive.
And the true positive rate basically reflects how many of the actually positive cases are being correctly identified.
So, the closer the curve is to this line at the top, the more it depicts that the true positive rate is higher.
Or, if the curve is more towards the X axis, then it basically reflects that the false positives are more.
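And a matching sketch for AUC, reusing the same hypothetical labels and scores as in the ROC example above:

```python
from sklearn.metrics import roc_auc_score

# Same hypothetical labels and predicted scores as before
y_actual = [0, 0, 1, 1, 0, 1, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3]

# AUC close to 1.0 means the model separates the two classes well;
# around 0.5 means it is no better than random guessing
auc = roc_auc_score(y_actual, y_scores)
print("AUC:", auc)
```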
So, these were some metrics with whose help we evaluate how an algorithm is actually performing.
We will stop our session here; we will continue with its further parts in the next video.
So, keep learning and remain motivated.
Thank you.
If you have any queries or comments, click the discussion button below the video and post them there. This way, you will be able to connect with fellow learners and discuss the course, and our team will also try to solve your queries.