Namaskar, I am Kushal from LearnVern.
This tutorial is a continuation of the last one, so let's watch ahead. In this machine learning tutorial we will understand variance and bias, the relation between them, and in what way we should adjust variance and bias. So let's get started and first understand variance.
Variance: when we implement an algorithm on a dataset, we also evaluate it, and for evaluation we keep the data sampled beforehand. After sampling, let us assume we have training data and testing data in several different sets, like the K-fold sampling we learnt; suppose we have ten such sets. How our model differs across these ten sets, that is, how much its behaviour changes from one training set to another, is what we call variance.
Now let us move towards bias. Again, whenever we implement an algorithm, for bias we do not look at the training sets but at the predictor and the target, meaning the input and the output: how strongly the algorithm can build the relation between them. This is how we check bias. So these two, in a way, denote the error, that is, how accurate our model is and how much error it has.
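As a quick aside (a standard result stated here for clarity, not verbatim from the lecture): for squared-error loss, the expected error of a model decomposes as

    expected loss = bias² + variance + irreducible noise

The mlxtend outputs we will see later follow exactly this pattern, with average expected loss equal to average bias plus average variance (for example, 14.096 + 17.440 = 31.536 in the decision tree regressor run below; mlxtend's reported "bias" for MSE is the squared-bias term, so the quantities add up directly, noise aside).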
Now here are some important things that we should understand. If there is high variance, that means overfitting is taking place, and in the same manner, if there is high bias, then underfitting is taking place. So high variance is a problem and high bias is also a problem.
So we want that variance should be low and bias should also be low. This is a challenging task, and we call it a tradeoff, that is to say, how a tradeoff can be established between variance and bias; this we call the bias-variance tradeoff. Now, to implement this we have a library named mlxtend, and this is what we will use.
Now when I was exploring, I noticed that an old version of mlxtend was installed, due to which I first uninstalled the old version of mlxtend and then installed the library again. Here you will see bias_variance_decomp from mlxtend.evaluate; basically, I am using this module. Previously the installed library was version 0.14, which was giving me some issues, and that is why I uninstalled it and installed it again. You will observe that after re-installing, the library version has updated to 0.19. This way I have updated the mlxtend package and installed the latest version.
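For reference, here is a minimal sketch of the upgrade steps in a notebook cell; the lecture does not show the exact commands, only that the old version was removed and the latest one installed.

```python
# Hypothetical upgrade commands for a Colab/Jupyter cell; the lecture only
# shows that the old mlxtend (0.14) was removed and the latest version
# (0.19 at the time) installed, not the exact commands used.
!pip uninstall -y mlxtend
!pip install -U mlxtend
```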
Now there is a request here to restart the runtime, so I will restart the runtime as well. Now let us walk through the code. This is the housing.csv dataset, and on it we will now calculate bias and variance. You will observe that from sklearn.model_selection I have imported train_test_split, along with LinearRegression from linear_model, and from mlxtend.evaluate I have imported the bias_variance_decomp module. We have already implemented these algorithms before, which is why I am not explaining that section of the code. Here we can see that after making the model, we calculate mse, bias and variance, that is, mean squared error, bias and variance, with the help of the bias_variance_decomp function.
Now here we passed the model, X_train, y_train, then X_test and y_test, and because we want to take MSE, in loss we mentioned 'mse'. After this, the number of rounds is 200 and the random seed value we have given as 1. Let us now execute it and see. After executing, we get the output in which the MSE, the mean squared error, is 22.418; the bias is high and the variance is low.
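A minimal sketch of the steps just described, assuming the housing data from housing.csv has already been loaded into NumPy arrays X (features) and y (target); the loading code and the exact split ratio are not shown in the lecture, so those parts are assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from mlxtend.evaluate import bias_variance_decomp

# Assumes X and y are NumPy arrays holding the housing features and target.
# The 70/30 split ratio is an assumption; the lecture does not state it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

model = LinearRegression()

# Decompose the test error into bias and variance with the parameters
# named in the lecture: MSE loss, 200 rounds, random seed 1.
mse, bias, var = bias_variance_decomp(
    model, X_train, y_train, X_test, y_test,
    loss='mse', num_rounds=200, random_seed=1)

print('MSE: %.3f' % mse)        # lecture output: 22.418
print('Bias: %.3f' % bias)
print('Variance: %.3f' % var)
```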
So this is the result we got, and we have to strike a balance between bias and variance. Now let us see one more example. That was our example of linear regression; in a similar manner, let us see the example of a decision tree classifier. This is the iris data, which we have practised many times, and on it let us see what bias and variance are attained. Here you can see that the average expected loss is 0.062, average bias is 0.022, and variance is 0.040. Now, if in place of the decision tree classifier, which is a single algorithm, I put a bagging classifier, meaning I use an ensemble technique, then what will happen? Let's watch it once.
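A hedged sketch of the iris example; the 0-1 loss is the natural choice for a classifier, though the lecture does not name the loss or the split parameters explicitly, so those are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bias_variance_decomp

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

tree = DecisionTreeClassifier(random_state=1)

# 0-1 loss decomposition for the single decision tree.
loss, bias, var = bias_variance_decomp(
    tree, X_train, y_train, X_test, y_test,
    loss='0-1_loss', random_seed=1)

# Lecture output: loss 0.062, bias 0.022, variance 0.040
print('Average expected loss: %.3f' % loss)
print('Average bias: %.3f' % bias)
print('Average variance: %.3f' % var)
```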
So we used the bagging classifier and then tried calculating. It is taking some time, so let us have some patience. Our expectation is that the variance in the output will be lower, and with this expectation we will wait for the result; after that we will look at the same example with a decision tree regressor as well.
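A sketch of the same decomposition with a BaggingClassifier wrapped around the decision tree, continuing from the arrays above; the ensemble size n_estimators=100 is an assumption, as the lecture does not state it.

```python
from sklearn.ensemble import BaggingClassifier

# Ensemble of decision trees; bagging averages over bootstrap samples,
# which is expected to reduce the variance component.
bag = BaggingClassifier(DecisionTreeClassifier(random_state=1),
                        n_estimators=100, random_state=1)

loss, bias, var = bias_variance_decomp(
    bag, X_train, y_train, X_test, y_test,
    loss='0-1_loss', random_seed=1)

print('Average expected loss: %.3f' % loss)
print('Average bias: %.3f' % bias)
print('Average variance: %.3f' % var)
```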
So now here we can see the output. Earlier we had an average expected loss of 0.062, and the current average expected loss is 0.048; the average bias was 0.022 earlier and is still 0.022, but our variance has become lower. This is how we try to achieve balance: it is an optimization in which the average bias, instead of increasing, stays at 0.022, while the average variance has reduced. We want bias to be low and variance to also be low. In the same manner, let us now perform this for a decision tree regressor, again on the Boston housing data. Here we can see that the average expected loss is 31.536, which is a high value; the average bias is 14.096 and the average variance is 17.440. Now let us execute this with a bagging regressor and see whether there is an improvement; here also we expect that there may be a reduction in variance. So let us wait for this execution and see what we get in the results. Till the results are attained, let us do a recap: bias and variance both measure the error, and we want variance to be low and bias to also be low. With this knowledge, whenever you calculate bias and variance, you must try to keep them low.
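A hedged sketch of the regression-tree comparison, reusing the housing arrays X_train, y_train, X_test, y_test from the linear regression example; n_estimators=100 for the bagging ensemble is again an assumption.

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

# Single decision tree; lecture output: 31.536 / 14.096 / 17.440.
tree_reg = DecisionTreeRegressor(random_state=1)
loss, bias, var = bias_variance_decomp(
    tree_reg, X_train, y_train, X_test, y_test,
    loss='mse', num_rounds=200, random_seed=1)
print('Tree  -> loss %.3f, bias %.3f, var %.3f' % (loss, bias, var))

# Bagging regressor over the same tree; results discussed below.
bag_reg = BaggingRegressor(DecisionTreeRegressor(random_state=1),
                           n_estimators=100, random_state=1)
loss_b, bias_b, var_b = bias_variance_decomp(
    bag_reg, X_train, y_train, X_test, y_test,
    loss='mse', num_rounds=200, random_seed=1)
print('Bagged -> loss %.3f, bias %.3f, var %.3f' % (loss_b, bias_b, var_b))
```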
So let us now see the output for this. Here we can see that the average expected loss attained is 18.620, whereas earlier the average expected loss was 31.536, so there is a reduction in loss. The average bias is 15.461 against 14.096 earlier, so the average bias has increased slightly, but at the same time the average variance has become 3.159, whereas earlier the average variance was 17.440. So this is how we try to adjust bias and variance.
So today's session ends here, and the parts ahead we will see in the next session. So keep learning, remain motivated. Thank you.
If you have any queries or comments, click the discussion button below the video and post them there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to solve your query.