Hello, I am (name) from LearnVern. (6 seconds pause, music)
In the last machine learning tutorial we tried to understand what PCA, or Principal Component Analysis, means and how principal components are calculated.
Now we will look at another technique called LDA, which stands for Linear Discriminant Analysis. So let's understand what LDA is. Like PCA, LDA is a technique in which we reduce the dimensions, but the approach is different. Ronald A. Fisher developed it in 1936, and it is also called the two-class technique; let's understand why it is called so.
This is a supervised approach and, as I told you earlier, it helps in dimensionality reduction. Now let us understand it in detail. First we will see its assumptions, then we will understand how it works, and then I will walk you through what it actually does.
So the first thing to assume here is that every variable, every dimension or attribute, follows a Gaussian distribution. Let me quickly explain what a Gaussian distribution is. Just watch this diagram here; this is a Gaussian distribution, which we also call the normal distribution or the bell-shaped curve.
- So the first assumption is that all of the variables follow a Gaussian distribution.
- The second assumption is that all the features have the same variance.
- The third assumption is that the observations are sampled randomly, meaning random samples have been taken; for example, if there are 1000 records, then 100 are picked at random. So these are the assumptions of LDA.
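To make these assumptions concrete, here is a minimal sketch in Python (assuming NumPy is available; the means, covariance, and sample sizes are made up for illustration) that generates a small two-class dataset satisfying all three assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Shared covariance matrix: both classes have the same variance structure (assumption 2)
shared_cov = np.array([[1.0, 0.3],
                       [0.3, 1.0]])

# Each class follows a Gaussian distribution around its own mean (assumption 1)
class_0 = rng.multivariate_normal(mean=[0.0, 0.0], cov=shared_cov, size=1000)
class_1 = rng.multivariate_normal(mean=[3.0, 2.0], cov=shared_cov, size=1000)

X = np.vstack([class_0, class_1])
y = np.array([0] * 1000 + [1] * 1000)

# Take a random sample of 100 rows out of all the records (assumption 3)
idx = rng.choice(len(X), size=100, replace=False)
X_sample, y_sample = X[idx], y[idx]
print(X_sample.shape, y_sample.shape)   # (100, 2) (100,)
```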
1:50
So, let's understand how it's done.
- First I will explain it verbally, and then we will understand it with a diagram. The first step is to compute separability. What does this mean? Let's assume I have triangles and circles; to compute the separability between them, we compute the distance between the two groups. This first part is called the between-class variance: one class is the triangles and the other class is the circles.
- The second step is to look inside each class. Inside the triangles, how much is the within-class variance, that is, how spread out or scattered the points are? Look at how close or far the circles are from one another, and how close or far the triangles are from one another. In general you will observe that circles are quite far from triangles in terms of their features, while one circle is quite near another circle. From this you can gauge that the variance between two different classes will be large, while within a class, circle to circle or triangle to triangle, the variance will be small. These are the two quantities we have just discussed.
- Next, we create a lower-dimensional space. How do we make it? We choose it so that it maximizes the variance between the classes and minimizes the variance inside each class. This is exactly what I was telling you: it maximizes the variance between triangles and circles, and minimizes the variance from circle to circle and from triangle to triangle. That is the third step.
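As a rough sketch of these quantities, assuming Python with NumPy and a made-up "circles vs. triangles" dataset, the between-class and within-class scatter could be computed like this:

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute within-class (Sw) and between-class (Sb) scatter matrices."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    Sw = np.zeros((n_features, n_features))   # within-class: spread inside each class
    Sb = np.zeros((n_features, n_features))   # between-class: spread of class means
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

# Toy "circles vs. triangles" data in 2D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),    # class 0 (circles)
               rng.normal([4, 3], 1.0, (50, 2))])   # class 1 (triangles)
y = np.array([0] * 50 + [1] * 50)

Sw, Sb = scatter_matrices(X, y)
print("Within-class scatter:\n", Sw)
print("Between-class scatter:\n", Sb)
```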
3:37
OK, we will see the applications later; first let us just understand the idea. To understand it, take this example: you have introduced some scheme. Alright. And this scheme that you have introduced is for those who are below the poverty line. Now let's assume that you ran this scheme in one state, State 1, then you ran it in State 2, then in State 3, and like this you are running the same scheme in multiple states, OK. Now I will show this by drawing it linearly, and we will also give the points a color. OK, nice. So here you can see that the scheme is working for some people and for some it is not working; you ran one scheme, but it works for some people and for others it does not. So here I will make some duplicates of this, OK.
So these people in yellow are the ones for whom the scheme is not working, and let's assume that for the people in white it is working. We will generate more of these white ones, for whom it is working, OK… This is possible; it can happen that the others have different problems, their challenges may be different, and the scheme may not work properly for them. That is a possibility, isn't it? So in this way you can see how this looks. Now let me make one more duplicate of this, OK, so that you can understand it… This one here we will call not satisfied, because the purpose was not fulfilled, and the white one is satisfied. OK? (6 seconds pause)
So now see what we will do with this. Let us first assume this is data for only one state. By looking at one state we can observe one thing: as we move towards the right side, you see, I will draw this arrow here, the density increases, we get more points; and as we move towards the left, it decreases. Now, instead of one state, let us jump to two states. For a single state we did it like this; for two states, all these points will remain, but the dimension becomes 2D, so we will make a 2D diagram and then understand it. It will be something like this, just watch this. (9 seconds pause) OK, so it will be something like this; I have taken one yellow one here and one white one as well, and you can see the data will appear in this form. Now let us plot the data in 2D: one state will come on the X axis and the other state on the Y axis, and you will see the data somewhat like this, OK.
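A minimal plotting sketch, assuming Python with NumPy and Matplotlib and purely synthetic numbers for the two states, could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Hypothetical per-person scores of the scheme in two states
satisfied     = rng.normal(loc=[7, 6], scale=1.0, size=(60, 2))   # "white" group
not_satisfied = rng.normal(loc=[3, 2], scale=1.0, size=(60, 2))   # "yellow" group

plt.scatter(satisfied[:, 0], satisfied[:, 1], marker="o", label="satisfied")
plt.scatter(not_satisfied[:, 0], not_satisfied[:, 1], marker="^", label="not satisfied")
plt.xlabel("State 1")
plt.ylabel("State 2")
plt.legend()
plt.show()
```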
7:46
So in 2D it is a bit clearer for how many people it is working and for how many it is not working; you can understand it clearly. Now what we can do is draw a line like this between the two groups. This line separates them, and here you will see that the approach is different from PCA: here the approach is to look at the separability between the two groups, the ones for whom it is working (white) and the ones for whom it is not (yellow). To separate them, we have drawn this line in between, and it separates the two classes, OK.
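As a sketch of this separating line, assuming Python with scikit-learn and the same kind of synthetic data as above, we can fit LDA and read the line off its coefficients:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([7, 6], 1.0, (60, 2)),    # satisfied
               rng.normal([3, 2], 1.0, (60, 2))])   # not satisfied
y = np.array([1] * 60 + [0] * 60)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# The decision boundary is the line  w0*x1 + w1*x2 + b = 0
w, b = lda.coef_[0], lda.intercept_[0]
print("Separating line: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
print("Training accuracy:", lda.score(X, y))
```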
Now I will go one level further: let's talk about 3D. Just watch here, these are three dimensions. Imagine where you are sitting right now: you can see a corner of the room where three lines meet, and that is 3D. It is difficult to represent here, but I will represent it: these are the white ones, and similarly the yellow ones are also there, and now we are doing it for three states. 3D is a little more complex because it is a bit beyond imagination; we cannot visualize it directly. So these data points can be seen here, and the white ones can also be seen. If we have to show it in 3D, we can assume that the bigger points are nearer to us and the smaller ones are farther away: large size means nearer, small size means farther. With that assumption we can represent 3D, fine.
9:48
Now, if I want to separate the classes in 3D, how is the separation done? In 3D I will have to draw a plane. So assume I have drawn a plane here; let us give it a different color. This plane is doing the separating. It is difficult to show on a 2D screen, but you can imagine that this plane separates these points from those; if we could rotate it in 3D we would understand it even better. So in 3D, a 2D plane can separate two classes, and similarly, with more dimensions than this it will not be possible to visualize, but it works in the same manner. Linear discriminant analysis works in exactly this way: it identifies the separability between the two classes. Now, after identifying the separability, observe how it reduces the dimensions; there are two classes, so let's see how it reduces them.
For reducing, there is one naive way; let us see that first. Here we have State 1 on one axis and State 2 on the other, and between them we have this data. One option is to ignore State 2 completely and drop a perpendicular from each point onto the X axis, so that we keep only the X-axis values. But notice that you have then completely ignored the other axis, and if you did it the other way around, keeping only the Y axis, then the X-axis values would be completely ignored. So this is not a good way, it is not the correct way, right? I will tell you a better way, similar to what LDA does.
What LDA does is this: whatever data points are there (I am not giving the points different colors this time), for these data points it prepares a new dimension. In this new dimension it takes both X and Y into account, a combination of X and Y, so it considers both. That is what LDA does: to combine the two, it forms a new axis, and on this new axis we plot the values of both features. This is how LDA works.
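A minimal sketch of this projection, assuming Python with scikit-learn and synthetic data, shows the two original features being combined onto one new axis:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([7, 6], 1.0, (60, 2)),
               rng.normal([3, 2], 1.0, (60, 2))])
y = np.array([1] * 60 + [0] * 60)

# Project the two original features onto a single new LDA axis
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)

print("Original shape:", X.shape)      # (120, 2)
print("Projected shape:", X_1d.shape)  # (120, 1)
print("Direction of the new axis (uses both features):", lda.scalings_.ravel())
```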
12:45
So now you would have understood how it turns two dimensions into one. OK, now let us go further and apply the rule we studied: the distance between the two classes should be as large as possible, while within a class the values should be close together, that is, the scatter should be small. How do we find this? Just watch: here is my line, and I again take these values from here. (10 seconds pause) OK, so I took this one and this one also. Now assume they are plotted here: there will be multiple values of this kind, and these multiple values have been plotted on the new dimension for both groups. Now this will not be perfectly clean; here they all look separate, but it may happen that one of them lands over here somewhere, one may be here and one may be there, that is a possibility. So this is how we arrive at the new dimension. Next we find the mean of each group. The mean of one group might be somewhere here; we represent a mean by mu (pronounced "mew"), so let me call one mu-one and the other mu-two. In this step we want the difference between these means, mu-one minus mu-two, to be as large as possible. And for all these data points, the white ones and the yellow ones, their scatter should be as small as possible; I will write the scatters as s1 and s2, so s1 squared plus s2 squared should be minimum. In this manner we identify the best line, the one that fits best, OK.

Now, it can also happen that there are more than two classes. Like here we have these two, and here in a red circle there is another class, so there are more than two.
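To make this criterion concrete, here is a rough two-class sketch, assuming Python with NumPy and synthetic data. It uses the classic two-class Fisher solution, where the best direction is proportional to Sw^-1 (mu1 - mu2), and then evaluates (mu1 - mu2)^2 / (s1^2 + s2^2) after projection:

```python
import numpy as np

rng = np.random.default_rng(4)
X1 = rng.normal([7, 6], 1.0, (60, 2))   # class 1 (e.g. "satisfied")
X2 = rng.normal([3, 2], 1.0, (60, 2))   # class 2 (e.g. "not satisfied")

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter of each class, summed
S1 = (X1 - mu1).T @ (X1 - mu1)
S2 = (X2 - mu2).T @ (X2 - mu2)
Sw = S1 + S2

# Best projection direction: w proportional to Sw^-1 (mu1 - mu2)
w = np.linalg.solve(Sw, mu1 - mu2)
w /= np.linalg.norm(w)

# Fisher criterion for this direction: (mu1 - mu2)^2 / (s1^2 + s2^2) after projection
proj1, proj2 = X1 @ w, X2 @ w
criterion = (proj1.mean() - proj2.mean()) ** 2 / (proj1.var() + proj2.var())
print("Projection direction:", w)
print("Fisher criterion value:", criterion)
```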
15:50
This will happen sometimes, so in this case we have to modify the previous formula a little. We first find the center of all the points; let us assume this is the overall center. Then, for the mean of each class, we calculate its distance from this center. You must have understood that earlier we were calculating the distance between just two means because there were only two classes; now there are more than two classes. So here you identify the overall center point and calculate the distance of every class mean from it, as the sketch below illustrates.
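Here is a rough sketch of these quantities, assuming Python with NumPy and made-up data for three classes; d1, d2, d3 are the distances of the class means from the overall center, and s1, s2, s3 are the within-class scatters:

```python
import numpy as np

rng = np.random.default_rng(5)
classes = {
    "class 1": rng.normal([7, 6], 1.0, (50, 2)),
    "class 2": rng.normal([3, 2], 1.0, (50, 2)),
    "class 3": rng.normal([8, 1], 1.0, (50, 2)),
}

all_points = np.vstack(list(classes.values()))
center = all_points.mean(axis=0)          # overall center of all the points

d_squared = []   # squared distance of each class mean from the center
s_squared = []   # within-class scatter (spread) of each class
for name, Xc in classes.items():
    mean_c = Xc.mean(axis=0)
    d_squared.append(np.sum((mean_c - center) ** 2))
    s_squared.append(np.sum((Xc - mean_c) ** 2))

# We want the sum of d^2 to be large and the sum of s^2 to be small
print("d1^2 + d2^2 + d3^2 =", sum(d_squared))
print("s1^2 + s2^2 + s3^2 =", sum(s_squared))
print("Criterion (bigger is better):", sum(d_squared) / sum(s_squared))
```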
So the between-class part becomes d1 plus d2 plus d3, and then we square each of them. Do you know why we square? Because we do not want negative values; they can cancel out, and sometimes the complete summation can come to zero. So we take the square of each distance, and similarly for the scatter we take the square of s1 plus the square of s2 plus the square of s3. That is how we do it with more than two classes, and this is the manner in which LDA works. I hope all of you have understood at least a little of this intuition. Now let us go back to the applications; the applications we saw for PCA apply here also. In medical data, if a disease has lots of features, many parameters that determine whether the disease is present or not, then this is an application where LDA works: it basically gives more information with fewer features.
In customer recognition also, whether it is images or customer features like buying behaviour, how many items a customer buys each time, how frequently he visits, how much time he stays, there are so many features of a customer that we record, and those features too we can reduce here and then recognize the customer. Then face recognition: we see lots of applications that recognize faces, and there too there are so many features, and LDA can be used to reduce them. The parts ahead of this we will see in the next session. So keep learning, remain motivated, thank you.
If you have any queries or comments, click the discussion button below the video and post there. This way, you will be able to connect with fellow learners and discuss the course. Also, our team will try to solve your query.
Share a personalized message with your friends.