Hello, I am (name) from LearnVern. In the previous tutorial on machine learning we started the topic of Introduction to Dimensionality Reduction, and we saw that in dimensionality reduction we reduce the number of features. So let's look at the first technique, and this technique is PCA, or Principal Component Analysis. In principal component analysis we reduce the number of features, meaning that if the number of features is 10, then by reducing the dimensions we can bring that number down to 2 or 3; how we do this, we will see shortly. Now, this is used to avoid multicollinearity. What is multicollinearity? Suppose I have X1, X2 and so on up to XN — I have many X — and these X1, X2, ... determine my output Y. But when we actually observe the data, we find that X1 and X2 are also related to each other, and we want to avoid them being related to each other. So what we do is combine them — by reducing, we convert both into one feature — and this is the kind of technique we can use; PCA is used to avoid this very problem. Now we will understand it step by step, and I will also explain it to you diagrammatically with Paint; for now, just follow the steps. The first step is standardization. Standardization means that if I have values of the kind minus 1, 50, 1000 — see the variation in the values, see the difference — then I will scale them; scaling means I will rescale them. Here standardization is done through the Z-score, and you can see that the Z-score is equal to the value minus the mean, divided by the standard deviation: take the mean of whatever data you have, subtract it from each value, and divide by the standard deviation. This way we get the Z-score, and this Z-score stays within a limited range, which can be negative as well as positive.
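The Z-score step just described can be sketched in a few lines. This is a minimal illustration using NumPy; the values below are made up for demonstration, not taken from the lecture's dataset:

```python
import numpy as np

# Hypothetical feature values on very different scales, like -1, 50, 1000
x = np.array([-1.0, 50.0, 1000.0, 3.0, 120.0])

# Z-score: (value - mean) / standard deviation
z = (x - x.mean()) / x.std()

print(z)         # rescaled values, both negative and positive
print(z.mean())  # approximately 0 after standardization
```

After this transformation the feature has mean 0 and standard deviation 1, so no single feature dominates the later covariance computation.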
So the first step we did was standardization, where we rescale values so that they fall in a specific range. OK, that was the first step. Now we move to the next step, which is finding the Covariance Matrix. When we find the covariance matrix — just look at the second line — it summarizes the correlation between all possible pairs of variables. We were just talking about multicollinearity; in multicollinearity we establish how much one variable is correlated with another variable. So here, the covariance matrix will tell us the correlation between variables. Next, you can see diagrammatically how we find the covariance: it is a symmetric p-by-p matrix, meaning that for however many features I have, I make a matrix of that size. Watch here: x,x then x,y then x,z; then y,x, y,y and y,z — we have taken all the combinations, so for 3 features it is 3 by 3, and for 4 features it is 4 by 4. And yes, here you can see that x,y and y,x will be the same: wherever you see x,y and y,x, they mean the same thing, and when you see the formula you will understand this better.
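The p-by-p covariance matrix above can be computed in one call. A hedged NumPy sketch on randomly generated data — the three features are stand-ins, not the lecture's:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 samples, 3 features: think x, y, z

# Covariance matrix: one row/column per feature, so 3 features -> 3x3
C = np.cov(X, rowvar=False)

print(C.shape)              # (3, 3)
print(np.allclose(C, C.T))  # True: cov(x, y) equals cov(y, x)
```

The diagonal holds each feature's variance, since the covariance of a variable with itself is its variance.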
Now, let's proceed ahead. Watch this: the covariance of A with A is equal to the variance of A. And the other thing that I have already mentioned is that the covariance of A,B is equal to the covariance of B,A — cov(x,y) and cov(y,x) are the same thing. The next two things worth understanding are that correlation can be positive and it can also be negative. When does positive correlation happen? When you have two variables and as one of them increases, the other one also increases with it — and if one decreases, the other also decreases. Whenever both move in the same direction — even if one increases by two and the other by half, that is fine — it is called positive correlation. In the negative case, if one increases then the other decreases; it is the opposite, inversely proportional. So that is negative and positive correlation. Now let's see what we do in step 3. In step 3 we have to find eigenvectors and eigenvalues. If someone is hearing these words for the first time, they may sound complex, but this is very simple and you will understand it shortly. Eigenvectors and eigenvalues help us find the principal components. For now, you can picture it like this: on a graph you draw a line, and on that line you put an arrow — that is, you give it a direction. That is what is called an eigenvector, and I will explain it in detail; that eigenvector will also have a value associated with it, and there is a formula to compute it. So these eigenvectors and eigenvalues help us find the principal components.
So principal component 1 and principal component 2 — we will find these with their help. (05:57)
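The eigenvector/eigenvalue step can be sketched with NumPy's symmetric eigen-solver. The two correlated features below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = 0.8 * x1 + rng.normal(scale=0.3, size=50)  # deliberately correlated with x1
X = np.column_stack([x1, x2])

C = np.cov(X, rowvar=False)                    # 2x2 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigh: for symmetric matrices

print(eigenvalues)   # variance captured along each direction
print(eigenvectors)  # columns are unit-length direction vectors
```

Note that `eigh` returns the eigenvalues in ascending order, so the last column is the direction of PC1.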
Okay, so let's move ahead. What happens here is that we look at how much variation there is. We will find these eigenvectors first for some variables, and then we will find PC1, then PC2, then PC3 — for however many variables we have. So the principal components can go from 1, 2, 3 up to the number of variables you have: if you have 5 variables, you can find up to 5, alright. Now, these principal components are totally different from one another, and what is the purpose of finding them? I will repeat that word: multicollinearity. If X1 and X2 are dependent on each other, then that is wrong — that should not happen. There should be only one feature, a strong feature which strongly identifies the output, OK. If one changes then the other also changes, and this should not happen — so the principal components are uncorrelated, OK. Uncorrelated principal components: this is the purpose of PCA, that we can find uncorrelated principal components, alright.
Now here you will see the largest possible variance — the most variance is seen here. You will understand this fully by watching an example; these are technical terms and might sound complex, but don't worry. So those are the steps we just covered. Now look at this diagram, which gives you a hint of what is going on. These points in blue are different data points, and here you can see X and Y — this is X and this is Y. We are trying to draw a line here, and by drawing a line we calculate the distance from each point — we calculate the perpendicular distance. When we calculate the perpendicular distances, we see that we require the line which maximizes the variance; that is the line we are trying to find, OK, and that line makes the principal component. So let's move ahead now.
So, for the second principal component: the line producing the second highest variance becomes our second principal component, OK. In a similar way we find the third one after the second, then the fourth, and so on — if we have P variables, then we can make P principal components. Now, the eigenvectors indicate directions. Have you seen a vector? A vector is something that shows direction, so eigenvectors tell the direction, because the line that you make will have some direction, and the next principal component you make will also have an eigenvector — it will also have a direction. So these eigenvectors are actually the directions of the lines that we call principal components. Let's move ahead: what are eigenvalues? Eigenvalues are the coefficients associated with the eigenvectors. Here we take the squares of the distances and their sum, and from that sum we get the eigenvalues. Now, why do we square? Because some distances might be negative and some positive, so to keep all distances positive we first square them, then take the sum, and then we find these eigenvalues.
Now what we do is take whatever eigenvectors we have and reorder them from the highest eigenvalue to the lowest. You must have understood highest to lowest: if there are 10 and I want only two, then I will select the highest 2. So in this way, you rearrange them and select however many principal components you want. (10:09)
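The highest-to-lowest selection is just a sort. A small sketch with made-up eigenvalues — the identity matrix stands in for real eigenvector columns:

```python
import numpy as np

# Hypothetical eigenvalues for a 4-feature dataset; columns of `eigenvectors`
# are placeholder directions, one per eigenvalue
eigenvalues = np.array([0.5, 2.8, 0.1, 1.2])
eigenvectors = np.eye(4)

order = np.argsort(eigenvalues)[::-1]    # indices from highest to lowest
k = 2                                    # keep only the top 2 components
feature_vector = eigenvectors[:, order[:k]]

print(eigenvalues[order])      # [2.8 1.2 0.5 0.1]
print(feature_vector.shape)    # (4, 2)
```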
Now, this is a feature vector. The feature vector is the matrix in which you represent the chosen eigenvectors — if you select two, then you represent two eigenvectors, and that is called the feature vector, OK. So these are your new features: the old features were X1, X2, isn't it, and now you have PC1, PC2 or PC3, and if you chose two, then PC1 and PC2 form your feature vector, OK. So now let's move ahead. In this way we move towards dimensionality reduction: out of N features, we find P new features with the help of eigenvectors and eigenvalues. So, now Recast. What happens in recasting is — remember those lines, now you can imagine everything — we take the eigenvectors, or you can call them the lines of fit, and cast the data from the original axes onto them; we reshape it. You will understand this further with an example: when we reshape, we can re-form the principal components by plotting them graphically and understanding them. So recasting is the last step, and how it is done is: you can see here that the final data set is equal to the transpose of the feature vector multiplied by the transpose of the standardized original dataset, and with this we can find the final dataset. So let us now try to understand it. This is my 9.2 PCA sheet, and let me give you an example here. Let us take some samples — OK, I am taking samples in this manner: this is our sample 1, sample 2 and sample 3. There can be multiple samples; after 3 there can be 4, 5, 6 and so on. And here I have feature 1, feature 2 and so on; like this I can have many features, right. Here we can put some random values, because we are doing this example just for understanding. So let me put some random values — from 1 to 10 only, because going up to 100 may be too much, OK.
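All the steps so far — standardize, covariance, eigen-decomposition, sort, recast — can be strung together in one short sketch. The data here is random, just a stand-in for the samples-and-features sheet:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))                 # 20 samples, 5 features

# Step 1: standardization (Z-score per feature)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix (5x5)
C = np.cov(Z, rowvar=False)

# Step 3: eigenvectors and eigenvalues
vals, vecs = np.linalg.eigh(C)

# Step 4: sort highest-to-lowest, keep the top 2 as the feature vector
order = np.argsort(vals)[::-1]
W = vecs[:, order[:2]]

# Step 5: recast -- FinalData = FeatureVector^T x StandardizedData^T
final = (W.T @ Z.T).T                        # shape (20, 2): 2 new features

print(final.shape)
```

The two resulting columns (PC1, PC2) come out uncorrelated, which was the whole point of avoiding multicollinearity.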
So here I have put some random values, OK. (pause 3 seconds) You can see that a dataset has been created right in front of you. Now, you can imagine these samples to be your different offices, each having a different sales team: this is sales team 1, this is 2, this is 3, this is 4 — different sales teams. Feature 1, feature 2, feature 3 can be — not performance — let me take them as: how many trainings did you give them, how many assignments did you give, what were the targets, and in the next one, how many people the leadership team consisted of. So these are your different features, alright.
So you can take any example, with samples here and features here. Let's understand it further. These are different values, and I will show them as if initially I have only this one feature's data — let me highlight it; I have highlighted it — so there is data for only one feature. If I now want to represent the data of this feature on a number line, then how will I do it? So here I will go to the board, and I will open this board to understand this better. On a number line let me represent this data: 8, 4, 10, 2 — and now I will rearrange it as 2, 4, 8, 10, OK. So here is my line, and on this line I will arrange the data. Let's assume this to be a line; here I will arrange these values. This is my first value, 2, and the second value, 4, was also close to it, and after that the values I had were larger, 8 and 10, so I will position 8 here and 10 here. So if you look at these values 2, 4, 8, 10, you can easily see that the lowest are shown on this side and the largest on this side, OK, and I will also write here that this is 2, this is 4, this is 8 and this is 10. Now just by looking at this, you can do a grouping between them — yes, we can do it quite easily: this is one group and this is another. So just by looking at this much, we are able to do grouping. Now, suppose I had two dimensions — what would I have done then? Let me add a new slide here. If it were two dimensions, then in that case I would have a graph like this, and each point would have two values. Let's take those two values from here: it is 2 and 1, so (2, 1) will come somewhere near here, isn't it — this is 2 and 1, let us assume this is (2, 1).
Then the next one is 10 and 5, so how do we plot (10, 5)? (4 second pause) We take 10 here and 5 here, OK, so this is (10, 5). Similarly, here it is 4, 4, so let's plot (4, 4) — it will come somewhere here, OK. Now the last value is 8, 8, so (8, 8) will be somewhere here; let me just make a duplicate of this and plot it here: (8, 8). So looking at this too: these points look a bit closer together, and these look closer together, so this is how we can do clustering, OK. So that is understanding graphically how we can do grouping, but now we will see how we apply PCA. Before applying PCA, we will add one more thing — let me make a duplicate of it here, OK. Now watch: if I had one more feature here, meaning a third feature was also there and I also had to represent it — if the third feature is also there, then how will you represent it?
So we would have to convert 2D to 3D: I would have to make one more dimension like this, and then represent the points here — in 3D, a larger mark means the point is near and a smaller one means it is farther away. I would have to do something like that. But let's assume there is yet one more dimension; that is also a possibility. So what do we do when more dimensions come? In that case we would have to add another dimension, which is beyond human imagination — we cannot visualize that way, isn't it? But a machine can handle two features, three features, and more than three features too. How it does that, we will see one by one. OK, let's go to the new slide — we will go to the new slide and understand this... fine.
Now I will explain how the principal components are identified. Let's talk about the 2D case we had here — this is our 2D, OK, and in this 2D it is showing 3, 4... For this 2D I will take a duplicate; let's take another duplicate here. Now I will remove these components — the groupings that I had done I will just remove, OK. I will have to erase, so just give me some time. Once this is removed, what we will do is see how far each point is from the X axis and how far from the Y axis — we will find the perpendicular distance from X and Y for each point, OK.
This is known as taking the average measurement, and I will explain how we do it. I have erased it sufficiently, so now watch what we will do: we calculate each point's distance from the X axis — for this point, this one, this one, and for every point — so for all of them we calculate the distance from the X axis, and then we find the average distance, the average of all those distances, and let's assume that average distance comes out here. Similarly, in exactly the same way, we calculate for the Y axis: from here, from here, from here and from here, and we capture that average distance like this. So you will have an average distance on the X axis and one on the Y axis — in this way you have calculated the distances. Now from these averages you draw perpendiculars like this, and the point where they meet is the center of the data. So what do you understand from the center of the data? One thing you can see here is that the upper two points are above the center and the lower points are below the center — you can see and understand the grouping, that this is different and this is different. Now what we will do with this center point is shift it from here to the origin, alright. You will have to watch carefully what I do with it: I will make another duplicate, this one here, and with this we will try it. For the origin I will choose another color — green — and draw it in bold.
So we shifted it here — rather, with a highlighter I have marked the shifted origin. What happens with this? You will see that even now, the distances from this center point to the other points are still the same, alright, so we can shift it to the origin just to make things easier to understand.
Now, what is the highest point you can see here? Let's assume this is it — this is the highest point. In the same way, you can see which point is furthest to the right: it is this one; and which is the highest: this one. It was like this before and it is like this now, so there are no changes, isn't it — we moved the data to the center, and despite that nothing changed in it. Now what we will do is take another color — yellow — and try to fit a line that originates from the center: I will draw a line like this, like this, like this. We do this in regression, right? We fit a line. So I am trying to fit a line — but how will we fit the line? While fitting this line, the thing we have to pay attention to is the distance between the line and these points.
So here there are two things, OK, which I will explain to you separately. Watch this — I will explain by drawing a small diagram. Here is a point, and from here the line goes like this; however, it is not necessary that this line passes through the point. Now, the distance of this point from the center, the origin, is fixed — that is not going to change. What changes is this: if I draw the perpendicular distance from the point to the line, there are two things that can change, and I will mark them in red — this is one thing that can change, and this is another, because if the line comes a little closer, this distance along the line will increase and this perpendicular distance will reduce. If you remember, this is the Pythagoras theorem. What happens in the Pythagoras theorem? The square of the hypotenuse H is equal to the sum of the squares of the other two sides: H squared equals S1 squared plus S2 squared, alright. So here you can see that this blue line from the origin to the point is fixed, but the distance along the line we are fitting and the perpendicular distance from the point will vary. So if the perpendicular distance increases then the distance along the line decreases, and if the distance along the line increases then the perpendicular decreases — and why? Because from this equation, if one is increasing then the other has to decrease, since the sum of both squares must equal H squared.
So from this we get another intuition about what increases as what decreases, alright? Now, what do we have to find? There are two equivalent ways of looking at it: either choose the line for which the perpendicular distances of the points are the least, or choose the line for which the projected distances from the origin are the most. If you look at these projected distances, you maximize them; and if you look at these perpendicular distances, you minimize them, OK.
So out of the two you can choose either one; it is your choice. Here we are trying to keep this distance maximum, and the line with the maximum distance is the one we choose — this is what we are trying to do. Once we have figured this out, the line that is formed we will call the best fit line. This is our best fit line, because not just any line can be the best fit. Now, going further, I have explained one thing to you: this H squared here is for the line that goes from this point towards the center, and because of that, the distance from the center along the fitted line should be maximum.
We were doing this for one point — just one point — so let me call its distance D1. If we do it for another point, that will be D2, the third D3, the fourth D4, and like this for the various points D5, D6 and so on; for whatever points you have, you collect the distances in this way. The moment you have collected the distances, take the square of each, and after squaring them all, add them up — the sum of squared distances, OK; that is what you have to find. The line which has the maximum sum is the line you have to choose, OK. We call this the SS of distances — you must have seen this term many times: the sum of squared distances is known as the SS of distances. So when you find this, whichever line has the maximum, just freeze that; that is your first line — your first line has been found — and this line will now make your principal component, alright. Now let us understand further — I will add one more slide. Now that we have this line, and it goes from the center — let me draw it and assume it goes from the center — we will see how much this line moves along the X axis and how much along the Y axis. Let's assume that when it moves 4 units along the X axis, it moves 1 unit along the Y axis. This angle is then approximately 14 degrees. So when it moves 4 units along X, it moves 1 unit along Y — this is a kind of ratio or relation we can see, OK. So the next thing we must understand is what this means: when the line moves 4 units along X, it moves 1 unit along Y, and this is what we call a linear combination, OK. What do we call it?
A linear combination between what? You see that there are two features, X and Y — we have represented two features — so we call this a linear combination of those features.
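The maximize-the-SS-of-distances idea can be checked with a brute-force sketch: try many candidate directions through the origin and keep the one with the largest sum of squared projected distances. The four points are the (2,1), (4,4), (8,8), (10,5) values from the example, centred first; a real implementation would use the eigen-decomposition rather than this grid search.

```python
import numpy as np

pts = np.array([[2.0, 1.0], [4.0, 4.0], [8.0, 8.0], [10.0, 5.0]])
pts = pts - pts.mean(axis=0)   # shift the centre of the data to the origin

best_angle, best_ss = 0.0, -1.0
for angle in np.linspace(0.0, np.pi, 1800, endpoint=False):
    direction = np.array([np.cos(angle), np.sin(angle)])  # candidate unit vector
    proj = pts @ direction     # signed distance of each projection from the origin
    ss = np.sum(proj ** 2)     # SS(distances): D1^2 + D2^2 + ...
    if ss > best_ss:
        best_angle, best_ss = angle, ss

print(np.degrees(best_angle))  # direction of the best fit line (PC1)
print(best_ss)                 # the maximum sum of squared distances
```

The winning direction matches the top eigenvector of the centred data's scatter matrix, which is why the eigen-decomposition gives the same answer without any search.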
And now, between these, find this very distance, OK, this one. This is the perpendicular and this is the hypotenuse, so from this you can find the unit vector: let's say this side is B and this side is C, so A squared is equal to B squared plus C squared, OK. Now the sides are 4 and 1, so 4 squared plus 1 squared, and the square root of that is A, because the square of A goes to the other side — so the value that comes out is 4 point something, about 4.12. Now divide all three sides by this value, and what will you get? A unit vector, OK — you will get a unit vector. This unit vector that we get is called an eigenvector — we call it an eigenvector, and this is the very eigenvector we were discussing in the beginning: there will be some unit vector here, like this, and this is your unit vector, fine. Now let's proceed ahead: we can call this an eigenvector — I will write EV here — and you can also call it a singular vector, SV, and this is your PC1, which we call principal component 1, OK, fine. Now let's proceed: after this you have to find PC2, alright, and to find PC2, firstly you have to calculate the eigenvalue. How do you find the eigenvalue? You take whatever data points you had, calculate their distances from here, square them, and take the sum of the squares — and that is the eigenvalue; we had computed exactly this just before, so that becomes your eigenvalue.
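The 4-and-1 slope turning into a unit eigenvector is a one-liner with Pythagoras. A sketch, with the 4-to-1 ratio assumed from the drawing:

```python
import numpy as np

# The fitted line moves 4 units along X for every 1 unit along Y (assumed slope)
v = np.array([4.0, 1.0])

length = np.sqrt(v[0] ** 2 + v[1] ** 2)   # hypotenuse: sqrt(16 + 1), about 4.123
unit = v / length                         # divide the sides by the hypotenuse

print(length)                  # about 4.12 -- the "4 point something" value
print(np.linalg.norm(unit))    # 1.0 -> a unit vector, the eigenvector for PC1
```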
Now, let's proceed further — you have to find the second dimension, meaning PC2. For PC2, simply — I will add one more slide — this is your current line, and perpendicular to it you draw another line. So if you look at the relation here, it is somewhat of the kind that as X increases, Y decreases; it goes towards the negative. What happens here is that if X increases by one unit, then Y — because the line is perpendicular — decreases by 4 units, OK: if X increases by 1, then Y decreases by 4. This is the type of relation we observe here, OK, and this is your PC2, or you can call it the singular vector for PC2, or the eigenvector for PC2. So this is how we found PC1 and PC2, and this is how we can compute further, and we can also implement it programmatically. So that's it — this much mathematics for now. So friends, we will stop today's session here and continue in the next session.
If you have any questions or comments related to this course, then you can post them beneath this video using the discussion button, and in this way you can discuss the course with other learners too.