Namaskar, I am Kushal from LearnVern.
Welcome to the course on machine learning. This tutorial continues from the previous one, and in today's session we are going to look at a machine learning project titled "Teach a Taxi with Reinforcement Q-Learning". Here we will try to build an automatic taxi, or you can say a self-driving cab. We will not build the functionality of the car itself. Instead we will look at how a car can learn from its environment step by step: what experience it gets from each step, what rewards or penalties it receives, and how the taxi can optimize its behavior on the basis of those rewards and penalties.
Our scenario is that the taxi picks up and drops off passengers. The pickup has a specific location and the drop-off has a specific location, and the taxi has to pick up from the right location and drop off at the right location. This will be possible with reinforcement learning, using an algorithm named Q-learning, and on the basis of Q-learning our taxi, our agent, will learn.
So let us go through the prerequisites. You should have basic knowledge of Python, which we have used throughout this course, and to some extent you should know linear algebra. We are not going to deal with much mathematics in this project, though, so those of you who do not know linear algebra need not worry; you will still be able to follow.
Now, what will the outcomes be? You will learn what reinforcement learning is and how it works, we will use the OpenAI Gym package to create the complete environment, and we will implement Q-learning in Python code.
So let us move forward to reinforcement learning, starting with an example. Suppose you have a pet dog, and the dog is your agent. You walk the dog, feed it, give it water, and it stays with you. Its surroundings are called the environment. So the dog is your agent and the surroundings are your environment. Now suppose you take it for a walk: walking the dog is one state. When you bring it back home, that is another state, and when you play with it, that is yet another state. In other words, as the dog meets new situations, the state changes.
Here is another example. The state could be your dog standing while you say a specific word. If you tell it to stand up and it stands up, that is one state; if you tell it to sit down and it sits, that is a different state. So this is how it is.
Now the agent, meaning the pet dog, performs actions. Because of an action, it goes from one state to another. For example, if it runs and comes to you, it has moved from its first state to a second one in which it is closer to you.
So when you take an action, the state changes. Consider that even as we talk, we leave older states behind and enter new ones: in the earlier state we were discussing the dog as the agent and its environment, and in the current state we are discussing the dog's actions and the state changes they cause. The change can be in physical location, it can be in some input or output, it can happen in any way. My age is increasing every second, so I am in a new state every second. States can be defined in roughly this way.
The next thing is that when an action is taken, the state changes, and along with the change in state a reward or a penalty is also received. What are these rewards and penalties? For example, if the pet dog is told to sit down, it sits down, and it is then given something to eat, that is a reward. It will remember this the next time. Whenever it is told to sit it will recall the reward and sit down, because it knows it was rewarded the last time and expects the reward again. So this is what a reward is.
At times the dog does not obey us and misbehaves. When it becomes extreme, we do not give it food for some time, we do not talk to it, we leave it alone. That is a penalty levied on it; it has received a punishment. So feedback can be positive or negative: positive feedback acts as a reward and negative feedback acts as a penalty. With this, the pet dog's learning improves, and it understands what it should do and what it should not.
In normal human psychology too, you will observe that actions which are appreciated get repeated. If someone does a good job and you praise them, they will repeat it. The rewarding system here works the same way: an action that is appreciated is repeated, and an action that is punished is not. So this, basically, is reward and penalty.
Now we also have a policy. A policy is the strategy for choosing an action. In a given situation action one, action two and action three may all be possible, so why take action one and not two or three? There has to be a policy, or strategy, behind that choice.
Now let us move further. Reinforcement learning sits between supervised and unsupervised learning. It is not semi-supervised, it is not supervised, and it is not unsupervised either. Why? As we discussed earlier, in supervised learning we have prior data along with the output. In unsupervised learning we have data but the output is not there; the labels are not there. In reinforcement learning you initially have no data at all, and as data is collected, it may be collected in a supervised way or in an unsupervised way. So it lies on the spectrum between supervised learning and unsupervised learning.
There is one important thing to remember here: being greedy doesn't always work. Being greedy all the time can also cause harm. For example, suppose a person has some eatables around him. He does not know that these are all he has and that he won't get more, and he is greedy. So he quickly consumes the first one, the second, the third, the fourth, and soon he has consumed them all, without once thinking about whether he has any other source of food. If no other source turns up, he will die. In the same way, being greedy does not always work, and that is why, along with being greedy, there must be some other strategy devised on some logic or some other data.
The next point is that sequence matters in reinforcement learning. Suppose you have to go from this place to some place XYZ. You may have four paths to reach XYZ: one can be simple, another can be complex, and another may join a previous one somewhere along the way.
Here the sequence is important. If you take the wrong sequence right at the start, you may end up having to go via the longest path. So sequence matters in reinforcement learning. Now let us move further and understand this process.
Here you can see that we have an environment, states and rewards, and there is also an agent. The agent performs an action, and on the basis of that action a new state is reached and a reward is received. This is how the reinforcement learning process takes place, and how the agent learns.
The steps are written here. First, observe the environment. Then decide how to act, meaning which strategy to use to choose the action. Then perform that action. After performing it, a reward or penalty is received, so take the reward or penalty. Then learn from experience: as the agent keeps getting rewards and penalties it keeps gaining experience, and with that experience the strategy can be refined. Then repeat this procedure until an optimal strategy is found.
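The steps above can be sketched in a few lines of Python. This is only an illustrative sketch and not the taxi project itself: the five-cell corridor, the reward values, and the names `step` and `train` are all hypothetical, and the learning step uses the standard tabular Q-learning update, which the lecture has not introduced yet. The occasional random action is one simple way of not being greedy all the time.

```python
import random

# Hypothetical toy world: a corridor of cells 0..4, where cell 4 is the goal.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 10.0, True   # assumed reward for reaching the goal
    return next_state, -1.0, False      # assumed small penalty for every other step


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one value per (state, action)
    for _ in range(episodes):
        state, done = 0, False                 # observe the starting state
        while not done:
            # Decide: mostly pick the best-known action, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            # Act and receive the reward or penalty.
            next_state, reward, done = step(state, ACTIONS[a])
            # Learn from the experience (standard Q-learning update).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state                 # iterate from the new state
    return q


q_table = train()
# Best-known action in every non-goal cell after training.
greedy_policy = [ACTIONS[0 if row[0] > row[1] else 1] for row in q_table[:-1]]
print(greedy_policy)
```

After training, `greedy_policy` holds the move the agent currently prefers in each cell, which is the refined strategy the last step refers to.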
So this is the whole purpose of reinforcement learning. Let us move further and now create a whole scenario of a self-driving car.
To create the scenario of a self-driving car, we have to understand what components we will have and what situations we will face.
First, we have the case of the drop-off. We have to drop the passenger at the right location: you don't drop a passenger just anywhere, there is a correct location where the passenger has to be dropped.
Next, save the passenger's time. If a trip can be done in ten steps, then taking ten thousand, one lakh (100,000) or ten lakh (1,000,000) steps would be wrong. So while comparing the first route, the second, the third, we have to check which one takes the minimum time.
Then, take care of the passenger's safety and the traffic rules. If there is a barricade, you cannot hit the barricade, because then there is no safety for the passenger and no safety for the car. You cannot break traffic rules either. So this too we have to take care of.
So let us move forward. There are many different aspects that need to be considered.
Now we shall discuss some more concepts, starting with the reward, or penalty. The agent should get a high positive reward for a successful drop-off: if the drop-off was correct, the task was done correctly, so a high positive reward should be given.
When should the agent be penalized? It should be penalized if it tries to drop the passenger at the wrong location.
The agent should also get a slightly negative reward for not making it to the destination, meaning for every step in which it has picked up the passenger but has not yet reached the destination. Why slight? Because we do want the agent to reach the destination eventually, and to reach it on time, not in the ten thousand or one lakh steps I mentioned earlier. A slight negative reward on each step adds up: it is large after one lakh steps and larger still after ten lakh steps. So this is the mechanism for negative rewards.
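To see how a slight per-step penalty adds up, here is a small sketch of the reward scheme just described. The lecture gives no numbers at this point, so the values +20, -10 and -1 and the function name `episode_return` are assumptions chosen only for illustration.

```python
# Assumed, illustrative reward values (the lecture does not state numbers here).
SUCCESS_REWARD = 20          # high positive reward for a correct drop-off
WRONG_DROPOFF_PENALTY = -10  # penalty for dropping at the wrong location
STEP_PENALTY = -1            # slight negative reward for every ordinary step


def episode_return(moves, wrong_dropoffs=0):
    """Total reward for a trip that ends with one correct drop-off."""
    return (moves * STEP_PENALTY
            + wrong_dropoffs * WRONG_DROPOFF_PENALTY
            + SUCCESS_REWARD)


short_trip = episode_return(moves=10)        # 10 steps, then a correct drop-off
long_trip = episode_return(moves=10_000)     # same result, but a very long route
careless_trip = episode_return(moves=10, wrong_dropoffs=1)

print(short_trip, long_trip, careless_trip)  # 10 -9980 0
```

With these assumed values the ten-step trip ends with a positive total, while the ten-thousand-step trip is heavily negative even though the passenger is eventually delivered, which is exactly the pressure towards short routes that the per-step penalty is meant to create.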
Next is the state space. The state space tells us what the complete environment looks like. You can see this matrix here: from top to bottom the rows are numbered 0, 1, 2, 3, 4, and from left to right the columns are also numbered 0, 1, 2, 3, 4, so a five-by-five matrix is formed. Four cells are marked with the letters R, G, Y and B, and they are colored: at the one shown in pink a man is standing, and the one shown in blue is where he has to reach. So he has to go from here to here, from Y to R. And this is the taxi. The taxi can move ahead, move back, and move to the right, but it cannot move to the left, because on its left there is a barricade. Because of this, we will have to define actions for this taxi, along with some policies and strategies.
So this is the scenario, and to make it easy we have divided it into a five-by-five matrix so that we can deal with the situation easily. In one step the taxi moves by only one square.
So there is a five-by-five grid, which gives us twenty-five possible taxi locations. Notice the current location of our taxi: count the rows 0, 1, 2, 3, so row three, and then the columns 0, 1, so column one. Our taxi is at (3, 1).
Now let us work out how many states the complete environment has. It is five into five into five into four. How? The five-by-five grid gives the taxi positions. The passenger can be at any of the four marked locations or inside the taxi, so four plus one gives five passenger possibilities. And there are four possible destinations. So five into five into five into four equals five hundred: there are five hundred possible states in total.
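The arithmetic above can be checked directly, and the same four numbers can be packed into a single state number. The packing below is one possible scheme (a mixed-radix encoding) and is an assumption for illustration; the function names `encode` and `decode`, and the example passenger index 2 and destination index 0, are hypothetical.

```python
GRID_ROWS, GRID_COLS = 5, 5   # taxi positions on the five-by-five grid
PASSENGER_LOCATIONS = 5       # four marked locations + inside the taxi
DESTINATIONS = 4              # four possible drop-off locations

TOTAL_STATES = GRID_ROWS * GRID_COLS * PASSENGER_LOCATIONS * DESTINATIONS


def encode(row, col, passenger, destination):
    """Pack (row, col, passenger, destination) into one number in 0..499."""
    return ((row * GRID_COLS + col) * PASSENGER_LOCATIONS
            + passenger) * DESTINATIONS + destination


def decode(state):
    """Inverse of encode: recover the four components from the state number."""
    state, destination = divmod(state, DESTINATIONS)
    state, passenger = divmod(state, PASSENGER_LOCATIONS)
    row, col = divmod(state, GRID_COLS)
    return row, col, passenger, destination


# Taxi at row 3, column 1; passenger index 2 and destination index 0 are
# made-up example values.
example_state = encode(3, 1, 2, 0)
print(TOTAL_STATES, example_state, decode(example_state))
```

Every combination gets its own number, so a table with one row per state needs exactly five hundred rows.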
Now let us see how many actions we have in this case. We have south, north, east, west, pick up and drop off, so there are six possible actions. Let us move on to the action space.
So, actions and the action space. Actions are the specific things the agent can do: south, north, east, west, pick up, drop off. The action space is the set of all possible actions available to the agent in a given state. I will conclude this part here, and in the next part we will see how to implement all of this in Python.
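As a preview, the six actions and the action space can be written down as follows. This is a sketch under assumptions: the numbering simply follows the order in which the actions were listed, the names `Action`, `MOVES` and `move` are hypothetical, and the movement helper only keeps the taxi inside the grid; it ignores the barricades shown in the picture.

```python
from enum import IntEnum


class Action(IntEnum):
    SOUTH = 0
    NORTH = 1
    EAST = 2
    WEST = 3
    PICKUP = 4
    DROPOFF = 5


GRID_SIZE = 5
N_STATES = 500  # five into five into five into four

# (row change, column change) for the four movement actions; row 0 is the top.
MOVES = {
    Action.SOUTH: (1, 0),
    Action.NORTH: (-1, 0),
    Action.EAST: (0, 1),
    Action.WEST: (0, -1),
}


def move(row, col, action):
    """Move one square, staying inside the grid (barricades are ignored here)."""
    d_row, d_col = MOVES.get(action, (0, 0))  # pick up / drop off do not move
    new_row = min(max(row + d_row, 0), GRID_SIZE - 1)
    new_col = min(max(col + d_col, 0), GRID_SIZE - 1)
    return new_row, new_col


action_space = list(Action)  # the six possible actions
# One value per (state, action) pair, all zero before any learning.
q_table = [[0.0] * len(action_space) for _ in range(N_STATES)]

print(len(action_space), move(3, 1, Action.SOUTH), move(4, 1, Action.SOUTH))
```

A table like `q_table`, with one row per state and one column per action, is the kind of structure a tabular Q-learning agent fills in as it learns.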