Hello, I am (name) from Learnvern… (8-second pause; music)
Welcome to the machine learning course. This tutorial is a continuation of the last session, so let's get started.
Today we will look at the Upper Confidence Bound. The Upper Confidence Bound is a concept from reinforcement learning. Before we get into the program itself, let me give you an example.
So, in this example we have a small baby robot, and we went out with it. On the way it saw a small dog and started running after it; before we even noticed, it had already run quite far. While running it reached a place where it could no longer see the dog anywhere, but by then it had drained almost all of its battery. Without battery it became unsettled: how would it find its guardian now, since finding the way back needs battery? So what is the challenge? The challenge is that the robot is wondering what to do, and at that very moment it spots a shop right there with charging sockets, where it can charge its battery.
So, this was a lucky moment for the robot. It went there and started charging, but then it observed that the charging was very slow; it would have to stay there for about two or three days before it could be fully charged… So it switched to another socket, but that one charged even slower. Then it tried a third one, and now it was unsettled: what should it do? Check each socket one by one? So, see how difficult this problem is.
This problem is called the multi-armed bandit problem. It means that when you have many choices and every choice gives you a different reward, you get confused about which choice to pick. To solve this, we will now discuss the Upper Confidence Bound. The Upper Confidence Bound says that for each action you could take, you consider two things together: the average reward that action has given you so far, and a confidence bonus for how little you have tried it. You then pick the action with the highest combined value, its upper confidence bound. This is the approach taken by the Upper Confidence Bound, and this is the way it works.
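To make the idea concrete, here is a minimal sketch of a UCB score applied to the charging-socket story. All names and numbers are made up for illustration, and I use the classic `sqrt(2 * ln(n) / n_i)` confidence bonus here; the tutorial's own code later uses a slightly different constant.

```python
import math

def ucb_score(total_reward, times_tried, total_rounds):
    average = total_reward / times_tried                         # exploitation term
    bonus = math.sqrt(2 * math.log(total_rounds) / times_tried)  # exploration term
    return average + bonus

# (total charge gained, times tried) for three hypothetical sockets
sockets = [(3.0, 5), (1.0, 2), (0.5, 1)]
total_rounds = sum(t for _, t in sockets)

scores = [ucb_score(r, t, total_rounds) for r, t in sockets]
best = scores.index(max(scores))  # socket with the highest upper bound
```

Notice that the third socket wins here even though its average is not the best, because it has been tried only once, so its confidence bonus is large: UCB nudges the robot to explore it before settling.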
Let us see an example here. This example is about ads, and the ads have to be optimized: we have some ads, and out of those we want to find which ads are being seen the most, so that is what we are practicing to optimize. So, here let me add this file… I will click on this file, so this is the file and I have uploaded it, OK… (pause 3 sec, clicking). Now here: import numpy as np, import pandas as pd, import matplotlib.pyplot as plt, import math. So these relevant libraries we have imported. Then ds = pd.read_csv(…) loads the file, and ds.head() lets us check whether the file has loaded or not. So this file has been loaded.
After this, the total number of ads is 10: d = 10. Then "the number of times ad i was selected", that is numbers_of_selections, is initialized to [0] * d. See what is happening here: the value zero is given for every ad, so for all ten ads it starts out as zero, alright. Next, "the sum of the number of times ad i was correctly selected", meaning how many times each ad got a reward; that is written here as sums_of_rewards = [0] * d, rewards meaning correct selections, so all ten ads start with a reward sum of zero. And the total number of rounds, how many rounds will there be? N = 10,000 rounds. OK. So this is the complete scenario that has been built here. Now, which ads get selected? We will write the logic for that here, and the result will go into ads_selected.
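The initialization just described can be sketched like this; the variable names follow the narration and are my assumption of what the tutorial's notebook uses.

```python
# Setup for the UCB ad-optimization example
d = 10                            # total number of ads
N = 10000                         # total number of rounds
numbers_of_selections = [0] * d   # times each ad i was selected, all start at 0
sums_of_rewards = [0] * d         # sum of rewards of each ad i, all start at 0
ads_selected = []                 # will record which ad was picked in each round
total_reward = 0                  # running total across all rounds
```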
So now you will see what we are doing here. We have set total_reward to zero, as initially the reward will be zero. After that we have implemented a for loop: for n in range(0, N). Where did this N come from? See above, it came from the total number of rounds. So it runs from zero to N, and because the upper bound is exclusive, it effectively goes from 0 to N-1. Then max_upper_bound = 0, and ad = 0 as well. Next: for i in range(0, d). What is d? The number of advertisements, right. So 0 to d means the inner loop will run ten times. Here we check the number of selections: if numbers_of_selections[i] is greater than zero, we calculate the average reward, and we also calculate delta_i. So the upper bound has two parts, the average and the delta. The average part, just see here, is sums_of_rewards[i] divided by numbers_of_selections[i]; we are taking the complete average so far, and that is the average part. The other part, which you can see here, is delta_i: the square root of 3/2 multiplied by math.log(n + 1), divided by numbers_of_selections[i]. This delta_i part, you will see, is telling us about exploration. OK.
So one part is talking about exploitation and the other part is talking about new exploration. After that, if the condition does not hold, meaning the ad has never been selected yet, we give the upper bound a fixed very large value, so every ad gets tried at least once. Then, if upper_bound is greater than max_upper_bound, we set max_upper_bound = upper_bound and ad = i… So what are we doing in this for loop? In this for loop we are doing exploration, meaning choosing new options, and along with that we are also doing exploitation here. OK… So what does exploration mean? Exploration means trying new options; exploitation means that you keep going with the option you have already chosen. Both are done here: exploit for some time, then explore, then exploit, then explore, in this way. Now the ad that is selected is appended to ads_selected, and numbers_of_selections[ad] is also incremented here, and then sums_of_rewards[ad] += reward. So what happens with this? Basically, every time, at this particular ad's position, the reward gets updated, OK. Then total_reward += reward, so the total reward accumulates and becomes the final reward that is calculated.
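Putting the whole loop together, here is a self-contained sketch of the UCB logic just described. The real tutorial reads the rewards from a CSV file of ad clicks; since that file is not available here, I simulate the dataset with random 0/1 rewards and deliberately make ad index 4 (the "fifth ad") the best, which is purely my assumption so that the example runs on its own.

```python
import math
import random

# --- simulated stand-in for the tutorial's CSV of ad clicks (assumption) ---
random.seed(0)
d, N = 10, 10000
click_prob = [0.1] * d
click_prob[4] = 0.3  # ad index 4 clicks most often in this simulation
dataset = [[1 if random.random() < click_prob[i] else 0 for i in range(d)]
           for _ in range(N)]

# --- UCB loop as described in the narration ---
numbers_of_selections = [0] * d
sums_of_rewards = [0] * d
ads_selected = []
total_reward = 0

for n in range(0, N):
    max_upper_bound = 0
    ad = 0
    for i in range(0, d):
        if numbers_of_selections[i] > 0:
            # exploitation term: average reward of ad i so far
            average_reward = sums_of_rewards[i] / numbers_of_selections[i]
            # exploration term: shrinks as ad i gets selected more often
            delta_i = math.sqrt(3 / 2 * math.log(n + 1) / numbers_of_selections[i])
            upper_bound = average_reward + delta_i
        else:
            upper_bound = 1e400  # very large value: try every ad at least once
        if upper_bound > max_upper_bound:
            max_upper_bound = upper_bound
            ad = i
    ads_selected.append(ad)
    numbers_of_selections[ad] += 1
    reward = dataset[n][ad]
    sums_of_rewards[ad] += reward
    total_reward += reward

# the most-selected ad should end up being the best one
best_ad = numbers_of_selections.index(max(numbers_of_selections))
```

Over 10,000 rounds, the exploration bonus of the weaker ads shrinks too slowly to keep them competitive, so the loop settles on the best ad while still sampling the others occasionally.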
So let me execute this, and let us see what the total reward is: the total reward is 2,178… And if we see the same thing graphically here, this basically shows the result of the Upper Confidence Bound. As I already told you, out of the existing ads we see which one is performing better, and this one is performing better: the fifth one, the fifth advertisement. That is performing best.
So this is the Upper Confidence Bound. Out of the many choices, we have to balance two things. At first you know nothing, so you explore; with time you come to know that this is an option, that is also an option, you get to know multiple options. Once you know which option pays off, you exploit it, meaning you keep choosing it. So the Upper Confidence Bound works through a combination of both exploitation and exploration. So, let's conclude this video here, and in the next one we will see what Thompson Sampling is. So remain motivated, keep watching, thank you.
If you have any queries or comments, click the discussion button below the video and post them there. That way, you will be able to connect with fellow learners and discuss the course. Our team will also try to resolve your queries.
Share a personalized message with your friends.