Course Content

  • Thompson Sampling

Thompson Sampling is a method that balances exploration and exploitation to maximise the total reward gained while completing a task. It is sometimes referred to as Probability Matching or Posterior Sampling.

Thompson Sampling, named after William R. Thompson, is a heuristic for selecting actions in the multi-armed bandit problem that addresses the exploration-exploitation dilemma. At each step, it draws a random belief about each action from its posterior distribution and then selects the action that maximises the expected reward under that sampled belief.
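The step above can be sketched for the common Beta-Bernoulli case: each arm's success rate gets a Beta posterior, we sample one belief per arm, and we play the arm whose sampled belief is highest. The reward probabilities, round count, and function name below are illustrative assumptions, not part of the course material.

```python
import random

def thompson_sampling(true_probs, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson Sampling for a multi-armed bandit (a sketch).

    true_probs: hypothetical true success probability of each arm.
    Returns the total reward plus the per-arm success/failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior for every arm
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw a random belief about each arm's success rate from its
        # Beta posterior, then exploit the arm with the highest draw.
        samples = [rng.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate pulling the arm and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures
```

Because arms with uncertain posteriors still produce occasional high draws, the algorithm keeps exploring them, while arms with confidently high posteriors are exploited most of the time.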

Thompson Sampling is better geared towards maximising long-term total reward, whereas UCB-1 yields allocations more akin to an A/B test. UCB-1 also behaves more consistently from trial to trial, while Thompson Sampling shows greater variance because of the random sampling step in the algorithm.
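For comparison, here is a sketch of UCB-1 on the same simulated bandit. Unlike Thompson Sampling, its arm choice is deterministic given the reward history: it picks the arm with the highest empirical mean plus a confidence bonus, which is why its allocations vary less between runs. The probabilities and names are again illustrative assumptions.

```python
import math
import random

def ucb1(true_probs, n_rounds=5000, seed=0):
    """UCB-1 on a Bernoulli bandit (a sketch for comparison).

    Picks the arm maximising mean + sqrt(2 * ln t / pulls), a
    deterministic rule given the observed rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms
    reward_sums = [0.0] * n_arms
    total_reward = 0
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialise its estimate
        else:
            # Empirical mean plus an upper confidence bonus per arm.
            arm = max(range(n_arms),
                      key=lambda a: reward_sums[a] / pulls[a]
                      + math.sqrt(2 * math.log(t) / pulls[a]))
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        reward_sums[arm] += reward
        total_reward += reward
    return total_reward, pulls
```

Running both sketches on the same arms shows the contrast: UCB-1 spreads pulls across arms until the confidence bonuses shrink, while Thompson Sampling's allocation fluctuates with its posterior draws but concentrates on the best arm over the long run.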

Variants of Thompson Sampling have also been studied for nonparametric reinforcement learning over a countable class of general stochastic environments, which may be non-Markov, non-ergodic, and partially observable.
