One of the reasons that reinforcement learning is much harder than supervised learning is that it is not a one-shot decision-making problem. In supervised learning, if you have a classification problem, say predicting whether someone has cancer or not, you make a prediction and then you’re done, right? Your patient either has cancer or not, you’re either right or wrong, they live or die, whatever. You make a decision and then you’re done.

In reinforcement learning, you have to keep taking actions over time, so it’s called sequential decision making. Concretely, suppose a program loses a game of chess on move No. 60. Then it has actually made 60 moves before it got this negative reward of losing the game, and the thing that makes it hard for the algorithm to learn from this is called the credit assignment problem. Just to state that informally: if the program loses a game of chess on move 60, you’re actually not quite sure, of all the moves it made, which ones were the right moves and which ones were the bad moves. Maybe it blundered on move No. 23 and everything else it did was perfect, but because it made a mistake on move 23, it eventually ends up losing on move 60.

So, to define it very loosely, the credit assignment problem is this: when you get a positive or negative reward, figure out what you actually did right or wrong to cause that reward, so you can do more of the right things and less of the wrong things. This is one of the things that makes reinforcement learning hard.
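To make that concrete, here is a minimal sketch in Python of the chess example. The names (`episode_rewards`, `blame_last_move`) and the choice of move 23 as the blunder are hypothetical, picked only to mirror the example above. It is an illustration of why a delayed reward is ambiguous, not a learning algorithm.

```python
# Toy illustration of the credit assignment problem (hypothetical setup).
# A 60-move chess game whose only reward is -1 at the very end (the loss).

NUM_MOVES = 60
BLUNDER_MOVE = 23  # assumed for illustration: the move that actually lost the game


def episode_rewards(num_moves):
    """Per-move rewards: zero everywhere except the final move, which gets -1."""
    rewards = [0.0] * num_moves
    rewards[-1] = -1.0
    return rewards


def blame_last_move(rewards):
    """Naive credit assignment: blame the move (1-indexed) where the reward arrived."""
    for i, r in enumerate(rewards):
        if r != 0.0:
            return i + 1
    return None


rewards = episode_rewards(NUM_MOVES)
blamed = blame_last_move(rewards)

print("move that received the reward:", blamed)                      # 60
print("move that actually caused the loss:", BLUNDER_MOVE)           # 23
print("reward observed at the blunder:", rewards[BLUNDER_MOVE - 1])  # 0.0
```

The reward signal carries no information at move 23, so anything that only looks at where the reward shows up will blame the wrong move.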

In the same way, if the helicopter crashes, you may not know why. It may be something you did many minutes ago that caused the helicopter to crash. In fact, if you ever crash a car (and hopefully none of you ever gets in a car accident), usually the thing you’re doing right before the crash is stepping on the brakes to slow the car down before the impact. And usually stepping on the brakes does not cause a crash. Rather, it makes the crash hurt less.

But a reinforcement learning algorithm sees this pattern: you step on the brakes, then you crash. Braking is not the reason you crashed, and it’s hard for the algorithm to figure out that it wasn’t stepping on the brakes that caused the crash, but something you did long before that.

So let me go ahead and formalize the reinforcement learning problem more. As a preface, let me say these algorithms are applied to a broad range of problems, but because robotics videos are easy to show in lecture, and I have a lot of them, throughout this lecture I use a bunch of robotics examples. Later, we’ll talk about applications of these ideas to a broader range of problems as well. But the basic problem we’re facing is sequential decision making: you need to make many decisions, and your decisions perhaps have long-term consequences.

So let’s formalize the reinforcement learning problem. Reinforcement learning problems model the world using something called the MDP, or Markov Decision Process, formalism. And let’s see, an MDP is a five-tuple – I don’t have enough space – well, comprising five things.
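As a rough sketch of what a five-tuple like that can look like in code: this excerpt does not list the five components, so the version below assumes the usual textbook set (states, actions, transition probabilities, a discount factor, and a reward function). The `MDP` class, the helicopter states, and all the numbers are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class MDP:
    """Assumed textbook five-tuple; the field names are illustrative."""
    states: List[str]
    actions: List[str]
    # (state, action) -> {next_state: probability}
    transitions: Dict[Tuple[str, str], Dict[str, float]]
    gamma: float                    # discount factor in [0, 1)
    reward: Callable[[str], float]  # reward as a function of state


def discounted_return(mdp: MDP, state_sequence: List[str]) -> float:
    """Sum of gamma^t * R(s_t) along one observed sequence of states."""
    return sum((mdp.gamma ** t) * mdp.reward(s) for t, s in enumerate(state_sequence))


# Hypothetical two-state helicopter world.
toy = MDP(
    states=["flying", "crashed"],
    actions=["hover", "dive"],
    transitions={
        ("flying", "hover"): {"flying": 0.75, "crashed": 0.25},
        ("flying", "dive"): {"flying": 0.5, "crashed": 0.5},
        ("crashed", "hover"): {"crashed": 1.0},
        ("crashed", "dive"): {"crashed": 1.0},
    },
    gamma=0.5,
    reward=lambda s: -1.0 if s == "crashed" else 0.0,
)

# Each transition distribution should sum to 1.
probs_ok = all(abs(sum(p.values()) - 1.0) < 1e-9 for p in toy.transitions.values())

ret = discounted_return(toy, ["flying", "flying", "crashed"])
print(probs_ok)  # True
print(ret)       # -0.25
```

The crash only shows up as a reward two steps in, discounted by `gamma ** 2`, which is the same delayed-consequence structure as the chess and car examples.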

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4