
Student: [Inaudible] and would it then sort of turn back into [inaudible]?

Instructor (Andrew Ng): Actually, [inaudible] right, the question is: in policy iteration, if we represent π implicitly, using V(s), would it become equivalent to value iteration, and the answer is sort of no. Let's see. It's true that policy iteration and value iteration are closely related algorithms, and there's actually a continuum between them, but yeah, it actually turns out that, oh, no, the algorithms are not equivalent. It's just that in policy iteration, there is a step where you're solving for the value function for the policy π, which is Vπ; you solve for Vπ. Usually, you can do this, for instance, by solving a linear system of equations. Value iteration is a different algorithm, yes. I hope it makes sense that at least cosmetically it's different.

Student: [Inaudible] you have [inaudible] representing π implicitly, then you won't have to solve that [inaudible] equations.

Instructor (Andrew Ng): Yeah, the problem is - let's see. Solving for Vπ works only if you have a fixed policy, so once you change the value function, if π changes as well, then it's sort of hard to solve this. Yeah, so later on we'll actually talk about some examples of when π is implicitly represented, but at least for now, I think - yeah. Maybe there's a way to redefine something to get a mapping onto value iteration, but that's not usually done. These are viewed as different algorithms.
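To make the distinction above concrete, here is a minimal sketch of the policy-evaluation step that sets policy iteration apart: for a *fixed* policy π, the Bellman equations are linear in Vπ, so you can solve them directly. The toy MDP below (its transition probabilities and rewards are made-up illustrations, not from the lecture) shows that one linear solve:

```python
import numpy as np

# Policy evaluation inside policy iteration, on a toy 3-state MDP.
# For a fixed policy pi, the Bellman equations are linear:
#   V_pi = R + gamma * P_pi @ V_pi   =>   (I - gamma * P_pi) V_pi = R

n_states = 3
gamma = 0.9

# P_pi[s, s'] = P(s' | s, pi(s)) under the current fixed policy (made-up numbers)
P_pi = np.array([[0.8, 0.2, 0.0],
                 [0.1, 0.7, 0.2],
                 [0.0, 0.3, 0.7]])
R = np.array([0.0, 0.0, 1.0])  # reward in each state

# Solve the linear system exactly for V_pi
V_pi = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
```

Value iteration, by contrast, never fixes a policy: it repeatedly applies the Bellman optimality backup, which takes a max over actions, and that max makes the update nonlinear, so no single linear solve suffices.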

Okay, cool, so all good questions. Let me move on and talk about how to generalize these ideas to continuous states. Everything we've done so far has been for discrete states or finite-state MDPs. Where, for example, here we had an MDP with a finite set of 11 states, and so the value function, or V(s), our estimate for the value function, could then be represented using an array of 11 numbers, 'cause if you have 11 states, the value function needs to assign a real number to each of the 11 states, and so you can represent V(s) using an array of 11 numbers. What I want to do for [inaudible] today is talk about continuous states. So for example, if you want to control any number of real [inaudible], for example a car, a car's position is given by x, y, θ, its position and orientation, and if you want to model the velocity as well, then ẋ, ẏ, θ̇. So it depends on whether you want to model just the kinematics, and so just position, or whether you want to model the dynamics, meaning the velocity as well.

Earlier I showed you a video of a helicopter that was flying, using reinforcement learning algorithms. For a helicopter, which can fly in three-dimensional space rather than just drive on the 2-D plane, the state will be given by the x, y, z position and φ, θ, ψ, which is [inaudible]. φ, θ, ψ is sometimes used to denote the [inaudible] of the helicopter, just the orientation, and if you want to control a helicopter, you pretty much have to model velocity as well, which means both linear velocity as well as angular velocity, and so this would be a 12-dimensional state.
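The state representations just described can be written down as plain vectors; here is a quick illustration (the numbers are arbitrary and the variable names are assumptions, not from the lecture) contrasting the tabular case with the continuous car and helicopter states:

```python
import numpy as np

# Discrete MDP: with 11 states, V(s) is just an array of 11 numbers
V = np.zeros(11)  # V[s] = estimated value of state s

# Car, kinematics only: position and orientation (x, y, theta)
car_kinematic = np.array([2.0, 1.5, 0.3])

# Car with dynamics: add the velocities (x_dot, y_dot, theta_dot) -> 6-D
car_dynamic = np.array([2.0, 1.5, 0.3, 1.0, 0.2, 0.0])

# Helicopter: 3-D position, orientation, and both linear and
# angular velocity -> a 12-dimensional state
heli_state = np.concatenate([
    np.zeros(3),  # x, y, z
    np.zeros(3),  # orientation angles (phi, theta, psi)
    np.zeros(3),  # x_dot, y_dot, z_dot
    np.zeros(3),  # angular velocities
])
```

Once the state is continuous, there are infinitely many states, so V(s) can no longer be stored as a finite array with one entry per state, which is why the tabular representation from before no longer applies.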





Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
