
This is clearly not a very good representation, right? When we talked about regression, you would just choose some features of $x$ and run linear regression or something, and you’d get a much better fit to the function. So there’s a sense in which discretization, which gives you piecewise constant functions, just isn’t a very good representation. Piecewise constant functions just aren’t very good for representing many things, and there’s also the sense that there’s no smoothing, no generalization across the different buckets. In fact, back in regression, I would never have chosen to do regression using this sort of discretization. It just really doesn’t make sense.
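
To make the comparison concrete, here is a minimal Python sketch that is not part of the lecture. It fits the same noisy one-dimensional data two ways: with a piecewise constant function over equal-width buckets, and with ordinary least-squares linear regression. The data-generating function $y = 2x + 1$, the noise level, the choice of 4 buckets, and all function names are assumptions made up for illustration.

```python
import random

def fit_piecewise_constant(xs, ys, k, lo=0.0, hi=1.0):
    """Discretize [lo, hi] into k equal buckets and predict the mean y per bucket."""
    width = (hi - lo) / k

    def bucket(x):
        # Clamp so that x == hi falls into the last bucket.
        return min(int((x - lo) / width), k - 1)

    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(xs, ys):
        b = bucket(x)
        sums[b] += y
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return lambda x: means[bucket(x)]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(f, xs, ys):
    """Mean squared error of predictor f on the data."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Assumed toy data: a noisy straight line on [0, 1].
rng = random.Random(0)
xs = [i / 99 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.05) for x in xs]

step_fit = fit_piecewise_constant(xs, ys, k=4)
line_fit = fit_line(xs, ys)
mse_step = mse(step_fit, xs, ys)
mse_line = mse(line_fit, xs, ys)
print("piecewise constant MSE:", mse_step)
print("linear regression MSE:", mse_line)
```

On data like this, the step function cannot follow the slope inside a bucket, so its squared error should come out noticeably higher than that of the linear fit.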

In the same way, instead of $x$ and some hypothesis function of $x$, you now have the state $s$ here and you’re trying to approximate the value function $V(s)$. You can get discretization to work for many problems, but maybe this isn’t the best representation for a value function.

The other problem with discretization, and maybe the more serious problem, is what’s often somewhat fancifully called the curse of dimensionality. This is just the observation that if the state space is in $\mathbb{R}^n$ and you discretize each variable into $k$ discrete values, then you get on the order of $k^n$ discrete states. In other words, the number of discrete states you end up with grows exponentially in the dimension of the problem. So for a helicopter with a 12-dimensional state space, this would be maybe like $100^{12}$, which is just huge, and it’s not feasible. So discretization doesn’t scale well at all to problems in high-dimensional state spaces.

This observation actually applies more generally than to just robotics and continuous state problems. For example, another fairly well-known application of reinforcement learning has been factory automation. Imagine that you have 20 machines sitting in a factory, the machines lie along an assembly line, and they each do something to a part on the assembly line and then route the part on to a different machine. You want to use reinforcement learning algorithms to [inaudible] the order in which the different machines operate on the different things that are flowing through your assembly line, and maybe different machines can do different things. If you have $n$ machines and each machine can be in $k$ states, then with this sort of discretization the total number of states would be $k^n$ as well, so again you get this huge number of states.
Another well-known example is a board game. Say you want to use reinforcement learning to play chess. If you have $n$ pieces on the chessboard and each piece can be in $k$ positions, then this is again sort of the curse of dimensionality thing, where the number of discrete states you end up with grows exponentially with the number of pieces in your board game. So the curse of dimensionality means that discretization scales poorly to high-dimensional state spaces, or at least that discrete representations scale poorly to high-dimensional state spaces.

In practice, if you have a 2-dimensional problem, discretization will usually work great. If you have a 3-dimensional problem, you can often get discretization to work not too badly without too much trouble. With a 4-dimensional problem, you can still often get it to work, but it could be challenging. As you go to higher and higher dimensional state spaces, the odds [inaudible] that you need to fiddle around with the discretization and do things like non-uniform grids. For example, what I’ve drawn for you is a non-uniform discretization where I’m discretizing $s_2$ much more finely than $s_1$, which makes sense if I think the value function is much more sensitive to the value of state variable $s_2$ than to $s_1$. So as you get into higher dimensional state spaces, you may need to manually fiddle with choices like these, with non-uniform discretizations and so on.

The folk wisdom seems to be that if you have 2- or 3-dimensional problems, it works fine. With 4-dimensional problems, you can probably get it to work, but it’d be slightly challenging. By fooling around and being clever, you can often push discretization up to, let’s say, about 6-dimensional problems with some difficulty, and problems higher than 6-dimensional would be extremely difficult to solve with discretization.
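
As a rough illustration of both points, the Python sketch below (an example under stated assumptions, not code from the lecture) counts the cells of a uniform grid with $k$ buckets per variable, and builds a small non-uniform grid that is finer in $s_2$ than in $s_1$. The `Grid` class, its cut points, and the function names are hypothetical.

```python
from bisect import bisect_right

def num_grid_states(k, n):
    """Number of cells when each of n state variables gets k buckets."""
    return k ** n

class Grid:
    """Axis-aligned grid; edges[i] holds the interior cut points for variable i."""

    def __init__(self, edges):
        self.edges = [sorted(e) for e in edges]
        # m cut points split one axis into m + 1 buckets.
        self.shape = [len(e) + 1 for e in self.edges]

    def num_states(self):
        total = 1
        for size in self.shape:
            total *= size
        return total

    def index(self, state):
        """Map a continuous state to a single discrete state index."""
        idx = 0
        for value, cuts, size in zip(state, self.edges, self.shape):
            idx = idx * size + bisect_right(cuts, value)
        return idx

# Exponential growth with 10 buckets per variable.
for n in (2, 3, 4, 6, 12):
    print(n, "dimensions:", num_grid_states(10, n), "states")

# Assumed non-uniform grid: 2 buckets for s1, 8 buckets for s2.
grid = Grid([
    [0.0],
    [-0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75],
])
print("non-uniform grid states:", grid.num_states())
print("index of state (0.5, 0.1):", grid.index((0.5, 0.1)))
```

Spending buckets unevenly like this keeps the total cell count down, but the count is still a product over dimensions, so it only delays the exponential growth.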
So that’s just rough folk wisdom about the order of magnitude of problems you’d think about using discretization for. But what I want to spend most of today talking about is [inaudible] methods that often work much better than discretization, and in which we will approximate $V^*$ directly without resorting to these sorts of discretizations.

Before I jump to the specific representation, let me just spend a few minutes talking about the problem setup. For today’s lecture, I’m going to focus on the problem of continuous states, and just to keep things very simple, I won’t deal with continuous actions, so I’m gonna assume discrete actions $a$. It turns out, actually, that for many problems the state space is much larger than the space of actions. That just seems to have worked out that way for many problems. For example, for driving a car the state space is 6-dimensional, so $x$, $y$, $\theta$, $\dot{x}$, $\dot{y}$, $\dot{\theta}$, whereas your action is only 2-dimensional: you have forward and backward motion, and steering the car. So you have 6-D states and 2-D actions, and you can discretize the actions much more easily than you can discretize the states. Another example: for a helicopter you have a 12-D state and a 4-dimensional action, it turns out, and there too it’s often much easier to just discretize the continuous actions into a discrete set of actions. And for the inverted pendulum, you have a 4-D state and a 1-D action: whether you accelerate your cart to the left or to the right is a 1-D action.

So for the rest of today, I’m gonna assume a continuous state, but I’ll assume that maybe you’ve already discretized your actions, just because in practice it turns out that, not for all problems but for many problems, large action spaces are just less of a difficulty than large state spaces. I’m also going to assume that we have a model or simulator of the MDP, and this is really an assumption on how the state transition probabilities are represented.
I’m going to use the terms “model” and “simulator” pretty much synonymously. Specifically, what I’m going to assume is that we have a black box, a piece of code, so that I can input any state $s$ and an action $a$, and it will output $s'$, a sample from the state transition distribution. So this is really my assumption on the representation I have for the state transition probabilities: I’ll assume I have a box that will take as input a state and an action and output a next state.

There are a few fairly common ways to get models of the different MDPs you may work with. One is that you might get a model from a physics simulator. For example, say you’re interested in controlling that inverted pendulum, so your action $a$ is the magnitude of the force you exert on the cart to the left or right, and your state is $x$, $\dot{x}$, $\theta$, $\dot{\theta}$. I’m just gonna write it in that order. I’m gonna write down a bunch of equations just for completeness, but most of what I write below here is a bit gratuitous. If you flip open a textbook on physics, a textbook on mechanics, you can work out the equations of motion of a physical device like this. You find that $\dot{s}$, where the dot denotes the derivative, so the derivative of the state with respect to time, is given by a vector whose entries are $\dot{x}$, [inaudible], $\dot{\theta}$, [inaudible], and so on, where $a$ is the force, the action that you exert on the cart, $L$ is the length of the pole, $M$ is the total mass of the system, and so on. All these equations are gratuitous; I’m just writing them down for completeness. But by flipping open a physics textbook, you can work out these equations of motion yourself, and this then gives you a model which says that $s_{t+1}$, your state one time step later, will be equal to your previous state plus $\Delta t$ times that derivative, $s_{t+1} = s_t + \Delta t \, \dot{s}_t$. So if in your simulator you model what happens to the cart every tenth of a second, then $\Delta t$ would be 0.1 seconds. And so that’d be one way to come up with a model of your MDP.

In this specific example, I’ve actually written down a deterministic model, and by deterministic I mean that given a state and an action, the next state is not random. This is an example of a deterministic model, where I can compute the next state exactly as a function of the previous state and the previous action; put another way, all the probability mass is on a single state given the previous state and action. You can also make this a stochastic model.

A second way that is often used to obtain a model is to learn one. Concretely, imagine that you have a physical inverted pendulum system, say you physically own an inverted pendulum robot. You would initialize your inverted pendulum robot to some state and then execute some policy. It could be some random policy or some policy that you think is pretty good, or you could even try controlling it yourself with a joystick or something. So you set the system off in some state $s_0$. Then you take some action $a_0$, and again it could be chosen by some policy or chosen by you using a joystick trying to control your inverted pendulum, or whatever. The system would transition to some new state $s_1$, then you take some new action $a_1$, and so on. Let’s say you do this for $T$ time steps; sometimes I call this one trajectory. You repeat this $m$ times, so this is the first trial, the first trajectory, and then you do this again: initialize it in some state, and so on. You do this a bunch of times, and then you run a learning algorithm to estimate $s_{t+1}$ as a function of $s_t$ and $a_t$. For the sake of concreteness, you should just think of this as the inverted pendulum problem, so $s_{t+1}$ is a 4-dimensional vector.
$s_t$ will be a 4-dimensional vector and $a_t$ will be a real number, so you might run linear regression 4 times, to predict each of these state variables as a function of each of these 5 real numbers, and so on.
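
The procedure above can be sketched in Python as follows. Everything here is a hedged illustration rather than the lecture’s own code. The `simulate` function is a hypothetical stand-in (a unit mass with friction, with a 2-dimensional state and a 1-dimensional action), not the inverted pendulum equations of motion. It plays the role of both the black-box model and the physical system from which data is collected, advancing the state by one Euler step with a time step of 0.1 seconds. The rest of the code collects $m$ trajectories of $T$ steps under random actions and runs one linear regression per state variable, so here that is 2 regressions on 3 real numbers instead of 4 regressions on 5.

```python
import random

DT = 0.1  # time step in seconds

def simulate(state, action):
    """Hypothetical stand-in dynamics (NOT the cart-pole equations):
    one Euler step s + DT * s_dot for a unit mass with friction."""
    p, v = state
    s_dot = (v, action - 0.1 * v)
    return (p + DT * s_dot[0], v + DT * s_dot[1])

def collect(m, T, rng):
    """Run m trajectories of T steps with random actions.
    Returns a list of (s_t, a_t, s_next) triples."""
    data = []
    for _ in range(m):
        s = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        for _ in range(T):
            a = rng.uniform(-1, 1)
            s_next = simulate(s, a)
            data.append((s, a, s_next))
            s = s_next
    return data

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_linear_model(data):
    """One least-squares regression per next-state coordinate,
    using the features (p, v, a) and the normal equations."""
    X = [[s[0], s[1], a] for s, a, _ in data]
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(3)] for i in range(3)]
    weights = []
    for k in range(2):
        Xty = [sum(x[i] * s_next[k] for x, (_, _, s_next) in zip(X, data))
               for i in range(3)]
        weights.append(solve(XtX, Xty))
    return weights

data = collect(m=20, T=50, rng=random.Random(0))
W = fit_linear_model(data)
for name, row in zip(("next p", "next v"), W):
    print(name, "weights on (p, v, a):", row)
```

Because the stand-in dynamics are linear and noise-free, the regression should recover them almost exactly. For a real system like the inverted pendulum the true dynamics are nonlinear, so a linear fit of this kind would only be an approximation.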

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4