<< Chapter < Page Chapter >> Page >

Instructor (Andrew Ng) :Okay. Good morning. Just one quick announcement before I start. Poster session, next Wednesday, 8:30 as you already know, and poster boards will be made available soon, so the poster boards we have are 20 inches by 30 inches in case you want to start designing your posters. That’s 20 inches by 30 inches. And they will be available this Friday, and you can pick them up from Nicki Salgudo who’s in Gates 187, so starting this Friday. I’ll send out this information by e-mail as well, in case you don’t want to write it down.

For those you that are SCPD students, if you want to show up here only on Wednesday for the poster session itself, we’ll also have blank posters there, or you’re also welcome to buy your own poster boards. If you do take poster boards from us then please treat them well. For the sake of the environment, we do ask you to give them back at the end of the poster session. We’ll recycle them from year to year. So if you do take one from us, please don’t cut holes in it or anything. So welcome to the last lecture of this course. What I want to do today is tell you about one final class of reinforcement learning algorithms. I just want to say a little bit about POMDPs, the partially observable MDPs, and then the main technical topic for today will be policy search algorithms. I’ll talk about two specific algorithms, essentially called reinforced and called Pegasus, and then we’ll wrap up the class. So if you recall from the last lecture, I actually started to talk about one specific example of a POMDP, which was this sort of linear dynamical system. This is sort of LQR, linear quadratic revelation problem, but I changed it and said what if we only have observations YT. And what if we couldn’t observe the state of the system directly, but had to choose an action only based on some noisy observations that maybe some function of the state?

So our strategy last time was that we said that in the fully observable case, we could choose actions – AT equals two, that matrix LT times ST. So LT was this matrix of parameters that [inaudible] describe the dynamic programming algorithm for finite horizon MDPs in the LQR problem. And so we said if only we knew what the state was, we choose actions according to some matrix LT times the state. And then I said in the partially observable case, we would compute these estimates. I wrote them as S of T given T, which were our best estimate for what the state is given all the observations. And in particular, I’m gonna talk about a Kalman filter which we worked out that our posterior distribution of what the state is given all the observations up to a certain time that was this.

So this is from last time. So that given the observations Y one through YT, our posterior distribution of the current state ST was Gaussian would mean ST given T sigma T given T. So I said we use a Kalman filter to compute this thing, this ST given T, which is going to be our best guess for what the state is currently. And then we choose actions using our estimate for what the state is, rather than using the true state because we don’t know the true state anymore in this POMDP. So it turns out that this specific strategy actually allows you to choose optimal actions, allows you to choose actions as well as you possibly can given that this is a POMDP, and given there are these noisy observations. It turns out that in general finding optimal policies with POMDPs – finding optimal policies for these sorts of partially observable MDPs is an NP-hard problem. Just to be concrete about the formalism of the POMDP – I should just write it here – a POMDP formally is a tuple like that where the changes are the set Y is the set of possible observations, and this O subscript S are the observation distributions. And so at each step, we observe – at each step in the POMDP, if we’re in some state ST, we observe some observation YT drawn from the observation distribution O subscript ST, that there’s an index by what the current state is. And it turns out that computing the optimal policy in a POMDP is an NP-hard problem. For the specific case of linear dynamical systems with the Kalman filter model, we have this strategy of computing the optimal policy assuming full observability and then estimating the states from the observations, and then plugging the two together.

Questions & Answers

What are the factors that affect demand for a commodity
Florence Reply
differentiate between demand and supply giving examples
Lambiv Reply
differentiated between demand and supply using examples
Lambiv
what is labour ?
Lambiv
how will I do?
Venny Reply
how is the graph works?I don't fully understand
Rezat Reply
information
Eliyee
devaluation
Eliyee
t
WARKISA
hi guys good evening to all
Lambiv
multiple choice question
Aster Reply
appreciation
Eliyee
explain perfect market
Lindiwe Reply
In economics, a perfect market refers to a theoretical construct where all participants have perfect information, goods are homogenous, there are no barriers to entry or exit, and prices are determined solely by supply and demand. It's an idealized model used for analysis,
Ezea
What is ceteris paribus?
Shukri Reply
other things being equal
AI-Robot
When MP₁ becomes negative, TP start to decline. Extuples Suppose that the short-run production function of certain cut-flower firm is given by: Q=4KL-0.6K2 - 0.112 • Where is quantity of cut flower produced, I is labour input and K is fixed capital input (K-5). Determine the average product of lab
Kelo
Extuples Suppose that the short-run production function of certain cut-flower firm is given by: Q=4KL-0.6K2 - 0.112 • Where is quantity of cut flower produced, I is labour input and K is fixed capital input (K-5). Determine the average product of labour (APL) and marginal product of labour (MPL)
Kelo
yes,thank you
Shukri
Can I ask you other question?
Shukri
what is monopoly mean?
Habtamu Reply
What is different between quantity demand and demand?
Shukri Reply
Quantity demanded refers to the specific amount of a good or service that consumers are willing and able to purchase at a give price and within a specific time period. Demand, on the other hand, is a broader concept that encompasses the entire relationship between price and quantity demanded
Ezea
ok
Shukri
how do you save a country economic situation when it's falling apart
Lilia Reply
what is the difference between economic growth and development
Fiker Reply
Economic growth as an increase in the production and consumption of goods and services within an economy.but Economic development as a broader concept that encompasses not only economic growth but also social & human well being.
Shukri
production function means
Jabir
What do you think is more important to focus on when considering inequality ?
Abdisa Reply
any question about economics?
Awais Reply
sir...I just want to ask one question... Define the term contract curve? if you are free please help me to find this answer 🙏
Asui
it is a curve that we get after connecting the pareto optimal combinations of two consumers after their mutually beneficial trade offs
Awais
thank you so much 👍 sir
Asui
In economics, the contract curve refers to the set of points in an Edgeworth box diagram where both parties involved in a trade cannot be made better off without making one of them worse off. It represents the Pareto efficient allocations of goods between two individuals or entities, where neither p
Cornelius
In economics, the contract curve refers to the set of points in an Edgeworth box diagram where both parties involved in a trade cannot be made better off without making one of them worse off. It represents the Pareto efficient allocations of goods between two individuals or entities,
Cornelius
Suppose a consumer consuming two commodities X and Y has The following utility function u=X0.4 Y0.6. If the price of the X and Y are 2 and 3 respectively and income Constraint is birr 50. A,Calculate quantities of x and y which maximize utility. B,Calculate value of Lagrange multiplier. C,Calculate quantities of X and Y consumed with a given price. D,alculate optimum level of output .
Feyisa Reply
Answer
Feyisa
c
Jabir
the market for lemon has 10 potential consumers, each having an individual demand curve p=101-10Qi, where p is price in dollar's per cup and Qi is the number of cups demanded per week by the i th consumer.Find the market demand curve using algebra. Draw an individual demand curve and the market dema
Gsbwnw Reply
suppose the production function is given by ( L, K)=L¼K¾.assuming capital is fixed find APL and MPL. consider the following short run production function:Q=6L²-0.4L³ a) find the value of L that maximizes output b)find the value of L that maximizes marginal product
Abdureman
Got questions? Join the online conversation and get instant answers!
Jobilize.com Reply

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Machine learning' conversation and receive update notifications?

Ask