1.19 Machine learning lecture 20

Instructor (Andrew Ng): Okay. Good morning. Just one quick announcement before I start. Poster session, next Wednesday, 8:30 as you already know, and poster boards will be made available soon, so the poster boards we have are 20 inches by 30 inches in case you want to start designing your posters. That’s 20 inches by 30 inches. And they will be available this Friday, and you can pick them up from Nicki Salgudo who’s in Gates 187, so starting this Friday. I’ll send out this information by e-mail as well, in case you don’t want to write it down.

For those of you that are SCPD students, if you want to show up here only on Wednesday for the poster session itself, we’ll also have blank posters there, or you’re also welcome to buy your own poster boards. If you do take poster boards from us then please treat them well. For the sake of the environment, we do ask you to give them back at the end of the poster session. We’ll recycle them from year to year. So if you do take one from us, please don’t cut holes in it or anything.

So welcome to the last lecture of this course. What I want to do today is tell you about one final class of reinforcement learning algorithms. I just want to say a little bit about POMDPs, the partially observable MDPs, and then the main technical topic for today will be policy search algorithms. I’ll talk about two specific algorithms, one called REINFORCE and one called Pegasus, and then we’ll wrap up the class.

So if you recall from the last lecture, I actually started to talk about one specific example of a POMDP, which was this sort of linear dynamical system. This is sort of LQR, the linear quadratic regulation problem, but I changed it and said what if we only have observations y_t? And what if we couldn’t observe the state of the system directly, but had to choose an action only based on some noisy observations that may be some function of the state?

So our strategy last time was that we said that in the fully observable case, we could choose actions a_t = L_t s_t, that is, that matrix L_t times s_t. So L_t was this matrix of parameters that [inaudible] describe the dynamic programming algorithm for finite horizon MDPs in the LQR problem. And so we said if only we knew what the state was, we would choose actions according to some matrix L_t times the state. And then I said in the partially observable case, we would compute these estimates. I wrote them as s_{t|t}, which were our best estimate for what the state is given all the observations. And in particular, I talked about the Kalman filter, with which we worked out our posterior distribution of what the state is given all the observations up to a certain time, and that was this.
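As an illustration of the fully observable strategy a_t = L_t s_t, here is a minimal sketch for a scalar system. It assumes dynamics s_{t+1} = A s_t + B a_t + noise and a per-step cost q s_t^2 + r a_t^2 (the cost form rather than a reward form). The names `lqr_gains`, `choose_action`, `q`, and `r` are hypothetical and not from the lecture.

```python
def lqr_gains(A, B, q, r, horizon):
    """Backward dynamic programming for a scalar finite-horizon LQR problem.

    Assumed model (illustrative, not from the lecture):
        s_{t+1} = A * s_t + B * a_t + noise
        cost at each step: q * s_t**2 + r * a_t**2, terminal cost q * s_T**2
    Requires r > 0. Returns [L_0, ..., L_{T-1}] such that a_t = L_t * s_t.
    Zero-mean additive noise does not change the gains.
    """
    P = q  # quadratic cost-to-go coefficient at the final time T
    gains = [0.0] * horizon
    for t in reversed(range(horizon)):
        # Minimise r*a^2 + P*(A*s + B*a)^2 over a, giving a = L * s.
        L = -(B * P * A) / (r + B * B * P)
        # Cost-to-go coefficient at time t under that action.
        P = q + r * L * L + P * (A + B * L) ** 2
        gains[t] = L
    return gains


def choose_action(L_t, s_t):
    """Fully observable policy: the action is linear in the true state."""
    return L_t * s_t


gains = lqr_gains(A=1.0, B=1.0, q=1.0, r=1.0, horizon=3)
print(gains)
print(choose_action(gains[0], 2.0))
```

The gains are computed backwards from the final time step, which is why they depend on t in the finite horizon setting.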

So this is from last time. So given the observations y_1 through y_t, our posterior distribution of the current state s_t was Gaussian with mean s_{t|t} and covariance Σ_{t|t}. So I said we use a Kalman filter to compute this thing, this s_{t|t}, which is going to be our best guess for what the state is currently. And then we choose actions using our estimate for what the state is, rather than using the true state, because we don’t know the true state anymore in this POMDP. So it turns out that this specific strategy actually allows you to choose optimal actions, allows you to choose actions as well as you possibly can given that this is a POMDP, and given there are these noisy observations.

It turns out that in general finding optimal policies with POMDPs – finding optimal policies for these sorts of partially observable MDPs – is an NP-hard problem. Just to be concrete about the formalism of the POMDP – I should just write it here – a POMDP formally is a tuple like that, where the changes are that the set Y is the set of possible observations, and these O_s are the observation distributions. And so at each step in the POMDP, if we’re in some state s_t, we observe some observation y_t drawn from the observation distribution O_{s_t}, that is, indexed by what the current state is.

And it turns out that computing the optimal policy in a POMDP is an NP-hard problem. For the specific case of linear dynamical systems with the Kalman filter model, we have this strategy of computing the optimal policy assuming full observability and then estimating the states from the observations, and then plugging the two together.
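The "plug the two together" strategy can be sketched for a scalar system as follows. This is only an illustration under assumptions: the state evolves as s_{t+1} = A s_t + B a_t + w_t, the observation is y_t = C s_t + v_t with Gaussian noise, and the gains passed in are hypothetical values standing in for the L_t that an LQR solver would produce. The function names and all parameter values are made up for the example.

```python
import random


def kalman_update(s_pred, P_pred, y, C, var_v):
    """Fold observation y_t into the prediction, giving s_{t|t} and its variance."""
    K = P_pred * C / (C * C * P_pred + var_v)  # Kalman gain
    s_tt = s_pred + K * (y - C * s_pred)
    P_tt = (1.0 - K * C) * P_pred
    return s_tt, P_tt


def kalman_predict(s_tt, P_tt, a, A, B, var_w):
    """Push the estimate through the assumed dynamics to predict s_{t+1}."""
    return A * s_tt + B * a, A * A * P_tt + var_w


def run_controller(gains, A=1.0, B=1.0, C=1.0, var_w=0.01, var_v=0.25, seed=0):
    """Simulate acting on the state estimate instead of the hidden true state."""
    rng = random.Random(seed)
    s = 1.0                      # true state, never shown to the controller
    s_pred, P_pred = 0.0, 1.0    # assumed prior over the initial state
    history = []
    for L_t in gains:
        # Observation drawn from the observation distribution for state s.
        y = C * s + rng.gauss(0.0, var_v ** 0.5)
        s_tt, P_tt = kalman_update(s_pred, P_pred, y, C, var_v)
        a = L_t * s_tt           # same linear policy, applied to the estimate
        history.append((s_tt, a))
        s = A * s + B * a + rng.gauss(0.0, var_w ** 0.5)
        s_pred, P_pred = kalman_predict(s_tt, P_tt, a, A, B, var_w)
    return history


# Hypothetical gains, standing in for the output of an LQR solver.
for s_tt, a in run_controller([-0.6, -0.6, -0.5]):
    print(round(s_tt, 3), round(a, 3))
```

The controller never reads the variable `s` when choosing an action; it only sees the observations, which is the partially observable setting described above.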

Source: OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
