<< Chapter < Page Chapter >> Page >

Instructor (Andrew Ng) :Okay. Good morning. Just one quick announcement before I start. Poster session, next Wednesday, 8:30 as you already know, and poster boards will be made available soon, so the poster boards we have are 20 inches by 30 inches in case you want to start designing your posters. That’s 20 inches by 30 inches. And they will be available this Friday, and you can pick them up from Nicki Salgudo who’s in Gates 187, so starting this Friday. I’ll send out this information by e-mail as well, in case you don’t want to write it down.

For those you that are SCPD students, if you want to show up here only on Wednesday for the poster session itself, we’ll also have blank posters there, or you’re also welcome to buy your own poster boards. If you do take poster boards from us then please treat them well. For the sake of the environment, we do ask you to give them back at the end of the poster session. We’ll recycle them from year to year. So if you do take one from us, please don’t cut holes in it or anything. So welcome to the last lecture of this course. What I want to do today is tell you about one final class of reinforcement learning algorithms. I just want to say a little bit about POMDPs, the partially observable MDPs, and then the main technical topic for today will be policy search algorithms. I’ll talk about two specific algorithms, essentially called reinforced and called Pegasus, and then we’ll wrap up the class. So if you recall from the last lecture, I actually started to talk about one specific example of a POMDP, which was this sort of linear dynamical system. This is sort of LQR, linear quadratic revelation problem, but I changed it and said what if we only have observations YT. And what if we couldn’t observe the state of the system directly, but had to choose an action only based on some noisy observations that maybe some function of the state?

So our strategy last time was that we said that in the fully observable case, we could choose actions – AT equals two, that matrix LT times ST. So LT was this matrix of parameters that [inaudible] describe the dynamic programming algorithm for finite horizon MDPs in the LQR problem. And so we said if only we knew what the state was, we choose actions according to some matrix LT times the state. And then I said in the partially observable case, we would compute these estimates. I wrote them as S of T given T, which were our best estimate for what the state is given all the observations. And in particular, I’m gonna talk about a Kalman filter which we worked out that our posterior distribution of what the state is given all the observations up to a certain time that was this.

So this is from last time. So that given the observations Y one through YT, our posterior distribution of the current state ST was Gaussian would mean ST given T sigma T given T. So I said we use a Kalman filter to compute this thing, this ST given T, which is going to be our best guess for what the state is currently. And then we choose actions using our estimate for what the state is, rather than using the true state because we don’t know the true state anymore in this POMDP. So it turns out that this specific strategy actually allows you to choose optimal actions, allows you to choose actions as well as you possibly can given that this is a POMDP, and given there are these noisy observations. It turns out that in general finding optimal policies with POMDPs – finding optimal policies for these sorts of partially observable MDPs is an NP-hard problem. Just to be concrete about the formalism of the POMDP – I should just write it here – a POMDP formally is a tuple like that where the changes are the set Y is the set of possible observations, and this O subscript S are the observation distributions. And so at each step, we observe – at each step in the POMDP, if we’re in some state ST, we observe some observation YT drawn from the observation distribution O subscript ST, that there’s an index by what the current state is. And it turns out that computing the optimal policy in a POMDP is an NP-hard problem. For the specific case of linear dynamical systems with the Kalman filter model, we have this strategy of computing the optimal policy assuming full observability and then estimating the states from the observations, and then plugging the two together.

Questions & Answers

what does preconceived mean
sammie Reply
physiological Psychology
Nwosu Reply
How can I develope my cognitive domain
Amanyire Reply
why is communication effective
Dakolo Reply
Communication is effective because it allows individuals to share ideas, thoughts, and information with others.
effective communication can lead to improved outcomes in various settings, including personal relationships, business environments, and educational settings. By communicating effectively, individuals can negotiate effectively, solve problems collaboratively, and work towards common goals.
it starts up serve and return practice/assessments.it helps find voice talking therapy also assessments through relaxed conversation.
miss
Every time someone flushes a toilet in the apartment building, the person begins to jumb back automatically after hearing the flush, before the water temperature changes. Identify the types of learning, if it is classical conditioning identify the NS, UCS, CS and CR. If it is operant conditioning, identify the type of consequence positive reinforcement, negative reinforcement or punishment
Wekolamo Reply
please i need answer
Wekolamo
because it helps many people around the world to understand how to interact with other people and understand them well, for example at work (job).
Manix Reply
Agreed 👍 There are many parts of our brains and behaviors, we really need to get to know. Blessings for everyone and happy Sunday!
ARC
A child is a member of community not society elucidate ?
JESSY Reply
Isn't practices worldwide, be it psychology, be it science. isn't much just a false belief of control over something the mind cannot truly comprehend?
Simon Reply
compare and contrast skinner's perspective on personality development on freud
namakula Reply
Skinner skipped the whole unconscious phenomenon and rather emphasized on classical conditioning
war
explain how nature and nurture affect the development and later the productivity of an individual.
Amesalu Reply
nature is an hereditary factor while nurture is an environmental factor which constitute an individual personality. so if an individual's parent has a deviant behavior and was also brought up in an deviant environment, observation of the behavior and the inborn trait we make the individual deviant.
Samuel
I am taking this course because I am hoping that I could somehow learn more about my chosen field of interest and due to the fact that being a PsyD really ignites my passion as an individual the more I hope to learn about developing and literally explore the complexity of my critical thinking skills
Zyryn Reply
good👍
Jonathan
and having a good philosophy of the world is like a sandwich and a peanut butter 👍
Jonathan
generally amnesi how long yrs memory loss
Kelu Reply
interpersonal relationships
Abdulfatai Reply
What would be the best educational aid(s) for gifted kids/savants?
Heidi Reply
treat them normal, if they want help then give them. that will make everyone happy
Saurabh
What are the treatment for autism?
Magret Reply
hello. autism is a umbrella term. autistic kids have different disorder overlapping. for example. a kid may show symptoms of ADHD and also learning disabilities. before treatment please make sure the kid doesn't have physical disabilities like hearing..vision..speech problem. sometimes these
Jharna
continue.. sometimes due to these physical problems..the diagnosis may be misdiagnosed. treatment for autism. well it depends on the severity. since autistic kids have problems in communicating and adopting to the environment.. it's best to expose the child in situations where the child
Jharna
child interact with other kids under doc supervision. play therapy. speech therapy. Engaging in different activities that activate most parts of the brain.. like drawing..painting. matching color board game. string and beads game. the more you interact with the child the more effective
Jharna
results you'll get.. please consult a therapist to know what suits best on your child. and last as a parent. I know sometimes it's overwhelming to guide a special kid. but trust the process and be strong and patient as a parent.
Jharna
Got questions? Join the online conversation and get instant answers!
Jobilize.com Reply

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Machine learning' conversation and receive update notifications?

Ask