# 1.8 Machine learning lecture 9  (Page 13/13)

The best quadratic function – by best I mean in the sense of generalization error – the hypothesis function – the quadratic function with the lowest generalization error has to have equal or more likely lower generalization error than the best linear function. So by switching to a more complex hypothesis class you can get this first term as you go down. But what I pay for then is that K will increase. By switching to a larger hypothesis class, the first term will go down, but the second term will increase because I now have a larger class of hypotheses and so the second term K will increase.

And so this is sometimes called the bias – this is usually called the bias variance tradeoff. Whereby going to larger hypothesis class maybe I have the hope for finding a better function, that my risk of sort of not fitting my model so accurately also increases, and that’s because – illustrated by the second term going up when the size of your hypothesis, when K goes up. And so speaking very loosely, we can think of this first term as corresponding maybe to the bias of the learning algorithm, or the bias of the hypothesis class. And you can – again speaking very loosely, think of the second term as corresponding to the variance in your hypothesis, in other words how well you can actually fit a hypothesis in the – how well you actually fit this hypothesis class to the data. And by switching to a more complex hypothesis class, your variance increases and your bias decreases.

As a note of warning, it turns out that if you take like a statistics class you’ve seen definitions of bias and variance, which are often defined in terms of squared error or something. It turns out that for classification problems, there actually is no universally accepted formal definition of bias and variance for classification problems. For regression problems, there is this square error definition. For classification problems it turns out there’ve been several competing proposals for definitions of bias and variance. So when I say bias and variance here, think of these as very loose, informal, intuitive definitions, and not formal definitions. Okay. The cartoon associated with intuition I just said would be as follows: Let’s say – and everything about the plot will be for a fixed value of M, for a fixed training set size M. Vertical axis I’ll plot ever and on the horizontal axis I’ll plot model complexity. And by model complexity I mean sort of degree of polynomial, size of your hypothesis class script H etc. It actually turns out, you remember the bandwidth parameter from locally weighted linear regression, that also has a similar effect in controlling how complex your model is. Model complexity [inaudible] polynomial I guess. So the more complex your model, the better your training error, and so your training error will tend to [inaudible]zero as you increase the complexity of your model because the more complete your model the better you can fit your training set.

But because of this bias variance tradeoff, you find that generalization error will come down for a while and then it will go back up. And this regime on the left is when you’re underfitting the data or when you have high bias. And this regime on the right is when you have high variance or you’re overfitting the data. Okay? And this is why a model of sort of intermediate complexity, somewhere here if often preferable to if [inaudible] and minimize generalization error. Okay? So that’s just a cartoon. In the next lecture we’ll actually talk about the number of algorithms for trying to automatically select model complexities, say to get you as close as possible to this minimum – to this area of minimized generalization error. The last thing I want to do is actually going back to the theorem I wrote out, I just want to take that theorem – well, so the theorem I wrote out was an error bound theorem this says for fixed M and delta where probability one minus delta, I get a bound on gamma, which is what this term is. So the very last thing I wanna do today is just come back to this theorem and write out a corollary where I’m gonna fix gamma, I’m gonna fix my error bound, and fix delta and solve for M. And if you do that, you get the following corollary: Let H be fixed with K hypotheses and let any delta and gamma be fixed.

Then in order to guarantee that, let’s say I want a guarantee that the generalization error of the hypothesis I choose with empirical risk minimization, that this is at most two times gamma worse than the best possible error I could obtain with this hypothesis class. Lets say I want this to hold true with probability at least one minus delta, then it suffices that M is [inaudible] to that. Okay? And this is sort of solving for the error bound for M. One thing we’re going to convince yourselves of the easy part of this is if you set that term [inaudible]gamma and solve for M you will get this. One thing I want you to go home and sort of convince yourselves of is that this result really holds true. That this really logically follows from the theorem we’ve proved. In other words, you can take that formula we wrote and solve for M and – because this is the formula you get for M, that’s just – that’s the easy part. That once you go back and convince yourselves that this theorem is a true fact and that it does indeed logically follow from the other one. In particular, make sure that if you solve for that you really get M grading equals this, and why is this M grading that and not M less equal two, and just make sure – I can write this down and it sounds plausible why don’t you just go back and convince yourself this is really true. Okay?

And it turns out that when we prove these bounds in learning theory it turns out that very often the constants are sort of loose. So it turns out that when we prove these bounds usually we’re interested – usually we’re not very interested in the constants, and so I write this as big O of one over gamma squared, log K over delta, and again, the key step in this is that the dependence on M with the size of the hypothesis class is logarithmic. And this will be very important later when we talk about infinite hypothesis classes. Okay? Any questions about this? No? Okay, cool. So next lecture we’ll come back, we’ll actually start from this result again. Remember this. I’ll write this down as the first thing I do in the next lecture and we’ll generalize these to infinite hypothesis classes and then talk about practical algorithms for model spectrum. So I’ll see you guys in a couple days.

[End of Audio]

Duration: 75 minutes

how to know photocatalytic properties of tio2 nanoparticles...what to do now
it is a goid question and i want to know the answer as well
Maciej
Do somebody tell me a best nano engineering book for beginners?
what is fullerene does it is used to make bukky balls
are you nano engineer ?
s.
what is the Synthesis, properties,and applications of carbon nano chemistry
Mostly, they use nano carbon for electronics and for materials to be strengthened.
Virgil
is Bucky paper clear?
CYNTHIA
so some one know about replacing silicon atom with phosphorous in semiconductors device?
Yeah, it is a pain to say the least. You basically have to heat the substarte up to around 1000 degrees celcius then pass phosphene gas over top of it, which is explosive and toxic by the way, under very low pressure.
Harper
Do you know which machine is used to that process?
s.
how to fabricate graphene ink ?
for screen printed electrodes ?
SUYASH
What is lattice structure?
of graphene you mean?
Ebrahim
or in general
Ebrahim
in general
s.
Graphene has a hexagonal structure
tahir
On having this app for quite a bit time, Haven't realised there's a chat room in it.
Cied
what is biological synthesis of nanoparticles
what's the easiest and fastest way to the synthesize AgNP?
China
Cied
types of nano material
I start with an easy one. carbon nanotubes woven into a long filament like a string
Porter
many many of nanotubes
Porter
what is the k.e before it land
Yasmin
what is the function of carbon nanotubes?
Cesar
I'm interested in nanotube
Uday
what is nanomaterials​ and their applications of sensors.
what is nano technology
what is system testing?
preparation of nanomaterial
Yes, Nanotechnology has a very fast field of applications and their is always something new to do with it...
what is system testing
what is the application of nanotechnology?
Stotaw
In this morden time nanotechnology used in many field . 1-Electronics-manufacturad IC ,RAM,MRAM,solar panel etc 2-Helth and Medical-Nanomedicine,Drug Dilivery for cancer treatment etc 3- Atomobile -MEMS, Coating on car etc. and may other field for details you can check at Google
Azam
anybody can imagine what will be happen after 100 years from now in nano tech world
Prasenjit
after 100 year this will be not nanotechnology maybe this technology name will be change . maybe aftet 100 year . we work on electron lable practically about its properties and behaviour by the different instruments
Azam
name doesn't matter , whatever it will be change... I'm taking about effect on circumstances of the microscopic world
Prasenjit
how hard could it be to apply nanotechnology against viral infections such HIV or Ebola?
Damian
silver nanoparticles could handle the job?
Damian
not now but maybe in future only AgNP maybe any other nanomaterials
Azam
Hello
Uday
I'm interested in Nanotube
Uday
this technology will not going on for the long time , so I'm thinking about femtotechnology 10^-15
Prasenjit
can nanotechnology change the direction of the face of the world
how did you get the value of 2000N.What calculations are needed to arrive at it
Privacy Information Security Software Version 1.1a
Good
Berger describes sociologists as concerned with
Got questions? Join the online conversation and get instant answers!