The presentation stresses important differences between machine learning and conventional optimisation approaches and proposes some solutions. The first part discusses the interaction of two kinds of asymptotic properties: those of the statistics and those of the optimization algorithm. Unlikely optimization algorithms such as stochastic gradient show amazing performance for large-scale machine learning problems. The second part shows how the deeper causes of this performance suggest the theoretical possibility of learning large-scale problems with a single pass over the data. Practical algorithms will be discussed: various second-order stochastic gradients, averaging methods, and dual methods with data reprocessing.
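As a rough illustration of the single-pass idea with an averaging method, the sketch below runs plain stochastic gradient once over a stream of examples and keeps a running average of the iterates. It is not taken from the lecture: the squared loss, the synthetic linear data, the step-size schedule, and the names `averaged_sgd_single_pass`, `make_data`, `lr0`, and `decay` are all assumptions chosen for brevity.

```python
import random


def averaged_sgd_single_pass(examples, dim, lr0=0.1, decay=0.01):
    """Single pass of SGD on squared loss, plus a running average of iterates.

    Assumption: linear model y ~ w.x with loss 0.5 * (w.x - y)^2.
    The step size lr0 / (1 + decay * t) is an illustrative choice.
    """
    w = [0.0] * dim
    w_avg = [0.0] * dim
    for t, (x, y) in enumerate(examples, start=1):
        # Gradient of the squared loss on this one example is err * x.
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        lr = lr0 / (1.0 + decay * t)
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        # Incremental mean of the iterates seen so far.
        w_avg = [a + (wi - a) / t for a, wi in zip(w_avg, w)]
    return w, w_avg


def make_data(n, true_w, noise=0.1, seed=0):
    """Hypothetical synthetic data: Gaussian inputs, noisy linear targets."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


if __name__ == "__main__":
    stream = make_data(5000, [2.0, -1.0])
    last, averaged = averaged_sgd_single_pass(stream, dim=2)
    print("last iterate:", last)
    print("averaged iterate:", averaged)
```

Each example is touched exactly once, so the cost is linear in the number of examples; the averaged iterate is returned alongside the last one so the two can be compared on the same stream.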
Attribution: The Open Education Consortium
http://www.ocwconsortium.org/courses/view/a057c346ff52e7efb871d9939e099da5/
Course Home http://videolectures.net/opt08_bottou_lsml/