Speaker: Prof. Yuanzhi Li
Affiliation: CMU, Machine Learning Department
Via Zoom only: https://ucla.zoom.us/j/99930463780
Zoom Meeting ID: 999 3046 3780
Abstract: Deep learning, a family of large-scale, multi-layer learning methods, has overwhelmed traditional shallow learners across a great variety of applications. However, despite its convincing empirical success, there is not yet a comprehensive theory explaining why deep learners are actually better than shallow ones. In this talk, I will present some preliminary steps towards understanding the principles behind the power of deep learners, as well as why shallow ones lack such power. As a concrete example, we compare the power of a three-layer ResNet with that of the prevailing traditional one-layer shallow learners: the kernel method and linear regression over (prescribed) feature mappings. We show that the former, multi-layer model can provably and efficiently learn certain concept classes that the latter shallow learners cannot. We also identify the critical reason behind this advantage: the three-layer ResNet, trained by SGD from random initialization, can perform hierarchical learning to learn these concept classes. The two shallow learners, on the other hand, can only perform one-shot learning. For this reason, even when the shallow learners have more parameters than the ResNet, they cannot use these parameters efficiently during training and thus fail to learn the concept classes.
Biography: Yuanzhi Li is an assistant professor in the Machine Learning Department at CMU. He received his Ph.D. from Princeton under the advice of Sanjeev Arora (2014-2018) and then spent one year as a postdoc at Stanford.
For more information, contact Prof. Lin Yang.
Date: Jul 06, 2020
Time: 12:00 pm - 1:30 pm