[10-14] Optimization, Generalization and Implicit bias of Gradient Methods in Deep Learning
Title: Optimization, Generalization and Implicit bias of Gradient Methods in Deep Learning
Venue: Offline: Lecture Hall 334, 3rd Floor, Building 5; Online: Tencent Meeting 208849802
Abstract: Deep learning has enjoyed huge empirical success in recent years. Although training a deep neural network is a highly nonconvex optimization problem, simple (stochastic) gradient methods are able to produce good solutions that minimize the training error and, more surprisingly, generalize well to out-of-sample data, even when the number of parameters is significantly larger than the amount of training data. It is known that the optimization algorithms (various gradient-based methods) contribute greatly to the generalization properties of deep learning. Recently, however, researchers have found that gradient methods (even gradient descent) may not converge to a stationary point: the loss gradually decreases, but not necessarily monotonically, and the sharpness of the loss landscape (i.e., the maximum eigenvalue of the Hessian) may oscillate, entering a regime called the edge of stability. These behaviors are inconsistent with several classical presumptions widely studied in the field of optimization. Moreover, what bias is introduced by gradient-based algorithms in neural network training? What characteristics of the training ensure good generalization in deep learning? In this talk, we investigate these questions from the perspective of gradient-based optimization methods. In particular, we attempt to explain some of the behaviors of the optimization trajectory (e.g., the edge of stability), prove new generalization bounds, and investigate the implicit bias of various gradient methods.
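As background for the sharpness and edge-of-stability discussion above, here is a minimal sketch (not taken from the talk) of the classical stability threshold for gradient descent: on a quadratic f(x) = a/2 · x², the Hessian (sharpness) is the constant a, and the GD update x ← x − η·a·x contracts exactly when the sharpness stays below 2/η. The function name and chosen constants are illustrative assumptions, not part of the abstract.

```python
def gd_iterates(a, eta, x0=1.0, steps=50):
    """Run gradient descent x <- x - eta * f'(x) on f(x) = 0.5 * a * x**2,
    where f'(x) = a * x and the sharpness (Hessian) is the constant a."""
    x = x0
    for _ in range(steps):
        x -= eta * a * x  # multiplies x by (1 - eta * a) each step
    return x

eta = 0.1  # step size; classical stability threshold is 2 / eta = 20

# Sharpness below the threshold: iterates converge toward the minimum.
stable = gd_iterates(a=10.0, eta=eta)

# Sharpness above the threshold: iterates oscillate and blow up,
# the regime whose boundary the "edge of stability" phenomenon tracks.
unstable = gd_iterates(a=25.0, eta=eta)

print(abs(stable), abs(unstable))
```

On nonconvex deep-learning losses the sharpness is not constant; the empirical observation discussed in the abstract is that it rises toward and then hovers around 2/η instead of the loss simply diverging.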
Bio: Jian Li is currently a tenured associate professor and PhD advisor at the Institute for Interdisciplinary Information Sciences, Tsinghua University. He received his bachelor's degree from Sun Yat-sen University, his master's degree from Fudan University, and his PhD from the University of Maryland. His research interests include algorithm design and analysis, machine learning, databases, and financial technology. He has published more than 100 papers in leading international conferences and journals, and has received the best paper awards at VLDB 2009 and ESA 2010, the ICDT 2017 Best Newcomer Award, support from the Tsinghua "221" Basic Research Young Talents Program and the Ministry of Education's New Century Talents Program, and the NSFC Excellent Young Scientists Fund. He has led and participated in a number of research projects, including an NSFC Young Scientists Fund project, NSFC general projects, a China-Israel international cooperation project, and a Young Scientists 973 Program project, as well as industrial collaborations with Ant Financial, Huatai Securities, E Fund, Microsoft, Baidu, and Didi, among others.