Distributed Stochastic Optimization in Non-Differentiable and Non-Convex Environments

By Bilderback, Dayna

June 17, 2019

Speaker: Stefan Vlaski
Affiliation: UCLA Ph.D. Candidate

Abstract: The first part of this dissertation considers distributed learning problems over networked agents. The general objective of distributed adaptation and learning is the solution of global, stochastic optimization problems through localized interactions and without information about the statistical properties of the data. Regularization is a useful technique to encourage or enforce structural properties of the resulting solution, such as sparsity or constraints. A substantial number of regularizers are inherently non-smooth, while many cost functions are differentiable. We propose distributed and adaptive strategies, that are able to minimize aggregate sums of objectives. In doing so, we exploit the structure of the individual objectives as sums of differentiable costs and non-differentiable regularizers. Resulting algorithms are adaptive in nature and able to continuously track drifts in the problem; their recursions, however, are subject to persistent perturbations arising from the stochastic nature of the gradient approximations, and from disagreement across agents in the network. The presence of non-smooth, and potentially unbounded, regularizers enriches the dynamics of these recursions. We quantify the impact of this interplay and draw implications for steady-state performance as well as algorithm design and present applications in distributed machine learning and image reconstruction.

Driven by the need to solve increasingly complex optimization problems in signal processing and machine learning, there has also been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments. Most available works on distributed non-convex optimization problems focus on the deterministic setting where exact gradients are available at each agent. In this work, we consider stochastic cost functions, where exact gradients are replaced by stochastic approximations and the resulting gradient noise persistently seeps into the dynamics of the algorithm. We establish that the diffusion learning algorithm continues to yield meaningful estimates in these more challenging, non-convex environments, in the sense that (a) despite the distributed implementation, individual agents cluster in a small region around the network centroid, and (b) the network centroid inherits many properties of the centralized, stochastic gradient descent recursion, including the escape from strict saddle points in O(1/μ) iterations and return of approximately second-order stationary points in a polynomial number of iterations.

In the second part of this thesis, we consider centralized learning problems over networked feature spaces. Rapidly growing capabilities to observe, collect and process ever increasing quantities of information, necessitate methods for identifying and exploiting structure in high-dimensional feature spaces. Networks, frequently referred to as graphs in this context, have emerged as a useful tool for modeling interrelations among different parts of a data set. We consider graph signals that evolve dynamically according to a heat diffusion process and are subject to persistent perturbations. The model is not limited to heat diffusion but can be applied to modeling other processes such as the evolution of interest over social networks and the movement of people in cities. We develop an online algorithm that is able to learn the underlying graph structure from observations of the signal evolution and derive expressions for its performance. The algorithm is adaptive in nature and in particular able to respond to changes in the graph structure and the perturbation statistics.

In order to incorporate prior graphical knowledge to improve classification performance, we propose a BRAIN strategy for learning, which enhances the performance of traditional algorithms, such as logistic regression and SVM learners, by incorporating a graphical layer that tracks and learns in real-time the underlying correlation structure among feature subspaces. In this way, the algorithm is able to identify salient subspaces and their correlations, while simultaneously dampening the effect of irrelevant features.

Biography: Stefan Vlaski is a Ph.D. candidate in the UCLA Electrical and Computer Engineering Department under the supervision of Professor Ali. H. Sayed. He received his B.S. degree from Technical University Darmstadt in 2013, the M.S. degree from UCLA in 2014 and has interned at Apple Inc. as well as Amazon Lab126. His research interests focus on the development and study of learning algorithms with a particular emphasis on adaptive and distributed solutions.

For more information, contact Prof. Ali H. Sayed (sayed@ee.ucla.edu)

Date/Time:
Date(s) - Jun 17, 2019
4:00 pm - 6:00 pm

Location:
E-IV Faraday Room #67-124
420 Westwood Plaza - 6th Flr., Los Angeles CA 90095