Team Work and Exploration in Reinforcement Learning

Speaker: Lucas Cassano
Affiliation: Ph.D. Candidate

Via Zoom Only: https://epfl.zoom.us/j/4684474708

Abstract: In this dissertation, we study exploration and teamwork in cooperative multi-agent reinforcement learning (MARL). Several challenges arise when considering collaborative MARL. One of these challenges is decentralization. In many cases, due to design constraints, it is undesirable or inconvenient to constantly relay data between the agents and a centralized location. Therefore, fully distributed solutions become preferable. The first part of this dissertation addresses the challenge of designing fully decentralized MARL algorithms. We consider two problems: policy evaluation and policy optimization. To address the policy evaluation problem, we introduce Fast Diffusion for Policy Evaluation (FDPE), an algorithm that converges at a faster rate than previous solutions. We then consider the policy optimization problem, where the objective is for all agents to learn an optimal team policy; for this case we introduce Diffusion for Team Policy Optimization (DTPO). DTPO is more data efficient than previous algorithms and does not converge to Nash equilibria. For both problems, we provide experimental studies that show the effectiveness of the proposed methods.
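To make the decentralized setting concrete, the short Python sketch below illustrates the generic adapt-then-combine diffusion pattern for distributed policy evaluation with linear value functions: each agent performs a local TD step on its own data and then averages parameters with its neighbors. This is only a minimal illustration of the diffusion idea under assumed quantities (ring topology, synthetic features, hypothetical step sizes); it is not the FDPE algorithm presented in the talk.

import numpy as np

# Minimal adapt-then-combine (ATC) diffusion sketch for distributed policy
# evaluation with linear value functions. NOT the FDPE algorithm from the
# dissertation; it only illustrates the general diffusion pattern.

n_agents, dim = 4, 8
rng = np.random.default_rng(0)

# Doubly stochastic combination matrix A over a ring topology (hypothetical weights).
A = np.zeros((n_agents, n_agents))
for k in range(n_agents):
    A[k, k] = 0.5
    A[k, (k - 1) % n_agents] = 0.25
    A[k, (k + 1) % n_agents] = 0.25

w = np.zeros((n_agents, dim))      # per-agent value-function parameters
gamma, alpha = 0.9, 0.05           # discount factor and local step size

def local_td_step(w_k, phi, phi_next, reward):
    """One TD(0) step on a single agent's local transition (phi -> phi_next)."""
    td_error = reward + gamma * phi_next @ w_k - phi @ w_k
    return w_k + alpha * td_error * phi

for step in range(200):
    # Adapt: each agent updates with its own (here, synthetic) observation.
    psi = np.stack([
        local_td_step(w[k], rng.normal(size=dim), rng.normal(size=dim), rng.normal())
        for k in range(n_agents)
    ])
    # Combine: average the intermediate estimates with neighbors via A.
    w = A @ psi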

Another challenge that arises in collaborative MARL is that of scalability. The number of parameters that must be estimated when full team policies are learned grows exponentially with the number of agents. Hence, algorithms that learn joint team policies quickly become intractable. A solution to this problem is for each agent to learn an individual policy such that the resulting joint team policy is optimal. This problem has been the subject of much recent research. However, most solution methods are data inefficient and often make unrealistic assumptions that greatly limit their applicability. To address this problem, we introduce Logical Team Q-learning (LTQL), an algorithm that learns factored policies in a data-efficient manner and is applicable to any cooperative MARL environment. We show that LTQL outperforms previous methods in a range of environments.
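The scalability argument can be made concrete with a small count. The sketch below contrasts the size of a joint team Q-table with per-agent (factored) tables, and shows a plain per-agent Q-learning update in the style of independent Q-learning. The numbers and the update rule are illustrative assumptions only; this is not the LTQL update from the talk.

import numpy as np

# Joint vs. factored representation sizes, plus a per-agent update sketch.
# NOT the LTQL algorithm; an independent-Q-learning-style update used only
# to illustrate why factored policies scale with the number of agents.

n_agents, n_states, n_actions = 5, 20, 4
gamma, alpha = 0.95, 0.1

# Joint Q-table: one entry per (state, joint action) -> exponential in n_agents.
joint_entries = n_states * n_actions ** n_agents       # 20 * 4**5 = 20480
# Factored: each agent keeps Q over its own actions -> linear in n_agents.
factored_entries = n_agents * n_states * n_actions     # 5 * 20 * 4 = 400

Q = np.zeros((n_agents, n_states, n_actions))           # per-agent tables

def factored_update(k, s, a_k, r, s_next):
    """Per-agent Q-learning step: agent k updates using only its own action."""
    target = r + gamma * Q[k, s_next].max()
    Q[k, s, a_k] += alpha * (target - Q[k, s, a_k])

# Example usage with hypothetical transition data:
factored_update(k=0, s=3, a_k=2, r=1.0, s_next=4)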

Another challenge is that of efficient exploration. This is a problem in both the single-agent and multi-agent settings, although in MARL it becomes more severe due to the larger state-action space. The challenge of deriving policies that explore the state space efficiently has been addressed in many recent works. However, most of these approaches rely on heuristics and, more importantly, they treat exploration of the state space separately from learning an optimal policy. To address this challenge, we introduce the Information Seeking Learner (ISL), an algorithm that displays state-of-the-art performance on difficult exploration benchmarks. The value of our work on exploration is that we take a fundamentally different approach from previous works: we consider the problem of exploring the state space and learning an optimal policy jointly. The main insight of our approach is that in RL, point estimates of the quantities of interest are not sufficient; confidence-bound estimates are also necessary.
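The role of confidence bounds can be illustrated with a simple count-based, UCB-style rule: the agent keeps a point estimate and an uncertainty proxy, and acts on an upper confidence bound rather than on the point estimate alone. This sketch is not the ISL algorithm; the bonus form and constants are assumptions chosen only to show the idea.

import numpy as np

# Exploration driven by confidence bounds rather than point estimates alone.
# NOT the ISL algorithm; a simple count-based UCB-style rule for illustration.

n_states, n_actions = 10, 3
gamma, alpha, c = 0.95, 0.1, 1.0

Q = np.zeros((n_states, n_actions))        # point estimates
N = np.ones((n_states, n_actions))         # visit counts (uncertainty proxy)

def select_action(s, t):
    """Pick the action maximizing Q plus an exploration bonus (upper bound)."""
    bonus = c * np.sqrt(np.log(t + 1) / N[s])
    return int(np.argmax(Q[s] + bonus))

def update(s, a, r, s_next):
    """Standard Q-learning update plus visit counting."""
    N[s, a] += 1
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

# Example usage with hypothetical transition data:
a = select_action(s=0, t=1)
update(s=0, a=a, r=0.5, s_next=1)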

Biography: Lucas Cassano is a Ph.D. candidate in the UCLA Electrical and Computer Engineering Department under the supervision of Professor Ali H. Sayed. He received his Electronics Engineer degree from the Buenos Aires Institute of Technology in 2013 and the M.S. degree from UCLA in 2015, and has worked as an engineer at Satellogic and Mojix. His research interests focus on reinforcement learning.

For more information, contact Prof. Ali H. Sayed (sayed@ee.ucla.edu)

Date/Time:
Jul 02, 2020
9:00 am - 11:00 am
