Decision Making in an Unknown and Changing World: Decentralized Multiuser Online Learning
Jan 07, 2013
from 01:00 PM to 02:00 PM
|Where||Engr. IV Bldg., Shannon Room 54-134|
|Contact Name||Prof. Abeer Alwan|
|Add event to calendar||
University of Michigan
In this talk I will discuss a type of online learning commonly used when dealing with sequential decision making under uncertainty in an environment. A common feature shared by this type of learning algorithms is to use a combination of "exploration", where the user samples different options to find out good actions to take, and "exploitation", where the user takes what he/she believes to be good actions to maximize his/her gain. The design of a good algorithm often boils down to determining when to explore, when to exploit and how. This can be challenging depending on the nature of the environment and the associated uncertainty. This process is made even more complex when there are multiple such users, uncoordinated, present in the system.
Within this context we are particularly interested in the class of multiarmed bandit problems, where an environment may be described by a collection of arbitrary finite-state Markov chains with unknown transition probability matrices, each representing an option for the user with unknown rewards. The performance of a learning algorithm is often measured by the notion of regret, the difference in total rewards between the algorithm and a benchmark. In this talk we will consider both weak regret, when the benchmark is the best single-action policy (or the best static policy), and strong regret, when the benchmark is given by the best dynamic policy when statistics of the underlying Markov chains are completely known. We show a sequence of algorithms that achieve weak or strong regret logarithmic in time, and highlight the difference in computational requirement under different regret measures. We also show how to modify these algorithms such that multiple uncoordinated users with competing interest can simultaneously learn in the same environment. Our motivating applications include dynamic spectrum access and pricing design in an unknown market.
Mingyan Liu received her Ph.D. Degree in electrical engineering from the University of Maryland, College Park, in 2000 and has since been with the Department of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor, where she is currently a Professor. Her research interests are in optimal resource allocation, performance modeling and analysis, and energy efficient design of wireless, mobile ad hoc, and sensor networks. She is the recipient of the 2002 NSF CAREER Award, the University of Michigan Elizabeth C. Crosby Research Award in 2003, and the 2010 EECS Department Outstanding Achievement Award. She also received a Best Paper Award at the International Conference on Information Processing in Sensor Networks (IPSN) in 2012. She serves/has served on the editorial board of IEEE/ACM Trans. Networking, IEEE Trans. Mobile Computing, and ACM Trans. Sensor Networks.