The Art of Sequential Optimization via Simulations

By Bilderback, Dayna

November 6, 2015

Speaker: Prof. Rahul Jain
Affiliation: USC

Abstract:

I will start by talking about a natural framework for simulation-based optimization and control of MDP models. The idea is very simple: replace the Bellman operator by its ‘empirical’ variant, wherein the expectation is replaced by a sample average approximation. This leads to a random Bellman operator in the dynamic programming equations. We introduce several notions of probabilistic fixed points of such random operators and show their asymptotic equivalence. We establish convergence of empirical Value and Policy Iteration algorithms by a stochastic dominance argument. The idea can be generalized to asynchronous dynamic programming, leading to a ‘new’ reinforcement learning (RL) algorithm that ‘converges’ much faster than traditional RL algorithms such as Q-Learning (QL) and other stochastic approximation schemes. We also show how the ‘empirical’ DP method can be combined with state space sampling and function approximation to solve MDP problems with continuous state spaces. The mathematical technique introduced is useful for analyzing other iterated random operators. Numerical results show a better convergence rate and better actual runtime performance than QL and other commonly used schemes.
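The central idea of the abstract, replacing the expectation inside the Bellman operator with a sample average, can be illustrated with a small sketch. This is not the speaker's implementation. It assumes a finite MDP with a known reward table R and a transition array P that is used only as a black-box simulator; the toy MDP (P_TOY, R_TOY), the function names, and the parameter values (gamma, n_samples, n_iters, seed) are all hypothetical. Each iteration applies the empirical Bellman operator (T̂_n v)(s) = max_a [ r(s,a) + γ (1/n) Σ_{i=1..n} v(X_i) ], where X_1, …, X_n are fresh next-state samples drawn for the pair (s, a). Classical value iteration with the exact expectation is included for comparison.

```python
import numpy as np

# Hypothetical toy MDP with 2 states and 2 actions (not from the talk).
# P_TOY[s, a] is the next-state distribution for taking action a in state s.
P_TOY = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])
# R_TOY[s, a] is the one-step reward for taking action a in state s.
R_TOY = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])


def empirical_value_iteration(P, R, gamma=0.9, n_samples=20, n_iters=100, seed=0):
    """Value iteration with the expectation replaced by a sample average.

    P is used only as a black-box simulator: next states are drawn from
    P[s, a], and the exact expectation is never computed.
    Returns the final value estimate and the greedy policy.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Fresh i.i.d. next-state samples for (s, a) in every iteration.
                next_states = rng.choice(n_states, size=n_samples, p=P[s, a])
                # Sample average of v(s') stands in for E[v(s') | s, a].
                q[s, a] = R[s, a] + gamma * v[next_states].mean()
        # One application of the random (empirical) Bellman operator.
        v = q.max(axis=1)
    return v, q.argmax(axis=1)


def exact_value_iteration(P, R, gamma=0.9, n_iters=500):
    """Classical value iteration using the exact expectation, for comparison."""
    v = np.zeros(P.shape[0])
    for _ in range(n_iters):
        # P @ v has shape (S, A): the exact expected next-state value.
        q = R + gamma * (P @ v)
        v = q.max(axis=1)
    return v, q.argmax(axis=1)


if __name__ == "__main__":
    v_emp, pi_emp = empirical_value_iteration(P_TOY, R_TOY)
    v_star, pi_star = exact_value_iteration(P_TOY, R_TOY)
    print("empirical VI:", v_emp, pi_emp)
    print("exact VI:    ", v_star, pi_star)
```

Because the samples are redrawn at every iteration, the empirical iterates do not settle on a single vector; they keep fluctuating in a neighborhood of the exact fixed point. This is why the abstract speaks of probabilistic fixed points of a random operator rather than an ordinary fixed point.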

Biography:

Rahul Jain is the K. C. Dahlberg Early Career Chair and (since spring 2013) an Associate Professor in the EE, CS & ISE Departments at the University of Southern California, Los Angeles. He received his B.Tech from IIT Kanpur, and an MA in Statistics and a PhD in EECS from the University of California, Berkeley. Prior to joining USC in Fall 2008, he was at the IBM T. J. Watson Research Center, Yorktown Heights, NY. He received the NSF CAREER award in 2010, the ONR Young Investigator award in 2012, an IBM Faculty award in 2010, and the James H. Zumberge Faculty Research and Innovation Award in 2009. His main interests are in stochastic models, statistical learning and game theory. Of late, he has also been working on statistical learning, queueing theory, risk-aware stochastic optimization, and power system economics. He has also recently become interested in scheduling problems in healthcare and has been working with a number of hospitals.

For more information, contact Prof. Suhas Diggavi (suhasdiggavi@ucla.edu).

Date/Time:
Nov 06, 2015
12:00 pm - 1:00 pm

Location:
E-IV Faraday Room #67-124
420 Westwood Plaza, 6th Floor, Los Angeles, CA 90095