**Air Force Office of Scientific Research**

**1 July 2011 – 30 June 2014**

**Co-P.I. Dr. Suman Chakravorty**

**Total award $420,000**

Optimal control is the most general framework for posing and solving sequential decision-making problems. Much progress has been made on such problems for deterministic systems, where very efficient transcription-based techniques approximate the original infinite-dimensional optimization problem by a finite-dimensional nonlinear programming problem. For instance, pseudo-spectral methods have been devised to solve open-loop optimal control problems with and without constraints. Unfortunately, the same cannot be said about problems under uncertainty. If we assume a stochastic model of uncertainty in the system process model, the sequential optimization problem can be posed as a so-called Markov Decision Problem (MDP), whose solution is given by a stochastic Dynamic Programming (DP) equation.
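
The stochastic DP equation mentioned above can be made concrete with a minimal value-iteration sketch. The MDP below (its transition matrix `P`, reward matrix `R`, and discount factor are all hypothetical numbers chosen for illustration, not data from this project) is solved by repeatedly applying the Bellman optimality operator:

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (all numbers hypothetical):
# P[a, s, s'] is the probability of moving from s to s' under action a,
# and R[s, a] is the expected one-step reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.95  # discount factor

# Value iteration: repeatedly apply the Bellman optimality operator
#   V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ]
# until the value function stops changing.
V = np.zeros(2)
for _ in range(10_000):
    Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
print("V =", V, "policy =", policy)
```

Because the operator is a contraction in the discount factor, this iteration converges for any finite MDP; the difficulty discussed next is that the table `V` must cover every state.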

However, it is also well known that solutions to the DP problem are subject to Bellman's famous "curse of dimensionality," i.e., the fact that solution complexity grows exponentially in the dimension of the state-space. This makes the stochastic DP problem, particularly for continuous state and control spaces, tractable only in low-dimensional state-spaces, even with the computational resources available today. Moreover, to the best of our knowledge, it is very difficult to handle constraints in such continuous state/control-space MDPs. Given DoD's increasing interest in highly decentralized networked control systems, there is also a need to extend MDP techniques to multi-agent sequential optimization problems in which the control computations for the individual agents must take place in a collaborative, decentralized fashion. In addition, if there is sensing uncertainty in the system state, the sequential optimization problem becomes a so-called Partially Observed Markov Decision Problem (POMDP), whose solution is given by an infinite-dimensional Information Space DP problem that is virtually intractable for continuous state-space problems.
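
The exponential growth behind the curse of dimensionality can be seen with a short back-of-the-envelope sketch (the grid resolution `k` and the dimensions shown are arbitrary illustrative choices): discretizing each state dimension into `k` points gives `k**d` tabular states, so the storage and per-sweep work of tabular DP explode with dimension `d`.

```python
# Tabular DP must enumerate the discretized state space.  With k grid
# points per state dimension, a d-dimensional problem has k**d states,
# so storage and per-sweep work grow exponentially in d.
k = 10  # grid points per dimension (illustrative)
states = {d: k**d for d in (1, 2, 4, 8, 12)}
for d, n in states.items():
    print(f"d = {d:2d}: {n:,} grid states")
```

Even at this coarse resolution, a 12-dimensional problem already requires a trillion grid states, which is why the continuous-space methods pursued in this program cannot rely on direct tabulation.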

Furthermore, we have previously developed a theory of Reinforcement Learning, or Approximate Dynamic Programming (ADP), combined with Adaptive Control that holds the promise of being effective for controlling various aerospace systems of interest, but which has to date been developed for only a specialized class of dynamical systems. We are currently extending this approach to a much more realistic class of dynamical systems.

**TECHNICAL OBJECTIVES**

- Extend ADP techniques to the control of nonlinear, multiple-time-scale, non-affine systems in an Adaptive Control framework
- Develop solution techniques for MDPs that scale to continuous state and control spaces with constraints
- Extend MDP techniques to solve multi-agent coordination and control problems in a decentralized fashion
- Develop solution techniques that scale to continuous state-space POMDPs and their multi-agent generalizations

Working with me on this program are Graduate Research Assistants:

- **Anshu Siddarth**, Ph.D. student
- **Kenton Kirkpatrick**, Ph.D. student
- **Elizabeth Rollins**, Ph.D. student
- **Caroline Dunn**, B.S. student