Research

Air Force Office of Scientific Research
1 July 2011 – 30 June 2014
Co-P.I. Dr. Suman Chakravorty
Total award $420,000

Optimal control is the most general framework for posing and solving sequential decision-making problems. Much progress has been made in solving such problems for deterministic systems. Very efficient transcription-based techniques, in which the original infinite-dimensional optimization problem is approximated by a finite-dimensional nonlinear programming problem, have been devised to solve open-loop optimal control problems with and without constraints; the pseudo-spectral methods are one example. Unfortunately, the same cannot be said about problems under uncertainty. If we assume a stochastic model of uncertainty in the system process model, the sequential optimization problem can be posed as a so-called Markov Decision Problem (MDP), whose solution is given by a stochastic Dynamic Programming (DP) equation.
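To make the stochastic DP equation concrete, the sketch below runs value iteration on a tiny finite MDP. It is only an illustration under stated assumptions, not the method developed under this program: the discounted-cost form V(x) = min_u E[c(x,u) + gamma V(x')], the two-state example, the transition table P, the cost table, and the function name value_iteration are all hypothetical choices made for this example.

```python
def value_iteration(P, cost, gamma, tol=1e-10):
    """Solve V(s) = min_a [cost[s][a] + gamma * sum_t P[a][s][t] * V(t)].

    P[a][s][t] is the probability of moving from state s to state t under
    action a; cost[s][a] is the one-step cost. Both are illustrative inputs.
    """
    n_states = len(cost)
    n_actions = len(cost[0])
    V = [0.0] * n_states
    while True:
        # One Bellman backup for every state-action pair.
        Q = [[cost[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [min(row) for row in Q]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            policy = [row.index(min(row)) for row in Q]
            return V_new, policy
        V = V_new


# Hypothetical two-state example: state 0 costs 1 per step, state 1 costs 0.
# Action 0 = stay (deterministic); action 1 = switch (succeeds with prob. 0.8).
P = [
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
    [[0.2, 0.8], [0.8, 0.2]],   # action 1: switch
]
cost = [[1.0, 1.0], [0.0, 0.0]]
V, policy = value_iteration(P, cost, gamma=0.5)
```

For this example the backup has a closed-form fixed point, V(0) = 1 / (1 - 0.2 * 0.5) = 10/9 and V(1) = 0, with the policy "switch" in state 0 and "stay" in state 1. The table-based backup shown here is exactly the step that stops scaling once the state and control spaces become continuous.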

However, it is also well known that solutions to the DP problem are subject to Bellman’s famous curse of dimensionality, i.e., the fact that solution complexity grows exponentially in the dimension of the state space. This makes solutions to the stochastic DP problem, for continuous state and control spaces in particular, tractable only in low-dimensional state spaces, even given the computational resources available today. Moreover, to the best of our knowledge, it is very difficult to consider constraints on such continuous state/control space MDPs. Given DoD’s increasing interest in highly decentralized networked control systems, there is also a need to extend the MDP techniques to multi-agent sequential optimization problems in which the control computations for the individual agents take place in a collaborative and decentralized fashion. In addition, if there is sensing uncertainty in the system state, the sequential optimization problem transforms into the so-called “Partially Observed Markov Decision Problem (POMDP)”, whose solution is given by an infinite-dimensional Information Space DP problem that is virtually intractable for continuous state-space problems.
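The exponential growth behind the curse of dimensionality can be illustrated with a short calculation. The sketch below assumes a uniform grid discretization of a continuous state space, which is one common way to apply tabular DP and is used here purely for illustration; the function name grid_points and the choice of 100 points per axis are hypothetical.

```python
def grid_points(points_per_axis: int, state_dim: int) -> int:
    """Number of nodes in a uniform grid over a state_dim-dimensional state space."""
    return points_per_axis ** state_dim


if __name__ == "__main__":
    # Hypothetical resolution of 100 grid points along each state axis.
    for dim in (2, 4, 6, 12):
        print(f"{dim:>2} state dimensions -> {grid_points(100, dim):.3e} grid points")
```

Each added state dimension multiplies the number of grid nodes, and hence the number of DP backups per sweep, by the per-axis resolution.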

Furthermore, the theory we have previously developed, which combines Reinforcement Learning or Approximate Dynamic Programming (ADP) with Adaptive Control, holds the promise of being effective for controlling various aerospace systems of interest, but has been developed to date for only a specialized class of dynamical systems. We are currently extending this approach to a much more realistic class of dynamical systems.

TECHNICAL OBJECTIVES

Extend ADP techniques to control of nonlinear, multiple time scale, non-affine systems in an Adaptive Control framework
Develop solution techniques for MDPs that scale to continuous state and control spaces with constraints
Extend MDP techniques to solve multi-agent co-ordination and control problems in a decentralized fashion
Develop solution techniques that scale to continuous state-space POMDPs and their multi-agent generalizations

Working with me on this program are Graduate Research Assistants:

Anshu Siddarth, Ph.D. student
Kenton Kirkpatrick, Ph.D. student
Elizabeth Rollins, Ph.D. student
Caroline Dunn, B.S. student

Air Force Office of Scientific Research
1 January 2008 – 30 November 2010
Co-P.I. Dr. Suman Chakravorty
Total award $450,000

This project investigates a creative and bioinspired theory of learning control which is capable of addressing the essential functionalities of a morphing Micro Air Vehicle (MAV), and which is also extensible to capabilities such as flapping and perching. The objective is to address the optimal shape control of an entire air vehicle configuration as a function of flight condition, not just simple changes such as wing sweep angle or incidence angle. The project spans theory to computation to experiment, and incorporates machine learning concepts integrated with model reference adaptive control. It uses nonlinear synthesis and simulation models of appropriate fidelity validated and verified with a hardware testbed, and culminates in a flight test demonstration.

The Defense Advanced Research Projects Agency (DARPA) defines a morphing air vehicle as a platform that is able to change its state substantially (on the order of 50%) to adapt to changing mission environments, thereby providing a superior system capability that is not possible without reconfiguration. In the context of intelligent systems, three essential functionalities of a practical morphing air vehicle are:

When to reconfigure
How to reconfigure
Learning to reconfigure

When to reconfigure is a major issue, as the ability of a given air vehicle to successfully perform multiple missions can be directly attributed to shape, at least if aerodynamic performance is the primary consideration. Each task or mission has an ideal or optimal vehicle shape, i.e., configuration. However, this optimality criterion may not be known over the entire flight envelope in actual practice, and the mission may be modified or completely changed during operation. How to reconfigure is a problem of sensing, actuation, and control. It is important and challenging since large shape changes produce time-varying vehicle properties, especially time-varying moments and products of inertia. The controller must therefore be sufficiently robust to handle these potentially wide variations. Learning to reconfigure is perhaps the most challenging of the three functionalities, and the one which has received the least attention. Even if optimal shapes are known, the actuation scheme(s) to produce them may be only poorly understood, or not understood at all; lifelong learning of reconfiguration strategies provides a robust evolutionary response to changing needs and missions. This permits the vehicle to be more survivable and multi-role.

Our approach combines Machine Learning and Adaptive Dynamic Inversion Control, and is called Adaptive-Reinforcement Learning Control (A-RLC). A-RLC is a control architecture and methodology for systems with a high degree of reconfigurability, such as changing shape during flight, flapping, perching, or morphing. The key difference between our approach and the very few existing approaches to morphing control lies in how learning is used. Morphing research reported in the current literature focuses on structures and actuation of at most three degrees of morphing freedom. For a morphing MAV, even if an optimal control law is known, the actuation scheme(s) to produce this capability may be only poorly understood, or not understood at all. A-RLC is capable of addressing the optimal shape control of an entire air vehicle configuration as a function of flight condition, not just simple changes such as wing sweep angle or incidence angle. A-RLC uses Structured Adaptive Model Inversion as the trajectory tracking controller for handling time-varying inertias, large variations in aerodynamic and structural properties, parametric uncertainties, and disturbances. A-RLC uses Reinforcement Learning for learning the optimality relations between the operating conditions and the desired shape, over the lifespan of the vehicle. The Reinforcement Learning module has no prior knowledge of the relationship between commands and the dimensions of the vehicle, and it does not know the relationship between the flight conditions, costs, and the optimal shapes. However, the Reinforcement Learning module does know the set of all possible inputs that can be applied. From complete ignorance of the system dynamics and actuation, A-RLC is capable of learning the optimal control policy (commands) that produces the optimal shape as a function of flight condition, while maintaining accurate flight path tracking.

In addition, the Reinforcement Learning module of A-RLC can function in real time, which results in robustness with respect to model errors and environmental disturbances during system operation. Our preliminary research has demonstrated that A-RLC works well for several nonlinear, time-varying, aerodynamically affected models. Key issues we will investigate are learning and control of the morphing, aeroelastic effects, hysteretic effects, and structural effects of the high-fidelity, biologically inspired models developed in this research program.
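The learning task described above, finding the best shape for each flight condition while knowing only the set of admissible inputs, can be sketched with tabular Q-learning. This is an illustrative stand-in and not the A-RLC implementation: the choice of Q-learning, the five discrete shape settings, the three flight conditions, the quadratic cost, the HIDDEN_OPTIMUM table, and all function names are hypothetical. The trajectory tracking controller (Structured Adaptive Model Inversion) is not modeled at all.

```python
import random

N_SHAPES = 5                          # hypothetical discretized shape settings 0..4
ACTIONS = (-1, 0, 1)                  # known input set: decrease, hold, increase
HIDDEN_OPTIMUM = {0: 1, 1: 3, 2: 4}   # flight condition -> best shape (never read by the learner)


def step(condition, shape, action):
    """Simulated vehicle: apply a shape command and return (next_shape, reward)."""
    next_shape = min(max(shape + action, 0), N_SHAPES - 1)
    reward = -(next_shape - HIDDEN_OPTIMUM[condition]) ** 2
    return next_shape, reward


def train(samples=20000, alpha=0.5, gamma=0.9, seed=0):
    """Learn Q(condition, shape, action) from randomly explored transitions."""
    rng = random.Random(seed)
    conditions = list(HIDDEN_OPTIMUM)
    q = {(c, s): [0.0] * len(ACTIONS) for c in conditions for s in range(N_SHAPES)}
    for _ in range(samples):
        c = rng.choice(conditions)
        s = rng.randrange(N_SHAPES)
        a = rng.randrange(len(ACTIONS))
        s_next, r = step(c, s, ACTIONS[a])
        # Standard Q-learning update toward the one-step bootstrapped target.
        target = r + gamma * max(q[(c, s_next)])
        q[(c, s)][a] += alpha * (target - q[(c, s)][a])
    return q


def greedy_rollout(q, condition, shape, steps=N_SHAPES):
    """Follow the learned greedy policy and return the final shape."""
    for _ in range(steps):
        values = q[(condition, shape)]
        best_action = ACTIONS[values.index(max(values))]
        shape, _reward = step(condition, shape, best_action)
    return shape
```

The learner only ever sees the reward returned by step, so it starts from complete ignorance of which shape suits which flight condition. After training, calling greedy_rollout(train(), condition, shape) should drive any starting shape to the hidden optimum for that condition in this toy setting.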

Working with me on this program are Graduate Research Assistants:

Amanda Lampton, Ph.D. student
Anshu Narang, Ph.D. student
Adam Niksch, M.S. student
Kenton Kirkpatrick, M.S. student
Monika Marwaha, M.S. student

and Undergraduate Research Assistants:

Brian Eisenbeis
Clark Moody
Claire Hazelbaker

Machine Learning Control of Nonlinear, High Dimensional, Reconfigurable Systems

Machine Learning Control of Morphing Micro Air Vehicles