Dr. John Valasek, Professor in the Department of Aerospace Engineering at Texas A&M University and Director of the Vehicle Systems & Control Laboratory, gave a virtual seminar titled “Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time” on 18 August 2021 for the AI Seminar Series hosted by the NASA Jet Propulsion Laboratory (JPL).
Cycle-of-Learning (CoL) (https://www.youtube.com/watch?v=AQwsk6kZfok) was presented as a framework built on an actor-critic architecture. Its loss function combines behavior cloning and 1-step Q-learning losses, and training begins with an off-policy pre-training step on human demonstrations. This enables a transition from behavior cloning to reinforcement learning without performance degradation, and it improves both the overall performance and the training time of reinforcement learning. The approach was shown to outperform state-of-the-art techniques for combining behavior cloning and reinforcement learning in both dense and sparse reward scenarios. Results were presented for haptic and eye-tracking input modalities, and they suggest that directly including the behavior cloning loss on demonstration data helps ensure stable learning and ground future policy updates.
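To make the idea of a combined loss concrete, the following is a minimal sketch in plain Python, not the implementation presented in the seminar. It assumes scalar actions, a mean-squared-error form for both terms, and a simple weighted sum; the function names and the weights `lambda_bc` and `lambda_q` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def bc_loss(policy_actions: Sequence[float], demo_actions: Sequence[float]) -> float:
    """Behavior cloning term: mean squared error between the policy's
    actions and the human demonstrator's actions on the same states."""
    n = len(demo_actions)
    return sum((p - d) ** 2 for p, d in zip(policy_actions, demo_actions)) / n


def one_step_q_loss(
    q_values: Sequence[float],
    rewards: Sequence[float],
    next_q_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> float:
    """1-step Q-learning term: mean squared temporal-difference error,
    with target r + gamma * Q(s', a') (no bootstrap on terminal steps)."""
    total = 0.0
    for q, r, next_q, done in zip(q_values, rewards, next_q_values, dones):
        target = r if done else r + gamma * next_q
        total += (target - q) ** 2
    return total / len(q_values)


def combined_loss(
    policy_actions: Sequence[float],
    demo_actions: Sequence[float],
    q_values: Sequence[float],
    rewards: Sequence[float],
    next_q_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
    lambda_bc: float = 1.0,  # assumed weight on the behavior cloning term
    lambda_q: float = 1.0,   # assumed weight on the 1-step Q-learning term
) -> float:
    """Weighted sum of the two terms (the weighting scheme is an assumption)."""
    bc = bc_loss(policy_actions, demo_actions)
    td = one_step_q_loss(q_values, rewards, next_q_values, dones, gamma)
    return lambda_bc * bc + lambda_q * td
```

In this sketch, keeping the behavior cloning term in the objective during reinforcement learning updates is what ties the policy to the demonstration data, which matches the seminar's suggestion that including that term directly helps keep learning stable.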