Reinforcement Learning

VSCL Students Present at 2026 AIAA SciTech Forum

Posted on January 7, 2026 by Cassie-Kay McQuinn

VSCL researchers Raul Santos, Seth Johnson, Carla Zaramella, and Zachary Curtis will present papers in January at the 2026 AIAA SciTech Forum in Orlando, Florida.

On 12 January, Raul Santos will present the paper “Deep Reinforcement Learning Waypoint Generation for Attitude Station-Keeping with Sun Avoidance”. This work studies deep reinforcement learning–based waypoint generation for autonomous on-orbit attitude control and examines how observation and action space design influence neural network performance.

Santos, Raul, Binz, Sadie, McQuinn, Cassie-Kay, Valasek, John, Hamilton, Nathaniel, Hobbs, Kerianne L., and Dunlap, Kyle, “Deep Reinforcement Learning Waypoint Generation for Attitude Station-Keeping with Sun Avoidance,” 2026 AIAA Science and Technology Forum and Exposition, Orlando, FL, 12 January 2026.

On 12 January, Seth Johnson will present the paper “Modular Open System Architecture for Low-cost Integrated Avionics (MOSA LINA)”. This work investigates a modular, open-system avionics architecture for experimental vehicles that reduces integration complexity and supports platform-agnostic mission reconfiguration through plug-and-play sensor integration. Two case studies are investigated: one focused on synchronized high-fidelity data collection and the other on autonomous fixed-wing target tracking.

Johnson, Seth, Santos, Raul, Martinez-Banda, Isabella, Luna, Noah, and Valasek, John, “Modular Open System Architecture for Low-cost Integrated Avionics (MOSA LINA),” 2026 AIAA Science and Technology Forum and Exposition, Orlando, FL, 12 January 2026.

On 15 January, Carla Zaramella will present the paper “Identification of Non-Dimensional Aerodynamic Derivatives using Markov Parameter Based Least Squares Identification Algorithm”. This work expands upon previous developments of the MARBLES algorithm to directly identify non-dimensional stability and control derivatives using computed Markov parameters with a least squares estimator and a priori information.

Leshikar, Christopher, Zaramella, Carla, Madewell, Evelyn, and Valasek, John, “Identification of Non-Dimensional Aerodynamic Derivatives using Markov Parameter Based Least Squares Identification Algorithm,” 2026 AIAA Science and Technology Forum and Exposition, Orlando, FL, 15 January 2026.

On 16 January, Zachary Curtis will present the paper “Real-Time Controller Architecture for sUAS Flight Test”. This work investigates a C++/ROS architecture for real-time controller implementation. The architecture, called Kanan, allows safe and fast integration of custom controllers across a broad range of vehicles and controller types.

Luna, Noah, Valasek, John, and Curtis, Zachary, “Real-Time Controller Architecture for sUAS Flight Test,” 2026 AIAA Science and Technology Forum and Exposition, Orlando, FL, 16 January 2026.

Hannah Lehman Defends Ph.D. Dissertation

Posted on June 19, 2025 by Cassie-Kay McQuinn

Hannah Lehman successfully defended her Ph.D. dissertation on May 27th, 2025. Hannah has been with VSCL since her freshman year in Spring 2017, a total of 8.42 years, during which she implemented the Theory-Computation-Experiment paradigm. The title of her dissertation is “Hierarchical Auctions for the Coordination of Heterogeneous Agents using Machine Learning”.

Hannah’s dissertation investigates autonomous multiagent coordination. Machine learning has long been discussed as a candidate for facilitating autonomous multiagent vehicle coordination. Many methods of autonomous multiagent coordination have been proposed; however, few, if any, solutions consider realistic communication challenges. By using machine learning on multiple levels and a self-organizing hierarchical system, an autonomous, pseudo-decentralized, heterogeneous system can dynamically complete tasks without being fully connected. This approach, called Hierarchical Auctions for the Coordination of Heterogeneous Agents (HACHA), is investigated and demonstrated on four simple, proof-of-concept simulations. Each simulation scenario is designed to demonstrate HACHA’s applicability to a different subset of multiagent problems and to address specific requirements. Within HACHA, specific algorithm and data choices are motivated by real-world hardware constraints and informed by time complexity analysis of sub-algorithms. Results show that a parallel auction coordination framework can be used to organize multiple heterogeneous agents with different sensors, movement modalities, graph connectedness, and controllers to complete a task requiring multiple agents. The auction framework is independent of individual agents and has been utilized in this work by a combination of reinforcement-learning-trained agents and optimally controlled agents to complete tasks. HACHA auction propagation methods are explored, and recommended use-case rules are developed based on theoretical and computational investigations and results. The HACHA auction choice is explored and compared to other popular auction methods over a variety of relevant network characteristics, including dynamism, sparsity, and number of tasks.
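For readers unfamiliar with auction-based task allocation, the general idea can be illustrated with a toy example. The sketch below is not HACHA and is not taken from the dissertation; it is a minimal sequential single-item auction in which each task is awarded to the free agent with the lowest bid. The `Agent` and `Task` classes, the travel-time bid, and the function names are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Agent:
    """Hypothetical agent: a position and a speed (agents are heterogeneous)."""
    name: str
    x: float
    y: float
    speed: float


@dataclass(frozen=True)
class Task:
    """Hypothetical task located at a point in the plane."""
    name: str
    x: float
    y: float


def bid(agent: Agent, task: Task) -> float:
    """Assumed bid rule: estimated travel time to the task (lower is better)."""
    return hypot(agent.x - task.x, agent.y - task.y) / agent.speed


def run_auction(agents: list[Agent], tasks: list[Task]) -> dict[str, str]:
    """Award each task, in order, to the lowest-bidding agent that is still free."""
    free = list(agents)
    assignment: dict[str, str] = {}
    for task in tasks:
        if not free:
            break  # more tasks than agents: remaining tasks stay unassigned
        winner = min(free, key=lambda a: bid(a, task))
        assignment[task.name] = winner.name
        free.remove(winner)
    return assignment


if __name__ == "__main__":
    agents = [Agent("rover", 0.0, 0.0, 1.0), Agent("quad", 10.0, 0.0, 5.0)]
    tasks = [Task("t1", 1.0, 0.0), Task("t2", 9.0, 0.0)]
    print(run_auction(agents, tasks))
```

This toy version assumes every agent can hear every auction. The dissertation addresses the harder case in which the agents are not fully connected.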

Hannah will do a short postdoc with VSCL and then start full-time in July at Sandia National Laboratories, where she has completed four graduate internships. Hannah’s is the 62nd graduate degree advised by Dr. John Valasek, and she is his 15th Ph.D. student.

Dr. John Valasek Reaches Career Milestone

Posted on October 25, 2024 by Cassie-Kay McQuinn

In October, Dr. John Valasek reached a career milestone by giving his 100th invited seminar, lecture, or panel appearance.

Chronologically:

#1 “Fighter Agility Metrics, Research, and Test,” Lockheed Advanced Development Projects Division (Skunk Works), Burbank, CA, 13 July 1990.

#100 “Multiple-Time-Scale Nonlinear Output Feedback Control of Systems With Model Uncertainties,” Department of Aerospace Engineering, University of Maryland, College Park, MD, 9 October 2024.

Congratulations, Dr. Valasek!

MD Sunbeam Defends Master’s Thesis

Posted on July 17, 2024 by Cassie-Kay McQuinn

MD Sunbeam (B.S. Aerospace Engineering, University of Texas) successfully defended his Master’s thesis: “Gaze-Regularized Imitation Learning”.

Approaches for teaching learning agents via human demonstrations have been widely studied and successfully applied to multiple domains. However, the majority of imitation learning work utilizes only behavioral information from the demonstrator, i.e., which actions were taken, and ignores other useful information. In particular, eye gaze information can give valuable insight into where the demonstrator is allocating visual attention, and holds the potential to improve agent performance and generalization. In this work, we propose Gaze-Regularized Imitation Learning (GRIL), a novel, context-aware imitation learning architecture that learns concurrently from both human demonstrations and eye gaze to solve tasks where visual attention provides important context. We apply GRIL to a visual navigation task, in which an unmanned quadrotor is trained to search for and navigate to a target vehicle in a photo-realistic simulated environment. We show that GRIL outperforms several state-of-the-art gaze-based imitation learning algorithms, simultaneously learns to predict human visual attention, and generalizes to scenarios not present in the training data.

This work is sponsored by the Army Research Laboratory (ARL) through the Cycle of Learning Project. MD Sunbeam is employed as a researcher at the Human Research and Engineering Directorate (HRED), ARL.

Lehman and Valasek Publish “Design, Selection, Evaluation of Reinforcement Learning Single Agents for Ground Target Tracking,” in Journal of Aerospace Information Systems

Posted on September 14, 2023 by Cassie-Kay McQuinn

Ph.D. student Hannah Lehman and Dr. John Valasek of VSCL published the paper “Design, Selection, Evaluation of Reinforcement Learning Single Agents for Ground Target Tracking,” in Journal of Aerospace Information Systems.

Previous approaches for small fixed-wing unmanned air systems that carry strapdown rather than gimbaled cameras achieved satisfactory ground object tracking performance using both standard and deep reinforcement learning algorithms. However, these approaches impose significant restrictions and abstractions on the dynamics of the vehicle, such as constant airspeed and constant altitude, because the number of states and actions was necessarily limited. Thus, extensive tuning was required to obtain good tracking performance. The expansion from four state-action degrees-of-freedom to 15 enabled the agent to exploit previous reward functions, which produced novel, yet undesirable, emergent behavior. This paper investigates the causes of, and various potential solutions to, undesirable emergent behavior in the ground target tracking problem. A combination of changes to the environment, reward structure, action space simplification, command rate, and controller implementation provides insight into obtaining stable tracking results. Consideration is given to reward structure selection to mitigate undesirable emergent behavior. Results presented in the paper are from a simulated environment of a single unmanned air system tracking a single, randomly moving ground object, and show that a soft actor-critic algorithm can produce feasible tracking trajectories without limiting the state space and action space, provided the environment is properly posed.

This publication is part of VSCL’s ongoing work in the area of Reinforcement Learning and Control. The early access version of the article can be viewed at https://arc.aiaa.org/journal/jais