KAIST
BREAKTHROUGHS

Research Webzine of the KAIST College of Engineering since 2014

Spring 2025 Vol. 24

Engineering

Multi-agent reinforcement learning for mission planning of a communication-aware multi-UAV system

July 27, 2023

The research focuses on a multi-target sensing mission carried out by an autonomous multi-UAV system under air-to-ground and air-to-air communication channel models. Multi-agent reinforcement learning is applied to obtain a distributed, real-time mission planning algorithm.

Article | Spring 2021

Recent advances in unmanned aerial vehicle (UAV) hardware and software are making highly autonomous UAVs, capable of planning and executing missions with minimal human intervention, increasingly viable. Decisions for autonomous operation can be obtained by solving a UAV mission planning problem, which seeks an appropriate flight trajectory and task schedule over a long mission period, subject to given constraints and in pursuit of high-level objectives.

Among the many mission-level considerations, communication is an essential modeling component for an autonomous UAV system. A reliable link between a UAV and the ground control station is required to uplink mission commands and downlink sensor and monitoring data. In a multi-UAV system, stable air-to-air links through an aerial ad hoc network are also mandatory: they let the UAVs carry out cooperative tasks autonomously, based on shared situational awareness and consensus, even without intervention from ground operators.
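
To make the connectivity requirement concrete, the sketch below checks whether every UAV can reach the ground control station, either directly or by relaying through other UAVs. It assumes a simple range-based (disk) link model with separate air-to-ground and air-to-air ranges. The function name, the parameters `a2g_range` and `a2a_range`, and the disk model itself are illustrative assumptions. They are not the channel models used in the study.

```python
import math
from collections import deque


def is_network_connected(uav_positions, gcs_position, a2g_range, a2a_range):
    """Return True if every UAV can reach the ground control station (GCS),
    either directly or through multi-hop air-to-air relays.

    Assumed disk link model: a link exists when the 3-D distance between two
    nodes is within the air-to-ground (GCS-UAV) or air-to-air (UAV-UAV) range.
    """
    nodes = [tuple(gcs_position)] + [tuple(p) for p in uav_positions]  # node 0 is the GCS
    n = len(nodes)
    adjacency = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Since j > i, the pair involves the GCS exactly when i == 0.
            limit = a2g_range if i == 0 else a2a_range
            if math.dist(nodes[i], nodes[j]) <= limit:
                adjacency[i].append(j)
                adjacency[j].append(i)

    # Breadth-first search from the GCS over the communication graph.
    reached = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in reached:
                reached.add(v)
                queue.append(v)
    return len(reached) == n


# Three UAVs strung out as a relay chain at 100 m altitude (hypothetical positions).
chain = [(800, 0, 100), (1600, 0, 100), (2400, 0, 100)]
print(is_network_connected(chain, (0, 0, 0), a2g_range=1000, a2a_range=1000))  # True
```

A range threshold is the crudest stand-in for a channel model. A path-loss or SNR-based link test could replace the distance comparison without changing the breadth-first search.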

This study proposes integrated mathematical formulations for the communication-aware multi-UAV mission planning problem under multi-target sensing mission scenarios. Such an integrated problem tends to be a large-scale optimization problem that is difficult for a typical onboard flight computer to solve in real time. Commercial linear/nonlinear solvers can take several minutes or even hours to find an optimal solution, depending on the problem's complexity (e.g., the number of UAVs, the number of targets/tasks, or the communication network connectivity constraints). However, a UAV system may encounter sudden environmental changes that demand a prompt response from the agent(s), frequently including runtime re-planning. When the mission environment changes (e.g., a pop-up threat appears or target information is updated), the time a UAV can spend on re-planning is usually very limited.
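
A rough count shows why the search space grows so quickly. If trajectories, timing, and communication constraints are ignored entirely, N distinct targets can be split into ordered visiting sequences for M UAVs in N! * C(N + M - 1, M - 1) ways. The hypothetical helper below evaluates this count. It is an illustration only and is not part of the study's formulation.

```python
from math import comb, factorial


def num_task_sequences(n_targets, n_uavs):
    """Number of ways to split n_targets distinct targets into n_uavs ordered
    visiting sequences (a UAV may receive no target).

    Lay out the targets and n_uavs - 1 identical dividers in a row: choose the
    divider positions, C(n_targets + n_uavs - 1, n_uavs - 1), then order the
    targets in the remaining slots, n_targets!.
    """
    return factorial(n_targets) * comb(n_targets + n_uavs - 1, n_uavs - 1)


for n_uavs, n_targets in [(3, 10), (4, 15), (5, 20)]:
    print(n_uavs, "UAVs,", n_targets, "targets:", num_task_sequences(n_targets, n_uavs))
```

With 10 targets and 3 UAVs the count is already 239,500,800, and that is before any trajectory or connectivity decision is added.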

This study proposes a learning-based planning framework that embeds a deep neural network in a heuristic online path/task/mission planning algorithm. The resulting planner is intended to serve as a fast, near-optimal online controller/planner, even with limited onboard computing resources.
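
The receding horizon planner is a classic heuristic online planner, and the study uses one as its baseline. The minimal single-UAV sketch below makes several simplifying assumptions: straight-line flight, travel distance as the cost, and no communication constraints. At each step it plans the best ordering of the next `horizon` targets by brute force, commits only to the first target, and then re-plans. The function names and the `horizon` parameter are hypothetical. This is not the study's baseline implementation.

```python
import math
from itertools import permutations


def path_length(start, sequence):
    """Total straight-line distance flown from `start` through `sequence` in order."""
    total, position = 0.0, start
    for point in sequence:
        total += math.dist(position, point)
        position = point
    return total


def receding_horizon_order(start, targets, horizon=2):
    """Single-UAV receding horizon planner (hypothetical sketch).

    At each decision step, enumerate every ordering of `horizon` remaining
    targets, keep the shortest one, execute only its first target, then re-plan.
    """
    position = start
    remaining = list(targets)
    visit_order = []
    while remaining:
        h = min(horizon, len(remaining))
        best_plan = min(permutations(remaining, h),
                        key=lambda plan: path_length(position, plan))
        next_target = best_plan[0]        # commit to the first step only
        visit_order.append(next_target)
        remaining.remove(next_target)
        position = next_target            # re-plan from the new position
    return visit_order


targets = [(1.0, 0.0), (-1.5, 0.0), (3.0, 0.0)]
print(receding_horizon_order((0.0, 0.0), targets, horizon=1))  # nearest-neighbor order, distance 7.5
print(receding_horizon_order((0.0, 0.0), targets, horizon=3))  # optimal open tour, distance 6.0
```

A horizon of 1 reduces to a nearest-neighbor rule. A horizon equal to the number of targets recovers an optimal open tour. Runtime grows factorially with the horizon, and that is the trade-off a learned planner tries to sidestep.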

The communication-aware multi-UAV mission planning problem is formulated as a multi-agent Markov decision process (MDP). A deep neural network (the neural mission planner) is implemented for each UAV agent and trained to act as a distributed sequential decision-maker within this MDP. The distributed neural planners are trained by multi-agent reinforcement learning (MARL) under a centralized training and decentralized execution (CTDE) scheme: training can exploit information from all agents, while at run time each UAV decides from its own observations. This scheme helps overcome the tractability issues of the multi-agent MDP.
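
The decentralized-execution side of this formulation can be sketched as a toy, event-driven multi-agent MDP. Whichever UAV becomes free first selects an unsensed target based on its local observation. The shared team reward is the negative increase in mission finish time, so the episode return equals the negative mission finish time. The class, the reward design, and the nearest-target policy are assumptions for illustration; the policy is only a placeholder standing in for a trained neural planner. Centralized training, where a critic could see the full state, is omitted.

```python
import math


class ToyMultiUAVSensingMDP:
    """Hypothetical event-driven multi-agent MDP for a multi-target sensing mission."""

    def __init__(self, uav_positions, targets, speed=1.0):
        self.pos = [tuple(p) for p in uav_positions]
        self.free_at = [0.0] * len(self.pos)   # time at which each UAV takes its next decision
        self.targets = [tuple(t) for t in targets]
        self.unsensed = set(range(len(self.targets)))
        self.speed = speed

    def finish_time(self):
        return max(self.free_at)

    def next_agent(self):
        # The UAV that becomes free earliest is the next decision-maker.
        return min(range(len(self.pos)), key=lambda i: self.free_at[i])

    def local_observation(self, i):
        # Own position and time, plus the unsensed targets (assumed shared over air-to-air links).
        return self.pos[i], self.free_at[i], {k: self.targets[k] for k in self.unsensed}

    def step(self, i, target_idx):
        if target_idx not in self.unsensed:
            raise ValueError("target already sensed")
        before = self.finish_time()
        goal = self.targets[target_idx]
        self.free_at[i] += math.dist(self.pos[i], goal) / self.speed
        self.pos[i] = goal
        self.unsensed.discard(target_idx)
        reward = -(self.finish_time() - before)   # rewards telescope to -(mission finish time)
        return reward, not self.unsensed


def nearest_target_policy(observation):
    """Placeholder for a trained neural planner; it uses only the local observation."""
    position, _, candidates = observation
    return min(candidates, key=lambda k: math.dist(position, candidates[k]))


def rollout(env, policy):
    episode_return, done = 0.0, not env.unsensed
    while not done:
        i = env.next_agent()
        reward, done = env.step(i, policy(env.local_observation(i)))
        episode_return += reward
    return episode_return


env = ToyMultiUAVSensingMDP(uav_positions=[(0, 0), (10, 0)], targets=[(1, 0), (9, 0), (5, 0)])
episode_return = rollout(env, nearest_target_policy)
print(episode_return, env.finish_time())  # -5.0 5.0
```

In a CTDE setup, `nearest_target_policy` would be replaced by each UAV's trained network. That network would still receive only `local_observation(i)` at execution time.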

To assess how close the learning-based planner comes to optimality, its mission planning performance metrics are compared with those of a globally optimal planning solution and a baseline online planning algorithm. After several days of training, MARL yields distributed neural mission planners for multi-UAV networks of 3 to 5 UAVs. For the same mission planning problems, mixed-integer linear programming (MILP) and nonlinear programming (NLP) formulations are solved to provide optimal reference solutions for comparison. Mission simulations show that the proposed MARL-based planner achieves an average relative mission finish time gap of 10 to 20% with respect to the MILP optimal solution. The MARL-based planner also outperforms the baseline algorithm, a receding horizon planner, in both mission performance index optimality and online algorithm runtime.
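
The comparison metric can be computed as sketched below. The sketch assumes the usual definition of relative gap, (T_planner - T_opt) / T_opt, averaged over test scenarios; the study's exact definition may differ. The finish times in the usage line are hypothetical and are not results from the study.

```python
def relative_gap_percent(t_planner, t_optimal):
    """Relative mission finish time gap of a planner w.r.t. the optimal reference, in percent."""
    if t_optimal <= 0:
        raise ValueError("optimal finish time must be positive")
    return 100.0 * (t_planner - t_optimal) / t_optimal


def average_relative_gap_percent(planner_times, optimal_times):
    """Mean relative gap over paired scenarios (same scenario order in both lists)."""
    if len(planner_times) != len(optimal_times) or not planner_times:
        raise ValueError("need the same, non-zero number of planner and optimal results")
    gaps = [relative_gap_percent(p, o) for p, o in zip(planner_times, optimal_times)]
    return sum(gaps) / len(gaps)


# Hypothetical finish times for two scenarios (not results from the study).
print(average_relative_gap_percent([55.0, 92.0], [50.0, 80.0]))  # 12.5
```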