References

pribor

Journal of Instrument Engineering (Izvestiya Vysshikh Uchebnykh Zavedenii. Priborostroenie)

ISSN 0021-3454; 2500-0381

ITMO National Research University

10.17586/0021-3454-2025-68-12-1034-1045

pribor-439

Research Article

SYSTEM ANALYSIS, MANAGEMENT AND INFORMATION PROCESSING

Structured Reinforcement Learning for Time-Optimal Quadrotor Flight

Barhoum

M.

Majd Barhoum, Post-Graduate Student; Faculty of Control Systems and Robotics

St. Petersburg

barhoum.majd213@gmail.com

Pyrkin

A. A.

Anton A. Pyrkin, Dr. Sci. (Eng.), Professor; Faculty of Control Systems and Robotics; Professor

St. Petersburg

pyrkin@itmo.ru

ITMO University

2025

19.01.2026

Vol. 68, No. 12, pp. 1034–1045

2026

ITMO National Research University

https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice

https://pribor.ifmo.ru/jour/article/view/439

Synthesizing reactive, time-optimal control for quadrotors is difficult because of their complex, underactuated dynamics and the impracticality of solving boundary-value problems exactly on board in real time. To address these challenges, we present a reinforcement learning framework in which an agent autonomously learns policies for reaching waypoints precisely in obstacle-free space. The approach rests on two contributions: (1) a cascaded actor architecture, inspired by the separation of position and velocity control in classical control, which improves flight stability and action smoothness; and (2) a composite reward function with radial velocity and acceleration terms that drives the agent toward maximal progress to the target and toward energy-efficient, bang-bang-like maneuvers. Quantitative comparisons with state-of-the-art methods confirm that the agent produces smooth control actions and time-optimal trajectories that follow the desired path with minimal deviation.
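The abstract names two mechanisms: a cascaded actor that separates position and velocity control, and a reward built from radial velocity and acceleration toward the target. The pure-Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The fixed gains `k_pos` and `k_vel` are hypothetical stand-ins for the learned outer and inner stages of the actor, and the function names, the weights `w_v` and `w_a`, and the limits `v_max` and `a_max` are illustrative choices, not values from the paper. Any further terms of the composite reward are not modeled.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(k: float, v: Vec) -> Vec:
    return (k * v[0], k * v[1], k * v[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def norm(v: Vec) -> float:
    return math.sqrt(dot(v, v))


def clip_norm(v: Vec, limit: float) -> Vec:
    """Scale v down so that its Euclidean norm does not exceed limit."""
    n = norm(v)
    return v if n <= limit else scale(limit / n, v)


def cascaded_actor(pos: Vec, vel: Vec, target: Vec,
                   k_pos: float = 1.5, k_vel: float = 3.0,
                   v_max: float = 8.0, a_max: float = 20.0) -> Tuple[Vec, Vec]:
    """Hypothetical two-stage policy: position -> velocity -> acceleration.

    Fixed gains stand in for the learned outer and inner network stages.
    """
    # Outer stage: relative target position -> bounded velocity setpoint.
    v_des = clip_norm(scale(k_pos, sub(target, pos)), v_max)
    # Inner stage: velocity tracking error -> bounded acceleration command.
    a_cmd = clip_norm(scale(k_vel, sub(v_des, vel)), a_max)
    return v_des, a_cmd


def radial_reward(pos: Vec, vel: Vec, acc: Vec, target: Vec,
                  w_v: float = 1.0, w_a: float = 0.1) -> float:
    """Reward progress toward the target via radial velocity and acceleration.

    Only the two radial terms are modeled; the weights are hypothetical.
    """
    to_goal = sub(target, pos)
    dist = norm(to_goal)
    if dist == 0.0:
        return 0.0  # Radial direction is undefined at the target itself.
    u = scale(1.0 / dist, to_goal)  # Unit vector pointing at the target.
    v_r = dot(vel, u)  # Radial velocity: rate at which the distance shrinks.
    a_r = dot(acc, u)  # Radial acceleration: rate at which v_r grows.
    return w_v * v_r + w_a * a_r


if __name__ == "__main__":
    pos, vel, target = (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (30.0, 0.0, 0.0)
    v_des, a_cmd = cascaded_actor(pos, vel, target)
    print(v_des, a_cmd, radial_reward(pos, vel, a_cmd, target))
```

In this sketch the norm clipping saturates both commands whenever the target is far away, so a policy rewarded for radial acceleration tends to push its acceleration command against the bound. This is one plausible reading of the bang-bang-like behavior mentioned in the abstract, not a result reported by the authors.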

quadrotors; reinforcement learning; autonomous navigation; optimal trajectory; neural networks

Supported by the Ministry of Science and Higher Education of the Russian Federation (project no. FSER-2025-0002) and by ITMO University (Research Projects in AI Initiative (RPAII), project no. 640112).

References

Richter C., Bry A., and Roy N. Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments // Robotics Research (Intern. Symposium of Robotics Research), Springer Tracts in Advanced Robotics. 2016. P. 649–666. DOI: 10.1007/978-3-319-28872-7_37.

Foehn P., Romero A., and Scaramuzza D. Time-optimal planning for quadrotor waypoint flight // Science Robotics. 2021. Vol. 6, N 56. DOI: 10.1126/scirobotics.abh1221.

Pěnička R. and Scaramuzza D. Minimum-time quadrotor waypoint flight in cluttered environments // IEEE Robotics and Automation Letters. 2022. arXiv:2202.03947v1 [cs.RO].

Romero A., Sun S., Foehn P., and Scaramuzza D. Model predictive contouring control for time-optimal quadrotor flight // IEEE Transactions on Robotics. 2022. Vol. 99. P. 1–17. DOI: 10.1109/TRO.2022.3173711.

Khojasteh M. S. and Salimi-Badr A. Autonomous quadrotor path planning through deep reinforcement learning with monocular depth estimation // IEEE Open Journal of Vehicular Technology. 2025. Vol. 99, N 6. P. 34–51. DOI: 10.1109/OJVT.2024.3502296.

Zhong L., Zhao J., Luo H., and Hou Z. Hybrid path planning and following of a quadrotor UAV based on deep reinforcement learning // Proc. of the 36th Chinese Control and Decision Conference (under review). Xi’an, China, May 25–27, 2024.

Tsai T.-H. and Li Q. Quadrotor mapless navigation in static and dynamic environments based on deep reinforcement learning // 3rd Intern. Conf. on Industrial Artificial Intelligence (IAI). 2021. DOI: 10.1109/IAI53119.2021.9619200.

Wang J., Wang T., He Z., Cai W., and Sun C. Towards better generalization in quadrotor landing using deep reinforcement learning // Applied Intelligence. 2022. Vol. 53, N 1. DOI: 10.1007/s10489-022-03503-6.

Li X., Yu H., Hu M., Xiao L., Han J., and Fang Y. Immersion and invariance-based adaptive control for quadrotor transportation systems using deep reinforcement learning // Intern. Conf. on Advanced Robotics and Mechatronics (ICARM). Guilin, China, July 09–11, 2022. P. 1076–1081. DOI: 10.1109/ICARM54641.2022.9959439.

Himanshu K., Kumar H., and Pushpangathan J. V. Waypoint navigation of quadrotor using deep reinforcement learning // IFAC-PapersOnLine. 2022. Vol. 55, N 22. P. 281–286. DOI: 10.1016/j.ifacol.2023.03.047.

Mokhtar M. and El-Badawy A. Autonomous navigation and control of a quadrotor using deep reinforcement learning // Intern. Conf. on Unmanned Aircraft Systems (ICUAS). June 2023. DOI: 10.1109/ICUAS57906.2023.10156126.

Trad T. Y., Choutri K., Lagha M., Meshoul S., Khenfri F., Fareh R., and Shaiba H. Real-time implementation of quadrotor UAV control system based on a deep reinforcement learning approach // Computers, Materials & Continua. 2024. Vol. 81, N 3. P. 4757–4786. DOI: 10.32604/cmc.2024.055634.

Wang Y., Sun J. L., He H., and Sun C. Deterministic policy gradient with integral compensator for robust quadrotor control // IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2020. Vol. 50, N 10. P. 3713–3725.

Lopez-Sanchez I. and Moreno-Valenzuela J. PID control of quadrotor UAVs: A survey // Annual Reviews in Control. 2023. Vol. 56. P. 100900. DOI: 10.1016/j.arcontrol.2023.100900.

Idrissi M., Salami M. R., and Annaz F. Y. A review of quadrotor unmanned aerial vehicles: Applications, architectural design and control algorithms // Journal of Intelligent and Robotic Systems. 2022. Vol. 104, N 2. P. 22. DOI: 10.1007/s10846-021-01527-7.

Ren Y., Zhu F., Sui S., Yi Z., and Chen K. Enhancing quadrotor control robustness with multi-proportional–integral–derivative self-attention-guided deep reinforcement learning // Drones. 2024. Vol. 8, N 7. P. 315. DOI: 10.3390/drones8070315.

Rubí B., Morcego B., and Pérez R. A. Deep reinforcement learning for quadrotor path following with adaptive velocity // Autonomous Robots. 2021. Vol. 45. P. 119–134.

Mien T., Tu T., and An V. Cascade PID control for altitude and angular position stabilization of 6-DOF UAV quadcopter // Intern. Journal of Robotics and Control Systems. 2024. Vol. 4, N 2. P. 814–831. URL: https://pubs2.ascee.org/index.php/IJRCS/article/view/1410.

Idres M., Mustapha O., and Okasha M. Quadrotor trajectory tracking using PID cascade control // IOP Conf. Series: Materials Science and Engineering. 2017. Vol. 270, N 1. P. 012010. DOI: 10.1088/1757-899X/270/1/012010.

Noordin A., Basri M. A. M., Mohamed Z., and Lazim I. M. Adaptive PID controller using sliding mode control approaches for quadrotor UAV attitude and position stabilization // Arabian Journal for Science and Engineering. 2020. Vol. 46. P. 963–981.

Shah S., Dey D., Lovett C., and Kapoor A. AirSim: High-fidelity visual and physical simulation for autonomous vehicles // Field and Service Robotics. 2017. URL: https://arxiv.org/abs/1705.05065.

The authors declare no conflicts of interest.