DOI 10.17586/0021-3454-2025-68-12-1034-1045
UDC 004.896
STRUCTURED REINFORCEMENT LEARNING FOR TIME-OPTIMAL QUADROTOR FLIGHT
M. Barhoum
ITMO University, Saint Petersburg, 197101, Russian Federation; PhD Student
A. A. Pyrkin
ITMO University, Saint Petersburg, 197101, Russian Federation; Full Professor, Dean
Reference for citation: Barhoum M., Pyrkin A. A. Structured reinforcement learning for time-optimal quadrotor flight. Journal of Instrument Engineering. 2025. Vol. 68, N 12. P. 1034–1045. DOI: 10.17586/0021-3454-2025-68-12-1034-1045.
Abstract. The problem of synthesizing reactive, time-optimal control for quadrotors is complicated by their nonlinear, underactuated dynamics and by the difficulty of solving boundary-value problems in real time. This work addresses these challenges with a reinforcement learning framework that learns waypoint-reaching policies for autonomous navigation in collision-free environments. Our contributions include a cascaded actor architecture, inspired by the position-velocity separation of classical cascade control, that improves flight stability and smooths control actions, and a composite reward function with radial-velocity and acceleration components that promotes maximal progress toward the target and steers the agent toward bang-bang-like maneuvers. Quantitative comparisons show that the agent produces smooth control actions and near-time-optimal trajectories that follow the desired path with minimal deviation.
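The composite reward described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the gain values `k_v` and `k_a` and the function signature are hypothetical, assuming a radial-velocity term that rewards progress toward the target and an acceleration-magnitude term that encourages near-saturated, bang-bang-like maneuvers.

```python
import numpy as np

def composite_reward(pos, vel, acc, target, k_v=1.0, k_a=0.1):
    """Hypothetical sketch of a composite reward with radial-velocity
    and acceleration components; gains are illustrative only."""
    to_target = np.asarray(target, dtype=float) - np.asarray(pos, dtype=float)
    dist = np.linalg.norm(to_target)
    radial_dir = to_target / max(dist, 1e-9)      # unit vector toward target
    r_vel = k_v * float(np.dot(vel, radial_dir))  # rewards progress toward target
    r_acc = k_a * float(np.linalg.norm(acc))      # favors large (bang-bang-like) accelerations
    return r_vel + r_acc
```

For example, a quadrotor at the origin flying straight at a target on the x-axis earns a positive radial-velocity term, while velocity orthogonal to the target direction contributes nothing.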
Keywords: quadrotors, reinforcement learning, autonomous navigation, optimal trajectory, neural networks
Acknowledgement: Supported by the Ministry of Science and Higher Education of the Russian Federation (project no. FSER-2025-0002) and ITMO University Research Projects in AI Initiative (RPAII) №640112.
References:
- Richter C., Bry A., and Roy N. Robotics Research, Springer Tracts in Advanced Robotics, 2016, pp. 649–666, DOI:10.1007/978-3-319-28872-7_37.
- Foehn P., Romero A., and Scaramuzza D. Science Robotics, 2021, no. 6(56), DOI:10.1126/scirobotics.abh1221.
- Pěnička R. and Scaramuzza D. IEEE Robotics and Automation Letters, arXiv:2202.03947v1 [cs.RO] 8 Feb 2022.
- Romero A., Sun S., Foehn P., and Scaramuzza D. IEEE Transactions on Robotics, 2022, vol. 99, pp. 1–17, DOI:10.1109/TRO.2022.3173711.
- Khojasteh M.S. and Salimi-Badr A. IEEE Open Journal of Vehicular Technology, 2024, no. 6(99), pp. 34–51, DOI:10.1109/OJVT.2024.3502296.
- Zhong L., Zhao J., Luo H., and Hou Z. Proceedings of the 36th Chinese Control and Decision Conference, Under Review, Xi’an, China, May 25–27, 2024.
- Tsai T.-H. and Li Q. 3rd International Conference on Industrial Artificial Intelligence (IAI), 2021, DOI:10.1109/IAI53119.2021.9619200.
- Wang J., Wang T., He Z., Cai W. Applied Intelligence, 2022, no. 1(53), DOI:10.1007/s10489-022-03503-6.
- Li X., Yu H., Hu M., Xiao L., Han J., and Fang Y. International Conference on Advanced Robotics and Mechatronics (ICARM), Guilin, China, July 09–11, 2022, pp. 1076–1081, DOI: 10.1109/ICARM54641.2022.9959439.
- Himanshu K., Kumar H., and Pushpangathan J.V. IFAC-PapersOnLine, 2022, no. 22(55), pp. 281–286, DOI:10.1016/j.ifacol.2023.03.047.
- Mokhtar M. and El-Badawy A. International Conference on Unmanned Aircraft Systems, June 2023, DOI:10.1109/ICUAS57906.2023.10156126.
- Trad T.Y., Choutri K., Lagha M., Meshoul S., Khenfri F., Fareh R., and Shaiba H. Computers, Materials & Continua, 2024, no. 3(81), pp. 4757–4786, DOI:10.32604/cmc.2024.055634.
- Wang Y., Sun J.L., He H., and Sun C. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, no. 10(50), pp. 3713–3725.
- Lopez-Sanchez I. and Moreno-Valenzuela J. Annual Reviews in Control, 2023, vol. 56, pp. 100900, DOI:10.1016/j.arcontrol.2023.100900.
- Idrissi M., Salami M.R., and Annaz F.Y. Journal of Intelligent and Robotic Systems, 2022, no. 2(104), pp. 22, DOI: 10.1007/s10846-021-01527-7.
- Ren Y., Zhu F., Sui S., Yi Z., and Chen K. Drones, 2024, no. 7(8), pp. 315, DOI:10.3390/drones8070315.
- Rubí B., Morcego B., and Pérez R.A. Autonomous Robots, 2021, vol. 45, pp. 119–134.
- Mien T., Tu T., and An V. International Journal of Robotics and Control Systems, 2024, no. 2(4), pp. 814–831, https://pubs2.ascee.org/index.php/IJRCS/article/view/1410.
- Idres M., Mustapha O., and Okasha M. IOP Conference Series: Materials Science and Engineering, 2017, no. 1(270), pp. 012010, https://dx.doi.org/10.1088/1757-899X/270/1/012010.
- Noordin A., Basri M.A.M., Mohamed Z., and Lazim I.M. Arabian Journal for Science and Engineering, 2020, vol. 46, pp. 963–981.
- Shah S., Dey D., Lovett C., and Kapoor A. Field and Service Robotics, 2017, https://arxiv.org/abs/1705.05065.