<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">pribor</journal-id><journal-title-group><journal-title xml:lang="ru">Известия высших учебных заведений. Приборостроение</journal-title><trans-title-group xml:lang="en"><trans-title>Journal of Instrument Engineering</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">0021-3454</issn><issn pub-type="epub">2500-0381</issn><publisher><publisher-name>Национальный исследовательский университет ИТМО</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.17586/0021-3454-2025-68-12-1034-1045</article-id><article-id custom-type="elpub" pub-id-type="custom">pribor-439</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>СИСТЕМНЫЙ АНАЛИЗ, УПРАВЛЕНИЕ И ОБРАБОТКА ИНФОРМАЦИИ</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>SYSTEM ANALYSIS, MANAGEMENT AND INFORMATION PROCESSING</subject></subj-group></article-categories><title-group><article-title>Структурированное обучение с подкреплением для оптимального по времени полета квадрокоптера</article-title><trans-title-group xml:lang="en"><trans-title>Structured Reinforcement Learning for Time-Optimal Quadrotor Flight</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Бархум</surname><given-names>М.</given-names></name><name name-style="western" xml:lang="en"><surname>Barhoum</surname><given-names>M.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Мажд Бархум — аспирант; факультет систем 
управления и робототехники</p><p>Санкт-Петербург</p></bio><bio xml:lang="en"><p>Majd Barhoum, Post-Graduate Student; Faculty of Control Systems and Robotics</p><p>St. Petersburg</p></bio><email xlink:type="simple">barhoum.majd213@gmail.com</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Пыркин</surname><given-names>А. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Pyrkin</surname><given-names>A. A.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Антон Александрович Пыркин — д-р техн. наук, профессор; факультет систем управления и робототехники; профессор</p><p>Санкт-Петербург</p></bio><bio xml:lang="en"><p>Anton A. Pyrkin, Dr. Sci., Professor; Faculty of Control Systems and Robotics; Professor</p><p>St. Petersburg</p></bio><email xlink:type="simple">pyrkin@itmo.ru</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Университет ИТМО</institution></aff><aff xml:lang="en"><institution>ITMO University</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2025</year></pub-date><pub-date pub-type="epub"><day>19</day><month>01</month><year>2026</year></pub-date><volume>68</volume><issue>12</issue><fpage>1034</fpage><lpage>1045</lpage><permissions><copyright-statement>Copyright &#x00A9; Национальный исследовательский университет ИТМО, 2026</copyright-statement><copyright-year>2026</copyright-year><copyright-holder xml:lang="ru">Национальный исследовательский университет ИТМО</copyright-holder><copyright-holder xml:lang="en">ITMO University</copyright-holder><license xlink:href="https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice" xlink:type="simple"><license-p>https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice</license-p></license></permissions><self-uri 
xlink:href="https://pribor.ifmo.ru/jour/article/view/439">https://pribor.ifmo.ru/jour/article/view/439</self-uri><abstract><p>Проблема синтеза реактивного, оптимального по времени управления для квадрокоптеров усугубляется их сложной неполноприводной динамикой и практической невозможностью точного решения краевых задач на борту в реальном времени. Для преодоления этих проблем предложен фреймворк обучения с подкреплением, позволяющий агенту автономно осваивать стратегии точного достижения путевых точек в свободном пространстве. Центральными элементами предлагаемого подхода являются: (1) новаторская каскадная архитектура актора, заимствующая концепцию раздельного управления позицией и скоростью; (2) продуманная композитная функция вознаграждения с ключевыми радиальными слагаемыми скорости и ускорения, направляющая агента на максимально быстрое продвижение к цели и выполнение (bang-bang-like) маневров с высокой энергетической эффективностью. Результаты всестороннего количественного сравнения с современными методами подтверждают превосходство: агент обеспечивает плавность управляющих сигналов, что гарантирует оптимальность траекторий по времени и их соответствие заданному маршруту с минимальными отклонениями.</p></abstract><trans-abstract xml:lang="en"><p>The problem of synthesizing reactive, time-optimal control for quadcopters is aggravated by their multifaceted, underactuated dynamics and the complexity of solving boundary-value problems in real time. This work addresses these challenges, presenting a reinforcement learning framework that learns to autonomously navigate in collision-free environments with optimal waypoint-reaching policies. 
Our contributions are twofold: (1) a cascaded actor architecture, inspired by the separation of position and velocity control in classical control, that improves flight stability and smooths actions; and (2) a composite reward function whose radial velocity and acceleration terms promote maximal progress toward the target and steer the agent toward bang-bang-like maneuvers. Quantitative comparisons with state-of-the-art methods show that the agent produces smooth control actions and time-optimal trajectories that follow the desired path with minimal deviation.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>квадрокоптеры</kwd><kwd>обучение с подкреплением</kwd><kwd>автономная навигация</kwd><kwd>оптимальная траектория</kwd><kwd>нейронные сети</kwd></kwd-group><kwd-group xml:lang="en"><kwd>quadrotors</kwd><kwd>reinforcement learning</kwd><kwd>autonomous navigation</kwd><kwd>optimal trajectory</kwd><kwd>neural networks</kwd></kwd-group><funding-group><funding-statement xml:lang="ru">статья подготовлена при финансовой поддержке Министерства науки и высшего образования Российской Федерации, проект № FSER-2025-0002 и Университета ИТМО, проект НИРСИИ № 640112.</funding-statement><funding-statement xml:lang="en">Supported by the Ministry of Science and Higher Education of the Russian Federation (project no. FSER-2025-0002) and by the ITMO University Research Projects in AI Initiative (RPAII), project no. 640112.</funding-statement></funding-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Richter C., Bry A., and Roy N. Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments // Intern. Symposium of Robotics Research. 2016. P. 649–666. DOI:10.1007/978-3-319-28872-7_37.</mixed-citation><mixed-citation xml:lang="en">Richter C., Bry A., and Roy N. Robotics Research, Springer Tracts in Advanced Robotics, 2016, pp. 
649–666, DOI:10.1007/978-3-319-28872-7_37.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Foehn P., Romero A., and Scaramuzza D. Time-optimal planning for quadrotor waypoint flight // Science Robotics. 2021. Vol. 6, N 56. DOI:10.1126/scirobotics.abh1221.</mixed-citation><mixed-citation xml:lang="en">Foehn P., Romero A., and Scaramuzza D. Science Robotics, 2021, no. 56(6), DOI:10.1126/scirobotics.abh1221.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Pěnička R. and Scaramuzza D. Minimum-time quadrotor waypoint flight in cluttered environments // IEEE Robotics and Automation Letters. 2022. arXiv:2202.03947v1 [cs.RO].</mixed-citation><mixed-citation xml:lang="en">Pěnička R. and Scaramuzza D. IEEE Robotics and Automation Letters, arXiv:2202.03947v1 [cs.RO] 8 Feb 2022.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Romero A., Sun S., Foehn P., and Scaramuzza D. Model predictive contouring control for time-optimal quadrotor flight // IEEE Transactions on Robotics. 2022. Vol. 99. P. 1–17. DOI:10.1109/TRO.2022.3173711.</mixed-citation><mixed-citation xml:lang="en">Romero A., Sun S., Foehn P., and Scaramuzza D. IEEE Transactions on Robotics, 2022, vol. 99, pp. 1–17, DOI:10.1109/TRO.2022.3173711.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Khojasteh M. S. and Salimi-Badr A. Autonomous quadrotor path planning through deep reinforcement learning with monocular depth estimation // IEEE Open Journal of Vehicular Technology. 2025. Vol. 99, N 6. P. 34–51. DOI:10.1109/OJVT.2024.3502296.</mixed-citation><mixed-citation xml:lang="en">Khojasteh M.S. and Salimi-Badr A. IEEE Open Journal of Vehicular Technology, 2024, no. 6(99), pp. 
34–51, DOI:10.1109/OJVT.2024.3502296.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Zhong L., Zhao J., Luo H., and Hou Z. Hybrid path planning and following of a quadrotor UAV based on deep reinforcement learning // Chinese Control and Decision Conference. Under Review, Xi’an, China, May 25–27, 2024.</mixed-citation><mixed-citation xml:lang="en">Zhong L., Zhao J., Luo H., and Hou Z. Proceedings of the 36th Chinese Control and Decision Conference, Under Review, Xi’an, China, May 25–27, 2024.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Tsai T.-H. and Li Q. Quadrotor mapless navigation in static and dynamic environments based on deep reinforcement learning // 3rd Intern. Conf. on Industrial Artificial Intelligence (IAI). 2021. DOI:10.1109/IAI53119.2021.9619200.</mixed-citation><mixed-citation xml:lang="en">Tsai T.-H. and Li Q. 3rd International Conference on Industrial Artificial Intelligence (IAI), 2021, DOI:10.1109/IAI53119.2021.9619200.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Wang J., Wang T., He Z., Cai W., and Sun C. Towards better generalization in quadrotor landing using deep reinforcement learning // Applied Intelligence. 2022. Vol. 53, N 1. DOI:10.1007/s10489-022-03503-6.</mixed-citation><mixed-citation xml:lang="en">Wang J., Wang T., He Z., Cai W., and Sun C. Applied Intelligence, 2022, no. 1(53), DOI:10.1007/s10489-022-03503-6.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Li X., Yu H., Hu M., Xiao L., Han J., and Fang Y. Immersion and invariance-based adaptive control for quadrotor transportation systems using deep reinforcement learning // Intern. Conf. on Advanced Robotics and Mechatronics. 
Guilin, China, July 09–11, 2022. P. 1076–1081. DOI: 10.1109/ICARM54641.2022.9959439.</mixed-citation><mixed-citation xml:lang="en">Li X., Yu H., Hu M., Xiao L., Han J., and Fang Y. International Conference on Advanced Robotics and Mechatronics (ICARM), Guilin, China, July 09–11, 2022, pp. 1076–1081, DOI: 10.1109/ICARM54641.2022.9959439.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Himanshu K., Kumar H., and Pushpangathan J. V. Waypoint navigation of quadrotor using deep reinforcement learning // IFAC-PapersOnLine. 2022. Vol. 55, N 22. P. 281–286. DOI:10.1016/j.ifacol.2023.03.047.</mixed-citation><mixed-citation xml:lang="en">Himanshu K., Kumar H., and Pushpangathan J.V. IFAC-PapersOnLine, 2022, no. 22(55), pp. 281–286, DOI:10.1016/j.ifacol.2023.03.047.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Mokhtar M. and El-Badawy A. Autonomous navigation and control of a quadrotor using deep reinforcement learning // Intern. Conf. on Unmanned Aircraft Systems. 2023. DOI:10.1109/ICUAS57906.2023.10156126.</mixed-citation><mixed-citation xml:lang="en">Mokhtar M. and El-Badawy A. International Conference on Unmanned Aircraft Systems, June 2023, DOI:10.1109/ICUAS57906.2023.10156126.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Trad T. Y., Choutri K., Lagha M., Meshoul S., Khenfri F., Fareh R., and Shaiba H. Real-time implementation of quadrotor UAV control system based on a deep reinforcement learning approach // Computers, Materials &amp; Continua. 2024. Vol. 81, N 3. P. 4757–4786. DOI:10.32604/cmc.2024.055634.</mixed-citation><mixed-citation xml:lang="en">Trad T.Y., Choutri K., Lagha M., Meshoul S., Khenfri F., Fareh R., and Shaiba H. Computers, Materials &amp; Continua, 2024, no. 3(81), pp. 
4757–4786, DOI:10.32604/cmc.2024.055634.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Wang Y., Sun J. L., He H., and Sun C. Deterministic policy gradient with integral compensator for robust quadrotor control // IEEE Transactions on Systems, Man, and Cybernetics. 2020. Vol. 50, N 10. P. 3713–3725.</mixed-citation><mixed-citation xml:lang="en">Wang Y., Sun J.L., He H., and Sun C. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, no. 10(50), pp. 3713–3725.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Lopez-Sanchez I. and Moreno-Valenzuela J. PID control of quadrotor UAVs: A survey // Annual Reviews in Control. 2023. Vol. 56. P. 100900. DOI: 10.1016/j.arcontrol.2023.100900.</mixed-citation><mixed-citation xml:lang="en">Lopez-Sanchez I. and Moreno-Valenzuela J. Annual Reviews in Control, 2023, vol. 56, pp. 100900, DOI: 10.1016/j.arcontrol.2023.100900.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Idrissi M., Salami M. R., and Annaz F. Y. A review of quadrotor unmanned aerial vehicles: Applications, architectural design and control algorithms // Journal of Intelligent and Robotic Systems. 2022. Vol. 104, N 2. P. 22. DOI: 10.1007/s10846-021-01527-7.</mixed-citation><mixed-citation xml:lang="en">Idrissi M., Salami M.R., and Annaz F.Y. Journal of Intelligent and Robotic Systems, 2022, no. 2(104), pp. 22, DOI: 10.1007/s10846-021-01527-7.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Ren Y., Zhu F., Sui S., Yi Z., and Chen K. Enhancing quadrotor control robustness with multi-proportional–integral–derivative self-attention-guided deep reinforcement learning // Drones. 2024. Vol. 8, N 7. P. 315. 
DOI:10.3390/drones8070315.</mixed-citation><mixed-citation xml:lang="en">Ren Y., Zhu F., Sui S., Yi Z., and Chen K. Drones, 2024, no. 7(8), pp. 315, DOI:10.3390/drones8070315.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Rubí B., Morcego B., and Pérez R. A. Deep reinforcement learning for quadrotor path following with adaptive velocity // Autonomous Robots. 2021. Vol. 45. P. 119–134.</mixed-citation><mixed-citation xml:lang="en">Rubí B., Morcego B., and Pérez R.A. Autonomous Robots, 2021, vol. 45, pp. 119–134.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Mien T., Tu T., and An V. Cascade PID control for altitude and angular position stabilization of 6-DOF UAV quadcopter // Intern. Journal of Robotics and Control Systems. 2024. Vol. 4, N 2. P. 814–831.</mixed-citation><mixed-citation xml:lang="en">Mien T., Tu T., and An V. International Journal of Robotics and Control Systems, 2024, no. 2(4), pp. 814–831, https://pubs2.ascee.org/index.php/IJRCS/article/view/1410.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Idres M., Mustapha O., and Okasha M. Quadrotor trajectory tracking using PID cascade control // IOP Conf. Series: Materials Science and Engineering. 2017. Vol. 270, N 1. P. 012010.</mixed-citation><mixed-citation xml:lang="en">Idres M., Mustapha O., and Okasha M. IOP Conference Series: Materials Science and Engineering, 2017, no. 1(270), pp. 012010, https://dx.doi.org/10.1088/1757-899X/270/1/012010.</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Noordin A., Basri M. A. M., Mohamed Z., and Lazim I. M. Adaptive PID controller using sliding mode control approaches for quadrotor UAV attitude and position stabilization // Arabian Journal for Science and Engineering. 2020. Vol. 46. P. 963–981.</mixed-citation><mixed-citation xml:lang="en">Noordin A., Basri M.A.M., Mohamed Z., and Lazim I.M. Arabian Journal for Science and Engineering, 2020, vol. 46, pp. 963–981.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">Shah S., Dey D., Lovett C., and Kapoor A. Airsim: High-fidelity visual and physical simulation for autonomous vehicles // Field and Service Robotics. 2017 [Электронный ресурс]: https://arxiv.org/abs/1705.05065.</mixed-citation><mixed-citation xml:lang="en">Shah S., Dey D., Lovett C., and Kapoor A. Field and Service Robotics, 2017, https://arxiv.org/abs/1705.05065.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
