<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">pribor</journal-id><journal-title-group><journal-title xml:lang="ru">Известия высших учебных заведений. Приборостроение</journal-title><trans-title-group xml:lang="en"><trans-title>Journal of Instrument Engineering</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">0021-3454</issn><issn pub-type="epub">2500-0381</issn><publisher><publisher-name>Национальный исследовательский университет ИТМО</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.17586/0021-3454-2024-67-11-958-968</article-id><article-id custom-type="elpub" pub-id-type="custom">pribor-314</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>МЕТОДИЧЕСКОЕ И ПРОГРАММНО-ИНФОРМАЦИОННОЕ ОБЕСПЕЧЕНИЕ ФУНКЦИОНИРОВАНИЯ АВТОМАТИЗИРОВАННЫХ СИСТЕМ</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>METHODOLOGICAL AND SOFTWARE-INFORMATION SUPPORT FOR THE FUNCTIONING OF AUTOMATED SYSTEMS</subject></subj-group></article-categories><title-group><article-title>Анализ статистических характеристик искусственно сгенерированных текстов</article-title><trans-title-group xml:lang="en"><trans-title>Analysis of Statistical Characteristics of Artificially Generated Texts</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Кулешов</surname><given-names>С. В.</given-names></name><name name-style="western" xml:lang="en"><surname>Kuleshov</surname><given-names>S. 
V.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Сергей Викторович Кулешов — д-р техн. наук, профессор РАН; СПИИРАН, лаборатория автоматизации научных исследований; гл. научный сотрудник</p></bio><bio xml:lang="en"><p>Sergey V. Kuleshov — Dr. Sci., Professor; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Chief Researcher</p></bio><email xlink:type="simple">kuleshov@iias.spb.su</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Зайцева</surname><given-names>А. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Zaytseva</surname><given-names>A. A.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Александра Алексеевна Зайцева — канд. техн. наук; СПИИРАН, лаборатория автоматизации научных исследований; ст. научный сотрудник</p></bio><bio xml:lang="en"><p>Alexandra A. Zaytseva — PhD; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Senior Researcher</p></bio><email xlink:type="simple">cher@iias.spb.su</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Аксенов</surname><given-names>А. Ю.</given-names></name><name name-style="western" xml:lang="en"><surname>Aksenov</surname><given-names>A. Yu.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Алексей Юрьевич Аксенов — канд. техн. наук; СПИИРАН, лаборатория автоматизации научных исследований; ст. научный сотрудник</p></bio><bio xml:lang="en"><p>Alexey Yu. Aksenov — PhD; St. 
Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Automation of Scientific Research, Senior Researcher</p></bio><email xlink:type="simple">a_aksenov@iias.spb.su</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Санкт-Петербургский федеральный исследовательский центр Российской академии наук</institution></aff><aff xml:lang="en"><institution>St. Petersburg Federal Research Center of the RAS</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2024</year></pub-date><pub-date pub-type="epub"><day>07</day><month>12</month><year>2024</year></pub-date><volume>67</volume><issue>11</issue><fpage>958</fpage><lpage>968</lpage><permissions><copyright-statement>Copyright &#x00A9; Национальный исследовательский университет ИТМО, 2024</copyright-statement><copyright-year>2024</copyright-year><copyright-holder xml:lang="ru">Национальный исследовательский университет ИТМО</copyright-holder><copyright-holder xml:lang="en">ITMO University</copyright-holder><license xlink:href="https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice" xlink:type="simple"><license-p>https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice</license-p></license></permissions><self-uri xlink:href="https://pribor.ifmo.ru/jour/article/view/314">https://pribor.ifmo.ru/jour/article/view/314</self-uri><abstract><p>Рассматривается новый тренд — формирование контента с применением инструментов и технологий искусственного интеллекта. Активное внедрение технологий искусственного интеллекта для генерации данных приводит к увеличению доли искусственно сгенерированных данных, которые необходимо выявлять в автоматическом режиме для предотвращения ошибок (недостоверности, введения в заблуждение). 
Предложены подходы к идентификации текстовых данных, созданных при помощи нейросетевых технологий, включающие эвристические правила, основанные на критерии зависимости объема реферата от порога реферирования, что позволяет проводить автоматическую оценку текстовых документов в мониторинговых и поисковых системах при обработке больших объемов неструктурированных данных. Полученные результаты закладывают технологическую базу для реализации широкого спектра практических решений по обеспечению интеллектуальной поддержки коллективного поведения участников в человекомашинных сообществах за счет разработки теоретических и технологических основ обработки неструктурированных данных.</p></abstract><trans-abstract xml:lang="en"><p>A new trend is considered: the creation of content using artificial intelligence tools and technologies. The active adoption of artificial intelligence technologies for data generation increases the share of artificially generated data, which must be identified automatically to prevent errors (unreliable or misleading information). Approaches to identifying text data created with neural network technologies are proposed, including heuristic rules based on the dependence of summary length on the summarization threshold; these allow automatic evaluation of text documents in monitoring and search systems when processing large volumes of unstructured data. 
The results lay a technological foundation for a wide range of practical solutions that provide intelligent support for the collective behavior of participants in human-machine communities, through the development of theoretical and technological foundations for processing unstructured data.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>интернет-документы</kwd><kwd>искусственные нейронные сети</kwd><kwd>большая языковая модель</kwd><kwd>интернет-ресурсы</kwd><kwd>методы искусственного интеллекта</kwd><kwd>генерация данных</kwd></kwd-group><kwd-group xml:lang="en"><kwd>internet documents</kwd><kwd>artificial neural networks</kwd><kwd>large language model</kwd><kwd>Internet resources</kwd><kwd>artificial intelligence methods</kwd><kwd>data generation</kwd></kwd-group><funding-group><funding-statement xml:lang="ru">Работа выполнена в рамках государственного задания на 2024 г. № FFZF-2022-0005.</funding-statement></funding-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">YouTube обяжет маркировать контент, созданный нейросетями [Электронный ресурс]: https://www.fontanka.ru/2023/11/14/72913286/, 27.06.2024.</mixed-citation><mixed-citation xml:lang="en">https://www.fontanka.ru/2023/11/14/72913286/. (in Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Fang X., Che Sh., Mao M., Zhang H., Zhao M., Zhao X. Bias of AI-Generated Content: An Examination of News Produced by Large Language Models [Электронный ресурс]: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4574226, 27.06.2024.</mixed-citation><mixed-citation xml:lang="en">Fang X., Che Sh., Mao M., Zhang H., Zhao M., Zhao X. Sci. Rep., 2024, no. 1(14), art. no. 
5224, doi: 10.1038/s41598-024-55686-2.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Chen Ch., Fu J., Lyu L. A Pathway Towards Responsible AI Generated Content. 2023. DOI: 10.48550/arXiv.2303.01325.</mixed-citation><mixed-citation xml:lang="en">Chen Ch., Fu J., Lyu L. arXiv:2303.01325v3, 27 Dec. 2023, https://doi.org/10.48550/arXiv.2303.01325.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Wahle J.Ph., Ruas T., Mohammad S.M., Meuschke N., Gipp B. AI Usage Cards: Responsibly Reporting AI-Generated Content // Proc. of ACM/IEEE Joint Conf. on Digital Libraries (JCDL 2023), June 2023, USA, Santa Fe. 2023. P. 282–284.</mixed-citation><mixed-citation xml:lang="en">Wahle J.Ph., Ruas T., Mohammad S.M., Meuschke N., Gipp B. Proc. of 2023 ACM/IEEE Joint Conf. on Digital Libraries (JCDL 2023), USA, Santa Fe, June 2023, pp. 282–284.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Huang X., Li P., Du H., Kang J., Niyato D., Kim D.I., Wu Y. Federated Learning-Empowered AI-Generated Content in Wireless Networks. 2023. DOI: 10.48550/arXiv.2307.07146.</mixed-citation><mixed-citation xml:lang="en">https://doi.org/10.48550/arXiv.2307.07146.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Gragnaniello D., Marra F., Verdoliva L. Detection of AI-Generated Synthetic Faces. Handbook of Digital Face Manipulation and Detection // Advances in Computer Vision and Pattern Recognition. 2022. P. 191–212.</mixed-citation><mixed-citation xml:lang="en">Gragnaniello D., Marra F., Verdoliva L. Advances in Computer Vision and Pattern Recognition, 2022, pp. 
191–212.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Xi Z., Wenmin H., Kangkang W., Weiqi L., Peijia Zh. AI-Generated Image Detection using a Cross-Attention Enhanced Dual-Stream Network // Proc. of Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Nov. 2023, Taiwan, Taipei. P. 1463–1470.</mixed-citation><mixed-citation xml:lang="en">Xi Z., Wenmin H., Kangkang W., Weiqi L., Peijia Zh. Proc. of 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Taiwan, Taipei, November 2023, pp. 1463–1470.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Weber-Wulff D., Anohina-Naumeca A., Bjelobaba S., Foltýnek T., Guerrero-Dib J., Popoola O., Šigut P., Waddington L. Testing of Detection Tools for AI-Generated Text. 2023. DOI: 10.48550/arXiv.2306.15666.</mixed-citation><mixed-citation xml:lang="en">https://doi.org/10.48550/arXiv.2306.15666.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Joo-Wha H., Fischer K., Ha Y., Zeng Y. Human, I wrote a song for you: An experiment testing the influence of machines’ attributes on the AI-composed music evaluation // Computers in Human Behavior. 2022. Vol. 131. 107239.</mixed-citation><mixed-citation xml:lang="en">Joo-Wha H., Fischer K., Ha Y., Zeng Y. Computers in Human Behavior, 2022, vol. 131, art. no. 107239.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Cao Y., Li S., Liu Y., Yan Zh., Dai Y., Yu Ph., Sun L. A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT. 2023. 
DOI: 10.48550/arXiv.2303.04226.</mixed-citation><mixed-citation xml:lang="en">https://doi.org/10.48550/arXiv.2303.04226.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Wu J., Wensheng G., Zefeng Ch., Shicheng W., Hong L. AI-Generated Content (AIGC): A Survey. 2023. DOI: 10.48550/arXiv.2304.06632.</mixed-citation><mixed-citation xml:lang="en">https://doi.org/10.48550/arXiv.2304.06632.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Ruchika L., Priyanka Bh., Neha V., Anshika J. AI-Generated Text Detection: A Review // Intern. Journal of Creative Research Thoughts (IJCRT). 2023. Vol. 11(10). P. d784–d789.</mixed-citation><mixed-citation xml:lang="en">Ruchika L., Priyanka Bh., Neha V., Anshika J. Intern. J. of Creative Research Thoughts (IJCRT), 2023, no. 10(11), pp. d784–d789.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Zhengyuan J., Jinghuai Zh., Neil Zh.G. Evading Watermark based Detection of AI-Generated Content // Proc. of the ACM SIGSAC Conf. on Computer and Communications Security (CCS '23), Nov. 2023, Copenhagen. 2023. P. 1168–1181.</mixed-citation><mixed-citation xml:lang="en">Zhengyuan J., Jinghuai Zh., Neil Zh.G. Proc. of the 2023 ACM SIGSAC Conf. on Computer and Communications Security (CCS '23), Denmark, Copenhagen, November 2023, pp. 1168–1181.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Elkhatat A., Elsaid Kh., Almeer S. Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text // Intern. Journal for Educational Integrity. 2023. Vol. 19. P. 17.</mixed-citation><mixed-citation xml:lang="en">Elkhatat A., Elsaid Kh., Almeer S. Intern. J. 
for Educational Integrity, 2023, vol. 19, art. no. 17.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Elkhatat A. M. Evaluating the authenticity of ChatGPT responses: a study on text-matching capabilities // Intern. Journal for Educational Integrity. 2023. Vol. 19. P. 15. DOI: 10.1007/s40979-023-00137-0.</mixed-citation><mixed-citation xml:lang="en">Elkhatat A.M. Intern. J. for Educational Integrity, 2023, vol. 19, art. no. 15, https://doi.org/10.1007/s40979-023-00137-0.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Otterbacher J. Why technical solutions for detecting AI-generated content in research and education are insufficient // Patterns. 2023. Vol. 4(7). P. 100796.</mixed-citation><mixed-citation xml:lang="en">Otterbacher J. Patterns, 2023, no. 7(4), art. no. 100796.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Pengyu W., Linyang K. R., Botian J., Dong Zh., Xipeng Q. SeqXGPT: Sentence-Level AI-Generated Text Detection // Proc. of the Conf. on Empirical Methods in Natural Language Processing, Dec. 2023. Singapore. 2023. P. 1144–1156.</mixed-citation><mixed-citation xml:lang="en">Pengyu W., Linyang K.R., Botian J., Dong Zh., Xipeng Q. Proc. of the 2023 Conf. on Empirical Methods in Natural Language Processing, Singapore, December 2023, pp. 1144–1156.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Price G., Sakellarios M. The Effectiveness of Free Software for Detecting AI-Generated Writing // Intern. Journal of Teaching, Learning and Education. 2023. Vol. 2. P. 31–38.</mixed-citation><mixed-citation xml:lang="en">Price G., Sakellarios M. Intern. J. of Teaching, Learning and Education, 2023, vol. 2, pp. 
31–38.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Qu Y., Liu P., Song W., Liu L., Cheng M. A Text Generation and Prediction System: Pre-training on New Corpora Using BERT and GPT-2 // IEEE 10th Int. Conf. on Electronics Information and Emergency Communication (ICEIEC), July 2020, China, Beijing. 2020. P. 323–326.</mixed-citation><mixed-citation xml:lang="en">Qu Y., Liu P., Song W., Liu L., Cheng M. IEEE 10th Intern. Conf. on Electronics Information and Emergency Communication (ICEIEC), China, Beijing, July 2020, pp. 323–326.</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Chen W., Su Y., Yan X., Wang W. Y. KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation. [Электронный ресурс]: https://arxiv.org/abs/2010.02307, 27.06.2024.</mixed-citation><mixed-citation xml:lang="en">https://arxiv.org/abs/2010.02307.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">GPT для чайников: от токенизации до файнтюнинга [Электронный ресурс]: https://habr.com/ru/articles/599673/, 27.06.2024.</mixed-citation><mixed-citation xml:lang="en">https://habr.com/ru/articles/599673/. (in Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">Ackley D., Hinton G., Sejnowski T. A learning algorithm for Boltzmann machines // Cognitive Science. 1985. Vol. 9. N 1. P. 147–169.</mixed-citation><mixed-citation xml:lang="en">Ackley D., Hinton G., Sejnowski T. Cognitive Science, 1985, no. 1(9), pp. 
147–169.</mixed-citation></citation-alternatives></ref><ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">OpenAI Codex [Электронный ресурс]: https://openai.com/blog/openai-codex, 27.06.2024.</mixed-citation><mixed-citation xml:lang="en">OpenAI Codex, https://openai.com/blog/openai-codex.</mixed-citation></citation-alternatives></ref><ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">GPT-4 Technical Report. OpenAI [Электронный ресурс]: https://cdn.openai.com/papers/gpt-4.pdf, 27.06.2024.</mixed-citation><mixed-citation xml:lang="en">GPT-4 Technical Report. OpenAI, https://cdn.openai.com/papers/gpt-4.pdf.</mixed-citation></citation-alternatives></ref><ref id="cit25"><label>25</label><citation-alternatives><mixed-citation xml:lang="ru">GPTZero [Электронный ресурс]: https://gptzero.me/technology, 27.06.2024.</mixed-citation><mixed-citation xml:lang="en">GPTZero, https://gptzero.me/technology.</mixed-citation></citation-alternatives></ref><ref id="cit26"><label>26</label><citation-alternatives><mixed-citation xml:lang="ru">Chaka C. Detecting AI content in responses generated by ChatGPT, YouChat, and Chatsonic: The case of five AI content detection tools // Journal of Applied Learning and Teaching. 2023. Vol. 6(2). DOI: 10.37074/jalt.2023.6.2.12.</mixed-citation><mixed-citation xml:lang="en">Chaka C. Journal of Applied Learning and Teaching, 2023, no. 2(6), https://doi.org/10.37074/jalt.2023.6.2.12.</mixed-citation></citation-alternatives></ref><ref id="cit27"><label>27</label><citation-alternatives><mixed-citation xml:lang="ru">Yang X., Cheng W., Petzold L., Wang W.Y., Chen H. DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text // ArXiv, abs/2305.17359. 2024.</mixed-citation><mixed-citation xml:lang="en">Yang X., Cheng W., Petzold L., Wang W.Y., Chen H. 
ArXiv, abs/2305.17359, https://www.semanticscholar.org/paper/DNA-GPT%3A-Divergent-N-Gram-Analysis-for-Detection-of-Yang-Cheng/08145978da4c8912f4a05444a6bbf048778dc4af.</mixed-citation></citation-alternatives></ref><ref id="cit28"><label>28</label><citation-alternatives><mixed-citation xml:lang="ru">Кулешов С. В., Зайцева А. А., Марков С. В. Ассоциативно-онтологический подход к обработке текстов на естественном языке // Интеллектуальные технологии на транспорте. 2015. № 4. С. 40–45.</mixed-citation><mixed-citation xml:lang="en">Kuleshov S.V., Zaytseva A.A., Markov S.V. Intellectual Technologies on Transport, 2015, no. 4, pp. 40–45. (in Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit29"><label>29</label><citation-alternatives><mixed-citation xml:lang="ru">Jiang A. Q. et al. Mistral 7B [Электронный ресурс]: https://arxiv.org/abs/2310.06825, 27.06.2020.</mixed-citation><mixed-citation xml:lang="en">https://arxiv.org/abs/2310.06825.</mixed-citation></citation-alternatives></ref><ref id="cit30"><label>30</label><citation-alternatives><mixed-citation xml:lang="ru"></mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref><ref id="cit31"><label>31</label><citation-alternatives><mixed-citation xml:lang="ru"></mixed-citation><mixed-citation xml:lang="en"></mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare that there are no conflicts of interest present.</p></fn></fn-group></back></article>
