<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">pribor</journal-id><journal-title-group><journal-title xml:lang="ru">Известия высших учебных заведений. Приборостроение</journal-title><trans-title-group xml:lang="en"><trans-title>Journal of Instrument Engineering</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">0021-3454</issn><issn pub-type="epub">2500-0381</issn><publisher><publisher-name>Национальный исследовательский университет ИТМО</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.17586/0021-3454-2024-67-11-984-993</article-id><article-id custom-type="elpub" pub-id-type="custom">pribor-317</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>МЕТОДИЧЕСКОЕ И ПРОГРАММНО-ИНФОРМАЦИОННОЕ ОБЕСПЕЧЕНИЕ ФУНКЦИОНИРОВАНИЯ АВТОМАТИЗИРОВАННЫХ СИСТЕМ</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>METHODOLOGICAL AND SOFTWARE-INFORMATION SUPPORT FOR THE FUNCTIONING OF AUTOMATED SYSTEMS</subject></subj-group></article-categories><title-group><article-title>Методика создания многомодальных корпусов данных для аудио-визуального анализа вовлеченности и эмоций участников виртуальной коммуникации</article-title><trans-title-group xml:lang="en"><trans-title>Method of Creating Multimodal Databases for Audiovisual Analysis of Engagement and Emotions of Virtual Communication Participants</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Двойникова</surname><given-names>А. 
А.</given-names></name><name name-style="western" xml:lang="en"><surname>Dvoynikova</surname><given-names>A. A.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Анастасия Александровна Двойникова — СПИИРАН, лаборатория речевых и многомодальных интерфейсов; мл. научный сотрудник</p></bio><bio xml:lang="en"><p>Anastasia A. Dvoynikova — St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Speech and Multimodal Interfaces; Junior Researcher</p></bio><email xlink:type="simple">dvoynikova.a@iias.spb.su</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Карпов</surname><given-names>А. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Karpov</surname><given-names>A. A.</given-names></name></name-alternatives><bio xml:lang="ru"><p>Алексей Анатольевич Карпов — д-р техн. наук, профессор; СПИИРАН, лаборатория речевых и многомодальных интерфейсов; руководитель лаборатории</p></bio><bio xml:lang="en"><p>Alexey A. Karpov — Dr. Sci. (Eng.), Professor; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Speech and Multimodal Interfaces; Head of the Laboratory</p></bio><email xlink:type="simple">karpov@iias.spb.su</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Санкт-Петербургский федеральный исследовательский центр Российской академии наук</institution></aff><aff xml:lang="en"><institution>St.
Petersburg Federal Research Center of the RAS</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2024</year></pub-date><pub-date pub-type="epub"><day>07</day><month>12</month><year>2024</year></pub-date><volume>67</volume><issue>11</issue><fpage>984</fpage><lpage>993</lpage><permissions><copyright-statement>Copyright &#x00A9; Национальный исследовательский университет ИТМО, 2024</copyright-statement><copyright-year>2024</copyright-year><copyright-holder xml:lang="ru">Национальный исследовательский университет ИТМО</copyright-holder><copyright-holder xml:lang="en">ITMO University</copyright-holder><license xlink:href="https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice" xlink:type="simple"><license-p>https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice</license-p></license></permissions><self-uri xlink:href="https://pribor.ifmo.ru/jour/article/view/317">https://pribor.ifmo.ru/jour/article/view/317</self-uri><abstract><p>Представлена методика создания многомодальных корпусов данных, предназначенных для анализа поведенческих проявлений участников виртуальной коммуникации. Предложенная методика направлена на создание корпусов данных групповой коммуникации (более двух собеседников) с использованием систем телеконференций и учитывает особенности естественных проявлений поведенческих аспектов (вовлеченности и эмоций) участников разговора. Выделенные особенности составляют новизну предложенной методики. Методика состоит из трех основных этапов — подготовительного, записи и аннотирования данных. Методика была апробирована и валидирована при создании нового многомодального корпуса данных ENERGI, содержащего русскоязычные аудиовизуальные записи групповой коммуникации участников с помощью систем телеконференций. Созданный корпус предназначен для решения задач распознавания вовлеченности участников в коммуникацию, а также анализа проявления эмоций во время диалога. 
Предложенная методика является универсальной и может быть применима для сбора различных корпусов данных виртуальной коммуникации.</p></abstract><trans-abstract xml:lang="en"><p>A technique is presented for creating multimodal databases designed to analyze the behavioral manifestations of virtual communication participants. The proposed technique is aimed at building databases of group communication (more than two interlocutors) recorded via teleconferencing systems, and it takes into account how behavioral aspects (engagement and emotions) naturally manifest in conversation participants. These features constitute the novelty of the proposed technique. The technique consists of three main stages: preparation, recording, and data annotation. It was tested and validated by creating a new multimodal corpus, ENERGI, which contains Russian-language audiovisual recordings of group communication among teleconference participants. The corpus is intended for recognizing participants' engagement in communication and for analyzing the expression of emotions during dialogue.
The proposed technique is universal and can be applied to collecting various corpora of virtual communication data.</p></trans-abstract><kwd-group xml:lang="ru"><kwd>методика создания корпусов данных</kwd><kwd>многомодальный корпус</kwd><kwd>анализ вовлеченности</kwd><kwd>анализ эмоций</kwd><kwd>аннотирование данных</kwd><kwd>виртуальная коммуникация</kwd></kwd-group><kwd-group xml:lang="en"><kwd>methodology for database creation</kwd><kwd>multimodal database</kwd><kwd>engagement analysis</kwd><kwd>emotion analysis</kwd><kwd>data annotation</kwd><kwd>virtual communication</kwd></kwd-group><funding-group><funding-statement xml:lang="ru">Работа выполнена в рамках бюджетной темы № FFZF-2022-0005.</funding-statement><funding-statement xml:lang="en">The work was carried out within budget topic No. FFZF-2022-0005.</funding-statement></funding-group></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Ткаченя А. В., Давыдов А. Г., Киселёв В. В., Хитров М. В. Классификация эмоционального состояния диктора с использованием метода опорных векторов и критерия Джини // Изв. вузов. Приборостроение. 2013. Т. 56, № 2. С. 61–66.</mixed-citation><mixed-citation xml:lang="en">Tkachenya A.V., Davydov A.G., Kiselev V.V., Khitrov M.V. Journal of Instrument Engineering, 2013, no. 2(56), pp. 61–66. (in Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Cafaro A., Wagner J., Baur T., Dermouche S., Torres Torres M. et al. The NoXi database: multimodal recordings of mediated novice-expert interactions // Proc. of the 19th ACM Intern. Conf. on Multimodal Interaction. 2017. P. 350–359. DOI: 10.1145/3136755.313678.</mixed-citation><mixed-citation xml:lang="en">Cafaro A., Wagner J., Baur T., Dermouche S., Torres Torres M. et al. Proc. of the 19th ACM Intern. Conf. on Multimodal Interaction, 2017, pp. 
350–359, DOI: 10.1145/3136755.313678.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Guhan P., Agarwal M., Awasthi N., Reeves G., Manocha D. et al. ABC-Net: Semi-Supervised Multimodal GAN-based Engagement Detection using an Affective, Behavioral and Cognitive Model // arXiv preprint arXiv:2011.08690. 2020.</mixed-citation><mixed-citation xml:lang="en">Guhan P., Agarwal M., Awasthi N., Reeves G., Manocha D. et al. arXiv preprint arXiv:2011.08690, 2020.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Celiktutan O., Skordos E., Gunes H. Multimodal human-human-robot interactions (MHHRI) dataset for studying personality and engagement // IEEE Trans. on Affective Computing. 2017. Vol. 10, N 4. P. 484–497. DOI: 10.1109/TAFFC.2017.2737019.</mixed-citation><mixed-citation xml:lang="en">Celiktutan O., Skordos E., Gunes H. IEEE Transactions on Affective Computing, 2017, no. 4(10), pp. 484–497, DOI: 10.1109/TAFFC.2017.2737019.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Ringeval F., Sonderegger A., Sauer J., Lalanne D. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions // Proc. of the 10th IEEE Intern. Conf. and Workshops on Automatic Face and Gesture Recognition. 2013. P. 1–8. DOI: 10.1109/FG.2013.6553805.</mixed-citation><mixed-citation xml:lang="en">Ringeval F., Sonderegger A., Sauer J., Lalanne D. Proc. of the 10th IEEE Intern. Conf. and Workshops on Automatic Face and Gesture Recognition, 2013, pp. 1–8, DOI: 10.1109/FG.2013.6553805.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Kaur A., Mustafa A., Mehta L., Dhall A. 
Prediction and localization of student engagement in the wild // Digital Image Computing: Techniques and Applications (DICTA). 2018. P. 1–8. DOI: 10.1109/DICTA.2018.8615851.</mixed-citation><mixed-citation xml:lang="en">Kaur A., Mustafa A., Mehta L., Dhall A. 2018 Digital Image Computing: Techniques and Applications (DICTA), 2018, pp. 1–8, DOI: 10.1109/DICTA.2018.8615851.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Gupta A., D’Cunha A., Awasthi K., Balasubramanian V. DAiSEE: Towards user engagement recognition in the wild // arXiv preprint arXiv:1609.01885. 2016.</mixed-citation><mixed-citation xml:lang="en">Gupta A., D’Cunha A., Awasthi K., Balasubramanian V. arXiv preprint arXiv:1609.01885, 2016.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Sümer Ö., Goldberg P., D’Mello S., Gerjets P., Trautwein U., Kasneci E. Multimodal engagement analysis from facial videos in the classroom // IEEE Trans. on Affective Computing. 2021. Vol. 14, N 2. P. 1012–1027. DOI: 10.1109/TAFFC.2021.3127692.</mixed-citation><mixed-citation xml:lang="en">Sümer Ö., Goldberg P., D’Mello S., Gerjets P., Trautwein U., Kasneci E. IEEE Transactions on Affective Computing, 2021, no. 2(14), pp. 1012–1027, DOI: 10.1109/TAFFC.2021.3127692.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Whitehill J., Serpell Z., Lin Y. C., Foster A., Movellan J. R. The faces of engagement: Automatic recognition of student engagement from facial expressions // IEEE Trans. on Affective Computing. 2014. Vol. 5, N 1. P. 86–98. DOI: 10.1109/TAFFC.2014.2316163.</mixed-citation><mixed-citation xml:lang="en">Whitehill J., Serpell Z., Lin Y.C., Foster A., Movellan J.R. IEEE Transactions on Affective Computing, 2014, no. 1(5), pp. 
86–98, DOI: 10.1109/TAFFC.2014.2316163.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Psaltis A., Apostolakis K. C., Dimitropoulos K., Daras P. Multimodal student engagement recognition in prosocial games // IEEE Trans. on Games. 2017. Vol. 10, N 3. P. 292–303. DOI: 10.1109/TCIAIG.2017.2743341.</mixed-citation><mixed-citation xml:lang="en">Psaltis A., Apostolakis K.C., Dimitropoulos K., Daras P. IEEE Transactions on Games, 2017, no. 3(10), pp. 292–303, DOI: 10.1109/TCIAIG.2017.2743341.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Двойникова А. А., Кагиров И. А., Карпов А. А. Аналитический обзор методов автоматического распознавания вовлеченности пользователя в виртуальную коммуникацию // Информационно-управляющие системы. 2022. № 5(120). С. 12–22. DOI: 10.31799/1684-8853-2022-5-12-22.</mixed-citation><mixed-citation xml:lang="en">Dvoynikova A.A., Kagirov I.A., Karpov A.A. Information and Control Systems, 2022, no. 5(120), pp. 12–22, DOI: 10.31799/1684-8853-2022-5-12-22. (in Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Двойникова А. А., Маркитантов М. В., Рюмина Е. В., Уздяев М. Ю., Величко А. Н. и др. Анализ информационного и математического обеспечения для распознавания аффективных состояний человека // Информатика и автоматизация. 2022. Т. 21, № 6. С. 1097–1144. DOI: 10.15622/ia.21.6.2.</mixed-citation><mixed-citation xml:lang="en">Dvoynikova A.A., Markitantov M.V., Ryumina E.V., Uzdyaev M.Yu., Velichko A.N. et al. Informatics and Automation, 2022, no. 6(21), pp. 1097–1144, DOI: 10.15622/ia.21.6.2. (in Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Dhall A., Goecke R., Gedeon T. 
Collecting large, richly annotated facial-expression databases from movies // Journal of Latex Class Files. 2007. Vol. 6, N 1.</mixed-citation><mixed-citation xml:lang="en">Dhall A., Goecke R., Gedeon T. Journal of Latex Class Files, 2007, no. 1(6).</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Kollias D., Zafeiriou S. Aff-Wild2: Extending the Aff-Wild database for affect recognition // arXiv preprint arXiv:1811.07770. 2018.</mixed-citation><mixed-citation xml:lang="en">Kollias D., Zafeiriou S. arXiv preprint arXiv:1811.07770, 2018.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Busso C., Bulut M., Lee C. C., Kazemzadeh A., Mower E. et al. IEMOCAP: Interactive emotional dyadic motion capture database // Language Resources and Evaluation. 2008. Vol. 42, N 4. P. 335–359. DOI: 10.1007/s10579-008-9076-6.</mixed-citation><mixed-citation xml:lang="en">Busso C., Bulut M., Lee C.C., Kazemzadeh A., Mower E. et al. Language Resources and Evaluation, 2008, no. 4(42), pp. 335–359, DOI: 10.1007/s10579-008-9076-6.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Poria S., Hazarika D., Majumder N., Naik G., Cambria E. et al. MELD: A multimodal multi-party dataset for emotion recognition in conversations // Proc. of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. P. 527–536.</mixed-citation><mixed-citation xml:lang="en">Poria S., Hazarika D., Majumder N., Naik G., Cambria E. et al. Proc. of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 527–536.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Zadeh A. B., Liang P. P., Poria S., Cambria E., Morency L. P. 
Multimodal Language Analysis in the Wild: CMU MOSEI Dataset and Interpretable Dynamic Fusion Graph // Proc. of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. P. 2236–2246. DOI: 10.18653/v1/P18-1208.</mixed-citation><mixed-citation xml:lang="en">Zadeh A.B., Liang P.P., Poria S., Cambria E., Morency L.P. Proc. of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 2236–2246, DOI: 10.18653/v1/P18-1208.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing // Proc. of the Intern. Conf. on Speech and Computer. 2018. P. 501–510. DOI: 10.1007/978-3-319-99579-3_52.</mixed-citation><mixed-citation xml:lang="en">Perepelkina O., Kazimirova E., Konstantinova M. Proc. of the Intern. Conf. on Speech and Computer, 2018, pp. 501–510, DOI: 10.1007/978-3-319-99579-3_52.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">Jones S. R. G. Was there a Hawthorne effect? // American Journal of Sociology. 1992. Vol. 98, N 3. P. 451–468.</mixed-citation><mixed-citation xml:lang="en">Jones S.R.G. American Journal of Sociology, 1992, no. 3(98), pp. 451–468.</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">Viola P., Jones M. Rapid Object Detection using a Boosted Cascade of Simple Features // Proc. of the IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR). 2001. Vol. 1. P. I–I. DOI: 10.1109/CVPR.2001.990517.</mixed-citation><mixed-citation xml:lang="en">Viola P., Jones M. Proc. of the 2001 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR), 2001, vol. 1, pp. 
I–I, DOI: 10.1109/CVPR.2001.990517.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">Pat. 3069654 USA. Method and means for recognizing complex patterns / P. V. C. Hough. 1962 [Электронный ресурс]: https://patents.google.com/patent/US3069654.</mixed-citation><mixed-citation xml:lang="en">Patent USA 3069654, Method and means for recognizing complex patterns, P.V.C. Hough, Priority 1962.</mixed-citation></citation-alternatives></ref><ref id="cit22"><label>22</label><citation-alternatives><mixed-citation xml:lang="ru">Lausberg H., Sloetjes H. Coding gestural behavior with the NEUROGES-ELAN system // Behavior Research Methods. 2009. Vol. 41, N 3. P. 841–849. DOI: 10.3758/BRM.41.3.841.</mixed-citation><mixed-citation xml:lang="en">Lausberg H., Sloetjes H. Behavior Research Methods, 2009, no. 3(41), pp. 841–849, DOI: 10.3758/BRM.41.3.841.</mixed-citation></citation-alternatives></ref><ref id="cit23"><label>23</label><citation-alternatives><mixed-citation xml:lang="ru">Люсин Д. В. Новая методика для измерения эмоционального интеллекта: опросник ЭмИн // Психологическая диагностика. 2006. Т. 4. С. 3–22.</mixed-citation><mixed-citation xml:lang="en">Lyusin D.V. Psychological Diagnostics, 2006, vol. 4, pp. 3–22. (in Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit24"><label>24</label><citation-alternatives><mixed-citation xml:lang="ru">Люсин Д. В., Овсянникова В. В. Измерение способности к распознаванию эмоций с помощью видеотеста // Психологический журнал. 2013. Т. 34, № 6. С. 82–94.</mixed-citation><mixed-citation xml:lang="en">Lyusin D.V., Ovsyannikova V.V. Psychological Journal, 2013, no. 6(34), pp. 82–94. (in Russ.)</mixed-citation></citation-alternatives></ref><ref id="cit25"><label>25</label><citation-alternatives><mixed-citation xml:lang="ru">Свид. о рег. № 2023624954. 
База данных проявлений вовлеченности и эмоций русскоязычных участников телеконференций (ENERGI — ENgagement and Emotion Russian Gathering Interlocutors) / А. А. Двойникова, А. А. Карпов. 25.12.2023.</mixed-citation><mixed-citation xml:lang="en">Certificate of registration of the database 2023624954, Baza dannykh proyavleniy vovlechennosti i emotsiy russkoyazychnykh uchastnikov telekonferentsiy (ENERGI — ENgagement and Emotion Russian Gathering Interlocutors) (Database of Manifestations of Engagement and Emotions of Russian-Speaking Participants in Teleconferences (ENERGI — ENgagement and Emotion Russian Gathering Interlocutors)), A.A. Dvoynikova, A.A. Karpov, Priority 25.12.2023. (in Russ.)</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
