References

pribor

Journal of Instrument Engineering (Izvestiya vysshikh uchebnykh zavedeniy. Priborostroenie)

ISSN 0021-3454; 2500-0381

ITMO University

10.17586/0021-3454-2024-67-11-984-993

pribor-317

Research Article

METHODOLOGICAL AND SOFTWARE-INFORMATION SUPPORT FOR THE FUNCTIONING OF AUTOMATED SYSTEMS

Method of Creating Multimodal Databases for Audiovisual Analysis of Engagement and Emotions of Virtual Communication Participants

Dvoynikova A. A.

Anastasia A. Dvoynikova, St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Speech and Multimodal Interfaces, Junior Researcher

dvoynikova.a@iias.spb.su

Karpov A. A.

Alexey A. Karpov, Dr. Sci., Professor; St. Petersburg Institute for Informatics and Automation of the RAS, Laboratory of Speech and Multimodal Interfaces; Head of the Laboratory

karpov@iias.spb.su

St. Petersburg Federal Research Center of the RAS

2024

07.12.2024

Vol. 67, No. 11, pp. 984–993

2024

ITMO University

https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice

https://pribor.ifmo.ru/jour/article/view/317

A method is presented for creating multimodal data corpora intended for analyzing the behavioral manifestations of participants in virtual communication. The method targets corpora of group communication (more than two interlocutors) recorded through teleconferencing systems, and it takes into account how behavioral aspects (engagement and emotions) manifest naturally among the participants of a conversation. These features constitute the novelty of the method. The method consists of three main stages: preparation, recording, and data annotation. It was tested and validated by creating a new multimodal corpus, ENERGI, which contains Russian-language audiovisual recordings of group communication held through teleconferencing systems. The corpus is intended for recognizing participant engagement in communication and for analyzing the expression of emotions during dialogue. The proposed method is universal and can be applied to collecting various corpora of virtual communication data.

methodology for database creation, multimodal database, engagement analysis, emotion analysis, data annotation, virtual communication

This work was carried out within budget theme No. FFZF-2022-0005.

References

Tkachenya A. V., Davydov A. G., Kiselev V. V., Khitrov M. V. Classification of a speaker's emotional state using the support vector machine method and the Gini criterion // Journal of Instrument Engineering. 2013. Vol. 56, N 2. P. 61–66. (in Russ.)

Tkachenya A.V., Davydov A.G., Kiselev V.V., Khitrov M.V. Journal of Instrument Engineering, 2013, no. 2(56), pp. 61–66. (in Russ.)

Cafaro A., Wagner J., Baur T., Dermouche S., Torres Torres M. et al. The NoXi database: multimodal recordings of mediated novice-expert interactions // Proc. of the 19th ACM Intern. Conf. on Multimodal Interaction. 2017. P. 350–359. DOI: 10.1145/3136755.313678.

Cafaro A., Wagner J., Baur T., Dermouche S., Torres Torres M. et al. Proc. of the 19th ACM Intern. Conf. on Multimodal Interaction, 2017, pp. 350–359, DOI: 10.1145/3136755.313678.

Guhan P., Agarwal M., Awasthi N., Reeves G., Manocha D. et al. ABC-Net: Semi-Supervised Multimodal GAN-based Engagement Detection using an Affective, Behavioral and Cognitive Model // arXiv preprint arXiv:2011.08690. 2020.

Guhan P., Agarwal M., Awasthi N., Reeves G., Manocha D. et al. arXiv preprint arXiv:2011.08690, 2020.

Celiktutan O., Skordos E., Gunes H. Multimodal human-human-robot interactions (MHHRI) dataset for studying personality and engagement // IEEE Trans. on Affective Computing. 2017. Vol. 10, N 4. P. 484–497. DOI: 10.1109/TAFFC.2017.2737019.

Celiktutan O., Skordos E., Gunes H. IEEE Transactions on Affective Computing, 2017, no. 4(10), pp. 484–497, DOI: 10.1109/TAFFC.2017.2737019.

Ringeval F., Sonderegger A., Sauer J., Lalanne D. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions // Proc. of the 10th IEEE Intern. Conf. and Workshops on Automatic Face and Gesture Recognition. 2013. P. 1–8. DOI: 10.1109/FG.2013.6553805.

Ringeval F., Sonderegger A., Sauer J., Lalanne D. Proc. of the 10th IEEE Intern. Conf. and Workshops on Automatic Face and Gesture Recognition, 2013, pp. 1–8, DOI: 10.1109/FG.2013.6553805.

Kaur A., Mustafa A., Mehta L., Dhall A. Prediction and localization of student engagement in the wild // Digital Image Computing: Techniques and Applications (DICTA). 2018. P. 1–8. DOI: 10.1109/DICTA.2018.8615851.

Kaur A., Mustafa A., Mehta L., Dhall A. 2018 Digital Image Computing: Techniques and Applications (DICTA), 2018, pp. 1–8, DOI: 10.1109/DICTA.2018.8615851.

Gupta A., D’Cunha A., Awasthi K., Balasubramanian V. DAiSEE: Towards user engagement recognition in the wild // arXiv preprint arXiv:1609.01885. 2016.

Gupta A., D'Cunha A., Awasthi K., Balasubramanian V. arXiv preprint arXiv:1609.01885, 2016.

Sümer Ö., Goldberg P., D’Mello S., Gerjets P., Trautwein U., Kasneci E. Multimodal engagement analysis from facial videos in the classroom // IEEE Trans. on Affective Computing. 2021. Vol. 14, N 2. P. 1012–1027. DOI: 10.1109/TAFFC.2021.3127692.

Sümer Ö., Goldberg P., D’Mello S., Gerjets P., Trautwein U., Kasneci E. IEEE Transactions on Affective Computing, 2021, no. 2(14), pp. 1012–1027, DOI: 10.1109/TAFFC.2021.3127692.

Whitehill J., Serpell Z., Lin Y. C., Foster A., Movellan J. R. The faces of engagement: Automatic recognition of student engagement from facial expressions // IEEE Trans. on Affective Computing. 2014. Vol. 5, N 1. P. 86–98. DOI: 10.1109/TAFFC.2014.2316163.

Whitehill J., Serpell Z., Lin Y.C., Foster A., Movellan J.R. IEEE Transactions on Affective Computing, 2014, no. 1(5), pp. 86–98, DOI: 10.1109/TAFFC.2014.2316163.

Psaltis A., Apostolakis K. C., Dimitropoulos K., Daras P. Multimodal student engagement recognition in prosocial games // IEEE Trans. on Games. 2017. Vol. 10, N 3. P. 292–303. DOI: 10.1109/TCIAIG.2017.2743341.

Psaltis A., Apostolakis K. C., Dimitropoulos K., Daras P. IEEE Transactions on Games, 2017, no. 3(10), pp. 292–303, DOI: 10.1109/TCIAIG.2017.2743341.

Dvoynikova A. A., Kagirov I. A., Karpov A. A. Analytical review of methods for automatic recognition of user engagement in virtual communication // Information and Control Systems. 2022. N 5(120). P. 12–22. DOI: 10.31799/1684-8853-2022-5-12-22. (in Russ.)

Dvoynikova A.A., Kagirov I.A., Karpov A.A. Information and Control Systems, 2022, no. 5(120), pp. 12–22, DOI: 10.31799/1684-8853-2022-5-12-22. (in Russ.)

Dvoynikova A. A., Markitantov M. V., Ryumina E. V., Uzdyaev M. Yu., Velichko A. N. et al. Analysis of information and mathematical support for recognizing human affective states // Informatics and Automation. 2022. Vol. 21, N 6. P. 1097–1144. DOI: 10.15622/ia.21.6.2. (in Russ.)

Dvoynikova A.A., Markitantov M.V., Ryumina E.V., Uzdyaev M.Yu., Velichko A.N. et al. Informatics and Automation, 2022, no. 6(21), pp. 1097–1144, DOI: 10.15622/ia.21.6.2. (in Russ.)

Dhall A., Goecke R., Gedeon T. Collecting large, richly annotated facial-expression databases from movies // Journal of Latex Class Files. 2007. Vol. 6, N 1.

Dhall A., Goecke R., Gedeon T. Journal of latex class files, 2007, no. 1(6).

Kollias D., Zafeiriou S. Aff-wild2: Extending the aff-wild database for affect recognition // arXiv preprint arXiv:1811.07770. 2018.

Kollias D., Zafeiriou S. arXiv preprint arXiv:1811.07770, 2018.

Busso C., Bulut M., Lee C. C., Kazemzadeh A., Mower E. et al. IEMOCAP: Interactive emotional dyadic motion capture database // Language Resources and Evaluation. 2008. Vol. 42, N 4. P. 335–359. DOI: 10.1007/s10579-008-9076-6.

Busso C., Bulut M., Lee C.C., Kazemzadeh A., Mower E. et al. Language Resources and Evaluation, 2008, no. 4(42), pp. 335–359, DOI: 10.1007/s10579-008-9076-6.

Poria S., Hazarika D., Majumder N., Naik G., Cambria E. et al. Meld: A multimodal multi-party dataset for emotion recognition in conversations // Proc. of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. P. 527–536.

Poria S., Hazarika D., Majumder N., Naik G., Cambria E. et al. Proc. of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 527–536.

Zadeh A. B., Liang P. P., Poria S., Cambria E., Morency L. P. Multimodal Language Analysis in the Wild: CMU MOSEI Dataset and Interpretable Dynamic Fusion Graph // Proc. of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. P. 2236–2246. DOI: 10.18653/v1/P18-1208.

Zadeh A.B., Liang P.P., Poria S., Cambria E., Morency L.P. Proc. of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 2236–2246, DOI: 10.18653/v1/P18-1208.

Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing // Proc. of the Intern. Conf. on Speech and Computer. 2018. P. 501–510. DOI: 10.1007/978-3-319-99579-3_52.

Perepelkina O., Kazimirova E., Konstantinova M. Proc. of the Intern. Conf. on Speech and Computer, 2018, pp. 501–510, DOI: 10.1007/978-3-319-99579-3_52.

Jones S. R. G. Was there a Hawthorne effect? // American Journal of Sociology. 1992. Vol. 98, N 3. P. 451–468.

Jones S.R.G. American Journal of Sociology, 1992, no. 3(98), pp. 451–468.

Viola P., Jones M. Rapid Object Detection using a Boosted Cascade of Simple Features // Proc. of the IEEE Computer Society Conf. on Computer Vision and Pattern Recognition. (CVPR). 2001. Vol. 1. P. I–I. DOI: 10.1109/CVPR.2001.990517.

Viola P., Jones M. Proc. of the 2001 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR), 2001, vol. 1, pp. I-I, DOI: 10.1109/CVPR.2001.990517.

Pat. 3069654 USA. Method and means for recognizing complex patterns / P. V. C. Hough. 1962 [Electronic resource]: https://patents.google.com/patent/US3069654.

Patent USA 3069654, Method and means for recognizing complex patterns, P.V.C. Hough, Priority 1962.

Lausberg H., Sloetjes H. Coding gestural behavior with the NEUROGES-ELAN system // Behavior Research Methods. 2009. Vol. 41, N 3. P. 841–849. DOI: 10.3758/BRM.41.3.841.

Lausberg H., Sloetjes H. Behavior Research Methods, 2009, no. 3(41), pp. 841–849, DOI: 10.3758/BRM.41.3.841.

Lyusin D. V. A new technique for measuring emotional intelligence: the EmIn questionnaire // Psychological Diagnostics. 2006. Vol. 4. P. 3–22. (in Russ.)

Lyusin D.V. Psychological Diagnostics, 2006, vol. 4, pp. 3–22. (in Russ.)

Lyusin D. V., Ovsyannikova V. V. Measuring the ability to recognize emotions using a video test // Psychological Journal. 2013. Vol. 34, N 6. P. 82–94. (in Russ.)

Lyusin D.V., Ovsyannikova V.V. Psychological Journal, 2013, no. 6(34), pp. 82–94. (in Russ.)

Certificate of registration No. 2023624954. Database of manifestations of engagement and emotions of Russian-speaking teleconference participants (ENERGI: ENgagement and Emotion Russian Gathering Interlocutors) / A. A. Dvoynikova, A. A. Karpov. 25.12.2023. (in Russ.)

Certificate of registration of the database 2023624954, Baza dannykh proyavleniy vovlechennosti i emotsiy russkoyazychnykh uchastnikov telekonferentsiy (ENERGI: ENgagement and Emotion Russian Gathering Interlocutors), A.A. Dvoynikova, A.A. Karpov, Priority 25.12.2023. (in Russ.)

The authors declare that there are no conflicts of interest present.