Journal of Instrument Engineering (Известия высших учебных заведений. Приборостроение)

ISSN 0021-3454, 2500-0381

ITMO University

10.17586/0021-3454-2023-66-10-818-827

Research Article

INFORMATICS AND INFORMATION PROCESSES

Approach to automatic recognition of emotions in speech transcriptions

Dvoynikova A. A.

Anastasia A. Dvoynikova - Speech and Multimodal Interfaces Laboratory; Junior Researcher

St. Petersburg

dvoynikova.a@iias.spb.su

Kondratenko K. O.

Khrystyna O. Kondratenko - Bachelor; Department of Phonetics and Methods of Teaching Foreign Languages

St. Petersburg

st076959@student.spbu.ru

St. Petersburg Federal Research Center of the RAS

St. Petersburg State University

2023

29.11.2024

Vol. 66, No. 10, pp. 818–827

© 2024 ITMO University

https://pribor.ifmo.ru/jour/about/submissions#copyrightNotice

https://pribor.ifmo.ru/jour/article/view/180

The problem of emotion recognition in speech transcriptions, which is relevant in various fields, is studied. The influence of preprocessing methods (stop-word removal, lemmatization, stemming) on the accuracy of emotion recognition in Russian and English text data is analyzed. The experiments use orthographic transcriptions of dialogues from the multimodal corpora RAMAS (Russian) and CMU-MOSEI (English). Both corpora are annotated for the following emotions: joy, surprise, fear, anger, sadness, disgust, and the neutral state. Text preprocessing includes removal of punctuation marks and stop words, tokenization, lemmatization, and stemming. The resulting material is vectorized using the TF-IDF, BoW, and Word2Vec methods. Support vector machines and logistic regression serve as classifiers. An approach to automatic emotion recognition in text data is developed that combines these methods. The highest emotion recognition performance, measured by the weighted F-measure, is 92.63 % for Russian and 47.21 % for English. In addition, experiments are conducted to determine how many stop words should be removed for effective emotion recognition from text data. The results show that retaining stop words in the source text yields the highest text classification accuracy.
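As a rough illustration of two ingredients named in the abstract, the sketch below shows optional stop-word removal during preprocessing, a bag-of-words count, and a support-weighted F-measure, using only the Python standard library. The stop-word list, the function names, and the sample labels are hypothetical placeholders, not the authors' implementation or data.

```python
import string
from collections import Counter

# Hypothetical toy stop-word list; the lists used in the study are not given here.
STOP_WORDS = {"i", "am", "so", "the", "a", "is", "not"}


def preprocess(text, remove_stop_words=False):
    """Lowercase, strip punctuation, split on whitespace, optionally drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def bag_of_words(tokens):
    """Bag-of-words representation: a mapping from token to its count."""
    return Counter(tokens)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is at least n > 0 here.
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += f1 * n
    return total / len(y_true)


if __name__ == "__main__":
    print(preprocess("I am not happy!"))
    print(preprocess("I am not happy!", remove_stop_words=True))
    print(weighted_f1(["joy", "joy", "anger", "neutral"],
                      ["joy", "anger", "anger", "neutral"]))
```

With this toy list, removing stop words from "I am not happy!" leaves only `happy` and discards the negation. This is one way stop-word removal can lose emotion-relevant information, although the abstract itself does not state the cause of its finding.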

emotion recognition; text data preprocessing methods; stop word removal; multiclass classification; text data analysis

The work was carried out within the framework of a Russian Science Foundation project (the section "An approach to classifying text data by emotions" was carried out under project No. 22-11-00321); the rest of the research was carried out partially within the framework of the leading scientific school of the Russian Federation (grant No. NSh-17.2022.1.6) and the budget theme of the St. Petersburg Federal Research Center of the Russian Academy of Sciences (No. FFZF-2022-0005).

References

Acheampong F. A., Wenyu C., Nunoo-Mensah H. Text-based emotion detection: Advances, challenges, and opportunities // Engineering Reports. 2020. Vol. 2, N 7. P. e12189. DOI: 10.1002/eng2.12189.

Dzedzickis A., Kaklauskas A., Bucinskas V. Human emotion recognition: Review of sensors and methods // Sensors. 2020. Vol. 20, N 3. P. 592. DOI: 10.3390/s20030592.

Ryumina E. V., Karpov A. A. Analytical review of methods for emotion recognition by human facial expressions // Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2020. Vol. 20, N 2. P. 163–176. DOI: 10.17586/2226-1494-2020-20-2-163-176. (in Russ.)

Mubarakshina R. T., Yakovenko R. T. Review of approaches to the problem of emotion recognition from spoken speech parameters // Sistemnyy analiz v proyektirovanii i upravlenii (System Analysis in Design and Management). 2019. Vol. 23, N 1. P. 392–397. (in Russ.)

Bogdanov A. L., Dulya I. S. Sentiment analysis of short Russian-language texts in social media // Tomsk State University Journal of Economics. 2019. N 47. P. 220–241. DOI: 10.17223/19988648/47/17. (in Russ.)

Dyulicheva Yu. Yu. Learning analytics of MOOCs as a tool for analyzing mathematical anxiety // Voprosy obrazovaniya (Educational Studies). 2021. N 4. P. 243–265. DOI: 10.17323/1814-9545-2021-4-243-265. (in Russ.)

Adoma A. F., Henry N. M., Chen W. Comparative analyses of BERT, RoBERTa, DistilBERT, and XLNet for text-based emotion recognition // 2020 17th Intern. Computer Conf. on Wavelet Active Media Technology and Information Processing (ICCWAMTIP). 2020. P. 117–121. DOI: 10.1109/iccwamtip51612.2020.9317379.

Verkholyak O., Dvoynikova A., Karpov A. A Bimodal Approach for Speech Emotion Recognition using Audio and Text // J. Internet Serv. Inf. Secur. 2021. Vol. 11, N 1. P. 80–96.

Liu Y., Fu G. Emotion recognition by deeply learned multi-channel textual and EEG features // Future Generation Computer Systems. 2021. Vol. 119. P. 1–6. DOI: 10.1016/j.future.2021.01.010.

Ovsyannikova V. V. On the classification of emotions: categorical and dimensional approaches // Financial Analytics: Science and Experience. 2013. Vol. 37, N 175. P. 43–48. (in Russ.)

Ekman P. Basic emotions // Handbook of cognition and emotion. 1999. P. 45–60.

Izard C. E. The Psychology of Emotions. New York, London: Plenum Press, 1991. (Russian edition: St. Petersburg: Piter, 1999. 464 p.)

Sogancioglu G., Verkholyak O., Kaya H., Fedotov D., Cadée T., Salah A. A., Karpov A. Is Everything Fine, Grandma? Acoustic and Linguistic Modeling for Robust Elderly Speech Emotion Recognition // INTERSPEECH. 2020. P. 2097–2101. DOI: 10.21437/interspeech.2020-3160.

Russell J. A. Culture and the categorization of emotions // Psychological Bulletin. 1991. Vol. 110, N 3. P. 426–450. DOI: 10.1037/0033-2909.110.3.426.

Dvoynikova A. A., Karpov A. A. Analytical review of approaches to sentiment recognition of Russian-language text data // Information and Control Systems. 2020. N 4(107). P. 20–30. DOI: 10.31799/1684-8853-2020-4-20-30. (in Russ.)

Henry E. R., Hofrichter J. Singular value decomposition: Application to analysis of experimental data // Methods in Enzymology. Academic Press, 1992. Vol. 210. P. 129–192. DOI: 10.1016/0076-6879(92)10010-B.

Pennington J., Socher R., Manning C. D. GloVe: Global vectors for word representation // Proc. of the 2014 Conf. on Empirical Methods in Natural Language Processing (EMNLP). 2014. P. 1532–1543. DOI: 10.3115/v1/d14-1162.

Bojanowski P., Grave E., Joulin A., Mikolov T. Enriching word vectors with subword information // Transactions of the Association for Computational Linguistics. 2017. Vol. 5. P. 135–146. DOI: 10.1162/tacl_a_00051.

Mikolov T., Sutskever I., Chen K., Corrado G. S., Dean J. Distributed representations of words and phrases and their compositionality // Advances in Neural Information Processing Systems. 2013. Vol. 26. P. 1–9.

Devlin J., Chang M. W., Lee K., Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding // arXiv preprint arXiv:1810.04805. 2018. DOI: 10.48550/arXiv.1810.04805.

Peters M., Neumann M., Iyyer M., Gardner M., Clark C., Lee K., Zettlemoyer L. Deep contextualized word representations // Proc. of the 2018 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2018. Vol. 1. P. 2227–2237.

Halim L. R., Suryadibrata A. Cyberbullying Sentiment Analysis with Word2Vec and One-Against-All Support Vector Machine // IJNMT (International Journal of New Media Technology). 2021. Vol. 8, N 1. P. 57–64. DOI: 10.31937/ijnmt.v8i1.2047.

Duong H. T., Nguyen-Thi T. A. A review: preprocessing techniques and data augmentation for sentiment analysis // Computational Social Networks. 2021. Vol. 8, N 1. P. 1–16. DOI: 10.1186/s40649-020-00080-x.

Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing // Intern. Conf. on Speech and Computer. Springer, Cham, 2018. P. 501–510. DOI: 10.1007/978-3-319-99579-3_52.

Dvoynikova A. A., Verkholyak O. V., Karpov A. A. Sentiment analysis of conversational speech using a method based on sentiment lexicons // Almanac of Scientific Works of Young Scientists of ITMO University. 2020. Vol. 3. P. 75–80. (in Russ.)

Zadeh A. B., Liang P. P., Poria S., Cambria E., Morency L. P. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph // Proc. of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. Vol. 1: Long Papers. P. 2236–2246. DOI: 10.18653/v1/p18-1208.

The authors declare no conflicts of interest.