Журнал Сибирского федерального университета. Математика и физика (Journal of Siberian Federal University, Mathematics & Physics), No. 4, 2016

SPEECH-BASED EMOTION RECOGNITION AND SPEAKER IDENTIFICATION: STATIC VS. DYNAMIC MODE OF SPEECH REPRESENTATION

First author: Maxim Sidorov
Authors: Wolfgang Minker, Eugene S. Semenkin
Pages: 6
ID: 576462
Abstract: In this paper we present the performance of different machine learning algorithms for the problems of speech-based Emotion Recognition (ER) and Speaker Identification (SI) in static and dynamic modes of speech signal representation. We have used a multi-corpus, multi-language approach in the study: three databases for the SI problem and four databases for the ER task, covering three different languages (German, English and Japanese), were used to evaluate the models. More than 45 machine learning algorithms were applied to these tasks in both modes, and the results are presented and discussed here.
UDC: 519.87
Sidorov, M. Speech-based Emotion Recognition and Speaker Identification: Static vs. Dynamic Mode of Speech Representation / M. Sidorov, W. Minker, E. S. Semenkin // Journal of Siberian Federal University. Mathematics & Physics. 2016. No. 4. P. 118-123. URL: https://rucont.ru/efd/576462 (accessed: 04.05.2024)

Preview (excerpts from the work)

Mathematics & Physics 2016, 9(4), 518-523
UDC 519.87

Speech-based Emotion Recognition and Speaker Identification: Static vs. Dynamic Mode of Speech Representation

Maxim Sidorov*, Wolfgang Minker†
Institute of Communications Engineering, Ulm University, Albert-Einstein-Allee 43, Ulm, 89081, Germany

Eugene S. Semenkin‡
Informatics and Telecommunications Institute, Reshetnev Siberian State Aerospace University, Krasnoyarskiy Rabochiy 31, Krasnoyarsk, 660037, Russia

Received 28.12.2015, received in revised form 24.02.2016, accepted 15.09.2016

In this paper we present the performance of different machine learning algorithms for the problems of speech-based Emotion Recognition (ER) and Speaker Identification (SI) in static and dynamic modes of speech signal representation. <...>

Keywords: emotion recognition from speech, speaker identification from speech, machine learning algorithms, speaker adaptive emotion recognition from speech. <...>

DOI: 10.17516/1997-1397-2016-9-4-518-523

Introduction

The main task of the speaker identification procedure is to determine who has produced a particular utterance, whereas emotion recognition from speech reveals the emotional state of the person who produced it. <...> Among them, the choice of the modelling algorithm and of the speech signal features could be mentioned. <...> Moreover, state-of-the-art approaches include different schemes of speech signal representation: the static mode results in a single feature vector for each utterance, whereas in the dynamic mode the corresponding feature set is calculated for each time window. <...> Each utterance has one of the following emotional labels: neutral, anger, fear, joy, sadness, boredom and disgust. <...> Each utterance has one of the following emotional labels: angry, slightly angry, very angry, neutral, friendly and non-speech (critically noisy recordings or just silence). <...> To produce the labels for the classification task we have used only the pleasantness (or evaluation) and arousal axes. <...> Each of 16 native speakers of American English reads about 50 sentences. <...> The following speech signal features <...>

* maxim.sidorov@uni-ulm.de
† wolfgang.minker@uni-ulm.de
‡ eugenesemenkin@yandex.ru

© Siberian Federal University. All rights reserved
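
Editorial note: as a brief illustration of the static vs. dynamic representation described in the excerpt above, the sketch below (not taken from the paper) shows one common way to build both representations from MFCC features using the librosa library. The pooling statistics (mean and standard deviation), the number of coefficients, and the file name are illustrative assumptions, not the authors' actual feature set.

# Minimal sketch (not from the paper): static vs. dynamic speech representation.
# Assumes MFCC features; the paper's actual feature set and tools may differ.
import numpy as np
import librosa


def dynamic_features(wav_path, n_mfcc=13):
    """Dynamic mode: one feature vector per time window (frame)."""
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    return mfcc.T  # shape: (n_frames, n_mfcc), a sequence of frame-level vectors


def static_features(wav_path, n_mfcc=13):
    """Static mode: a single vector per utterance, obtained here by pooling
    frame-level features with mean and standard deviation (an assumption)."""
    frames = dynamic_features(wav_path, n_mfcc)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])  # shape: (2 * n_mfcc,)


if __name__ == "__main__":
    utterance = "utterance_0001.wav"  # hypothetical file name, for illustration only
    print("dynamic:", dynamic_features(utterance).shape)  # (n_frames, 13)
    print("static: ", static_features(utterance).shape)   # (26,)

In the dynamic mode each utterance yields a variable-length sequence suited to sequence models, while the static mode yields a fixed-length vector suited to conventional classifiers; this is the practical difference the paper's comparison rests on.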
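
The excerpt also mentions producing classification labels from only the pleasantness (evaluation) and arousal axes; the exact discretization used by the authors is not given in the preview. The sketch below shows one hypothetical quadrant-based mapping from continuous ratings to four coarse labels, purely to illustrate the idea; the zero thresholds and the label names are assumptions.

# Hypothetical sketch: mapping continuous (valence, arousal) ratings to four
# quadrant labels. The paper's actual discretization is not shown in the
# preview; thresholds and label names below are assumptions.
def quadrant_label(valence: float, arousal: float) -> str:
    """Return a coarse emotion label from two continuous axes in [-1, 1]."""
    if valence >= 0.0 and arousal >= 0.0:
        return "positive-active"    # e.g. joy, excitement
    if valence >= 0.0:
        return "positive-passive"   # e.g. contentment
    if arousal >= 0.0:
        return "negative-active"    # e.g. anger, fear
    return "negative-passive"       # e.g. sadness, boredom


print(quadrant_label(0.4, -0.2))  # -> "positive-passive"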