article · gold open access · citations 9 · 2023
Emotion-Aware Speaker Identification With Transfer Learning
Kyoung Ju Noh, Hyuntae Jeong
IEEE Access (IF 3.6)
Abstract

Speech is a natural communication method used by humans. Speaker identification (SI) technology based on human speech has been used as an entry point for many human-computer-interaction applications. The performance of SI models can degrade when dealing with expressive speech uttered in emotional situations because emotion databases do not have sufficient data on expressive speech to train SI models for various emotional states. Generally, SI models are trained using relatively more samples of “neutral” speech than samples of other emotion classes. In this study, we propose an emotion-aware SI (em-SI) method that uses an emotion-embedding vector generated from a pre-trained speech emotion recognition (SER) model along with the acoustic features of speech data. We assess the performance of this method using individual English and Korean corpora and confirm that the proposed method provides an improved performance on multilingual corpora. The evaluation results show that the SI accuracy of em-SI on the Korean Emotion Multimodal Database (KEMDy19) improved by 3.2%, and the average speaker verification (SV) performance in terms of the equal error rate (EER) was improved by 1.3% compared to that of the baseline SI model. The visualization of the embedding vector of em-SI shows that em-SI maps speech data to an embedding space where both SI and emotional information are simultaneously represented. Through the experiments conducted in this study, we confirmed that the em-SI model, which learns by integrating emotion and speaker embedding information, improved the performance of SI for expressive speech.
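The fusion the abstract describes — an emotion embedding from a frozen, pre-trained SER model concatenated with speaker-side acoustic features before speaker classification — can be sketched as below. This is a minimal illustration only: the dimensions, mean-pooling, and single linear projections are assumptions for the sketch, not the paper's actual architecture, and the random weights stand in for trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- chosen for illustration, not taken from the paper.
N_FRAMES, N_MFCC = 120, 40       # frames x acoustic features per utterance
EMO_DIM, SPK_DIM = 64, 128       # emotion / speaker embedding sizes
N_SPEAKERS = 10

def ser_emotion_embedding(acoustic, W_emo):
    """Stand-in for the frozen pre-trained SER encoder: pool over frames,
    then project to an emotion-embedding vector."""
    pooled = acoustic.mean(axis=0)            # (N_MFCC,)
    return np.tanh(pooled @ W_emo)            # (EMO_DIM,)

def em_si_logits(acoustic, W_emo, W_spk, W_cls):
    """Emotion-aware SI (em-SI) sketch: concatenate the speaker-side
    acoustic summary with the SER emotion embedding, then classify."""
    spk = np.tanh(acoustic.mean(axis=0) @ W_spk)     # speaker features
    emo = ser_emotion_embedding(acoustic, W_emo)     # emotion embedding
    joint = np.concatenate([spk, emo])               # fused representation
    return joint @ W_cls                             # (N_SPEAKERS,) logits

# Random stand-ins for trained weights.
W_emo = rng.normal(size=(N_MFCC, EMO_DIM))
W_spk = rng.normal(size=(N_MFCC, SPK_DIM))
W_cls = rng.normal(size=(SPK_DIM + EMO_DIM, N_SPEAKERS))

utterance = rng.normal(size=(N_FRAMES, N_MFCC))
logits = em_si_logits(utterance, W_emo, W_spk, W_cls)
print(logits.shape)  # one score per enrolled speaker
```

The point of the sketch is the data flow: the SER branch is kept frozen, so the speaker classifier learns on a representation that already encodes emotional state, which is what lets the model cope with expressive speech.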

Keywords
Computer science · Speech recognition · Embedding · Word error rate · Visualization · Artificial intelligence · Natural language processing
Type
article
IF / Citations
3.6 / 9
Publication year
2023