article · gold open access · citations 9 · 2023
Emotion-Aware Speaker Identification With Transfer Learning
Kyoung Ju Noh, Hyuntae Jeong
IEEE Access (IF 3.6)
Abstract

Speech is a natural communication method used by humans. Speaker identification (SI) technology based on human speech has been used as an entry point for many human-computer-interaction applications. The performance of SI models can degrade when dealing with expressive speech uttered in emotional situations because emotion databases do not have sufficient data on expressive speech to train SI models for various emotional states. Generally, SI models are trained using relatively more samples of “neutral” speech than samples of other emotion classes. In this study, we propose an emotion-aware SI (em-SI) method that uses an emotion-embedding vector generated from a pre-trained speech emotion recognition (SER) model along with the acoustic features of speech data. We assess the performance of this method using individual English and Korean corpora and confirm that the proposed method provides an improved performance on multilingual corpora. The evaluation results show that the SI accuracy of em-SI on the Korean Emotion Multimodal Database (KEMDy19) improved by 3.2%, and the average speaker verification (SV) performance in terms of the equal error rate (EER) was improved by 1.3% compared to that of the baseline SI model. The visualization of the embedding vector of em-SI shows that em-SI maps speech data to an embedding space where both SI and emotional information are simultaneously represented. Through the experiments conducted in this study, we confirmed that the em-SI model, which learns by integrating emotion and speaker embedding information, improved the performance of SI for expressive speech.
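The fusion the abstract describes — an emotion embedding from a frozen, pre-trained SER model concatenated with speaker-side acoustic features before speaker classification — can be sketched as below. This is a minimal illustration only: the dimensions, mean-pooling, and single linear projections are assumptions for the sketch, not the paper's actual architecture, and the random weights stand in for trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- chosen for illustration, not taken from the paper.
N_FRAMES, N_MFCC = 120, 40       # frames x acoustic features per utterance
EMO_DIM, SPK_DIM = 64, 128       # emotion / speaker embedding sizes
N_SPEAKERS = 10

def ser_emotion_embedding(acoustic, W_emo):
    """Stand-in for the frozen pre-trained SER encoder: pool over frames,
    then project to an emotion-embedding vector."""
    pooled = acoustic.mean(axis=0)            # (N_MFCC,)
    return np.tanh(pooled @ W_emo)            # (EMO_DIM,)

def em_si_logits(acoustic, W_emo, W_spk, W_cls):
    """Emotion-aware SI (em-SI) sketch: concatenate the speaker-side
    acoustic summary with the SER emotion embedding, then classify."""
    spk = np.tanh(acoustic.mean(axis=0) @ W_spk)     # speaker features
    emo = ser_emotion_embedding(acoustic, W_emo)     # emotion embedding
    joint = np.concatenate([spk, emo])               # fused representation
    return joint @ W_cls                             # (N_SPEAKERS,) logits

# Random stand-ins for trained weights.
W_emo = rng.normal(size=(N_MFCC, EMO_DIM))
W_spk = rng.normal(size=(N_MFCC, SPK_DIM))
W_cls = rng.normal(size=(SPK_DIM + EMO_DIM, N_SPEAKERS))

utterance = rng.normal(size=(N_FRAMES, N_MFCC))
logits = em_si_logits(utterance, W_emo, W_spk, W_cls)
print(logits.shape)  # one score per enrolled speaker
```

The point of the sketch is the data flow: the SER branch is kept frozen, so the speaker classifier learns on a representation that already encodes emotional state, which is what lets the model cope with expressive speech.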

Keywords
Computer science · Speech recognition · Embedding · Word error rate · Visualization · Artificial intelligence · Natural language processing
Type
article
IF / Citations
3.6 / 9
Publication year
2023