Speed-Aware Audio-Driven Speech Animation using Adaptive Windows | Professor Sunjin Jung's Lab | Department of Computer Engineering, Sungshin Women's University

Article · hybrid · Citations: 4 · 2024

Speed-Aware Audio-Driven Speech Animation using Adaptive Windows

Sunjin Jung, Yeongho Seol, Kwanggyoon Seo, Hyeonseo Na, Seonghyeon Kim, Vanessa Tan, Junyong Noh

ACM Transactions on Graphics (IF 9.5)

Abstract

We present a novel method that generates realistic speech animations of a 3D face from audio using multiple adaptive windows. In contrast to previous studies that use a fixed-size audio window, our method takes as input an adaptive audio window whose size reflects the speaking rate of the audio, so that each window contains consistent phonemic information. Our system consists of three parts. First, the speaking rate is estimated from the input audio using a neural network trained in a self-supervised manner. Second, an appropriate window size that encloses the audio features is predicted adaptively from the estimated speaking rate. Finally, the speech animation is generated from multiple adaptive audio windows. A key element of the animation generator is that it takes audio windows of different sizes as input: a small window that concentrates on detailed information and a large window that considers broader phonemic information around the center frame. Our method can generate realistic speech animations from in-the-wild audio at any speaking rate, e.g., fast raps and slow songs as well as normal speech. Extensive quantitative and qualitative evaluations, including a user study, demonstrate that our method outperforms state-of-the-art approaches.
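The core idea of sizing the audio window from the speaking rate and feeding the generator several windows of different sizes can be illustrated with a small sketch. It is not the authors' implementation, and every name and constant in it is a hypothetical assumption: the speaking rate is taken to be given in phonemes per second, audio features are a list of per-frame vectors at an assumed 100 frames per second, a base window is sized to span an assumed `PHONEMES_PER_WINDOW` phonemes, and each window is linearly resampled to an assumed fixed length `OUTPUT_LEN` so the generator always sees the same input shape. The paper predicts the window size adaptively; the closed-form rule below is only a stand-in for that predictor.

```python
import math
from typing import List

Frame = List[float]  # one audio feature vector (e.g., one spectrogram frame)

# All constants below are hypothetical, chosen only for illustration.
FEATURE_FPS = 100.0         # audio feature frames per second
PHONEMES_PER_WINDOW = 4.0   # phonemes a base (scale 1.0) window should span
OUTPUT_LEN = 16             # fixed frame count fed to the (hypothetical) generator


def adaptive_window_frames(rate_phonemes_per_sec: float, scale: float = 1.0) -> int:
    """Window length in feature frames covering scale * PHONEMES_PER_WINDOW phonemes.

    Faster speech -> shorter window, slower speech -> longer window, so each
    window holds a roughly constant amount of phonemic content.
    """
    if rate_phonemes_per_sec <= 0:
        raise ValueError("speaking rate must be positive")
    seconds = scale * PHONEMES_PER_WINDOW / rate_phonemes_per_sec
    return max(2, round(seconds * FEATURE_FPS))


def extract_window(features: List[Frame], center: int, length: int) -> List[Frame]:
    """Slice `length` frames roughly centered on `center`, clamping indices at the edges."""
    start = center - length // 2
    last = len(features) - 1
    return [features[min(max(start + i, 0), last)] for i in range(length)]


def resample(window: List[Frame], out_len: int) -> List[Frame]:
    """Linearly interpolate a window to `out_len` frames (fixed generator input size)."""
    n = len(window)
    out = []
    for j in range(out_len):
        pos = j * (n - 1) / (out_len - 1) if out_len > 1 else 0.0
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append([(1 - t) * a + t * b for a, b in zip(window[lo], window[hi])])
    return out


def multi_scale_windows(features: List[Frame], center: int, rate: float,
                        scales=(0.5, 2.0)) -> List[List[Frame]]:
    """One resampled window per scale: a small one for detail, a large one for context."""
    return [
        resample(extract_window(features, center, adaptive_window_frames(rate, s)), OUTPUT_LEN)
        for s in scales
    ]
```

With these made-up constants, a fast rate of 12 phonemes/s yields a 33-frame base window while a slow rate of 6 phonemes/s yields 67 frames; these numbers follow only from the assumed constants and are not results from the paper. Resampling to a fixed length is one simple way to let variable-size windows share a single generator input shape; the actual method may handle this differently.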

Keywords

Computer science, Animation, Computer graphics (images), Speech recognition

Type

article

IF / Citations

9.5 / 4

Full text

https://doi.org/10.1145/3691341

Publication year

2024
