Sound to expression: Using emotional sound to guide facial expression editing

Authors: Wenjin Liu, Shudong Zhang, Lijuan Zhou, Ning Luo, Qian Chen
Language: English
Year of publication: 2024
Source: Journal of King Saud University: Computer and Information Sciences, Vol 36, Iss 3, Pp 101998- (2024)
Document type: article
ISSN: 1319-1578
DOI: 10.1016/j.jksuci.2024.101998
Description: Recently, image generation technology has demonstrated impressive results. However, precisely recognizing the emotion in sound and accurately expressing it on the face of a designated person remains a major challenge. To address this challenge, a new framework, Sound to Expression (S2E), which uses the emotion in sound to guide facial expression image generation, is proposed. A speech dataset for emotion recognition is constructed. S2E can edit the facial expressions of different people according to the emotions conveyed in different sounds. S2E consists of the Continuous Wavelet Transform (CWT), YOLOv3, ChatGPT-3, and a facial expression diffusion editing model (FEDEM). CWT is used to extract emotional features from different sounds, and YOLOv3 is employed to identify the emotion categories. The emotion category and a specific person's name are input into ChatGPT-3 to randomly generate a description of the person and emotion, and this description is input into FEDEM to generate a facial expression image. To generate more accurate images and address emotional semantic deviation, a new facial detail emotional preservation loss is proposed. Experimental results show that S2E can accurately recognize the emotion in a voice and use it to guide the editing of the specified person's facial expression, generating more accurate images.
Database: Directory of Open Access Journals
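
Note: The Description above outlines a pipeline whose first stage converts an audio signal into CWT-based emotional features that a detector (YOLOv3 in the paper) then classifies. The Python sketch below illustrates only that first stage and is not the authors' code; the wavelet choice ("morl"), the number of scales, the 16 kHz sampling rate, and the synthetic chirp standing in for real speech are all illustrative assumptions.

# Minimal sketch (assumptions noted above): turn a 1-D audio signal into a
# normalized 2-D CWT scalogram, the kind of time-frequency "image" that an
# image-based detector such as YOLOv3 could classify into emotion categories.
import numpy as np
import pywt
from scipy.signal import chirp

SAMPLE_RATE = 16_000  # assumed speech sampling rate (Hz)

def audio_to_scalogram(signal: np.ndarray,
                       sample_rate: int = SAMPLE_RATE,
                       num_scales: int = 64) -> np.ndarray:
    """Compute the |CWT| of a signal and normalize it to [0, 1]."""
    scales = np.arange(1, num_scales + 1)
    coeffs, _freqs = pywt.cwt(signal, scales, "morl",
                              sampling_period=1.0 / sample_rate)
    scalogram = np.abs(coeffs)                      # magnitude scalogram
    return (scalogram - scalogram.min()) / (np.ptp(scalogram) + 1e-8)

if __name__ == "__main__":
    # Stand-in for a real speech clip: a 1-second chirp sweeping 100 -> 400 Hz.
    t = np.linspace(0.0, 1.0, SAMPLE_RATE, endpoint=False)
    demo_signal = chirp(t, f0=100.0, t1=1.0, f1=400.0)
    image = audio_to_scalogram(demo_signal)
    print(image.shape)  # (num_scales, num_samples), e.g. (64, 16000)

Rendering the magnitude of the CWT coefficients as a normalized 2-D scalogram is one common way to hand a time-frequency representation to an image-based detector; the paper's actual feature extraction and detector configuration may differ.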