Speech-to-Gesture Generation

Authors: Hiroshi Sakuta, Dai Hasegawa, Kenta Takeuchi, Naoshi Kaneko, Kazuhiko Sumi, Shinichi Shirakawa
Year of publication: 2017
Subject:
Source: HAI
Description: In this research, we take a first step toward generating gesture motion data directly from speech features. Such a method could make it much easier to create gesture animations for Embodied Conversational Agents. We implemented a model using a bi-directional LSTM that takes phonemic features extracted from speech audio as input and outputs time-series data of bone-joint rotations. We assessed the validity of the predicted gesture motion by evaluating the final loss value of the network and by comparing viewers' impressions of the predicted gestures with those of the original motion data that accompanied the input audio and of motion data that accompanied different audio. The results showed that the prediction accuracy of the LSTM model was better than that of a simple RNN model. In contrast, the predicted gestures received lower impression ratings than both the original and the mismatched gestures, although some individual predicted gestures were rated on par with the mismatched ones.
Database: OpenAIRE
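As a rough illustration of the architecture the description outlines (a bi-directional LSTM mapping per-frame speech features to joint rotations), here is a minimal PyTorch sketch. This is not the authors' implementation; the framework, feature dimension, hidden size, and number of joint-rotation channels are all placeholder assumptions.

```python
import torch
import torch.nn as nn

class SpeechToGesture(nn.Module):
    """Bi-directional LSTM mapping per-frame speech features to joint rotations.

    Dimensions are illustrative placeholders, not values from the paper.
    """
    def __init__(self, in_dim=26, hidden_dim=128, out_dim=36):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Both directions are concatenated, hence 2 * hidden_dim.
        self.proj = nn.Linear(2 * hidden_dim, out_dim)

    def forward(self, x):
        # x: (batch, time, in_dim) sequence of speech features
        h, _ = self.lstm(x)      # (batch, time, 2 * hidden_dim)
        return self.proj(h)      # (batch, time, out_dim) joint rotations

# Toy training step with random tensors standing in for real data.
model = SpeechToGesture()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
speech = torch.randn(8, 100, 26)   # 8 clips, 100 frames, 26-dim features
motion = torch.randn(8, 100, 36)   # matching joint-rotation targets
opt.zero_grad()
loss = nn.functional.mse_loss(model(speech), motion)  # regression loss
loss.backward()
opt.step()
```

The abstract's comparison of "final loss values" between the LSTM and a simple RNN corresponds, under this sketch, to swapping `nn.LSTM` for `nn.RNN` and comparing the converged regression loss on held-out data.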