Speech-to-Gesture Generation
Author: | Hiroshi Sakuta, Dai Hasegawa, Kenta Takeuchi, Naoshi Kaneko, Kazuhiko Sumi, Shinichi Shirakawa |
---|---|
Year of publication: | 2017 |
Subject: | Computer science; Deep learning; Speech recognition; Software engineering; Time sequence; Motion (physics); Embodied cognition; Gesture recognition; Computer vision; Artificial intelligence; Human factors; Gesture |
Source: | HAI |
Description: | In this research, we take a first step toward generating gesture motion data directly from speech features. Such a method could make it much easier to create gesture animations for Embodied Conversational Agents. We implemented a Bi-Directional LSTM model that takes phonemic features extracted from speech audio as input and outputs time-series data of bone-joint rotations (see the sketch after this record). We assessed the validity of the predicted motion data in two ways: by evaluating the final loss value of the network, and by comparing impressions of the predicted gestures against both the original motion data that accompanied the input audio and motion data that accompanied different audio (mismatched gestures). The results showed that the prediction accuracy of the LSTM model was better than that of a simple RNN model. In contrast, the predicted gestures were rated lower in the impression evaluation than both the original and the mismatched gestures, although some individual predicted gestures were rated comparably to the mismatched gestures. |
Database: | OpenAIRE |
External link: |
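
The description above outlines the architecture at a high level: a bi-directional LSTM maps a sequence of phonemic speech features to a sequence of bone-joint rotations, trained against motion-capture data via a regression loss. Below is a minimal PyTorch sketch of that idea, not the authors' code; the feature dimension, number of joint rotations, hidden size, and all names are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): a Bi-Directional LSTM mapping
# per-frame phonemic speech features to per-frame bone-joint rotations.
import torch
import torch.nn as nn

class SpeechToGestureBiLSTM(nn.Module):
    def __init__(self, n_phoneme_features=26, n_joint_rotations=36, hidden_size=128):
        super().__init__()
        # Bi-directional LSTM over the speech-feature sequence, so both past
        # and future audio context inform each frame's predicted pose.
        self.bilstm = nn.LSTM(
            input_size=n_phoneme_features,
            hidden_size=hidden_size,
            num_layers=1,
            batch_first=True,
            bidirectional=True,
        )
        # Linear readout from the concatenated forward/backward hidden states
        # to the joint-rotation values for that frame.
        self.readout = nn.Linear(2 * hidden_size, n_joint_rotations)

    def forward(self, speech_features):
        # speech_features: (batch, time, n_phoneme_features)
        hidden, _ = self.bilstm(speech_features)
        return self.readout(hidden)  # (batch, time, n_joint_rotations)

# Training would minimize a regression loss (e.g. MSE) between predicted and
# ground-truth rotations, consistent with the "final loss value" evaluation
# mentioned in the description. Dummy shapes below are placeholders.
model = SpeechToGestureBiLSTM()
dummy_batch = torch.randn(4, 100, 26)           # 4 clips, 100 frames each
rotations = model(dummy_batch)                  # (4, 100, 36)
loss = nn.MSELoss()(rotations, torch.zeros_like(rotations))
```

The bidirectional choice matters for this task: gestures often anticipate the words they accompany, so letting each output frame condition on future as well as past speech features is a natural fit, and it is consistent with the paper's comparison against a simple (unidirectional) RNN baseline.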