An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation

Author: Nick Campbell, Fasih Haider, Carl Vogel, Maria Koutsombogera, Owen Conlan, Saturnino Luz
Language: English
Year of publication: 2020
Subject:
Source: Frontiers in Computer Science, Vol 2 (2020)
Haider, F, Koutsombogera, M, Conlan, O, Vogel, C, Campbell, N & Luz, S 2020, 'An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation', Frontiers in Computer Science, vol. 2, pp. 1. https://doi.org/10.3389/fcomp.2020.00001
Frontiers in Computer Science
ISSN: 2624-9898
DOI: 10.3389/fcomp.2020.00001
Description: Public speaking is an important skill, the acquisition of which requires dedicated and time-consuming training. In recent years, researchers have started to investigate automatic methods to support public speaking skills training. These methods include assessment of the trainee's oral presentation delivery skills, which may be accomplished through automatic understanding and processing of social and behavioural cues displayed by the presenter. In this study, we propose an automatic scoring system for presentation delivery skills using a novel active data representation method to automatically rate segments of a full video presentation. Most approaches have employed a two-step strategy consisting of detecting multiple events followed by classification, which involves annotating data to build the different event detectors and generating a data representation based on their output for classification. Our method, by contrast, does not require event detectors. The proposed data representation is generated in an unsupervised manner using low-level audiovisual descriptors and self-organizing mapping, and is used for video classification. This representation is also used to analyse video segments within a full video presentation in terms of several characteristics of the presenter's performance. The audio representation provides the best prediction results for self-confidence and enthusiasm, posture and body language, structure and connection of ideas, and overall presentation delivery. The video data representation provides the best results for presentation of relevant information with good pronunciation, usage of language according to audience, and maintenance of adequate voice volume for the audience. The fusion of audio and video data provides the best results for eye contact. Applications of the method to the provision of feedback to teachers and trainees are discussed.
Database: OpenAIRE