An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation
Author: | Nick Campbell, Fasih Haider, Carl Vogel, Maria Koutsombogera, Owen Conlan, Saturnino Luz |
Language: | English |
Year of publication: | 2020 |
Subject: |
Computer science
media_common.quotation_subject Feature extraction 02 engineering and technology social signal processing Pronunciation External Data Representation lcsh:QA75.5-76.95 Presentation Human–computer interaction 0202 electrical engineering electronic engineering information engineering media_common General Environmental Science Event (computing) 4. Education feature extraction Representation (systemics) General Engineering multimodal learning analytics 020207 software engineering Body language Public speaking machine learning multimedia signal processing General Earth and Planetary Sciences 020201 artificial intelligence & image processing video analysis and summarization lcsh:Electronic computers. Computer science |
Source: | Frontiers in Computer Science, vol. 2, pp. 1 (2020). https://doi.org/10.3389/fcomp.2020.00001 |
ISSN: | 2624-9898 |
DOI: | 10.3389/fcomp.2020.00001 |
Description: | Public speaking is an important skill, the acquisition of which requires dedicated and time-consuming training. In recent years, researchers have started to investigate automatic methods to support public speaking skills training. These methods include assessment of the trainee's oral presentation delivery skills, which may be accomplished through automatic understanding and processing of social and behavioural cues displayed by the presenter. In this study, we propose an automatic scoring system for presentation delivery skills that uses a novel active data representation method to automatically rate segments of a full video presentation. Most existing approaches employ a two-step strategy, first detecting multiple events and then classifying, which requires annotating data to build the different event detectors and generating a data representation from their output for classification; our method requires no event detectors. The proposed data representation is generated in an unsupervised manner from low-level audiovisual descriptors using self-organizing maps, and is used for video classification. This representation is also used to analyse video segments within a full video presentation in terms of several characteristics of the presenter's performance. The audio representation yields the best prediction results for self-confidence and enthusiasm, posture and body language, structure and connection of ideas, and overall presentation delivery. The video representation yields the best results for presentation of relevant information with good pronunciation, usage of language appropriate to the audience, and maintenance of adequate voice volume for the audience. The fusion of audio and video data yields the best results for eye contact. Applications of the method to the provision of feedback to teachers and trainees are discussed. |
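The abstract only summarizes the active data representation; the sketch below illustrates the general idea under stated assumptions, not the authors' implementation: a small self-organizing map is trained on frame-level audiovisual descriptors, and each variable-length video segment is then mapped to a fixed-length, length-normalized histogram of best-matching-unit activations over the map grid. All function names, grid sizes, and parameters here are illustrative.

```python
import numpy as np

def train_som(data, rows=3, cols=3, iters=300, lr0=0.5, sigma0=2.0, seed=0):
    """Train a minimal self-organizing map on low-level descriptor vectors."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # grid coordinates of each map node, shape (rows, cols, 2)
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        # best-matching unit (BMU): node whose weight vector is closest to x
        d = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        # learning rate and neighbourhood radius decay over time
        lr = lr0 * np.exp(-t / iters)
        sigma = sigma0 * np.exp(-t / iters)
        # Gaussian neighbourhood pulls nodes near the BMU toward x
        h = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=-1)
                   / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

def active_data_representation(segments, weights):
    """Map each variable-length segment of frame-level descriptors to a
    fixed-length histogram of BMU activations over the SOM grid."""
    rows, cols, _ = weights.shape
    reps = []
    for seg in segments:
        hist = np.zeros(rows * cols)
        for x in seg:
            d = np.linalg.norm(weights - x, axis=-1)
            hist[np.argmin(d)] += 1          # count the winning node
        reps.append(hist / max(len(seg), 1)) # normalise by segment length
    return np.array(reps)

# Toy example: two "video segments" of random 8-dimensional descriptors.
rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 8))
som = train_som(frames)
segments = [frames[:120], frames[120:]]
adr = active_data_representation(segments, som)
print(adr.shape)  # one fixed-length 9-dim vector per segment
```

Because every segment, however long, maps to the same fixed-length vector, the resulting representations can be fed to any standard classifier or regressor to score delivery-skill dimensions, with no per-event annotation or event detectors required.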
Database: | OpenAIRE |
External link: |