Analysis of Many-to-One Cross-Media Correlations among Language Lectures toward Its Effective Multimedia Presentation for Learning

Author: Sheng-Wei Li, 李盛偉
Year of publication: 2005
Document type: Thesis (學位論文)
Description: 93
This thesis investigates the correlations between the media (particularly speech and text) involved in language lectures in order to design an effective presentation mechanism for Web-based learning. The implicit temporal correlation between speech and text is the most important, because it helps to coordinate supplementary lecture navigation aids such as tele-pointer movement, lip-sync movement, and content scrolling. We propose a speech-text alignment framework, an iterative algorithm based on local alignment, to probe many-to-one relations rather than one-to-one relations only. We show that this solution is more practical for general language lectures, and that the algorithm's time complexity matches the best possible cost, O(nm), without introducing additional computation. In addition, we propose a simple heuristic that detects speech structures from the probed temporal correlations. Cross-media correlations are classified into implicit relations (retrieved by computation) and explicit relations (recorded during the preprocessing stage). We show that it is feasible to create a vivid presentation by exploiting implicit relations and artificially simulating some explicit media. We describe the spatial-to-temporal problems that any system may encounter when dealing with many-to-one correlations. To facilitate navigation of the integrated multimedia documents, we develop several visualization techniques for describing media correlations, including guidelines for speech-text correlations, visible-automatic scrolling, and level of detail of the timeline, to provide intuitive and easy-to-use random access mechanisms. We evaluate the performance of the analysis method and human perception of the synchronized presentation. Overall, about 99.5% of the analyzed words have a temporal error within 0.5 s, and the subjective evaluation shows that the synchronized presentation is highly acceptable to human viewers.
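As a rough illustration of the local-alignment building block mentioned in the abstract, the sketch below applies the textbook Smith-Waterman recurrence to two word lists. It is not the thesis's iterative many-to-one algorithm, whose details are not given in this record. The function name `local_align`, the scoring values, and the reading of n and m as the lengths of the two word sequences are assumptions made for illustration. Filling the score table takes O(nm) time.

```python
def local_align(speech, text, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment of two word lists (illustrative only).

    Returns (best_score, pairs), where pairs lists the
    (speech_index, text_index) positions of identical aligned words.
    Scoring values are hypothetical defaults, not taken from the thesis.
    """
    n, m = len(speech), len(text)
    # score[i][j]: best score of a local alignment ending at
    # speech[i-1] and text[j-1]; row 0 and column 0 stay at zero.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if speech[i - 1] == text[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + s,   # align the two words
                              score[i - 1][j] + gap,     # skip a speech word
                              score[i][j - 1] + gap)     # skip a text word
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        s = match if speech[i - 1] == text[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            if speech[i - 1] == text[j - 1]:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Hypothetical input: recognized speech words versus the lecture text.
speech_words = "um the quick brown fox jumps".split()
text_words = "the quick brown fox".split()
best_score, aligned = local_align(speech_words, text_words)
```

In a speech-text setting, each aligned pair could carry the timestamp of the recognized word over to the matching text word. A many-to-one scheme would presumably repeat such passes to find several speech segments that map to the same text passage, but that step is not sketched here.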
Database: Networked Digital Library of Theses & Dissertations