Data-driven Quality of Experience for Digital Audio Archives

Author: Ragano, Alessandro
Language: English
Year of publication: 2022
Subject:
Description: The digitization of sound archives began as a way to safeguard records that naturally deteriorate because of irreversible chemical processes in the sound carriers. Digitization has improved the usability and accessibility of audio archives and has made digital restoration possible. Assessing the quality of digitization, restoration, and audio archive consumption is essential for evaluating sound archive practices. The state of the art in these three areas has neglected quality assessment approaches that are automatic and that take the user's perspective into account. This thesis aims to understand and define the quality of experience (QoE) of sound archives, and it proposes data-driven objective metrics that can predict the QoE of music audio archives in the absence of human listeners. The author proposes a paradigm shift in quality assessment for sound archives: focusing on deep-learning-based quality metrics for musical signals that are developed and evaluated using annotations obtained from listening tests. An adaptation of the QoE framework to audio archive evaluation is proposed in order to consider the user's perspective and define QoE in sound archives. In a case study of audio archive consumption, the author presents a curated and annotated dataset of real-world music recordings from vinyl collections, together with three objective quality metrics. The thesis shows that annotating a dataset of real-world music recordings requires a different approach to preparing the stimuli, and it proposes a technique based on stratified random sampling from clusters. The three proposed quality metrics are based on learning feature representations with three different tasks: degradation classification, deep convolutional embedded clustering (DCEC), and self-supervised learning (SSL).
The first two tasks use an architecture based on framewise convolutional neural networks, while the SSL task is based on pre-training and fine-tuning wav2vec 2.0 on musical signals. This thesis demonstrates that degradation classification, DCEC, and wav2vec 2.0 learn musical representations that are useful for predicting the quality of vinyl collections. More specifically, the proposed metrics outperform two baselines when fine-tuned on small annotated sets. The author also proposes a new correlation-based feature representation for classifying audio carriers, which outperforms raw feature representations in terms of speed and feature dimensionality. Audio carrier classification can serve as a preliminary step for the quality metrics mentioned above when predicting the quality of multiple collections. The significance of the proposed work is that audio archive metadata can be enriched with quality labels produced by the proposed metrics. Overall, the thesis encourages scholars and stakeholders to adopt a paradigm shift when evaluating the quality of sound archives, i.e., moving from a manual, system-centric approach to a more automatic, user-centric approach.
Database: OpenAIRE