Showing 1 - 10 of 29 for search: '"Loweimi, Erfan"'
In this paper, we analyse the error patterns of the raw waveform acoustic models in TIMIT's phone recognition task. Our analysis goes beyond the conventional phone error rate (PER) metric. We categorise the phones into three groups: {affricate, dipht
External link:
http://arxiv.org/abs/2406.00898
Multimodal Video Search by Examples (MVSE) investigates using video clips as the query term for information retrieval, rather than the more traditional text query. This enables far richer search modalities such as images, speaker, content, topic, and
External link:
http://arxiv.org/abs/2309.07606
Sound event detection (SED), as a core module of acoustic environmental analysis, suffers from the problem of data deficiency. The integration of semi-supervised learning (SSL) largely mitigates this problem while requiring no extra annotation budget.
External link:
http://arxiv.org/abs/2110.11144
Although the lower layers of a deep neural network learn features which are transferable across datasets, these layers are not transferable within the same dataset. That is, in general, freezing the trained feature extractor (the lower layers) and re
External link:
http://arxiv.org/abs/2102.04697
Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results. However, we note the range of the learned context increases
External link:
http://arxiv.org/abs/2011.04906
Recently, Transformer based models have shown competitive automatic speech recognition (ASR) performance. One key factor in the success of these models is the multi-head attention mechanism. However, for trained models, we have previously observed th
External link:
http://arxiv.org/abs/2011.04004
Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition. The key factor for the outstanding performance of self-attention models is their ability to captur
External link:
http://arxiv.org/abs/2005.13895
Author:
Loweimi, Erfan
Fourier analysis plays a key role in speech signal processing. As a complex quantity, the Fourier transform can be expressed in polar form using the magnitude and phase spectra. The magnitude spectrum is widely used in almost every corner of speech processing.
External link:
https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.736567
Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features. SincNet has been proposed
External link:
http://arxiv.org/abs/1909.13759