Showing 1 - 10 of 18,248 for search: '"OOV"'
Author:
Batsuren, Khuyagbaatar, Vylomova, Ekaterina, Dankers, Verna, Delgerbaatar, Tsetsuukhei, Uzan, Omri, Pinter, Yuval, Bella, Gábor
The popular subword tokenizers of current language models, such as Byte-Pair Encoding (BPE), are known not to respect morpheme boundaries, which affects the downstream performance of the models. While many improved tokenization algorithms have been p
External link:
http://arxiv.org/abs/2404.13292
Author:
Pinas, Mervin, Schaap, Dorian, van Stokkom, Bas
Published in:
Tijdschrift voor Veiligheid. 2023, Vol. 22 Issue 1, p3-22. 20p.
Author:
Grover, Ankit
In the following report we propose pipelines for Goodness of Pronunciation (GoP) computation solving OOV problem at testing time using Vocab/Lexicon expansion techniques. The pipeline uses different components of ASR system to quantify accent and aut
External link:
http://arxiv.org/abs/2209.03787
Recent works have shown huge success of deep learning models for common in vocabulary (IV) scene text recognition. However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance and SOTA recognition models usually perform poo
External link:
http://arxiv.org/abs/2209.00859
Author:
Sumit Singh, Uma Shanker Tiwary
Published in:
IEEE Access, Vol 12, Pp 22707-22717 (2024)
Named entities are random, like emerging entities and complex entities. Most of the large language model’s tokenizers have fixed vocab; hence, they tokenize out-of-vocab (OOV) words into multiple sub-words during tokenization. During fine-tuning fo
External link:
https://doaj.org/article/6efdec15324f4c7ba423951d94cfa011
A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the Com
External link:
http://arxiv.org/abs/2107.08091
Author:
Patel, Raj, Domeniconi, Carlotta
Semantic representations of words have been successfully extracted from unlabeled corpuses using neural network models like word2vec. These representations are generally high quality and are computationally inexpensive to train, making them popular.
External link:
http://arxiv.org/abs/1910.10491
Acoustic-to-word (A2W) end-to-end automatic speech recognition (ASR) systems have attracted attention because of an extremely simplified architecture and fast decoding. To alleviate data sparseness issues due to infrequent words, the combination with
External link:
http://arxiv.org/abs/1909.09993
Published in:
Neurocomputing, 20 July 2021, 445:267-275