Showing 1 - 10 of 101 results for search: '"VAN NOORD, GERTJAN"'
Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks. However, there has been little research on their effectiveness for neural machine translation…
External link:
http://arxiv.org/abs/2302.14220
Subword-level models have been the dominant paradigm in NLP. However, character-level models have the benefit of seeing each character individually, providing the model with more detailed information that ultimately could lead to better models…
External link:
http://arxiv.org/abs/2212.01304
Character-based representations have important advantages over subword-based ones for morphologically rich languages. They come with increased robustness to noisy input and do not need a separate tokenization step. However, they also have a crucial drawback…
External link:
http://arxiv.org/abs/2205.14086
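As a rough illustration of the trade-off this abstract describes (not code from the paper): character-level input removes the tokenizer and is robust to noise, but inflates sequence length, which is the usual drawback of such models. The subword split below is hand-picked for illustration.

```python
# Toy comparison of character-level vs. subword-level segmentation.
# The subword split is hypothetical; real systems learn one with
# BPE or unigram-LM tokenizers.

sentence = "untranslatability"

# Character-level: no tokenizer needed, but the token sequence
# is as long as the string itself.
char_tokens = list(sentence)

# Subword-level: a much shorter sequence, at the cost of a learned,
# language-specific vocabulary.
subword_tokens = ["un", "translat", "ability"]

print(len(char_tokens), char_tokens)        # 17 symbols
print(len(subword_tokens), subword_tokens)  # 3 symbols
```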
Massively multilingual models are promising for transfer learning across tasks and languages. However, existing methods are unable to fully leverage training data when it is available in different task-language combinations. To exploit such heterogeneous data…
External link:
http://arxiv.org/abs/2205.12148
This paper investigates very low resource language model pretraining, when less than 100 thousand sentences are available. We find that, in very low resource scenarios, statistical n-gram language models outperform state-of-the-art neural models…
External link:
http://arxiv.org/abs/2205.04810
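To make the n-gram baseline concrete, here is a minimal bigram language model with add-one smoothing in plain Python. It only illustrates the counting idea behind n-gram models; the paper's actual models and smoothing method may differ.

```python
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences with padding."""
    unigrams, bigrams = Counter(), defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams

def prob(word, prev, unigrams, bigrams):
    """P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[prev][word] + 1) / (unigrams[prev] + vocab_size)

corpus = [["a", "low", "resource", "language"],
          ["a", "very", "low", "resource", "scenario"]]
uni, bi = train_bigram_lm(corpus)
print(prob("low", "a", uni, bi))  # smoothed bigram probability: 0.2
```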
This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German--Lower Sorbian (DE--DSB): a high-resource language to a low-resource one…
External link:
http://arxiv.org/abs/2109.12012
Recent advances in multilingual dependency parsing have brought the idea of a truly universal parser closer to reality. However, cross-language interference and restrained model capacity remain major obstacles. To address this, we propose a novel multilingual…
External link:
http://arxiv.org/abs/2004.14327
Authors:
de Vries, Wietse, van Cranenburgh, Andreas, Bisazza, Arianna, Caselli, Tommaso, van Noord, Gertjan, Nissim, Malvina
The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model…
External link:
http://arxiv.org/abs/1912.09582
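The monolingual Dutch BERT described here (BERTje) is distributed via the Hugging Face hub; the sketch below assumes the commonly used model ID GroNLP/bert-base-dutch-cased and the standard transformers API, so adjust the ID if it differs.

```python
# Minimal sketch: load the Dutch BERT model and embed a sentence.
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")

inputs = tokenizer("Groningen is een stad in Nederland.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: shape (batch, tokens, hidden_size)
print(outputs.last_hidden_state.shape)
```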
Authors:
van der Goot, Rob, van Noord, Gertjan
We propose MoNoise: a normalization model focused on generalizability and efficiency; it aims to be easily reusable and adaptable. Normalization is the task of translating texts from a non-canonical domain to a more canonical domain, in our case: social media data…
External link:
http://arxiv.org/abs/1710.03476
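As a toy illustration of the normalization task itself (not MoNoise's actual candidate-generation and ranking pipeline), a lookup-based normalizer might look like this; the replacement lexicon is invented for the example.

```python
# Toy lexical normalizer: map non-canonical tokens to canonical forms.
# MoNoise generates candidates (spelling correction, embeddings, lookup)
# and ranks them with a classifier; this sketch only shows the task.

NORMALIZATION_LEXICON = {  # hypothetical entries
    "u": "you",
    "gr8": "great",
    "pls": "please",
}

def normalize(tokens):
    """Replace known non-canonical tokens, leave the rest unchanged."""
    return [NORMALIZATION_LEXICON.get(t.lower(), t) for t in tokens]

print(normalize("pls send ur answer gr8".split()))
# ['please', 'send', 'ur', 'answer', 'great']
```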
Academic article
This result cannot be displayed to users who are not logged in; log in to view it.