Showing 1 - 10 of 46 for search: '"de Lhoneux, Miryam"'
Author:
Tatariya, Kushal, Kulmizev, Artur, Poelman, Wessel, Ploeger, Esther, Bollmann, Marcel, Bjerva, Johannes, Luo, Jiaming, Lent, Heather, de Lhoneux, Miryam
Wikipedia's perceived high quality and broad language coverage have established it as a fundamental resource in multilingual NLP. In the context of low-resource languages, however, these quality assumptions are increasingly being scrutinised. This pa
External link:
http://arxiv.org/abs/2411.05527
Pixel-based language models have emerged as a compelling alternative to subword-based language modelling, particularly because they can represent virtually any script. PIXEL, a canonical example of such a model, is a vision transformer that has been
External link:
http://arxiv.org/abs/2410.12011
POS tagging plays a fundamental role in numerous applications. While POS taggers are highly accurate in well-resourced settings, they lag behind in cases of limited or missing training data. This paper focuses on POS tagging for languages with limite
External link:
http://arxiv.org/abs/2410.10576
Author:
Remy, François, Delobelle, Pieter, Avetisyan, Hayastan, Khabibullina, Alfiya, de Lhoneux, Miryam, Demeester, Thomas
The development of monolingual language models for low and mid-resource languages continues to be hindered by the difficulty in sourcing high-quality training data. In this study, we present a novel cross-lingual vocabulary transfer strategy, trans-t
External link:
http://arxiv.org/abs/2408.04303
Author:
Ploeger, Esther, Poelman, Wessel, Høeg-Petersen, Andreas Holck, Schlichtkrull, Anders, de Lhoneux, Miryam, Bjerva, Johannes
Beyond individual languages, multilingual natural language processing (NLP) research increasingly aims to develop models that perform well across languages generally. However, evaluating these systems on all the world's languages is practically infea
External link:
http://arxiv.org/abs/2407.05022
The NLP research community has devoted increased attention to languages beyond English, resulting in considerable improvements for multilingual NLP. However, these improvements only apply to a small subset of the world's languages. Aiming to extend t
External link:
http://arxiv.org/abs/2402.04222
Emotion classification is a challenging task in NLP due to the inherent idiosyncratic and subjective nature of linguistic expression, especially with code-mixed data. Pre-trained language models (PLMs) have achieved high performance for many tasks an
External link:
http://arxiv.org/abs/2402.03137
Author:
Lent, Heather, Tatariya, Kushal, Dabre, Raj, Chen, Yiyi, Fekete, Marcell, Ploeger, Esther, Zhou, Li, Armstrong, Ruth-Ann, Eijansantos, Abee, Malau, Catriona, Heje, Hans Erik, Lavrinovics, Ernests, Kanojia, Diptesh, Belony, Paul, Bollmann, Marcel, Grobol, Loïc, de Lhoneux, Miryam, Hershcovich, Daniel, DeGraff, Michel, Søgaard, Anders, Bjerva, Johannes
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and a number of highly-resourced languages imply a significant potential for transfer l
External link:
http://arxiv.org/abs/2310.19567
Van Miltenburg et al. (2021) suggest NLP research should adopt preregistration to prevent fishing expeditions and to promote publication of negative results. At face value, this is a very reasonable suggestion, seemingly solving many methodological p
External link:
http://arxiv.org/abs/2302.10086
Author:
Rust, Phillip, Lotz, Jonas F., Bugliarello, Emanuele, Salesky, Elizabeth, de Lhoneux, Miryam, Elliott, Desmond
Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when we attempt to scale the number of supported languages. Tackling this bottleneck results in a trade-off between what can be represented in the embeddin
External link:
http://arxiv.org/abs/2207.06991