Showing 1 - 10 of 17 for search: '"Kuzman, Taja"'
The world of language models is going through turbulent times, better and ever larger models are coming out at an unprecedented speed. However, we argue that, especially for the scientific community, encoder models of up to 1 billion parameters are s
External link:
http://arxiv.org/abs/2404.05428
Author:
Ljubešić, Nikola, Kuzman, Taja
This paper presents a collection of highly comparable web corpora of Slovenian, Croatian, Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian, covering thereby the whole spectrum of official languages in the South Slavic language space. The coll
External link:
http://arxiv.org/abs/2403.12721
Author:
van Noord, Rik, Kuzman, Taja, Rupnik, Peter, Ljubešić, Nikola, Esplà-Gomis, Miquel, Ramírez-Sánchez, Gema, Toral, Antonio
Large, curated, web-crawled corpora play a vital role in training language models (LMs). They form the lion's share of the training data in virtually all recent LMs, such as the well-known GPT, LLaMA and XLM-RoBERTa models. However, despite this impo
External link:
http://arxiv.org/abs/2403.08693
ChatGPT has shown strong capabilities in natural language generation tasks, which naturally leads researchers to explore where its abilities end. In this paper, we examine whether ChatGPT can be used for zero-shot text classification, more specifical
External link:
http://arxiv.org/abs/2303.03953
This paper presents a new training dataset for automatic genre identification GINCO, which is based on 1,125 crawled Slovenian web documents that consist of 650 thousand words. Each document was manually annotated for genre with a new annotation sche
External link:
http://arxiv.org/abs/2201.03857
Published in:
Prispevki za novejšo zgodovino (before 1960: Prispevki za zgodovino delavskega gibanja) / Contributions to Contemporary History (before 1986: Contributions to the History of the Workers' Movement). 59(1):99-119
External link:
https://www.ceeol.com/search/article-detail?id=886559
Author:
Bañón, Marta, Chichirău, Mălina, Esplà-Gomis, Miquel, Forcada, Mikel L., Galiano Jiménez, Aarón, Kuzman, Taja, Ljubešić, Nikola, van Noord, Rik, Pla Sempere, Leopoldo, Ramírez Sánchez, Gema, Rupnik, Peter, Suchomel, Vít, Toral, Antonio, Zaragoza Bernabeu, Jaume
We present the most relevant results of the project MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages in its second year. Parallel and monolingual corpora have been produced for eleven low-r
External link:
https://explore.openaire.eu/search/publication?articleId=od_______935::b625840be362dbc8253f65a0008fd1c6
https://hdl.handle.net/10045/135117
Published in:
Machine Learning & Knowledge Extraction; Sep 2023, Vol. 5, Issue 3, pp. 1149-1175 (27 pages)
Author:
Bañón, Marta, Esplà-Gomis, Miquel, Forcada, Mikel L., García-Romero, Cristian, Kuzman, Taja, Ljubešić, Nikola, van Noord, Rik, Sempere, Leopoldo Pla, Ramírez-Sánchez, Gema, Rupnik, Peter, Suchomel, Vít, Toral, Antonio, van der Werff, Tobias, Zaragoza, Jaume, Macken, Lieve, Rufener, Andrew, Van den Bogaert, Joachim, Daems, Joke, Tezcan, Arda, Vanroy, Bram, Fonteyne, Margot, Barrault, Loic, Costa-Jussa, Marta R., Kemp, Ellie, Pilos, Spyridon, Declercq, Christophe, Koponen, Maarit, Scarton, Carolina, Moniz, Helena
Published in:
EAMT 2022-Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 303-304
We introduce the project MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-re
External link:
https://explore.openaire.eu/search/publication?articleId=narcis______::5a0a68e6c4f890c520eb13d3be112a97
https://research.rug.nl/en/publications/685514a8-947e-44f9-83cf-90356c5f1684