Showing 1 - 10 of 75 for search: '"Kutuzov, Andrey"'
Author:
Fedorova, Mariia, Mickus, Timothee, Partanen, Niko, Siewert, Janine, Spaziani, Elena, Kutuzov, Andrey
This paper describes the organization and findings of AXOLOTL'24, the first multilingual explainable semantic change modeling shared task. We present new sense-annotated diachronic semantic change datasets for Finnish and Russian which were employed
External link:
http://arxiv.org/abs/2407.04079
We use contextualized word definitions generated by large language models as semantic representations in the task of diachronic lexical semantic change detection (LSCD). In short, generated definitions are used as 'senses', and the change score of a
External link:
http://arxiv.org/abs/2406.14167
We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. They are generated from scratch by fine-tuned encoder-decoder language models. The con
External link:
http://arxiv.org/abs/2403.18024
Author:
de Gibert, Ona, Nail, Graeme, Arefyev, Nikolay, Bañón, Marta, van der Linde, Jelmer, Ji, Shaoxiong, Zaragoza-Bernabeu, Jaume, Aulamo, Mikko, Ramírez-Sánchez, Gema, Kutuzov, Andrey, Pyysalo, Sampo, Oepen, Stephan, Tiedemann, Jörg
We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive
External link:
http://arxiv.org/abs/2403.14009
Author:
Chen, Pinzhen, Ji, Shaoxiong, Bogoychev, Nikolay, Kutuzov, Andrey, Haddow, Barry, Heafield, Kenneth
Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants. While such efforts are often carried out in a single language, we empirically analyze cost-e
External link:
http://arxiv.org/abs/2309.08958
We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage
External link:
http://arxiv.org/abs/2305.11993
Author:
Samuel, David, Kutuzov, Andrey, Touileb, Samia, Velldal, Erik, Øvrelid, Lilja, Rønningstad, Egil, Sigdel, Elina, Palatkina, Anna
We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-d
External link:
http://arxiv.org/abs/2305.03880
While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British Nation
External link:
http://arxiv.org/abs/2303.09859
We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). Unlike prior WSI datasets for Russian, RuDSI is completely data-dr
External link:
http://arxiv.org/abs/2209.13750
Published in:
Northern European Journal of Language Technology (NEJLT). ISSN 2000-1533. 8(1)
We present a qualitative analysis of the (potentially erroneous) outputs of contextualized embedding-based methods for detecting diachronic semantic change. First, we introduce an ensemble method outperforming previously described contextualized appr
External link:
http://arxiv.org/abs/2209.00154