Search results - "Kutuzov, Andrey"

Report

AXOLOTL'24 Shared Task on Multilingual Explainable Semantic Change Modeling

Authors: Fedorova, Mariia; Mickus, Timothee; Partanen, Niko; Siewert, Janine; Spaziani, Elena; Kutuzov, Andrey

This paper describes the organization and findings of AXOLOTL'24, the first multilingual explainable semantic change modeling shared task. We present new sense-annotated diachronic semantic change datasets for Finnish and Russian which were employed

External link: http://arxiv.org/abs/2407.04079

Report

Definition generation for lexical semantic change detection

Authors: Fedorova, Mariia; Kutuzov, Andrey; Scherrer, Yves

We use contextualized word definitions generated by large language models as semantic representations in the task of diachronic lexical semantic change detection (LSCD). In short, generated definitions are used as 'senses', and the change score of a

External link: http://arxiv.org/abs/2406.14167

Report

Enriching Word Usage Graphs with Cluster Definitions

Authors: Fedorova, Mariia; Kutuzov, Andrey; Arefyev, Nikolay; Schlechtweg, Dominik

We present a dataset of word usage graphs (WUGs), where the existing WUGs for multiple languages are enriched with cluster labels functioning as sense definitions. They are generated from scratch by fine-tuned encoder-decoder language models. The con

External link: http://arxiv.org/abs/2403.18024

Report

A New Massive Multilingual Dataset for High-Performance Language Technologies

Authors: de Gibert, Ona; Nail, Graeme; Arefyev, Nikolay; Bañón, Marta; van der Linde, Jelmer; Ji, Shaoxiong; Zaragoza-Bernabeu, Jaume; Aulamo, Mikko; Ramírez-Sánchez, Gema; Kutuzov, Andrey; Pyysalo, Sampo; Oepen, Stephan; Tiedemann, Jörg

We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive

External link: http://arxiv.org/abs/2403.14009

Report

Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

Authors: Chen, Pinzhen; Ji, Shaoxiong; Bogoychev, Nikolay; Kutuzov, Andrey; Haddow, Barry; Heafield, Kenneth

Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants. While such efforts are often carried out in a single language, we empirically analyze cost-e

External link: http://arxiv.org/abs/2309.08958

Report

Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis

Authors: Giulianelli, Mario; Luden, Iris; Fernandez, Raquel; Kutuzov, Andrey

We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage

External link: http://arxiv.org/abs/2305.11993

Report

NorBench -- A Benchmark for Norwegian Language Models

Authors: Samuel, David; Kutuzov, Andrey; Touileb, Samia; Velldal, Erik; Øvrelid, Lilja; Rønningstad, Egil; Sigdel, Elina; Palatkina, Anna

We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-d

External link: http://arxiv.org/abs/2305.03880

Report

Trained on 100 million words and still in shape: BERT meets British National Corpus

Authors: Samuel, David; Kutuzov, Andrey; Øvrelid, Lilja; Velldal, Erik

While modern masked language models (LMs) are trained on ever larger corpora, we here explore the effects of down-scaling training to a modestly-sized but representative, well-balanced, and publicly available English text source -- the British Nation

External link: http://arxiv.org/abs/2303.09859

Report

RuDSI: graph-based word sense induction dataset for Russian

Authors: Aksenova, Anna; Gavrishina, Ekaterina; Rykov, Elisey; Kutuzov, Andrey

We present RuDSI, a new benchmark for word sense induction (WSI) in Russian. The dataset was created using manual annotation and semi-automatic clustering of Word Usage Graphs (WUGs). Unlike prior WSI datasets for Russian, RuDSI is completely data-dr

External link: http://arxiv.org/abs/2209.13750

Report

Contextualized language models for semantic change detection: lessons learned

Authors: Kutuzov, Andrey; Velldal, Erik; Øvrelid, Lilja

Published in: Northern European Journal of Language Technology (NEJLT). ISSN 2000-1533. 8(1)

We present a qualitative analysis of the (potentially erroneous) outputs of contextualized embedding-based methods for detecting diachronic semantic change. First, we introduce an ensemble method outperforming previously described contextualized appr

External link: http://arxiv.org/abs/2209.00154
