Search results - "Yamshchikov ON"

Report

CleanComedy: Creating Friendly Humor through Generative Techniques

Authors: Vikhorev, Dmitry; Galimzianova, Daria; Gorovaia, Svetlana; Zhemchuzhina, Elizaveta; Yamshchikov, Ivan P.

Humor generation is a challenging task in natural language processing due to limited resources and the quality of existing datasets. Available humor language resources often suffer from toxicity and duplication, limiting their effectiveness for train

External link: http://arxiv.org/abs/2412.09203

Report

Toxicity of the Commons: Curating Open-Source Pre-Training Data

Authors: Arnett, Catherine; Jones, Eliot; Yamshchikov, Ivan P.; Langlais, Pierre-Carl

Open-source large language models are becoming increasingly available and popular among researchers and practitioners. While significant progress has been made on open-weight models, open training data is a practice yet to be adopted by the leading o

External link: http://arxiv.org/abs/2410.22587

Report

Sui Generis: Large Language Models for Authorship Attribution and Verification in Latin

Authors: Schmidt, Gleb; Gorovaia, Svetlana; Yamshchikov, Ivan P.

This paper evaluates the performance of Large Language Models (LLMs) in authorship attribution and authorship verification tasks for Latin texts of the Patristic Era. The study showcases that LLMs can be robust in zero-shot authorship verification ev

External link: http://arxiv.org/abs/2410.09245

Report

Individuation in Neural Models with and without Visual Grounding

Authors: Tikhonov, Alexey; Bylinina, Lisa; Yamshchikov, Ivan P.

We show differences between a language-and-vision model CLIP and two text-only models - FastText and SBERT - when it comes to the encoding of individuation information. We study latent representations that CLIP provides for substrates, granular aggre

External link: http://arxiv.org/abs/2409.18868

Report

BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training

Authors: Chizhov, Pavel; Arnett, Catherine; Korotkova, Elizaveta; Yamshchikov, Ivan P.

Language models can largely benefit from efficient tokenization. However, they still mostly utilize the classical BPE algorithm, a simple and reliable method. This has been shown to cause such issues as under-trained tokens and sub-optimal compressio

External link: http://arxiv.org/abs/2409.04599

Report

Knowledge Graph Representation for Political Information Sources

Authors: Osmonova, Tinatin; Tikhonov, Alexey; Yamshchikov, Ivan P.

With the rise of computational social science, many scholars utilize data analysis and natural language processing tools to analyze social media, news articles, and other accessible data sources for examining political and social discourse. Particula

External link: http://arxiv.org/abs/2404.03437

Report

Echo-chambers and Idea Labs: Communication Styles on Twitter

Authors: Sorokovikova, Aleksandra; Becker, Michael; Yamshchikov, Ivan P.

This paper investigates the communication styles and structures of Twitter (X) communities within the vaccination context. While mainstream research primarily focuses on the echo-chamber phenomenon, wherein certain ideas are reinforced and participan

External link: http://arxiv.org/abs/2403.19423

Report

Vygotsky Distance: Measure for Benchmark Task Similarity

Authors: Surkov, Maxim K.; Yamshchikov, Ivan P.

Evaluation plays a significant role in modern natural language processing. Most modern NLP benchmarks consist of arbitrary sets of tasks that neither guarantee any generalization potential for the model once applied outside the test set nor try to mi

External link: http://arxiv.org/abs/2402.14890

Report

LLMs Simulate Big Five Personality Traits: Further Evidence

Authors: Sorokovikova, Aleksandra; Fedorova, Natalia; Rezagholi, Sharwin; Yamshchikov, Ivan P.

An empirical investigation into the simulation of the Big Five personality traits by large language models (LLMs), namely Llama2, GPT4, and Mixtral, is presented. We analyze the personality traits simulated by these models and their stability. This c

External link: http://arxiv.org/abs/2402.01765

Report

Neural Machine Translation for Malayalam Paraphrase Generation

Authors: Varghese, Christeena; Koshelev, Sergey; Yamshchikov, Ivan P.

This study explores four methods of generating paraphrases in Malayalam, utilizing resources available for English paraphrasing and pre-trained Neural Machine Translation (NMT) models. We evaluate the resulting paraphrases using both automated metric

External link: http://arxiv.org/abs/2401.17827
