Showing 1 - 10 of 281 for search: '"WINTNER, SHULY"'
Author:
Goldin, Gili; Wintner, Shuly
We present Knesset-DictaBERT, a large Hebrew language model fine-tuned on the Knesset Corpus, which comprises Israeli parliamentary proceedings. The model is based on the DictaBERT architecture and demonstrates significant improvements in understandi
External link:
http://arxiv.org/abs/2407.20581
We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament between 1998 and 2022. Sentences
External link:
http://arxiv.org/abs/2405.18115
Why do bilingual speakers code-switch (mix their two languages)? Among the several theories that attempt to explain this natural and ubiquitous phenomenon, the Triggering Hypothesis relates code-switching to the presence of lexical triggers, specific
External link:
http://arxiv.org/abs/2308.15209
Natural language processing (NLP) models trained on people-generated data can be unreliable because, without any constraints, they can learn from spurious correlations that are not relevant to the task. We hypothesize that enriching models with speak
External link:
http://arxiv.org/abs/2203.08979
State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different fr
External link:
http://arxiv.org/abs/2106.06797
Despite impressive performance on many text classification tasks, deep neural networks tend to learn frequent superficial patterns that are specific to the training data and do not always generalize well. In this work, we observe this limitation with
External link:
http://arxiv.org/abs/1909.00453
Amidst growing concern over media manipulation, NLP attention has focused on overt strategies like censorship and "fake news". Here, we draw on two concepts from the political science literature to explore subtler strategies for government media man
External link:
http://arxiv.org/abs/1808.09386
We present a computational analysis of cognate effects on the spontaneous linguistic productions of advanced non-native speakers. Introducing a large corpus of highly competent non-native English speakers, and using a set of carefully selected lexica
External link:
http://arxiv.org/abs/1805.09590
This work distinguishes between translated and original text in the UN protocol corpus. By modeling the problem as a classification problem, we can achieve up to 95% classification accuracy. We begin by deriving a parallel corpus for different language
External link:
http://arxiv.org/abs/1805.07697
Translation has played an important role in trade, law, commerce, politics, and literature for thousands of years. Translators have always tried to be invisible; ideal translations should look as if they were written originally in the target language
External link:
http://arxiv.org/abs/1704.07146