Showing 1 - 10 of 52
for search: '"Ovrelid, Lilja"'
Authors:
Samuel, David, Mikhailov, Vladislav, Velldal, Erik, Øvrelid, Lilja, Charpentier, Lucas Georges Gabriel, Kutuzov, Andrey
Training large language models requires vast amounts of data, posing a challenge for less widely spoken languages like Norwegian and even more so for truly low-resource languages like Sámi. To address this issue, we present a novel three-stage cont
External link:
http://arxiv.org/abs/2412.06484
In sentiment analysis of longer texts, there may be a variety of topics discussed, of entities mentioned, and of sentiments expressed regarding each entity. We find a lack of studies exploring how such texts express their sentiment towards each entit
External link:
http://arxiv.org/abs/2407.03916
Authors:
Wold, Sondre, Simon, Étienne, Charpentier, Lucas Georges Gabriel, Kostylev, Egor V., Velldal, Erik, Øvrelid, Lilja
Grounded language models use external sources of information, such as knowledge graphs, to meet some of the general challenges associated with pre-training. By extending previous work on compositional generalization in semantic parsing, we allow for
External link:
http://arxiv.org/abs/2406.04989
Authors:
Mæhlum, Petter, Samuel, David, Norman, Rebecka Maria, Jelin, Elma, Bjertnæs, Øyvind Andresen, Øvrelid, Lilja, Velldal, Erik
Sentiment analysis is an important tool for aggregating patient voices, in order to provide targeted improvements in healthcare services. A prerequisite for this is the availability of in-domain data annotated for sentiment. This article documents an
External link:
http://arxiv.org/abs/2404.18832
Text sanitization is the task of redacting a document to mask all occurrences of (direct or indirect) personal identifiers, with the goal of concealing the identity of the individual(s) referred in it. In this paper, we consider a two-step approach t
External link:
http://arxiv.org/abs/2310.14312
Authors:
Hussiny, Mohammad Ali, Øvrelid, Lilja
This paper introduces the first emotion annotated dataset for the Dari variant of Persian spoken in Afghanistan. The LetHerLearn dataset contains 7,600 tweets posted in reaction to the Taliban ban of women rights to education in 2022 and has been man
External link:
http://arxiv.org/abs/2306.16268
We propose a graph-based event extraction framework JSEEGraph that approaches the task of event extraction as general graph parsing in the tradition of Meaning Representation Parsing. It explicitly encodes entities and events in a single semantic gra
External link:
http://arxiv.org/abs/2306.14633
Authors:
Samuel, David, Øvrelid, Lilja
In recent years, language models have become increasingly larger and more complex. However, the input representations for these models continue to rely on simple and greedy subword tokenization methods. In this paper, we propose a novel tokenization
External link:
http://arxiv.org/abs/2306.07764
In contrast to large text corpora, knowledge graphs (KG) provide dense and structured representations of factual information. This makes them attractive for systems that supplement or ground the knowledge found in pre-trained language models with an
External link:
http://arxiv.org/abs/2306.02871
Authors:
Samuel, David, Kutuzov, Andrey, Touileb, Samia, Velldal, Erik, Øvrelid, Lilja, Rønningstad, Egil, Sigdel, Elina, Palatkina, Anna
We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-d
External link:
http://arxiv.org/abs/2305.03880