Showing 1 - 10 of 2,315 results for search: '"SAMUEL, DAVID A."'
We present a simple way to merge masked language modeling with causal language modeling. This hybrid training objective results in a model that combines the strengths of both modeling paradigms within a single transformer stack: GPT-BERT can be trans
External link:
http://arxiv.org/abs/2410.24159
Author:
Samuel, David
While in-context learning is commonly associated with causal language models, such as GPT, we demonstrate that this capability also 'emerges' in masked language models. Through an embarrassingly simple inference technique, we enable an existing maske
External link:
http://arxiv.org/abs/2406.04823
Author:
Mæhlum, Petter; Samuel, David; Norman, Rebecka Maria; Jelin, Elma; Bjertnæs, Øyvind Andresen; Øvrelid, Lilja; Velldal, Erik
Sentiment analysis is an important tool for aggregating patient voices, in order to provide targeted improvements in healthcare services. A prerequisite for this is the availability of in-domain data annotated for sentiment. This article documents an
External link:
http://arxiv.org/abs/2404.18832
Retrieval-augmented language models pose a promising alternative to standard language modeling. During pretraining, these models search in a corpus of documents for contextually relevant information that could aid the language modeling objective. We
External link:
http://arxiv.org/abs/2404.10939
This paper introduces a novel modification of the transformer architecture, tailored for the data-efficient pretraining of language models. This aspect is evaluated by participating in the BabyLM challenge, where our solution won both the strict and
External link:
http://arxiv.org/abs/2311.02265
Author:
Samuel, David
This paper explores the use of latent bootstrapping, an alternative self-supervision technique, for pretraining language models. Unlike the typical practice of using self-supervision on discrete subwords, latent bootstrapping leverages contextualized
External link:
http://arxiv.org/abs/2310.19420
Published in:
BMC Pediatrics, Vol 24, Iss 1, Pp 1-4 (2024)
Abstract. Background: Kawasaki disease (KD) is a medium artery vasculitis that predominantly affects children under age 5. Prompt diagnosis and treatment with IVIG and moderate dose aspirin is required to prevent the formation of coronary artery aneury
External link:
https://doaj.org/article/18850bd5bf01411c887b46e6372de100
Author:
Jentoft, Matias; Samuel, David
While there has been a surge of large language models for Norwegian in recent years, we lack any tool to evaluate their understanding of grammaticality. We present two new Norwegian datasets for this task. NoCoLA_class is a supervised binary classifi
External link:
http://arxiv.org/abs/2306.07790
Author:
Samuel, David; Øvrelid, Lilja
In recent years, language models have become increasingly larger and more complex. However, the input representations for these models continue to rely on simple and greedy subword tokenization methods. In this paper, we propose a novel tokenization
External link:
http://arxiv.org/abs/2306.07764
Author:
Samuel, David; Kutuzov, Andrey; Touileb, Samia; Velldal, Erik; Øvrelid, Lilja; Rønningstad, Egil; Sigdel, Elina; Palatkina, Anna
We present NorBench: a streamlined suite of NLP tasks and probes for evaluating Norwegian language models (LMs) on standardized data splits and evaluation metrics. We also introduce a range of new Norwegian language models (both encoder and encoder-d
External link:
http://arxiv.org/abs/2305.03880