Showing 1 - 10 of 1,420 for search: '"training data quality"'
This study investigates the relative impact of training data quality versus quantity on the performance of small language models (SLMs), utilizing the TinyStories dataset for empirical analysis. Analysis of dataset variations with respect to size (25
External link:
http://arxiv.org/abs/2411.15821
Published in:
NeurIPS 2024 Workshop on Time Series in the Age of Large Models
Recently, there has been a growing interest in time series foundation models that generalize across different downstream tasks. A key to strong foundation models is a diverse pre-training dataset, which is particularly challenging to collect for time
External link:
http://arxiv.org/abs/2412.06368
Large language model pre-training has traditionally relied on human experts to craft heuristics for improving the corpora quality, resulting in numerous rules developed to date. However, these rules lack the flexibility to address the unique characte
External link:
http://arxiv.org/abs/2409.17115
Authors:
Pantazis, Omiros; Bevan, Peggy; Pringle, Holly; Ferreira, Guilherme Braga; Ingram, Daniel J.; Madsen, Emily; Thomas, Liam; Thanet, Dol Raj; Silwal, Thakur; Rayamajhi, Santosh; Brostow, Gabriel; Mac Aodha, Oisin; Jones, Kate E.
Large wildlife image collections from camera traps are crucial for biodiversity monitoring, offering insights into species richness, occupancy, and activity patterns. However, manual processing of these data is time-consuming, hindering analytical pr
External link:
http://arxiv.org/abs/2408.14348
Multilingual language models such as mBERT have seen impressive cross-lingual transfer to a variety of languages, but many languages remain excluded from these models. In this paper, we analyse the effect of pre-training with monolingual data for a l
External link:
http://arxiv.org/abs/2205.10517
Academic article
Author:
Hagendorff, Thilo
Machine behavior that is based on learning algorithms can be significantly influenced by the exposure to data of different qualities. Up to now, those qualities are solely measured in technical terms, but not in ethical ones, despite the significant
External link:
http://arxiv.org/abs/2008.11463
Music source separation performance has greatly improved in recent years with the advent of approaches based on deep learning. Such methods typically require large amounts of labelled training data, which in the case of music consist of mixtures and
External link:
http://arxiv.org/abs/1909.08494
Academic article
Author:
Hagendorff, Thilo (thilo.hagendorff@uni-tuebingen.de)
Published in:
Minds & Machines, Dec. 2021, Vol. 31, Issue 4, pp. 563-593 (31 pages).