Showing 1 - 10 of 606 results for search: '"stochastic weight averaging"'
Ensemble models often improve generalization performance in challenging tasks. Yet, traditional techniques based on prediction averaging incur three well-known disadvantages: the computational overhead of training multiple models, increased latency, …
External link:
http://arxiv.org/abs/2406.19092
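The cost this abstract alludes to is easy to make concrete: with prediction averaging, every ensemble member is trained separately and must run at inference time, so latency scales with ensemble size. A minimal sketch, where the architecture, member count, and data are illustrative assumptions, not taken from the paper:

    import torch
    import torch.nn as nn

    # Hypothetical ensemble of 5 independently trained classifiers.
    members = [nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
               for _ in range(5)]

    x = torch.randn(8, 16)  # batch of 8 inputs
    with torch.no_grad():
        # Prediction averaging: one full forward pass per member, so
        # inference latency grows linearly with the number of members.
        probs = torch.stack([m(x).softmax(dim=-1) for m in members]).mean(dim=0)
    # probs: (8, 3) averaged class probabilities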
Fine-tuned Large Language Models (LLMs) often suffer from overconfidence and poor calibration, particularly when fine-tuned on small datasets. To address these challenges, we propose a simple combination of Low-Rank Adaptation (LoRA) with Gaussian Stochastic Weight Averaging (SWAG) …
External link:
http://arxiv.org/abs/2405.03425
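To make the combination concrete: SWAG-style moment tracking can be restricted to the low-rank adapter parameters alone, keeping the posterior approximation as cheap as the fine-tuning. A minimal sketch, assuming a hand-rolled LoRA layer and synthetic update noise in place of a real fine-tuning loop; none of the names or hyperparameters come from the paper:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Frozen base weight plus a trainable low-rank update B @ A.
        def __init__(self, d_in, d_out, r=4):
            super().__init__()
            self.base = nn.Linear(d_in, d_out)
            self.base.weight.requires_grad_(False)
            self.A = nn.Parameter(0.01 * torch.randn(r, d_in))
            self.B = nn.Parameter(torch.zeros(d_out, r))
        def forward(self, x):
            return self.base(x) + x @ (self.B @ self.A).T

    layer = LoRALinear(16, 16)
    flat = lambda: torch.cat([layer.A.flatten(), layer.B.flatten()]).detach()
    mean, sq_mean, n = torch.zeros_like(flat()), torch.zeros_like(flat()), 0
    for step in range(100):
        with torch.no_grad():              # stand-in for real SGD fine-tuning steps
            layer.A += 0.01 * torch.randn_like(layer.A)
            layer.B += 0.01 * torch.randn_like(layer.B)
        if step % 10 == 0:                 # snapshot cadence is a free choice
            w = flat(); n += 1
            mean += (w - mean) / n                  # running first moment
            sq_mean += (w * w - sq_mean) / n        # running second moment
    var = (sq_mean - mean ** 2).clamp(min=1e-8)     # diagonal SWAG covariance
    sample = mean + var.sqrt() * torch.randn_like(mean)  # one posterior draw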
Academic article
This result is only visible to signed-in users.
This paper introduces Bayesian uncertainty modeling using Stochastic Weight Averaging-Gaussian (SWAG) in Natural Language Understanding (NLU) tasks. We apply the approach to standard tasks in natural language inference (NLI) and demonstrate the effectiveness …
External link:
http://arxiv.org/abs/2304.04726
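The abstract names the mechanism: SWAG replaces a point estimate with a Gaussian over weights, and predictions are averaged over posterior draws. A hedged sketch of that prediction loop for a 3-way NLI head, with a placeholder diagonal variance standing in for moments that would come from training; model size, sample count, and variance scale are assumptions:

    import torch
    import torch.nn as nn
    from torch.nn.utils import parameters_to_vector, vector_to_parameters

    model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 3))  # 3 NLI labels
    mean = parameters_to_vector(model.parameters()).detach()
    var = torch.full_like(mean, 1e-4)   # placeholder for the learned SWAG variance

    x = torch.randn(4, 8)
    preds = []
    with torch.no_grad():
        for _ in range(30):             # K posterior weight samples
            vector_to_parameters(mean + var.sqrt() * torch.randn_like(mean),
                                 model.parameters())
            preds.append(model(x).softmax(dim=-1))
    probs = torch.stack(preds).mean(dim=0)        # Bayesian model average
    entropy = -(probs * probs.log()).sum(dim=-1)  # predictive uncertainty per input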
Author:
Lu, Peng, Kobyzev, Ivan, Rezagholizadeh, Mehdi, Rashid, Ahmad, Ghodsi, Ali, Langlais, Philippe
Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset …
External link:
http://arxiv.org/abs/2212.05956
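For contrast with the teacher-free direction the abstract hints at, the standard KD objective it refers to looks like this: a generic Hinton-style loss, where the temperature and mixing weight are illustrative defaults rather than values from the paper:

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft targets from the teacher, blended with hard-label cross-entropy.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * T * T                       # T^2 keeps the gradient scale comparable
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    student = torch.randn(4, 3, requires_grad=True)   # stand-in logits
    teacher = torch.randn(4, 3)
    loss = kd_loss(student, teacher, torch.tensor([0, 2, 1, 0]))
    loss.backward()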
Academic article
This result is only visible to signed-in users.
Averaging neural network weights sampled by a backbone stochastic gradient descent (SGD) is a simple yet effective approach to assist the backbone SGD in finding better optima in terms of generalization. From a statistical perspective, weight averaging …
External link:
http://arxiv.org/abs/2201.00519
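The update behind this is a simple running mean along the SGD trajectory, w_avg <- w_avg + (w - w_avg) / (n + 1), and PyTorch ships a utility for it. A toy regression sketch; the task, schedule, and averaging start point are arbitrary choices for illustration:

    import torch
    import torch.nn as nn
    from torch.optim.swa_utils import AveragedModel

    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    swa_model = AveragedModel(model)    # maintains the running weight average

    for step in range(300):
        x = torch.randn(32, 10)
        y = x.sum(dim=1, keepdim=True)            # toy regression target
        loss = ((model(x) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        if step >= 150:                 # average only the tail of the trajectory
            swa_model.update_parameters(model)    # w_avg += (w - w_avg) / (n + 1)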
Despite their success, modern language models are fragile. Even small changes in their training pipeline can lead to unexpected results. We study this phenomenon by examining the robustness of ALBERT (arXiv:1909.11942) in combination with Stochastic Weight Averaging (SWA) …
External link:
http://arxiv.org/abs/2111.09612
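The fragility being studied can be measured directly: hold the data fixed, vary only the random seed, and look at the spread of final scores; the paper then asks whether adding SWA shrinks that spread. A toy illustration of the measurement, using a synthetic stand-in rather than the paper's ALBERT setup:

    import torch
    import torch.nn as nn

    X = torch.randn(256, 8, generator=torch.Generator().manual_seed(0))
    y = (X.sum(dim=1) > 0).long()       # fixed synthetic task

    def final_accuracy(seed):
        torch.manual_seed(seed)         # the only thing that varies across runs
        model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
        opt = torch.optim.SGD(model.parameters(), lr=0.5)
        for _ in range(200):
            loss = nn.functional.cross_entropy(model(X), y)
            opt.zero_grad(); loss.backward(); opt.step()
        return (model(X).argmax(dim=-1) == y).float().mean().item()

    accs = torch.tensor([final_accuracy(s) for s in range(10)])
    print(accs.mean(), accs.std())      # run-to-run spread = pipeline fragility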
We use Gaussian stochastic weight averaging (SWAG) to assess the model-form uncertainty associated with neural-network-based function approximation relevant to fluid flows. SWAG approximates a posterior Gaussian distribution of each weight, given training data …
External link:
http://arxiv.org/abs/2109.08248
Author:
Morimoto, Masaki, Fukami, Kai, Maulik, Romit, Vinuesa, Ricardo, Fukagata, Koji
Published in:
Physica D: Nonlinear Phenomena, Volume 440, 15 November 2022
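For regression-type function approximation like this, the SWAG posterior turns into an uncertainty band: sample weight vectors, run the network on each, and read model-form uncertainty off the spread of outputs. A hedged sketch with a placeholder posterior scale; the surrogate's shape and the 1e-2 scale are assumptions, not values from the paper:

    import torch
    import torch.nn as nn
    from torch.nn.utils import parameters_to_vector, vector_to_parameters

    # Hypothetical surrogate: one flow parameter in, one quantity of interest out.
    net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    mean = parameters_to_vector(net.parameters()).detach()
    std = torch.full_like(mean, 1e-2)   # stand-in for the SWAG posterior scale

    x = torch.linspace(-1.0, 1.0, 50).unsqueeze(-1)
    draws = []
    with torch.no_grad():
        for _ in range(100):            # posterior weight samples
            vector_to_parameters(mean + std * torch.randn_like(mean),
                                 net.parameters())
            draws.append(net(x))
    draws = torch.stack(draws)
    mu, sigma = draws.mean(dim=0), draws.std(dim=0)  # prediction and its model-form uncertainty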