Showing 1 - 5 of 5
for search: '"Abubaker, Abdalgader"'
Preference optimization methods have been successfully applied to improve not only the alignment of large language models (LLMs) with human values, but also specific natural language tasks such as summarization and stylistic continuations. This paper
External link:
http://arxiv.org/abs/2406.16061
This paper explores the effects of various forms of regularization in the context of language model alignment via self-play. While both reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) require collecting cost
External link:
http://arxiv.org/abs/2404.04291
Recently, pretraining methods for Graph Neural Networks (GNNs) have been successful at learning effective representations from unlabeled graph data. However, most of these methods rely on pairwise relations in the graph and do not capture the und
External link:
http://arxiv.org/abs/2311.11368
Author:
Trabelsi, Imen, Abdellatif, Manel, Abubaker, Abdalgader, Moha, Naouel, Mosser, Sébastien, Ebrahimi‐Kahou, Samira, Guéhéneuc, Yann‐Gaël
Published in:
Journal of Software: Evolution & Process; Oct2023, Vol. 35 Issue 10, p1-23, 23p
Academic article
This result cannot be displayed to unauthenticated users.
Log in to view this result.