Search results

Report

HelpSteer2-Preference: Complementing Ratings with Preferences

Authors: Wang, Zhilin; Bukharin, Alexander; Delalleau, Olivier; Egert, Daniel; Shen, Gerald; Zeng, Jiaqi; Kuchaiev, Oleksii; Dong, Yi

Reward models are critical for aligning models to follow instructions, and are typically trained following one of two popular paradigms: Bradley-Terry style or Regression style. However, there is a lack of evidence that either approach is better than

External link: http://arxiv.org/abs/2410.01257

Report

Nemotron-4 340B Technical Report

Authors: Nvidia; Adler, Bo; Agarwal, Niket; Aithal, Ashwath; Anh, Dong H.; Bhattacharya, Pallab; Brundyn, Annika; Casper, Jared; Catanzaro, Bryan; Clay, Sharon; Cohen, Jonathan; Das, Sirshak; Dattagupta, Ayush; Delalleau, Olivier; Derczynski, Leon; Dong, Yi; Egert, Daniel; Evans, Ellie; Ficek, Aleksander; Fridman, Denys; Ghosh, Shaona; Ginsburg, Boris; Gitman, Igor; Grzegorzek, Tomasz; Hero, Robert; Huang, Jining; Jawa, Vibhu; Jennings, Joseph; Jhunjhunwala, Aastha; Kamalu, John; Khan, Sadaf; Kuchaiev, Oleksii; LeGresley, Patrick; Li, Hui; Liu, Jiwei; Liu, Zihan; Long, Eileen; Mahabaleshwarkar, Ameya Sunil; Majumdar, Somshubra; Maki, James; Martinez, Miguel; de Melo, Maer Rodrigues; Moshkov, Ivan; Narayanan, Deepak; Narenthiran, Sean; Navarro, Jesus; Nguyen, Phong; Nitski, Osvald; Noroozi, Vahid; Nutheti, Guruprasad; Parisien, Christopher; Parmar, Jupinder; Patwary, Mostofa; Pawelec, Krzysztof; Ping, Wei; Prabhumoye, Shrimai; Roy, Rajarshi; Saar, Trisha; Sabavat, Vasanth Rao Naik; Satheesh, Sanjeev; Scowcroft, Jane Polak; Sewall, Jason; Shamis, Pavel; Shen, Gerald; Shoeybi, Mohammad; Sizer, Dave; Smelyanskiy, Misha; Soares, Felipe; Sreedhar, Makesh Narsimhan; Su, Dan; Subramanian, Sandeep; Sun, Shengyang; Toshniwal, Shubham; Wang, Hao; Wang, Zhilin; You, Jiaxuan; Zeng, Jiaqi; Zhang, Jimmy; Zhang, Jing; Zhang, Vivienne; Zhang, Yian; Zhu, Chen

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distri

External link: http://arxiv.org/abs/2406.11704

Report

Bounded functional calculus for divergence form operators with dynamical boundary conditions

Authors: Böhnlein, Tim; Egert, Moritz; Rehberg, Joachim

We consider divergence form operators with complex coefficients on an open subset of Euclidean space. Boundary conditions in the corresponding parabolic problem are dynamical, that is, the time derivative appears on the boundary. As a matter of fact,

External link: http://arxiv.org/abs/2406.09583

Report

HelpSteer2: Open-source dataset for training top-performing reward models

Authors: Wang, Zhilin; Dong, Yi; Delalleau, Olivier; Zeng, Jiaqi; Shen, Gerald; Egert, Daniel; Zhang, Jimmy J.; Sreedhar, Makesh Narsimhan; Kuchaiev, Oleksii

High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permiss

External link: http://arxiv.org/abs/2406.08673

Report

Corrigendum to: Elliptic Boundary Value Problems with Fractional Regularity Data: The First Order Approach

Authors: Amenta, Alex; Auscher, Pascal; Egert, Moritz

The preliminary material of the monograph (arXiv:1607.03852) written by the first two authors contains two major imprecisions that necessitate a number of (in the end harmless) changes throughout the entire text. One is about identification of abstr

External link: http://arxiv.org/abs/2406.07570

Report

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Authors: Shen, Gerald; Wang, Zhilin; Delalleau, Olivier; Zeng, Jiaqi; Dong, Yi; Egert, Daniel; Sun, Shengyang; Zhang, Jimmy; Jain, Sahil; Taghibakhshi, Ali; Ausin, Markel Sanz; Aithal, Ashwath; Kuchaiev, Oleksii

Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs which

External link: http://arxiv.org/abs/2405.01481

Report

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM

Authors: Wang, Zhilin; Dong, Yi; Zeng, Jiaqi; Adams, Virginia; Sreedhar, Makesh Narsimhan; Egert, Daniel; Delalleau, Olivier; Scowcroft, Jane Polak; Kant, Neel; Swope, Aidan; Kuchaiev, Oleksii

Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful re

External link: http://arxiv.org/abs/2311.09528

Report

The case for studying other planetary magnetospheres and atmospheres in Heliophysics

Heliophysics is the field that "studies the nature of the Sun, and how it influences the very nature of space - and, in turn, the atmospheres of planetary bodies and the technology that exists there." However, NASA's Heliophysics Division tends to li

External link: http://arxiv.org/abs/2308.11690

Report

Gaussian estimates vs. elliptic regularity on open sets

Authors: Böhnlein, Tim; Ciani, Simone; Egert, Moritz

Given an elliptic operator $L= - \mathrm{div} (A \nabla \cdot)$ subject to mixed boundary conditions on an open subset of $\mathbb{R}^d$, we study the relation between Gaussian pointwise estimates for the kernel of the associated heat semigroup, H\"o

External link: http://arxiv.org/abs/2307.03648

Report

Explicit improvements for $\mathrm{L}^p$-estimates related to elliptic systems

Authors: Böhnlein, Tim; Egert, Moritz

We give a simple argument to obtain $\mathrm{L}^p$-boundedness for heat semigroups associated to uniformly strongly elliptic systems on $\mathbb{R}^d$ by using Stein interpolation between Gaussian estimates and hypercontractivity. Our results give $p

External link: http://arxiv.org/abs/2302.09039
