Showing 1 - 10 of 3,717 results for search: '"A. Egert"'
Authors:
Wang, Zhilin, Bukharin, Alexander, Delalleau, Olivier, Egert, Daniel, Shen, Gerald, Zeng, Jiaqi, Kuchaiev, Oleksii, Dong, Yi
Reward models are critical for aligning models to follow instructions, and are typically trained following one of two popular paradigms: Bradley-Terry style or Regression style. However, there is a lack of evidence that either approach is better than
External link:
http://arxiv.org/abs/2410.01257
Authors:
Nvidia, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distri
External link:
http://arxiv.org/abs/2406.11704
We consider divergence form operators with complex coefficients on an open subset of Euclidean space. Boundary conditions in the corresponding parabolic problem are dynamical, that is, the time derivative appears on the boundary. As a matter of fact,
External link:
http://arxiv.org/abs/2406.09583
Authors:
Wang, Zhilin, Dong, Yi, Delalleau, Olivier, Zeng, Jiaqi, Shen, Gerald, Egert, Daniel, Zhang, Jimmy J., Sreedhar, Makesh Narsimhan, Kuchaiev, Oleksii
High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permiss
External link:
http://arxiv.org/abs/2406.08673
The preliminary material of the monograph (arXiv:1607.03852) written by the first two authors contains two major imprecisions that necessitate a number of (in the end harmless) changes throughout the entire text. One is about identification of abstr
External link:
http://arxiv.org/abs/2406.07570
Authors:
Shen, Gerald, Wang, Zhilin, Delalleau, Olivier, Zeng, Jiaqi, Dong, Yi, Egert, Daniel, Sun, Shengyang, Zhang, Jimmy, Jain, Sahil, Taghibakhshi, Ali, Ausin, Markel Sanz, Aithal, Ashwath, Kuchaiev, Oleksii
Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most competent LLMs which
External link:
http://arxiv.org/abs/2405.01481
Authors:
Wang, Zhilin, Dong, Yi, Zeng, Jiaqi, Adams, Virginia, Sreedhar, Makesh Narsimhan, Egert, Daniel, Delalleau, Olivier, Scowcroft, Jane Polak, Kant, Neel, Swope, Aidan, Kuchaiev, Oleksii
Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful re
External link:
http://arxiv.org/abs/2311.09528
Authors:
Cohen, Ian J., Arridge, Chris, Azari, Abigail, Bard, Chris, Clark, George, Crary, Frank, Curry, Shannon, Delamere, Peter, Dewey, Ryan M., DiBraccio, Gina A., Dong, Chuanfei, Drozdov, Alexander, Egert, Austin, Filwett, Rachael, Halekas, Jasper, Halford, Alexa, Hughes, Andréa, Garcia-Sage, Katherine, Gkioulidou, Matina, Goetz, Charlotte, Grava, Cesare, Hirsch, Michael, Huybrighs, Hans Leo F., Kollmann, Peter, Lamy, Laurent, Li, Wen, Liemohn, Michael, Marshal, Robert, Masters, Adam, McAteer, R. T. James, Molaverdikhani, Karan, Mukhopadhyay, Agnit, Nikoukar, Romina, Paxton, Larry, Regoli, Leonardo H., Roussos, Elias, Schneider, Nick, Sulaiman, Ali, Sun, Y., Szalay, Jamey
Heliophysics is the field that "studies the nature of the Sun, and how it influences the very nature of space - and, in turn, the atmospheres of planetary bodies and the technology that exists there." However, NASA's Heliophysics Division tends to li
External link:
http://arxiv.org/abs/2308.11690
Given an elliptic operator $L= - \mathrm{div} (A \nabla \cdot)$ subject to mixed boundary conditions on an open subset of $\mathbb{R}^d$, we study the relation between Gaussian pointwise estimates for the kernel of the associated heat semigroup, Hö
External link:
http://arxiv.org/abs/2307.03648
Authors:
Böhnlein, Tim, Egert, Moritz
We give a simple argument to obtain $\mathrm{L}^p$-boundedness for heat semigroups associated to uniformly strongly elliptic systems on $\mathbb{R}^d$ by using Stein interpolation between Gaussian estimates and hypercontractivity. Our results give $p
External link:
http://arxiv.org/abs/2302.09039