Showing 1 - 10 of 13 results for search: '"Sreedhar, Makesh Narsimhan"'
Dialogue policies play a crucial role in developing task-oriented dialogue systems, yet their development and maintenance are challenging and typically require substantial effort from experts in dialogue modeling. While in many situations, large amou…
External link:
http://arxiv.org/abs/2406.15214
Authors:
Nvidia, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distri…
External link:
http://arxiv.org/abs/2406.11704
Authors:
Wang, Zhilin, Dong, Yi, Delalleau, Olivier, Zeng, Jiaqi, Shen, Gerald, Egert, Daniel, Zhang, Jimmy J., Sreedhar, Makesh Narsimhan, Kuchaiev, Oleksii
High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permiss…
External link:
http://arxiv.org/abs/2406.08673
Authors:
Sreedhar, Makesh Narsimhan, Rebedea, Traian, Ghosh, Shaona, Zeng, Jiaqi, Parisien, Christopher
Recent advancements in instruction-tuning datasets have predominantly focused on specific tasks like mathematical or logical reasoning. There has been a notable gap in data designed for aligning language models to maintain topic relevance in conversa…
External link:
http://arxiv.org/abs/2404.03820
Authors:
Chuang, Yun-Shiuan, Wu, Yi, Gupta, Dhruv, Uppaal, Rheeya, Kumar, Ananya, Sun, Luhang, Sreedhar, Makesh Narsimhan, Yang, Sijia, Rogers, Timothy T., Hu, Junjie
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain…
External link:
http://arxiv.org/abs/2311.09661
Authors:
Wang, Zhilin, Dong, Yi, Zeng, Jiaqi, Adams, Virginia, Sreedhar, Makesh Narsimhan, Egert, Daniel, Delalleau, Olivier, Scowcroft, Jane Polak, Kant, Neel, Swope, Aidan, Kuchaiev, Oleksii
Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful re…)
External link:
http://arxiv.org/abs/2311.09528
Model alignment with human preferences is an essential step in making Large Language Models (LLMs) helpful and consistent with human values. It typically consists of supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) s…
External link:
http://arxiv.org/abs/2310.05344
Conversation designers continue to face significant obstacles when creating production-quality task-oriented dialogue systems. The complexity and cost involved in schema development and data collection is often a major barrier for such designers, lim…
External link:
http://arxiv.org/abs/2211.05596
Subword tokenization schemes are the dominant technique used in current NLP models. However, such schemes can be rigid, and tokenizers built on one corpus do not adapt well to other parallel corpora. It has also been observed that in multilingual corp…
External link:
http://arxiv.org/abs/2205.11490
The ubiquitous nature of chatbots and their interaction with users generates an enormous amount of data. Can we improve chatbots using this data? A self-feeding chatbot improves itself by asking for natural language feedback when a user is dissatisfied wi…
External link:
http://arxiv.org/abs/2010.07261