Showing 1 - 10 of 13 results for search: '"Sreedhar, Makesh Narsimhan"'
Dialogue policies play a crucial role in developing task-oriented dialogue systems, yet their development and maintenance are challenging and typically require substantial effort from experts in dialogue modeling. While in many situations, large amou…
External link:
http://arxiv.org/abs/2406.15214
Authors:
Nvidia, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, Zhu, Chen
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distri…
External link:
http://arxiv.org/abs/2406.11704
Authors:
Wang, Zhilin, Dong, Yi, Delalleau, Olivier, Zeng, Jiaqi, Shen, Gerald, Egert, Daniel, Zhang, Jimmy J., Sreedhar, Makesh Narsimhan, Kuchaiev, Oleksii
High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permiss…
External link:
http://arxiv.org/abs/2406.08673
Authors:
Sreedhar, Makesh Narsimhan, Rebedea, Traian, Ghosh, Shaona, Zeng, Jiaqi, Parisien, Christopher
Recent advancements in instruction-tuning datasets have predominantly focused on specific tasks like mathematical or logical reasoning. There has been a notable gap in data designed for aligning language models to maintain topic relevance in conversa…
External link:
http://arxiv.org/abs/2404.03820
Authors:
Chuang, Yun-Shiuan, Wu, Yi, Gupta, Dhruv, Uppaal, Rheeya, Kumar, Ananya, Sun, Luhang, Sreedhar, Makesh Narsimhan, Yang, Sijia, Rogers, Timothy T., Hu, Junjie
Adapting pre-trained language models (PLMs) for time-series text classification amidst evolving domain shifts (EDS) is critical for maintaining accuracy in applications like stance detection. This study benchmarks the effectiveness of evolving domain…
External link:
http://arxiv.org/abs/2311.09661
Authors:
Wang, Zhilin, Dong, Yi, Zeng, Jiaqi, Adams, Virginia, Sreedhar, Makesh Narsimhan, Egert, Daniel, Delalleau, Olivier, Scowcroft, Jane Polak, Kant, Neel, Swope, Aidan, Kuchaiev, Oleksii
Existing open-source helpfulness preference datasets do not specify what makes some responses more helpful and others less so. Models trained on these datasets can incidentally learn to model dataset artifacts (e.g. preferring longer but unhelpful re…)
External link:
http://arxiv.org/abs/2311.09528
Model alignment with human preferences is an essential step in making Large Language Models (LLMs) helpful and consistent with human values. It typically consists of supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) s…
External link:
http://arxiv.org/abs/2310.05344
Conversation designers continue to face significant obstacles when creating production-quality task-oriented dialogue systems. The complexity and cost involved in schema development and data collection is often a major barrier for such designers, lim…
External link:
http://arxiv.org/abs/2211.05596
Subword tokenization schemes are the dominant technique used in current NLP models. However, such schemes can be rigid, and tokenizers built on one corpus do not adapt well to other parallel corpora. It has also been observed that in multilingual corp…
External link:
http://arxiv.org/abs/2205.11490
The ubiquitous nature of chatbots and their interaction with users generates an enormous amount of data. Can we improve chatbots using this data? A self-feeding chatbot improves itself by asking for natural language feedback when a user is dissatisfied wi…
External link:
http://arxiv.org/abs/2010.07261