Showing 1 - 10 of 69 for query: '"Sun, Simeng"'
We propose a novel neural network architecture, the normalized Transformer (nGPT), with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices, and hidden states are unit-norm normalized. The in…
External link: http://arxiv.org/abs/2410.01131
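The snippet above describes nGPT constraining embeddings, attention/MLP matrices, and hidden states to unit norm. A minimal sketch of that core normalization step (function name and `eps` parameter are illustrative, not taken from the paper's code):

```python
import math

def unit_normalize(vec, eps=1e-12):
    """Rescale a vector so its L2 norm is 1, placing it on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

embedding = [3.0, 4.0]
normalized = unit_normalize(embedding)  # approximately [0.6, 0.8], L2 norm ~1
```

In the paper this constraint is applied to every vector-valued quantity in the network, not just one embedding as sketched here.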
Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-…
External link: http://arxiv.org/abs/2406.19371
Author: Hsieh, Cheng-Ping, Sun, Simeng, Kriman, Samuel, Acharya, Shantanu, Rekesh, Dima, Jia, Fei, Zhang, Yang, Ginsburg, Boris
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simp…
External link: http://arxiv.org/abs/2404.06654
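The NIAH setup described above can be illustrated with a short sketch: repeat a distractor sentence to form the haystack and insert the needle at a chosen relative depth. The function name and parameters are hypothetical, not from the RULER codebase:

```python
def build_niah_prompt(needle, distractor, depth_pct, n_chunks=100):
    """Build a needle-in-a-haystack context: repeat a distractor sentence
    n_chunks times and insert the needle at a relative depth
    (0 = start of the context, 100 = end)."""
    chunks = [distractor] * n_chunks
    pos = round(n_chunks * depth_pct / 100)
    chunks.insert(pos, needle)
    return " ".join(chunks)

haystack = build_niah_prompt(
    needle="The special magic number is 42.",
    distractor="The grass is green and the sky is blue.",
    depth_pct=50,
)
```

An evaluated model would then be asked to recall the needle ("What is the special magic number?") given the full context; varying `depth_pct` and context length maps out retrieval performance.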
Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control…
External link: http://arxiv.org/abs/2311.01449
During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources. In this technical report, we empirically investigate an efficient implementation…
External link: http://arxiv.org/abs/2309.09055
Strategies such as chain-of-thought prompting improve the performance of large language models (LLMs) on complex reasoning tasks by decomposing input examples into intermediate steps. However, it remains unclear how to apply such methods to reason over…
External link: http://arxiv.org/abs/2305.14564
Fine-tuning large language models is becoming ever more impractical due to their rapidly growing scale. This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an…
External link: http://arxiv.org/abs/2302.11521
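Prompt tuning as summarized above prepends a handful of trainable embeddings to a frozen model's input. A minimal conceptual sketch (names and dimensions are illustrative, not from the paper):

```python
def prepend_soft_prompt(soft_prompt, token_embeds):
    """Prompt tuning: prepend k tunable 'soft prompt' embeddings to the
    token embeddings of a frozen model; during training, gradients update
    only the soft prompt, never the model weights."""
    return soft_prompt + token_embeds

k, dim = 3, 4
soft_prompt = [[0.1] * dim for _ in range(k)]   # the only trainable parameters
token_embeds = [[1.0] * dim for _ in range(5)]  # embeddings of the 5 input tokens
full_input = prepend_soft_prompt(soft_prompt, token_embeds)
# effective sequence length grows from 5 to k + 5 = 8
```

The appeal is the parameter count: only k x dim values are tuned per task, regardless of the size of the underlying model.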
With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages…
External link: http://arxiv.org/abs/2302.03528
Author: Feng, Ruoyu, Jin, Xin, Guo, Zongyu, Feng, Runsen, Gao, Yixin, He, Tianyu, Zhang, Zhizheng, Sun, Simeng, Chen, Zhibo
Image Coding for Machines (ICM) aims to compress images for AI task analysis rather than for meeting human perception. Learning a kind of feature that is both general (for AI tasks) and compact (for compression) is pivotal for its success. In this paper…
External link: http://arxiv.org/abs/2207.01932
While numerous architectures for long-range language models (LRLMs) have recently been proposed, a meaningful evaluation of their discourse-level language understanding capabilities has not yet followed. To this end, we introduce ChapterBreak, a chal…
External link: http://arxiv.org/abs/2204.10878