Showing 1 - 10 of 69 for query: '"Sun, Simeng"'
We propose a novel neural network architecture, the normalized Transformer (nGPT), with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices, and hidden states are unit-norm normalized. The in…
External link: http://arxiv.org/abs/2410.01131
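The snippet above describes nGPT constraining embeddings, attention/MLP matrices, and hidden states to unit norm. A minimal sketch of that core normalization step (function name and `eps` parameter are illustrative, not taken from the paper's code):

```python
import math

def unit_normalize(vec, eps=1e-12):
    """Rescale a vector so its L2 norm is 1, placing it on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

embedding = [3.0, 4.0]
normalized = unit_normalize(embedding)  # approximately [0.6, 0.8], L2 norm ~1
```

In the paper this constraint is applied to every vector-valued quantity in the network, not just one embedding as sketched here.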
Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-…
External link: http://arxiv.org/abs/2406.19371
Author: Hsieh, Cheng-Ping, Sun, Simeng, Kriman, Samuel, Acharya, Shantanu, Rekesh, Dima, Jia, Fei, Zhang, Yang, Ginsburg, Boris
The needle-in-a-haystack (NIAH) test, which examines the ability to retrieve a piece of information (the "needle") from long distractor texts (the "haystack"), has been widely adopted to evaluate long-context language models (LMs). However, this simp…
External link: http://arxiv.org/abs/2404.06654
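The NIAH setup described above can be illustrated with a short sketch: repeat a distractor sentence to form the haystack and insert the needle at a chosen relative depth. The function name and parameters are hypothetical, not from the RULER codebase:

```python
def build_niah_prompt(needle, distractor, depth_pct, n_chunks=100):
    """Build a needle-in-a-haystack context: repeat a distractor sentence
    n_chunks times and insert the needle at a relative depth
    (0 = start of the context, 100 = end)."""
    chunks = [distractor] * n_chunks
    pos = round(n_chunks * depth_pct / 100)
    chunks.insert(pos, needle)
    return " ".join(chunks)

haystack = build_niah_prompt(
    needle="The special magic number is 42.",
    distractor="The grass is green and the sky is blue.",
    depth_pct=50,
)
```

An evaluated model would then be asked to recall the needle ("What is the special magic number?") given the full context; varying `depth_pct` and context length maps out retrieval performance.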
Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control…
External link: http://arxiv.org/abs/2311.01449
During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources. In this technical report, we empirically investigate an efficient implementation…
External link: http://arxiv.org/abs/2309.09055
Strategies such as chain-of-thought prompting improve the performance of large language models (LLMs) on complex reasoning tasks by decomposing input examples into intermediate steps. However, it remains unclear how to apply such methods to reason over…
External link: http://arxiv.org/abs/2305.14564
Fine-tuning large language models is becoming ever more impractical due to their rapidly growing scale. This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an…
External link: http://arxiv.org/abs/2302.11521
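Prompt tuning as summarized above prepends a handful of trainable embeddings to a frozen model's input. A minimal conceptual sketch (names and dimensions are illustrative, not from the paper):

```python
def prepend_soft_prompt(soft_prompt, token_embeds):
    """Prompt tuning: prepend k tunable 'soft prompt' embeddings to the
    token embeddings of a frozen model; during training, gradients update
    only the soft prompt, never the model weights."""
    return soft_prompt + token_embeds

k, dim = 3, 4
soft_prompt = [[0.1] * dim for _ in range(k)]   # the only trainable parameters
token_embeds = [[1.0] * dim for _ in range(5)]  # embeddings of the 5 input tokens
full_input = prepend_soft_prompt(soft_prompt, token_embeds)
# effective sequence length grows from 5 to k + 5 = 8
```

The appeal is the parameter count: only k x dim values are tuned per task, regardless of the size of the underlying model.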
With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages…
External link: http://arxiv.org/abs/2302.03528
Author: Feng, Ruoyu, Jin, Xin, Guo, Zongyu, Feng, Runsen, Gao, Yixin, He, Tianyu, Zhang, Zhizheng, Sun, Simeng, Chen, Zhibo
Image Coding for Machines (ICM) aims to compress images for AI task analysis rather than for meeting human perception. Learning a kind of feature that is both general (for AI tasks) and compact (for compression) is pivotal for its success. In this paper…
External link: http://arxiv.org/abs/2207.01932
While numerous architectures for long-range language models (LRLMs) have recently been proposed, a meaningful evaluation of their discourse-level language understanding capabilities has not yet followed. To this end, we introduce ChapterBreak, a chal…
External link: http://arxiv.org/abs/2204.10878