Search results

Report

Data Selection via Optimal Control for Language Models

Authors: Gu, Yuxian, Dong, Li, Wang, Hongning, Hao, Yaru, Dong, Qingxiu, Wei, Furu, Huang, Minlie

This work investigates the selection of high-quality pre-training data from massive corpora to enhance LMs' capabilities for downstream usage. We formulate data selection as a generalized Optimal Control problem, which can be solved theoretically by

External link: http://arxiv.org/abs/2410.07064

Report

Self-Boosting Large Language Models with Synthetic Preference Data

Authors: Dong, Qingxiu, Dong, Li, Zhang, Xingxing, Sui, Zhifang, Wei, Furu

Through alignment with human preferences, Large Language Models (LLMs) have advanced significantly in generating honest, harmless, and helpful responses. However, collecting high-quality preference data is a resource-intensive and creativity-demandin

External link: http://arxiv.org/abs/2410.06961

Report

Differential Transformer

Authors: Ye, Tianzhu, Dong, Li, Xia, Yuqing, Sun, Yutao, Zhu, Yi, Huang, Gao, Wei, Furu

Transformer tends to overallocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise. Specifically, the differential attention mechanism calculates att

External link: http://arxiv.org/abs/2410.05258

Report

Scaling Optimal LR Across Token Horizons

Authors: Bjorck, Johan, Benhaim, Alon, Chaudhary, Vishrav, Wei, Furu, Song, Xia

State-of-the-art LLMs are powered by scaling: scaling model size, dataset size, and cluster size. It is economically infeasible to extensively tune hyperparameters for the largest runs. Instead, approximately optimal hyperparameters must be inferred

External link: http://arxiv.org/abs/2409.19913

Report

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Authors: Wang, Hongyu, Ma, Shuming, Wang, Ruiping, Wei, Furu

We introduce Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs, which can bring significant efficiency gains in inference. This is achieved by

External link: http://arxiv.org/abs/2407.10969

Report

Autoregressive Speech Synthesis without Vector Quantization

Authors: Meng, Lingwei, Zhou, Long, Liu, Shujie, Chen, Sanyuan, Han, Bing, Hu, Shujie, Liu, Yanqing, Li, Jinyu, Zhao, Sheng, Wu, Xixin, Meng, Helen, Wei, Furu

We present MELLE, a novel language modeling approach based on continuous-valued tokens for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector qua

External link: http://arxiv.org/abs/2407.08551

Report

Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning

Authors: Zhang, Yadong, Mao, Shaoguang, Wu, Wenshan, Xia, Yan, Ge, Tao, Lan, Man, Wei, Furu

This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models. Traditional reasoning methods typically rely on historical information and employ uni-directional

External link: http://arxiv.org/abs/2407.06112

Report

Direct Preference Knowledge Distillation for Large Language Models

Authors: Li, Yixing, Gu, Yuxian, Dong, Li, Wang, Dequan, Cheng, Yu, Wei, Furu

In the field of large language models (LLMs), Knowledge Distillation (KD) is a critical technique for transferring capabilities from teacher models to student models. However, existing KD methods face limitations and challenges in distillation of LLM

External link: http://arxiv.org/abs/2406.19774

Report

Instruction Pre-Training: Language Models are Supervised Multitask Learners

Authors: Cheng, Daixuan, Gu, Yuxian, Huang, Shaohan, Bi, Junyu, Huang, Minlie, Wei, Furu

Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage trends towards bette

External link: http://arxiv.org/abs/2406.14491

Report

Meta Reasoning for Large Language Models

Authors: Gao, Peizhong, Xie, Ao, Mao, Shaoguang, Wu, Wenshan, Xia, Yan, Mi, Haipeng, Wei, Furu

We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) inspired by human meta-reasoning. Traditional in-context learning-based reasoning techniques, such as Tree-of-Thoughts, show p

External link: http://arxiv.org/abs/2406.11698
