Showing 1 - 10 of 106 results for the search: '"Tu, Kewei"'
Recently, sharing the key-value (KV) cache across layers has been found effective for efficient inference of large language models (LLMs). To systematically investigate different techniques of cross-layer KV sharing, we propose a unified framework that co…
External link: http://arxiv.org/abs/2410.14442
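The abstract above is cut off before the proposed framework is described, so the following is only a minimal sketch of the general idea of cross-layer KV sharing, assuming a toy single-head decoder in which designated "producer" layers compute keys/values and the remaining layers reuse them; the `share_map` layout and all module names are hypothetical, not the paper's design.

```python
# Minimal sketch of cross-layer KV sharing (not the paper's framework).
# "Producer" layers compute K/V; "borrower" layers reuse a producer's K/V,
# so the cache only grows with the number of producer layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAttentionLayer(nn.Module):
    def __init__(self, d_model, kv_producer):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)
        self.kv_producer = kv_producer
        if kv_producer:
            self.k = nn.Linear(d_model, d_model)
            self.v = nn.Linear(d_model, d_model)

    def forward(self, x, shared_kv):
        if self.kv_producer:                   # compute fresh K/V ...
            shared_kv = (self.k(x), self.v(x))
        k, v = shared_kv                       # ... or reuse another layer's K/V
        attn = F.softmax(self.q(x) @ k.transpose(-2, -1) / k.size(-1) ** 0.5, dim=-1)
        return self.out(attn @ v), shared_kv

# Hypothetical layout: layers 0 and 2 produce K/V; layers 1 and 3 borrow from below.
share_map = {0: 0, 1: 0, 2: 2, 3: 2}
layers = nn.ModuleList(
    [ToyAttentionLayer(64, kv_producer=(share_map[i] == i)) for i in range(4)]
)

x = torch.randn(1, 8, 64)                      # (batch, seq_len, d_model)
kv_cache = {}                                  # layer index -> (K, V)
for i, layer in enumerate(layers):
    x, kv = layer(x, kv_cache.get(share_map[i]))
    kv_cache[i] = kv                           # borrowers store only a reference
```

The point of the sketch is that borrower layers keep no K/V of their own, so the cache footprint scales with the number of producer layers rather than the total layer count.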
Recently, retrieval-based language models (RLMs) have received much attention. However, most of them leverage a pre-trained retriever with fixed parameters, which may not adapt well to causal language models. In this work, we propose Grouped Cross-At…
External link: http://arxiv.org/abs/2410.01651
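Since the method name is truncated here, the sketch below shows only the generic building block that retrieval-augmented decoders start from: decoder states cross-attending over encodings of retrieved chunks. It is not the paper's mechanism; all shapes are made up for illustration.

```python
# Generic cross-attention from decoder states to retrieved-chunk encodings;
# a standard building block, not the specific method proposed in the paper.
import torch

d = 64
decoder_states = torch.randn(1, 16, d)        # (batch, query_len, d)
retrieved_chunks = torch.randn(1, 4, 32, d)   # (batch, n_chunks, chunk_len, d)

# Flatten the retrieved chunks into one memory the decoder can attend over.
memory = retrieved_chunks.reshape(1, 4 * 32, d)
scores = decoder_states @ memory.transpose(-2, -1) / d ** 0.5
context = torch.softmax(scores, dim=-1) @ memory   # (batch, query_len, d)
```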
Named entity recognition (NER) models often struggle with noisy inputs, such as those containing spelling mistakes or errors produced by Optical Character Recognition, and learning a robust NER model is challenging. Existing robust NER models ut…
External link: http://arxiv.org/abs/2407.18562
Syntactic Transformer language models aim to achieve better generalization by simultaneously modeling syntax trees and sentences. While prior work has focused on adding constituency-based structures to Transformers, we introduce Dependency…
External link: http://arxiv.org/abs/2407.17406
Accommodating long sequences efficiently in autoregressive Transformers, especially within an extended context window, poses significant challenges due to the quadratic computational complexity and substantial KV memory requirements inherent in self-…
External link: http://arxiv.org/abs/2406.16747
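To make the quadratic term above concrete, here is a back-of-the-envelope calculation with illustrative model dimensions (not taken from the paper): the attention-score matrix grows with the square of the context length.

```python
# Illustrative cost of full self-attention as the context window grows.
# Kernel tricks (e.g. FlashAttention) avoid materializing the score matrix,
# but the compute still scales quadratically in sequence length.
n_layers, n_heads = 32, 32                     # hypothetical model shape
for seq_len in (4_096, 32_768, 131_072):
    # fp16 score matrix: seq_len x seq_len x 2 bytes, per head, per layer
    score_bytes = seq_len * seq_len * 2 * n_heads * n_layers
    print(f"{seq_len:>7} tokens -> {score_bytes / 2**30:,.0f} GiB of attention scores")
```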
As a cornerstone of language modeling, tokenization segments text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To addr…
External link: http://arxiv.org/abs/2406.15245
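The claim that statistical tokenizers can cut across the internal constituents of a word is easy to demonstrate with a toy greedy longest-match subword tokenizer; both vocabularies below are made up for illustration and are not the paper's.

```python
# Toy greedy longest-match subword tokenizer, illustrating how a frequency-driven
# vocabulary can split a word against its morpheme boundaries (un|happi|ness).
def tokenize(word, vocab):
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):      # longest match first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:                                  # fall back to a single character
            pieces.append(word[i])
            i += 1
    return pieces

word = "unhappiness"
frequency_based = {"unh", "appi", "ness", "un"}   # hypothetical BPE-style merges
morpheme_aware = {"un", "happi", "ness"}          # respects un|happi|ness

print(tokenize(word, frequency_based))  # ['unh', 'appi', 'ness'] -- crosses "un|happi"
print(tokenize(word, morpheme_aware))   # ['un', 'happi', 'ness']
```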
Huge memory consumption has been a major bottleneck for deploying high-throughput large language models in real-world applications. In addition to the large number of parameters, the key-value (KV) cache for the attention mechanism in the Transformer…
External link: http://arxiv.org/abs/2405.10637
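Because the snippet breaks off right where it introduces the KV cache, here is the standard sizing arithmetic showing why it dominates memory at high throughput; the 7B-class shape below is illustrative, not taken from the paper.

```python
# Rough KV-cache footprint for a hypothetical 7B-class decoder.
n_layers, n_kv_heads, head_dim = 32, 32, 128   # GQA models use fewer KV heads
bytes_per_elem = 2                             # fp16 / bf16
batch, seq_len = 16, 4_096                     # high-throughput serving scenario

# Keys and values (2x) are cached at every layer for every token of every sequence.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * batch * seq_len
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # ~32 GiB, on top of the model weights
```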
Authors: Cheng, Ning; Yan, Zhaohui; Wang, Ziming; Li, Zhijie; Yu, Jiaming; Zheng, Zilong; Tu, Kewei; Xu, Jinan; Han, Wenjuan
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. Nevertheless, there is ongoing controversy over the extent to which LLMs can grasp struc…
External link: http://arxiv.org/abs/2405.06410
Authors: Hui, Wenyang; Tu, Kewei
Large language models (LLMs) have demonstrated impressive capabilities in reasoning and planning when integrated with tree-search-based prompting methods. However, because these methods ignore previous search experiences, they often make the same mis…
External link: http://arxiv.org/abs/2404.05449
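The abstract stops mid-sentence, but the motivating observation (tree search that forgets earlier attempts keeps repeating the same mistakes) can be sketched generically: persist a set of known dead ends across searches and skip them on later runs. This is an illustrative skeleton with placeholder callables (`expand`, `is_solution`, and `score` would be LLM-driven in practice), not the paper's method.

```python
# Generic best-first search over reasoning states that remembers failed states
# across runs, so repeated searches do not revisit known dead ends.
# States are assumed hashable (e.g. strings of partial reasoning).
import heapq

def search(start, expand, is_solution, score, failed):
    frontier = [(-score(start), 0, start)]
    counter = 1                                # tie-breaker so states are never compared
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if state in failed:
            continue                           # skip dead ends found in earlier runs
        if is_solution(state):
            return state
        children = [c for c in expand(state) if c not in failed]
        if not children:
            failed.add(state)                  # remember this dead end for next time
            continue
        for child in children:
            heapq.heappush(frontier, (-score(child), counter, child))
            counter += 1
    return None
```

Passing the same `failed` set into successive calls is what carries search experience across episodes.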
Published in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 424-438
In the age of neural natural language processing, many works try to derive interpretations of neural models. Intuitively, when gold rationales are available during training, one can additionally train the model to match its interpretation w…
External link: http://arxiv.org/abs/2404.02068
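When gold rationales exist, the recipe the abstract alludes to is usually an auxiliary loss that pulls the model's token-importance scores toward the annotated rationale; the sketch below uses a KL term over normalized scores as a stand-in for whichever interpretation method the paper actually trains against, and all names are illustrative.

```python
# Sketch: supervising a model's interpretation with gold rationales via an
# auxiliary alignment loss. The importance scores are a stand-in for the
# interpretation method actually used; names are illustrative.
import torch
import torch.nn.functional as F

def rationale_loss(importance_logits, gold_rationale_mask, eps=1e-8):
    # importance_logits: (batch, seq_len) model scores over input tokens
    # gold_rationale_mask: (batch, seq_len), 1.0 on annotated rationale tokens
    pred = F.log_softmax(importance_logits, dim=-1)
    gold = gold_rationale_mask / (gold_rationale_mask.sum(-1, keepdim=True) + eps)
    return F.kl_div(pred, gold, reduction="batchmean")

# Training objective (lambda_r weights the auxiliary term):
# total_loss = task_loss + lambda_r * rationale_loss(scores, mask)
```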