Showing 1 - 10 of 89,519
for the search: '"P. A. Heads"'
Author:
Elhelo, Amit; Geva, Mor
Attention heads are one of the building blocks of large language models (LLMs). Prior work investigating their operation has mostly focused on analyzing their behavior during inference for specific circuits or tasks. In this work, we seek a comprehens…
External link:
http://arxiv.org/abs/2412.11965
Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
In this paper, we investigate the role of attention heads in Context-aware Machine Translation models for pronoun disambiguation in the English-to-German and English-to-French language directions. We analyze their influence by both observing and modi…
External link:
http://arxiv.org/abs/2412.11187
The automatic detection of pedestrian heads in crowded environments is essential for crowd analysis and management tasks, particularly in high-risk settings such as railway platforms and event entrances. These environments, characterized by dense cro…
External link:
http://arxiv.org/abs/2411.18164
Author:
Xu, Yu; Tang, Fan; Cao, Juan; Zhang, Yuxin; Kong, Xiaoyu; Li, Jintao; Deussen, Oliver; Lee, Tong-Yee
Diffusion Transformers (DiTs) have exhibited robust capabilities in image generation tasks. However, accurate text-guided image editing for multimodal DiTs (MM-DiTs) still poses a significant challenge. Unlike UNet-based structures that could utilize…
External link:
http://arxiv.org/abs/2411.15034
Author:
Zhang, Xiaofeng; Quan, Yihao; Gu, Chaochen; Shen, Chen; Yuan, Xiaosong; Yan, Shaotian; Cheng, Hao; Wu, Kaijie; Ye, Jieping
The hallucination problem in multimodal large language models (MLLMs) remains a common issue. Although image tokens occupy a majority of the input sequence of MLLMs, there is limited research to explore the relationship between image tokens and hallu…
External link:
http://arxiv.org/abs/2411.09968
In-context learning allows models like transformers to adapt to new tasks from a few examples without updating their weights, a desirable trait for reinforcement learning (RL). However, existing in-context RL methods, such as Algorithm Distillation (…
External link:
http://arxiv.org/abs/2411.01958
Long-context LLMs are increasingly in demand for applications such as retrieval-augmented generation. To defray the cost of pretraining LLMs over long contexts, recent work takes an approach of synthetic context extension: fine-tuning LLMs with synth…
External link:
http://arxiv.org/abs/2410.22316
Key-Value (KV) caching is a common technique to enhance the computational efficiency of Large Language Models (LLMs), but its memory overhead grows rapidly with input length. Prior work has shown that not all tokens are equally important for text gen…
External link:
http://arxiv.org/abs/2410.19258
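The snippet above describes the mechanism this record studies: during autoregressive decoding, each token's key and value vectors are computed once and appended to a cache, so the cache grows linearly with input length. A minimal NumPy sketch of that idea follows; the dimension `d`, the random weights, and the function name `attention` are illustrative assumptions, not anything from the listed paper.

```python
import numpy as np

def attention(q, K, V):
    # Single-query attention over all cached keys/values (illustrative).
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 8  # assumed toy hidden size
rng = np.random.default_rng(0)
Wk, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# Decode loop: per step, only the NEW token's K/V are computed and
# appended -- older entries are reused, which is the memory cost the
# snippet refers to (cache size grows with sequence length).
K_cache, V_cache = np.empty((0, d)), np.empty((0, d))
for step in range(4):
    x = rng.normal(size=d)                   # current token's hidden state
    K_cache = np.vstack([K_cache, x @ Wk])   # append, never recompute
    V_cache = np.vstack([V_cache, x @ Wv])
    out = attention(x, K_cache, V_cache)

assert K_cache.shape == (4, d)  # cache holds one K row per token seen
```

Eviction strategies like the one the record hints at ("not all tokens are equally important") would drop rows from `K_cache`/`V_cache` instead of letting them grow unboundedly.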
Author:
Gema, Aryo Pradipta; Jin, Chen; Abdulaal, Ahmed; Diethe, Tom; Teare, Philip; Alex, Beatrice; Minervini, Pasquale; Saseendran, Amrutha
Large Language Models (LLMs) often hallucinate, producing unfaithful or factually incorrect outputs by misrepresenting the provided context or incorrectly recalling internal knowledge. Recent studies have identified specific attention heads within th…
External link:
http://arxiv.org/abs/2410.18860
Vision Transformers have made remarkable progress in recent years, achieving state-of-the-art performance in most vision tasks. Much of this success is due to the introduction of the Multi-Head Self-Attention (MHSA) module, which enables e…
External link:
http://arxiv.org/abs/2410.14874
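Several records above revolve around the Multi-Head Self-Attention (MHSA) module this last snippet names: the input is projected to queries, keys, and values, split into heads that attend independently, then concatenated and projected back. A toy NumPy sketch, with randomly initialized weights and assumed shapes (the function name `mhsa` and all dimensions are mine, not from any listed paper):

```python
import numpy as np

def mhsa(X, num_heads):
    # Toy Multi-Head Self-Attention: X is (tokens, model_dim).
    n, d = X.shape
    assert d % num_heads == 0
    dh = d // num_heads                       # per-head dimension
    rng = np.random.default_rng(0)
    Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)       # this head's slice of Q/K/V
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)    # row-wise softmax
        heads.append(w @ V[:, s])             # (tokens, dh)
    # Concatenate heads and apply the output projection.
    return np.concatenate(heads, axis=-1) @ Wo

X = np.random.default_rng(1).normal(size=(5, 16))
out = mhsa(X, num_heads=4)
assert out.shape == (5, 16)   # same shape as the input
```

Because each head attends over its own low-dimensional slice, heads can specialize, which is exactly what the head-analysis papers in this result list probe.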