Showing 1 - 10 of 16,723
for search: '"Gao, Yan"'
Author:
Wu, Shiwei, Chen, Joya, Lin, Kevin Qinghong, Wang, Qimeng, Gao, Yan, Xu, Qianli, Xu, Tong, Hu, Yao, Chen, Enhong, Shou, Mike Zheng
A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual understanding, it also significantly raises memory and computational costs, especially in long-te…
External link:
http://arxiv.org/abs/2408.16730
In this paper, we prove that for any post-critically finite rational map $f$ on the Riemann sphere $\overline{\mathbb{C}}$ and for each sufficiently large integer $n$, there exists a finite and connected graph $G$ in the Julia set of $f$, such that …
External link:
http://arxiv.org/abs/2408.12371
Transition metal dichalcogenides (TMDs) exhibit a range of crystal structures and topological quantum states. The 1$T$ phase, in particular, shows promise for superconductivity driven by electron-phonon coupling, strain, pressure, and chemical dopin…
External link:
http://arxiv.org/abs/2407.21302
Author:
Zhong, Meizhi, Zhang, Chen, Lei, Yikun, Liu, Xikai, Gao, Yan, Hu, Yao, Chen, Kehai, Zhang, Min
Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method. Therefore, a prominent path is to extrapolate the RoPE trained on comparably short…
External link:
http://arxiv.org/abs/2406.13282
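The snippet above mentions rotary position embedding (RoPE), in which each pair of channels is rotated by a position-dependent angle so that attention scores depend only on the relative offset between query and key positions. A minimal stdlib-only sketch of that property (illustrative only; the `rope` function and its channel pairing are assumptions, not the paper's implementation):

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each channel pair (2i, 2i+1) of vec by pos * base**(-2i/dim)."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # 2-D rotation of the (x, y) pair by angle theta
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    """Attention-style dot product of two rotated vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because each pair is rotated by an orthogonal matrix, `dot(rope(q, m), rope(k, n))` depends only on `m - n`; extrapolation methods such as position interpolation keep `rope` unchanged and instead rescale the position argument so long contexts map back into the trained range.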
Author:
Gu, Zhouhong, Zhang, Lin, Zhu, Xiaoxuan, Chen, Jiangjie, Huang, Wenhao, Zhang, Yikai, Wang, Shusen, Ye, Zheyu, Gao, Yan, Feng, Hongwei, Xiao, Yanghua
Detecting evidence within the context is a key step in reasoning tasks. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called …
External link:
http://arxiv.org/abs/2406.12641
To achieve non-parametric NMT domain adaptation, $k$-Nearest-Neighbor Machine Translation ($k$NN-MT) constructs an external datastore to store domain-specific translation knowledge, which derives a $k$NN distribution to interpolate the prediction dis…
External link:
http://arxiv.org/abs/2406.06073
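The $k$NN-MT snippet above describes the core mechanism: a datastore maps decoder states to target tokens, the $k$ nearest entries induce a distribution via a softmax over negative distances, and that distribution is interpolated with the model's own prediction. A toy stdlib-only sketch (function names, the squared-L2 metric, and the temperature are assumptions for illustration, not the paper's code):

```python
import math

def knn_distribution(query, datastore, k=2, temperature=10.0):
    """datastore: list of (key_vector, target_token) pairs.
    Returns a token -> probability dict from a softmax over the
    negative squared L2 distances of the k nearest keys."""
    nearest = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), tok)
        for key, tok in datastore
    )[:k]
    weights = [math.exp(-d / temperature) for d, _ in nearest]
    z = sum(weights)
    probs = {}
    for (_, tok), w in zip(nearest, weights):
        probs[tok] = probs.get(tok, 0.0) + w / z
    return probs

def interpolate(p_model, p_knn, lam=0.5):
    """p = lam * p_kNN + (1 - lam) * p_model over the union vocabulary."""
    vocab = set(p_model) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_model.get(t, 0.0)
            for t in vocab}
```

The interpolation weight `lam` controls how strongly the domain-specific datastore overrides the base model, which is the knob such adaptation methods typically tune.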
Author:
Yang, Dongjie, Huang, Suyuan, Lu, Chengqiang, Han, Xiaodong, Zhang, Haoxin, Gao, Yan, Hu, Yao, Zhao, Hai
Advancements in multimodal learning, particularly in video understanding and generation, require high-quality video-text datasets for improved model performance. Vript addresses this issue with a meticulously annotated corpus of 12K high-resolution v…
External link:
http://arxiv.org/abs/2406.06040
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training. Previous research has focused on aligning sequences' visual and semantic spatial distributions. Howeve…
External link:
http://arxiv.org/abs/2406.00639
Large Language Models (LLMs) have demonstrated exceptional text understanding. Existing works explore their application in text embedding tasks. However, there are few works utilizing LLMs to assist multimodal representation tasks. In this work, we i…
External link:
http://arxiv.org/abs/2405.16789
Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference, hindering their scalability for real-time applications like chatbots. To accelerate inference, we store computed keys…
External link:
http://arxiv.org/abs/2405.12532
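The snippet above refers to caching computed keys (and, typically, values) during autoregressive decoding. A toy single-head sketch of the idea (the `KVCache` class and its shapes are assumptions for illustration, not the paper's system):

```python
import math

def attend(q, keys, values):
    """Single-head attention of one query over cached keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
              for k in keys]
    m = max(scores)                      # max-subtraction for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    dim = len(values[0])
    return [sum(e * v[i] for e, v in zip(exps, values)) / z
            for i in range(dim)]

class KVCache:
    """Append-only cache: each decoding step adds one (key, value) pair,
    so attention for token t costs O(t) instead of re-encoding the
    whole prefix from scratch."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Each generated token appends one key/value pair, so step `t` attends over `t` cached entries; the trade-off is memory growing linearly with context length, which is precisely the bottleneck such inference-acceleration papers target.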