Showing 1 - 10 of 16,723
for search: '"Gao, Yan"'
Author:
Wu, Shiwei, Chen, Joya, Lin, Kevin Qinghong, Wang, Qimeng, Gao, Yan, Xu, Qianli, Xu, Tong, Hu, Yao, Chen, Enhong, Shou, Mike Zheng
A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual understanding, it also significantly raises memory and computational costs, especially in long-te…
External link:
http://arxiv.org/abs/2408.16730
In this paper, we prove that for any post-critically finite rational map $f$ on the Riemann sphere $\overline{\mathbb{C}}$ and for each sufficiently large integer $n$, there exists a finite and connected graph $G$ in the Julia set of $f$, such that …
External link:
http://arxiv.org/abs/2408.12371
Transition metal dichalcogenides (TMDs) exhibit a range of crystal structures and topological quantum states. The 1$T$ phase, in particular, shows promise for superconductivity driven by electron-phonon coupling, strain, pressure, and chemical dopin…
External link:
http://arxiv.org/abs/2407.21302
Author:
Zhong, Meizhi, Zhang, Chen, Lei, Yikun, Liu, Xikai, Gao, Yan, Hu, Yao, Chen, Kehai, Zhang, Min
Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method. Therefore, a prominent path is to extrapolate the RoPE trained on comparably short…
External link:
http://arxiv.org/abs/2406.13282
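The snippet above mentions rotary position embedding (RoPE), in which each pair of channels is rotated by a position-dependent angle so that attention scores depend only on the relative offset between query and key positions. A minimal stdlib-only sketch of that property (illustrative only; the `rope` function and its channel pairing are assumptions, not the paper's implementation):

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each channel pair (2i, 2i+1) of vec by pos * base**(-2i/dim)."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # 2-D rotation of the (x, y) pair by angle theta
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    """Attention-style dot product of two rotated vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because each pair is rotated by an orthogonal matrix, `dot(rope(q, m), rope(k, n))` depends only on `m - n`; extrapolation methods such as position interpolation keep `rope` unchanged and instead rescale the position argument so long contexts map back into the trained range.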
Author:
Gu, Zhouhong, Zhang, Lin, Zhu, Xiaoxuan, Chen, Jiangjie, Huang, Wenhao, Zhang, Yikai, Wang, Shusen, Ye, Zheyu, Gao, Yan, Feng, Hongwei, Xiao, Yanghua
Detecting evidence within the context is a key step in reasoning tasks. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called …
External link:
http://arxiv.org/abs/2406.12641
To achieve non-parametric NMT domain adaptation, $k$-Nearest-Neighbor Machine Translation ($k$NN-MT) constructs an external datastore to store domain-specific translation knowledge, which derives a $k$NN distribution to interpolate the prediction dis…
External link:
http://arxiv.org/abs/2406.06073
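The $k$NN-MT snippet above describes the core mechanism: a datastore maps decoder states to target tokens, the $k$ nearest entries induce a distribution via a softmax over negative distances, and that distribution is interpolated with the model's own prediction. A toy stdlib-only sketch (function names, the squared-L2 metric, and the temperature are assumptions for illustration, not the paper's code):

```python
import math

def knn_distribution(query, datastore, k=2, temperature=10.0):
    """datastore: list of (key_vector, target_token) pairs.
    Returns a token -> probability dict from a softmax over the
    negative squared L2 distances of the k nearest keys."""
    nearest = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), tok)
        for key, tok in datastore
    )[:k]
    weights = [math.exp(-d / temperature) for d, _ in nearest]
    z = sum(weights)
    probs = {}
    for (_, tok), w in zip(nearest, weights):
        probs[tok] = probs.get(tok, 0.0) + w / z
    return probs

def interpolate(p_model, p_knn, lam=0.5):
    """p = lam * p_kNN + (1 - lam) * p_model over the union vocabulary."""
    vocab = set(p_model) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_model.get(t, 0.0)
            for t in vocab}
```

The interpolation weight `lam` controls how strongly the domain-specific datastore overrides the base model, which is the knob such adaptation methods typically tune.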
Author:
Yang, Dongjie, Huang, Suyuan, Lu, Chengqiang, Han, Xiaodong, Zhang, Haoxin, Gao, Yan, Hu, Yao, Zhao, Hai
Advancements in multimodal learning, particularly in video understanding and generation, require high-quality video-text datasets for improved model performance. Vript addresses this issue with a meticulously annotated corpus of 12K high-resolution v…
External link:
http://arxiv.org/abs/2406.06040
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training. Previous research has focused on aligning sequences' visual and semantic spatial distributions. Howeve…
External link:
http://arxiv.org/abs/2406.00639
Large Language Models (LLMs) have demonstrated exceptional text understanding. Existing works explore their application in text embedding tasks. However, there are few works utilizing LLMs to assist multimodal representation tasks. In this work, we i…
External link:
http://arxiv.org/abs/2405.16789
Large Language Models (LLMs) have shown remarkable comprehension abilities but face challenges in GPU memory usage during inference, hindering their scalability for real-time applications like chatbots. To accelerate inference, we store computed keys…
External link:
http://arxiv.org/abs/2405.12532
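The snippet above refers to caching computed keys (and, typically, values) during autoregressive decoding. A toy single-head sketch of the idea (the `KVCache` class and its shapes are assumptions for illustration, not the paper's system):

```python
import math

def attend(q, keys, values):
    """Single-head attention of one query over cached keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
              for k in keys]
    m = max(scores)                      # max-subtraction for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    dim = len(values[0])
    return [sum(e * v[i] for e, v in zip(exps, values)) / z
            for i in range(dim)]

class KVCache:
    """Append-only cache: each decoding step adds one (key, value) pair,
    so attention for token t costs O(t) instead of re-encoding the
    whole prefix from scratch."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)
```

Each generated token appends one key/value pair, so step `t` attends over `t` cached entries; the trade-off is memory growing linearly with context length, which is precisely the bottleneck such inference-acceleration papers target.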