Showing 1 - 10
of 13 377 829
for search: '"WANG, AN"'
Author:
Zohar, Orr, Wang, Xiaohan, Dubois, Yann, Mehta, Nikhil, Xiao, Tong, Hansen-Estruch, Philippe, Yu, Licheng, Wang, Xiaofang, Juefei-Xu, Felix, Zhang, Ning, Yeung-Levy, Serena, Xia, Xide
Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made with
External link:
http://arxiv.org/abs/2412.10360
Author:
Wu, Zhiyu, Chen, Xiaokang, Pan, Zizheng, Liu, Xingchao, Liu, Wen, Dai, Damai, Gao, Huazuo, Ma, Yiyang, Wu, Chengyue, Wang, Bingxuan, Xie, Zhenda, Wu, Yu, Hu, Kai, Wang, Jiawei, Sun, Yaofeng, Li, Yukun, Piao, Yishi, Guan, Kang, Liu, Aixin, Xie, Xin, You, Yuxiang, Dong, Kai, Yu, Xingkai, Zhang, Haowei, Zhao, Liang, Wang, Yisong, Ruan, Chong
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades. For the vision component, we incorporate a dynamic til
External link:
http://arxiv.org/abs/2412.10302
Author:
Li, Shuaiting, Wang, Chengxuan, Deng, Juncan, Wang, Zeyu, Ye, Zewen, Wang, Zongsheng, Shen, Haibin, Huang, Kejie
Vector quantization (VQ) is a hardware-friendly DNN compression method that can reduce the storage cost and weight-loading datawidth of hardware accelerators. However, conventional VQ techniques lead to significant accuracy loss because the important
External link:
http://arxiv.org/abs/2412.10261
Author:
Wang, Haowei, Zhang, Rupeng, Wang, Junjie, Li, Mingyang, Huang, Yuekai, Wang, Dandan, Wang, Qing
Tool-calling has changed Large Language Model (LLM) applications by integrating external tools, significantly enhancing their functionality across diverse tasks. However, this integration also introduces new security vulnerabilities, particularly in
External link:
http://arxiv.org/abs/2412.10198
Author:
Zhang, Weixiang, Xie, Shuzhao, Ren, Chengwei, Xie, Siyi, Tang, Chen, Ge, Shijia, Wang, Mingzi, Wang, Zhi
We propose EVOlutionary Selector (EVOS), an efficient training paradigm for accelerating Implicit Neural Representation (INR). Unlike conventional INR training that feeds all samples through the neural network in each iteration, our approach restrict
External link:
http://arxiv.org/abs/2412.10153
Author:
Du, Zhihao, Wang, Yuxuan, Chen, Qian, Shi, Xian, Lv, Xiang, Zhao, Tianyu, Gao, Zhifu, Yang, Yexin, Gao, Changfeng, Wang, Hui, Yu, Fan, Liu, Huadai, Sheng, Zhengyan, Gu, Yue, Deng, Chong, Wang, Wen, Zhang, Shiliang, Yan, Zhijie, Zhou, Jingren
In our previous work, we introduced CosyVoice, a multilingual speech synthesis model based on supervised discrete speech tokens. By employing progressive semantic decoding with two popular generative models, language models (LMs) and Flow Matching, C
External link:
http://arxiv.org/abs/2412.10117
Published in:
Quantum Inf. Process. 23 (2024), 401
The Schmidt number characterizes the quantum entanglement of a bipartite mixed state and plays a significant role in certifying entanglement of quantum states. We derive a Schmidt number criterion based on the trace norm of the correlation matrix obt
External link:
http://arxiv.org/abs/2412.10074
Author:
Wang, Mengmeng, Ma, Teli, Xin, Shuo, Hou, Xiaojun, Xing, Jiazheng, Dai, Guang, Wang, Jingdong, Liu, Yong
Visual Object Tracking (VOT) is an attractive and significant research area in computer vision, which aims to recognize and track specific targets in video sequences where the target objects are arbitrary and class-agnostic. The VOT technology could
External link:
http://arxiv.org/abs/2412.09991
Author:
Li, Rujiang, Kong, Xiangyu, Wang, Wencai, Wang, Yixi, Zhong, Yichen, Jia, Yongtao, Tao, Huibin, Liu, Ying
In nonlinear topological systems, edge solitons emerge either as bifurcations of linear topological edge states or as nonlinearity-induced localized states without topological protection. Although electrical circuits have proven to be a versatile pla
External link:
http://arxiv.org/abs/2412.09932
Recently, Vision Large Language Models (VLLMs) integrated with vision encoders have shown promising performance in vision understanding. The key of VLLMs is to encode visual content into sequences of visual tokens, enabling VLLMs to simultaneously pr
External link:
http://arxiv.org/abs/2412.09919