Showing 1 - 10
of 13 377 829
for search: '"WANG, AN"'
Author:
Zohar, Orr, Wang, Xiaohan, Dubois, Yann, Mehta, Nikhil, Xiao, Tong, Hansen-Estruch, Philippe, Yu, Licheng, Wang, Xiaofang, Juefei-Xu, Felix, Zhang, Ning, Yeung-Levy, Serena, Xia, Xide
Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made with
External link:
http://arxiv.org/abs/2412.10360
Author:
Wu, Zhiyu, Chen, Xiaokang, Pan, Zizheng, Liu, Xingchao, Liu, Wen, Dai, Damai, Gao, Huazuo, Ma, Yiyang, Wu, Chengyue, Wang, Bingxuan, Xie, Zhenda, Wu, Yu, Hu, Kai, Wang, Jiawei, Sun, Yaofeng, Li, Yukun, Piao, Yishi, Guan, Kang, Liu, Aixin, Xie, Xin, You, Yuxiang, Dong, Kai, Yu, Xingkai, Zhang, Haowei, Zhao, Liang, Wang, Yisong, Ruan, Chong
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades. For the vision component, we incorporate a dynamic til
External link:
http://arxiv.org/abs/2412.10302
Author:
Li, Shuaiting, Wang, Chengxuan, Deng, Juncan, Wang, Zeyu, Ye, Zewen, Wang, Zongsheng, Shen, Haibin, Huang, Kejie
Vector quantization (VQ) is a hardware-friendly DNN compression method that can reduce the storage cost and weight-loading datawidth of hardware accelerators. However, conventional VQ techniques lead to significant accuracy loss because the important
External link:
http://arxiv.org/abs/2412.10261
Author:
Wang, Haowei, Zhang, Rupeng, Wang, Junjie, Li, Mingyang, Huang, Yuekai, Wang, Dandan, Wang, Qing
Tool-calling has changed Large Language Model (LLM) applications by integrating external tools, significantly enhancing their functionality across diverse tasks. However, this integration also introduces new security vulnerabilities, particularly in
External link:
http://arxiv.org/abs/2412.10198
Author:
Zhang, Weixiang, Xie, Shuzhao, Ren, Chengwei, Xie, Siyi, Tang, Chen, Ge, Shijia, Wang, Mingzi, Wang, Zhi
We propose EVOlutionary Selector (EVOS), an efficient training paradigm for accelerating Implicit Neural Representation (INR). Unlike conventional INR training that feeds all samples through the neural network in each iteration, our approach restrict
External link:
http://arxiv.org/abs/2412.10153
Author:
Du, Zhihao, Wang, Yuxuan, Chen, Qian, Shi, Xian, Lv, Xiang, Zhao, Tianyu, Gao, Zhifu, Yang, Yexin, Gao, Changfeng, Wang, Hui, Yu, Fan, Liu, Huadai, Sheng, Zhengyan, Gu, Yue, Deng, Chong, Wang, Wen, Zhang, Shiliang, Yan, Zhijie, Zhou, Jingren
In our previous work, we introduced CosyVoice, a multilingual speech synthesis model based on supervised discrete speech tokens. By employing progressive semantic decoding with two popular generative models, language models (LMs) and Flow Matching, C
External link:
http://arxiv.org/abs/2412.10117
Published in:
Quantum Inf. Process. 23 (2024), 401
The Schmidt number characterizes the quantum entanglement of a bipartite mixed state and plays a significant role in certifying entanglement of quantum states. We derive a Schmidt number criterion based on the trace norm of the correlation matrix obt
External link:
http://arxiv.org/abs/2412.10074
Author:
Wang, Mengmeng, Ma, Teli, Xin, Shuo, Hou, Xiaojun, Xing, Jiazheng, Dai, Guang, Wang, Jingdong, Liu, Yong
Visual Object Tracking (VOT) is an attractive and significant research area in computer vision, which aims to recognize and track specific targets in video sequences where the target objects are arbitrary and class-agnostic. The VOT technology could
External link:
http://arxiv.org/abs/2412.09991
Author:
Li, Rujiang, Kong, Xiangyu, Wang, Wencai, Wang, Yixi, Zhong, Yichen, Jia, Yongtao, Tao, Huibin, Liu, Ying
In nonlinear topological systems, edge solitons emerge either as bifurcations of linear topological edge states or as nonlinearity-induced localized states without topological protection. Although electrical circuits have proven to be a versatile pla
External link:
http://arxiv.org/abs/2412.09932
Recently, Vision Large Language Models (VLLMs) integrated with vision encoders have shown promising performance in vision understanding. The key of VLLMs is to encode visual content into sequences of visual tokens, enabling VLLMs to simultaneously pr
External link:
http://arxiv.org/abs/2412.09919