Showing 1 - 10 of 6,209 for the search: '"QIAO, YU"'
Author:
Li, Xinhao; Wang, Yi; Yu, Jiashuo; Zeng, Xiangyu; Zhu, Yuhan; Huang, Haian; Gao, Jianfei; Li, Kunchang; He, Yinan; Wang, Chenting; Qiao, Yu; Wang, Yali; Wang, Limin
Long-context modeling is a critical capability for multimodal large language models (MLLMs), enabling them to process long-form contents with implicit memorization. Despite its advances, handling extremely long videos remains challenging due to the d
External link:
http://arxiv.org/abs/2501.00574
Author:
Huang, Yifei; Xu, Jilan; Pei, Baoqi; He, Yuping; Chen, Guo; Yang, Lijin; Chen, Xinyuan; Wang, Yaohui; Nie, Zheng; Liu, Jinyao; Fan, Guoshun; Lin, Dechen; Fang, Fang; Li, Kunpeng; Yuan, Chang; Wang, Yali; Qiao, Yu; Wang, Limin
We introduce Vinci, a real-time embodied smart assistant built upon an egocentric vision-language model. Designed for deployment on portable devices such as smartphones and wearable cameras, Vinci operates in an "always on" mode, continuously observi
External link:
http://arxiv.org/abs/2412.21080
Author:
Sun, Qiushi; Cheng, Kanzhi; Ding, Zichen; Jin, Chuanyang; Wang, Yian; Xu, Fangzhi; Wu, Zhenyu; Jia, Chengyou; Chen, Liheng; Liu, Zhoumianze; Kao, Ben; Li, Guohao; He, Junxian; Qiao, Yu; Wu, Zhiyong
Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality tr
External link:
http://arxiv.org/abs/2412.19723
Federated learning (FL) is a distributed training technology that enhances data privacy in mobile edge networks by allowing data owners to collaborate without transmitting raw data to the edge server. However, data heterogeneity and adversarial attac
External link:
http://arxiv.org/abs/2412.19354
Author:
Yan, Ziang; Li, Zhilin; He, Yinan; Wang, Chenting; Li, Kunchang; Li, Xinhao; Zeng, Xiangyu; Wang, Zilei; Wang, Yali; Qiao, Yu; Wang, Limin; Wang, Yi
Current multimodal large language models (MLLMs) struggle with fine-grained or precise understanding of visuals though they give comprehensive perception and reasoning in a spectrum of vision applications. Recent studies either develop tool-using or
External link:
http://arxiv.org/abs/2412.19326
We study Schr\"odinger operators $H:= -\Delta + V$ with potentials $V$ that have power-law growth (not necessarily polynomial) at 0 and at $\infty$ using methods of Lie theory (Lie-Rinehart algebras) and microlocal analysis. More precisely, we show t
External link:
http://arxiv.org/abs/2412.19290
Author:
Tao, Chenxin; Su, Shiqian; Zhu, Xizhou; Zhang, Chenyu; Chen, Zhe; Liu, Jiawen; Wang, Wenhai; Lu, Lewei; Huang, Gao; Qiao, Yu; Dai, Jifeng
The rapid advance of Large Language Models (LLMs) has catalyzed the development of Vision-Language Models (VLMs). Monolithic VLMs, which avoid modality-specific encoders, offer a promising alternative to the compositional ones but face the challenge
External link:
http://arxiv.org/abs/2412.16158
Author:
Zhang, Pan; Dong, Xiaoyi; Cao, Yuhang; Zang, Yuhang; Qian, Rui; Wei, Xilin; Chen, Lin; Li, Yifei; Niu, Junbo; Ding, Shuangrui; Guo, Qipeng; Duan, Haodong; Chen, Xin; Lv, Han; Nie, Zheng; Zhang, Min; Wang, Bin; Zhang, Wenwei; Zhang, Xinyue; Ge, Jiaye; Li, Wei; Li, Jingwen; Tu, Zhongying; He, Conghui; Zhang, Xingcheng; Chen, Kai; Qiao, Yu; Lin, Dahua; Wang, Jiaqi
Creating AI systems that can interact with environments over long periods, similar to human cognition, has been a longstanding research goal. Recent advancements in multimodal large language models (MLLMs) have made significant strides in open-world
External link:
http://arxiv.org/abs/2412.09596
Consider a Lie subalgebra $\mathfrak{l} \subset \mathfrak{g}$ and an $\mathfrak{l}$-invariant open submanifold $V \subset \mathfrak{l}^{\ast}$. We demonstrate that any smooth dynamical twist on $V$, valued in $U(\mathfrak{g}) \otimes U(\mathfrak{g})\
External link:
http://arxiv.org/abs/2412.09039
Author:
Wang, Zun; Li, Jialu; Hong, Yicong; Li, Songze; Li, Kunchang; Yu, Shoubin; Wang, Yi; Qiao, Yu; Wang, Yali; Bansal, Mohit; Wang, Limin
Creating high-quality data for training robust language-instructed agents is a long-lasting challenge in embodied AI. In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality and large-scale navigational instructio
External link:
http://arxiv.org/abs/2412.08467