Showing 1 - 10 of 40,460 results for the search '"Li, Qing‐An"'
Graphs are essential data structures for modeling complex interactions in domains such as social networks, molecular structures, and biological systems. Graph-level tasks, which predict properties or classes for the entire graph, are critical for app…
External link:
http://arxiv.org/abs/2501.00773
This paper investigates the problem of understanding dynamic 3D scenes from egocentric observations, a key challenge in robotics and embodied AI. Unlike prior studies that explored this as long-form video understanding and utilized egocentric video o…
External link:
http://arxiv.org/abs/2501.00358
Author:
Li, Haoyang; Li, Yiming; Tian, Anxin; Tang, Tianhao; Xu, Zhanchao; Chen, Xuejia; Hu, Nicole; Dong, Wei; Li, Qing; Chen, Lei
Large Language Models (LLMs) have revolutionized a wide range of domains such as natural language processing, computer vision, and multi-modal tasks due to their ability to comprehend context and perform logical reasoning. However, the computational…
External link:
http://arxiv.org/abs/2412.19442
Author:
Gao, Zhi; Zhang, Bofei; Li, Pengxiang; Ma, Xiaojian; Yuan, Tao; Fan, Yue; Wu, Yuwei; Jia, Yunde; Zhu, Song-Chun; Li, Qing
The advancement of large language models (LLMs) prompts the development of multi-modal agents, which are used as controllers to call external tools, providing a feasible way to solve practical tasks. In this paper, we propose a multi-modal agent tun…
External link:
http://arxiv.org/abs/2412.15606
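The entry above describes multi-modal agents in which an LLM acts as a controller that calls external tools. Below is a minimal, self-contained sketch of that controller/tool pattern, assuming a toy JSON action format and made-up tool names; it is not the tuning framework proposed in the paper.

```python
# Minimal sketch of the "LLM as controller" pattern: a planner chooses which
# registered tool to call for a step.  The planner is a stub and the tools are
# illustrative assumptions, not the paper's actual agent interface.
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "echo": lambda text: text,
}

def plan(task: str) -> str:
    """Stand-in for the multi-modal LLM controller.

    A real agent would prompt the model with the task (and any images) and
    ask it to emit a JSON action; here the action is hard-coded for the demo.
    """
    return json.dumps({"tool": "calculator", "input": "3 * (2 + 5)"})

def run_agent(task: str) -> str:
    action = json.loads(plan(task))   # controller decides which tool to use
    tool = TOOLS[action["tool"]]      # dispatch to the registered tool
    return tool(action["input"])      # execute and return the observation

if __name__ == "__main__":
    print(run_agent("What is 3 * (2 + 5)?"))  # -> 21
```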
With the prevalence of social networks on online platforms, social recommendation has become a vital technique for enhancing personalized recommendations. The effectiveness of social recommendations largely relies on the social homophily assumption,…
External link:
http://arxiv.org/abs/2412.15579
In this paper, we propose the Text-based Open Molecule Generation Benchmark (TOMG-Bench), the first benchmark to evaluate the open-domain molecule generation capability of LLMs. TOMG-Bench encompasses a dataset of three major tasks: molecule editing (Mol…
External link:
http://arxiv.org/abs/2412.14642
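Evaluating open-domain molecule generation typically starts by checking whether a model's output parses as a molecule at all. Below is a minimal validity-check sketch using RDKit; the metric and the sample outputs are illustrative assumptions, not TOMG-Bench's official scoring code.

```python
# Check whether generated SMILES strings are chemically parseable with RDKit.
# The "generated" outputs below are made up for demonstration.
from rdkit import Chem

def is_valid_smiles(smiles: str) -> bool:
    """Return True if RDKit can parse the SMILES string into a molecule."""
    return Chem.MolFromSmiles(smiles) is not None

generated = ["CCO", "c1ccccc1O", "not-a-molecule"]  # hypothetical LLM outputs
validity = sum(is_valid_smiles(s) for s in generated) / len(generated)
print(f"validity: {validity:.2f}")  # -> 0.67
```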
Video Corpus Visual Answer Localization (VCVAL) includes question-related video retrieval and visual answer localization in the videos. Specifically, we use text-to-text retrieval to find relevant videos for a medical question based on the similarity…
External link:
http://arxiv.org/abs/2412.15514
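The entry above retrieves candidate videos for a medical question by text-to-text similarity. The sketch below illustrates that idea with TF-IDF cosine similarity over video subtitle text; the encoder choice and the data are assumptions for illustration, not the paper's actual retrieval model.

```python
# Rank videos for a question by cosine similarity between the question text
# and each video's subtitle text.  Subtitles and IDs below are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "How do I treat a sprained ankle at home?"
video_subtitles = {
    "vid_001": "today we show how to wrap and ice a sprained ankle ...",
    "vid_002": "this lecture covers the anatomy of the inner ear ...",
    "vid_003": "rest, ice, compression and elevation for ankle injuries ...",
}

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(list(video_subtitles.values()))  # one row per video
query_vec = vectorizer.transform([question])
scores = cosine_similarity(query_vec, doc_vecs)[0]  # similarity to each video

ranked = sorted(zip(video_subtitles, scores), key=lambda x: -x[1])
print(ranked[0])  # most similar video id and its score
```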
In this paper, we present our methods and results for the Video-To-Text (VTT) task at TRECVid 2024, exploring the capabilities of Vision-Language Models (VLMs) like LLaVA and LLaVA-NeXT-Video in generating natural language descriptions for video cont…
External link:
http://arxiv.org/abs/2412.15509
This year, we explore generation-augmented retrieval for the TRECVid AVS task. Specifically, the understanding of the textual query is enhanced by three types of generation, including Text2Text, Text2Image, and Image2Text, to address the out-of-vocabulary problem…
External link:
http://arxiv.org/abs/2412.15494
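The entry above enhances the textual query with several generated variants (Text2Text, Text2Image, Image2Text). The sketch below shows one plausible way to fuse the per-variant similarity scores with a weighted sum; the weights and scores are made up, and the paper's actual fusion may differ.

```python
# Fuse the retrieval scores produced by the original query and its generated
# variants.  Weights and per-variant scores are illustrative placeholders.
from typing import Dict

def fuse_scores(per_query_scores: Dict[str, float],
                weights: Dict[str, float]) -> float:
    """Weighted sum of the similarity scores from each query variant."""
    return sum(weights[name] * score for name, score in per_query_scores.items())

scores_for_one_video = {
    "original":   0.61,  # raw textual query vs. video
    "text2text":  0.74,  # LLM-rewritten query vs. video
    "image2text": 0.58,  # caption of a generated image vs. video
}
weights = {"original": 0.5, "text2text": 0.3, "image2text": 0.2}

print(f"fused score: {fuse_scores(scores_for_one_video, weights):.3f}")
```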
Author:
Shao, Shi-Yao; Li, Qing; Zhang, Li-Hua; Liu, Bang; Zhang, Zheng-Yuan; Wang, Qi-Feng; Zhang, Jun; Ma, Yu; Han, Tian-Yu; Chen, Han-Chao; Nan, Jia-Dou; Yin, Yi-Ming; Zhu, Dong-Yang; Wang, Ya-Jun; Ding, Dong-Sheng; Shi, Bao-Sen
We describe a three-dimensional (3D) magneto-optical trap (MOT) capable of simultaneously capturing 85Rb and 133Cs atoms. Unlike conventional setups, our system utilizes two separate laser systems that are combined before entering the vacuum chamber,…
External link:
http://arxiv.org/abs/2412.11411