Showing 1 - 10 of 498 for search: '"Tang, HaoRan"'
Author:
Cao, Meng, Tang, Haoran, Zhao, Haoze, Guo, Hangyu, Liu, Jiaheng, Zhang, Ge, Liu, Ruyang, Sun, Qiang, Reid, Ian, Liang, Xiaodan
Recent advancements in video-based large language models (Video LLMs) have witnessed the emergence of diverse capabilities to reason and interpret dynamic visual content. Among them, gameplay videos stand out as a distinctive data source, often conta
External link:
http://arxiv.org/abs/2412.01800
The past year has witnessed the significant advancement of video-based large language models. However, the challenge of developing a unified model for both short and long video understanding remains unresolved. Most existing video LLMs cannot handle
External link:
http://arxiv.org/abs/2411.02327
Multiple-input multiple-output (MIMO) is pivotal for wireless systems, yet its high-dimensional, stochastic channel poses significant challenges for accurate estimation, highlighting the critical need for robust estimation techniques. In this paper,
External link:
http://arxiv.org/abs/2410.23752
Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries. Most existing TVR methods are based on large-scale pre-trained vision-language models (e.g., CLIP). However, due to the inheren
External link:
http://arxiv.org/abs/2408.10575
We measure the performance of in-context learning as a function of task novelty and difficulty for open and closed questions. For that purpose, we created a novel benchmark consisting of hard scientific questions, each paired with a context of variou
External link:
http://arxiv.org/abs/2407.02028
Recent studies successfully learned static graph embeddings that are structurally fair by preventing the effectiveness disparity of high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic
External link:
http://arxiv.org/abs/2406.13201
Author:
Cao, Meng, Tang, Haoran, Huang, Jinfa, Jin, Peng, Zhang, Can, Liu, Ruyang, Chen, Long, Liang, Xiaodan, Yuan, Li, Li, Ge
Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods rely on image-to-video transfer learning based on large-scale pre-trained vision-language models (e.g., CLIP). Howe
External link:
http://arxiv.org/abs/2405.19465
Large language models (LLMs) can elicit social bias during generation, especially during inference with toxic prompts. Controlling the sensitive attributes in generation encounters challenges in data distribution, generalizability, and efficiency. Spe
External link:
http://arxiv.org/abs/2405.19299
Large Language Models (LLMs) have showcased impressive capabilities in text comprehension and generation, prompting research efforts towards video LLMs to facilitate human-AI interaction at the video level. However, how to effectively encode and unde
External link:
http://arxiv.org/abs/2404.00308
Published in:
CHI'2024
Videos are prominent learning materials to prepare surgical trainees before they enter the operating room (OR). In this work, we explore techniques to enrich the video-based surgery learning experience. We propose Surgment, a system that helps expert
External link:
http://arxiv.org/abs/2402.17903