Showing 1 - 10 of 23,536 for search: '"An, Dongxu"'
Large multimodal models (LMMs) with advanced video analysis capabilities have recently garnered significant attention. However, most evaluations rely on traditional methods like multiple-choice questions in benchmarks such as VideoMME and LongVideoBe
External link:
http://arxiv.org/abs/2411.13281
Reinforcement Learning from Human Feedback (RLHF) has been proven to be an effective method for preference alignment of large language models (LLMs) and is widely used in the post-training process of LLMs. However, RLHF struggles with handling multip
External link:
http://arxiv.org/abs/2411.01245
Author:
Li, Dongxu, Liu, Yudong, Wu, Haoning, Wang, Yue, Shen, Zhiqi, Qu, Bowen, Niu, Xinyao, Wang, Guoyin, Chen, Bei, Li, Junnan
Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles
External link:
http://arxiv.org/abs/2410.05993
The coronal magnetic topology significantly affects the outcome of magnetic flux rope (MFR) eruptions. How the recently reported nested double null magnetic system affects MFR eruptions remains unclear. Using observations from the New Vacuum
External link:
http://arxiv.org/abs/2410.03100
This paper studies zero-shot object recognition using event camera data. Guided by CLIP, which is pre-trained on RGB images, existing approaches achieve zero-shot object recognition by optimizing embedding similarities between event data and RGB imag
External link:
http://arxiv.org/abs/2407.21616
Large multimodal models (LMMs) are processing increasingly longer and richer inputs. Despite this progress, few public benchmarks are available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark
External link:
http://arxiv.org/abs/2407.15754
Detecting hallucinations in large language model (LLM) outputs is pivotal, yet traditional fine-tuning for this classification task is impeded by the expensive and quickly outdated annotation process, especially across numerous vertical domains and i
External link:
http://arxiv.org/abs/2407.05474
Author:
Zhao, Zhonghan, Chai, Wenhao, Wang, Xuan, Ma, Ke, Chen, Kewei, Guo, Dongxu, Ye, Tian, Zhang, Yanting, Wang, Hongwei, Wang, Gaoang
Building an embodied agent system with a large language model (LLM) as its core is a promising direction. Due to the significant costs and uncontrollable factors associated with deploying and training such agents in the real world, we have decided to
External link:
http://arxiv.org/abs/2406.11247
Semantic segmentation, as a basic tool for intelligent interpretation of remote sensing images, plays a vital role in many Earth Observation (EO) applications. Nowadays, accurate semantic segmentation of remote sensing images remains a challenge due
External link:
http://arxiv.org/abs/2406.10828
Fracture resistance of blood clots plays a crucial role in physiological hemostasis and pathological thromboembolism. Although recent experimental and computational studies uncovered the poro-viscoelastic property of blood clots and its connection to
External link:
http://arxiv.org/abs/2406.15432