Showing 1 - 10 of 1,511 results for search: '"Li, DongXu"'
Author:
Li, Dongxu, Liu, Yudong, Wu, Haoning, Wang, Yue, Shen, Zhiqi, Qu, Bowen, Niu, Xinyao, Wang, Guoyin, Chen, Bei, Li, Junnan
Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles
External link:
http://arxiv.org/abs/2410.05993
This paper studies zero-shot object recognition using event camera data. Guided by CLIP, which is pre-trained on RGB images, existing approaches achieve zero-shot object recognition by maximizing embedding similarities between event data encoded by a
External link:
http://arxiv.org/abs/2407.21616
Large multimodal models (LMMs) are processing increasingly longer and richer inputs. Despite this progress, few public benchmarks are available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark
External link:
http://arxiv.org/abs/2407.15754
Semantic segmentation, as a basic tool for intelligent interpretation of remote sensing images, plays a vital role in many Earth Observation (EO) applications. Nowadays, accurate semantic segmentation of remote sensing images remains a challenge due
External link:
http://arxiv.org/abs/2406.10828
Resonant beam communications (RBCom), which adopt oscillating photons between two separate retroreflectors for information transmission, exhibit potential advantages over other types of wireless optical communications (WOC). However, echo interferenc
External link:
http://arxiv.org/abs/2403.16699
This two-part paper focuses on the system design and performance analysis for a point-to-point resonant beam communication (RBCom) system under both the quasi-static and mobile scenarios. Part I of this paper proposes a synchronization-based informat
External link:
http://arxiv.org/abs/2403.16694
This two-part paper studies a point-to-point resonant beam communication (RBCom) system, where two separately deployed retroreflectors are adopted to generate the resonant beam between the transmitter and the receiver, and analyzes the transmission r
External link:
http://arxiv.org/abs/2403.16676
Most existing video diffusion models (VDMs) are limited to text conditions alone. As a result, they usually lack control over the visual appearance and geometric structure of the generated videos. This work presents Moonshot, a new video generation m
External link:
http://arxiv.org/abs/2401.01827
This paper studies the fundamental limit of semantic communications over the discrete memoryless channel. We consider the scenario of sending a semantic source consisting of an observation state and its corresponding semantic state, both of which are re
External link:
http://arxiv.org/abs/2401.01176
Author:
Panagopoulou, Artemis, Xue, Le, Yu, Ning, Li, Junnan, Li, Dongxu, Joty, Shafiq, Xu, Ran, Savarese, Silvio, Xiong, Caiming, Niebles, Juan Carlos
Recent research has achieved significant advancements in visual reasoning tasks through learning image-to-language projections and leveraging the impressive reasoning abilities of Large Language Models (LLMs). This paper introduces an efficient and e
External link:
http://arxiv.org/abs/2311.18799