Search results

Report

Aria: An Open Multimodal Native Mixture-of-Experts Model

Authors: Li, Dongxu; Liu, Yudong; Wu, Haoning; Wang, Yue; Shen, Zhiqi; Qu, Bowen; Niu, Xinyao; Wang, Guoyin; Chen, Bei; Li, Junnan

Information comes in diverse modalities. Multimodal native AI models are essential to integrate real-world information and deliver comprehensive understanding. While proprietary multimodal native models exist, their lack of openness imposes obstacles

External link: http://arxiv.org/abs/2410.05993

Report

EZSR: Event-based Zero-Shot Recognition

Authors: Yang, Yan; Pan, Liyuan; Li, Dongxu; Liu, Liu

This paper studies zero-shot object recognition using event camera data. Guided by CLIP, which is pre-trained on RGB images, existing approaches achieve zero-shot object recognition by maximizing embedding similarities between event data encoded by a

External link: http://arxiv.org/abs/2407.21616

Report

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

Authors: Wu, Haoning; Li, Dongxu; Chen, Bei; Li, Junnan

Large multimodal models (LMMs) are processing increasingly longer and richer inputs. Albeit the progress, few public benchmark is available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark

External link: http://arxiv.org/abs/2407.15754

Report

PyramidMamba: Rethinking Pyramid Feature Fusion with Selective Space State Model for Semantic Segmentation of Remote Sensing Imagery

Authors: Wang, Libo; Li, Dongxu; Dong, Sijun; Meng, Xiaoliang; Zhang, Xiaokang; Hong, Danfeng

Semantic segmentation, as a basic tool for intelligent interpretation of remote sensing images, plays a vital role in many Earth Observation (EO) applications. Nowadays, accurate semantic segmentation of remote sensing images remains a challenge due

External link: http://arxiv.org/abs/2406.10828

Report

Resonant Beam Communications: A New Design Paradigm and Challenges

Authors: Tian, Yuanming; Li, Dongxu; Huang, Chuan; Liu, Qingwen; Zhou, Shengli

Resonant beam communications (RBCom), which adopt oscillating photons between two separate retroreflectors for information transmission, exhibit potential advantages over other types of wireless optical communications (WOC). However, echo interferenc

External link: http://arxiv.org/abs/2403.16699

Report

Design and Performance of Resonant Beam Communications -- Part II: Mobile Scenario

Authors: Li, Dongxu; Tian, Yuanming; Huang, Chuan; Liu, Qingwen; Zhou, Shengli

This two-part paper focuses on the system design and performance analysis for a point-to-point resonant beam communication (RBCom) system under both the quasi-static and mobile scenarios. Part I of this paper proposes a synchronization-based informat

External link: http://arxiv.org/abs/2403.16694

Report

Design and Performance of Resonant Beam Communications -- Part I: Quasi-Static Scenario

Authors: Li, Dongxu; Tian, Yuanming; Huang, Chuan; Liu, Qingwen; Zhou, Shengli

This two-part paper studies a point-to-point resonant beam communication (RBCom) system, where two separately deployed retroreflectors are adopted to generate the resonant beam between the transmitter and the receiver, and analyzes the transmission r

External link: http://arxiv.org/abs/2403.16676

Report

Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions

Authors: Zhang, David Junhao; Li, Dongxu; Le, Hung; Shou, Mike Zheng; Xiong, Caiming; Sahoo, Doyen

Most existing video diffusion models (VDMs) are limited to mere text conditions. Thereby, they are usually lacking in control over visual appearance and geometry structure of the generated videos. This work presents Moonshot, a new video generation m

External link: http://arxiv.org/abs/2401.01827

Report

Fundamental Limitation of Semantic Communications: Neural Estimation for Rate-Distortion

Authors: Li, Dongxu; Huang, Jianhao; Huang, Chuan; Qin, Xiaoqi; Zhang, Han; Zhang, Ping

This paper studies the fundamental limit of semantic communications over the discrete memoryless channel. We consider the scenario to send a semantic source consisting of an observation state and its corresponding semantic state, both of which are re

External link: http://arxiv.org/abs/2401.01176

Report

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

Authors: Panagopoulou, Artemis; Xue, Le; Yu, Ning; Li, Junnan; Li, Dongxu; Joty, Shafiq; Xu, Ran; Savarese, Silvio; Xiong, Caiming; Niebles, Juan Carlos

Recent research has achieved significant advancements in visual reasoning tasks through learning image-to-language projections and leveraging the impressive reasoning abilities of Large Language Models (LLMs). This paper introduces an efficient and e

External link: http://arxiv.org/abs/2311.18799
