Showing 1 - 10 of 933 for search: '"Xu Jiarui"'
How can we predict future interaction trajectories of human hands in a scene given high-level colloquial task specifications in the form of natural language? In this paper, we extend the classic hand trajectory prediction task to two tasks involving
External link:
http://arxiv.org/abs/2412.13187
Author:
Sun, Yu, Li, Xinhao, Dalal, Karan, Xu, Jiarui, Vikram, Arjun, Zhang, Genghan, Dubois, Yann, Chen, Xinlei, Wang, Xiaolong, Koyejo, Sanmi, Hashimoto, Tatsunori, Guestrin, Carlos
Self-attention performs well in long context but has quadratic complexity. Existing RNN layers have linear complexity, but their performance in long context is limited by the expressive power of their hidden state. We propose a new class of sequence
External link:
http://arxiv.org/abs/2407.04620
Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the O
External link:
http://arxiv.org/abs/2406.16868
Integration of artificial intelligence (AI) and machine learning (ML) into the air interface has been envisioned as a key technology for next-generation (NextG) cellular networks. At the air interface, multiple-input multiple-output (MIMO) and its va
External link:
http://arxiv.org/abs/2403.02651
Author:
Xu, Jiarui, Zhou, Xingyi, Yan, Shen, Gu, Xiuye, Arnab, Anurag, Sun, Chen, Wang, Xiaolong, Schmid, Cordelia
Large language models have achieved great success in recent years, as have their variants in vision. Existing vision-language models can describe images in natural languages, answer visual-related questions, or perform complex reasoning about the image
External link:
http://arxiv.org/abs/2312.09237
Author:
Xu, Jiarui, Gandelsman, Yossi, Bar, Amir, Yang, Jianwei, Gao, Jianfeng, Darrell, Trevor, Wang, Xiaolong
In-context learning allows adapting a model to new tasks given a task description at test time. In this paper, we present IMProv - a generative model that is able to in-context learn visual tasks from multimodal prompts. Given a textual description o
External link:
http://arxiv.org/abs/2312.01771
Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the O
External link:
http://arxiv.org/abs/2311.08543
In this paper we introduce StructNet-CE, a novel real-time online learning framework for MIMO-OFDM channel estimation, which only utilizes over-the-air (OTA) pilot symbols for online training and converges within one OFDM subframe. The design of Stru
External link:
http://arxiv.org/abs/2305.13487
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable abi
External link:
http://arxiv.org/abs/2303.04803
We present the Group Propagation Vision Transformer (GPViT): a novel nonhierarchical (i.e. non-pyramidal) transformer model designed for general visual recognition with high-resolution features. High-resolution features (or tokens) are a natural fit
External link:
http://arxiv.org/abs/2212.06795