Showing 1 - 10 of 4,154 results for the search: '"TAN, XIN"'
Instruction tuning guides the Multimodal Large Language Models (MLLMs) in aligning different modalities by designing text instructions, which seems to be an essential technique to enhance the capabilities and controllability of foundation models. In…
External link:
http://arxiv.org/abs/2410.10868
We propose DrivingForward, a feed-forward Gaussian Splatting model that reconstructs driving scenes from flexible surround-view input. Driving scene images from vehicle-mounted cameras are typically sparse, with limited overlap, and the movement of t…
External link:
http://arxiv.org/abs/2409.12753
Author:
Tan, Xin, Zhao, Meng
Accurate prediction of traffic accidents across different times and regions is vital for public safety. However, existing methods face two key challenges: 1) Generalization: Current models rely heavily on manually constructed multi-view structures, l…
External link:
http://arxiv.org/abs/2409.05933
Author:
Jin, Yizhang, Li, Jian, Zhang, Jiangning, Hu, Jianlong, Gan, Zhenye, Tan, Xin, Liu, Yong, Wang, Yabiao, Wang, Chengjie, Ma, Lizhuang
Visual Spatial Description (VSD) aims to generate texts that describe the spatial relationships between objects within images. Traditional visual spatial relationship classification (VSRC) methods typically output the spatial relationship between two…
External link:
http://arxiv.org/abs/2408.04957
Author:
Zhao, Zhen, Tang, Jingqun, Wu, Binghong, Lin, Chunhui, Wei, Shu, Liu, Hao, Tan, Xin, Zhang, Zhizhong, Huang, Can, Xie, Yuan
In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inher…
External link:
http://arxiv.org/abs/2407.16364
Unsupervised visible infrared person re-identification (USVI-ReID) is a challenging retrieval task that aims to retrieve cross-modality pedestrian images without using any label information. In this task, the large cross-modality variance makes it di…
External link:
http://arxiv.org/abs/2407.12758
Author:
Lian, Xiaoli, Wang, Shuaisong, Ma, Jieping, Liu, Fang, Tan, Xin, Zhang, Li, Shi, Lin, Gao, Cuiyun
Code generation, the task of producing source code from prompts, has seen significant advancements with the advent of pre-trained large language models (PLMs). Despite these achievements, there lacks a comprehensive taxonomy of weaknesses about the b…
External link:
http://arxiv.org/abs/2407.09793
LiDAR-camera 3D representation pretraining has shown significant promise for 3D perception tasks and related applications. However, two issues widely exist in this framework: 1) Solely keyframes are used for training. For example, in nuScenes, a subs…
External link:
http://arxiv.org/abs/2407.07465
Large language model (LLM)-based applications consist of both LLM and non-LLM components, each contributing to the end-to-end latency. Despite great efforts to optimize LLM inference, end-to-end workflow optimization has been overlooked. Existing fra…
External link:
http://arxiv.org/abs/2407.00326
Night-time scene parsing aims to extract pixel-level semantic information in night images, aiding downstream tasks in understanding scene object distribution. Due to limited labeled night image datasets, unsupervised domain adaptation (UDA) has becom…
External link:
http://arxiv.org/abs/2406.10531