Showing 1 - 10 of 27 for search: '"Li, Ruihuang"'
Recent vision-language pre-training models have exhibited remarkable generalization ability in zero-shot recognition tasks. Previous open-vocabulary 3D scene understanding methods mostly focus on training 3D models using either image or text supervis
External link:
http://arxiv.org/abs/2407.09781
Text-based 2D diffusion models have demonstrated impressive capabilities in image generation and editing. Meanwhile, 2D diffusion models also show substantial potential for 3D editing tasks. However, how to achieve consistent edits across mul
External link:
http://arxiv.org/abs/2406.17396
Text-driven diffusion models have significantly advanced image editing performance by using text prompts as inputs. One crucial step in text-driven image editing is to invert the original image into a latent noise code conditioned on the source p
External link:
http://arxiv.org/abs/2403.11105
Window-based transformers excel in large-scale point cloud understanding by capturing context-aware representations with affordable attention computation in a more localized manner. However, the sparse nature of point clouds leads to a significant va
External link:
http://arxiv.org/abs/2401.00912
Online video super-resolution (online-VSR) relies heavily on an effective alignment module to aggregate temporal information, while the strict latency requirement makes accurate and efficient alignment very challenging. Though much progress has been a
External link:
http://arxiv.org/abs/2312.09909
One-to-one (o2o) label assignment plays a key role in transformer-based end-to-end detection, and it has recently been introduced in fully convolutional detectors for end-to-end dense detection. However, o2o can degrade the feature learning efficien
External link:
http://arxiv.org/abs/2303.11567
Weakly supervised instance segmentation using only bounding box annotations has recently attracted much research attention. Most of the current efforts leverage low-level image features as extra supervision without explicitly exploiting the high-leve
External link:
http://arxiv.org/abs/2303.08578
Point cloud sequences are commonly used to accurately detect 3D objects in applications such as autonomous driving. Current top-performing multi-frame detectors mostly follow a Detect-and-Fuse framework, which extracts features from each frame of the
External link:
http://arxiv.org/abs/2303.08316
Representative instance segmentation methods mostly segment different object instances with a mask of fixed resolution, e.g., a 28×28 grid. However, a low-resolution mask loses rich details, while a high-resolution mask incurs quadratic computa
External link:
http://arxiv.org/abs/2303.07868
It is well known that the performance of well-trained deep neural networks may degrade significantly when they are applied to data with even slightly shifted distributions. Recent studies have shown that introducing certain perturbation on feature st
External link:
http://arxiv.org/abs/2301.12643