Showing 1 - 10 of 20 for search: '"Chen, Guikun"'
Current approaches for open-vocabulary scene graph generation (OVSGG) use vision-language models such as CLIP and follow a standard zero-shot pipeline -- computing similarity between the query image and the text embeddings for each category …
External link:
http://arxiv.org/abs/2410.15364
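For context, the standard zero-shot pipeline this abstract refers to can be illustrated with CLIP: embed the query image and one text prompt per category, then rank categories by image-text similarity. A minimal sketch using the Hugging Face transformers API follows; the category prompts and image path are placeholders, and this shows the generic pipeline, not the paper's own method:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # One text prompt per candidate category (placeholder labels).
    categories = ["a photo of a person riding a horse",
                  "a photo of a person feeding a horse"]
    image = Image.open("query.jpg")  # placeholder query image

    inputs = processor(text=categories, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)

    # logits_per_image holds temperature-scaled image-text cosine similarities.
    probs = out.logits_per_image.softmax(dim=-1)
    print(categories[probs.argmax().item()])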
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial advancements in artificial intelligence, significantly enhancing the capability to understand and generate multimodal content. While prior studies have largely …
External link:
http://arxiv.org/abs/2409.18142
DETR introduces a simplified one-stage framework for scene graph generation (SGG). However, DETR-based SGG models face two challenges: i) sparse supervision, as each image typically contains fewer than 10 relation annotations, while the models employ …
External link:
http://arxiv.org/abs/2409.10262
We investigate a fundamental aspect of machine vision: the measurement of features, by revisiting clustering, one of the most classic approaches in machine learning and data analysis. Existing visual feature extractors, including ConvNets, ViTs, and …
External link:
http://arxiv.org/abs/2403.17409
Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes, making them far from real-life applications like guiding students in laboratory experiments and identifying their mistakes …
External link:
http://arxiv.org/abs/2401.08392
Author:
Chen, Guikun, Wang, Wenguan
3D Gaussian splatting (GS) has recently emerged as a transformative technique in the realm of explicit radiance fields and computer graphics. This innovative approach, characterized by the utilization of millions of learnable 3D Gaussians, represents …
External link:
http://arxiv.org/abs/2401.03890
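For orientation, the explicit representation this survey covers is conventionally parameterized per Gaussian by a 3D mean, an anisotropic scale plus a rotation (together defining the covariance), an opacity, and view-dependent color via spherical harmonics. A minimal sketch of that parameterization follows; the class and field names are illustrative assumptions, not taken from the paper:

    import torch

    class GaussianCloud(torch.nn.Module):
        """Learnable parameters for n 3D Gaussians (illustrative layout)."""
        def __init__(self, n: int, sh_degree: int = 3):
            super().__init__()
            n_sh = (sh_degree + 1) ** 2  # SH coefficients per color channel
            self.means = torch.nn.Parameter(torch.zeros(n, 3))       # 3D centers
            self.log_scales = torch.nn.Parameter(torch.zeros(n, 3))  # per-axis scales (log-space)
            self.quats = torch.nn.Parameter(                         # rotations as unit quaternions
                torch.tensor([[1., 0., 0., 0.]]).repeat(n, 1))
            self.opacity_logits = torch.nn.Parameter(torch.zeros(n, 1))  # opacity pre-sigmoid
            self.sh = torch.nn.Parameter(torch.zeros(n, n_sh, 3))    # view-dependent RGB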
Scene Graph Generation (SGG) aims to detect all the visual relation triplets in a given image. With the emergence of various advanced techniques for better utilizing both the intrinsic and extrinsic information in each relation triplet …
External link:
http://arxiv.org/abs/2308.06712
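As a point of reference, a scene graph is commonly serialized as a set of (subject, predicate, object) triplets over detected boxes. A minimal illustrative encoding, with field names that are assumptions rather than the paper's:

    from dataclasses import dataclass

    @dataclass
    class Relation:
        """One visual relation triplet (boxes are (x1, y1, x2, y2))."""
        subject_label: str
        subject_box: tuple
        predicate: str
        object_label: str
        object_box: tuple

    # e.g., one triplet detected in an image of a rider
    triplet = Relation("person", (10, 20, 110, 220),
                       "riding", "horse", (60, 80, 300, 260))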
Pretrained vision-language models, such as CLIP, have demonstrated strong generalization capabilities, making them promising tools in the realm of zero-shot visual recognition. Visual relation detection (VRD) is a typical task that identifies …
External link:
http://arxiv.org/abs/2305.12476
Today's scene graph generation (SGG) models typically require abundant manual annotations to learn new predicate types. Thus, it is difficult to apply them to real-world applications with a long-tailed distribution of predicates. In this paper, we …
External link:
http://arxiv.org/abs/2303.10863
Published in:
Journal of Materials Research and Technology, July-August 2022, 19:1934-1943