Showing 1 - 10 of 313 results for the search: "Yang, LingFeng"
As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights. This trend underscores the need for an algorithm to extract key video information…
External link:
http://arxiv.org/abs/2412.09513
Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image. Most existing methods use independent image-text encoding and apply complex hand-crafted modules or encoder-decoder architectures…
External link:
http://arxiv.org/abs/2409.17531
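For illustration of the "independent image-text encoding" baseline mentioned in the entry above, here is a minimal sketch (not the paper's method): region features and a sentence embedding come from separate encoders and are matched by cosine similarity. All module names and dimensions are placeholders.

```python
# Minimal dual-encoder grounding baseline: image regions and the query
# sentence are encoded independently, then matched by cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoderGrounder(nn.Module):
    def __init__(self, region_dim=2048, text_dim=768, joint_dim=512):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, joint_dim)  # image branch
        self.text_proj = nn.Linear(text_dim, joint_dim)      # text branch

    def forward(self, region_feats, sentence_feat):
        # region_feats: (num_regions, region_dim); sentence_feat: (text_dim,)
        r = F.normalize(self.region_proj(region_feats), dim=-1)
        t = F.normalize(self.text_proj(sentence_feat), dim=-1)
        return r @ t  # (num_regions,) similarities; argmax picks the grounded box

# Usage: scores = model(region_feats, sentence_feat); best_box = scores.argmax()
```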
Prompt learning is an effective method to customize Vision-Language Models (VLMs) for various downstream tasks, involving tuning very few parameters of input prompt tokens. Recently, prompt pretraining on a large-scale dataset (e.g., ImageNet-21K) has…
External link:
http://arxiv.org/abs/2409.06166
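As a rough illustration of the prompt-tuning setup described in the entry above (not the paper's pretraining scheme), the sketch below keeps a CLIP-like backbone frozen and optimizes only a small set of context tokens; the embedding sizes and the precomputed class-embedding buffer are assumptions.

```python
# Prompt tuning sketch: only a few learnable context tokens are optimized,
# the VLM backbone stays frozen.
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    def __init__(self, num_ctx=16, embed_dim=512, num_classes=1000):
        super().__init__()
        # Learnable context vectors shared across classes (the "very few
        # parameters" being tuned).
        self.ctx = nn.Parameter(torch.randn(num_ctx, embed_dim) * 0.02)
        # Frozen per-class name embeddings, assumed precomputed offline.
        self.register_buffer("cls_emb", torch.randn(num_classes, 1, embed_dim))

    def forward(self):
        # Prepend the shared context to every class-name embedding:
        # result is (num_classes, num_ctx + 1, embed_dim).
        ctx = self.ctx.unsqueeze(0).expand(self.cls_emb.size(0), -1, -1)
        return torch.cat([ctx, self.cls_emb], dim=1)

# Training sketch: only prompt_learner.parameters() go into the optimizer;
# the CLIP image/text encoders are kept frozen throughout.
```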
Author:
Yang, Lingfeng, Zhang, Xinyu, Li, Xiang, Chen, Jinwen, Yao, Kun, Zhang, Gang, Ding, Errui, Liu, Lingqiao, Wang, Jingdong, Yang, Jian
Diffusion models have exhibited remarkable prowess in visual generalization. Building on this success, we introduce an instruction-based object addition pipeline, named Add-SD, which automatically inserts objects into realistic scenes with rational sizes…
External link:
http://arxiv.org/abs/2407.21016
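Add-SD's own pipeline and weights are not reproduced here; as a rough stand-in, the sketch below performs a generic instruction-based edit with the InstructPix2Pix pipeline from diffusers, which follows the same "edit an image from a text instruction" pattern. The checkpoint name, prompt, and guidance values are illustrative assumptions.

```python
# Generic instruction-based image edit (NOT Add-SD's released model):
# a text instruction asks the pipeline to add an object to an existing scene.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("scene.jpg").convert("RGB")
edited = pipe(
    "add a dog on the grass",       # natural-language addition instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,       # how strongly to preserve the input scene
).images[0]
edited.save("scene_with_dog.jpg")
```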
Vision-Language Models (VLMs), such as CLIP, have demonstrated impressive zero-shot transfer capabilities in image-level visual perception. However, these models have shown limited performance in instance-level tasks that demand precise localization…
External link:
http://arxiv.org/abs/2306.04356
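To illustrate the image-level vs. instance-level distinction drawn in the entry above, here is a small sketch using the standard Hugging Face CLIP checkpoint: zero-shot scoring works directly on the whole image, while naively scoring crops of candidate boxes is only a weak proxy for precise localization. The labels and box coordinates are made up for the example.

```python
# CLIP zero-shot scoring at image level vs. naive instance-level crop scoring.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["a photo of a cat", "a photo of a dog"]

image = Image.open("scene.jpg").convert("RGB")
boxes = [(0, 0, 200, 200), (200, 0, 400, 200)]   # candidate instances (x0, y0, x1, y1)
crops = [image.crop(b) for b in boxes]

inputs = processor(text=labels, images=[image] + crops,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)
probs = out.logits_per_image.softmax(dim=-1)
print("image-level:", probs[0])      # strong zero-shot signal for the whole image
print("instance-level:", probs[1:])  # crop scoring is a weak proxy for localization
```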
New knowledge originates from the old. The various types of elements deposited in the training history are a wealth of material for improving the learning of deep models. In this survey, we comprehensively review and summarize the topic "Historical Learning"…
External link:
http://arxiv.org/abs/2303.12992
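One concrete example of reusing "elements deposited in the training history" (an illustration only, not a method taken from the survey) is averaging the weights of several saved checkpoints from the same run:

```python
# Reusing training history by averaging saved checkpoints (assumes each file
# stores a raw state_dict of the same model).
import torch

def average_checkpoints(paths):
    """Average the parameters stored in a list of checkpoint files."""
    avg = None
    for p in paths:
        state = torch.load(p, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}

# model.load_state_dict(average_checkpoints(["ep8.pt", "ep9.pt", "ep10.pt"]))
```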
Author:
Li, Zheng, Li, Xiang, Yang, Lingfeng, Zhao, Borui, Song, Renjie, Luo, Lei, Li, Jun, Yang, Jian
Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions…
External link:
http://arxiv.org/abs/2211.16231
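For reference, the fixed-temperature distillation loss that the entry above questions looks like this (the standard formulation, not the paper's own temperature scheme):

```python
# Standard knowledge-distillation loss: a temperature T softens both teacher
# and student logits, controlling how far apart the two distributions are.
# T is shown here as the usual fixed hyper-parameter.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=4.0):
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T
```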
Masked AutoEncoder (MAE) has recently led the trend in visual self-supervision with an elegant asymmetric encoder-decoder design, which significantly improves both pre-training efficiency and fine-tuning accuracy. Notably, the success of the…
External link:
http://arxiv.org/abs/2205.10063
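A minimal sketch of the asymmetric design the entry above describes: after random masking, the heavy encoder processes only the visible patches, and a much lighter decoder would reconstruct the rest. Shapes and modules are placeholders rather than MAE's actual implementation.

```python
# Asymmetric masked-autoencoder sketch: encode only the visible patch tokens.
import torch
import torch.nn as nn

def random_masking(patches, mask_ratio=0.75):
    # patches: (B, N, D); keep a random subset of patch tokens per image.
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    keep_idx = torch.rand(B, N).argsort(dim=1)[:, :n_keep]            # (B, n_keep)
    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep_idx

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(768, 12, batch_first=True), num_layers=12
)

patches = torch.randn(2, 196, 768)           # e.g. 14x14 patch embeddings, batch of 2
visible, keep_idx = random_masking(patches)
latent = encoder(visible)                    # the heavy encoder sees only ~25% of tokens
# A lightweight decoder would then take `latent` plus learnable mask tokens
# and reconstruct the pixels of the masked patches.
```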
Published in:
In Food Research International, December 2024, Volume 197, Part 1
Published in:
In Optical Materials, November 2024, Volume 157, Part 1