Showing 1 - 10 of 83 for search: '"Lyu, XinYu"'
Quantization stands as a pivotal technique for large language model (LLM) serving, yet it poses significant challenges, particularly in achieving effective low-bit quantization. The limited numerical mapping makes the quantized model produce a non-trivial …
External link:
http://arxiv.org/abs/2411.07762
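The entry above attributes the difficulty of low-bit quantization to the limited numerical mapping. As a point of reference only, here is a minimal sketch of generic symmetric uniform quantization, not the listed paper's method; all names are illustrative. It shows how reconstruction error grows as the bit width shrinks:

import numpy as np

def quantize_uniform(w: np.ndarray, bits: int = 4):
    """Symmetric uniform quantization of a weight tensor to `bits` bits.
    With only 2**bits representable levels, rounding error grows as
    the bit width shrinks."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax        # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
for bits in (8, 4, 2):
    q, s = quantize_uniform(w, bits)
    err = np.abs(w - dequantize(q, s)).mean()
    print(f"{bits}-bit mean abs error: {err:.5f}")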
Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. …
External link:
http://arxiv.org/abs/2405.15356
Author:
Zhang, Haonan, Zeng, Pengpeng, Gao, Lianli, Song, Jingkuan, Duan, Yihang, Lyu, Xinyu, Shen, Hengtao
Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging …
External link:
http://arxiv.org/abs/2405.12710
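To make the shared-embedding-space setup mentioned in this entry concrete, below is a hedged sketch assuming mean pooling over per-frame CLIP embeddings and cosine-similarity ranking; the encoders are mocked with random vectors, and the listed paper's actual transfer strategy may differ:

import numpy as np

def cosine_sim(a, b):
    # Row-wise cosine similarity between two sets of embeddings.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Hypothetical stand-ins for the CLIP encoders: in practice these would
# be the pre-trained text and image towers applied to captions and frames.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(5, 512))       # 5 captions
frame_emb = rng.normal(size=(5, 8, 512))   # 5 videos x 8 frames

video_emb = frame_emb.mean(axis=1)         # mean-pool frames -> video embedding
sim = cosine_sim(text_emb, video_emb)      # (5 texts) x (5 videos)
print("best video per caption:", sim.argmax(axis=1))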
We demonstrate that, even when there are moderate overlaps in the inputs of sloppy or accurate double-word addition algorithms in the QD library, these algorithms still guarantee error bounds of $O(u^2(|a|+|b|))$ in faithful rounding. Furthermore, …
External link:
http://arxiv.org/abs/2404.05948
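For context on this entry: double-word ("double-double") arithmetic represents a value as an unevaluated sum of two floats, and u denotes the unit roundoff (2^-53 for IEEE binary64, which Python floats use). Below is a minimal sketch of the classic "sloppy" addition scheme built from Knuth's two_sum and Dekker's quick_two_sum error-free transformations; the function names are illustrative, not the QD library's C++ API:

import math

def two_sum(a: float, b: float):
    """Error-free transformation: a + b = s + err exactly (Knuth)."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def quick_two_sum(a: float, b: float):
    """Error-free transformation assuming |a| >= |b| (Dekker)."""
    s = a + b
    err = b - (s - a)
    return s, err

def dw_add_sloppy(xhi, xlo, yhi, ylo):
    """Sloppy double-word addition: one two_sum on the high parts,
    a plain floating-point add on the low parts, then renormalize."""
    s, e = two_sum(xhi, yhi)
    e += xlo + ylo
    return quick_two_sum(s, e)

# Usage: a double-double approximation of pi, added to itself.
pi_hi, pi_lo = math.pi, 1.2246467991473532e-16
print(dw_add_sloppy(pi_hi, pi_lo, pi_hi, pi_lo))  # ~2*pi to ~32 digits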
Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image. Nevertheless, the long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle …
External link:
http://arxiv.org/abs/2312.17425
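Several of the SGG entries in this list point to the long-tailed distribution of predicate classes as the source of biased predictions. As a generic illustration of one standard counter-measure, class re-weighting via the effective number of samples (Cui et al., 2019), and not the technique of the paper above, a minimal sketch:

import numpy as np

def inverse_freq_weights(counts, beta=0.999):
    """Class weights from the 'effective number of samples'
    (Cui et al., 2019): rare classes get larger loss weights."""
    eff_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / eff_num
    return w / w.sum() * len(counts)   # normalize to mean 1

# Hypothetical predicate counts: "on" is frequent, "standing on" is rare.
counts = np.array([50000, 20000, 500, 50])
print(inverse_freq_weights(counts))
# The rare classes receive much larger weights in the training loss.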
Existing Unbiased Scene Graph Generation (USGG) methods focus only on addressing predicate-level imbalance, where high-frequency classes dominate predictions of rare ones, while overlooking concept-level imbalance. In fact, even if predicates …
External link:
http://arxiv.org/abs/2308.04802
Author:
Gao, Lianli, Lyu, Xinyu, Guo, Yuyu, Hu, Yuxuan, Li, Yuan-Fang, Xu, Lu, Shen, Heng Tao, Song, Jingkuan
Scene graph generation aims to detect visual relationship triplets (subject, predicate, object). Due to biases in data, current models tend to predict common predicates, e.g. "on" and "at", instead of informative ones, e.g. "standing on" and "looking …"
External link:
http://arxiv.org/abs/2308.05286
The task of dynamic scene graph generation (DynSGG) aims to generate scene graphs for given videos, which involves modeling the spatial-temporal information in the video. However, due to the long-tailed distribution of samples in the dataset, previous …
External link:
http://arxiv.org/abs/2308.05274
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. However, due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation …
External link:
http://arxiv.org/abs/2303.07096
Author:
Zheng, Chaofan, Gao, Lianli, Lyu, Xinyu, Zeng, Pengpeng, Saddik, Abdulmotaleb El, Shen, Heng Tao
Current studies of Scene Graph Generation (SGG) focus on solving the long-tailed problem to generate unbiased scene graphs. However, most de-biasing methods overemphasize tail predicates and underestimate head ones throughout training …
External link:
http://arxiv.org/abs/2207.07913