Showing 1 - 10 of 599 for search: '"Zhuang, Yueting"'
Authors:
Miao, Bingchen; Zhang, Wenqiao; Li, Juncheng; Tang, Siliang; Li, Zhaocheng; Shi, Haochen; Xiao, Jun; Zhuang, Yueting
Multimodal Industrial Anomaly Detection (MIAD), utilizing 3D point clouds and 2D RGB images to identify the abnormal region of products, plays a crucial role in industrial quality inspection. However, the conventional MIAD setting presupposes that al
External link:
http://arxiv.org/abs/2410.01737
Authors:
Huang, Hongzhe; Yu, Zhewen; Liu, Jiang; Cai, Li; Jiao, Dian; Zhang, Wenqiao; Tang, Siliang; Li, Juncheng; Jiang, Hao; Li, Haoyuan; Zhuang, Yueting
Recent advances in Multi-modal Large Language Models (MLLMs), such as LLaVA-series models, are driven by massive machine-generated instruction-following data tuning. Such automatic instruction collection pipelines, however, inadvertently introduce si
External link:
http://arxiv.org/abs/2409.18541
Authors:
Lin, Tianwei; Liu, Jiang; Zhang, Wenqiao; Li, Zhaocheng; Dai, Yang; Li, Haoyuan; Yu, Zhelun; He, Wanggui; Li, Juncheng; Jiang, Hao; Tang, Siliang; Zhuang, Yueting
While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multidimensional task scenarios. To address this issue, one straig
External link:
http://arxiv.org/abs/2408.09856
Authors:
Chen, Dong; Zhang, Shilin; Gao, Fei; Zhuang, Yueting; Tang, Siliang; Liu, Qidong; Xu, Mingliang
Large language models (LLMs) have garnered increasing attention owing to their powerful logical reasoning capabilities. Generally, larger LLMs (L-LLMs) that require paid interfaces exhibit significantly superior performance compared to smaller LLMs (
External link:
http://arxiv.org/abs/2407.19405
Data-Free Knowledge Distillation (DFKD) has shown great potential in creating a compact student model while alleviating the dependency on real training data by synthesizing surrogate data. However, prior arts are seldom discussed under distribution s
External link:
http://arxiv.org/abs/2407.15155
Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. The advent of large language models (LLMs) shows their impressive capability of textual
External link:
http://arxiv.org/abs/2407.10486
Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure representation based on panoptic segmentation masks. Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features
External link:
http://arxiv.org/abs/2407.09191
Authors:
Zhang, Wenqi; Cheng, Zhenglin; He, Yuanyu; Wang, Mengna; Shen, Yongliang; Tan, Zeqi; Hou, Guiyang; He, Mingqian; Ma, Yanna; Lu, Weiming; Zhuang, Yueting
Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. T
External link:
http://arxiv.org/abs/2407.07053
Published in:
IEEE Transactions on Pattern Analysis and Machine Intelligence 2024
The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g. answer type). Previous works on VQG fall in two aspects: i) They suffer from one image to many questions mappin
External link:
http://arxiv.org/abs/2407.05100
Representation learning on text-attributed graphs (TAGs) is vital for real-world applications, as they combine semantic textual and contextual structural information. Research in this field generally consists of two main perspectives: local-level enco
External link:
http://arxiv.org/abs/2406.12608