Search results - "Zhuang, Yueting"

Report

RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection

Authors: Miao, Bingchen; Zhang, Wenqiao; Li, Juncheng; Tang, Siliang; Li, Zhaocheng; Shi, Haochen; Xiao, Jun; Zhuang, Yueting

Multimodal Industrial Anomaly Detection (MIAD), utilizing 3D point clouds and 2D RGB images to identify the abnormal region of products, plays a crucial role in industrial quality inspection. However, the conventional MIAD setting presupposes that al

External link: http://arxiv.org/abs/2410.01737

Report

Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

Authors: Huang, Hongzhe; Yu, Zhewen; Liu, Jiang; Cai, Li; Jiao, Dian; Zhang, Wenqiao; Tang, Siliang; Li, Juncheng; Jiang, Hao; Li, Haoyuan; Zhuang, Yueting

Recent advances in Multi-modal Large Language Models (MLLMs), such as LLaVA-series models, are driven by massive machine-generated instruction-following data tuning. Such automatic instruction collection pipelines, however, inadvertently introduce si

External link: http://arxiv.org/abs/2409.18541

Report

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition

Authors: Lin, Tianwei; Liu, Jiang; Zhang, Wenqiao; Li, Zhaocheng; Dai, Yang; Li, Haoyuan; Yu, Zhelun; He, Wanggui; Li, Juncheng; Jiang, Hao; Tang, Siliang; Zhuang, Yueting

While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multidimensional task scenarios. To address this issue, one straig

External link: http://arxiv.org/abs/2408.09856

Report

Logic Distillation: Learning from Code Function by Function for Planning and Decision-making

Authors: Chen, Dong; Zhang, Shilin; Gao, Fei; Zhuang, Yueting; Tang, Siliang; Liu, Qidong; Xu, Mingliang

Large language models (LLMs) have garnered increasing attention owing to their powerful logical reasoning capabilities. Generally, larger LLMs (L-LLMs) that require paid interfaces exhibit significantly superior performance compared to smaller LLMs (

External link: http://arxiv.org/abs/2407.19405

Report

Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification

Authors: Xuan, Yunyi; Chen, Weijie; Yang, Shicai; Xie, Di; Lin, Luojun; Zhuang, Yueting

Data-Free Knowledge Distillation (DFKD) has shown great potential in creating a compact student model while alleviating the dependency on real training data by synthesizing surrogate data. However, prior arts are seldom discussed under distribution s

External link: http://arxiv.org/abs/2407.15155

Report

IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

Authors: Cao, Jie; Jiao, Dian; Yan, Qiang; Zhang, Wenqiao; Tang, Siliang; Zhuang, Yueting

Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. With the advent of large language models (LLMs), shows their impressive capability of textual

External link: http://arxiv.org/abs/2407.10486

Report

From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation

Authors: Shi, Hanrong; Li, Lin; Xiao, Jun; Zhuang, Yueting; Chen, Long

Panoptic Scene Graph Generation (PSG) aims to generate a comprehensive graph-structure representation based on panoptic segmentation masks. Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features

External link: http://arxiv.org/abs/2407.09191

Report

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

Authors: Zhang, Wenqi; Cheng, Zhenglin; He, Yuanyu; Wang, Mengna; Shen, Yongliang; Tan, Zeqi; Hou, Guiyang; He, Mingqian; Ma, Yanna; Lu, Weiming; Zhuang, Yueting

Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. T

External link: http://arxiv.org/abs/2407.07053

Report

Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference

Authors: Shen, Kai; Wu, Lingfei; Tang, Siliang; Xu, Fangli; Long, Bo; Zhuang, Yueting; Pei, Jian

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence 2024

The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g. answer type). Previous works on VQG fall in two aspects: i) They suffer from one image to many questions mappin

External link: http://arxiv.org/abs/2407.05100

Report

Bridging Local Details and Global Context in Text-Attributed Graphs

Authors: Wang, Yaoke; Zhu, Yun; Zhang, Wenqiao; Zhuang, Yueting; Li, Yunfei; Tang, Siliang

Representation learning on text-attributed graphs (TAGs) is vital for real-world applications, as they combine semantic textual and contextual structural information. Research in this field generally consists of two main perspectives: local-level enco

External link: http://arxiv.org/abs/2406.12608
