Showing 1 - 10 of 25 for search: '"Zhao, Haozhe"'
Author:
Zhao, Haozhe, Ma, Xiaojian, Chen, Liang, Si, Shuzheng, Wu, Rujie, An, Kaikai, Yu, Peiyu, Zhang, Minjia, Li, Qing, Chang, Baobao
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2…
External link:
http://arxiv.org/abs/2407.05282
Author:
Huang, Jinsheng, Chen, Liang, Guo, Taian, Zeng, Fu, Zhao, Yusheng, Wu, Bohan, Yuan, Ye, Zhao, Haozhe, Guo, Zhihui, Zhang, Yichi, Yuan, Jingyang, Ju, Wei, Liu, Luchen, Liu, Tianyu, Chang, Baobao, Zhang, Ming
Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for suc…
External link:
http://arxiv.org/abs/2407.00468
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow the…
External link:
http://arxiv.org/abs/2404.08491
In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual tokens is of e…
External link:
http://arxiv.org/abs/2403.06764
Author:
Chen, Liang, Zhang, Yichi, Ren, Shuhuai, Zhao, Haozhe, Cai, Zefan, Wang, Yuchi, Wang, Peiyi, Meng, Xiangdi, Liu, Tianyu, Chang, Baobao
We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-…
External link:
http://arxiv.org/abs/2402.15527
Author:
Tang, Xiangru, Liu, Yuliang, Cai, Zefan, Shao, Yanjun, Lu, Junjie, Zhang, Yichi, Deng, Zexuan, Hu, Helan, An, Kaikai, Huang, Ruijun, Si, Shuzheng, Chen, Sheng, Zhao, Haozhe, Chen, Liang, Wang, Yan, Liu, Tianyu, Jiang, Zhiwei, Chang, Baobao, Fang, Yin, Qin, Yujia, Zhou, Wangchunshu, Zhao, Yilun, Cohan, Arman, Gerstein, Mark
Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper…
External link:
http://arxiv.org/abs/2311.09835
Distantly-Supervised Named Entity Recognition (DS-NER) is widely used in real-world scenarios. It can effectively alleviate the burden of annotation by matching entities in existing knowledge bases with snippets in the text, but suffers from the label…
External link:
http://arxiv.org/abs/2311.08010
Frame identification aims to find semantic frames associated with target words in a sentence. Recent research measures the similarity or matching score between targets and candidate frames by modeling frame definitions. However, they either lack suf…
External link:
http://arxiv.org/abs/2310.13316
Author:
Chen, Liang, Zhang, Yichi, Ren, Shuhuai, Zhao, Haozhe, Cai, Zefan, Wang, Yuchi, Wang, Peiyi, Liu, Tianyu, Chang, Baobao
In this study, we explore the potential of Multimodal Large Language Models (MLLMs) in improving embodied decision-making processes for agents. While Large Language Models (LLMs) have been widely used due to their advanced reasoning skills and vast w…
External link:
http://arxiv.org/abs/2310.02071
Author:
Zhao, Haozhe, Cai, Zefan, Si, Shuzheng, Ma, Xiaojian, An, Kaikai, Chen, Liang, Liu, Zixuan, Wang, Sheng, Han, Wenjuan, Chang, Baobao
Since the resurgence of deep learning, vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity. However, while LLMs can utilize extensive background knowledge and task information with in-context…
External link:
http://arxiv.org/abs/2309.07915