Showing 1 - 10 of 25 for search: '"Zhao, Haozhe"'
Author:
Zhao, Haozhe, Ma, Xiaojian, Chen, Liang, Si, Shuzheng, Wu, Rujie, An, Kaikai, Yu, Peiyu, Zhang, Minjia, Li, Qing, Chang, Baobao
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2…
External link:
http://arxiv.org/abs/2407.05282
Author:
Huang, Jinsheng, Chen, Liang, Guo, Taian, Zeng, Fu, Zhao, Yusheng, Wu, Bohan, Yuan, Ye, Zhao, Haozhe, Guo, Zhihui, Zhang, Yichi, Yuan, Jingyang, Ju, Wei, Liu, Luchen, Liu, Tianyu, Chang, Baobao, Zhang, Ming
Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for suc…
External link:
http://arxiv.org/abs/2407.00468
Large-scale multilingual Pretrained Language Models (mPLMs) yield impressive performance on cross-language tasks, yet significant performance disparities exist across different languages within the same mPLM. Previous studies endeavored to narrow the…
External link:
http://arxiv.org/abs/2404.08491
In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual tokens is of e…
External link:
http://arxiv.org/abs/2403.06764
Author:
Chen, Liang, Zhang, Yichi, Ren, Shuhuai, Zhao, Haozhe, Cai, Zefan, Wang, Yuchi, Wang, Peiyi, Meng, Xiangdi, Liu, Tianyu, Chang, Baobao
We present PCA-Bench, a multimodal decision-making benchmark for evaluating the integrated capabilities of Multimodal Large Language Models (MLLMs). Departing from previous benchmarks focusing on simplistic tasks and individual model capability, PCA-…
External link:
http://arxiv.org/abs/2402.15527
Author:
Tang, Xiangru, Liu, Yuliang, Cai, Zefan, Shao, Yanjun, Lu, Junjie, Zhang, Yichi, Deng, Zexuan, Hu, Helan, An, Kaikai, Huang, Ruijun, Si, Shuzheng, Chen, Sheng, Zhao, Haozhe, Chen, Liang, Wang, Yan, Liu, Tianyu, Jiang, Zhiwei, Chang, Baobao, Fang, Yin, Qin, Yujia, Zhou, Wangchunshu, Zhao, Yilun, Cohan, Arman, Gerstein, Mark
Despite Large Language Models (LLMs) like GPT-4 achieving impressive results in function-level code generation, they struggle with repository-scale code understanding (e.g., coming up with the right arguments for calling routines), requiring a deeper…
External link:
http://arxiv.org/abs/2311.09835
Distantly-Supervised Named Entity Recognition (DS-NER) is widely used in real-world scenarios. It can effectively alleviate the burden of annotation by matching entities in existing knowledge bases with snippets in the text, but suffers from the label…
External link:
http://arxiv.org/abs/2311.08010
Frame identification aims to find semantic frames associated with target words in a sentence. Recent research measures the similarity or matching score between targets and candidate frames by modeling frame definitions. However, they either lack suf…
External link:
http://arxiv.org/abs/2310.13316
Author:
Chen, Liang, Zhang, Yichi, Ren, Shuhuai, Zhao, Haozhe, Cai, Zefan, Wang, Yuchi, Wang, Peiyi, Liu, Tianyu, Chang, Baobao
In this study, we explore the potential of Multimodal Large Language Models (MLLMs) in improving embodied decision-making processes for agents. While Large Language Models (LLMs) have been widely used due to their advanced reasoning skills and vast w…
External link:
http://arxiv.org/abs/2310.02071
Author:
Zhao, Haozhe, Cai, Zefan, Si, Shuzheng, Ma, Xiaojian, An, Kaikai, Chen, Liang, Liu, Zixuan, Wang, Sheng, Han, Wenjuan, Chang, Baobao
Since the resurgence of deep learning, vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity. However, while LLMs can utilize extensive background knowledge and task information with in-context…
External link:
http://arxiv.org/abs/2309.07915