Showing 1 - 10 of 758 for search: '"Liu, Zhili"'
Author:
Xiang, Kun, Liu, Zhili, Jiang, Zihao, Nie, Yunshuang, Huang, Runhui, Fan, Haoxiang, Li, Hanhui, Huang, Weiran, Zeng, Yihan, Han, Jianhua, Hong, Lanqing, Xu, Hang, Liang, Xiaodan
In this paper, we address the challenging task of multimodal mathematical reasoning by incorporating the ability of "slow thinking" into multimodal large language models (MLLMs). Contrary to existing methods that rely on direct or fast thinking, our …
External link:
http://arxiv.org/abs/2411.11930
Author:
Chen, Kai, Gou, Yunhao, Huang, Runhui, Liu, Zhili, Tan, Daxin, Xu, Jing, Wang, Chunwei, Zhu, Yi, Zeng, Yihan, Yang, Kuo, Wang, Dingdong, Xiang, Kun, Li, Haoyuan, Bai, Haoli, Han, Jianhua, Li, Xiaohui, Jin, Weike, Xie, Nian, Zhang, Yu, Kwok, James T., Zhao, Hengshuang, Liang, Xiaodan, Yeung, Dit-Yan, Chen, Xiao, Li, Zhenguo, Zhang, Wei, Liu, Qun, Yao, Jun, Hong, Lanqing, Hou, Lu, Xu, Hang
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-end …
External link:
http://arxiv.org/abs/2409.18042
Author:
Liu, Zhili, Gou, Yunhao, Chen, Kai, Hong, Lanqing, Gao, Jiahui, Mi, Fei, Zhang, Yu, Li, Zhenguo, Jiang, Xin, Liu, Qun, Kwok, James T.
As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge. Traditional alignment strategies rely heavily on human intervention, such as Supervised Fine-Tuning …
External link:
http://arxiv.org/abs/2405.00557
Author:
Gou, Yunhao, Chen, Kai, Liu, Zhili, Hong, Lanqing, Xu, Hang, Li, Zhenguo, Yeung, Dit-Yan, Kwok, James T., Zhang, Yu
Multimodal large language models (MLLMs) have shown impressive reasoning abilities. However, they are also more vulnerable to jailbreak attacks than their LLM predecessors. Although still capable of detecting the unsafe responses, we observe that safety …
External link:
http://arxiv.org/abs/2403.09572
Published in:
In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Vision-language pre-trained models have achieved impressive performance on various downstream tasks. However, their large model sizes hinder their utilization on platforms with limited computational resources. We find that directly using smaller pre-trained models …
External link:
http://arxiv.org/abs/2403.07839
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the semantically …
External link:
http://arxiv.org/abs/2402.05382
Author:
Tan, Haochen, Guo, Zhijiang, Shi, Zhan, Xu, Lu, Liu, Zhili, Feng, Yunlong, Li, Xiaoguang, Wang, Yasheng, Shang, Lifeng, Liu, Qun, Song, Linqi
Large Language Models (LLMs) have succeeded remarkably in understanding long-form contents. However, exploring their capability for generating long-form contents, such as reports and articles, has been relatively unexplored and inadequately assessed …
External link:
http://arxiv.org/abs/2401.15042
Author:
Gou, Yunhao, Liu, Zhili, Chen, Kai, Hong, Lanqing, Xu, Hang, Li, Aoxue, Yeung, Dit-Yan, Kwok, James T., Zhang, Yu
Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models with zero-shot generalization across a wide range of downstream vision-language tasks. However, the diversity of training tasks of different …
External link:
http://arxiv.org/abs/2312.12379
Author:
Li, Pengxiang, Chen, Kai, Liu, Zhili, Gao, Ruiyuan, Hong, Lanqing, Zhou, Guo, Yao, Hua, Yeung, Dit-Yan, Lu, Huchuan, Jia, Xu
Despite remarkable achievements in video synthesis, achieving granular control over complex dynamics, such as nuanced movement among multiple interacting objects, still presents a significant hurdle for dynamic world modeling, compounded by the necessity …
External link:
http://arxiv.org/abs/2312.00651
Author:
Liu, Zhili, Chen, Kai, Zhang, Yifan, Han, Jianhua, Hong, Lanqing, Xu, Hang, Li, Zhenguo, Yeung, Dit-Yan, Kwok, James
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images. These concepts, termed as the "implicit concepts", could be unintentionally learned during training and then be generated uncontrollably …
External link:
http://arxiv.org/abs/2310.05873