Showing 1 - 10 of 2,295 for search: '"Sun, Xing"'
Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning…
External link:
http://arxiv.org/abs/2406.10957
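The DPO objective this abstract refers to can be sketched as a per-pair loss over log-probabilities of a chosen and a rejected response under the policy and a frozen reference model (a minimal illustrative sketch, not the paper's implementation; the variable names and the beta value are assumptions):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the reference model (ref_*).
    """
    # Margin between the policy's and reference's preference for chosen over rejected
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference, the margin is zero and the loss is log 2; as the policy increasingly prefers the chosen response relative to the reference, the loss decreases.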
Author:
Zhou, Chenyu, Zhang, Mengdan, Chen, Peixian, Fu, Chaoyou, Shen, Yunhang, Zheng, Xiawu, Sun, Xing, Ji, Rongrong
The swift progress of Multi-modal Large Models (MLLMs) has showcased their impressive ability to tackle tasks blending vision and language. Yet, most current models and benchmarks cater to scenarios with a narrow scope of visual and textual contexts.
External link:
http://arxiv.org/abs/2406.10228
With the significant advancements in cognitive intelligence driven by LLMs, autonomous agent systems have attracted extensive attention. Despite this growing interest, the development of stable and efficient agent systems poses substantial practical…
External link:
http://arxiv.org/abs/2406.06379
Author:
Fu, Chaoyou, Dai, Yuhan, Luo, Yongdong, Li, Lei, Ren, Shuhuai, Zhang, Renrui, Wang, Zihan, Zhou, Chenyu, Shen, Yunhang, Zhang, Mengdan, Chen, Peixian, Li, Yanwei, Lin, Shaohui, Zhao, Sirui, Li, Ke, Xu, Tong, Zheng, Xiawu, Chen, Enhong, Ji, Rongrong, Sun, Xing
In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding…
External link:
http://arxiv.org/abs/2405.21075
Author:
Gao, Timin, Chen, Peixian, Zhang, Mengdan, Fu, Chaoyou, Shen, Yunhang, Zhang, Yan, Zhang, Shengchuan, Zheng, Xiawu, Sun, Xing, Cao, Liujuan, Ji, Rongrong
With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problems are usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm…
External link:
http://arxiv.org/abs/2404.16033
Author:
Liu, Chaohu, Yin, Kun, Cao, Haoyu, Jiang, Xinghua, Li, Xin, Liu, Yinsong, Jiang, Deqiang, Sun, Xing, Xu, Linli
Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document understanding…
External link:
http://arxiv.org/abs/2404.06918
Author:
Huang, Wenxuan, Shen, Yunhang, Xie, Jiao, Zhang, Baochang, He, Gaoqi, Li, Ke, Sun, Xing, Lin, Shaohui
The remarkable performance of Vision Transformers (ViTs) typically comes at an extremely large training cost. Existing methods have attempted to accelerate ViT training, yet they typically disregard universality across methods and suffer accuracy drops…
External link:
http://arxiv.org/abs/2404.00672
Author:
Yang, Yuncheng, Zhang, Chuyan, Yang, Zuopeng, Gao, Yuting, Qin, Yulei, Li, Ke, Sun, Xing, Yang, Jie, Gu, Yun
Prompt learning is effective for fine-tuning foundation models to improve their generalization across a variety of downstream tasks. However, prompts optimized independently along a single modality path may sacrifice the vision-language…
External link:
http://arxiv.org/abs/2403.06136
Author:
Li, Xin, Wu, Yunfei, Jiang, Xinghua, Guo, Zhihao, Gong, Mingming, Cao, Haoyu, Liu, Yinsong, Jiang, Deqiang, Sun, Xing
Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding (VDU). Different from conventional vision-language tasks, VDU is…
External link:
http://arxiv.org/abs/2402.19014
Author:
Cui, Xiao, Qin, Yulei, Gao, Yuting, Zhang, Enwei, Xu, Zihan, Wu, Tong, Li, Ke, Sun, Xing, Zhou, Wengang, Li, Houqiang
Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures including the Kullback-Leibler (KL), reverse Kullback-Leibler (RKL), and Jensen-Shannon (JS) divergences…
External link:
http://arxiv.org/abs/2402.17110
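The three divergence measures named in this abstract can be written out for discrete probability distributions (a minimal sketch over probability vectors; this is a textbook formulation for illustration, not the paper's implementation):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rkl(p, q):
    """Reverse KL: the KL divergence with the arguments swapped, KL(q || p)."""
    return kl(q, p)

def js(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike KL and RKL, JS is symmetric in its arguments and remains finite even when the supports of `p` and `q` differ, which is one reason these measures behave differently as distillation objectives.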