Showing 1 - 10 of 2,295 for search: '"Sun, Xing"'
Direct Preference Optimization (DPO) has emerged as a prominent algorithm for the direct and robust alignment of Large Language Models (LLMs) with human preferences, offering a more straightforward alternative to the complex Reinforcement Learning…
External link:
http://arxiv.org/abs/2406.10957
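The DPO objective this abstract refers to can be sketched as a per-pair loss over log-probabilities of a chosen and a rejected response under the policy and a frozen reference model (a minimal illustrative sketch, not the paper's implementation; the variable names and the beta value are assumptions):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy (pi_*) and the reference model (ref_*).
    """
    # Margin between the policy's and reference's preference for chosen over rejected
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference, the margin is zero and the loss is log 2; as the policy increasingly prefers the chosen response relative to the reference, the loss decreases.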
Author:
Zhou, Chenyu, Zhang, Mengdan, Chen, Peixian, Fu, Chaoyou, Shen, Yunhang, Zheng, Xiawu, Sun, Xing, Ji, Rongrong
The swift progress of Multi-modal Large Models (MLLMs) has showcased their impressive ability to tackle tasks blending vision and language. Yet, most current models and benchmarks cater to scenarios with a narrow scope of visual and textual contexts.
External link:
http://arxiv.org/abs/2406.10228
With the significant advancements in cognitive intelligence driven by LLMs, autonomous agent systems have attracted extensive attention. Despite this growing interest, the development of stable and efficient agent systems poses substantial practical…
External link:
http://arxiv.org/abs/2406.06379
Author:
Fu, Chaoyou, Dai, Yuhan, Luo, Yongdong, Li, Lei, Ren, Shuhuai, Zhang, Renrui, Wang, Zihan, Zhou, Chenyu, Shen, Yunhang, Zhang, Mengdan, Chen, Peixian, Li, Yanwei, Lin, Shaohui, Zhao, Sirui, Li, Ke, Xu, Tong, Zheng, Xiawu, Chen, Enhong, Ji, Rongrong, Sun, Xing
In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding…
External link:
http://arxiv.org/abs/2405.21075
Author:
Gao, Timin, Chen, Peixian, Zhang, Mengdan, Fu, Chaoyou, Shen, Yunhang, Zhang, Yan, Zhang, Shengchuan, Zheng, Xiawu, Sun, Xing, Cao, Liujuan, Ji, Rongrong
With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problems are usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm…
External link:
http://arxiv.org/abs/2404.16033
Author:
Liu, Chaohu, Yin, Kun, Cao, Haoyu, Jiang, Xinghua, Li, Xin, Liu, Yinsong, Jiang, Deqiang, Sun, Xing, Xu, Linli
Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document understanding…
External link:
http://arxiv.org/abs/2404.06918
Author:
Huang, Wenxuan, Shen, Yunhang, Xie, Jiao, Zhang, Baochang, He, Gaoqi, Li, Ke, Sun, Xing, Lin, Shaohui
The remarkable performance of Vision Transformers (ViTs) typically comes at an extremely large training cost. Existing methods have attempted to accelerate ViT training, yet they typically disregard universality across methods and suffer accuracy drops…
External link:
http://arxiv.org/abs/2404.00672
Author:
Yang, Yuncheng, Zhang, Chuyan, Yang, Zuopeng, Gao, Yuting, Qin, Yulei, Li, Ke, Sun, Xing, Yang, Jie, Gu, Yun
Prompt learning is effective for fine-tuning foundation models to improve their generalization across a variety of downstream tasks. However, prompts optimized independently along a single modality path may sacrifice the vision-language…
External link:
http://arxiv.org/abs/2403.06136
Author:
Li, Xin, Wu, Yunfei, Jiang, Xinghua, Guo, Zhihao, Gong, Mingming, Cao, Haoyu, Liu, Yinsong, Jiang, Deqiang, Sun, Xing
Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding (VDU). Different from conventional vision-language tasks, VDU is…
External link:
http://arxiv.org/abs/2402.19014
Author:
Cui, Xiao, Qin, Yulei, Gao, Yuting, Zhang, Enwei, Xu, Zihan, Wu, Tong, Li, Ke, Sun, Xing, Zhou, Wengang, Li, Houqiang
Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures including the Kullback-Leibler (KL), reverse Kullback-Leibler (RKL), and Jensen-Shannon (JS) divergences…
External link:
http://arxiv.org/abs/2402.17110
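The three divergence measures named in this abstract can be written out for discrete probability distributions (a minimal sketch over probability vectors; this is a textbook formulation for illustration, not the paper's implementation):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rkl(p, q):
    """Reverse KL: the KL divergence with the arguments swapped, KL(q || p)."""
    return kl(q, p)

def js(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike KL and RKL, JS is symmetric in its arguments and remains finite even when the supports of `p` and `q` differ, which is one reason these measures behave differently as distillation objectives.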