Showing 1 - 10 of 7,407 results for search: '"zheng, Bo"'
Author:
Li, Shilong, He, Yancheng, Guo, Hangyu, Bu, Xingyuan, Bai, Ge, Liu, Jie, Liu, Jiaheng, Qu, Xingwei, Li, Yangguang, Ouyang, Wanli, Su, Wenbo, Zheng, Bo
Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we …
External link:
http://arxiv.org/abs/2406.14550
Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in …
External link:
http://arxiv.org/abs/2406.11503
Author:
Que, Haoran, Liu, Jiaheng, Zhang, Ge, Zhang, Chenchen, Qu, Xingwei, Ma, Yinghao, Duan, Feiyu, Bai, Zhiqi, Wang, Jiakai, Zhang, Yuanxing, Tan, Xu, Fu, Jie, Su, Wenbo, Wang, Jiamang, Qu, Lin, Zheng, Bo
Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For the CPT on domain-specific LLMs, one important question is how …
External link:
http://arxiv.org/abs/2406.01375
Author:
Deng, Ken, Liu, Jiaheng, Zhu, He, Liu, Congnan, Li, Jingxin, Wang, Jiakai, Zhao, Peng, Zhang, Chenchen, Wu, Yanan, Yin, Xueqiao, Zhang, Yuanxing, Su, Wenbo, Xiang, Bangyu, Ge, Tiezheng, Zheng, Bo
Code completion models have made significant progress in recent years. Recently, repository-level code completion has drawn more attention in modern software development, and several baseline methods and benchmarks have been proposed. However, existing …
External link:
http://arxiv.org/abs/2406.01359
Author:
Han, Dongchen, Wang, Ziyi, Xia, Zhuofan, Han, Yizeng, Pu, Yifan, Ge, Chunjiang, Song, Jun, Song, Shiji, Zheng, Bo, Huang, Gao
Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares …
External link:
http://arxiv.org/abs/2405.16605
Author:
Guo, Jiayan, Huo, Yusen, Zhang, Zhilin, Wang, Tianyu, Yu, Chuan, Xu, Jian, Zhang, Yan, Zheng, Bo
Auto-bidding plays a crucial role in facilitating online advertising by automatically providing bids for advertisers. Reinforcement learning (RL) has gained popularity for auto-bidding. However, most current RL auto-bidding methods are modeled through …
External link:
http://arxiv.org/abs/2405.16141
Author:
Ge, Chunjiang, Cheng, Sijie, Wang, Ziming, Yuan, Jiale, Gao, Yuan, Song, Jun, Song, Shiji, Huang, Gao, Zheng, Bo
High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity. Current high-resolution LMMs address the quadratic complexity while still generating excessive visual tokens. However, …
External link:
http://arxiv.org/abs/2405.15738
Video storytelling is engaging multimedia content that utilizes video and its accompanying narration to attract the audience, where a key challenge is creating narrations for recorded visual scenes. Previous studies on dense video captioning and video …
External link:
http://arxiv.org/abs/2405.14040
Author:
Liu, Zhendong, Nie, Yuanbi, Tan, Yingshui, Yue, Xiangyu, Cui, Qiushi, Wang, Chongjun, Zhu, Xiaoyong, Zheng, Bo
Benefiting from the powerful capabilities of Large Language Models (LLMs), pre-trained visual encoder models connected to an LLM can realize Vision Language Models (VLMs). However, existing research shows that the visual modality of VLMs is vulnerable …
External link:
http://arxiv.org/abs/2405.13581
Recently, integrating visual controls into text-to-image (T2I) models, such as the ControlNet method, has received significant attention for finer control capabilities. While various training-free methods make efforts to enhance prompt following in T2I models …
External link:
http://arxiv.org/abs/2404.14768