Showing 1 - 10 of 11,203
for search: '"TANG Jie"'
Intelligent reflecting surface (IRS) has become a cost-effective solution for constructing a smart and adaptive radio environment. Most previous works on IRS have jointly designed the active and passive precoding based on perfectly or partially known…
External link:
http://arxiv.org/abs/2409.14088
Author:
Jiang, Zhihuan, Yang, Zhen, Chen, Jinhao, Du, Zhengxiao, Wang, Weihan, Xu, Bin, Dong, Yuxiao, Tang, Jie
Multi-modal large language models (MLLMs) have demonstrated promising capabilities across various tasks by integrating textual and visual information to achieve visual understanding in complex scenarios. Despite the availability of several benchmarks…
External link:
http://arxiv.org/abs/2409.13730
Author:
Yang, Zhen, Chen, Jinhao, Du, Zhengxiao, Yu, Wenmeng, Wang, Weihan, Hong, Wenyi, Jiang, Zhihuan, Xu, Bin, Dong, Yuxiao, Tang, Jie
Large language models (LLMs) have demonstrated significant capabilities in mathematical reasoning, particularly with text-based mathematical problems. However, current multi-modal large language models (MLLMs), especially those specialized in mathema…
External link:
http://arxiv.org/abs/2409.13729
Author:
Hong, Wenyi, Wang, Weihan, Ding, Ming, Yu, Wenmeng, Lv, Qingsong, Wang, Yan, Cheng, Yean, Huang, Shiyu, Ji, Junhui, Xue, Zhao, Zhao, Lei, Yang, Zhuoyi, Gu, Xiaotao, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Wang, Zihan, Qi, Ji, Song, Xixuan, Zhang, Peng, Liu, Debing, Xu, Bin, Li, Juanzi, Dong, Yuxiao, Tang, Jie
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new genera…
External link:
http://arxiv.org/abs/2408.16500
Large Language Models (LLMs) are becoming increasingly powerful and capable of handling complex tasks, e.g., building single agents and multi-agent systems. Compared to single agents, multi-agent systems have higher requirements for the collaboration…
External link:
http://arxiv.org/abs/2408.15971
Author:
Gui, Jiayi, Liu, Yiming, Cheng, Jiale, Gu, Xiaotao, Liu, Xiao, Wang, Hongning, Dong, Yuxiao, Tang, Jie, Huang, Minlie
Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to logical reasoning an…
External link:
http://arxiv.org/abs/2408.15778
Author:
Bai, Yushi, Zhang, Jiajie, Lv, Xin, Zheng, Linzhi, Zhu, Siqi, Hou, Lei, Dong, Yuxiao, Tang, Jie, Li, Juanzi
Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation l…
External link:
http://arxiv.org/abs/2408.07055
Author:
Liu, Xiao, Zhang, Tianjie, Gu, Yu, Iong, Iat Long, Xu, Yifan, Song, Xixuan, Zhang, Shudan, Lai, Hanyu, Liu, Xinyi, Zhao, Hanlin, Sun, Jiadai, Yang, Xinyue, Yang, Yu, Qi, Zehan, Yao, Shuntian, Sun, Xueqiao, Cheng, Siyi, Zheng, Qinkai, Yu, Hao, Zhang, Hanchen, Hong, Wenyi, Ding, Ming, Pan, Lihang, Gu, Xiaotao, Zeng, Aohan, Du, Zhengxiao, Song, Chan Hee, Su, Yu, Dong, Yuxiao, Tang, Jie
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, pote…
External link:
http://arxiv.org/abs/2408.06327
Author:
Yang, Zhuoyi, Teng, Jiayan, Zheng, Wendi, Ding, Ming, Huang, Shiyu, Xu, Jiazheng, Yang, Yuanming, Hong, Wenyi, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Gu, Xiaotao, Zhang, Yuxuan, Wang, Weihan, Cheng, Yean, Liu, Ting, Xu, Bin, Dong, Yuxiao, Tang, Jie
We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 × 1360 pixels. Previous vide…
External link:
http://arxiv.org/abs/2408.06072
While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, they demonstrate a severe performance drop in the multi-speaker separation scenarios. Typically, AVSS m…
External link:
http://arxiv.org/abs/2407.19224