Search results - "Chen, Liangyu"

Report

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Authors: Ma, Yubo, Zang, Yuhang, Chen, Liangyu, Chen, Meiqi, Jiao, Yizhu, Li, Xinze, Lu, Xinyuan, Liu, Ziyu, Ma, Yan, Dong, Xiaoyi, Zhang, Pan, Pan, Liangming, Jiang, Yu-Gang, Wang, Jiaqi, Cao, Yixin, Sun, Aixin

Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding

External link: http://arxiv.org/abs/2407.01523

Report

MMInA: Benchmarking Multihop Multimodal Internet Agents

Authors: Zhang, Ziniu, Tian, Shulin, Chen, Liangyu, Liu, Ziwei

Autonomous embodied agents live on an Internet of multimedia websites. Can they hop around multimodal websites to complete complex user tasks? Existing benchmarks fail to assess them in a realistic, evolving environment for their embodiment across we

External link: http://arxiv.org/abs/2404.09992

Report

Dissecting Quantum Many-body Chaos in the Krylov Space

Authors: Chen, Liangyu, Mu, Baoyuan, Wang, Huajia, Zhang, Pengfei

The growth of simple operators is essential for the emergence of chaotic dynamics and quantum thermalization. Recent studies have proposed different measures, including the out-of-time-order correlator and Krylov complexity. It is established that th

External link: http://arxiv.org/abs/2404.08207

Report

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community devel

External link: http://arxiv.org/abs/2404.00399

Report

Signal crosstalk in a flip-chip quantum processor

Authors: Kosen, Sandoko, Li, Hang-Xi, Rommel, Marcus, Rehammar, Robert, Caputo, Marco, Grönberg, Leif, Fernández-Pendás, Jorge, Kockum, Anton Frisk, Biznárová, Janka, Chen, Liangyu, Križan, Christian, Nylander, Andreas, Osman, Amr, Roudsari, Anita Fadavi, Shiri, Daryoush, Tancredi, Giovanna, Govenius, Joonas, Bylander, Jonas

Quantum processors require a signal-delivery architecture with high addressability (low crosstalk) to ensure high performance already at the scale of dozens of qubits. Signal crosstalk causes inadvertent driving of quantum gates, which will adversely

External link: http://arxiv.org/abs/2403.00285

Report

From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

Authors: Liu, Na, Chen, Liangyu, Tian, Xiaoyu, Zou, Wei, Chen, Kaijiang, Cui, Ming

This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework

External link: http://arxiv.org/abs/2401.02777

Report

FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability

Authors: Li, Linze, Fan, Sunqi, Pu, Hengjun, Bing, Zhaodong, Tang, Yao, Ye, Tianzhu, Yang, Tong, Chen, Liangyu, Liang, Jiajun

Over recent years, diffusion models have facilitated significant advancements in video generation. Yet, the creation of face-related videos still confronts issues such as low facial fidelity, lack of frame consistency, limited editability and uncontr

External link: http://arxiv.org/abs/2312.03775

Report

DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

Authors: Tian, Xiaoyu, Chen, Liangyu, Liu, Na, Liu, Yaxuan, Zou, Wei, Chen, Kaijiang, Cui, Ming

Inspired by the dual-process theory of human cognition, we introduce DUMA, a novel conversational agent framework that embodies a dual-mind mechanism through the utilization of two generative Large Language Models (LLMs) dedicated to fast and slow th

External link: http://arxiv.org/abs/2310.18075

Report

Large Language Models are Visual Reasoning Coordinators

Authors: Chen, Liangyu, Li, Bo, Shen, Sheng, Yang, Jingkang, Li, Chunyuan, Keutzer, Kurt, Darrell, Trevor, Liu, Ziwei

Visual reasoning requires multimodal perception and commonsense cognition of the world. Recently, multiple vision-language models (VLMs) have been proposed with excellent commonsense reasoning ability in various domains. However, how to harness the c

External link: http://arxiv.org/abs/2310.15166

Report

LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation

Authors: Wu, Ruiqi, Chen, Liangyu, Yang, Tong, Guo, Chunle, Li, Chongyi, Zhang, Xiangyu

With the impressive progress in diffusion-based text-to-image generation, extending such powerful generative ability to text-to-video raises enormous attention. Existing methods either require large-scale text-video pairs and a large number of traini

External link: http://arxiv.org/abs/2310.10769
