Showing 1 - 10 of 423 for search: '"Chen, Liangyu"'
Author:
Ma, Yubo, Zang, Yuhang, Chen, Liangyu, Chen, Meiqi, Jiao, Yizhu, Li, Xinze, Lu, Xinyuan, Liu, Ziyu, Ma, Yan, Dong, Xiaoyi, Zhang, Pan, Pan, Liangming, Jiang, Yu-Gang, Wang, Jiaqi, Cao, Yixin, Sun, Aixin
Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding
External link:
http://arxiv.org/abs/2407.01523
Autonomous embodied agents live on an Internet of multimedia websites. Can they hop around multimodal websites to complete complex user tasks? Existing benchmarks fail to assess them in a realistic, evolving environment for their embodiment across we
External link:
http://arxiv.org/abs/2404.09992
The growth of simple operators is essential for the emergence of chaotic dynamics and quantum thermalization. Recent studies have proposed different measures, including the out-of-time-order correlator and Krylov complexity. It is established that th
External link:
http://arxiv.org/abs/2404.08207
Author:
Nakamura, Taishi, Mishra, Mayank, Tedeschi, Simone, Chai, Yekun, Stillerman, Jason T, Friedrich, Felix, Yadav, Prateek, Laud, Tanmay, Chien, Vu Minh, Zhuo, Terry Yue, Misra, Diganta, Bogin, Ben, Vu, Xuan-Son, Karpinska, Marzena, Dantuluri, Arnav Varma, Kusa, Wojciech, Furlanello, Tommaso, Yokota, Rio, Muennighoff, Niklas, Pai, Suhas, Adewumi, Tosin, Laippala, Veronika, Yao, Xiaozhe, Junior, Adalberto, Ariyak, Alpay, Drozd, Aleksandr, Clive, Jordan, Gupta, Kshitij, Chen, Liangyu, Sun, Qi, Tsui, Ken, Persaud, Noah, Fahmy, Nour, Chen, Tianlong, Bansal, Mohit, Monti, Nicolo, Dang, Tai, Luo, Ziyang, Bui, Tien-Tung, Navigli, Roberto, Mehta, Virendra, Blumberg, Matthew, May, Victor, Nguyen, Huu, Pyysalo, Sampo
Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community devel
External link:
http://arxiv.org/abs/2404.00399
Author:
Kosen, Sandoko, Li, Hang-Xi, Rommel, Marcus, Rehammar, Robert, Caputo, Marco, Grönberg, Leif, Fernández-Pendás, Jorge, Kockum, Anton Frisk, Biznárová, Janka, Chen, Liangyu, Križan, Christian, Nylander, Andreas, Osman, Amr, Roudsari, Anita Fadavi, Shiri, Daryoush, Tancredi, Giovanna, Govenius, Joonas, Bylander, Jonas
Quantum processors require a signal-delivery architecture with high addressability (low crosstalk) to ensure high performance already at the scale of dozens of qubits. Signal crosstalk causes inadvertent driving of quantum gates, which will adversely
External link:
http://arxiv.org/abs/2403.00285
This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework
External link:
http://arxiv.org/abs/2401.02777
Author:
Li, Linze, Fan, Sunqi, Pu, Hengjun, Bing, Zhaodong, Tang, Yao, Ye, Tianzhu, Yang, Tong, Chen, Liangyu, Liang, Jiajun
Over recent years, diffusion models have facilitated significant advancements in video generation. Yet, the creation of face-related videos still confronts issues such as low facial fidelity, lack of frame consistency, limited editability and uncontr
External link:
http://arxiv.org/abs/2312.03775
Inspired by the dual-process theory of human cognition, we introduce DUMA, a novel conversational agent framework that embodies a dual-mind mechanism through the utilization of two generative Large Language Models (LLMs) dedicated to fast and slow th
External link:
http://arxiv.org/abs/2310.18075
Author:
Chen, Liangyu, Li, Bo, Shen, Sheng, Yang, Jingkang, Li, Chunyuan, Keutzer, Kurt, Darrell, Trevor, Liu, Ziwei
Visual reasoning requires multimodal perception and commonsense cognition of the world. Recently, multiple vision-language models (VLMs) have been proposed with excellent commonsense reasoning ability in various domains. However, how to harness the c
External link:
http://arxiv.org/abs/2310.15166
With the impressive progress in diffusion-based text-to-image generation, extending such powerful generative ability to text-to-video raises enormous attention. Existing methods either require large-scale text-video pairs and a large number of traini
External link:
http://arxiv.org/abs/2310.10769