Showing 1 - 10 of 1,030
for search: '"Zhang Zhaoxiang"'
Author:
Wu, Siwei, Peng, Zhongyuan, Du, Xinrun, Zheng, Tuney, Liu, Minghao, Wu, Jialong, Ma, Jiachen, Li, Yizhi, Yang, Jian, Zhou, Wangchunshu, Lin, Qunshu, Zhao, Junbo, Zhang, Zhaoxiang, Huang, Wenhao, Zhang, Ge, Lin, Chenghua, Liu, J. H.
Enabling Large Language Models (LLMs) to handle a wider range of complex tasks (e.g., coding, math) has drawn great attention from many researchers. As LLMs continue to evolve, merely increasing the number of model parameters yields diminishing performance…
External link:
http://arxiv.org/abs/2410.13639
Author:
Wang, Pei, Wu, Yanan, Wang, Zekun, Liu, Jiaheng, Song, Xiaoshuai, Peng, Zhongyuan, Deng, Ken, Zhang, Chenchen, Wang, Jiakai, Peng, Junran, Zhang, Ge, Guo, Hangyu, Zhang, Zhaoxiang, Su, Wenbo, Zheng, Bo
Large Language Models (LLMs) have displayed massive improvements in reasoning and decision-making skills and can hold natural conversations with users. Recently, many tool-use benchmark datasets have been proposed. However, existing datasets have the…
External link:
http://arxiv.org/abs/2410.11710
Author:
Wang, Yuqi, Cheng, Ke, He, Jiawei, Wang, Qitai, Dai, Hengchen, Chen, Yuntao, Xia, Fei, Zhang, Zhaoxiang
Driving world models have gained increasing attention due to their ability to model complex physical dynamics. However, their superb modeling capability is yet to be fully unleashed due to the limited video diversity in current driving datasets. We…
External link:
http://arxiv.org/abs/2410.10738
Author:
Wang, Haochen, Zheng, Anlin, Zhao, Yucheng, Wang, Tiancai, Ge, Zheng, Zhang, Xiangyu, Zhang, Zhaoxiang
This paper introduces reconstructive visual instruction tuning (ROSS), a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals. In contrast to conventional visual instruction tuning approaches that exclusively supervise…
External link:
http://arxiv.org/abs/2410.09575
Author:
Wang, Zekun, Zhu, King, Xu, Chunpu, Zhou, Wangchunshu, Liu, Jiaheng, Zhang, Yibo, Wang, Jiashuo, Shi, Ning, Li, Siyu, Li, Yizhi, Que, Haoran, Zhang, Zhaoxiang, Zhang, Yuanxing, Zhang, Ge, Xu, Ke, Fu, Jie, Huang, Wenhao
In this paper, we introduce MIO, a novel foundation model built on multimodal tokens, capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner. While the emergence of large language models (LLMs)…
External link:
http://arxiv.org/abs/2409.17692
Author:
Que, Haoran, Duan, Feiyu, He, Liqun, Mou, Yutao, Zhou, Wangchunshu, Liu, Jiaheng, Rong, Wenge, Wang, Zekun Moore, Yang, Jian, Zhang, Ge, Peng, Junran, Zhang, Zhaoxiang, Zhang, Songyang, Chen, Kai
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks (e.g., long-context understanding), and many benchmarks have been proposed. However, we observe that long text generation capabilities are not well…
External link:
http://arxiv.org/abs/2409.16191
Author:
Li, Yizhi, Zhang, Ge, Ma, Yinghao, Yuan, Ruibin, Zhu, Kang, Guo, Hangyu, Liang, Yiming, Liu, Jiaheng, Wang, Zekun, Yang, Jian, Wu, Siwei, Qu, Xingwei, Shi, Jinjie, Zhang, Xinyue, Yang, Zhenzhu, Wang, Xiangzhou, Zhang, Zhaoxiang, Liu, Zachary, Benetos, Emmanouil, Huang, Wenhao, Lin, Chenghua
Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities. However, the capacity of these models to concurrently process and reason about multiple modalities remains inadequately…
External link:
http://arxiv.org/abs/2409.15272
Author:
Lei, Chenyang, Chen, Liyi, Cen, Jun, Chen, Xiao, Lei, Zhen, Heide, Felix, Liu, Ziwei, Chen, Qifeng, Zhang, Zhaoxiang
Foundation models like ChatGPT and Sora that are trained on a huge scale of data have made a revolutionary social impact. However, it is extremely challenging for sensors in many different fields to collect similar scales of natural images to train…
External link:
http://arxiv.org/abs/2409.08083
Sound source localization aims to localize objects emitting the sound in visual scenes. Recent works obtaining impressive results typically rely on contrastive learning. However, the common practice of randomly sampling negatives in prior arts can lead…
External link:
http://arxiv.org/abs/2408.16448
Author:
Zhang, Shougao, Zhou, Mengqi, Wang, Yuxi, Luo, Chuanchen, Wang, Rongyu, Li, Yiwei, Yin, Xucheng, Zhang, Zhaoxiang, Peng, Junran
Generating a realistic, large-scale 3D virtual city remains a complex challenge due to the involvement of numerous 3D assets, various city styles, and strict layout constraints. Existing approaches provide promising attempts at procedural content generation…
External link:
http://arxiv.org/abs/2407.17572