Showing 1 - 10 of 1,030
for search: '"Zhang Zhaoxiang"'
Author:
Wu, Siwei, Peng, Zhongyuan, Du, Xinrun, Zheng, Tuney, Liu, Minghao, Wu, Jialong, Ma, Jiachen, Li, Yizhi, Yang, Jian, Zhou, Wangchunshu, Lin, Qunshu, Zhao, Junbo, Zhang, Zhaoxiang, Huang, Wenhao, Zhang, Ge, Lin, Chenghua, Liu, J. H.
Enabling Large Language Models (LLMs) to handle a wider range of complex tasks (e.g., coding, math) has drawn great attention from many researchers. As LLMs continue to evolve, merely increasing the number of model parameters yields diminishing performance…
External link:
http://arxiv.org/abs/2410.13639
Author:
Wang, Pei, Wu, Yanan, Wang, Zekun, Liu, Jiaheng, Song, Xiaoshuai, Peng, Zhongyuan, Deng, Ken, Zhang, Chenchen, Wang, Jiakai, Peng, Junran, Zhang, Ge, Guo, Hangyu, Zhang, Zhaoxiang, Su, Wenbo, Zheng, Bo
Large Language Models (LLMs) have displayed massive improvements in reasoning and decision-making skills and can hold natural conversations with users. Recently, many tool-use benchmark datasets have been proposed. However, existing datasets have the…
External link:
http://arxiv.org/abs/2410.11710
Author:
Wang, Yuqi, Cheng, Ke, He, Jiawei, Wang, Qitai, Dai, Hengchen, Chen, Yuntao, Xia, Fei, Zhang, Zhaoxiang
Driving world models have gained increasing attention due to their ability to model complex physical dynamics. However, their superb modeling capability is yet to be fully unleashed due to the limited video diversity in current driving datasets. We…
External link:
http://arxiv.org/abs/2410.10738
Author:
Wang, Haochen, Zheng, Anlin, Zhao, Yucheng, Wang, Tiancai, Ge, Zheng, Zhang, Xiangyu, Zhang, Zhaoxiang
This paper introduces reconstructive visual instruction tuning (ROSS), a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals. In contrast to conventional visual instruction tuning approaches that exclusively supervise…
External link:
http://arxiv.org/abs/2410.09575
Author:
Wang, Zekun, Zhu, King, Xu, Chunpu, Zhou, Wangchunshu, Liu, Jiaheng, Zhang, Yibo, Wang, Jiashuo, Shi, Ning, Li, Siyu, Li, Yizhi, Que, Haoran, Zhang, Zhaoxiang, Zhang, Yuanxing, Zhang, Ge, Xu, Ke, Fu, Jie, Huang, Wenhao
In this paper, we introduce MIO, a novel foundation model built on multimodal tokens, capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner. While the emergence of large language models (LLMs)…
External link:
http://arxiv.org/abs/2409.17692
Author:
Que, Haoran, Duan, Feiyu, He, Liqun, Mou, Yutao, Zhou, Wangchunshu, Liu, Jiaheng, Rong, Wenge, Wang, Zekun Moore, Yang, Jian, Zhang, Ge, Peng, Junran, Zhang, Zhaoxiang, Zhang, Songyang, Chen, Kai
In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks (e.g., long-context understanding), and many benchmarks have been proposed. However, we observe that long text generation capabilities are not well…
External link:
http://arxiv.org/abs/2409.16191
Author:
Li, Yizhi, Zhang, Ge, Ma, Yinghao, Yuan, Ruibin, Zhu, Kang, Guo, Hangyu, Liang, Yiming, Liu, Jiaheng, Wang, Zekun, Yang, Jian, Wu, Siwei, Qu, Xingwei, Shi, Jinjie, Zhang, Xinyue, Yang, Zhenzhu, Wang, Xiangzhou, Zhang, Zhaoxiang, Liu, Zachary, Benetos, Emmanouil, Huang, Wenhao, Lin, Chenghua
Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities. However, the capacity of these models to concurrently process and reason about multiple modalities remains inadequately…
External link:
http://arxiv.org/abs/2409.15272
Author:
Lei, Chenyang, Chen, Liyi, Cen, Jun, Chen, Xiao, Lei, Zhen, Heide, Felix, Liu, Ziwei, Chen, Qifeng, Zhang, Zhaoxiang
Foundation models like ChatGPT and Sora that are trained on a huge scale of data have made a revolutionary social impact. However, it is extremely challenging for sensors in many different fields to collect similar scales of natural images to train…
External link:
http://arxiv.org/abs/2409.08083
Sound source localization aims to localize objects emitting the sound in visual scenes. Recent works obtaining impressive results typically rely on contrastive learning. However, the common practice of randomly sampling negatives in prior arts can lead…
External link:
http://arxiv.org/abs/2408.16448
Author:
Zhang, Shougao, Zhou, Mengqi, Wang, Yuxi, Luo, Chuanchen, Wang, Rongyu, Li, Yiwei, Yin, Xucheng, Zhang, Zhaoxiang, Peng, Junran
Generating a realistic, large-scale 3D virtual city remains a complex challenge due to the involvement of numerous 3D assets, various city styles, and strict layout constraints. Existing approaches provide promising attempts at procedural content generation…
External link:
http://arxiv.org/abs/2407.17572