Showing 1 - 10
of 1,563
for search: '"Zhou, Junjie"'
Although current Multi-modal Large Language Models (MLLMs) demonstrate promising results in video understanding, processing extremely long videos remains an ongoing challenge. Typically, MLLMs struggle with handling thousands of visual tokens that ex…
External link:
http://arxiv.org/abs/2409.14485
We study dynamic network formation from a centralized perspective. In each period, the social planner builds a single link to connect previously unlinked pairs. The social planner is forward-looking, with instantaneous utility monotonic in the aggreg…
External link:
http://arxiv.org/abs/2409.14136
Author:
Xiao, Shitao, Wang, Yueze, Zhou, Junjie, Yuan, Huaying, Xing, Xingrun, Yan, Ruiran, Wang, Shuting, Huang, Tiejun, Liu, Zheng
In this work, we introduce OmniGen, a new diffusion model for unified image generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires additional modules such as ControlNet or IP-Adapter to process diverse contro…
External link:
http://arxiv.org/abs/2409.11340
Multi-modal retrieval is becoming increasingly popular in practice. However, the existing retrievers are mostly text-oriented, which lack the capability to process visual information. Despite the presence of vision-language models like CLIP, the current…
External link:
http://arxiv.org/abs/2406.04292
Author:
Zhou, Junjie, Shu, Yan, Zhao, Bo, Wu, Boya, Xiao, Shitao, Yang, Xi, Xiong, Yongping, Zhang, Bo, Huang, Tiejun, Liu, Zheng
The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insuffi…
External link:
http://arxiv.org/abs/2406.04264
Author:
Liu, Baolin, Yang, Zongyuan, Wang, Pengfei, Zhou, Junjie, Liu, Ziqi, Song, Ziyi, Liu, Yan, Xiong, Yongping
The goal of scene text image super-resolution is to reconstruct high-resolution text-line images from unrecognizable low-resolution inputs. The existing methods relying on the optimization of pixel-level loss tend to yield text edges that exhibit a n…
External link:
http://arxiv.org/abs/2308.06743
Author:
Yang, Zongyuan, Liu, Baolin, Xiong, Yongping, Yi, Lan, Wu, Guibin, Tang, Xiaojun, Liu, Ziqi, Zhou, Junjie, Zhang, Xing
Removing degradation from document images not only improves their visual quality and readability, but also enhances the performance of numerous automated document analysis and recognition tasks. However, existing regression-based methods optimized fo…
External link:
http://arxiv.org/abs/2305.03892
In a model of interconnected conflicts on a network, we compare the equilibrium effort profiles and payoffs under two scenarios: uniform effort (UE), in which each contestant is restricted to exert the same effort across all the battles she participat…
External link:
http://arxiv.org/abs/2302.09861
Transformer models have achieved promising performance in point cloud segmentation. However, most existing attention schemes provide the same feature learning paradigm for all points equally and overlook the enormous difference in size among scene o…
External link:
http://arxiv.org/abs/2301.06869
Author:
Kor, Ryan, Zhou, Junjie
We study a planner's optimal interventions in both the standalone marginal utilities of players on a network and the weights on the links that connect players. The welfare-maximizing joint intervention exhibits the following properties: (a) when the plan…
External link:
http://arxiv.org/abs/2206.03863