Showing 1 - 10 of 131 for search: "Lin, Chung-Ching"
Author:
Wang, Xiyao, Yang, Zhengyuan, Li, Linjie, Lu, Hongjin, Xu, Yuancheng, Lin, Chung-Ching, Lin, Kevin, Huang, Furong, Wang, Lijuan
Despite significant advancements in vision-language models (VLMs), effective approaches for enhancing response quality by scaling inference-time computation are still lacking. This capability is known to be a core step toward self-improving models in…
External link:
http://arxiv.org/abs/2412.03704
Author:
Zhao, Yuyang, Lin, Chung-Ching, Lin, Kevin, Yan, Zhiwen, Li, Linjie, Yang, Zhengyuan, Wang, Jianfeng, Lee, Gim Hee, Wang, Lijuan
Recent developments in 2D visual generation have been remarkably successful. However, 3D and 4D generation remain challenging in real-world applications due to the lack of large-scale 4D data and effective model design. In this paper, we propose to…
External link:
http://arxiv.org/abs/2411.02319
Author:
Hong, Yining, Liu, Beide, Wu, Maxine, Zhai, Yuanhao, Chang, Kai-Wei, Li, Linjie, Lin, Kevin, Lin, Chung-Ching, Wang, Jianfeng, Yang, Zhengyuan, Wu, Yingnian, Wang, Lijuan
Human beings are endowed with a complementary learning system, which bridges the slow learning of general world dynamics with fast storage of episodic memory from a new experience. Previous video generation models, however, primarily focus on slow learning…
External link:
http://arxiv.org/abs/2410.23277
Author:
Yu, Weihao, Yang, Zhengyuan, Ren, Lingfeng, Li, Linjie, Wang, Jianfeng, Lin, Kevin, Lin, Chung-Ching, Liu, Zicheng, Wang, Lijuan, Wang, Xinchao
MM-Vet, with open-ended vision-language questions aimed at evaluating integrated capabilities, has become one of the most popular benchmarks for large multimodal model evaluation. MM-Vet assesses six core vision-language (VL) capabilities: recognition…
External link:
http://arxiv.org/abs/2408.00765
Author:
Zhai, Yuanhao, Lin, Kevin, Li, Linjie, Lin, Chung-Ching, Wang, Jianfeng, Yang, Zhengyuan, Doermann, David, Yuan, Junsong, Liu, Zicheng, Wang, Lijuan
Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and…
External link:
http://arxiv.org/abs/2407.10937
Author:
Zhai, Yuanhao, Lin, Kevin, Yang, Zhengyuan, Li, Linjie, Wang, Jianfeng, Lin, Chung-Ching, Doermann, David, Yuan, Junsong, Wang, Lijuan
Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality of public video…
External link:
http://arxiv.org/abs/2406.06890
Choosing appropriate hyperparameters plays a crucial role in the success of neural networks, as hyperparameters directly control the behavior and performance of the training algorithms. To obtain efficient tuning, Bayesian optimization methods based…
External link:
http://arxiv.org/abs/2402.04885
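The snippet above only names the technique before it is cut off. As a hedged illustration (not drawn from the catalogued paper), a minimal Gaussian-process Bayesian optimization loop for hyperparameter tuning might look like the sketch below, assuming scikit-optimize is available; the objective `train_and_score` and its placeholder loss are hypothetical stand-ins for a real training run.

```python
# Minimal sketch: GP-based Bayesian optimization over two hyperparameters,
# using scikit-optimize. Hypothetical objective; replace with a real
# training run that returns a validation loss to minimize.
from skopt import gp_minimize
from skopt.space import Integer, Real

def train_and_score(params):
    # Hypothetical stand-in: train with the given learning rate and batch
    # size, then return the validation loss (lower is better).
    learning_rate, batch_size = params
    return (learning_rate - 1e-3) ** 2 + 0.01 / batch_size  # placeholder loss

search_space = [
    Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(16, 256, name="batch_size"),
]

# The GP surrogate models the loss surface; each iteration proposes the
# next hyperparameter setting by maximizing expected improvement ("EI").
result = gp_minimize(
    train_and_score, search_space, acq_func="EI", n_calls=20, random_state=0
)
print("best hyperparameters:", result.x, "best loss:", result.fun)
```

The log-uniform prior on the learning rate is a common choice when a hyperparameter spans several orders of magnitude, so the search allocates trials evenly across scales rather than clustering near the upper bound.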
Author:
Lin, Kevin, Ahmed, Faisal, Li, Linjie, Lin, Chung-Ching, Azarnasab, Ehsan, Yang, Zhengyuan, Wang, Jianfeng, Liang, Lin, Liu, Zicheng, Lu, Yumao, Liu, Ce, Wang, Lijuan
We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding. MM-VID is designed to address the challenges posed by long-form…
External link:
http://arxiv.org/abs/2310.19773
Author:
Yang, Zhengyuan, Wang, Jianfeng, Li, Linjie, Lin, Kevin, Lin, Chung-Ching, Liu, Zicheng, Wang, Lijuan
We introduce "Idea to Image," a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation. Humans can quickly identify the characteristics of different text-to-image (T2I) models via iterative…
External link:
http://arxiv.org/abs/2310.08541
Author:
Li, Xiang, Chen, Yinpeng, Lin, Chung-Ching, Chen, Hao, Hu, Kai, Singh, Rita, Raj, Bhiksha, Wang, Lijuan, Liu, Zicheng
This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components. Our method, named MaskComp, delineates the completion process through iterative stages of generation…
External link:
http://arxiv.org/abs/2310.00808