Showing 1 - 10 of 370 for search: '"Doermann, David"'
Author:
Zhai, Yuanhao, Lin, Kevin, Li, Linjie, Lin, Chung-Ching, Wang, Jianfeng, Yang, Zhengyuan, Doermann, David, Yuan, Junsong, Liu, Zicheng, Wang, Lijuan
Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and m
External link:
http://arxiv.org/abs/2407.10937
Author:
Ma, Tianren, Xie, Lingxi, Tian, Yunjie, Yang, Boyu, Zhang, Yuan, Doermann, David, Ye, Qixiang
An essential topic for multimodal large language models (MLLMs) is aligning vision and language concepts at a finer level. In particular, we devote efforts to encoding visual referential information for tasks such as referring and grounding. Existing
External link:
http://arxiv.org/abs/2406.11327
Author:
Zhai, Yuanhao, Lin, Kevin, Yang, Zhengyuan, Li, Linjie, Wang, Jianfeng, Lin, Chung-Ching, Doermann, David, Yuan, Junsong, Wang, Lijuan
Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public vide
External link:
http://arxiv.org/abs/2406.06890
Author:
Qiu, Jihao, Zhang, Yuan, Tang, Xi, Xie, Lingxi, Ma, Tianren, Yan, Pengyu, Doermann, David, Ye, Qixiang, Tian, Yunjie
Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fell short in referential understanding scenarios such as video-based referring. In this paper, we
External link:
http://arxiv.org/abs/2406.00258
Chart visualizations are essential for data interpretation and communication; however, most charts are only accessible in image format and lack the corresponding data tables and supplementary information, making it difficult to alter their appearance
External link:
http://arxiv.org/abs/2403.00209
Author:
Gong, Xuan, Li, Shanglin, Bao, Yuxiang, Yao, Barry, Huang, Yawen, Wu, Ziyan, Zhang, Baochang, Zheng, Yefeng, Doermann, David
Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deplo
External link:
http://arxiv.org/abs/2312.14478
The agricultural sector plays an essential role in the economic growth of a country. Specifically, in an Indian context, it is the critical source of livelihood for millions of people living in rural areas. Plant Disease is one of the significant fac
External link:
http://arxiv.org/abs/2404.16833
Organizational charts, also known as org charts, are critical representations of an organization's structure and the hierarchical relationships between its components and positions. However, manually extracting information from org charts can be erro
External link:
http://arxiv.org/abs/2311.10234
We propose a neural network architecture that learns body part appearances for soccer player re-identification. Our model consists of a two-stream network (one stream for appearance map extraction and the other for body part map extraction) and a bil
External link:
http://arxiv.org/abs/2310.14469
Author:
Zhai, Yuanhao, Liu, Ziyi, Wu, Zhenyu, Wu, Yi, Zhou, Chunluan, Doermann, David, Yuan, Junsong, Hua, Gang
Deep learning models have a risk of utilizing spurious clues to make predictions, such as recognizing actions based on the background scene. This issue can severely degrade the open-set action recognition performance when the testing samples have dif
External link:
http://arxiv.org/abs/2309.01265