Search results - "Doermann, David"

Report

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

Authors: Zhai, Yuanhao; Lin, Kevin; Li, Linjie; Lin, Chung-Ching; Wang, Jianfeng; Yang, Zhengyuan; Doermann, David; Yuan, Junsong; Liu, Zicheng; Wang, Lijuan

Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and m

External link: http://arxiv.org/abs/2407.10937

Report

ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding

Authors: Ma, Tianren; Xie, Lingxi; Tian, Yunjie; Yang, Boyu; Zhang, Yuan; Doermann, David; Ye, Qixiang

An essential topic for multimodal large language models (MLLMs) is aligning vision and language concepts at a finer level. In particular, we devote efforts to encoding visual referential information for tasks such as referring and grounding. Existing

External link: http://arxiv.org/abs/2406.11327

Report

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

Authors: Zhai, Yuanhao; Lin, Kevin; Yang, Zhengyuan; Li, Linjie; Wang, Jianfeng; Lin, Chung-Ching; Doermann, David; Yuan, Junsong; Wang, Lijuan

Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public vide

External link: http://arxiv.org/abs/2406.06890

Report

Artemis: Towards Referential Understanding in Complex Videos

Authors: Qiu, Jihao; Zhang, Yuan; Tang, Xi; Xie, Lingxi; Ma, Tianren; Yan, Pengyu; Doermann, David; Ye, Qixiang; Tian, Yunjie

Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fell short in referential understanding scenarios such as video-based referring. In this paper, we

External link: http://arxiv.org/abs/2406.00258

Report

ChartReformer: Natural Language-Driven Chart Image Editing

Authors: Yan, Pengyu; Bhosale, Mahesh; Lal, Jay; Adhikari, Bikhyat; Doermann, David

Chart visualizations are essential for data interpretation and communication; however, most charts are only accessible in image format and lack the corresponding data tables and supplementary information, making it difficult to alter their appearance

External link: http://arxiv.org/abs/2403.00209

Report

Federated Learning via Input-Output Collaborative Distillation

Authors: Gong, Xuan; Li, Shanglin; Bao, Yuxiang; Yao, Barry; Huang, Yawen; Wu, Ziyan; Zhang, Baochang; Zheng, Yefeng; Doermann, David

Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deplo

External link: http://arxiv.org/abs/2312.14478

Report

Leaf-Based Plant Disease Detection and Explainable AI

Authors: Sagar, Saurav; Javed, Mohammed; Doermann, David S

The agricultural sector plays an essential role in the economic growth of a country. Specifically, in an Indian context, it is the critical source of livelihood for millions of people living in rural areas. Plant Disease is one of the significant fac

External link: http://arxiv.org/abs/2404.16833

Report

The Analysis and Extraction of Structure from Organizational Charts

Authors: Manali, Nikhil; Doermann, David; Desai, Mahesh

Organizational charts, also known as org charts, are critical representations of an organization's structure and the hierarchical relationships between its components and positions. However, manually extracting information from org charts can be erro

External link: http://arxiv.org/abs/2311.10234

Report

Player Re-Identification Using Body Part Appearences

Authors: Bhosale, Mahesh; Kumar, Abhishek; Doermann, David

We propose a neural network architecture that learns body part appearances for soccer player re-identification. Our model consists of a two-stream network (one stream for appearance map extraction and the other for body part map extraction) and a bil

External link: http://arxiv.org/abs/2310.14469

Report

SOAR: Scene-debiasing Open-set Action Recognition

Authors: Zhai, Yuanhao; Liu, Ziyi; Wu, Zhenyu; Wu, Yi; Zhou, Chunluan; Doermann, David; Yuan, Junsong; Hua, Gang

Deep learning models have a risk of utilizing spurious clues to make predictions, such as recognizing actions based on the background scene. This issue can severely degrade the open-set action recognition performance when the testing samples have dif

External link: http://arxiv.org/abs/2309.01265
