Search results - "Cheng Wen Huang"

Report

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

Authors: Yeo, Jeong Hun; Kim, Chae Won; Kim, Hyunjun; Rha, Hyeongseop; Han, Seunghee; Cheng, Wen-Huang; Ro, Yong Man

Lip reading aims to predict spoken language by analyzing lip movements. Despite advancements in lip reading technologies, performance degrades when models are applied to unseen speakers due to their sensitivity to variations in visual information suc

External link: http://arxiv.org/abs/2409.00986

Report

ReCorD: Reasoning and Correcting Diffusion for HOI Generation

Authors: Jiang-Lin, Jian-Yu; Huang, Kang-Yang; Lo, Ling; Huang, Yi-Ning; Lin, Terence; Wu, Jhih-Ciang; Shuai, Hong-Han; Cheng, Wen-Huang

Diffusion models revolutionize image generation by leveraging natural language to guide the creation of multimedia content. Despite significant advancements in such generative models, challenges persist in depicting detailed human-object interactions

External link: http://arxiv.org/abs/2407.17911

Report

The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation

Authors: Yao, Yi; Hsu, Chan-Feng; Lin, Jhe-Hao; Xie, Hongxia; Lin, Terence; Huang, Yi-Ning; Shuai, Hong-Han; Cheng, Wen-Huang

In spite of recent advancements in text-to-image generation, limitations persist in handling complex and imaginative prompts due to the restricted diversity and complexity of training data. This work explores how diffusion models can generate images

External link: http://arxiv.org/abs/2407.12579

Report

A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

Authors: Liu, Hou-I; Tseng, Yu-Wen; Chang, Kai-Cheng; Wang, Pin-Jyun; Shuai, Hong-Han; Cheng, Wen-Huang

Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challe

External link: http://arxiv.org/abs/2406.05755

Report

SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge

Authors: Wu, Bo; Liu, Peiye; Cheng, Wen-Huang; Liu, Bei; Zeng, Zhaoyang; Wang, Jia; Huang, Qiushi; Luo, Jiebo

Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating so

External link: http://arxiv.org/abs/2405.10497

Report

An Investigation of Incorporating Mamba for Speech Enhancement

Authors: Chao, Rong; Cheng, Wen-Huang; La Quatra, Moreno; Siniscalchi, Sabato Marco; Yang, Chao-Han Huck; Fu, Szu-Wei; Tsao, Yu

This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the proper

External link: http://arxiv.org/abs/2405.06573

Report

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning

Authors: Xie, Hongxia; Peng, Chu-Jun; Tseng, Yu-Wen; Chen, Hung-Jen; Hsu, Chan-Feng; Shuai, Hong-Han; Cheng, Wen-Huang

Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions. This paradigm shows promising zero-shot results in various natural language processing tasks but

External link: http://arxiv.org/abs/2404.16670

Report

Lightweight Deep Learning for Resource-Constrained Environments: A Survey

Authors: Liu, Hou-I; Galindo, Marco; Xie, Hongxia; Wong, Lai-Kuan; Shuai, Hong-Han; Li, Yung-Hui; Cheng, Wen-Huang

Over the past decade, the dominance of deep learning has prevailed across various domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While there have been remarkable improveme

External link: http://arxiv.org/abs/2404.07236

Report

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection

Authors: Liu, Hou-I; Wu, Christine; Cheng, Jen-Hao; Chai, Wenhao; Wang, Shian-Yun; Liu, Gaowen; Hwang, Jenq-Neng; Shuai, Hong-Han; Cheng, Wen-Huang

Monocular 3D object detection (Mono3D) is an indispensable research topic in autonomous driving, thanks to the cost-effective monocular camera sensors and its wide range of applications. Since the image perspective has depth ambiguity, the challenges

External link: http://arxiv.org/abs/2404.04910

Report

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Authors: Huang, Yi-Xin; Liu, Hou-I; Shuai, Hong-Han; Cheng, Wen-Huang

Despite previous DETR-like methods having performed successfully in generic object detection, tiny object detection is still a challenging task for them since the positional information of object queries is not customized for detecting tiny objects,

External link: http://arxiv.org/abs/2404.03507
