Showing 1 - 10 of 414 for search: '"Cheng Wen Huang"'
Author:
Yeo, Jeong Hun, Kim, Chae Won, Kim, Hyunjun, Rha, Hyeongseop, Han, Seunghee, Cheng, Wen-Huang, Ro, Yong Man
Lip reading aims to predict spoken language by analyzing lip movements. Despite advancements in lip reading technologies, performance degrades when models are applied to unseen speakers due to their sensitivity to variations in visual information suc
External link:
http://arxiv.org/abs/2409.00986
Author:
Jiang-Lin, Jian-Yu, Huang, Kang-Yang, Lo, Ling, Huang, Yi-Ning, Lin, Terence, Wu, Jhih-Ciang, Shuai, Hong-Han, Cheng, Wen-Huang
Diffusion models revolutionize image generation by leveraging natural language to guide the creation of multimedia content. Despite significant advancements in such generative models, challenges persist in depicting detailed human-object interactions
External link:
http://arxiv.org/abs/2407.17911
Author:
Yao, Yi, Hsu, Chan-Feng, Lin, Jhe-Hao, Xie, Hongxia, Lin, Terence, Huang, Yi-Ning, Shuai, Hong-Han, Cheng, Wen-Huang
In spite of recent advancements in text-to-image generation, limitations persist in handling complex and imaginative prompts due to the restricted diversity and complexity of training data. This work explores how diffusion models can generate images
External link:
http://arxiv.org/abs/2407.12579
Author:
Liu, Hou-I, Tseng, Yu-Wen, Chang, Kai-Cheng, Wang, Pin-Jyun, Shuai, Hong-Han, Cheng, Wen-Huang
Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challe
External link:
http://arxiv.org/abs/2406.05755
Author:
Wu, Bo, Liu, Peiye, Cheng, Wen-Huang, Liu, Bei, Zeng, Zhaoyang, Wang, Jia, Huang, Qiushi, Luo, Jiebo
Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating so
External link:
http://arxiv.org/abs/2405.10497
Author:
Chao, Rong, Cheng, Wen-Huang, La Quatra, Moreno, Siniscalchi, Sabato Marco, Yang, Chao-Han Huck, Fu, Szu-Wei, Tsao, Yu
This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the proper
External link:
http://arxiv.org/abs/2405.06573
Author:
Xie, Hongxia, Peng, Chu-Jun, Tseng, Yu-Wen, Chen, Hung-Jen, Hsu, Chan-Feng, Shuai, Hong-Han, Cheng, Wen-Huang
Visual Instruction Tuning represents a novel learning paradigm involving the fine-tuning of pre-trained language models using task-specific instructions. This paradigm shows promising zero-shot results in various natural language processing tasks but
External link:
http://arxiv.org/abs/2404.16670
Author:
Liu, Hou-I, Galindo, Marco, Xie, Hongxia, Wong, Lai-Kuan, Shuai, Hong-Han, Li, Yung-Hui, Cheng, Wen-Huang
Over the past decade, the dominance of deep learning has prevailed across various domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While there have been remarkable improveme
External link:
http://arxiv.org/abs/2404.07236
Author:
Liu, Hou-I, Wu, Christine, Cheng, Jen-Hao, Chai, Wenhao, Wang, Shian-Yun, Liu, Gaowen, Hwang, Jenq-Neng, Shuai, Hong-Han, Cheng, Wen-Huang
Monocular 3D object detection (Mono3D) is an indispensable research topic in autonomous driving, thanks to the cost-effective monocular camera sensors and its wide range of applications. Since the image perspective has depth ambiguity, the challenges
External link:
http://arxiv.org/abs/2404.04910
Despite previous DETR-like methods having performed successfully in generic object detection, tiny object detection is still a challenging task for them since the positional information of object queries is not customized for detecting tiny objects,
External link:
http://arxiv.org/abs/2404.03507