Showing 1 - 10 of 409 for search: '"Yu A, Frank"'
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
Author:
Lin, Yen-Ting, Yang, Chao-Han Huck, Chen, Zhehuai, Zelasko, Piotr, Yang, Xuesong, Chen, Zih-Ching, Puvvada, Krishna C, Fu, Szu-Wei, Hu, Ke, Chiu, Jun Wei, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank
Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer would lie in learning dataset-specific features and digesting the… (a rough sketch of task-oriented expert routing follows the link below)
External link:
http://arxiv.org/abs/2411.05945
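The abstract above describes training one corrector over a mixture of domain datasets with task-oriented experts. A minimal sketch of task-conditioned expert routing, assuming PyTorch, is given below; the class name TaskMoE and all dimensions are hypothetical illustrations, not the paper's implementation.

# Hypothetical task-oriented mixture-of-experts layer (illustrative only).
import torch
import torch.nn as nn

class TaskMoE(nn.Module):
    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); task_id: (batch,) index of the source dataset.
        # Routing each sequence to its dataset's expert lets every expert
        # specialize on that domain's recognition errors.
        out = torch.empty_like(x)
        for i, expert in enumerate(self.experts):
            mask = task_id == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out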
Author:
Liu, Shih-Yang, Yang, Huck, Wang, Chien-Yi, Fung, Nai Chit, Yin, Hongxu, Sakr, Charbel, Muralidharan, Saurav, Cheng, Kwang-Ting, Kautz, Jan, Wang, Yu-Chiang Frank, Molchanov, Pavlo, Chen, Min-Hung
In this work, we re-formulate the model compression problem into the customized compensation problem: Given a compressed model, we aim to introduce residual low-rank paths to compensate for compression errors under customized requirements from users… (a minimal sketch of such a low-rank compensation path follows the link below)
External link:
http://arxiv.org/abs/2410.21271
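As a point of reference for the "residual low-rank paths" mentioned above, the sketch below compensates a compressed weight with a rank-r SVD of the compression error. This is a generic baseline under our own assumptions, not the paper's method.

# Generic low-rank compensation of a compression error (illustrative only).
import torch

def lowrank_compensation(W: torch.Tensor, W_c: torch.Tensor, rank: int):
    """Return (A, B) with A @ B ~= W - W_c, so that W_c + A @ B ~= W."""
    residual = W - W_c                          # error introduced by compression
    U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
    A = U[:, :rank] * S[:rank]                  # (out_features, rank)
    B = Vh[:rank, :]                            # (rank, in_features)
    return A, B

# Usage: the compensated layer computes y = x @ (W_c + A @ B).T, keeping the
# compressed weights on the main path while a cheap low-rank branch restores
# most of the lost signal.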
Author:
Lu, Ke-Han, Chen, Zhehuai, Fu, Szu-Wei, Yang, Chao-Han Huck, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank, Lee, Hung-yi
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs) by incorporating pre-trained speech models. However, these SLMs often undergo extensive speech instruction-tuning to bridge the gap between…
External link:
http://arxiv.org/abs/2409.20007
We introduce SAM4MLLM, an innovative approach which integrates the Segment Anything Model (SAM) with Multi-Modal Large Language Models (MLLMs) for pixel-aware tasks. Our method enables MLLMs to learn pixel-level location information without requiring… (a rough sketch of the prompt-then-segment idea follows the link below)
External link:
http://arxiv.org/abs/2409.10542
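To make the integration concrete, here is a hedged sketch of one plausible reading of the idea: an MLLM proposes point prompts for the referred object, and SAM converts them into a pixel-level mask. The segment_anything calls below follow the public SAM API; the MLLM side is a placeholder, and this is not necessarily the paper's actual pipeline.

# Point prompts (e.g., proposed by an MLLM) turned into a mask via SAM.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

def segment_from_points(image: np.ndarray, points: np.ndarray) -> np.ndarray:
    """image: RGB uint8 (H, W, 3); points: (K, 2) xy pixel coordinates."""
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
    predictor = SamPredictor(sam)
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=points.astype(np.float32),
        point_labels=np.ones(len(points), dtype=np.int32),  # 1 = foreground
        multimask_output=False,
    )
    return masks[0]  # (H, W) boolean mask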
Author:
Hirota, Yusuke, Chen, Min-Hung, Wang, Chien-Yi, Nakashima, Yuta, Wang, Yu-Chiang Frank, Hachiuma, Ryo
Large-scale vision-language models, such as CLIP, are known to contain harmful societal bias regarding protected attributes (e.g., gender and age). In this paper, we aim to address the problems of societal bias in CLIP. Although previous studies have…
External link:
http://arxiv.org/abs/2408.10202
NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, the performance in high-resolution novel view synthesis… (a minimal volume-rendering sketch follows the link below)
External link:
http://arxiv.org/abs/2406.20066
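For readers new to radiance fields, the core of NeRF-style novel view synthesis is alpha-compositing densities and colors along each camera ray. A minimal sketch, assuming PyTorch and our own variable names (it is not this paper's code):

# Volume-rendering composite along a single ray (illustrative only).
import torch

def composite_ray(sigma: torch.Tensor, rgb: torch.Tensor, deltas: torch.Tensor):
    """sigma: (N,) densities; rgb: (N, 3) colors; deltas: (N,) segment lengths."""
    alpha = 1.0 - torch.exp(-sigma * deltas)          # per-sample opacity
    survive = torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1]
    trans = torch.cumprod(survive, dim=0)             # transmittance up to sample i
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(dim=0)        # rendered pixel color (3,)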
Author:
Chen, Jr-Jen, Liao, Yu-Chien, Lin, Hsi-Che, Yu, Yu-Chu, Chen, Yen-Chun, Wang, Yu-Chiang Frank
We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e., human-like understanding when the question and its corresponding…
External link:
http://arxiv.org/abs/2406.19392
Author:
Lu, Ke-Han, Chen, Zhehuai, Fu, Szu-Wei, Huang, He, Ginsburg, Boris, Wang, Yu-Chiang Frank, Lee, Hung-yi
Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning…
External link:
http://arxiv.org/abs/2406.18871
Author:
Lin, Ci-Siang, Liu, I-Jieh, Chen, Min-Hung, Wang, Chien-Yi, Liu, Sifei, Wang, Yu-Chiang Frank
Referring Video Object Segmentation (RVOS) aims to segment the object referred to by the query sentence throughout the entire video. Most existing methods require end-to-end training with dense mask annotations, which could be computation-consuming and…
External link:
http://arxiv.org/abs/2406.12834
Author:
Lai, Chun-Mao, Wang, Hsiang-Chun, Hsieh, Ping-Chun, Wang, Yu-Chiang Frank, Chen, Min-Hung, Sun, Shao-Hua
Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator… (a sketch of the standard GAIL objective follows the link below)
External link:
http://arxiv.org/abs/2405.16194
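Since the abstract names the GAIL formulation explicitly, a compact sketch of the textbook generator/discriminator objective may help. Assuming PyTorch, D is any network mapping state-action pairs to logits; this is the standard objective, not this paper's specific variant.

# Standard GAIL-style discriminator loss and imitation reward (illustrative).
import torch
import torch.nn.functional as F

def discriminator_loss(D, expert_sa: torch.Tensor, policy_sa: torch.Tensor):
    # D: network mapping (batch, state_dim + action_dim) -> (batch, 1) logits.
    expert_logits = D(expert_sa)
    policy_logits = D(policy_sa)
    return (
        F.binary_cross_entropy_with_logits(expert_logits,
                                           torch.ones_like(expert_logits))
        + F.binary_cross_entropy_with_logits(policy_logits,
                                             torch.zeros_like(policy_logits))
    )

def imitation_reward(D, policy_sa: torch.Tensor) -> torch.Tensor:
    # Reward is high when the discriminator mistakes policy data for expert data.
    return -torch.log(1.0 - torch.sigmoid(D(policy_sa)) + 1e-8)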