Showing 1 - 10 of 531 for search: '"Chen‐Yi Chang"'
Large language models (LLMs) have significantly advanced autonomous agents, particularly in zero-shot tool usage, also known as function calling. This research delves into enhancing the function-calling capabilities of LLMs by exploring different app
External link:
http://arxiv.org/abs/2412.01130
Author:
Hsu, Chan-Jan, Chen, Yi-Chang, Liao, Feng-Ting, Ho, Pei-Chen, Wang, Yu-Hsiang, Hsu, Po-Chun, Shiu, Da-shan
We introduce "Generative Fusion Decoding" (GFD), a novel shallow fusion framework, utilized to integrate Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recogniti
External link:
http://arxiv.org/abs/2405.14259
Breeze-7B is an open-source language model based on Mistral-7B, designed to address the need for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese. This technical report provides an overview of the additional pr
External link:
http://arxiv.org/abs/2403.02712
The evaluation of large language models is an essential task in the field of language understanding and generation. As language models continue to advance, the need for effective benchmarks to assess their performance has become imperative. In the co
External link:
http://arxiv.org/abs/2309.08448
In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning their generation on a given text prompt. This is accomplished by fine-tuning a pre-trained, end-to-end model
External link:
http://arxiv.org/abs/2307.10274
Author:
Chen, Yi-Chang, Lee, Chi-En, Lin, Fan-Ya, Li, Ya-Jing, Lor, Kuo-Lung, Chang, Yeun-Chung, Chen, Chung-Ming
Published in:
In Computer Methods and Programs in Biomedicine November 2024 256
Polyphone disambiguation is the most crucial task in Mandarin grapheme-to-phoneme (g2p) conversion. Previous studies have approached this problem using pre-trained language models, restricted output, and extra information from Part-Of-Speech (POS) ta
External link:
http://arxiv.org/abs/2203.10430
Training recognition models with synthetic images has achieved remarkable results in text recognition. However, recognizing text from real-world images still faces challenges due to the domain shift between synthetic and real-world text images. One
External link:
http://arxiv.org/abs/2202.11949
Scene text recognition (STR) has been widely studied in academia and industry. Training a text recognition model often requires a large amount of labeled data, but data labeling can be difficult, expensive, or time-consuming, especially for Tradition
External link:
http://arxiv.org/abs/2111.13327
Due to the recent advances of natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider the semantic co
External link:
http://arxiv.org/abs/2111.08400