Showing 1 - 10 of 183 for search: '"Liao, Minghui"'
Multimodal Large Language Models (MLLMs) have shown impressive results on various multimodal tasks. However, most existing MLLMs are not well suited for document-oriented tasks, which require fine-grained image perception and information compression.
External link:
http://arxiv.org/abs/2404.09204
Author:
Zhang, Jiwen, Wu, Jihao, Teng, Yihua, Liao, Minghui, Xu, Nuo, Xiao, Xiao, Wei, Zhongyu, Tang, Duyu
Large language model (LLM) leads to a surge of autonomous GUI agents for smartphone, which completes a task triggered by natural language through predicting a sequence of actions of API. Even though the task highly relies on past actions and visual o
External link:
http://arxiv.org/abs/2403.02713
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training. However, collecting and labeling real text images is expensive and time-consuming, which limits the availability of real data. Therefore, most e
External link:
http://arxiv.org/abs/2402.15806
Scene text recognition is a rapidly developing field that faces numerous challenges due to the complexity and diversity of scene text, including complex backgrounds, diverse fonts, flexible arrangements, and accidental occlusions. In this paper, we p
External link:
http://arxiv.org/abs/2402.13643
Determining the types of neurons within a nervous system plays a significant role in the analysis of brain connectomics and the investigation of neurological diseases. However, the efficiency of utilizing anatomical, physiological, or molecular chara
External link:
http://arxiv.org/abs/2312.14518
Text recognition methods are gaining rapid development. Some advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, consecutively push the performance on public benchmarks forward. However, the pro
External link:
http://arxiv.org/abs/2308.08806
Author:
Yang, Mingkun, Liao, Minghui, Lu, Pu, Wang, Jing, Zhu, Shenggao, Luo, Hualin, Tian, Qi, Bai, Xiang
Existing text recognition methods usually need large-scale training data. Most of them rely on synthetic training data due to the lack of annotated real images. However, there is a domain gap between the synthetic data and real data, which limits the
External link:
http://arxiv.org/abs/2207.00193
Ethiopic/Amharic script is one of the oldest African writing systems, which serves at least 23 languages (e.g., Amharic, Tigrinya) in East Africa for more than 120 million people. The Amharic writing system, Abugida, has 282 syllables, 15 punctuation
External link:
http://arxiv.org/abs/2203.12165
Recently, segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field, because of their superiority in detecting the text instances of arbitrary shapes and extreme aspect ratios, profiting from the
External link:
http://arxiv.org/abs/2202.10304
Single-cell sequencing has a significant role to explore biological processes such as embryonic development, cancer evolution, and cell differentiation. These biological properties can be presented by a two-dimensional scatter plot. However, single-c
External link:
http://arxiv.org/abs/2110.09413