Search results - "Liao, Minghui"

Report

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

Authors: Yu, Ya-Qi; Liao, Minghui; Wu, Jihao; Liao, Yongxin; Zheng, Xiaoyu; Zeng, Wei

Multimodal Large Language Models (MLLMs) have shown impressive results on various multimodal tasks. However, most existing MLLMs are not well suited for document-oriented tasks, which require fine-grained image perception and information compression.

External link: http://arxiv.org/abs/2404.09204

Report

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Authors: Zhang, Jiwen; Wu, Jihao; Teng, Yihua; Liao, Minghui; Xu, Nuo; Xiao, Xiao; Wei, Zhongyu; Tang, Duyu

Large language model (LLM) leads to a surge of autonomous GUI agents for smartphone, which completes a task triggered by natural language through predicting a sequence of actions of API. Even though the task highly relies on past actions and visual o

External link: http://arxiv.org/abs/2403.02713

Report

Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition

Authors: Yang, Mingkun; Yang, Biao; Liao, Minghui; Zhu, Yingying; Bai, Xiang

Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training. However, collecting and labeling real text images is expensive and time-consuming, which limits the availability of real data. Therefore, most e

External link: http://arxiv.org/abs/2402.15806

Report

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

Authors: Yang, Mingkun; Yang, Biao; Liao, Minghui; Zhu, Yingying; Bai, Xiang

Scene text recognition is a rapidly developing field that faces numerous challenges due to the complexity and diversity of scene text, including complex backgrounds, diverse fonts, flexible arrangements, and accidental occlusions. In this paper, we p

External link: http://arxiv.org/abs/2402.13643

Report

Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification

Authors: Liao, Minghui; Wan, Guojia; Du, Bo

Determining the types of neurons within a nervous system plays a significant role in the analysis of brain connectomics and the investigation of neurological diseases. However, the efficiency of utilizing anatomical, physiological, or molecular chara

External link: http://arxiv.org/abs/2312.14518

Report

Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

Authors: Zhang, Ziyin; Lu, Ning; Liao, Minghui; Huang, Yongshuai; Li, Cheng; Wang, Min; Peng, Wei

Text recognition methods are gaining rapid development. Some advanced techniques, e.g., powerful modules, language models, and un- and semi-supervised learning schemes, consecutively push the performance on public benchmarks forward. However, the pro

External link: http://arxiv.org/abs/2308.08806

Report

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

Authors: Yang, Mingkun; Liao, Minghui; Lu, Pu; Wang, Jing; Zhu, Shenggao; Luo, Hualin; Tian, Qi; Bai, Xiang

Existing text recognition methods usually need large-scale training data. Most of them rely on synthetic training data due to the lack of annotated real images. However, there is a domain gap between the synthetic data and real data, which limits the

External link: http://arxiv.org/abs/2207.00193

Report

Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition

Authors: Dikubab, Wondimu; Liang, Dingkang; Liao, Minghui; Bai, Xiang

Ethiopic/Amharic script is one of the oldest African writing systems, which serves at least 23 languages (e.g., Amharic, Tigrinya) in East Africa for more than 120 million people. The Amharic writing system, Abugida, has 282 syllables, 15 punctuation

External link: http://arxiv.org/abs/2203.12165

Report

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

Authors: Liao, Minghui; Zou, Zhisheng; Wan, Zhaoyi; Yao, Cong; Bai, Xiang

Recently, segmentation-based scene text detection methods have drawn extensive attention in the scene text detection field, because of their superiority in detecting the text instances of arbitrary shapes and extreme aspect ratios, profiting from the

External link: http://arxiv.org/abs/2202.10304

Report

SGEN: Single-cell Sequencing Graph Self-supervised Embedding Network

Authors: Liu, Ziyi; Liao, Minghui; Luo, Fulin; Du, Bo

Single-cell sequencing has a significant role to explore biological processes such as embryonic development, cancer evolution, and cell differentiation. These biological properties can be presented by a two-dimensional scatter plot. However, single-c

External link: http://arxiv.org/abs/2110.09413
