Multi-orientation scene text detection with scale-guided regression
Authors: | Xiaobin Zhu, Xu-Cheng Yin, Liang Min, Jingyan Qin, Chun Yang, Jie-Bo Hou |
---|---|
Year of publication: | 2021 |
Subject: |
Series (mathematics), Scale (ratio), Computer science, Orientation (computer vision), Cognitive Neuroscience, Pattern recognition, Regression, Computer Science Applications, Artificial Intelligence, Margin (machine learning), Bounding overwatch, Feature (machine learning), Key (cryptography), Artificial intelligence |
Source: | Neurocomputing. 461:310-318 |
ISSN: | 0925-2312 |
DOI: | 10.1016/j.neucom.2021.07.026 |
Description: | Existing multi-orientation scene text detection methods generally contain two crucial components: regression prediction for text bounding boxes and classification prediction for text/non-text. However, these methods typically treat classification prediction and regression prediction as two independent procedures and do not fully exploit their mutual relations. Based on this key observation, we propose a Scale-Guided Regression Module (SRM) designed specifically for multi-orientation scene text detection. Equipped with width-guided and height-guided kernels of different sizes, the SRM generates a series of scale feature maps of candidate texts by capturing their shape information in classification prediction. The scale feature maps are used to predict the width and height of candidate texts, which then serve as guides for regressing bounding boxes. In this way, the procedures of classification and regression are coherently integrated. In addition, we first train our network with IoU loss and then combine IoU loss and smooth L1 loss for fine-tuning. Extensive experiments on publicly available datasets demonstrate the state-of-the-art performance of our method. Notably, our method achieves a significant performance improvement on long texts: on MSRA-TD500, it outperforms Basemodel by a large margin (4.86% in terms of recall). |
Database: | OpenAIRE |
External link: |
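
The description mentions training with IoU loss and then fine-tuning with a combination of IoU loss and smooth L1 loss. The following is only a minimal sketch of how such a combined loss might look, not the authors' implementation. It assumes axis-aligned boxes given as (x1, y1, x2, y2), whereas the paper targets multi-oriented text. The `1 - IoU` form, the `beta` threshold, the `weight` factor, and all function names are hypothetical choices made for illustration.

```python
def iou_loss(pred, target):
    """Return 1 - IoU for two axis-aligned boxes (x1, y1, x2, y2).

    Assumption: the 1 - IoU form is one common variant; the record
    does not say which IoU-based loss the paper uses.
    """
    # Corners of the intersection rectangle.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    return 1.0 - inter / union if union > 0 else 1.0


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 distance over box coordinates (beta is assumed)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for larger errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def fine_tune_loss(pred, target, weight=1.0):
    """Hypothetical fine-tuning objective: IoU loss plus weighted smooth L1."""
    return iou_loss(pred, target) + weight * smooth_l1(pred, target)


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 0.0, 3.0, 2.0)
    print(iou_loss(predicted, ground_truth))
    print(fine_tune_loss(predicted, ground_truth))
```

In this sketch the IoU term depends only on how much the boxes overlap relative to their union, while the smooth L1 term penalizes per-coordinate offsets, which is one plausible reason to combine the two during fine-tuning.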