Character-Based Handwritten Text Transcription with Attention Networks
Author: | Rafael Valle, Jason Poulos |
---|---|
Language: | English |
Year of publication: | 2017 |
Subject: |
FOS: Computer and information sciences
0209 industrial biotechnology; Computer science; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; Machine Learning (stat.ML); 02 engineering and technology; 020901 industrial engineering & automation; Transcription (linguistics); Artificial Intelligence; Bounding overwatch; Handwriting; Statistics - Machine Learning; 0202 electrical engineering, electronic engineering, information engineering; Sequence; Computer Science - Computation and Language; business.industry; Character (computing); Pattern recognition; Sigmoid function; Softmax function; 020201 artificial intelligence & image processing; Artificial intelligence; business; Computation and Language (cs.CL); Software; Decoding methods |
Description: | The paper approaches the task of handwritten text recognition (HTR) with attentional encoder–decoder networks trained on sequences of characters, rather than words. We experiment on lines of text from popular handwriting datasets and compare different activation functions for the attention mechanism used for aligning image pixels and target characters. We find that softmax attention focuses heavily on individual characters, while sigmoid attention focuses on multiple characters at each decoding step. When the sequence alignment is one-to-one, softmax attention learns a more precise alignment at each decoding step, whereas the alignment generated by sigmoid attention is much less precise. When a linear function is used to obtain attention weights, the model predicts a character by looking at the entire sequence of characters and performs poorly because it lacks a precise alignment between the source and target. Future research may explore HTR in natural scene images, since the model is capable of transcribing handwritten text without the need for producing segmentations or bounding boxes of text in images. |
Database: | OpenAIRE |
External link: |
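
The contrast the description draws between softmax, sigmoid, and linear attention can be illustrated with a minimal sketch. This is not the authors' implementation: the alignment scores below are hypothetical values for a single decoding step over five encoder positions, and the helper names (`softmax_weights`, `sigmoid_weights`, `linear_weights`, `focused_positions`) are assumptions made for illustration only.

```python
import math

def softmax_weights(scores):
    """Normalize scores so the weights compete and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_weights(scores):
    """Squash each score independently into (0, 1); no competition."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def linear_weights(scores):
    """Use the raw scores as weights; nothing bounds or sharpens them."""
    return list(scores)

def focused_positions(weights, threshold=0.5):
    """Indices of encoder positions whose weight exceeds the threshold."""
    return [i for i, w in enumerate(weights) if w > threshold]

# Hypothetical alignment scores for one decoding step (assumed values).
scores = [-2.0, 0.5, 4.0, 0.7, -1.5]

soft = softmax_weights(scores)
sig = sigmoid_weights(scores)
lin = linear_weights(scores)

for name, weights in [("softmax", soft), ("sigmoid", sig), ("linear", lin)]:
    print(name, [round(w, 3) for w in weights], focused_positions(weights))
```

With these assumed scores, softmax concentrates its weight on a single position, sigmoid gives a weight above 0.5 to several neighbouring positions at once, and the linear variant leaves every position's raw score in play, which matches the qualitative behaviour the description reports.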