Popis: |
This chapter presents a method for segmenting scanned images of newspapers and magazines. To support subsequent Mixed Raster Content (MRC) compression, we classify image blocks as background, picture, or text regions. The method is relatively simple and fast because it is intended for implementation in firmware. Textural features are calculated for each image block. We train three one-vs-rest classifiers with the AdaBoost technique on a publicly available dataset. The classifier outputs can be treated as a posteriori probabilities, which we smooth across adjacent blocks. After smoothing, a voting procedure assigns a class to each block. We argue that the Dual Leave-Group-of-Sources-Out cross-validation scheme is beneficial for tuning the algorithm parameters. We also discuss the advantages and shortcomings of several segmentation quality metrics. |