How Good Is Good Enough? Establishing Quality Thresholds for the Automatic Text Analysis of Retro-Digitized Comics
Author: | Alexander Dunst, Rita Hartel |
---|---|
Year of publication: | 2018 |
Subject: | business.industry; Computer science; 02 engineering and technology; Text recognition; Comics; computer.software_genre; 030218 nuclear medicine & medical imaging; 03 medical and health sciences; 0302 clinical medicine; Text mining; Transcription (linguistics); Authorship attribution; 0202 electrical engineering, electronic engineering, information engineering; Stylometry; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Natural language processing |
Source: | MultiMedia Modeling, ISBN: 9783030057152, MMM (2) |
DOI: | 10.1007/978-3-030-05716-9_59 |
Description: | Stylometry in the form of simple statistical text analysis has proven to be a powerful tool for text classification, e.g., for authorship attribution. When analyzing retro-digitized comics, manga, and graphic novels, researchers face a trade-off: automated text recognition (ATR) still produces comparatively high error rates, while manual transcription remains highly time-consuming. In this paper, we present an approach and measures that specify whether stylometry based on unsupervised ATR will produce reliable results for a given dataset of comics images. |
Database: | OpenAIRE |
External link: |