Assisted Transcription of Historical Documents by Keyword Spotting: A Performance Model
Author: | Angelo Marcelli, Adolfo Santoro, Claudio De Stefano |
---|---|
Language: | English |
Year of publication: | 2018 |
Subject: | Vocabulary; Computer science; Keyword spotting; Historical document processing; Performance evaluation; Performance model; Image segmentation; Ranking; Handwriting recognition; Task analysis; Artificial intelligence; Transcription (software); Natural language processing; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 01 natural sciences; 0103 physical sciences; 010306 general physics |
Source: | ICDAR |
Description: | We propose a model for estimating the time needed to transcribe a large collection of historical handwritten documents when the transcription is assisted by a keyword spotting system following the query-by-string approach. The model assumes that the system is segmentation-based and outputs, for each item, either a transcription (which may be right or wrong) or a reject. We also assume that any other information the system may need is obtained from the training set. The model has been validated by comparing its estimates with the actual time required for the manual transcription of pages from the Bentham dataset. Finally, we discuss possible ways of extending the model to cover different kinds of keyword spotting systems, such as those that return a ranked list of alternatives and/or adopt the query-by-example approach. |
Database: | OpenAIRE |
External link: |