Reliable Indexing Using Unreliable Recognition Devices

Author: James K. Mullin
Year of publication: 1981
Subject:
Source: IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 347-350
ISSN: 0162-8828
DOI: 10.1109/tpami.1981.4767108
Description: A new method is described and tested for using an unreliable character recognition device to produce a reliable index for a collection of documents. All highly likely substitution errors of the recognition device are handled by mapping characters that are readily confused with one another to the same pseudocharacter. An analysis of the method gives the expected recall (the fraction of the words present that are correctly found) and precision (the fraction of retrieved words that were retrieved correctly). The method was evaluated using published substitution-error matrices together with a large file of words and their frequencies. Performance was surprisingly good. Suggestions for further enhancements are given.
Database: OpenAIRE
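The pseudocharacter idea in the description can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's implementation. The substitution probabilities in the hypothetical `SUBSTITUTIONS` table are made up rather than taken from the published matrices. The cut-off `THRESHOLD` for a "highly likely" substitution is arbitrary. Grouping characters transitively with union-find is also only one plausible way to form pseudocharacters. Indexed words and query words pass through the same mapping, so a word misread as a confusable variant still lands on the same index key.

```python
from collections import defaultdict

# Hypothetical substitution probabilities P(read as b | true a).
# Illustrative values only, NOT the published matrices used in the paper.
SUBSTITUTIONS = {
    ("o", "0"): 0.20,
    ("0", "o"): 0.15,
    ("l", "1"): 0.10,
    ("1", "i"): 0.08,
    ("e", "c"): 0.05,
    ("m", "n"): 0.002,
}
THRESHOLD = 0.01  # assumed cut-off for a "highly likely" substitution


def build_pseudocharacters(subs, threshold):
    """Merge readily confused characters into classes using union-find."""
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller character as the class representative.
            parent[max(ra, rb)] = min(ra, rb)

    for (a, b), p in subs.items():
        if p >= threshold:
            union(a, b)
    # Map every character seen in a likely substitution to its representative.
    return {c: find(c) for c in parent}


def to_pseudo(word, mapping):
    """Replace each character by its pseudocharacter (itself if unmapped)."""
    return "".join(mapping.get(c, c) for c in word)


def build_index(documents, mapping):
    """Map each pseudo-word to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            index[to_pseudo(word, mapping)].add(doc_id)
    return index


def lookup(index, query, mapping):
    """Transform the query the same way and return matching document ids."""
    return sorted(index.get(to_pseudo(query, mapping), set()))


if __name__ == "__main__":
    mapping = build_pseudocharacters(SUBSTITUTIONS, THRESHOLD)
    # Document 2 simulates a device that misread "hello" as "he11o".
    docs = {1: "hello world", 2: "he11o there"}
    index = build_index(docs, mapping)
    print(lookup(index, "hello", mapping))  # [1, 2]
```

Larger classes catch more misreadings but also merge genuinely different words. Under this toy mapping, "cell" and "ceil" both become the key "cc11", so a query for one also retrieves documents containing the other.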