Search results for "Thomas A. Nartker" (showing 1-10 of 32)
Published in:
Information Processing & Management. 40:441-458
We report on two types of experiments with respect to manually-assigned keywords to documents in a collection. The first type of experiment examines the usefulness of manually-assigned keywords to automatic feedback. The second type of experiment con
Optical character recognition (OCR) is the most prominent and successful example of pattern recognition to date. There are thousands of research papers and dozens of OCR products. Optical Character Recognition: An Illustrated Guide to the Frontier off
Published in:
Algorithmica. 18:271-280
Finding a sequence of edit operations that transforms one string of symbols into another with the minimum cost is a well-known problem. The minimum cost, or edit distance, is a widely used measure of the similarity of two strings. An important parame
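As a general illustration of the edit-distance problem described in the abstract above (not the specific algorithm of this paper, whose details are cut off here), the classic dynamic-programming recurrence can be sketched as follows. The function name `edit_distance` and the separate cost parameters for insertion, deletion, and substitution are our own assumptions for the example.

```python
def edit_distance(a: str, b: str, ins: int = 1, dele: int = 1, sub: int = 1) -> int:
    """Minimum total cost of insertions, deletions and substitutions turning a into b.

    Hypothetical helper for illustration; costs default to 1 (unit-cost edit distance).
    """
    # prev[j] = cost of transforming the previous prefix of a into b[:j];
    # for the empty prefix of a this is simply j insertions.
    prev = [j * ins for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # cur[0] = cost of transforming a[:i] into "" (i deletions)
        cur = [i * dele] + [0] * len(b)
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            cur[j] = min(
                prev[j] + dele,                      # delete a[i-1]
                cur[j - 1] + ins,                    # insert b[j-1]
                prev[j - 1] + (0 if same else sub),  # match or substitute
            )
        prev = cur
    return prev[len(b)]


print(edit_distance("kitten", "sitting"))         # 3
print(edit_distance("Rcognition", "Recognition"))  # 1 (one missing "e")
```

Only two rows of the cost table are kept at a time, so memory is proportional to the length of `b` rather than to the product of both lengths; recovering the actual edit sequence (not just its cost) would require keeping the full table or a traceback structure.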
Published in:
IEEE Transactions on Pattern Analysis and Machine Intelligence. 17:86-90
Many current optical character recognition (OCR) systems attempt to decompose printed pages into a set of zones, each containing a single column of text, before converting the characters into coded form. The authors present a methodology for automati
Published in:
Document Image Analysis
When optical character recognition (OCR) devices process the same page image, they generate similar text strings. Differences are due to recognition errors. A page of text rarely contains long repeated substrings; therefore, N strings generated by OC
Published in:
DRR
Extraction of metadata from documents is a tedious and expensive process. In general, documents are manually reviewed for structured data such as title, author, date, organization, etc. The purpose of extraction is to build metadata for documents tha
Published in:
DRR
We report on an attempt to build an automatic redaction system by applying information extraction techniques to the identification of private dates of birth. We conclude that automatic redaction is a promising concept although information extraction
Published in:
ITCC
In this paper, we report on a series of experiments involving feedback and query expansion. We conclude that query expansion using manually-assigned keywords has no advantage over expansion using terms from the text of the document.
Published in:
DRR
This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be a
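To make the idea of HMM-based address extraction in the abstract above more concrete, here is a toy sketch of Viterbi decoding over two hypothetical states, "O" (other text) and "A" (address). This is not the model from the paper: the tag set, the token features, the street-suffix list, and every probability below are illustrative assumptions. A real system would estimate its parameters from labelled OCR text.

```python
import math

STATES = ("O", "A")  # O = other text, A = part of an address (hypothetical tag set)

# Hypothetical parameters, chosen by hand for illustration only.
START = {"O": 0.8, "A": 0.2}
TRANS = {"O": {"O": 0.9, "A": 0.1}, "A": {"O": 0.2, "A": 0.8}}
EMIT = {
    "O": {"NUM": 0.05, "SUFFIX": 0.01, "CAP": 0.24, "LOW": 0.70},
    "A": {"NUM": 0.35, "SUFFIX": 0.25, "CAP": 0.35, "LOW": 0.05},
}
SUFFIXES = {"st", "street", "ave", "avenue", "rd", "road", "blvd"}  # assumed list


def feature(token: str) -> str:
    """Map a token to a coarse observation symbol (hypothetical feature set)."""
    t = token.strip(".,").lower()
    if t.isdigit():
        return "NUM"
    if t in SUFFIXES:
        return "SUFFIX"
    if token[:1].isupper():
        return "CAP"
    return "LOW"


def viterbi(tokens: list[str]) -> list[str]:
    """Return the most likely O/A tag for each token under the toy HMM."""
    if not tokens:
        return []
    obs = [feature(t) for t in tokens]
    # score[s] = best log-probability of any state path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    back = []  # back[k][s] = best predecessor of state s at position k + 1
    for o in obs[1:]:
        pointers, new_score = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            pointers[s] = p
            new_score[s] = score[p] + math.log(TRANS[p][s]) + math.log(EMIT[s][o])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi("the office is at 4505 Maryland Ave today".split()))
# ['O', 'O', 'O', 'O', 'A', 'A', 'A', 'O']
```

Working in log-probabilities avoids numerical underflow on long token sequences; the decoded "A" run (`4505 Maryland Ave`) is the extracted address candidate.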
Published in:
Proceedings of the 1st ACM Workshop on Hardcopy Document Processing
Over the last 15 years, the Information Science Research Institute (ISRI) at the University of Nevada, Las Vegas (UNLV) has conducted information access research in the presence of OCR errors. Our research has focused on issues associated with the co