Search results - "Thomas A. Nartker"

The role of manually-assigned keywords in query expansion

Authors: Allen Condit, Thomas A. Nartker, Julie Borsack, Kazem Taghva

Published in: Information Processing & Management. 40:441-458

We report on two types of experiments with respect to manually-assigned keywords to documents in a collection. The first type of experiment examines the usefulness of manually-assigned keywords to automatic feedback. The second type of experiment con

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::4dcd0312f8a39027213da7b06d983d17
https://doi.org/10.1016/j.ipm.2003.12.005

E-book

Optical Character Recognition : An Illustrated Guide to the Frontier

Authors: Stephen V. Rice, George Nagy, Thomas A. Nartker

Optical character recognition (OCR) is the most prominent and successful example of pattern recognition to date. There are thousands of research papers and dozens of OCR products. Optical Character Recognition: An Illustrated Guide to the Frontier off

Classes of cost functions for string edit distance

Authors: Thomas A. Nartker, Horst Bunke, Stephen V. Rice

Published in: Algorithmica. 18:271-280

Finding a sequence of edit operations that transforms one string of symbols into another with the minimum cost is a well-known problem. The minimum cost, or edit distance, is a widely used measure of the similarity of two strings. An important parame

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::2ff1ec45d5a6f33c6b983014c132450e
https://doi.org/10.1007/bf02526038

Automated evaluation of OCR zoning

Authors: Thomas A. Nartker, George Nagy, Junichi Kanai, Stephen V. Rice

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence. 17:86-90

Many current optical character recognition (OCR) systems attempt to decompose printed pages into a set of zones, each containing a single column of text, before converting the characters into coded form. The authors present a methodology for automati

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::53884fa7463826ae997485a69b9ed260
https://doi.org/10.1109/34.368146

AN ALGORITHM FOR MATCHING OCR-GENERATED TEXT STRINGS

Authors: Junichi Kanai, Stephen V. Rice, Thomas A. Nartker

Published in: Document Image Analysis

When optical character recognition (OCR) devices process the same page image, they generate similar text strings. Differences are due to recognition errors. A page of text rarely contains long repeated substrings; therefore, N strings generated by OC

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::45feb83ea9677a22b0ac6d5d45ce7564
https://doi.org/10.1142/s0218001494000632

Title extraction and generation from OCR'd documents

Authors: Allen Condit, Julie Borsack, Kazem Taghva, Thomas A. Nartker, Steven E. Lumos

Published in: DRR

Extraction of metadata from documents is a tedious and expensive process. In general, documents are manually reviewed for structured data such as title, author, date, organization, etc. The purpose of extraction is to build metadata for documents tha

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::3b6b0d093765d501267ff0023e1259d2
https://doi.org/10.1117/12.712264

Automatic redaction of private information using relational information extraction

Authors: Ray Pereda, Kazem Taghva, Thomas A. Nartker, Jeffrey Coombs, Russell Beckley, Julie Borsack

Published in: DRR

We report on an attempt to build an automatic redaction system by applying information extraction techniques to the identification of private dates of birth. We conclude that automatic redaction is a promising concept although information extraction

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::0fad35152e2608b0b81b5cf56285bb04
https://doi.org/10.1117/12.643126

Determining the usefulness of manually assigned keywords for a vector space system

Authors: Julie Borsack, Thomas A. Nartker, Kazem Taghva, Allen Condit

Published in: ITCC

In this paper, we report on a series of experiments involving feedback and query expansion. We conclude that query expansion using manually-assigned keywords has no advantage over expansion using terms from the text of the document.

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::a3ac26ac1cf234ebdfad4fe1eccde45c
https://doi.org/10.1109/itcc.2002.1000394

Address extraction using hidden Markov models

Authors: Kazem Taghva, Jeffrey Coombs, Ray Pereda, Thomas A. Nartker

Published in: DRR

This paper presents the implementation and evaluation of a Hidden Markov Model to extract addresses from OCR text. Although Hidden Markov Models discover addresses with high precision and recall, this type of Information Extraction task seems to be a

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::47595143522e6ca288fa5f461045da81
https://doi.org/10.1117/12.587799

Information access in the presence of OCR errors

Authors: Kazem Taghva, Thomas A. Nartker, Julie Borsack

Published in: Proceedings of the 1st ACM workshop on Hardcopy document processing

Over the last 15 years, the Information Science Research Institute (ISRI) at the University of Nevada, Las Vegas (UNLV) has conducted information access research in the presence of OCR errors. Our research has focused on issues associated with the co

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::595e7135dd33482024cf4acc88d05a1b
https://doi.org/10.1145/1031442.1031443
