Enabling search for facts and implied facts in historical documents

Authors:	Spencer Machado, Thomas L. Packer, Joseph Park, Stephen W. Liddle, David W. Embley, Andrew Zitzelberger, Nathan Tate, Deryle Lonsdale
Publication year:	2011
Subject:	Annotation; Information retrieval; Point (typography); Conceptualization; Computer science; Interface (Java); Process (engineering); Fact extraction; Semantic reasoner
Source:	HIP@ICDAR
DOI:	10.1145/2037342.2037353
Description:	Building a database of facts extracted from historical documents to enable database-like query and search would reduce the tedium of gleaning facts of interest from historical documents. We propose a solution in which historical documents themselves constitute the stored database. In our solution, we use information-extraction techniques to produce a conceptualized external annotation of facts found in each document, and we superimpose the conceptualization over the document collection. The annotation process populates the conceptualization, producing a repository of extracted facts, and a reasoner obtains inferred facts from these extracted facts. Our query interface accepts free-form queries and converts them to formal queries over the extracted and inferred facts. Displayed results include, in addition to standard query results, images of original documents with results highlighted, along with reasoning chains for inferred facts grounded in these highlighted facts. Along with giving the implementation status of our proof-of-concept prototype, we present results for extraction accuracy and efficiency and point to current and future work needed to enable a practical solution for the envisioned historical-document database.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::a80a77911080fbeaca3620d982692d4a https://doi.org/10.1145/2037342.2037353