DAG Based Feature Additive XML Schema Generation for Unstructured Text

Authors:	S. Sudha, K. Rajbabu
Year of publication:	2013
Subject:	Document Structure Description; Information extraction; Information retrieval; Computer science; Feature extraction; Graph (abstract data type); Graphical model; Directed graph; Security token; XML
Source:	CyberC
Description:	Recent work on unstructured text uses multilevel filtering to identify the key terms in documents and then applies mining techniques to extract the necessary information. Although these techniques are efficient for information retrieval, they cannot be applied directly to information extraction from context-critical documents, and their accuracy cannot be relied on there. Moreover, the data-critical applications now emerging around unstructured documents cannot tolerate the loss of hidden and significant information. The paper therefore proposes reorganizing the unstructured textual model into a feature-enriched structured graphical model by adding spatial, logical, lexical, syntactic, and semantic features. The generated graph depicts relationships across the document at every level, from the micro-level token to the macro-level document. A structural pattern identification algorithm for generating an XML schema from this graph is also proposed. Experimental results on a real-time dataset are presented.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::54eddcac1140c2d7b8f48b0a9c0314da https://doi.org/10.1109/cyberc.2013.27
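
The description above outlines two steps: turning text into a feature-annotated graph that spans token to document level, and deriving an XML schema from the structural patterns in that graph. The record gives no algorithmic detail, so the following is only a minimal sketch under stated assumptions, not the authors' method. The function names `build_graph` and `schema_from_graph`, the three levels (document, sentence, token), the regex-based splitting, and the handful of features are all hypothetical choices made for illustration.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def build_graph(text):
    """Build a document -> sentence -> token graph (hypothetical structure).

    Returns (nodes, edges): nodes maps an integer id to a feature dict,
    edges is a list of (parent_id, child_id) pairs. Ids grow from parent
    to child, so the edge set is acyclic.
    """
    nodes = {0: {"level": "document"}}
    edges = []
    next_id = 1
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for s_idx, sentence in enumerate(sentences):
        s_id = next_id
        next_id += 1
        nodes[s_id] = {"level": "sentence", "position": s_idx}  # spatial feature
        edges.append((0, s_id))
        for t_idx, tok in enumerate(re.findall(r"\w+", sentence)):
            t_id = next_id
            next_id += 1
            nodes[t_id] = {
                "level": "token",
                "position": t_idx,                 # spatial feature
                "lexeme": tok.lower(),             # lexical feature
                "capitalized": tok[0].isupper(),   # crude surface cue
            }
            edges.append((s_id, t_id))
    return nodes, edges


def schema_from_graph(nodes, edges):
    """Emit a small XSD string from the parent/child level patterns."""
    levels = []      # node levels in order of first appearance
    children = {}    # parent level -> ordered list of child levels
    for feats in nodes.values():
        if feats["level"] not in levels:
            levels.append(feats["level"])
    for parent, child in edges:
        kids = children.setdefault(nodes[parent]["level"], [])
        if nodes[child]["level"] not in kids:
            kids.append(nodes[child]["level"])

    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    for level in levels:
        el = ET.SubElement(schema, f"{{{XS}}}element", name=level)
        if level in children:
            ctype = ET.SubElement(el, f"{{{XS}}}complexType")
            seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
            for kid in children[level]:
                ET.SubElement(seq, f"{{{XS}}}element", ref=kid, maxOccurs="unbounded")
        else:
            el.set("type", "xs:string")  # leaf levels hold plain text
    return ET.tostring(schema, encoding="unicode")


nodes, edges = build_graph("Alice met Bob. They talked.")
xsd = schema_from_graph(nodes, edges)
```

Here every repeated parent/child level pair collapses into a single schema rule, which is the simplest possible reading of "structural pattern identification"; a fuller treatment would also use the logical, syntactic, and semantic features that the description mentions.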