Mining Semantics Structures from Syntactic Structures in Web Document Corpora

Autor: Shi Gao, Markus Iseli, Carlo Zaniolo, Deirdre Kerr, Hamid Mousavi
Rok vydání: 2014
Předmět:
Zdroj: International Journal of Semantic Computing. :461-489
ISSN: 1793-7108
1793-351X
Popis: The Web is making possible many advanced text-mining applications, such as news summarization, essay grading, question answering, semantic search and structured queries on corpora of Web documents. For many of such applications, statistical text-mining techniques are of limited effectiveness since they do not utilize the morphological structure of the text. On the other hand, many approaches use NLP-based techniques that parse the text into parse trees, and then use patterns to mine and analyze parse trees which are often unnecessarily complex. To reduce this complexity and ease the entire process of text mining, we propose a weighted-graph representation of text, called TextGraphs, which captures the grammatical and semantic relations between words and terms in the text. TextGraphs are generated using a new text mining framework which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract from parse trees candidate terms and their grammatical relations. Moreover, SemScape resolves coreferences by a novel technique, generates domain-specific TextGraphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining TextGraphs.
Databáze: OpenAIRE