Mining Semantics Structures from Syntactic Structures in Web Document Corpora
Author: | Shi Gao, Markus Iseli, Carlo Zaniolo, Deirdre Kerr, Hamid Mousavi |
---|---|
Year of publication: | 2014 |
Subject: |
Linguistics and Language; Information retrieval; Parsing; Computer Networks and Communications; Computer science; Semantic search; Query language; Automatic summarization; Computer Science Applications; Text mining; Artificial intelligence; Question answering; Web document; Software; Natural language processing; Sentence; Information Systems |
Source: | International Journal of Semantic Computing. :461-489 |
ISSN: | 1793-7108 1793-351X |
Description: | The Web is making possible many advanced text-mining applications, such as news summarization, essay grading, question answering, semantic search and structured queries on corpora of Web documents. For many such applications, statistical text-mining techniques are of limited effectiveness because they do not utilize the morphological structure of the text. On the other hand, many approaches use NLP-based techniques that parse the text into parse trees and then use patterns to mine and analyze the parse trees, which are often unnecessarily complex. To reduce this complexity and ease the entire process of text mining, we propose a weighted-graph representation of text, called TextGraphs, which captures the grammatical and semantic relations between words and terms in the text. TextGraphs are generated using a new text-mining framework, which is the main focus of this paper. Our framework, SemScape, uses a statistical parser to generate a few of the most probable parse trees for each sentence and employs a novel two-step pattern-based technique to extract candidate terms and their grammatical relations from the parse trees. Moreover, SemScape resolves coreferences by a novel technique, generates domain-specific TextGraphs by consulting ontologies, and provides a SPARQL-like query language and an optimized engine for semantically querying and mining TextGraphs. |
Database: | OpenAIRE |
External link: |