Key semantics extraction by dependency tree mining
Authors: | Takahiro Ikeda, Hiroki Arimura, Yosuke Sakao, Satoshi Morinaga, Susumu Akamine |
---|---|
Publication year: | 2005 |
Subject: | Reduction (recursion theory); Dependency (UML); Phrase; Computer science; Semantics (computer science); Pattern recognition; Semantics; Tree (data structure); Text mining; Knowledge extraction; Redundancy (engineering); Artificial intelligence; Sentence; Natural language; Natural language processing |
Source: | KDD |
Description: | We propose a new text mining system that extracts characteristic contents from given documents. We define key semantics as characteristic substructures of the syntactic dependencies in the given documents, and consider three tasks: 1) Key semantics extraction: extracting characteristic syntactic dependency structures not only as ordered trees but also as unordered trees and free trees; 2) Redundancy reduction: deleting redundant dependency structures from the extraction result, such as substructures of, or structures equivalent to, other extracted structures; and 3) Phrase/sentence reconstruction: generating a natural-language phrase or sentence corresponding to each extracted structure. Our system combines natural language processing techniques with tree mining techniques. It consists of five units: 1) a syntactic dependency analysis unit, 2) input filters, 3) a characteristic ordered subtree extraction unit, 4) output filters, and 5) a phrase/sentence reconstruction unit. Although the third unit extracts ordered trees, the overall behavior of the system can be switched among the extraction of ordered trees, unordered trees, and free trees depending on which input filters are applied in the second unit. The output filters delete redundant trees from the extraction result for efficient knowledge discovery. Finally, phrases or sentences corresponding to the extracted subtrees are reconstructed using the input documents. We demonstrate the validity of our system with experimental results on real data collected at a help desk and on the TDT pilot corpus. |
Database: | OpenAIRE |
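
The interplay between the input filters, the ordered-subtree extraction unit, and the output filters described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the tree encoding, the function names (`canonicalize`, `complete_subtrees`, `frequent_subtrees`, `drop_redundant`), the restriction to complete subtrees, and the same-support redundancy rule are all assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: a tree is (label, children), children is a tuple of trees.


def canonicalize(tree):
    """Hypothetical input filter: sort siblings recursively so that trees
    differing only in sibling order become the same ordered tree."""
    label, children = tree
    return (label, tuple(sorted(canonicalize(c) for c in children)))


def complete_subtrees(tree):
    """Yield the complete subtree rooted at every node (a deliberately
    simple stand-in for the patterns a real tree miner enumerates)."""
    yield tree
    for child in tree[1]:
        yield from complete_subtrees(child)


def frequent_subtrees(trees, min_support, input_filter=None):
    """Count ordered complete subtrees by the number of trees containing
    them, optionally applying an input filter first."""
    counts = Counter()
    for tree in trees:
        if input_filter is not None:
            tree = input_filter(tree)
        # set(): each pattern counts at most once per input tree
        counts.update(set(complete_subtrees(tree)))
    return {p: n for p, n in counts.items() if n >= min_support}


def drop_redundant(patterns):
    """Hypothetical output filter: drop a pattern that is a proper complete
    subtree of another pattern with the same support (assumed rule)."""
    kept = {}
    for p, n in patterns.items():
        covered = any(
            q != p and patterns[q] == n and p in set(complete_subtrees(q))
            for q in patterns
        )
        if not covered:
            kept[p] = n
    return kept


# Two toy dependency trees that differ only in sibling order.
trees = [
    ("install", (("fails", ()), ("driver", ()))),
    ("install", (("driver", ()), ("fails", ()))),
]

ordered = frequent_subtrees(trees, min_support=2)
unordered = frequent_subtrees(trees, min_support=2, input_filter=canonicalize)
reduced = drop_redundant(unordered)
```

Because the two toy trees differ only in sibling order, the ordered counter alone finds just the two leaves as frequent, while applying the filter first also recovers the whole tree. The output filter then keeps only that largest pattern, since both leaves occur inside it with the same support.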