Automatic taxonomy extraction for specialized domains using distributional semantics

Authors:	Leo Wanner, Jorge Vivaldi, Rogelio Nazar
Year of publication:	2012
Subject:	Syntagmatic analysis; Exploit; Computer science; Terminology extraction; business.industry; Communication; Library and Information Sciences; computer.software_genre; Language and Linguistics; Quantitative linguistics; Rule-based machine translation; Taxonomy (general); Artificial intelligence; Distributional semantics; business; computer; Statistic; Natural language processing
Source:	Terminology. 18:188-225
ISSN:	1569-9994 0929-9971
DOI:	10.1075/term.18.2.03naz
Description:	This article explores a statistical, language-independent methodology for the construction of taxonomies of specialized domains from noisy corpora. In contrast to proposals that exploit linguistic information by searching for lexico-syntactic patterns that tend to express the hypernymy relation, our methodology relies entirely upon the distributional semantics of terms as captured by their lexical co-occurrence in large-scale corpora. In a first stage, we analyze the syntagmatic relations of the terms that serve as seeds of the taxonomy to be constructed, thereby obtaining a first batch of hypernym candidate terms for our seed terms. In a second stage, we analyze the paradigmatic relations of the terms by inspecting which terms show a prominent frequency of co-occurrence with the terms that, as found in the previous stage, are syntagmatically related to our seed terms. This allows us to refine the first batch of hypernym candidate terms and to obtain new ones. In a third and final stage, we build a taxonomy from the obtained hypernym candidate lists, exploiting the asymmetric statistical association between terms that is characteristic of the hypernymy relation.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::6b27026a3da426a22056e8d94f56103e https://doi.org/10.1075/term.18.2.03naz
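
The third stage in the description relies on an asymmetric statistical association between a term and its hypernym. The record does not say which measure the article uses, so the sketch below is only an illustration under stated assumptions: it stands in a simple comparison of conditional co-occurrence probabilities, on the intuition that a specific term tends to appear with its general term more often than the reverse. The function names, the `min_ratio` threshold, and the toy contexts are all hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count the contexts each term occurs in, and the contexts each
    unordered pair of terms shares."""
    term_freq = Counter()
    pair_freq = Counter()
    for context in contexts:
        terms = set(context)  # count each term once per context
        term_freq.update(terms)
        pair_freq.update(frozenset(p) for p in combinations(sorted(terms), 2))
    return term_freq, pair_freq


def hypernym_edges(contexts, min_ratio=1.5):
    """Return (hyponym, hypernym) pairs, sorted alphabetically.

    Assumption (not from the article): x is proposed as a hyponym of y
    when P(y | x) is at least min_ratio times P(x | y).
    """
    term_freq, pair_freq = cooccurrence_counts(contexts)
    edges = []
    for pair, joint in pair_freq.items():
        a, b = sorted(pair)
        p_b_given_a = joint / term_freq[a]
        p_a_given_b = joint / term_freq[b]
        if p_b_given_a >= min_ratio * p_a_given_b:
            edges.append((a, b))  # a implies b more strongly than b implies a
        elif p_a_given_b >= min_ratio * p_b_given_a:
            edges.append((b, a))
    return sorted(edges)
```

Called on a list of contexts, each a list of terms (for example sentences or documents reduced to their domain terms), `hypernym_edges` returns candidate edges that could then be assembled into a tree. The candidate-generation stages (syntagmatic and paradigmatic analysis) are not modeled here.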