Automatic Synset Extraction from text documents using a Graph-Based Clustering Approach

Authors:	Mahsa Khorasani, Behrouz Minaei-Bidgoli, Chakaveh Saedi
Language:	English
Publication year:	2019
Subject:	automatic synset extraction; semantic relation; graph-based clustering; CBC clustering; Persian; Information technology (T58.5-58.64); Telecommunication (TK5101-6720); Electronic computers. Computer science (QA75.5-76.95)
Source:	International Journal of Information and Communication Technology Research, Vol 11, Iss 1, Pp 27-35 (2019)
Document type:	article
ISSN:	2251-6107 2783-4425
Description:	Semantic relations between words, such as synsets, are used in automatic ontology construction, which is a powerful tool in many NLP tasks. Synset extraction usually depends on other languages and resources, using techniques such as mapping or translation. In the proposed method, synsets are extracted solely from text corpora, which removes the need for special resources such as WordNets or dictionaries. Words in the corpus are represented with a vector space model, and the most similar words to each word are extracted based on common features count (CFC) using a modified cosine similarity measure. A graph-based soft clustering approach is then applied to create clusters of synonymous words. To evaluate the proposed method, the extracted synsets were compared with other Persian semantic resources. The results show an accuracy of 80.25%, an improvement over the 69.5% accuracy of the pure clustering by committee (CBC) method.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/c1c8a340c8d34fd38e18d33f5d7cafbd
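
The pipeline outlined in the description (vector space representation, similarity based on shared features, then overlapping clusters on a word graph) can be illustrated with a rough sketch. Everything below is an assumption made for illustration: the record does not define the paper's modified cosine measure or its soft clustering algorithm, so plain cosine similarity gated by a common-feature count stands in for the former, and maximal-clique enumeration (Bron-Kerbosch) stands in for the latter. The function names, the context window, and the thresholds are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def common_feature_count(u, v):
    """Number of context features the two vectors share."""
    return len(u.keys() & v.keys())


def cosine(u, v):
    """Plain cosine similarity (NOT the paper's modified measure)."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(vectors, min_common=2, min_cosine=0.5):
    """Link two words when they share enough features and are similar enough.

    Both thresholds are illustrative values, not taken from the paper.
    """
    words = sorted(vectors)
    graph = {w: set() for w in words}
    for idx, a in enumerate(words):
        for b in words[idx + 1:]:
            if (common_feature_count(vectors[a], vectors[b]) >= min_common
                    and cosine(vectors[a], vectors[b]) >= min_cosine):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques of size > 1 (basic Bron-Kerbosch).

    Cliques may overlap, so one word can belong to several candidate
    synsets; this is a stand-in for the paper's soft clustering step.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) > 1:
                cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques
```

Given tokenized sentences, `maximal_cliques(similarity_graph(context_vectors(sentences)))` would yield overlapping groups of words as candidate synsets. Clique enumeration is exponential in the worst case, so a sketch like this only suits small vocabularies.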