Towards an automatic extraction of synonyms for Quranic Arabic WordNet.

Author:	AlMaayah, Manal; Sawalha, Majdi; Abushariah, Mohammad
Subject:	ARABIC language; SYNONYMS; CORPORA; PARTS of speech
Source:	International Journal of Speech Technology; Jun 2016, Vol. 19 Issue 2, pp. 177-189, 13 pp.
Abstract:	In this paper, we develop a model for the automatic extraction of synonyms, which is used to construct our Quranic Arabic WordNet (QAWN) and which relies on traditional Arabic dictionaries. The work draws on three resources. First, the Boundary Annotated Quran Corpus, which contains the Quran words together with their part-of-speech, root and other related information. Second, lexicon resources that were used to collect a set of derived words for Quranic words. Third, traditional Arabic dictionaries, which were used to extract the meanings of words while distinguishing their different senses. The objective is to link Quranic words of similar meaning in order to generate synonym sets (synsets). To accomplish that, we used term frequency and inverse document frequency in a vector space model, and then computed cosine similarities between Quranic words based on the textual definitions extracted from traditional Arabic dictionaries. The words with the highest similarity were grouped together to form a synset. Our QAWN consists of 6918 synsets that were constructed from about 8400 unique word senses, with an average of 5 senses for each word. In our experimental evaluation, the average recall of the baseline system was 7.01 %, whereas the average recall with the QAWN was 34.13 %, an improvement of about 27 percentage points in the recall of semantic search for Quran concepts. [ABSTRACT FROM AUTHOR]
Database:	Complementary Index
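
The abstract describes weighting dictionary definitions with term frequency and inverse document frequency, comparing words by cosine similarity, and grouping the most similar words into synsets. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the toy English glosses, the whitespace tokenization, the names `tfidf_vectors`, `cosine` and `group_synsets`, the greedy grouping strategy, and the `threshold` value.

```python
import math
from collections import Counter

# Hypothetical toy glosses standing in for dictionary definitions.
definitions = {
    "word_a": "to walk quickly on foot",
    "word_b": "to walk on foot at a fast pace",
    "word_c": "a large body of salt water",
}


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict of term -> weight) per word."""
    tokenized = {word: text.split() for word, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many definitions each term occurs.
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for word, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[word] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_synsets(docs, threshold=0.1):
    """Greedy grouping (an assumption): each unassigned word seeds a group
    and attracts later unassigned words whose similarity to the seed
    reaches the threshold."""
    vectors = tfidf_vectors(docs)
    words = list(vectors)
    assigned = set()
    synsets = []
    for i, seed in enumerate(words):
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        for other in words[i + 1:]:
            if other in assigned:
                continue
            if cosine(vectors[seed], vectors[other]) >= threshold:
                group.append(other)
                assigned.add(other)
        synsets.append(group)
    return synsets


if __name__ == "__main__":
    print(group_synsets(definitions))
```

With these toy glosses, `word_a` and `word_b` share several definition terms and end up in one group, while `word_c` shares no terms with `word_a` and stays alone. A real system would need Arabic-aware tokenization and normalization, and the threshold would have to be tuned against evaluation data.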