Features based approach for indexation and representation of unstructured Arabic documents

Authors:	Mohamed Salim El Bazzi, Abdelatif Ennaji, Taher Zaki, Driss Mammass
Publication year:	2017
Subject:	Physics and Astronomy (miscellaneous); Computer science; Arabic; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; computer.software_genre; lcsh:Technology; Management of Technology and Innovation; Keyphrases; lcsh:Science; Engineering (miscellaneous); Indexation; lcsh:T; business.industry; Arabic text mining; Representation (systemics); Classification; language.human_language; Unstructured documents; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; language; lcsh:Q; Artificial intelligence; business; computer; Natural language processing
Source:	Advances in Science, Technology and Engineering Systems, Vol 2, Iss 3, Pp 900-905 (2017)
ISSN:	2415-6698
DOI:	10.25046/aj0203112
Description:	The growing volume of Arabic-language text published on the internet and held by public libraries and administrations calls for effective techniques for extracting relevant information from large text corpora. The purpose of indexing is to create a document representation that makes it easy to find and identify the relevant information in a set of documents. However, mining textual data is becoming a complicated task, especially when semantics are taken into account. In this paper, we present an indexing system based on a contextual representation that takes advantage of the semantic links in a document. Our approach is based on the extraction of keyphrases: each document is represented by its relevant keyphrases rather than by simple keywords. The experimental results confirm the effectiveness of our approach.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::65375c8dd20a15d32ff1422cd0ef596a https://doi.org/10.25046/aj0203112