WikiAutoCat: Information Retrieval System for Automatic Categorization of Wikipedia Articles
Authors: | Elsayed E. Hemayed, Nesma Refaei, Riham Mansour |
---|---|
Publication year: | 2018 |
Subject: |
Structure (mathematical logic); Hierarchy; Multidisciplinary; Information retrieval; Process (engineering); Computer science; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Task (project management); Set (abstract data type); Categorization; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Benchmark (computing); 020201 artificial intelligence & image processing; 0105 earth and related environmental sciences |
Source: | Arabian Journal for Science and Engineering. 43:8095-8109 |
ISSN: | 2191-4281 2193-567X |
DOI: | 10.1007/s13369-018-3244-9 |
Description: | Document categorization has become a crucial task for organizing the massive amount of data on the web. Many web repositories classify their articles into hierarchies of topics, a structure that makes it easier to connect related topics and find articles. Wikipedia organizes its articles in a category hierarchy, but so far the categorization is done manually by human editors, which is a confusing, tiring, and time-consuming task. In this work we propose WikiAutoCat, a system for automatic categorization of Wikipedia articles. It is an information retrieval system that suggests the most relevant set of categories to the article editor to simplify the categorization process. Empirical evaluation demonstrates that the system is scalable enough to categorize a dataset of this size, and that it improves accuracy over the state of the art in Wikipedia categorization by 41.65% over the WikiCat-Word system and by 26.83% over the WikiCat-Link system. It was also evaluated on a benchmark dataset, where it improved accuracy over the baseline by 8.1%. |
Database: | OpenAIRE |
External link: |