Context-aware Urdu Information Retrieval System

Authors: Umar Shoaib, Laiba Fiaz, Chinmay Chakraborty, Hafiz Tayyab Rauf
Publication year: 2023
Subject:
Source: ACM Transactions on Asian and Low-Resource Language Information Processing. 22:1-19
ISSN: 2375-4702
2375-4699
Description: The World Wide Web (WWW) plays a vital role in sharing dynamic knowledge in every field of life. Information on the web comprises a huge amount of data in different forms: structured, semi-structured, and in some cases entirely unstructured. Because of this volume, searching large bodies of text for a specific topic, or retrieving precise information, is a challenging task. All this leads to the problem of word sense ambiguity (WSA). An Urdu-language information retrieval system, built on techniques from Web Semantic Search Engine architecture, is proposed to retrieve relevant information efficiently and to address the WSA problem. For single-word queries, the proposed system achieves an average precision of 96%, compared with 74% for Bing and 75% for Google. For long text queries, it reaches 92% accuracy, outperforming well-known search engines such as Bing and Google, which reach 16.50% and 16%, respectively. Recall for single-word queries is 32.25%, compared with 25% for both Bing and Google. Recall for long text queries also improves, at 6.38% compared with 6.20% for Bing and 4.8% for Google. The results show that the proposed system gives better and more efficient results than existing systems for the Urdu language.
Database: OpenAIRE