Improving User Intent Detection in Urdu Web Queries with Capsule Net Architectures

Authors: Sana Shams, Muhammad Aslam
Language: English
Year of publication: 2022
Subject:
Source: Applied Sciences, Vol 12, Iss 22, p 11861 (2022)
Document type: article
ISSN: 2076-3417
DOI: 10.3390/app122211861
Description: Detecting the communicative intent behind user queries is critical for search engines, which must understand a user’s search goal to retrieve the desired results. As web searching in local languages increases, there is an emerging need to support language understanding for languages other than English. This article presents a capsule neural network architecture for intent detection from search queries in Urdu, a widely spoken South Asian language. The proposed two-tiered capsule network uses LSTM cells and an iterative routing mechanism between the capsules to discriminate between diversely expressed search intents. Because no Urdu query dataset was available, a benchmark intent-annotated dataset of 11,751 queries was developed, covering 11 query domains and annotated with Broder’s intent taxonomy (i.e., navigational, transactional and informational intents). In the reported experiments, the proposed model attained a state-of-the-art accuracy of 91.12%, significantly improving upon several alternative classification techniques and strong baselines. An error analysis revealed systematic error patterns attributable to class imbalance and large lexical variability in Urdu web queries.
Database: Directory of Open Access Journals