Arabic reCAPTCHA Service for Enhancing Digitization of Arabic Manuscripts.

Authors: Abubaker, Hanin; Salah, Khaled; Al-Muhairi, Hassan; Bentiba, Ahmed
Source: Arabian Journal for Science & Engineering (Springer Science & Business Media B.V.); Aug 2017, Vol. 42, Issue 8, p. 3391-3408, 18 p.
Abstract: reCAPTCHA is a security measure that guards web applications against automated bot abuse by presenting users with a random, auto-generated challenge to solve. These challenges must be hard for computers to solve yet easy for humans. In this paper, we present a cloud-based Arabic reCAPTCHA service that protects Arabic websites against automated abuse. In addition, the proposed service is designed to improve the accuracy of digitizing printed Arabic manuscripts compared with traditional digitization using optical character recognition (OCR) software. The architectural design, algorithms, implementation and deployment guidelines presented in this paper are not limited to Arabic; they can serve as the basis for a reCAPTCHA service in any other language. The paper discusses the need for an Arabic reCAPTCHA service and then presents an original system architecture, design and implementation. We also propose solutions and algorithms for a number of design and implementation challenges. First, we devise a scheme to properly extract word images from scanned pages to form reCAPTCHA challenges. Second, we propose a mechanism for classifying the extracted word images into known and unknown word sets. Third, we explore and propose two algorithms that process a user's answer to a reCAPTCHA challenge, preparing the service response for user verification while storing the user's guess for the digitization process. Fourth, we present a solution for maintaining data integrity while handling multiple concurrent user requests for reCAPTCHA challenges. Moreover, we show how the different components and subservices of the proposed Arabic reCAPTCHA system can be deployed on a public cloud such as Amazon Web Services. Finally, we conduct an experimental study to validate the efficacy of the service.
The study shows that overall digitization accuracies of 97.67% and 96.73% were attained in two experimental setups, and that 72.2% of participants preferred solving Arabic reCAPTCHA challenges over English reCAPTCHA challenges on Arabic websites. [ABSTRACT FROM AUTHOR]
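The abstract does not spell out the two input-processing algorithms. As an illustration only, the sketch below shows the generic reCAPTCHA scheme that a known/unknown word split implies: each challenge pairs a known (control) word with an unknown word, the user passes if the control word is transcribed correctly, and only then is the guess for the unknown word counted as a vote toward its digitized text. The names RecaptchaStore, Challenge and votes_needed, the whitespace-only normalization, the vote threshold and the single lock used to keep concurrent submissions consistent are all hypothetical assumptions, not the authors' design.

```python
from __future__ import annotations

import threading
from collections import Counter, defaultdict
from dataclasses import dataclass


def normalize(text: str) -> str:
    # Hypothetical normalization: trim and collapse internal whitespace only.
    # A real Arabic service would likely also handle diacritics and letter variants.
    return " ".join(text.split())


@dataclass
class Challenge:
    known_id: str    # word image whose transcription is already known (control word)
    unknown_id: str  # word image that OCR could not read reliably


class RecaptchaStore:
    """Hypothetical in-memory store; the paper's storage and algorithms may differ."""

    def __init__(self, answers: dict[str, str], votes_needed: int = 3):
        self.answers = dict(answers)        # word id -> accepted transcription
        self.votes = defaultdict(Counter)   # unknown word id -> counts per guess
        self.votes_needed = votes_needed    # assumed agreement threshold
        self.lock = threading.Lock()        # serializes concurrent submissions

    def submit(self, ch: Challenge, known_guess: str, unknown_guess: str) -> bool:
        """Verify the user on the control word and record the unknown-word guess."""
        with self.lock:
            passed = normalize(known_guess) == self.answers.get(ch.known_id)
            if passed:
                # Only users who solved the control word contribute a vote.
                self.votes[ch.unknown_id][normalize(unknown_guess)] += 1
                word, count = self.votes[ch.unknown_id].most_common(1)[0]
                if count >= self.votes_needed:
                    # Enough agreement: promote the word to the known set.
                    self.answers[ch.unknown_id] = word
            return passed
```

For example, with votes_needed=2, two users who both solve the control word "كتاب" and type "قلم" for the unknown word cause that word to be accepted as "قلم", while a user who fails the control word is rejected and their guess is discarded. Holding one lock for the whole check-and-vote step is the simplest way to keep vote counts consistent under concurrent requests; a deployed service would more likely rely on atomic updates in its database.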
Database: Complementary Index