Machine Learning based approach to Image Description for the Visually Impaired

Author: Anisa Fathima, Raghunandan Srinath, S. Arpitha, T. Kavya, Chaitanya S Rao, Javavrinda Vrindavanam
Year of publication: 2021
Source: 2021 Asian Conference on Innovation in Technology (ASIANCON).
Description: The paper, supported by a review of the literature on image description technologies, introduces an alternative approach that can automatically generate audio descriptions of images, which can greatly support the visually impaired. The need for the paper arises from the reality that the interaction points available to the visually impaired are becoming constrained in an increasingly digitised environment, and accessing digital media through an image describer can be an enabler for them. Images unseen by the visually impaired are processed, suitable descriptions are generated, and the text is converted to a voice output. In contrast to standard methods based on conventional computer vision and Convolutional Neural Networks (CNN), the paper uses an Inception-ResNet-v2 model as the feature extractor and a GRU-RNN decoder with the Bahdanau attention model to generate the text description of the image, which is finally converted to audio using the Google Text-to-Speech converter. The results were found to be more accurate and can accordingly support access to digital media for the visually impaired.
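The core of the pipeline described above is the Bahdanau (additive) attention step, which lets the GRU decoder weight the encoder's image features at each decoding step. As a minimal illustrative sketch (not the authors' implementation), the additive scoring and context computation can be written in plain NumPy; the shapes, weight matrices `W1`, `W2`, and vector `v` are assumptions standing in for learned parameters:

```python
import numpy as np

def bahdanau_attention(query, values, W1, W2, v):
    """Additive (Bahdanau) attention over encoder features.

    query:  (d_q,)   current decoder hidden state
    values: (T, d_v) encoder feature vectors (e.g. image regions)
    W1:     (d_a, d_q), W2: (d_a, d_v), v: (d_a,) learned parameters
    Returns the context vector (d_v,) and attention weights (T,).
    """
    # score_i = v^T tanh(W1 q + W2 h_i)  -- additive scoring
    scores = np.tanh(W1 @ query + values @ W2.T) @ v      # (T,)
    # softmax over time steps (numerically stabilised)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # context = attention-weighted sum of encoder features
    context = weights @ values                            # (d_v,)
    return context, weights

# Toy usage with random parameters standing in for trained weights
rng = np.random.default_rng(0)
T, d_q, d_v, d_a = 5, 4, 6, 3
query = rng.normal(size=d_q)
values = rng.normal(size=(T, d_v))
W1 = rng.normal(size=(d_a, d_q))
W2 = rng.normal(size=(d_a, d_v))
v = rng.normal(size=d_a)
context, weights = bahdanau_attention(query, values, W1, W2, v)
```

In the full system, `values` would come from the Inception-ResNet-v2 feature maps and `query` from the GRU decoder state, with the context vector fed into the next decoding step; the softmax guarantees the attention weights are non-negative and sum to one.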
Database: OpenAIRE