Image Captioning in Real Time

Author: Ankit Patil, Karishma Saudagar, Atul Maharnawar, Tejas Rangatwan, I. Priyadarshini
Language: English
Year of publication: 2022
Subject:
DOI: 10.5281/zenodo.6759891
Description: Recent developments in deep learning-based machine translation and computer vision have led to remarkably capable image captioning models. Although these models are highly accurate, they often depend on expensive computational hardware, which makes them difficult to apply in real-time scenarios, where their practical value is greatest. This work uses a hybrid CNN-RNN model: the CNN part of the system uses the Xception model for transfer learning to extract image features, while an RNN performs the language modeling. The Flickr8k dataset is used for training and testing. The RNN is implemented with an LSTM to avoid vanishing or exploding gradients during the training phase.
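The description outlines the architecture only at a high level. The following is a minimal sketch, in Keras/TensorFlow, of how an Xception-encoder / LSTM-decoder captioning model of this kind could be wired together; the layer sizes, vocabulary size, and maximum caption length are illustrative assumptions, not values reported by the authors.

    # Minimal sketch of a CNN-RNN captioning model (assumed Keras/TensorFlow setup).
    from tensorflow.keras.applications.xception import Xception
    from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
    from tensorflow.keras.models import Model

    vocab_size = 7579   # assumed Flickr8k vocabulary size after caption cleaning
    max_length = 34     # assumed maximum caption length in tokens

    # Encoder: Xception pretrained on ImageNet, used only as a feature extractor
    # (transfer learning); pooling='avg' yields a 2048-dim vector per image.
    feature_extractor = Xception(include_top=False, weights='imagenet', pooling='avg')

    # Decoder: the image feature vector and a partial caption are merged, and an
    # LSTM predicts the next word; LSTM gating mitigates vanishing/exploding gradients.
    image_input = Input(shape=(2048,))
    img_dense = Dense(256, activation='relu')(Dropout(0.5)(image_input))

    caption_input = Input(shape=(max_length,))
    emb = Embedding(vocab_size, 256, mask_zero=True)(caption_input)
    lstm_out = LSTM(256)(Dropout(0.5)(emb))

    merged = add([img_dense, lstm_out])
    output = Dense(vocab_size, activation='softmax')(Dense(256, activation='relu')(merged))

    caption_model = Model(inputs=[image_input, caption_input], outputs=output)
    caption_model.compile(loss='categorical_crossentropy', optimizer='adam')

At inference time such a model would be run word by word: the image features and the caption generated so far are fed in, the most probable next word is appended, and the loop repeats until an end-of-sequence token or max_length is reached.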
{"references":["Zhao, W., Wu, X., & Luo, J. (2020). Cross-domain image captioning via cross-modal retrieval and model adaptation. IEEE Transactions on Image Processing, 30, 1180-1192.","Huang, Y., Chen, J., Ouyang, W., Wan, W., & Xue, Y. (2020). Image captioning with end-to-end attribute detection and subsequent attributes prediction. IEEE Transactions on Image processing, 29, 4013-4026.","Yu, N., Hu, X., Song, B., Yang, J., & Zhang, J. (2018). Topic-oriented image captioning based on orderembedding. IEEE Transactions on Image Processing, 28(6), 2743-2754.","Lu, D., Whitehead, S., Huang, L., Ji, H., & Chang, S. F. (2018). Entityaware image caption generation. arXiv preprint arXiv:1804.07889.","Hossain, M. Z., Sohel, F., Shiratuddin, M. F., & Laga, H. (2019). A comprehensive survey of deep learning for image captioning. ACM Computing Surveys (CsUR), 51(6), 1- 36.","Albawi, S., Mohammed, T. A., & AlZawi, S. (2017, August). Understanding of a convolutional neural network. In 2017 international conference on engineering and technology (ICET) (pp. 1-6). IEEE.","Elamri, C., & de Planque, T. (2016). Automated neural image caption generator for visually impaired people. Department of Computer Science Stanford University, 28.","Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3156-3164).","Karpathy, A., & Fei-Fei, L. (2017). Deep visual-semantic alignments for generating image descriptions. Department of Computer Science.","Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3156-3164).","Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780."]}
Database: OpenAIRE