Word Embedding Methods for Word Representation in Deep Learning for Natural Language Processing

Authors: Md. Anwar Hussen Wadud, M. F. Mridha, Mohammad Motiur Rahman
Year of publication: 2022
Source: Iraqi Journal of Science, 63(3): 1349-1361
ISSN: 2312-1637; 0067-2904
DOI: 10.24996/ijs.2022.63.3.37
Description: Natural Language Processing (NLP) deals with analysing, understanding, and generating language the way humans do. One of the challenges of NLP is training computers to learn and use a language as humans do. Every training session consists of several types of sentences with different contexts and linguistic structures. The meaning of a sentence depends on the actual meanings of its main words in their correct positions; the same word can act as a noun, an adjective, or another part of speech depending on where it appears. In NLP, word embedding is a powerful method that is trained on a large collection of texts and encodes general semantic and syntactic information about words, and choosing the right word embedding yields more efficient results than the alternatives. Most prior work uses pretrained word embedding vectors in deep learning for NLP tasks, but the major issue with pretrained vectors is that they cannot be used for all types of NLP processing. In this paper, a local word embedding vector formation process is proposed, and a comparison between pretrained and local word embedding vectors is presented for the Bengali language. The Keras framework in Python is used to implement the local word embedding, and the analysis section of this paper shows that the proposed model achieves 87.84% accuracy, better than the 86.75% accuracy of fastText pretrained word embedding vectors. Using the proposed method, NLP researchers working on Bengali can easily build task-specific word embedding vectors for word representation in Natural Language Processing.
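The description contrasts a locally trained Keras embedding with frozen fastText pretrained vectors. Below is a minimal sketch of that comparison, assuming a generic binary text-classification setup; the placeholder corpus, tokenizer settings, embedding dimensions, pooling layer, and the fastText file name cc.bn.300.vec are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: local (task-trained) embedding vs. frozen fastText initialisation.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

texts = ["example sentence one", "another example sentence"]  # placeholder corpus
labels = np.array([0, 1])                                     # placeholder labels

tokenizer = Tokenizer()          # build a vocabulary from the local corpus
tokenizer.fit_on_texts(texts)
vocab_size = len(tokenizer.word_index) + 1
x = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=50)

# (a) Local word embedding: the Embedding layer starts from random weights
#     and is trained together with the classifier on the task corpus.
local_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=100, input_length=50),
    GlobalAveragePooling1D(),
    Dense(1, activation="sigmoid"),
])
local_model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=["accuracy"])
local_model.fit(x, labels, epochs=5)

# (b) Pretrained baseline: initialise the same layer from a fastText .vec
#     file (the file name is an assumption) and freeze it during training.
def load_fasttext(path, dim=300):
    """Map each in-vocabulary fastText word to a row of the weight matrix."""
    matrix = np.zeros((vocab_size, dim))
    with open(path, encoding="utf-8") as f:
        next(f)                               # skip the "count dim" header
        for line in f:
            parts = line.rstrip().split(" ")
            idx = tokenizer.word_index.get(parts[0])
            if idx is not None:
                matrix[idx] = np.asarray(parts[1:dim + 1], dtype="float32")
    return matrix

pretrained_layer = Embedding(vocab_size, 300, input_length=50,
                             weights=[load_fasttext("cc.bn.300.vec")],
                             trainable=False)
```

The pretrained layer would be dropped into the same model in place of the trainable Embedding; on the paper's Bengali data, the locally trained embedding reaches 87.84% accuracy against 86.75% for the fastText baseline.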
Database: OpenAIRE