Abstract: |
Text data has been growing rapidly in recent years because of digitalization. With the Internet flooded by millions of new documents every day, manual text processing has become impractical: it scales poorly and is error-prone. Many machine learning algorithms cannot interpret raw text in its original form, as they require numerical inputs to accomplish any task (say, classification or regression). A better way to represent text is therefore needed so that computers can understand and process it efficiently and effectively. Word embedding is one such technique. Word embedding, the encoding of words as vectors, has received much interest as a feature learning technique for natural language processing in recent times. This review presents a structured way of understanding and working with word embeddings. Many researchers who are not experts in text processing techniques do not know where to start their exploration due to a lack of comprehensive material. This review provides an overview of several word embedding strategies and the complete working procedure of word2vec, from both theoretical and mathematical perspectives, giving researchers the detailed information they need to begin their own work quickly. Research results of standard word embedding techniques are also included to show how word embeddings have improved from earlier years to the most recent findings. |