Popis: |
Word embeddings are widely used in natural language processing (NLP) tasks, especially sentiment classification. Because training new embeddings from scratch is computationally expensive, pretrained word embeddings such as Word2Vec and GloVe have become popular as a source of reusable word vectors. The inadequacy of a single embedding has motivated newer techniques that combine two embeddings. However, the combined embeddings proposed in existing work are static and offer only a one-size-fits-all solution, regardless of the problem and dataset at hand. Optimization is a more promising way to overcome the limitations of these simplistic combination techniques, because it yields solutions that are unique to, and optimal for, the specific problem and dataset. In this paper, a new genetic-algorithm-based combinatorial optimization method called Evolutionary Combinatorial Optimization for Word Embedding (ECOWE) is proposed to produce combinations of word embeddings that yield optimal accuracy on the specific sentiment classification dataset being used. Across all datasets, ECOWE improves on the benchmark model accuracy by 1.7 to 12.9 percentage points in absolute terms (about 5.5 points on average) and by 2.4% to 19.5% in relative terms (about 8.1% on average). A two-tailed Wilcoxon signed-rank test at the 5% significance level (z = -2.2014) shows that the ECOWE accuracy values across all datasets are statistically significant compared with the benchmark accuracy values. |
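The reported z-score comes from the normal approximation of the Wilcoxon signed-rank test on paired accuracies (ECOWE vs. benchmark, one pair per dataset). The minimal sketch below shows that computation under stated assumptions. It uses T = min(W+, W-) as the statistic, drops zero differences, gives tied |differences| their average rank, and applies no tie or continuity correction. The function name `wilcoxon_z` and the accuracy values are hypothetical and are not the paper's data. As a worked case, if one system beat the other on all n = 6 pairs, then T = 0, the mean is n(n+1)/4 = 10.5, the variance is n(n+1)(2n+1)/24 = 22.75, and z = -10.5/sqrt(22.75), which is about -2.2014. That is numerically consistent with the reported value, although the abstract does not state the per-dataset results.

```python
import math


def wilcoxon_z(ours, baseline):
    """Normal-approximation z-score for the two-tailed Wilcoxon signed-rank test.

    Hypothetical helper (assumption): uses T = min(W+, W-), drops zero
    differences, averages ranks over tied |differences|, and applies no
    tie or continuity correction to the variance.
    """
    diffs = [a - b for a, b in zip(ours, baseline) if a != b]
    n = len(diffs)
    # Sort indices by absolute difference so ranks can be assigned in order.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of equal |differences| (a tie group).
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    t = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean) / sd


# Hypothetical accuracies on six datasets (illustrative only, NOT the paper's data).
proposed = [0.71, 0.82, 0.93, 0.64, 0.75, 0.86]
benchmark = [0.70, 0.80, 0.90, 0.60, 0.70, 0.80]
print(round(wilcoxon_z(proposed, benchmark), 4))  # -2.2014
```

Because T is the smaller of the two rank sums, z is non-positive. This is why a negative z-score such as the one reported can still indicate that the proposed model is the better one. For small n, an exact test (for example, `scipy.stats.wilcoxon`) is usually preferable to this approximation.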