A Neural Language Model for Multi-Dimensional Textual Data based on CNN-LSTM Network
Authors: | Jin-Hee Song, Yanggon Kim, Seongik Park |
---|---|
Year of publication: | 2018 |
Subject: | Vanishing gradient problem, Perplexity, Artificial neural network, Computer science, Speech recognition, Deep learning, Electrical & electronic engineering, Engineering and technology, Convolutional neural network, Recurrent neural network, Information systems, Language model, Artificial intelligence, Natural language |
Source: | SNPD |
DOI: | 10.1109/snpd.2018.8441130 |
Description: | Language Modeling (LM) is a subtask of Natural Language Processing (NLP). The goal of LM is to build a statistical language model that learns and estimates a probability distribution over sentences (sequences of terms) in natural language. Recently, many language models based on recurrent neural networks (RNNs), a type of deep neural network designed for sequential data, have been proposed and have achieved remarkable results. However, these models rely only on analysis of the words that occur in a sentence, even though every sentence carries useful morphological information, such as Part-of-Speech (POS) tags, which are necessary for constituting a sentence and can be used as features for analysis. Although morphological information can be useful for LM, feeding it as input to a neural network based LM is not straightforward: inserting the features between words in a one-dimensional array increases the number of time steps of the recurrent neural network and can therefore cause the vanishing gradient problem. To solve this problem, we propose a CNN-LSTM based language model that treats textual data as multi-dimensional input to the network. To train a Long Short-Term Memory (LSTM) network on this multi-dimensional input, we use a convolutional neural network (CNN) with a 1×1 filter that reduces the dimensionality of the input data, which avoids the vanishing gradient problem by reducing the number of time steps between input words. In addition, because the multi-dimensional data is reduced by the CNN, our approach can be used as a plug-in for many customized LSTM-based LMs. On the Penn Treebank corpus, our model improves perplexity for both vanilla LSTM and customized LSTM models. |
Database: | OpenAIRE |
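
The abstract's central idea is that a 1×1 convolution can merge per-word features without adding recurrent time steps. The minimal NumPy sketch below illustrates that idea. It is not the authors' implementation. Everything in it is an assumption made for illustration: the sizes `T`, `D`, `C`, the random stand-in embeddings, the hypothetical helper `conv1x1`, and the choice to stack word and POS-tag embeddings of equal width as two channels. The sketch compares two options. Option A feeds POS tags as extra tokens, which doubles the sequence length the LSTM must unroll over. Option B stacks the tags as channels and reduces them at each position.

```python
import numpy as np

# Hypothetical sizes (not taken from the paper): 35 words per sequence,
# 200-dimensional embeddings, 2 feature channels (word + POS tag).
T, D, C = 35, 200, 2
rng = np.random.default_rng(0)

word_emb = rng.normal(size=(T, D))  # stand-in for learned word embeddings
pos_emb = rng.normal(size=(T, D))   # stand-in for learned POS-tag embeddings


def conv1x1(x, weight, bias):
    """Hypothetical helper: a 1x1 convolution over the channel axis.

    x:      (T, D, C_in)   time steps x embedding dims x channels
    weight: (C_in, C_out)  one scalar weight per (input, output) channel pair
    bias:   (C_out,)
    returns (T, D, C_out)

    A 1x1 filter only mixes channels at each (t, d) position, so it
    never changes the number of time steps T.
    """
    return np.einsum("tdc,co->tdo", x, weight) + bias


# Option A (one-dimensional): interleave words and POS tags as separate
# tokens, which doubles the number of time steps the LSTM must process.
interleaved = np.empty((2 * T, D))
interleaved[0::2] = word_emb
interleaved[1::2] = pos_emb

# Option B (multi-dimensional): stack the features as channels and reduce
# them to a single channel with a 1x1 convolution.
stacked = np.stack([word_emb, pos_emb], axis=-1)  # shape (T, D, C)
weight = rng.normal(size=(C, 1))                  # random, untrained weights
bias = np.zeros(1)
reduced = conv1x1(stacked, weight, bias)[..., 0]  # shape (T, D)

print("interleaved LSTM input:", interleaved.shape)  # (70, 200)
print("1x1-reduced LSTM input:", reduced.shape)      # (35, 200)
```

In this sketch, the 1×1 filter acts independently at each position, so it amounts to a learned weighted sum of the feature channels. As a result, the LSTM would see a sequence of length `T` rather than `2T`. The abstract relies on this property to limit the vanishing gradient problem. In a real model the weights would be trained jointly with the LSTM rather than drawn at random.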