Popis: |
Document classification is a general task in natural language processing and information retrieval. Compared with the traditional methods for this task, our method improves performance, convergence, and enrichment of information. We propose a hybrid neural language modelling architecture that constructs hierarchical feature representations, and we evaluate it on document classification. Our first model begins with a character-level convolutional neural network (CNN) layer that produces word-level representations. Next, a recurrent neural network (RNN) layer with attention-based feature merging produces sentence-level representations, and a second RNN layer with attention produces the document-level representation. Finally, a stack of interconnected dense layers with softmax activation classifies the documents. We extend this model to the word level and summarize the overall results and comparisons with baseline models. We present evidence for our hypotheses on multiple datasets, using the IMDB and Yelp review datasets. We report extended results on all datasets in terms of F1 score, accuracy, precision, and recall. We also present a comparison of the convergence time and rate of convergence of our approach. Moreover, we show visual evidence that our approach leads to better feature construction and is able to construct features for 99% of the effective word vocabulary from the characters in the documents. |
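The attention-based feature merging mentioned in the description can be illustrated with a minimal sketch. The description does not state which scoring function is used, so this assumes the common additive form (a tanh projection scored against a learned context vector, followed by a softmax-weighted sum). The names `attention_pool`, `W`, `b`, and `v` are hypothetical, and the toy values stand in for RNN hidden states and trained parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, W, b, v):
    """Merge a sequence of hidden states into a single vector.

    states: list of T hidden vectors, each of length d
    W:      k x d projection matrix (list of k rows)  -- assumed parameter
    b:      bias of length k                          -- assumed parameter
    v:      context vector of length k                -- assumed parameter
    Returns (pooled vector of length d, attention weights of length T).
    """
    scores = []
    for h in states:
        # u = tanh(W h + b), then score = u . v
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        scores.append(sum(u_i * v_i for u_i, v_i in zip(u, v)))
    alphas = softmax(scores)
    d = len(states[0])
    # Weighted sum of the hidden states, one output component at a time.
    pooled = [sum(a * h[j] for a, h in zip(alphas, states)) for j in range(d)]
    return pooled, alphas


# Toy stand-ins for three word-level hidden states of dimension 2.
word_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
v = [1.0, 1.0]
sentence_vec, weights = attention_pool(word_states, W, b, v)
```

In a hierarchy like the one described, the same pooling step would plausibly be applied twice: once over word states to obtain a sentence vector, and once over sentence states to obtain a document vector.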