A Trade-off between ML and DL Techniques in Natural Language Processing

Author: Himanshu Ashar, Parth Tank, Bhavesh Singh, Rahil Desai, Neha Katre
Publication year: 2021
Subject:
Source: Journal of Physics: Conference Series. 1831:012025
ISSN: 1742-6596
1742-6588
DOI: 10.1088/1742-6596/1831/1/012025
Description: The domain of Natural Language Processing covers various tasks, such as classification, text generation, and language modeling. Text is first converted to numerical form using word embeddings or vectorizers, and Machine Learning and Deep Learning algorithms are then trained on the result. To observe the trade-off between these two types of algorithms with respect to the data available, the accuracy obtained, and other factors, a binary classification task is undertaken to distinguish between insincere and regular questions on Quora. The Quora Insincere Questions Classification dataset was used to train various machine learning and deep learning models. A Bidirectional Long Short-Term Memory (LSTM) network was trained on text processed using Global Vectors for Word Representation (GloVe). Machine Learning algorithms such as the Extreme Gradient Boosting classifier, Gaussian Naive Bayes, and the Support Vector Classifier (SVC) were trained on text processed with the TF-IDF vectorizer. This paper also presents an evaluation of the above algorithms on the basis of the precision, recall, and F1 score metrics.
Database: OpenAIRE
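
The description names precision, recall, and F1 score as the evaluation metrics for the binary insincere-versus-regular task. As a minimal sketch of how those three numbers are derived from predictions, the snippet below computes them in plain Python. The function name `precision_recall_f1` and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper or its dataset.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class labelled `positive`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels (assumption): 1 = insincere question, 0 = regular question.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Reporting all three metrics for the positive class, rather than accuracy alone, shows separately how many flagged questions were truly insincere (precision) and how many insincere questions were caught (recall).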