JUMRv1: A Sentiment Analysis Dataset for Movie Recommendation

Authors: Chatterjee, Shuvamoy; Chakrabarti, Kushal; Garain, Avishek; Schwenker, Friedhelm; Sarkar, Ram
Language: English
Year of publication: 2021
Source: Applied Sciences
Volume 11
Issue 20
Applied Sciences, Vol 11, Iss 20, p 9381 (2021)
ISSN: 2076-3417
DOI: 10.3390/app11209381
Description: Machine learning is now applied in almost every field, from the quality testing of materials to the building of powerful computer vision tools. One such application is the recommendation system, which suggests products to users based on their preferences. In this paper, we focus on one specific kind of recommendation system: movie recommendation. We use user reviews of a movie to establish a general outlook on it and then use that outlook to recommend the movie to other users. However, the sheer volume of available reviews overwhelms even sophisticated review systems. There is therefore a need for a method that extracts meaningful information from the available reviews and uses it to classify each movie review by the sentiment it expresses. In a typical scenario, a review can be positive, negative, or indifferent about a movie. However, the existing research in this field mainly treats the task as a two-class classification problem: positive versus negative. The most popular work in this field was performed on the Stanford and Rotten Tomatoes datasets, which are somewhat outdated. Our work is based on reviews we scraped ourselves from the IMDB website and annotated with one of three classes: positive, negative, or neutral. We call the dataset JUMRv1 (Jadavpur University Movie Recommendation dataset, version 1). To evaluate JUMRv1, we took an exhaustive approach, testing various combinations of word embeddings, feature selection methods, and classifiers. We also analysed any performance trends that emerged and attempted to explain them. Our work sets a benchmark for movie recommendation systems based on the newly developed dataset and three-class sentiment classification.
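The abstract describes evaluating JUMRv1 by exhaustively combining word embeddings, feature selection methods, and classifiers on a three-class task, but this record does not name the specific components. The following is therefore only a minimal, standard-library sketch of such a grid evaluation under stated assumptions: sparse bag-of-words vectors (counts or binary) stand in for word embeddings, keeping the top-k features by document frequency stands in for feature selection, and multinomial naive Bayes and a cosine nearest-centroid classifier stand in for the classifiers. Each combination is scored with macro-averaged F1. All function names, the `"top50"` setting, and any toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# The three sentiment classes used in JUMRv1.
LABELS = ("positive", "negative", "neutral")

def tokenize(text):
    return text.lower().split()

# Hypothetical stand-ins for word embeddings: sparse bag-of-words vectors.
def featurize_counts(tokens):
    return Counter(tokens)

def featurize_binary(tokens):
    return Counter(set(tokens))

def select_top_k(vectors, k):
    """Hypothetical feature selection: keep the k features with the highest
    document frequency (k=None keeps every feature)."""
    df = Counter()
    for vec in vectors:
        df.update(vec.keys())
    if k is None:
        return set(df)
    return {w for w, _ in df.most_common(k)}

def train_naive_bayes(X, y, vocab):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    priors = Counter(y)
    counts = {c: Counter() for c in LABELS}
    for vec, c in zip(X, y):
        counts[c].update({w: v for w, v in vec.items() if w in vocab})
    totals = {c: sum(counts[c].values()) for c in LABELS}

    def predict(vec):
        def log_score(c):
            s = math.log((priors[c] + 1) / (len(y) + len(LABELS)))
            for w, v in vec.items():
                if w in vocab:
                    s += v * math.log((counts[c][w] + 1) / (totals[c] + len(vocab)))
            return s
        return max(LABELS, key=log_score)
    return predict

def _cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroid(X, y, vocab):
    """Nearest-centroid classifier using cosine similarity."""
    n = Counter(y)
    sums = {c: Counter() for c in LABELS}
    for vec, c in zip(X, y):
        sums[c].update({w: v for w, v in vec.items() if w in vocab})
    centroids = {c: {w: v / n[c] for w, v in sums[c].items()} for c in LABELS if n[c]}

    def predict(vec):
        kept = {w: v for w, v in vec.items() if w in vocab}
        return max(centroids, key=lambda c: _cosine(kept, centroids[c]))
    return predict

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores over the three labels."""
    f1s = []
    for c in LABELS:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

FEATURIZERS = {"counts": featurize_counts, "binary": featurize_binary}
SELECTORS = {"all": None, "top50": 50}
CLASSIFIERS = {"naive_bayes": train_naive_bayes, "centroid": train_centroid}

def evaluate_grid(train, test):
    """Score every featurizer x selector x classifier combination.
    train and test are lists of (review_text, label) pairs."""
    y_train = [label for _, label in train]
    y_test = [label for _, label in test]
    results = {}
    for (f_name, feat), (s_name, k), (c_name, fit) in product(
        FEATURIZERS.items(), SELECTORS.items(), CLASSIFIERS.items()
    ):
        X_train = [feat(tokenize(text)) for text, _ in train]
        vocab = select_top_k(X_train, k)
        predict = fit(X_train, y_train, vocab)
        preds = [predict(feat(tokenize(text))) for text, _ in test]
        results[(f_name, s_name, c_name)] = macro_f1(y_test, preds)
    return results
```

Calling `evaluate_grid(train, test)` returns a dictionary keyed by `(featurizer, selector, classifier)` with one macro-F1 score per combination, which is the shape of an exhaustive comparison table. In an actual study, the stand-ins above would be replaced by the embeddings, feature selectors, and classifiers actually under evaluation; macro-F1 is used here because it weights the neutral class equally with the other two.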
Version: published version
Database: OpenAIRE