SubRosa: Determining Movie Similarities based on Subtitles

Author: Luhmann, Jan; Burghardt, Manuel; Tiepmar, Jochen
Language: English
Publication year: 2021
Subject:
Document type: Text; Conference Material
DOI: 10.18420/inf2020_119
Description: For streaming websites, media shopping platforms and movie databases, movie recommendation systems have become an important technology, where mostly hybrid methods of collaborative and content-based filtering on the basis of user ratings and user-generated content have proven to be effective. However, these methods can lead to popularity-biased results that underrepresent movies for which little user-generated data exists. In this paper, we discuss the possibility of generating movie recommendations that are based not on user-generated data or metadata, but solely on the content of the movies themselves, confining ourselves to movie dialog. We extract low-level features from movie subtitles using methods from Information Retrieval, Natural Language Processing and Stylometry, and examine a possible correlation of these features’ similarity with the overall movie similarity. In addition, we present a novel web application called SubRosa (http://ch01.informatik.uni-leipzig.de:5001/), which can be used to interactively compare the results of different feature combinations.
Database: Networked Digital Library of Theses & Dissertations
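
The abstract describes comparing movies by features extracted from their subtitles. The following is a minimal sketch of one such Information Retrieval feature, TF-IDF term weights compared by cosine similarity. It is an illustration under stated assumptions, not the feature set or implementation used in SubRosa: the function names, the tokenizer, the smoothed IDF formula and the sample dialog snippets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the dialog text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Map each movie title to a sparse TF-IDF vector (dict of term -> weight).

    docs: dict of movie title -> subtitle dialog as plain text.
    """
    counts_by_movie = {title: Counter(tokenize(text)) for title, text in docs.items()}
    n_docs = len(counts_by_movie)

    # Document frequency: in how many movies each term occurs.
    df = Counter()
    for counts in counts_by_movie.values():
        df.update(counts.keys())

    vectors = {}
    for title, counts in counts_by_movie.items():
        total = sum(counts.values())
        # Relative term frequency times a smoothed IDF (an assumed weighting).
        vectors[title] = {
            term: (freq / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, freq in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, docs):
    """Rank all other movies by descending dialog similarity to the query movie."""
    vectors = tfidf_vectors(docs)
    scores = [
        (title, cosine(vectors[query], vectors[title]))
        for title in docs
        if title != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical dialog snippets standing in for full subtitle files.
    sample = {
        "Movie A": "the ship sails across the sea",
        "Movie B": "a ship on the stormy sea",
        "Movie C": "detectives question the murder suspect",
    }
    for title, score in most_similar("Movie A", sample):
        print(f"{title}: {score:.3f}")
```

A ranking of this kind depends only on the dialog text, so it needs no ratings or other user-generated data. Stylometric or other NLP features would have to be computed separately and combined with it.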