Popis: |
For streaming websites, media shopping platforms and movie databases, movie recommendation systems have become an important technology. Hybrid methods that combine collaborative and content-based filtering of user ratings and user-generated content have proven especially effective. However, these methods can produce popularity-biased results in which movies with little user-generated data are underrepresented. In this paper we discuss the possibility of generating movie recommendations based not on user-generated data or metadata, but solely on the content of the movies themselves, restricting ourselves to movie dialog. We extract low-level features from movie subtitles using methods from Information Retrieval, Natural Language Processing and Stylometry, and examine whether the similarity of these features correlates with overall movie similarity. In addition, we present SubRosa (http://ch01.informatik.uni-leipzig.de:5001/), a novel web application that can be used to interactively compare the results of different feature combinations. |