ContentWise Impressions: An Industrial Dataset with Impressions Included

Authors: Maurera, Fernando Benjamín Pérez; Dacrema, Maurizio Ferrari; Saule, Lorenzo; Scriminaci, Mario; Cremonesi, Paolo
Year of publication: 2020
Subject:
Source: Proceedings of the 29th ACM International Conference on Information & Knowledge Management (CIKM 2020)
Document type: Working Paper
DOI: 10.1145/3340531.3412774
Description: In this article, we introduce the ContentWise Impressions dataset, a collection of implicit interactions and impressions of movies and TV series from an Over-The-Top media service, which delivers its media content over the Internet. The dataset is distinguished from other available multimedia recommendation datasets by its inclusion of impressions, i.e., the recommendations shown to the user, by its size, and by being open-source. We describe the data collection process, the preprocessing applied, and the dataset's characteristics and statistics compared to other commonly used datasets. We also highlight several possible use cases and research questions that can benefit from the availability of user impressions in an open-source dataset. Furthermore, we release software tools to load and split the data, as well as examples of how to use both user interactions and impressions in several common recommendation algorithms.
Comment: 8 pages, 2 figures
Database: arXiv