Popis: |
In recent years, embedding models based on the skip-gram algorithm have been widely applied to real-world recommendation systems (RSs). Designing embedding-based methods for recommendation at Taobao poses three main challenges: scalability, sparsity, and cold start. The first is inherent in the extremely large numbers of users and items (on the order of billions), while the other two arise because most items have very few user interactions, or none at all. To address these challenges, we present a flexible and highly scalable Side Information (SI) enhanced Skip-Gram (SISG) framework, which is deployed at Taobao. SISG overcomes the drawbacks of existing embedding-based models by modeling user metadata and capturing asymmetries of user behavior. Furthermore, because SISG can be trained using any skip-gram with negative sampling (SGNS) implementation, we present our production deployment of SISG on a custom-built word2vec engine, which allows us to compute item and SI embedding vectors for billion-scale sets of products in a joint semantic space on a daily basis. Finally, using offline and online experiments, we demonstrate that SISG significantly outperforms our previously deployed framework, EGES, and a well-tuned collaborative filtering (CF) baseline, and we present evidence supporting our scalability claims. |