Session: Top-N Recommendation

Date: Wednesday, September 12, 14:30-16:00

  • CLiMF: Learning to Maximize Reciprocal Rank with Collaborative Less-is-More Filtering

    by Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Nuria Oliver and Alan Hanjalic

    In this paper we tackle the problem of recommendation in scenarios with binary relevance data, where only a few (k) items are recommended to each user. Past work on Collaborative Filtering (CF) has either not addressed the ranking problem for binary relevance datasets, or not specifically focused on improving top-k recommendations. To solve the problem we propose a new CF approach, Collaborative Less-is-More Filtering (CLiMF). In CLiMF the model parameters are learned by directly maximizing the Mean Reciprocal Rank (MRR), a well-known information retrieval metric for measuring the performance of top-k recommendations. We achieve linear computational complexity by introducing a lower bound of the smoothed reciprocal rank metric. Experiments on two social network datasets demonstrate the effectiveness and scalability of CLiMF, and show that it significantly outperforms a naive baseline and two state-of-the-art CF methods.

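As one concrete illustration of the idea, here is a minimal sketch (our own, not the paper's code) of the smoothed lower bound CLiMF optimizes for a single user: `ln g(f_j)` rewards high scores on relevant items, while `ln(1 - g(f_k - f_j))` pushes relevant items apart in the ranking, the "less-is-more" effect. The function name is ours, and we drop the constant `k == j` term for simplicity; the full method learns user and item factors by stochastic gradient ascent on this quantity.

```python
import math

def sigmoid(x):
    """Logistic function g used to smooth the indicator of rank order."""
    return 1.0 / (1.0 + math.exp(-x))

def climf_lower_bound(scores, relevant):
    """Smoothed lower bound on the reciprocal rank for one user.

    scores:   predicted scores f_j for all items of this user
    relevant: indices of items with binary relevance 1
    """
    bound = 0.0
    for j in relevant:
        # reward a high score on each relevant item ...
        bound += math.log(sigmoid(scores[j]))
        # ... and penalize other relevant items scored close above it
        for k in relevant:
            if k != j:
                bound += math.log(1.0 - sigmoid(scores[k] - scores[j]))
    return bound

# Raising a relevant item's score increases the bound (pushes it up the ranking).
b_high = climf_lower_bound([3.0, 0.0, 0.0], [0])
b_low = climf_lower_bound([0.0, 0.0, 0.0], [0])
```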

  • Ranking with Non-Random Missing Ratings: Influence of Popularity and Positivity on Evaluation Metrics

    by Bruno Pradel, Nicolas Usunier and Patrick Gallinari

    The evaluation of recommender systems in terms of ranking has recently gained attention, as it seems to fit the top-k recommendation task better than the usual rating prediction task. In that context, several authors have proposed to consider missing ratings as a form of negative feedback, to compensate for the skewed distribution of observed ratings when users choose the items they rate. In this work, we study two major biases in the selection of items: first, some items obtain more ratings than others (popularity effect); second, positive ratings are observed more frequently than negative ratings (positivity effect). We present a theoretical analysis and experiments on the Yahoo! dataset with randomly selected items, which show that considering missing data as a form of negative feedback during training may improve performance, but also that it can be misleading when used for testing, favoring popularity models over models of user preferences.

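The testing bias can be seen in a toy example (our own construction, not the paper's data or experiments): when the user's only observed positive is a popular item and an unrated niche favorite is counted as negative, a popularity model can look better than a model that actually ranks the user's true preferences on top.

```python
def auc(scores, positives, negatives):
    """Fraction of (positive, negative) item pairs the model ranks correctly."""
    pairs = [(scores[p], scores[n]) for p in positives for n in negatives]
    return sum(sp > sn for sp, sn in pairs) / len(pairs)

# Four items; the user truly likes items 2 (popular) and 3 (niche).
# Popularity-biased observation: only the popular favorite, item 2, got rated.
pref_scores = [0.1, 0.2, 0.8, 0.9]   # ranks both truly liked items on top
pop_scores  = [0.7, 0.5, 0.8, 0.1]   # simply ranks items by popularity

# Missing-as-negative testing: the unrated niche favorite counts as a negative,
# so the preference model is punished for ranking it highly.
auc_pref_mnar = auc(pref_scores, [2], [0, 1, 3])   # 2/3
auc_pop_mnar  = auc(pop_scores,  [2], [0, 1, 3])   # 1.0

# Testing against the true preferences (as with randomly selected test items)
# reverses the conclusion.
auc_pref_true = auc(pref_scores, [2, 3], [0, 1])   # 1.0
auc_pop_true  = auc(pop_scores,  [2, 3], [0, 1])   # 0.5
```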

  • Sparse Linear Methods with Side Information for Top-N Recommendations

    by Xia Ning and George Karypis

    The increasing amount of side information associated with items in E-commerce applications provides a very rich source of information that, once properly exploited and incorporated, can significantly improve the performance of conventional recommender systems. This paper focuses on developing effective algorithms that utilize item side information for top-N recommender systems. A set of sparse linear methods with side information (SSLIM) is proposed, which involves a regularized optimization process to learn a sparse aggregation coefficient matrix based on both user-item purchase profiles and item side information. This aggregation coefficient matrix is used within an item-based recommendation framework to generate recommendations for the users. Our experimental results demonstrate that SSLIM outperforms other methods in effectively utilizing side information and achieving performance improvements.

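The setup can be sketched as follows, under our own simplifications: a sparse, nonnegative item-item coefficient matrix W with zero diagonal is fit so that both the purchase matrix A and a feature-item matrix F are reproduced by their product with W, and recommendations are read off as A @ W. The proximal-gradient loop and all names here are ours, not the paper's solver; the paper formulates this as a per-column regularized optimization problem.

```python
import numpy as np

def learn_sslim(A, F, alpha=0.5, l1=0.05, lr=0.05, iters=300):
    """Sketch: fit W >= 0, diag(W) = 0, so that A ~ A @ W and F ~ F @ W,
    with an L1 penalty for sparsity (simple ISTA-style proximal gradient).

    A: user-item purchase matrix; F: (side-information feature)-item matrix.
    """
    n_items = A.shape[1]
    W = np.zeros((n_items, n_items))
    for _ in range(iters):
        grad = A.T @ (A @ W - A) + alpha * (F.T @ (F @ W - F))
        # gradient step, soft-thresholding, and nonnegativity in one update
        W = np.maximum(W - lr * (grad + l1), 0.0)
        np.fill_diagonal(W, 0.0)   # an item must not recommend itself
    return W

A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
F = np.array([[1, 1, 0],           # items 0 and 1 share a side-information feature
              [0, 0, 1]], dtype=float)
W = learn_sslim(A, F)
scores = A @ W                     # item-based recommendation scores
```

With this toy data, the last user (who bought only item 0) is scored higher on item 1, which co-occurs with item 0 and shares its feature, than on the unrelated item 2.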

  • Scalable Similarity-Based Neighborhood Methods with MapReduce

    by Sebastian Schelter, Christoph Boden and Volker Markl

    Similarity-based neighborhood methods, a simple and popular approach to collaborative filtering, infer their predictions by finding users with similar taste or items that have been similarly rated. If the number of users grows to millions, the standard approach of sequentially examining each item and looking at all interacting users does not scale. To solve this problem, we develop a MapReduce algorithm for the pairwise item comparison and top-N recommendation problem that scales linearly with respect to a growing number of users. This parallel algorithm is able to work on partitioned data and is general in that it supports a wide range of similarity measures. We evaluate our algorithm on a large dataset consisting of 700 million song ratings from Yahoo! Music.

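The pairwise item comparison has a natural map/reduce decomposition, sketched below in plain Python (an in-memory toy, our own function names; a real deployment partitions user histories across machines and the framework groups map output by key). Each mapper emits item pairs from one user's history; the reducer sums co-occurrence counts, which, together with per-item counts, yield a similarity such as Jaccard, one of many measures such a scheme can support.

```python
from collections import defaultdict
from itertools import combinations

def map_user(item_ids):
    """Map phase: from one user's interaction history, emit each item pair once."""
    for i, j in combinations(sorted(item_ids), 2):
        yield (i, j), 1

def reduce_pairs(emitted):
    """Reduce phase: sum co-occurrence counts per item pair (the grouping key)."""
    cooc = defaultdict(int)
    for pair, count in emitted:
        cooc[pair] += count
    return dict(cooc)

# Toy interaction histories, one list per user.
histories = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
emitted = (pair for h in histories for pair in map_user(h))
cooc = reduce_pairs(emitted)

# Per-item interaction counts (computable in the same pass) turn
# co-occurrences into Jaccard similarity: |i and j| / (|i| + |j| - |i and j|).
counts = defaultdict(int)
for h in histories:
    for item in h:
        counts[item] += 1
jaccard_ab = cooc[("a", "b")] / (counts["a"] + counts["b"] - cooc[("a", "b")])
```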
