Session: Beyond Prediction Accuracy
Chair: Bamshad Mobasher
Date: Tuesday, September 28, 11:00–13:00
- Performance of recommender algorithms on top-n recommendation tasks
by Paolo Cremonesi, Yehuda Koren, Roberto Turrin
In many commercial systems, the ‘best bet’ recommendations are shown, but the predicted rating values are not. This is usually referred to as a top-N recommendation task, where the goal of the recommender system is to find the few specific items likely to be most appealing to the user. Common methodologies based on error metrics (such as RMSE) are not a natural fit for evaluating the top-N recommendation task. Instead, top-N performance can be measured directly by alternative methodologies based on accuracy metrics (such as precision/recall).
An extensive evaluation of several state-of-the-art recommender algorithms suggests that algorithms optimized for minimizing RMSE do not necessarily perform as expected on the top-N recommendation task. Results show that improvements in RMSE often do not translate into accuracy improvements. In particular, a naive non-personalized algorithm can outperform some common recommendation approaches and almost match the accuracy of sophisticated algorithms. Another finding is that a handful of highly popular items can skew the top-N performance. The analysis points out that when evaluating a recommender algorithm on the top-N recommendation task, the test set should be chosen carefully so as not to bias accuracy metrics towards non-personalized solutions. Finally, we offer practitioners new variants of two collaborative filtering algorithms that, regardless of their RMSE, significantly outperform other recommender algorithms on the top-N recommendation task, while offering additional practical advantages. This comes as a surprise given the simplicity of these two methods.
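The precision/recall-based evaluation the abstract refers to can be pictured with a short sketch. The Python snippet below is a minimal illustration, not the paper's actual protocol; `recommend` and `relevant_items` are hypothetical stand-ins for a trained recommender and the held-out items each user liked.

```python
# Minimal sketch of top-N evaluation with precision/recall, as opposed
# to error metrics such as RMSE. `recommend` and `relevant_items` are
# assumed placeholders, not APIs from the paper.

def precision_recall_at_n(recommend, relevant_items, users, n=10):
    """Average precision@N and recall@N over a set of users.

    recommend(user, n) -> list of n recommended item ids (assumed API)
    relevant_items[user] -> set of held-out items the user liked
    """
    precisions, recalls = [], []
    for user in users:
        relevant = relevant_items[user]
        if not relevant:
            continue  # skip users with no held-out relevant items
        top_n = recommend(user, n)
        hits = len(set(top_n) & relevant)
        precisions.append(hits / n)
        recalls.append(hits / len(relevant))
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```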
- On the stability of recommendation algorithms
by Gediminas Adomavicius, Jingjing Zhang
The paper introduces stability as a new measure of recommender system performance. In general, we define a recommendation algorithm to be “stable” if its predictions for the same items are consistent over a period of time, assuming that any new ratings submitted to the recommender system over that period are in complete agreement with the system’s prior predictions. In this paper, we advocate that stability should be a desired property of recommendation algorithms, because unstable recommendations can lead to user confusion and, therefore, reduce trust in recommender systems. Furthermore, we empirically evaluate the stability of several popular recommendation algorithms. Our results suggest that model-based recommendation techniques demonstrate higher stability than memory-based collaborative filtering heuristics. We also find that the stability of recommendation techniques is influenced by many factors, including the sparsity of the initial rating data, the number of new incoming ratings (representing the length of the time period over which stability is measured), the distribution of the newly added rating values, and the rating normalization procedures employed by the recommendation algorithms.
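The two-phase measurement implied by this definition can be sketched as follows. This is a minimal illustration assuming generic `train` and `predict` functions; the paper defines the measure more formally, so treat this only as a picture of the idea.

```python
# Sketch of stability measurement: predict unknown ratings, feed back a
# sample of the system's own predictions as "new" ratings (so they agree
# with prior predictions by construction), retrain, and see how much the
# remaining predictions shift. `train` and `predict` are hypothetical
# stand-ins for any recommendation algorithm.

import random

def stability(train, predict, known_ratings, unknown_cells, n_new=100):
    """known_ratings: dict mapping (user, item) -> rating
    unknown_cells: list of (user, item) pairs with no rating yet
    Returns the mean absolute prediction shift (lower = more stable)."""
    model_1 = train(known_ratings)
    phase_1 = {cell: predict(model_1, cell) for cell in unknown_cells}

    # New incoming ratings that exactly match the system's predictions.
    added = set(random.sample(unknown_cells, n_new))
    augmented = dict(known_ratings)
    augmented.update({cell: phase_1[cell] for cell in added})

    model_2 = train(augmented)
    remaining = [c for c in unknown_cells if c not in added]
    shifts = [abs(predict(model_2, c) - phase_1[c]) for c in remaining]
    return sum(shifts) / len(shifts)
```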
- Optimizing multiple objectives in collaborative filtering
by Tamas Jambor, Jun Wang
This paper is about the utility of making personalized recommendations. While it is important to accurately predict the target user’s preference, in practice accuracy should not be the only concern; a useful recommender system needs to consider the user’s utility, or satisfaction, in fulfilling a certain information-seeking task. For example, recommending popular items (products) is likely to yield less gain than discovering less prominent (“long tail”) yet well-liked items, because the popular ones may already be known to the user. Equally, recommending items that are out of stock would be frustrating for both the user and the system if the system is employed to discover items to purchase. Thus, it is important to have a flexible recommendation framework that takes additional recommendation goals into account while minimizing the loss in accuracy, in order to provide greater adjustability and a better user experience.
To achieve this, we propose a general recommendation optimization framework that not only considers the predicted preference scores (e.g. ratings) but also handles additional operational or resource-related recommendation goals. Using this framework, we demonstrate through realistic examples how to extend existing rating prediction algorithms by biasing the recommendations according to external factors such as the availability, profitability, or usefulness of an item. Our experiments on real data sets demonstrate that this framework is indeed able to cope with multiple objectives with only a minor loss in performance.
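As a rough illustration of biasing recommendations by external factors, consider the toy re-scoring below. The paper's contribution is a general optimization framework; this weighted combination and its field names (`predicted_rating`, `in_stock`, `margin`) are hypothetical simplifications, not the authors' formulation.

```python
# Toy re-scoring that blends predicted preference with business goals.
# The weights and fields are made up for illustration only.

def rescore(candidates, w_rating=1.0, w_margin=0.2):
    """Rank items by predicted preference adjusted for external factors."""
    scored = []
    for item in candidates:
        if not item["in_stock"]:
            continue  # hard constraint: never recommend unavailable items
        score = w_rating * item["predicted_rating"] + w_margin * item["margin"]
        scored.append((score, item["id"]))
    return [item_id for _, item_id in sorted(scored, reverse=True)]

# Example with made-up data: raising the margin weight lets the
# higher-margin item 'b' overtake 'a'; out-of-stock 'c' is dropped.
items = [
    {"id": "a", "predicted_rating": 4.8, "in_stock": True,  "margin": 0.1},
    {"id": "b", "predicted_rating": 4.5, "in_stock": True,  "margin": 0.9},
    {"id": "c", "predicted_rating": 4.9, "in_stock": False, "margin": 0.5},
]
print(rescore(items, w_margin=0.5))  # -> ['b', 'a']
```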
- Understanding choice overload in recommender systems
by Dirk Bollen, Bart P. Knijnenburg, Martijn C. Willemsen, Mark Graus
Even though people are attracted to large, high-quality recommendation sets, psychological research on choice overload shows that choosing an item from recommendation sets containing many attractive items can be a very difficult task. A web-based user experiment, using a matrix factorization algorithm applied to the MovieLens dataset, investigated the effect of recommendation set size (5 or 20 items) and set quality (low or high) on perceived variety, recommendation set attractiveness, choice difficulty, and satisfaction with the chosen item. The results show that larger sets containing only good items do not necessarily result in higher choice satisfaction than smaller sets, as the increased attractiveness of the recommendation set is counteracted by the increased difficulty of choosing from it. These findings were supported by behavioral measurements revealing intensified information search and increased acquisition times for the large attractive sets. Important implications of these findings for the design of recommender system user interfaces will be discussed.
RecSys 2010 (Barcelona)