Session: Methodological Issues, Evaluation Metrics and Tools
Chair: Alexander Felfernig
Date: Tuesday, October 25, 14:00-15:45
- Rank and relevance in novelty and diversity metrics for recommender systems
by Saúl Vargas, Pablo Castells
The Recommender Systems community is paying increasing attention to novelty and diversity as key qualities beyond accuracy in real recommendation scenarios. Despite the rise of interest and work on the topic in recent years, we find that a clear common methodological and conceptual ground for the evaluation of these dimensions has yet to be consolidated. Different evaluation metrics have been reported in the literature, but the precise relations, distinctions, or equivalences between them have not been explicitly studied. Furthermore, the metrics reported so far miss important properties, such as taking into consideration the ranking of recommended items, or whether items are relevant, when assessing the novelty and diversity of recommendations.
We present a formal framework for the definition of novelty and diversity metrics that unifies and generalizes several state-of-the-art metrics. We identify three essential ground concepts at the roots of novelty and diversity: choice, discovery, and relevance, upon which the framework is built. Item rank and relevance are introduced through a probabilistic recommendation browsing model built upon the same three basic concepts. Based on the combination of these ground elements and the assumptions of the browsing model, different metrics and variants unfold. We report experimental observations that validate and illustrate the properties of the proposed metrics.
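A minimal Python sketch of the rank- and relevance-aware scheme the abstract describes, assuming a popularity-complement notion of novelty, a geometric "patience" rank discount as the browsing model, and binary relevance judgements (all illustrative choices, not the paper's exact metric):

```python
def expected_novelty(ranked_items, item_popularity, relevant_items,
                     num_users, patience=0.85):
    """Expected novelty of a ranked list, rank-discounted and relevance-gated.

    Illustrative assumptions: an item's novelty is
    1 - |users who have seen it| / |users|, the browsing model decays
    geometrically with rank, and relevance is binary.
    """
    score = 0.0
    for rank, item in enumerate(ranked_items):        # rank 0 is the top
        discount = patience ** rank                   # browsing model
        novelty = 1.0 - item_popularity.get(item, 0) / num_users
        relevance = 1.0 if item in relevant_items else 0.0
        score += discount * relevance * novelty
    return score
```

Setting patience to 1.0 removes the rank discount, and treating every item as relevant removes the relevance gate, which recovers a plain, rank-insensitive novelty sum of the kind the framework generalizes.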
- OrdRec: An ordinal model for predicting personalized item rating distributions (Best Paper)
by Yehuda Koren, Joe Sill
We propose a collaborative filtering (CF) recommendation framework based on viewing user feedback on products as ordinal, rather than the more common numerical view. This way, we do not need to interpret each feedback value as a number, but only rely on the more relaxed assumption of an order among the different feedback ratings. Such an ordinal view frequently provides a more natural reflection of user intention when giving qualitative ratings, allowing users to have different internal scoring scales. Moreover, we can address scenarios where assigning numerical scores to different types of user feedback would not be easy. Our approach is based on a pointwise ordinal model, which allows it to scale linearly with data size. The framework can wrap most collaborative filtering algorithms, upgrading algorithms designed to handle numerical values so that they can handle ordinal values. In particular, we demonstrate our framework by wrapping a leading matrix factorization CF method. A cornerstone of our method is its ability to predict a full probability distribution of the expected item ratings, rather than only a single score for an item. One of the advantages this brings is a novel approach to estimating the confidence level in each individual prediction. Compared to previous approaches to confidence estimation, ours is more principled and empirically more accurate. We demonstrate the efficacy of the approach on some of the largest publicly available datasets: the Netflix data and the Yahoo! Music data.
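The core ordinal step can be pictured as turning a single internal score (e.g., from a wrapped matrix factorization model) into a full probability distribution over the rating values via ordered thresholds. A hedged Python sketch of this idea, with a logistic link and hypothetical names assumed for illustration; the paper's actual parameterization may differ:

```python
import math

def rating_distribution(score, thresholds):
    """Map one internal score to a distribution over ordinal ratings.

    Assumed scheme: thresholds t_1 < ... < t_{R-1} give cumulative
    probabilities P(rating <= r) = sigmoid(t_r - score); per-rating
    probabilities follow by differencing.
    """
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)
        prev = c
    return probs

# A hypothetical 5-star scale: four thresholds around a latent-factor score.
print(rating_distribution(score=1.2, thresholds=[-1.5, -0.5, 0.5, 1.5]))
```

The resulting distribution supports both a point prediction (its expectation or mode) and a confidence estimate (e.g., the probability mass placed on the predicted rating).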
- Item popularity and recommendation accuracy
by Harald Steck
Recommendations from the long tail of the popularity distribution of items are generally considered to be particularly valuable. On the other hand, recommendation accuracy tends to decrease towards the long tail. In this paper, we quantitatively examine this trade-off between item popularity and recommendation accuracy. To this end, we assume that there is a selection bias towards popular items in the available data. This allows us to define a new accuracy measure that can be gradually tuned towards the long tail. We show that, under this assumption, the measure has the desirable property of providing nearly unbiased estimates of recommendation accuracy. In turn, this also motivates a refinement for training collaborative-filtering approaches. In various experiments with real-world data, including a user study, the empirical evidence suggests that users appreciate only a small bias, if any, of the recommendations towards less popular items.
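One way to picture a popularity-tunable accuracy measure of this kind: reweight each held-out test item by an inverse power of its popularity, so a single exponent slides the measure from standard recall toward the long tail. A hypothetical Python sketch (the function name and the exact weighting are assumptions, not the paper's definition):

```python
def popularity_weighted_recall(recommended, test_items, item_popularity,
                               beta=0.5):
    """Recall in which each held-out test item is reweighted by
    popularity**(-beta), gradually tuning the measure toward the long
    tail. beta = 0 recovers plain recall; larger beta emphasizes
    less popular items. (Illustrative form, not the paper's measure.)
    """
    weight = lambda item: item_popularity[item] ** (-beta)
    total = sum(weight(i) for i in test_items)
    hit = sum(weight(i) for i in test_items if i in recommended)
    return hit / total if total else 0.0
```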
- Rethinking the recommender research ecosystem: reproducibility, openness, and LensKit
by Michael D. Ekstrand, Michael Ludwig, Joseph A. Konstan, John T. Riedl
Recommender systems research is being slowed by the difficulty of replicating and comparing research results. Published research uses various experimental methodologies and metrics that are difficult to compare. It also often fails to sufficiently document the details of proposed algorithms or the evaluations employed. Researchers waste time reimplementing well-known algorithms, and the new implementations may miss key details from the original algorithm or its subsequent refinements. When proposing new algorithms, researchers should compare them against finely tuned implementations of the leading prior algorithms using state-of-the-art evaluation methodologies. With few exceptions, published algorithmic improvements in our field should be accompanied by working code in a standard framework, including test harnesses to reproduce the described results. To that end, we present the design and freely distributable source code of LensKit, a flexible platform for reproducible recommender systems research. LensKit provides carefully tuned implementations of the leading collaborative filtering algorithms, APIs for common recommender system use cases, and an evaluation framework for performing reproducible offline evaluations of algorithms. We demonstrate the utility of LensKit by replicating and extending a set of prior comparative studies of recommender algorithms (showing limitations in some of the original results) and by investigating a question recently raised by a leader in the recommender systems community about problems with error-based prediction evaluation.
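LensKit itself is a Java framework; as a language-neutral illustration of the reproducible offline evaluation it argues for, here is a generic Python sketch (not LensKit's actual API) of an evaluation harness in which a fixed random seed pins down the train/test split so reported numbers are exactly repeatable:

```python
import random

def offline_rmse(ratings, algorithms, test_fraction=0.2, seed=42):
    """Score every algorithm on the same seeded train/test split.

    `ratings` is a list of (user, item, value) triples; each entry of
    `algorithms` maps a name to a trainer: train_data -> predict(u, i).
    All names here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)                 # fixed seed => same split
    data = list(ratings)
    rng.shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    train, test = data[:cut], data[cut:]
    results = {}
    for name, trainer in algorithms.items():
        predict = trainer(train)
        errors = [(predict(u, i) - r) ** 2 for u, i, r in test]
        results[name] = (sum(errors) / len(errors)) ** 0.5   # RMSE
    return results

# A global-mean baseline as one "algorithm":
mean_trainer = lambda train: (
    lambda u, i, m=sum(r for _, _, r in train) / len(train): m)
# offline_rmse(ratings, {"global-mean": mean_trainer})
```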
RecSys 2011 (Chicago)