Workshop on Recommendation Utility Evaluation: Beyond RMSE

Measuring the error in rating value prediction has been by far the dominant evaluation methodology in the Recommender Systems literature. Yet there seems to be a general consensus that this criterion alone is far from sufficient to assess the practical effectiveness of a recommender system in matching user needs. The end users of recommendations receive lists of items rather than rating values, so recommendation accuracy metrics, as surrogates of the evaluated task, should target the quality of the item selection rather than the numeric system scores that determine this selection. However, gaps in the adoption of ranking evaluation methodologies (e.g. IR metrics) result in methodological divergences, which hinder the interpretation and comparability of empirical observations reported by different authors.

On the other hand, accuracy is only one among several relevant dimensions of recommendation effectiveness. Novelty and diversity, for instance, have been recognized as key aspects of recommendation utility in many application domains. From the business point of view, the value added by recommendation can be measured more directly in terms of clickthrough, conversion rate, order size, returning customers, increased revenue, etc. Furthermore, web portals and social networks commonly face multi-objective optimization problems related to user engagement, which require appropriate evaluation methodologies for optimizing along the entire recommendation funnel. Other potentially relevant dimensions of effective recommendations for consumers and providers include confidence, coverage, risk, cost, robustness, ease of use, etc.

While the need for further extension, formalization, clarification and standardization of evaluation methodologies is recognized in the community, this need is still unmet to a large extent. When engaging in evaluation work, researchers and practitioners are still often faced with experimental design questions for which precise, consensual answers are not always available. RUE 2012 aims to gather researchers and practitioners interested in developing better, clearer, and/or more complete evaluation methodologies for recommender systems, or simply seeking clear guidelines for their experimental needs. The workshop aims to provide an informal setting for exchanging and discussing ideas and for sharing experiences and viewpoints, with the goal of advancing the consolidation and convergence of experimental methods and practice.

Workshop Date

September 9, 2012 (half day, a.m.)

Web site