
Session 18: Women In RecSys
Date: Friday September 22, 4:05 PM – 5:25 PM (GMT+8)
Room: Hall 406D
Session Chair: Ashmi Banerjee
Parallel with: Session 17: Interactive Recommendation 2
The winners of the Women in RecSys Journal Paper of the Year Awards 2023 are:
- Junior: We’re in This Together: A Multi-Stakeholder Approach for News Recommenders
by Annelien Smets, Jonathan Hendrickx and Pieter Ballon
News recommenders are attracting widespread interest in scholarly work. The current research paradigm, however, holds a narrow (mostly user-centered) perspective on the recommendation task. This makes it difficult to understand that their design is in fact the result of a negotiation process among the multiple actors involved, such as editors, business executives, technologists and users. To remedy this, a multi-stakeholder recommendation paradigm has been suggested by recommender systems scholars. This work sets out to explore to what extent this paradigm is applicable to the particular context of news recommenders. We conducted 11 interviews with professionals from three leading media companies in Flanders (Belgium) and found that the development of news recommenders is indeed characterized by a negotiation process among multiple stakeholders. However, our results show that the initial multi-stakeholder framework does not adequately accommodate some of our findings, such as the existence of preconditions, the role of product owners, and the indirect involvement of particular stakeholders. Based on our analysis, we suggest an elaborated framework for multi-stakeholder news recommenders that can contribute to scholarship by providing a multi-sided perspective on news recommenders.
- Junior: DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation
by Zhu Sun, Hui Fang, Jie Yang, Xinghua Qu, Hongyang Liu, Di Yi, Yew-Soon Ong and Jie Zhang
Recently, one critical issue looms large in the field of recommender systems: there are no effective benchmarks for rigorous evaluation, which leads to unreproducible evaluation and unfair comparison. We therefore conduct studies from both theoretical and experimental perspectives, aiming to benchmark recommendation for rigorous evaluation. On the theoretical side, a series of hyper-factors affecting recommendation performance throughout the whole evaluation chain are systematically summarized and analyzed via an exhaustive review of 141 papers published at eight top-tier conferences between 2017 and 2020. We then classify them into model-independent and model-dependent hyper-factors, and define and discuss different modes of rigorous evaluation accordingly. On the experimental side, we release the DaisyRec 2.0 library, which integrates these hyper-factors to perform rigorous evaluation, and use it in a holistic empirical study of the impact of different hyper-factors on recommendation performance. Supported by the theoretical and experimental studies, we finally create benchmarks for rigorous evaluation by proposing standardized procedures and reporting the performance of ten state-of-the-art models across six evaluation metrics on six datasets as a reference for later studies. Overall, our work sheds light on the issues in recommendation evaluation, provides potential solutions for rigorous evaluation, and lays the foundation for further investigation. (A minimal sketch of such a standardized offline evaluation procedure appears after this list.)
- Junior: A Framework and Toolkit for Testing the Correctness of Recommendation Algorithms
by Lien Michiels, Robin Verachtert, Andres Ferraro, Kim Falk and Bart Goethals
Evaluating recommender systems adequately and thoroughly is an important task. Significant efforts are dedicated to proposing metrics, methods and protocols for doing so. However, there has been little discussion in the recommender systems literature on the topic of testing. In this work, we adopt and adapt concepts from the software testing domain, e.g., code coverage, metamorphic testing, and property-based testing, to help researchers detect and correct faults in recommendation algorithms. We propose a test suite that can be used to validate the correctness of a recommendation algorithm and thus identify and correct issues that can affect the performance and behavior of these algorithms. Our test suite contains both black-box and white-box tests at every level of abstraction, i.e., system, integration and unit. To facilitate adoption, we release RecPack Tests, an open-source Python package containing template test implementations. We use it to test four popular Python packages for recommender systems: RecPack, PyLensKit, Surprise and Cornac. Despite the high test coverage of each of these packages, we are still able to uncover undocumented functional requirements and even some bugs. This validates our thesis that testing the correctness of recommendation algorithms can complement traditional methods for evaluating them. (A minimal sketch of such property-style correctness tests appears after this list.)
- Senior: Effects of Personalized Recommendations versus Aggregate Ratings on Post-Consumption Preference Responses
by Gediminas Adomavicius, Jesse Bockstedt, Shawn Curley and Jingjing Zhang
Online retailers use product ratings to signal quality and help consumers identify products for purchase. These ratings commonly take the form of either non-personalized, aggregate product ratings (i.e., the average rating a product received from a number of consumers, such as “the average rating is 4.5/5 based on 100 reviews”) or personalized predicted preference ratings (i.e., recommender-system-generated predictions of a consumer’s rating of a product, such as “we think you’d rate this product 4.5/5”). Ratings in either format can serve as a decision aid for the consumer, but the two formats convey different types of product quality information and operate through different psychological mechanisms. Prior research has indicated that each recommendation type can significantly affect consumers’ post-experience preference ratings, constituting a judgmental bias, but has not compared the effects of these two common product-rating formats. Using a laboratory experiment, we show that aggregate ratings and personalized recommendations create similar biases on post-experience preference ratings when shown separately. When shown together, there is no cumulative increase in the effect; instead, personalized recommendations tend to dominate. Our findings can help retailers determine how to use these different types of product ratings to most effectively serve their customers. These results also help educate consumers on how product-rating displays influence their stated preferences.
- Senior: Psychology-informed Recommender Systems
by Elisabeth Lex, Dominik Kowald, Paul Seitlinger, Thi Ngoc Trang Tran, Alexander Felfernig and Markus Schedl
Personalized recommender systems have become indispensable in today’s online world. Most of today’s recommendation algorithms are data-driven and based on behavioral data. While such systems can produce useful recommendations, they are often uninterpretable, black-box models that do not incorporate the underlying cognitive reasons for user behavior in the algorithms’ design. The aim of this survey is to present a thorough review of the state of the art in recommender systems that leverage psychological constructs and theories to model and predict user behavior and improve the recommendation process. We call such systems psychology-informed recommender systems. The survey identifies three categories of psychology-informed recommender systems: cognition-inspired, personality-aware, and affect-aware recommender systems. For each category, we highlight domains in which psychological theory plays a key role and is therefore considered in the recommendation process. As recommender systems are fundamental tools to support human decision making, we also discuss selected decision-psychological phenomena that impact the interaction between a user and a recommender. In addition, we discuss related work on evaluating recommender systems from the user perspective and highlight user-centric evaluation frameworks. We conclude the survey with potential research tasks for future work.
- Senior: Evaluating Recommender Systems: Survey and Framework
by Eva Zangerle and Christine Bauer
The comprehensive evaluation of the performance of a recommender system is a complex endeavor: many facets need to be considered in configuring an adequate and effective evaluation setting. Such facets include, for instance, defining the specific goals of the evaluation, choosing an evaluation method, the underlying data, and suitable evaluation metrics. In this article, we consolidate and systematically organize this dispersed knowledge on recommender systems evaluation. We introduce the Framework for Evaluating Recommender systems (FEVR), which we derive from the discourse on recommender systems evaluation and use to categorize the evaluation space. We postulate that the comprehensive evaluation of a recommender system frequently requires considering multiple facets and perspectives. The FEVR framework provides a structured foundation for adopting evaluation configurations that encompass this required multi-facetedness, and it provides a basis for advancing the field. We outline and discuss the challenges of comprehensive recommender systems evaluation and provide an outlook on what the research community needs to embrace and do to move forward.
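As a companion to the DaisyRec 2.0 abstract above, the sketch below illustrates what pinning down evaluation "hyper-factors" can look like in practice: the split protocol, metric, and cut-off are fixed so that any model is scored under identical conditions. This is a minimal, self-contained Python illustration, not the DaisyRec 2.0 API; the split_leave_one_out and ndcg_at_k helpers, the toy interaction data, and the popularity baseline are assumptions made for the example.

```python
# Minimal sketch (not the DaisyRec 2.0 API): pin down the evaluation
# "hyper-factors" -- split protocol, cut-off, metric -- so that models are
# compared under identical conditions. All names and data here are illustrative.
import numpy as np

def split_leave_one_out(interactions):
    """Hold out each user's last item for testing (one common split protocol)."""
    train, test = {}, {}
    for user, items in interactions.items():
        train[user] = items[:-1]
        test[user] = items[-1]
    return train, test

def ndcg_at_k(ranking, held_out_item, k=10):
    """Binary-relevance NDCG@k when exactly one held-out item is relevant."""
    top_k = ranking[:k]
    if held_out_item in top_k:
        return 1.0 / np.log2(top_k.index(held_out_item) + 2)
    return 0.0

# Toy interaction data: user id -> chronologically ordered item ids.
interactions = {0: [1, 3, 5, 2], 1: [2, 4, 1], 2: [5, 1, 4, 3]}
train, test = split_leave_one_out(interactions)

# A trivial baseline "model": rank items by global popularity in the training data.
popularity = {}
for items in train.values():
    for item in items:
        popularity[item] = popularity.get(item, 0) + 1
ranking = sorted(popularity, key=popularity.get, reverse=True)

# Fixed metric and cut-off, applied identically to every model under comparison.
scores = [ndcg_at_k(ranking, test[user]) for user in test]
print(f"NDCG@10 = {np.mean(scores):.3f}")
```

Swapping in a different model while keeping the split, cut-off and metric fixed is the kind of controlled comparison the paper's standardized procedures argue for.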
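For the RecPack Tests abstract, the following sketch shows the flavor of property-based and metamorphic tests for a recommendation algorithm. It is an illustrative example, not the RecPack Tests package: the popularity recommender, the "never recommend seen items" property, and the duplicate-the-users metamorphic relation are assumptions chosen for brevity.

```python
# Minimal sketch (not the RecPack Tests package): property-style and metamorphic
# checks for a toy popularity recommender. The recommender, the properties and
# the data below are illustrative assumptions, not the paper's actual test suite.
import numpy as np

def popularity_scores(interactions):
    """Score each item by how many users interacted with it."""
    return np.asarray(interactions).sum(axis=0)

def recommend(interactions, user, k=3):
    """Top-k unseen items for `user`, ranked by global popularity."""
    mat = np.asarray(interactions)
    scores = popularity_scores(mat).astype(float)
    scores[mat[user] > 0] = -np.inf                      # mask already-seen items
    ranked = np.argsort(-scores, kind="stable")
    return [int(i) for i in ranked[:k] if np.isfinite(scores[i])]

# Toy user-item interaction matrix (rows = users, columns = items).
X = np.array([[1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0],
              [1, 1, 0, 1, 0]])

# Property: recommendations never contain items the user has already seen.
for user in range(X.shape[0]):
    assert all(X[user, item] == 0 for item in recommend(X, user))

# Metamorphic relation: duplicating every user must not change the popularity
# ranking, so the follow-up input has to yield identical recommendations.
X_doubled = np.vstack([X, X])
for user in range(X.shape[0]):
    assert recommend(X, user) == recommend(X_doubled, user)

print("all properties hold")
```

In a real test suite such checks would be written as reusable test cases and run against each algorithm implementation, which is the role the abstract describes for the RecPack Tests templates.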