Session 10: Women in RecSys

Date: Thursday September 25, 16:50–17:40 (GMT+2)
Session Chair: Ashmi Banerjee

  • Dynamic Fairness-aware Recommendation Through Multi-agent Social Choice
    by Amanda Aird, Paresha Farastu, Joshua Sun, Elena Štefancová, Cassidy All, Amy Voida, Nicholas Mattei, Robin Burke

    Algorithmic fairness in the context of personalized recommendation presents significantly different challenges to those commonly encountered in classification tasks. Researchers studying classification have generally considered fairness to be a matter of achieving equality of outcomes (or some other metric) between a protected and unprotected group, and built algorithmic interventions on this basis. We argue that fairness in real-world application settings in general, and especially in the context of personalized recommendation, is much more complex and multi-faceted, requiring a more general approach. To address the fundamental problem of fairness in the presence of multiple stakeholders, with different definitions of fairness, we propose the Social Choice for Recommendation Under Fairness — Dynamic (SCRUF-D) architecture, which formalizes multistakeholder fairness in recommender systems as a two-stage social choice problem. In particular, we express recommendation fairness as a combination of an allocation and an aggregation problem, which integrate both fairness concerns and personalized recommendation provisions, and derive new recommendation techniques based on this formulation. We demonstrate the ability of our framework to dynamically incorporate multiple fairness concerns using both real-world and synthetic datasets.
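    The two-stage formulation described above can be sketched in a few lines: an allocation stage picks which fairness agent(s) participate in a given recommendation opportunity, and an aggregation stage combines their preferences with the personalized scores. This is a minimal illustrative sketch under assumed agent fields (`unfairness`, `preference`), a lottery allocation, and a weighted-sum aggregation; the actual SCRUF-D mechanisms are more general.

```python
import random

def allocate(agents, history):
    # Allocation stage: choose which fairness agent(s) speak for this
    # recommendation opportunity -- here, a lottery weighted by how
    # under-served each agent's concern has been so far (hypothetical
    # agent fields; SCRUF-D supports other allocation mechanisms).
    weights = [agent["unfairness"](history) for agent in agents]
    if sum(weights) == 0:
        return []
    return random.choices(agents, weights=weights, k=1)

def aggregate(user_scores, active_agents, alpha=0.8):
    # Aggregation stage: combine personalized scores with the active
    # agents' item preferences via a simple weighted sum, then rank.
    combined = {}
    for item, score in user_scores.items():
        fairness = sum(agent["preference"](item) for agent in active_agents)
        combined[item] = alpha * score + (1 - alpha) * fairness
    return sorted(combined, key=combined.get, reverse=True)
```

    With `alpha` close to 1 the ranking follows the base recommender; lowering it lets the allocated fairness agents re-rank the list.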

    Full text in ACM Digital Library

  • On the challenges of studying bias in Recommender Systems: The effect of data characteristics and algorithm configuration
    by Savvina Daniil, Manel Slokom, Mirjam Cuper, Cynthia C. S. Liem, Jacco van Ossenbruggen, Laura Hollink

    Statements on the propagation of bias by recommender systems are often hard to verify or falsify. Research on bias tends to draw from a small pool of publicly available datasets and is therefore bound by their specific properties. Additionally, implementation choices are often not explicitly described or motivated in research, while they may have an effect on bias propagation. In this paper, we explore the challenges of measuring and reporting popularity bias. We showcase the impact of data properties and algorithm configurations on popularity bias by combining real and synthetic data with well-known recommender systems frameworks. First, we identify data characteristics that might impact popularity bias, and explore their presence in a set of available online datasets. Accordingly, we generate various datasets that combine these characteristics. Second, we locate algorithm configurations that vary across implementations in the literature. We evaluate popularity bias for eight datasets, three real and five synthetic, under a range of configurations, and offer insights on their joint effect. We find that, depending on the data characteristics, various configurations of the algorithms examined can lead to different conclusions regarding the propagation of popularity bias. These results motivate the need for explicitly addressing algorithmic configuration and data properties when reporting and interpreting bias in recommender systems.
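    One common statistic for the kind of popularity bias measured here is the mean training-set popularity of recommended items. The sketch below is illustrative only, not the paper's protocol; the function name and input shapes are assumptions.

```python
from collections import Counter

def avg_rec_popularity(train_interactions, rec_lists):
    # Popularity of an item = its interaction count in the training data.
    popularity = Counter(item for _, item in train_interactions)
    # Mean popularity across all recommended items: higher values mean the
    # recommendations skew toward the head of the popularity distribution.
    recommended = [popularity[i] for recs in rec_lists.values() for i in recs]
    return sum(recommended) / len(recommended)
```

    Even a single statistic like this can flip conclusions depending on dataset skew and algorithm configuration, which is precisely the effect the paper documents.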

    Full text in ACM Digital Library

  • FEIR: Quantifying and Reducing Envy and Inferiority for Fair Recommendation of Limited Resources
    by Nan Li, Bo Kang, Jefrey Lijffijt, Tijl De Bie

    Recommendation in settings such as e-recruitment and online dating involves distributing limited opportunities, which differs from recommending practically unlimited goods such as in e-commerce or music recommendation. This setting calls for novel approaches to quantify and enforce fairness. Indeed, typical recommender systems recommend each user their top relevant items, such that desirable items may be recommended simultaneously to more and to less qualified individuals. This is arguably unfair to the latter. Indeed, when they pursue such a desirable recommendation (e.g., by applying for a job), they are unlikely to be successful. To quantify fairness in such settings, we introduce inferiority: a novel (un)fairness measure that quantifies the competitive disadvantage of a user for their recommended items. Inferiority is complementary to envy: a previously-proposed fairness notion that quantifies the extent to which a user prefers other users’ recommendations over their own. We propose to use both inferiority and envy in combination with an accuracy-related measure called utility: the aggregated relevancy scores of the recommended items. Unfortunately, none of these three measures are differentiable, making it hard to optimize them, and restricting their immediate use to evaluation only. To remedy this, we reformulate them in the context of a probabilistic interpretation of recommender systems, resulting in differentiable versions. We show how these loss functions can be combined in a multi-objective optimization problem that we call FEIR (Fairness through Envy and Inferiority Reduction), used as a post-processing of the scores from any standard recommender system. Experiments on synthetic and real-world data show that the proposed approach effectively improves the trade-offs between inferiority, envy and utility, compared to the naive recommendation and the state-of-the-art method for the related problem of congestion alleviation in job recommendation. 
    We discuss and enhance the practical impact of our findings on a wide range of real-world recommendation scenarios, and we offer implementations of visualization tools to render the envy and inferiority metrics more accessible.
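    A minimal reading of the two measures can be sketched as follows, given a relevance matrix `R[user][item]` and per-user recommendation lists. This is a hedged interpretation of the abstract's definitions, not the paper's (differentiable) formulation; names and exact aggregation are assumptions.

```python
def envy(R, recs, u):
    # Envy: how much better some other user's list would be for u,
    # judged by u's own relevance scores R[u][item].
    own = sum(R[u][i] for i in recs[u])
    best_other = max(sum(R[u].get(i, 0.0) for i in recs[v]) for v in recs if v != u)
    return max(0.0, best_other - own)

def inferiority(R, recs, u):
    # Inferiority: for each item recommended to u, the number of competing
    # users who received the same item and are more relevant to it --
    # u's competitive disadvantage when pursuing the recommendation.
    return sum(
        1
        for i in recs[u]
        for v in recs
        if v != u and i in recs[v] and R[v].get(i, 0.0) > R[u][i]
    )
```

    These evaluation-only forms are non-differentiable, which is why the paper moves to a probabilistic reformulation before optimizing.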

    Full text in ACM Digital Library

  • Consumer Social Connectedness and Persuasiveness of Collaborative-Filtering Recommender Systems: Evidence From an Online-to-Offline Recommendation App
    by Panagiotis Adamopoulos, Vilma Todri

    Consumers often rely on their social connections or social technologies, such as (automated) system-generated recommender systems, to navigate the proliferation of diverse products and services offered in online and offline markets and cope with the corresponding choice overload. In this study, we investigate the relationship between the consumers’ social connectedness and the economic impact of recommender systems. Specifically, we examine whether the social connectedness levels of consumers moderate the effectiveness of online recommendations toward increasing product demand levels. We study this novel research question using a combination of datasets and a demand-estimation model. Interestingly, the empirical results show a positive moderating effect of social connectedness on the demand effect of online-to-offline recommendations. Further delving into the findings, we also provide empirical evidence that social identification might explain why denser social connectedness with local users accentuates the effects of collaborative-filtering online-to-offline recommendations. Our study enhances the understanding of community factors affecting the efficacy of social technologies in multichannel operations while also extending the social identity theory in operations in the digital realm. The results also have intriguing operational implications for operations managers and practitioners, while suggesting several interesting avenues for future research on social technologies and operations management.

    Full text in ACM Digital Library

  • Sequential Recommendation by Reprogramming Pretrained Transformer
    by Min Tang, Shujie Cui, Zhe Jin, Shiuan-Ni Liang, Chenliang Li, Lixin Zou

    Inspired by the success of pre-trained language models (PLMs), numerous sequential recommenders have attempted to replicate their achievements by employing PLMs’ efficient architectures to build large models and using self-supervised learning to broaden training data. Despite these successes, how to develop a large-scale sequential recommender remains an open question, since existing methods either build models within a single dataset or use text as an intermediary for alignment across datasets. However, due to the sparsity of user–item interactions, misalignment between datasets, and the lack of global information in sequential recommendation, directly pre-training a large foundation model may not be feasible. To this end, we propose RecPPT, which first employs GPT-2 to model historical sequences, training only the input item embedding and the output layer from scratch and thereby avoiding training a large model on sparse user–item interactions. Additionally, to alleviate the burden of misalignment, RecPPT is equipped with a reprogramming module that reprograms the target embedding onto existing well-trained proto-embeddings. Furthermore, RecPPT integrates global information into sequences by initializing the item embedding with an SVD-based initializer. Extensive experiments over four datasets demonstrate that RecPPT achieves an average improvement of 6.5% on NDCG@5, 6.2% on NDCG@10, 6.1% on Recall@5, and 5.4% on Recall@10 over the baselines. Particularly in few-shot scenarios, the significant improvements in NDCG@10 confirm the superiority of the proposed method.
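    The SVD-based initializer mentioned above can be sketched as taking a truncated SVD of the user–item interaction matrix and using the scaled right singular vectors as item embeddings. This is one plausible reading of the abstract, not RecPPT's exact implementation; the dense matrix and function signature are assumptions for illustration.

```python
import numpy as np

def svd_item_embeddings(interactions, n_users, n_items, dim):
    # Dense user-item matrix for illustration (real data would be sparse).
    M = np.zeros((n_users, n_items))
    for u, i in interactions:
        M[u, i] = 1.0
    # Truncated SVD: the right singular vectors summarize global item
    # co-occurrence, and scaling columns by the singular values lets the
    # strongest factors dominate the initialization.
    _, S, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:dim].T * S[:dim]
```

    Initializing the item embedding this way injects the global interaction structure that a purely sequential objective would otherwise miss.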

    Full text in ACM Digital Library

  • Mirror, Mirror: Exploring Stereotype Presence Among Top-N Recommendations That May Reach Children
    by Robin Ungruh, Murtadha Al Nahadi, Maria Soledad Pera

    Children form stereotypes by observing stereotypical expressions during childhood, influencing their future beliefs, attitudes, and behavior. These perceptions, often negative, can surface across the many online media platforms that children access, like streaming services and social media. Given that many of the items displayed on these platforms are commonly selected by recommendation algorithms (RAs), it becomes critical to investigate their role in suggesting items that could negatively impact this vulnerable population. We address this concern by conducting an empirical evaluation to gauge the presence of Gender, Race, and Religion stereotypes among the top-10 recommendations generated by a wide range of RAs across two well-known datasets in different domains: MovieLens (movies) and GoodReads (books). Our analyses reveal that all RAs frequently recommend stereotypical items. Gender stereotypes are particularly prevalent, appearing in almost every recommendation list and emerging as the most common stereotype. Our results indicate that no algorithm has a consistent tendency towards recommending more stereotypical content; instead, high stereotype presence can be found across recommendation strategies. Outcomes from this work underscore the potential risks that RAs pose to children in perpetuating and reinforcing harmful stereotypes, prompting reflections on their implications for the design and evaluation of recommender systems.
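    A presence measure of the kind gauged above can be sketched as the share of top-N lists containing at least one flagged item. This is an illustrative sketch only; the function name is assumed, and the item-level stereotype flags are assumed to come from an external annotation step rather than being computed here.

```python
def stereotype_presence(rec_lists, stereotyped_items):
    # Fraction of top-N recommendation lists that contain at least one
    # item annotated as carrying a stereotype.
    hits = sum(
        1 for recs in rec_lists if any(i in stereotyped_items for i in recs)
    )
    return hits / len(rec_lists)
```

    Computing this per stereotype category (Gender, Race, Religion) and per algorithm is what lets the paper compare prevalence across recommendation strategies.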

    Full text in ACM Digital Library



This event is supported by the Capital City of Prague