Accepted Contributions

 

List of all full papers accepted for RecSys 2025 (in alphabetical order).

  • [RES] A Language Model-Based Playlist Generation Recommender System
    by Enzo Charolois-Pasqua (EURECOM), Eléa Vellard (EURECOM), Youssra Rebboud (EURECOM), Pasquale Lisena (EURECOM), Raphaël Troncy (EURECOM)

    The title of a playlist reflects its intended mood or theme, allowing creators to easily locate their content and enabling other users to discover music that matches specific situations and needs. This study introduces a novel approach to playlist generation using language models to leverage the thematic coherence between a playlist title and its tracks. Our method involves creating semantic clusters from text embeddings, followed by fine-tuning a transformer model on these thematic clusters. Playlists are generated by evaluating cosine similarity scores between known and unknown titles and applying a voting mechanism. Performance evaluation, combining quantitative and qualitative metrics, demonstrates that using the playlist title as a seed provides useful recommendations, even in a zero-shot scenario.
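
    The similarity-and-voting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy embeddings, the track names, the helper names (`cosine_sim`, `generate_playlist`) and the similarity-weighted vote are all assumptions made for illustration, and a real system would obtain the embeddings from a fine-tuned text encoder.

```python
import numpy as np

def cosine_sim(query, matrix):
    """Cosine similarity between one vector and each row of a matrix."""
    query = query / np.linalg.norm(query)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ query

def generate_playlist(query_emb, title_embs, playlists, k=2, n_tracks=3):
    """Pick the k most similar known titles and let their tracks vote."""
    sims = cosine_sim(query_emb, title_embs)
    top = np.argsort(-sims)[:k]
    votes = {}
    for idx in top:
        for track in playlists[idx]:
            # Each neighbour votes for its tracks, weighted by its similarity.
            votes[track] = votes.get(track, 0.0) + float(sims[idx])
    # Highest vote first; ties are broken alphabetically for determinism.
    ranked = sorted(votes, key=lambda t: (-votes[t], t))
    return ranked[:n_tracks]

# Toy, hand-made embeddings (hypothetical stand-ins for encoder outputs).
title_embs = np.array([
    [1.0, 0.0, 0.0],   # known title, e.g. "morning run"
    [0.9, 0.1, 0.0],   # known title, e.g. "gym workout"
    [0.0, 1.0, 0.0],   # known title, e.g. "rainy evening"
])
playlists = [
    ["track_a", "track_b"],
    ["track_b", "track_c"],
    ["track_d", "track_e"],
]
query = np.array([1.0, 0.05, 0.0])  # unseen title, e.g. "cardio session"
result = generate_playlist(query, title_embs, playlists)
print(result)
```

    In this toy setup the track shared by both nearest playlists collects two votes and is ranked first, which is the intuition behind using a vote rather than copying the single closest playlist.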

  • [RES] A Multi-Factor Collaborative Prediction for Review-based Recommendation
    by Junrui Liu (Beijing University of Technology), Tong Li (Beijing University of Technology), Mingliang Yu (TravelSky Technology Limited), Shiqiu Yang (Beijing University of Technology), Zifang Tang (Beijing University of Technology), Zhen Yang (Beijing University of Technology)

    In user behaviors, the higher the click-through rate, the higher the rating. Thus, existing recommendation methods implicitly model click behaviors by learning user preferences and achieve accurate predictions on rating prediction tasks. However, they ignore how rating behaviors can help the click-through rate (CTR) prediction task. Although the rating behavior occurs after the click behavior, we can still get helpful information about clicks from ratings. In this paper, we propose a multi-factor collaborative prediction method (MFC), which mines the complex relationship between click and rating behaviors, achieving accurate prediction on CTR tasks. Specifically, we factorize the complex relationship into three simple relationships, i.e., linear, sharing, and cross-correlation relationships. MFC first extracts click factors, rating factors, and their sharing factor from user click and rating behaviors with user reviews, as review-based methods have achieved great results on rating predictions. Then, a rating factor regularization method is used to learn rating factors accurately, helping to model the true relationships between click and rating behavior. Finally, MFC combines those three factors to make predictions: click and rating factors are used to model the linear and cross-correlation relationships, and the sharing factors correspond to the sharing relationship. Experiments on five real-world datasets demonstrate that MFC outperforms the best baseline by 9.19%, 9.80%, 0.69%, and 7.95% in terms of Accuracy, Precision, Recall, and F1-score, respectively. MFC also reduces the MAE of the rating prediction task by 1.92%.

  • [RES] A Non-Parametric Choice Model That Learns How Users Choose Between Recommended Options
    by Thorsten Krause (Radboud University), Harrie Oosterhuis (Radboud University)

    Choice models predict which items users choose from presented options. In recommendation settings, they can infer user preferences while countering exposure bias. In contrast with traditional univariate recommendation models, choice models consider which competitors appeared with the chosen item. This ability allows them to distinguish whether a user chose an item due to preference, i.e., they liked it; or competition, i.e., it was the best available option. Each choice model assumes specific user behavior, e.g., the multinomial logit model. However, it is currently unclear how accurately these assumptions capture actual user behavior, how wrong assumptions impact inference, and whether better models exist. In this work, we propose the learned choice model for recommendation (LCM4Rec), a non-parametric method for estimating the choice model. By applying kernel density estimation, LCM4Rec infers the most likely error distribution that describes the effect of inter-item cannibalization and thereby characterizes the users’ choice model. Thus, it simultaneously infers what users prefer and how they make choices. Our experimental results indicate that our method (i) can accurately recover the choice model underlying a dataset; (ii) provides robust user preference inference, in contrast with existing choice models that are only effective when their assumptions match user behavior; and (iii) is more resistant against exposure bias than existing choice models. Thereby, we show that learning choice models, instead of assuming them, can produce more robust predictions. We believe this work provides an important step towards better understanding users’ choice behavior.
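
    For readers unfamiliar with choice models, the multinomial logit model mentioned above can be sketched in a few lines. This shows only the standard parametric baseline, not LCM4Rec; the utilities and item names are hypothetical.

```python
import math

def mnl_choice_probs(utilities):
    """Multinomial logit: P(i | S) = exp(u_i) / sum over j in S of exp(u_j)."""
    m = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {item: math.exp(u - m) for item, u in utilities.items()}
    total = sum(exp_u.values())
    return {item: e / total for item, e in exp_u.items()}

# Hypothetical utilities of three items for one user.
utility = {"A": 1.0, "B": 1.0, "C": 3.0}

# The same item A shown next to a weak competitor (B) or a strong one (C).
weak_slate = mnl_choice_probs({k: utility[k] for k in ("A", "B")})
strong_slate = mnl_choice_probs({k: utility[k] for k in ("A", "C")})
print(weak_slate["A"], strong_slate["A"])
```

    The user's utility for A is identical in both slates, yet A is chosen far less often next to C. This is the preference-versus-competition distinction that a model looking at items in isolation cannot make.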

  • [RES] Affect-aware Cross-Domain Recommendation for Art Therapy via Music Preference Elicitation
    by Bereket A. Yilma (University of Luxembourg), Luis A. Leiva (University of Luxembourg)

    Art Therapy (AT) is an established practice that facilitates emotional processing and recovery through creative expression. Recently, Visual Art Recommender Systems (VA RecSys) have emerged to support AT, demonstrating their potential by personalizing therapeutic artwork recommendations. Nonetheless, current VA RecSys rely on visual stimuli for user modeling, limiting their ability to capture the full spectrum of emotional responses during preference elicitation. Previous studies have shown that music stimuli elicit unique affective reflections, presenting an opportunity for cross-domain recommendation (CDR) to enhance personalization in AT. Since CDR has not yet been explored in this context, we propose a family of CDR methods for AT based on music-driven preference elicitation. A large-scale study with 200 users demonstrates the efficacy of music-driven preference elicitation, outperforming the classic visual-only elicitation approach.

  • [RES] An Off-Policy Learning Approach for Steering Sentence Generation towards Personalization
    by Haruka Kiyohara (Cornell University), Daniel Cao (Cornell University), Yuta Saito (Cornell University), Thorsten Joachims (Cornell University)

    We study the problem of personalizing the output of a large language model (LLM) by training on logged bandit feedback (e.g., personalizing movie descriptions based on likes). While one may naively treat this as a standard off-policy contextual bandit problem, the large action space and the large parameter space make naive applications of off-policy learning (OPL) infeasible. We overcome this challenge by learning a prompt policy for a frozen LLM that has only a modest number of parameters. The proposed Direct Sentence Off-policy gradient (DSO) effectively propagates the gradient to the prompt policy space by leveraging the smoothness and overlap in the sentence space. Consequently, DSO substantially reduces variance while also suppressing bias. Empirical results on our newly established suite of benchmarks, called OfflinePrompts, demonstrate the effectiveness of the proposed approach in generating personalized descriptions for movie recommendations, particularly when the number of candidate prompts and reward noise are large.
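
    As background for the off-policy setting above, the sketch below shows the textbook inverse propensity scoring (IPS) estimator on logged bandit feedback. It is not DSO; the two-prompt log, the rewards and the policies are made-up values for illustration.

```python
import numpy as np

def ips_value(rewards, logging_probs, target_probs):
    """IPS estimate of the expected reward of a target policy from logged data."""
    weights = target_probs / logging_probs  # importance weights
    return float(np.mean(weights * rewards))

# Hypothetical log: the logging policy picked one of 2 prompts uniformly at random.
actions = np.array([0, 1, 0, 1])
rewards = np.array([1.0, 0.0, 1.0, 0.0])  # e.g. 1 = the user liked the description
logging_probs = np.full(4, 0.5)

# Target policy that picks prompt 0 with probability 0.9.
target_dist = np.array([0.9, 0.1])
target_probs = target_dist[actions]

estimate = ips_value(rewards, logging_probs, target_probs)
print(estimate)
```

    With only two actions the importance weights stay small. When the action space is the set of all possible sentences, logging probabilities become tiny and the weights and their variance grow accordingly, which is the difficulty the abstract refers to.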

  • [RES] Auditing Recommender Systems for User Empowerment in Very Large Online Platforms under the Digital Services Act
    by Matteo Fabbri (IMT School for Advanced Studies Lucca), Ludovico Boratto (University of Cagliari)

    The governance of recommender systems (RSs) in very large online platforms (VLOPs) is expected to undergo a major transformation under the Digital Services Act (DSA), which imposes new obligations on transparency and user control. However, beyond legal compliance, a critical question remains: How can RSs be reimagined to genuinely empower users and foster meaningful personalization? This paper addresses this question by analyzing how three major short-video platforms—Instagram, TikTok, and YouTube—have implemented the DSA requirements for RSs. By reviewing their audit reports, systemic risk assessments and compliance strategies, we evaluate the extent to which current approaches enhance user autonomy and control over content exposure. Building on this analysis, we outline a perspective for the future of VLOPs’ RSs grounded in speculative design. We argue that meaningful personalization should integrate algorithmic choice, balancing proportionality and granularity in RS customization, and content curation, ensuring authoritativeness and diversity to mitigate systemic risks. By bridging legal analysis, platform governance, and user-centered design, this paper outlines actionable pathways for aligning technical developments with regulatory objectives. Our findings contribute to interdisciplinary research on RSs by highlighting how platforms can move beyond minimal compliance toward a model that prioritizes user empowerment and content pluralism.

  • [RES] Beyond Immediate Click: Engagement-Aware and MoE-Enhanced Transformers for Sequential Movie Recommendation
    by Haotian Jiang (Amazon Prime Video), Sibendu Paul (Amazon Prime Video), Haiyang Zhang (Amazon Prime Video), Caren Chen (Amazon Prime Video)

    Modern video streaming services heavily rely on recommender systems. Although there are many methods for content personalization and recommendation, sequential recommendation models stand out due to their ability to summarize user behavior over time. We propose a novel sequential recommendation framework to address the following key issues: suboptimal negative sampling strategies, fixed user-history context lengths, single-task optimization objectives, insufficient engagement-aware learning, and short-sighted prediction horizons, ultimately improving both immediate and multi-step next-title prediction for video streaming services. In this work, we propose a novel approach to capture patterns of interaction at different time scales. We also align long-term user happiness with instantaneous intent signals using multi-task learning with an engagement-aware personalized loss. Finally, we extend traditional next-item prediction into a next-K forecasting task using a training strategy with soft positive labels. Extensive experiments on large-scale streaming data validate the effectiveness of our approach. Our best model outperforms the baseline in NDCG@1 by up to 3.52% under realistic ranking scenarios, showing the effectiveness of our engagement-aware and MoE-enhanced designs. Results also show that soft-label Multi-K training is a practical and scalable extension, and that a balanced personalized negative sampling strategy generalizes well. Our framework outperforms baselines across all ranking metrics, providing a robust solution for production-scale streaming recommendations.

  • [RES] Breaking Knowledge Boundaries: Cognitive Distillation-enhanced Cross-Behavior Course Recommendation Model
    by Ruoyu Li (Xidian University), Yangtao Zhou (Xidian University), Chenzhang Li (Xidian University), Hua Chu (Xidian University), Jianan Li (Xidian University), Yuhan Bian (Xidian University)

    Online Course Recommendation (CR) stands as a promising educational strategy within online education platforms, with the goal of providing personalized learning experiences for learners and enhancing their learning efficiency. Existing CR methods focus on modeling learners’ learning needs from their historical course interactions by adopting general recommendation techniques, but fail to consider the shifts in course preferences caused by cognitive states. While Cognitive Diagnosis (CD) techniques are adept at tracking the evolution of cognitive states by mining learner-exercise interactions and can benefit the CR task, it is non-trivial to integrate CD and CR properly due to several challenges, including accurate diagnosis, divergent task objectives, and inconsistent data magnitude. To address these challenges, we propose a Cognitive Distillation-enhanced Cross-Behavior Course Recommendation model (C3Rec), which aims to transfer the knowledge of learners’ cognitive states to enhance the CR task. Specifically, for accurate diagnosis, we introduce a dual-granularity cognitive diagnosis module to capture learner representations at both coarse and fine granularities, thereby achieving a comprehensive construction of learners’ cognitive states. For divergent task objectives, we design a cross-behavior course recommendation module to jointly profile the dynamic course preferences from two temporally interleaved learning behaviors, achieving seamless semantic alignment between these two tasks. For inconsistent data magnitude, we introduce a triple-stage distillation mechanism to exploit cognitive state features as prior knowledge, enhancing the CR task by further profiling learners’ course preferences. Experimental comparisons with multiple state-of-the-art methods on two real-world educational datasets demonstrate the effectiveness of our model.

  • [RES] Enhancing Online Video Recommendation via a Coarse-to-fine Dynamic Uplift Modeling Framework
    by Chang Meng (Kuaishou Technology), Chenhao Zhai (Tsinghua University), Xueliang Wang (Kuaishou Technology), Shuchang Liu (Kuaishou Technology), Xiaoqiang Feng (Kuaishou Technology), Lantao Hu (Kuaishou Technology), Xiu Li (Tsinghua University), Han Li (Kuaishou Technology), Kun Gai (Kuaishou Technology)

    The popularity of short video applications has brought new opportunities and challenges to video recommendation. In addition to the traditional ranking-based pipeline, industrial solutions usually introduce additional distribution management components to guarantee a diverse and content-rich user experience. However, existing solutions are either non-personalized or fail to generalize well to the ever-changing user preferences. Inspired by the success of uplift modeling in online marketing, we attempt to implement uplift modeling in the video recommendation scenario to mitigate the problems. However, we face two main challenges when migrating the technique: 1) the complex-response causal relation in the distribution management problem, and 2) the modeling of long-term and real-time user preferences. To address these challenges, we map each treatment to a specific adjustment of the distribution over video types, then propose a Coarse-to-fine Dynamic Uplift Modeling (CDUM) framework for real-time video recommendation scenarios. Specifically, CDUM consists of two modules: a coarse-grained module that utilizes the offline features of users to model their long-term preferences, and a fine-grained module that leverages online real-time contextual features and request-level candidates to model users’ real-time interests. These two modules collaboratively and dynamically identify and target specific user groups, and then apply treatments effectively. We conduct comprehensive experiments on two offline public datasets, an industrial offline dataset, and an online A/B test, demonstrating the superiority and effectiveness of CDUM. The proposed method is fully deployed on a large-scale short video platform, serving hundreds of millions of users every day. We plan to make source code available after the paper is accepted.
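
    The basic idea of uplift modeling that the abstract builds on can be sketched as a per-segment difference in mean outcome between treated and control users. This is a deliberately simple estimator that assumes randomized treatment assignment; it is not CDUM, and the segment names, outcomes and log format are hypothetical.

```python
from collections import defaultdict

def uplift_by_segment(logs):
    """Uplift per segment: mean(outcome | treated) - mean(outcome | control)."""
    # For each segment keep [sum, count] for the treated and control arms.
    sums = defaultdict(lambda: {True: [0.0, 0], False: [0.0, 0]})
    for segment, treated, outcome in logs:
        cell = sums[segment][treated]
        cell[0] += outcome
        cell[1] += 1
    uplift = {}
    for segment, cells in sums.items():
        (t_sum, t_n), (c_sum, c_n) = cells[True], cells[False]
        if t_n and c_n:  # both arms are needed for a comparison
            uplift[segment] = t_sum / t_n - c_sum / c_n
    return uplift

# Hypothetical log rows: (user segment, received treatment?, watch minutes).
logs = [
    ("new_user", True, 12.0), ("new_user", True, 8.0),
    ("new_user", False, 6.0), ("new_user", False, 4.0),
    ("heavy_user", True, 30.0), ("heavy_user", True, 28.0),
    ("heavy_user", False, 30.0), ("heavy_user", False, 32.0),
]
uplift = uplift_by_segment(logs)
# Apply the treatment only where the estimated effect is positive.
targets = [s for s, u in uplift.items() if u > 0]
print(uplift, targets)
```

    The point of the example is that the treatment is applied only to groups for which it is estimated to help, rather than to every user or to none.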

  • [RES] Enhancing Sequential Recommender with Large Language Models for Joint Video and Comment Recommendation
    by Bowen Zheng (Renmin University of China), Zihan Lin (Kuaishou Technology), Enze Liu (Renmin University of China), Chen Yang (Renmin University of China), Enyang Bai (Kuaishou Technology), Cheng Ling (Kuaishou Technology), Han Li (Kuaishou Technology), Wayne Xin Zhao (Renmin University of China), Ji-Rong Wen (Renmin University of China)

    Nowadays, reading or writing comments on captivating videos has emerged as a critical part of the viewing experience on online video platforms. However, existing recommender systems primarily focus on users’ interaction behaviors with videos, neglecting comment content and interaction in user preference modeling. In this paper, we propose a novel recommendation approach called LSVCR that utilizes user interaction histories with both videos and comments to jointly perform personalized video and comment recommendation. Specifically, our approach comprises two key components: a sequential recommendation (SR) model and a supplemental large language model (LLM) recommender. The SR model functions as the primary recommendation backbone (retained in deployment) of our method for efficient user preference modeling. Concurrently, we employ an LLM as the supplemental recommender (discarded in deployment) to better capture underlying user preferences derived from heterogeneous interaction behaviors. In order to integrate the strengths of the SR model and the supplemental LLM recommender, we introduce a two-stage training paradigm. The first stage, personalized preference alignment, aims to align the preference representations from both components, thereby enhancing the semantics of the SR model. The second stage, recommendation-oriented fine-tuning, involves fine-tuning the alignment-enhanced SR model according to specific objectives. Extensive experiments in both video and comment recommendation tasks demonstrate the effectiveness of LSVCR. Moreover, online A/B testing on a real-world video platform verifies the practical benefits of our approach. In particular, we attain a cumulative gain of 4.13% in comment watch time.

  • [RES] Enhancing Transferability and Consistency in Cross-Domain Recommendations via Supervised Disentanglement
    by Yuhan Wang (Wuhan University of Technology), Qing Xie (Wuhan University of Technology), Zhifeng Bao (School of Computing Technologies, RMIT University), Mengzi Tang (Wuhan University of Technology), Lin Li (Wuhan University of Technology), Yongjian Liu (Wuhan University of Technology)

    Cross-domain recommendation (CDR) aims to alleviate the data sparsity by transferring knowledge across domains. Disentangled representation learning provides an effective solution to model complex user preferences by separating intra-domain features (domain-shared and domain-specific features), thereby enhancing robustness and interpretability. However, disentanglement-based CDR methods employing generative modeling or GNNs with contrastive objectives face two key challenges: (i) pre-separation strategies decouple features before extracting collaborative signals, disrupting intra-domain interactions and introducing noise; (ii) unsupervised disentanglement objectives lack explicit task-specific guidance, resulting in limited consistency and suboptimal alignment. To address these challenges, we propose DGCDR, a GNN-enhanced encoder-decoder framework. For challenge (i), DGCDR first applies GNN to extract high-order collaborative signals, providing enriched representations as a robust foundation for disentanglement. The encoder then dynamically disentangles features into domain-shared and -specific spaces, preserving collaborative information during the separation process. To handle challenge (ii), the decoder introduces an anchor-based supervision mechanism that leverages hierarchical feature relationships to enhance intra-domain consistency and cross-domain alignment. Extensive experiments on real-world datasets demonstrate that DGCDR achieves state-of-the-art performance, with improvements of up to 11.59% across key metrics. Qualitative analyses further validate its superior disentanglement quality and transferability.

  • [RES] Exploring Scaling Laws of CTR Model for Online Performance Improvement
    by Weijiang Lai (Institute of Software, Chinese Academy of Sciences), Beihong Jin (Institute of Software, Chinese Academy of Sciences), Jiongyan Zhang (Meituan), Yiyuan Zheng (Institute of Software, Chinese Academy of Sciences), Jian Dong (Meituan), Jia Cheng (Meituan), Jun Lei (Meituan), Xingxing Wang (Meituan)

    Click-Through Rate (CTR) models play a vital role in improving user experience and boosting business revenue in many online personalized services. However, current CTR models generally encounter bottlenecks in performance improvement. Inspired by the scaling law phenomenon of Large Language Models (LLMs), we propose a new paradigm for improving CTR predictions: first, constructing a CTR model with accuracy scalable to the model grade and data size, and then distilling the knowledge implied in this model into a lightweight model that can serve online users. To put it into practice, we construct a CTR model named SUAN (Stacked Unified Attention Network). In SUAN, we propose the unified attention block (UAB) as a behavior sequence encoder. A single UAB unifies the modeling of the sequential and non-sequential features and also measures the importance of each user behavior feature from multiple perspectives. Stacked UABs elevate the configuration to a high grade, paving the way for performance improvement. In order to benefit from the high performance of the high-grade SUAN and avoid the disadvantage of its long inference time, we modify the SUAN with sparse self-attention and parallel inference strategies to form LightSUAN, and then adopt online distillation to train the low-grade LightSUAN, taking a high-grade SUAN as a teacher. The distilled LightSUAN has superior performance but the same inference time as the LightSUAN, making it well-suited for online deployment. Experimental results show that SUAN performs exceptionally well and holds the scaling laws spanning three orders of magnitude in model grade and data size, and the distilled LightSUAN outperforms the SUAN configured with one grade higher. More importantly, the distilled LightSUAN has been integrated into an online service, increasing the CTR by 2.81% and CPM by 1.69% while keeping the average inference time acceptable.
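
    The teacher-student step above relies on knowledge distillation. A minimal sketch of a generic soft-target distillation loss follows; the abstract does not specify the loss used for SUAN and LightSUAN, so the temperature, the KL form and the toy logits here are assumptions for illustration only.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, averaged over the batch."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    # The T^2 factor is the usual convention to keep gradient scale comparable.
    return float(kl.mean() * temperature ** 2)

# Hypothetical [no-click, click] logits for a batch of two impressions.
teacher = np.array([[0.2, 1.5], [2.0, -1.0]])
student = np.array([[0.0, 0.5], [1.0, 0.0]])

loss = distillation_loss(student, teacher)
print(loss)
```

    The loss is zero when the student reproduces the teacher's softened predictions and positive otherwise, so minimizing it pulls the smaller model toward the larger one's outputs.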

  • [RES] GRACE: Generative Recommendation via Journey-Aware Sparse Attention on Chain-of-Thought Tokenization
    by Luyi Ma (Walmart Global Tech), Wanjia Zhang (Walmart Global Tech), Kai Zhao (Walmart Global Tech), Abhishek Kulkarni (Walmart Global Tech), Lalitesh Morishetti (Walmart Global Tech), Anjana Ganesh (Walmart Global Tech), Ashish Ranjan (Walmart Global Tech), Aashika Padmanabhan (Walmart Global Tech), Jianpeng Xu (Walmart Global Tech), Jason H.D. Cho (Walmart Global Tech), Praveen Kumar Kanumala (Walmart Inc), Kaushiki Nag (Walmart), Sumit Dutta (Walmart Global Tech), Kamiya Motwani (Walmart Global Tech), Malay Patel (Walmart Global Tech), Evren Korpeoglu (Walmart), Sushant Kumar (Walmart Global Tech), Kannan Achan (Walmart Global Tech)

    Generative models have recently demonstrated strong potential in multi-behavior recommendation systems, leveraging the expressive power of transformers and tokenization to generate personalized item sequences. However, their adoption is hindered by (1) the lack of explicit information for token reasoning, (2) high computational costs due to quadratic attention complexity and dense sequence representations after tokenization, and (3) limited multi-scale modeling over user history. In this work, we propose GRACE (Generative Recommendation via journey-aware sparse Attention on Chain-of-thought tokEnization), a novel generative framework for multi-behavior sequential recommendation. GRACE introduces a hybrid Chain-of-Thought (CoT) tokenization method that encodes user-item interactions with explicit attributes from product knowledge graphs (e.g., category, brand, price) over semantic tokenization, enabling interpretable and behavior-aligned generation. To address the inefficiency of standard attention, we design a Journey-Aware Sparse Attention (JSA) mechanism, which selectively attends to compressed, intra-, inter-, and current-context segments in the tokenized sequence. Experiments on two real-world datasets show that GRACE significantly outperforms state-of-the-art baselines, achieving up to +106.9% HR@10 and +106.7% NDCG@10 improvement over the state-of-the-art baseline on the Home domain, and +22.1% HR@10 on the Electronics domain. GRACE also reduces attention computation by up to 48% with long sequences.
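
    HR@10 and NDCG@10, used above, are standard top-k ranking metrics. The sketch below computes them under the common assumption of a single held-out target item per user; the paper's exact evaluation protocol may differ, and the item identifiers are hypothetical.

```python
import math

def hit_rate_at_k(ranked_items, target, k=10):
    """1.0 if the held-out target appears in the top k, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k=10):
    """With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    for position, item in enumerate(ranked_items[:k]):
        if item == target:
            return 1.0 / math.log2(position + 2)  # position is 0-based
    return 0.0

# Hypothetical ranked list for one user whose held-out item is "i7".
ranked = ["i3", "i7", "i1", "i9"]
hr = hit_rate_at_k(ranked, "i7")
ndcg = ndcg_at_k(ranked, "i7")
print(hr, ndcg)
```

    Reported values are averages of these per-user scores. HR only checks whether the target is in the top k, while NDCG also rewards placing it closer to the top.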

  • [RES] GenSAR: Unifying Balanced Search and Recommendation with Generative Retrieval
    by Teng Shi (Renmin University of China), Jun Xu (Renmin University of China), Xiao Zhang (Renmin University of China), Xiaoxue Zang (Kuaishou Technology Co., Ltd.), Kai Zheng (Kuaishou Technology Co., Ltd.), Yang Song (Kuaishou Technology Co., Ltd.), Enyun Yu (Independent)

    Many commercial platforms provide both search and recommendation (S&R) services to meet different user needs. This creates an opportunity for joint modeling of S&R. Although many joint S&R studies have demonstrated the advantages of integrating S&R, they have also identified a trade-off between the two tasks. That is, when recommendation performance improves, search performance may decline, or vice versa. This trade-off stems from the different information requirements: search prioritizes the semantic relevance between the queries and the items, while recommendation heavily relies on the collaborative relationship between users and items. To balance semantic and collaborative information and mitigate this trade-off, two main challenges arise: (1) How to incorporate both semantic and collaborative information in item representations. (2) How to train the model to understand the different information requirements of S&R. The recent rise of generative retrieval based on Large Language Models (LLMs) for S&R offers a potential solution. Generative retrieval represents each item as an identifier, allowing us to assign multiple identifiers to each item to capture both semantic and collaborative information. Additionally, generative retrieval formulates both S&R as sequence-to-sequence tasks, enabling us to unify different tasks through varied prompts, thereby helping the model better understand the requirements of each task. Based on this, we propose GenSAR, a method that unifies balanced S&R through generative retrieval. We design joint S&R identifiers and training tasks to address the above challenges, mitigate the trade-off between S&R, and further improve both tasks. Experimental results on a public dataset and a commercial dataset validate the effectiveness of GenSAR.

  • [RES] Heterogeneous User Modeling for LLM-based Recommendation
    by Honghui Bao (National University of Singapore), Wenjie Wang (University of Science and Technology of China), Xinyu Lin (National University of Singapore), Fengbin Zhu (National University of Singapore), Teng Sun (Shandong University), Fuli Feng (University of Science and Technology of China), Tat-Seng Chua (National University of Singapore)

    Leveraging Large Language Models (LLMs) for recommendation has demonstrated notable success in various domains, showcasing their potential for open-domain recommendation. A key challenge to advancing open-domain recommendation lies in effectively modeling user preferences within users’ heterogeneous behaviors across multiple domains. Existing approaches, including ID-based and semantic-based modeling, struggle with poor generalization, an inability to compress noisy interactions effectively, and the domain seesaw phenomenon. To address these challenges, we propose a Heterogeneous User Modeling (HUM) method, which incorporates a compression enhancer and a robustness enhancer for LLM-based recommendation. The compression enhancer uses a customized prompt to compress heterogeneous behaviors into a tailored token, while a masking mechanism enhances cross-domain knowledge extraction and understanding. The robustness enhancer introduces a domain importance score to mitigate the domain seesaw phenomenon by guiding domain optimization. Extensive experiments on heterogeneous datasets validate that HUM effectively models user heterogeneity by achieving both high efficacy and robustness, leading to superior performance in open-domain recommendation.

  • [RES] Hierarchical Graph Information Bottleneck for Multi-Behavior Recommendation
    by Hengyu Zhang (The Chinese University of Hong Kong), Chunxu Shen (WeChat, Tencent), Xiangguo Sun (The Chinese University of Hong Kong), Jie Tan (The Chinese University of Hong Kong), Yanchao Tan (Fuzhou University), Yu Rong (The Chinese University of Hong Kong), Hong Cheng (The Chinese University of Hong Kong), Lingling Yi (WeChat, Tencent)

    In real-world recommendation scenarios, users typically engage with platforms through multiple types of behavioral interactions. Multi-behavior recommendation algorithms aim to leverage various auxiliary user behaviors to enhance prediction for target behaviors of primary interest (e.g., buy), thereby overcoming performance limitations caused by data sparsity in target behavior records. Current state-of-the-art approaches typically employ hierarchical design following either cascading (e.g., view→cart→buy) or parallel (unified→behavior→specific components) paradigms, to capture behavioral relationships. However, these methods still face two critical challenges: (1) severe distribution disparities across behaviors, and (2) negative transfer effects caused by noise in auxiliary behaviors. In this paper, we propose a novel model-agnostic Hierarchical Graph Information Bottleneck (HGIB) framework for multi-behavior recommendation to effectively address these challenges. Following information bottleneck principles, our framework optimizes the learning of compact yet sufficient representations that preserve essential information for target behavior prediction while eliminating task-irrelevant redundancies. To further mitigate interaction noise, we introduce a Graph Refinement Encoder (GRE) that dynamically prunes redundant edges through learnable edge dropout mechanisms. We conduct comprehensive experiments on three real-world public datasets, which demonstrate the superior effectiveness of our framework. Beyond these widely used datasets in the academic community, we further expand our evaluation on several real industrial scenarios, showing again a significant improvement in multi-behavior recommendations.

  • [RES] How Do Users Perceive Recommender Systems’ Objectives?
    by Patrik Dokoupil (Faculty of Mathematics and Physics, Charles University), Ludovico Boratto (University of Cagliari), Ladislav Peska (Faculty of Mathematics and Physics, Charles University)

    Multi-objective recommender systems (MORS) aim to optimize multiple criteria while generating recommendations, such as relevance, novelty, diversity, or exploration. These algorithms are based on the assumption that an operationalization of these criteria (i.e., translating abstract goals into measurable metrics) will reflect how users perceive them. Nevertheless, such beliefs are rarely rigorously evaluated, which can lead to a mismatch between algorithmic goals and user satisfaction. Moreover, if users are allowed to control the RS via their propensities towards such objectives, the misconceptions may further impact users’ trust and engagement. To characterize this problem, we conduct a large user study focusing on recommender systems in two domains: books and movies. Part of the study focuses on how users perceive different recommendation objectives, which we compare with well-established metrics aiming at the same objectives. We found that although such metrics correlate to some extent with users’ perceptions, the mapping is far from perfect. Moreover, we also report on conceptual-level differences in users’ understanding of RS objectives and how this affects the results.

  • [RES] IP2: Entity-Guided Interest Probing for Personalized News Recommendation
    by Youlin Wu (Dalian University of Technology), Yuanyuan Sun (Dalian University of Technology), Xiaokun Zhang (City University of Hong Kong), Haoxi Zhan (Dalian University of Technology), Bo Xu (Dalian University of Technology), Liang Yang (Dalian University of Technology), Hongfei Lin (Dalian University of Technology)

    News recommender systems aim to deliver personalized news articles for users based on their reading history. A previous behavioral study suggested that screen-based news reading contains three successive steps: scanning, title reading, and then clicking. Following these steps, we find that intra-news entity interest dominates the scanning stage, while inter-news entity interest guides title reading and influences click decisions. Unfortunately, current methods overlook the unique utility of entities in news recommendation. To this end, we propose a novel method IP2 to probe entity-guided reading interest at both intra- and inter-news levels. At the intra-news level, a transformer-based entity encoder is devised to aggregate mentioned entities in the news title into one signature entity. Then, a signature entity-title contrastive pre-training is adopted to initialize entities with proper meanings in the news context, which at the same time lets us probe for intra-news entity interest. As for the inter-news level, a dual tower user encoder is presented to capture inter-news reading interest from both title meaning and entity sides. In addition, to highlight the contribution of inter-news entity guidance, a cross-tower attention link is adopted to calibrate title reading interest using inter-news entity interest, thus further aligning with real-world behavior. Extensive experiments on two real-world datasets demonstrate that our IP2 achieves state-of-the-art performance in news recommendation.

  • RESIntegrating Individual and Group Fairness for Recommender Systems through Social Choice
    by Amanda Aird (University of Colorado Boulder), Elena Štefancová (Comenius University Bratislava), Anas Buhayh (University of Colorado Boulder), Cassidy All (Department of Information Science; University of Colorado, Boulder), Martin Homola (Comenius University Bratislava), Nicholas Mattei (Tulane University), Robin Burke (University of Colorado, Boulder)

    Fairness in recommender systems is a complex concept, involving multiple definitions of fairness, different parties for whom fairness is sought, and various scopes over which fairness might be measured. Researchers have derived a variety of solutions, usually highly tailored to specific choices along each of these dimensions, and typically aimed at tackling a single fairness concern. However, in practical contexts, we find a multiplicity of fairness concerns within a given recommendation application. We explore a general solution to recommender system fairness using social choice methods to integrate multiple heterogeneous fairness definitions. In this paper, we extend group-fairness results from prior research to provider-side individual fairness, demonstrating in multiple datasets that both individual and group fairness objectives can be integrated and optimized jointly. We identify both synergies and tensions among different fairness objectives with individual fairness correlated with group fairness for some groups and anti-correlated with others.

  • RESLANCE: Exploration and Reflection for LLM-based Textual Attacks on News Recommender Systems
    by Yuyue Zhao (University of Science and Technology of China), Jin Huang (University of Cambridge), Shuchang Liu (Rutgers University), Jiancan Wu (University of Science and Technology of China), Xiang Wang (University of Science and Technology of China), Maarten de Rijke (University of Amsterdam)

    News recommender systems rely on rich textual information from news articles to generate user-specific recommendations. This reliance may expose these systems to potential vulnerabilities through textual attacks. To explore this vulnerability, we propose LANCE, a LArge language model-based News Content rEwriting framework, designed to influence news rankings and highlight the unintended promotion of manipulated news. LANCE consists of two key components: an explorer and a reflector. The explorer first generates rewritten news using diverse prompts, incorporating different writing styles, sentiments, and personas. We then collect these rewrites, evaluate their ranking impact within news recommender systems, and apply a filtering mechanism to retain effective rewrites. Next, the reflector fine-tunes an open-source LLM using the successful rewrites, enhancing its ability to generate more effective textual attacks. Experimental results demonstrate the effectiveness of LANCE in manipulating rankings within news recommender systems. Unlike attacks in other recommendation domains, negative and neutral rewrites consistently outperform positive ones, revealing a unique vulnerability specific to news recommendation. Once trained, LANCE successfully attacks unseen news recommender systems, highlighting its generalization ability and exposing shared vulnerabilities across different systems. Our work underscores the urgent need for research on textual attacks and paves the way for future studies on defense strategies.

  • RESLEAF: Lightweight, Efficient, Adaptive and Flexible Embedding for Large-Scale Recommendation Models
    by Chaoyi Jiang (University of Southern California), Abdulla Alshabanah (University of Southern California), Murali Annavaram (University of Southern California)

    Deep Learning Recommendation Models (DLRMs) are central to modeling user behavior, enhancing user experience, and boosting revenues for internet companies. DLRMs rely heavily on embedding tables, which scale to tens of terabytes as the number of users and features grows, presenting challenges in training and storage. These models typically require substantial GPU memory, as embedding operations are not compute-intensive but occupy significant storage. While some solutions have explored CPU storage, this approach still demands terabytes of memory. We introduce LEAF, a multi-level hashing framework that compresses the large embedding tables based on access frequency. In particular, LEAF leverages a streaming algorithm to estimate access distributions on the fly without relying on model gradients or requiring a priori knowledge of access distribution. By using multiple hash functions, LEAF minimizes collision rates of feature instances. Experiments show that LEAF outperforms state-of-the-art compression methods on Criteo Kaggle, Avazu, KDD12, and Criteo Terabyte datasets, with testing AUC improvements of 1.411%, 1.885%, 2.761%, and 1.243%, respectively.
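    The role of multiple hash functions in reducing collisions can be illustrated with a generic multi-hash embedding lookup. This is a minimal sketch and not LEAF itself: the class name `MultiHashEmbedding`, the universal-hash parameters, and the choice to sum the per-hash vectors are all assumptions made for illustration, and the frequency-aware, streaming part of the method is not modeled.

```python
import random


class MultiHashEmbedding:
    """Hypothetical compressed embedding: each feature id is hashed by
    several independent hash functions into small tables, and the
    selected rows are summed to form the final vector."""

    def __init__(self, num_rows, dim, num_hashes=2, seed=0):
        rng = random.Random(seed)
        self.num_rows = num_rows
        self.prime = 2_147_483_647  # large prime for (a * x + b) mod p hashing
        # One (a, b) pair per hash function.
        self.params = [
            (rng.randrange(1, self.prime), rng.randrange(self.prime))
            for _ in range(num_hashes)
        ]
        # One small table per hash function, randomly initialized.
        self.tables = [
            [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_rows)]
            for _ in range(num_hashes)
        ]

    def rows(self, feature_id):
        """Row index chosen by each hash function for this feature id."""
        return tuple(
            (a * feature_id + b) % self.prime % self.num_rows
            for a, b in self.params
        )

    def lookup(self, feature_id):
        """Sum the rows selected by every hash function."""
        vectors = [table[r] for table, r in zip(self.tables, self.rows(feature_id))]
        return [sum(values) for values in zip(*vectors)]


emb = MultiHashEmbedding(num_rows=100, dim=4, num_hashes=2)
vector = emb.lookup(12345)
```

    In this sketch two feature ids share an identical vector only if they collide under every hash function at once, which is much less likely than a collision under a single hash into a table of the same size.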

  • RESLLM-RecG: A Semantic Bias-Aware Framework for Zero-Shot Sequential Recommendation
    by Yunzhe Li (University of Illinois, Urbana-Champaign), Junting Wang (University of Illinois at Urbana-Champaign), Hari Sundaram (University of Illinois at Urbana-Champaign), Zhining Liu (University of Illinois at Urbana Champaign)

    Zero-shot cross-domain sequential recommendation (ZCDSR) enables predictions in unseen domains without additional training or fine-tuning, addressing the limitations of traditional models in sparse data environments. Recent advancements in large language models (LLMs) have significantly enhanced ZCDSR by facilitating cross-domain knowledge transfer through rich, pretrained representations. Despite this progress, domain semantic bias—arising from differences in vocabulary and content focus between domains—remains a persistent challenge, leading to misaligned item embeddings and reduced generalization across domains. To address this, we propose a novel semantic bias-aware framework that enhances LLM-based ZCDSR by improving cross-domain alignment at both the item and sequential levels. At the item level, we introduce a generalization loss that aligns the embeddings of items across domains (inter-domain compactness), while preserving the unique characteristics of each item within its own domain (intra-domain diversity). This ensures that item embeddings can be transferred effectively between domains without collapsing into overly generic or uniform representations. At the sequential level, we develop a method to transfer user behavioral patterns by clustering source domain user sequences and applying attention-based aggregation during target domain inference. We dynamically adapt user embeddings to unseen domains, enabling effective zero-shot recommendations without requiring target-domain interactions. Extensive experiments across multiple datasets and domains demonstrate that our framework significantly enhances the performance of sequential recommendation models on the ZCDSR task. By addressing domain bias and improving the transfer of sequential patterns, our method offers a scalable and robust solution for better knowledge transfer, enabling improved zero-shot recommendations across domains.

  • RESLONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
    by Zheng Chai (ByteDance), Qin Ren (ByteDance), Xijun Xiao (ByteDance), Huizhi Yang (ByteDance), Bo Han (ByteDance), Sijun Zhang (ByteDance), Di Chen (ByteDance), Hui Lu (ByteDance), Wenlin Zhao (ByteDance), Lele Yu (ByteDance), Xionghang Xie (ByteDance), Shiru Ren (ByteDance), Xiang Sun (ByteDance), Yaocheng Tan (ByteDance), Peng Xu (ByteDance), Yuchao Zheng (ByteDance), Di Wu (ByteDance)

    Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism for stabilizing attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and hybrid attention strategy to reduce quadratic complexity, and (iii) a series of engineering optimizations, including training with mixed-precision and activation recomputation, KV cache serving, and the fully synchronous model training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER consistently outperforms strong baselines in both offline metrics and online A/B testing in both advertising and e-commerce services at ByteDance, validating its consistent effectiveness and industrial-level scaling laws. Currently, LONGER has been fully deployed in more than 10 influential scenarios at ByteDance, serving billion users.

  • RESLasso: Large Language Model-based User Simulator for Cross-Domain Recommendation
    by Yue Chen (College of Computer Science, Sichuan University), Susen Yang (Kuaishou Technology), Tong Zhang (College of Computer Science, Sichuan University), Chao Wang (Kuaishou Technology), Mingyue Cheng (State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China), Chenyi Lei (Kuaishou Technology), Han Li (Kuaishou Technology)

    Cross-Domain Recommendation (CDR) aims to mitigate the cold-start problem in target domains by leveraging user interactions from source domains. However, existing CDR methods often suffer from low data efficiency, as they require a substantial number of historical interactions from overlapping users for training, which is impractical in real-world scenarios. To address this challenge, we propose Lasso, a novel framework that leverages the large language model (LLM) as a user simulator to capture cross-domain user preferences based on the remarkable internal knowledge of the LLM. Specifically, we introduce a cross-domain training paradigm to fine-tune the LLM-based simulator, enabling it to simulate user behaviors in the target domain using historical interactions from the source domain. Furthermore, to enhance the efficiency and accuracy of Lasso, we propose two effective modules: Personalized Candidate Pool (PCP) and Confidence-Guided Inference (CGI). The PCP module employs cross-domain collaborative filtering to construct a tailored set of candidate items for simulating interactions of each cold-start user in the target domain, thereby improving the inference efficiency of the LLM. The CGI module utilizes confidence scores from the LLM to reduce noise in the simulated data, ensuring more accurate estimations. During the application phase, the simulated interactions serve as additional inputs for downstream recommendation models, effectively alleviating cold-start problems for users. Extensive experiments on public benchmark datasets and a real-world industrial dataset demonstrate that Lasso achieves superior accuracy while requiring fewer historical interactions from overlapping users.

  • RESLeave No One Behind: Fairness-Aware Cross-Domain Recommender Systems for Non-Overlapping Users
    by Weixin Chen (Hong Kong Baptist University), Yuhan Zhao (Hong Kong Baptist University), Li Chen (Hong Kong Baptist University), Weike Pan (Shenzhen University)

    Cross-domain recommendation (CDR) methods predominantly leverage overlapping users to transfer knowledge from a source domain to a target domain. However, through empirical studies, we uncover a critical bias inherent in these approaches: while overlapping users experience significant enhancements in recommendation quality, non-overlapping users benefit minimally and even face performance degradation. This unfairness may erode user trust, and, consequently, negatively impact business engagement and revenue. To address this issue, we propose a novel solution that generates virtual source-domain users for non-overlapping target-domain users. Our method utilizes a dual attention mechanism to discern similarities between overlapping and non-overlapping users, thereby synthesizing realistic virtual user embeddings. We further introduce a limiter component that ensures the generated virtual users align with real-data distributions while preserving each user’s unique characteristics. Notably, our method is model-agnostic and can be seamlessly integrated into any CDR model. Comprehensive experiments conducted on three public datasets with five CDR baselines demonstrate that our method effectively mitigates the CDR non-overlapping user bias, without loss of overall accuracy.

  • RESMDSBR: Multimodal Denoising for Session-based Recommendation
    by Yutong Li (University College London), Xinyi Zhang (Imperial College London)

    Multimodal session-based recommendation (SBR) has emerged as a promising direction for capturing user intent using visual and textual item content. However, existing methods often overlook a fundamental issue: the modality features extracted from pre-trained models (e.g., BERT, CLIP) are inherently noisy and misaligned with user-specific preferences. This noise arises from label errors, task mismatch, and over-inclusion of irrelevant content, ultimately degrading recommendation quality. In this work, we propose a diffusion-based denoising framework that explicitly refines noisy pre-trained representations without full fine-tuning. By progressively removing noise through a structured denoising process, our Multimodal Denoising Diffusion Layer enhances task-specific semantics. Furthermore, we introduce two auxiliary modules: an Interest-Guided Denoising Layer that filters modality features using session context, and a Multimodal Alignment Layer that enforces cross-modal coherence. Extensive experiments on real-world datasets demonstrate that our model significantly outperforms state-of-the-art methods while maintaining practical training efficiency.

  • RESMapping Stakeholder Needs to Multi-Sided Fairness in Candidate Recommendation for Algorithmic Hiring
    by Mesut Kaya (Jobindex A/S), Toine Bogers (IT University of Copenhagen)

    Already before the enactment of the EU AI Act, candidate or job recommendation for algorithmic hiring (semi-automatically matching CVs to job postings) was used as an example of a high-risk application where unfair treatment could result in serious harms to job seekers. Recommending candidates to jobs or jobs to candidates, however, is also a fitting example of a multi-stakeholder recommendation problem. In such multi-stakeholder systems, the end user is not the only party whose interests should be considered when generating recommendations. In addition to job seekers, other parties, such as recruiters, organizations behind the job postings, and the recruitment agency itself, are also stakeholders and deserve to have their perspectives included in the design of relevant fairness metrics. Nevertheless, past analyses of fairness in algorithmic hiring have been restricted to single-side fairness, ignoring the perspectives of the other stakeholders. In this paper, we address this gap and present a multi-stakeholder approach to fairness in a candidate recommender system that recommends relevant candidate CVs to human recruiters in a human-in-the-loop algorithmic hiring scenario. We conducted semi-structured interviews with 40 different stakeholders (job seekers, companies, recruiters, and other job portal employees). We used these interviews to explore their lived experiences of unfairness in hiring and to co-design definitions of fairness as well as metrics that might capture these experiences. Finally, we attempt to reconcile and map these different (and sometimes conflicting) perspectives and definitions to existing (categories of) fairness metrics that are relevant for our candidate recommendation scenario.

  • RESMeasuring Interaction-Level Unlearning Difficulty for Collaborative Filtering
    by Haocheng Dou (Taiyuan University of Technology), Tao Lian (Taiyuan University of Technology), Xin Xin (Shandong University)

    The growing emphasis on data privacy and user controllability mandates that recommendation models support the removal of specified data, known as recommendation unlearning (RU). Although model retraining is often regarded as the gold standard for machine unlearning, it is inadequate to attain complete unlearning in collaborative filtering recommendation due to interdependency between user-item interactions. To this end, we introduce the concept of interaction-level unlearning difficulty, which serves as a foresighted indicator of the unlearning incompleteness or actual unlearning effectiveness after forgetting each interaction. Through extensive experiments with retraining and model-agnostic unlearning methods, we identify two interpretable data characteristics that can serve as useful unlearning difficulty indicators: Embedding Entanglement Index (EEI) and Subgraph Average Degree (AD). They have a strong correlation with existing membership inference metrics focusing on data removal as well as our proposed unlearning effectiveness metrics from the recommendation perspective—Score Shift, UnlearnMRR, and UnlearnRecall. In addition, we investigate the efficacy of an unlearning enhancement technique named Extra Deletion in handling unlearning requests of different difficulty levels. The results show that more related interactions need to be extra deleted to achieve acceptable unlearning effectiveness for difficult unlearning requests, while fewer or no extra deletions are needed for easier-to-forget requests. This study provides a novel perspective for advancing the development of more tailored RU methods.

  • RESMoRE: A Mixture of Reflectors Framework for Large Language Model-Based Sequential Recommendation
    by Weicong Qin (Gaoling School of Artificial Intelligence, Renmin University of China), Yi Xu (Gaoling School of Artificial Intelligence, Renmin University of China), Weijie Yu (School of Information Technology and Management, University of International Business and Economics), Chenglei Shen (Gaoling School of Artificial Intelligence, Renmin University of China), Xiao Zhang (Gaoling School of Artificial Intelligence, Renmin University of China), Ming He (AI Lab at Lenovo Research, Lenovo Group Limited), Jianping Fan (AI Lab at Lenovo Research, Lenovo Group Limited), Jun Xu (Gaoling School of Artificial Intelligence, Renmin University of China)

    Large language models (LLMs) have emerged as a cutting-edge approach in sequential recommendation, leveraging historical interactions to model dynamic user preferences. Current methods mainly focus on learning processed recommendation data in the form of sequence-to-sequence text. While effective, they exhibit three key limitations: 1) failing to decouple intra-user explicit features (e.g., product titles) from implicit behavioral patterns (e.g., brand loyalty) within interaction histories; 2) underutilizing cross-user collaborative filtering (CF) signals; and 3) relying on inefficient reflection update strategies. We propose MoRE (Mixture of REflectors), which introduces three perspective-aware offline reflection processes to address these gaps. This decomposition directly resolves Challenges 1 (explicit/implicit ambiguity) and 2 (CF underutilization). Furthermore, MoRE’s meta-reflector employs a self-improving strategy and a dynamic selection mechanism (Challenge 3) to adapt to evolving user preferences. First, two intra-user reflectors decouple explicit and implicit patterns from a user’s interaction sequence, mimicking traditional recommender systems’ ability to distinguish surface-level and latent preferences. A third cross-user reflector captures CF signals by analyzing user similarity patterns from multiple users’ interactions. To optimize reflection quality, MoRE’s meta-reflector employs an offline self-improving strategy that evaluates reflection impacts through comparisons of presence/absence and iterative refinement of old/new versions, with an online contextual bandit mechanism dynamically selecting the optimal perspective for recommendation for each user. Experiments on three benchmarks show MoRE outperforms both traditional recommenders and LLM-based methods with minimal computational overhead, validating its effectiveness in bridging LLMs’ semantic understanding with multidimensional recommendation principles.

  • RESModeling Long-term User Behaviors with Diffusion-driven Multi-interest Network for CTR Prediction
    by Weijiang Lai (Institute of Software, Chinese Academy of Sciences), Beihong Jin (Institute of Software, Chinese Academy of Sciences), Yapeng Zhang (Meituan), Yiyuan Zheng (Institute of Software, Chinese Academy of Sciences), Rui Zhao (Institute of Software, Chinese Academy of Sciences), Jian Dong (Meituan), Jun Lei (Meituan), Xingxing Wang (Meituan)

    CTR (Click-Through Rate) prediction, crucial for recommender systems and online advertising, etc., has been confirmed to benefit from modeling long-term user behaviors. Nonetheless, the vast number of behaviors and complexity of noise interference pose challenges to prediction efficiency and effectiveness. Recent solutions have evolved from single-stage models to two-stage models. However, current two-stage models often filter out significant information, resulting in an inability to capture diverse user interests and build the complete latent space of user interests. Inspired by multi-interest and generative modeling, we propose DiffuMIN (Diffusion-driven Multi-Interest Network) to model long-term user behaviors and thoroughly explore the user interest space. Specifically, we propose a target-oriented multi-interest extraction method that begins by orthogonally decomposing the target to obtain interest channels. This is followed by modeling the relationships between interest channels and user behaviors to disentangle and extract multiple user interests. We then introduce a diffusion module guided by contextual interests and interest channels, which anchor users’ personalized and target-oriented interest types, enabling the generation of augmented interests that align with the latent spaces of user interests, thereby further exploring restricted interest space. Finally, we leverage contrastive learning to ensure that the generated augmented interests align with users’ genuine preferences. Extensive offline experiments are conducted on two public datasets and one industrial dataset, yielding results that demonstrate the superiority of DiffuMIN. Moreover, DiffuMIN increased CTR by 1.52% and CPM by 1.10% in online A/B testing.

  • RESMulti-Granularity Distribution Modeling for Video Watch Time Prediction via Exponential-Gaussian Mixture Network
    by Xu Zhao (Xiaohongshu), Ruibo Ma (Xiaohongshu), Jiaqi Chen (Xiaohongshu), Weiqi Zhao (Xiaohongshu), Ping Yang (Xiaohongshu), Yao Hu (Xiaohongshu)

    Accurate watch time prediction is crucial for enhancing user engagement in streaming short-video platforms, although it is challenged by complex distribution characteristics across multi-granularity levels. Through systematic analysis of real-world industrial data, we uncover two critical challenges in watch time prediction from a distribution aspect: (1) coarse-grained skewness induced by a significant concentration of quick-skips, (2) fine-grained diversity arising from various user-video interaction patterns. Consequently, we assume that the watch time follows the Exponential-Gaussian Mixture (EGM) distribution, where the exponential and Gaussian components respectively characterize the skewness and diversity. Accordingly, an Exponential-Gaussian Mixture Network (EGMN) is proposed for the parameterization of EGM distribution, which consists of two key modules: a hidden representation encoder and a mixture parameter generator. We conduct extensive offline experiments and online A/B tests on our industrial short-video platform to validate the superiority of EGMN compared with existing state-of-the-art methods. Remarkably, comprehensive experimental results have proven that EGMN exhibits excellent distribution fitting ability across coarse-to-fine-grained levels.
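    As a rough illustration of what an Exponential-Gaussian mixture density looks like, the sketch below evaluates one at a given watch time. The abstract does not state the exact parameterization, so the form used here (one exponential component plus any number of Gaussian components, with mixture weights summing to one) and the function name `egm_pdf` are assumptions; in the paper these parameters would be produced by the network rather than passed in by hand.

```python
import math


def egm_pdf(t, weights, rate, means, stds):
    """Density of a hypothetical Exponential-Gaussian mixture at time t.

    weights[0] belongs to the exponential component (quick skips);
    weights[1:] belong to the Gaussian components, paired with means/stds.
    The weights are assumed to be non-negative and to sum to 1.
    """
    # Exponential component has support on t >= 0 only.
    exp_term = rate * math.exp(-rate * t) if t >= 0 else 0.0
    density = weights[0] * exp_term
    # Gaussian components capture the remaining viewing patterns.
    for w, mu, sigma in zip(weights[1:], means, stds):
        z = (t - mu) / sigma
        density += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density


# Example: 60% quick skips, plus two viewing modes around 15s and 45s.
p = egm_pdf(10.0, weights=[0.6, 0.3, 0.1], rate=0.5, means=[15.0, 45.0], stds=[5.0, 10.0])
```

    With a density of this kind, a model could be trained by minimizing the negative log-likelihood of observed watch times; whether EGMN uses exactly that objective is not stated in the abstract.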

  • RESNLGCL: Naturally Existing Neighbor Layers Graph Contrastive Learning for Recommendation
    by Jinfeng Xu (The University of Hong Kong), Zheyu Chen (The Hong Kong Polytechnic University), Shuo Yang (The University of Hong Kong), Jinze Li (The University of Hong Kong), Hewei Wang (Carnegie Mellon University), Wei Wang (Shenzhen MSU-BIT University), Xiping Hu (Beijing Institute of Technology), Edith Ngai (The University of Hong Kong)

    Graph Neural Networks (GNNs) are widely used in collaborative filtering to capture high-order user-item relationships. To address the data sparsity problem in recommendation systems, Graph Contrastive Learning (GCL) has emerged as a promising paradigm that maximizes mutual information between contrastive views. However, existing GCL methods rely on augmentation techniques that introduce semantically irrelevant noise and incur significant computational and storage costs, limiting effectiveness and efficiency. To overcome these challenges, we propose NLGCL, a novel contrastive learning framework that leverages naturally contrastive views between neighbor layers within GNNs. By treating each node and its neighbors in the next layer as positive pairs, and other nodes as negatives, NLGCL avoids augmentation-based noise while preserving semantic relevance. This paradigm eliminates costly view construction and storage, making it computationally efficient and practical for real-world scenarios. Extensive experiments on four public datasets demonstrate that NLGCL outperforms state-of-the-art baselines in effectiveness and efficiency.

  • RESNon-parametric Graph Convolution for Re-ranking in Recommendation Systems
    by Zhongyu Ouyang (Dartmouth College), Mingxuan Ju (Snap Inc.), Soroush Vosoughi (Dartmouth College), Yanfang Ye (University of Notre Dame)

    Graph knowledge has been proven effective in enhancing item rankings in recommender systems (RecSys), particularly during the retrieval stage. However, its application in the ranking stage, where richer contextual information (e.g., user, item, and interaction features) is available, remains underexplored. A major challenge lies in the substantial computational cost associated with repeatedly retrieving neighborhood information from billions of items stored in distributed systems. This resource-intensive requirement makes it difficult to scale graph-based methods during model training, and apply them in practical RecSys. To bridge this gap, we first demonstrate that incorporating graphs in the ranking stage improves ranking qualities. Notably, while the improvement is evident, we show that the substantial computational overheads entailed by graphs are prohibitively expensive for real-world recommendations. In light of this, we propose a non-parametric strategy that utilizes graph convolution for re-ranking only during test time. Our strategy circumvents the notorious computational overheads from graph convolution during training, and utilizes structural knowledge hidden in graphs on-the-fly during testing. It can be used as a plug-and-play module and easily employed to enhance the ranking ability of various ranking layers of a real-world RecSys with significantly reduced computational overhead. Through comprehensive experiments across four benchmark datasets with varying levels of sparsity, we demonstrate that our strategy yields noticeable improvements (i.e., 8.1% on average) during testing time with little to no additional computational overheads (i.e., 0.5% on average).

  • RESOff-Policy Evaluation and Learning for Matching Markets
    by Yudai Hayashi (Wantedly, Inc.), Shuhei Goda (Independent Researcher), Yuta Saito (Cornell University)

    Matching users based on mutual preferences is a fundamental aspect of services driven by reciprocal recommendations, such as job search and dating applications. Although A/B testing remains the gold standard for evaluating new policies in recommender systems for matching markets, it is costly and impractical for frequent policy updates. Off-Policy Evaluation (OPE) thus plays a crucial role by enabling the evaluation of recommendation policies using only offline logged data naturally collected on the platform. However, unlike conventional recommendation settings, the bidirectional nature of user interactions in matching platforms introduces complex biases and exacerbates reward sparsity, making standard OPE methods unreliable. To address these challenges and facilitate effective offline evaluation, we propose novel OPE estimators, DiPS and DPR, specifically designed for matching markets. Our methods combine elements of the Direct Method (DM), Inverse Propensity Score (IPS), and Doubly Robust (DR) estimators while incorporating intermediate labels, such as initial engagement signals, to achieve better bias-variance control, particularly in sparse-reward environments. Theoretically, we derive the bias and variance of the proposed estimators and demonstrate their advantages over conventional methods. Furthermore, we show that these estimators can be seamlessly extended to offline policy learning methods that improve recommendation policies so that they produce more matches. We empirically evaluate our methods through experiments on both synthetic data and real-world A/B testing logs from the job-matching platform Wantedly Visit. The empirical results highlight the superiority of our approach over existing methods in both off-policy evaluation and policy learning tasks, particularly when the match labels are sparse, where existing methods tend to collapse.
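    For readers unfamiliar with the three conventional estimators named above, the following sketch shows their standard textbook forms on logged bandit data. It is not an implementation of DiPS or DPR, whose definitions are not given in the abstract; the function names and the list-based input layout are assumptions made for illustration.

```python
from statistics import fmean


def dm_estimate(q_target):
    """Direct Method: average the reward model's prediction under the
    target policy, one value per logged context."""
    return fmean(q_target)


def ips_estimate(rewards, target_probs, logging_probs):
    """Inverse Propensity Score: reweight each logged reward by the ratio
    of target-policy to logging-policy probability of the logged action."""
    return fmean(
        r * pt / pl for r, pt, pl in zip(rewards, target_probs, logging_probs)
    )


def dr_estimate(rewards, target_probs, logging_probs, q_logged, q_target):
    """Doubly Robust: DM baseline plus an importance-weighted correction
    on the reward model's residual for the logged action."""
    return fmean(
        qt + (pt / pl) * (r - ql)
        for r, pt, pl, ql, qt in zip(
            rewards, target_probs, logging_probs, q_logged, q_target
        )
    )


# Toy log: four impressions, binary match reward.
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5, 0.5, 0.5, 0.5]   # probability the logging policy gave the logged action
target_probs = [0.8, 0.2, 0.8, 0.2]    # probability the new policy gives the same action
ips_value = ips_estimate(rewards, target_probs, logging_probs)
```

    IPS is unbiased when the logging probabilities are known but has high variance when rewards are sparse, while DM has low variance but inherits the reward model's bias; DR combines the two, which is the trade-off the abstract refers to as bias-variance control.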

  • RESOff-Policy Evaluation of Candidate Generators in Two-Stage Recommender Systems
    by Peiyao Wang (Amazon.com), Zhan Shi (Amazon.com), Amina Shabbeer (Amazon.com), Ben London (Amazon.com)

    We study offline evaluation of two-stage recommender systems, focusing on the first stage, candidate generation. Traditionally, candidate generators have been evaluated in terms of standard information retrieval metrics, using curated or heuristically labeled data, which does not always reflect their true impact to user experience or business metrics. We instead take a holistic view, measuring their effectiveness with respect to the downstream recommendation task, using data logged from past user interactions with the system. Using the contextual bandit formalism, we frame this evaluation task as off-policy evaluation (OPE) with a new action set induced by a new candidate generator. To the best of our knowledge, ours is the first study to examine evaluation of candidate generators through the lens of OPE. We propose two importance-weighting methods to measure the impact of a new candidate generator using data collected from a downstream task. We analyze the asymptotic properties of these methods and derive expressions for their respective biases and variances. This analysis illuminates a procedure to optimize the estimators so as to reduce bias. Finally, we present empirical results that demonstrate the estimators’ efficacy on synthetic and benchmark data. We find that our proposed methods achieve lower bias with comparable or reduced variance relative to baseline approaches that do not account for the new action set.

  • RESOn the Reliability of Sampling Strategies in Offline Recommender Evaluation
    by Bruno Pereira (Universidade Federal de Minas Gerais), Alan Said (University of Gothenburg), Rodrygo Santos (Universidade Federal de Minas Gerais)

    Offline evaluation plays a central role in benchmarking recommender systems when online testing is impractical or risky. However, it is susceptible to two key sources of bias: exposure bias, where users only interact with items they are shown, and sampling bias, introduced when evaluation is performed on a subset of logged items rather than the full catalog. While prior work has proposed methods to mitigate sampling bias, these are typically assessed on fixed logged datasets rather than for their ability to support reliable model comparisons under varying exposure conditions or relative to true user preferences. In this paper, we investigate how different combinations of logging and sampling choices affect the reliability of offline evaluation. Using a fully observed dataset as ground truth, we systematically simulate diverse exposure biases and assess the reliability of common sampling strategies along four dimensions: discriminative power (recommender model separability), fidelity (agreement with full evaluation), robustness (stability under exposure bias), and predictive power (alignment with ground truth). Our findings highlight when and how sampling distorts evaluation outcomes and offer practical guidance for selecting strategies that yield faithful and robust offline comparisons.

  • RESParagon: Parameter Generation for Controllable Multi-Task Recommendation
    by Chenglei Shen (Renmin University of China), Jiahao Zhao (Renmin University of China), Xiao Zhang (Renmin University of China), Weijie Yu (University of International Business and Economics), Ming He (AI Lab at Lenovo Research), Jianping Fan (AI Lab at Lenovo Research)

    Commercial recommender systems face the challenge that task requirements from platforms or users often change dynamically (e.g., varying preferences for accuracy or diversity). Ideally, the model should be re-trained after resetting a new objective function, adapting to these changes in task requirements. However, in practice, the high computational costs associated with retraining make this process impractical for models already deployed to online environments. This raises a new challenging problem: how to efficiently adapt the learned model to different task requirements by controlling the model parameters after deployment, without the need for retraining. To address this issue, we propose a novel controllable learning approach via parameter generation for controllable multi-task recommendation (Paragon), which allows the customization and adaptation of recommendation model parameters to new task requirements without retraining. Specifically, we first obtain the optimized model parameters through adapter tuning based on the feasible task requirements. Then, we utilize the generative model as a parameter generator, employing classifier-free guidance in conditional training to learn the distribution of optimized model parameters under various task requirements. Finally, the parameter generator is applied to effectively generate model parameters in a test-time adaptation manner given task requirements. Moreover, Paragon seamlessly integrates with various existing recommendation models to enhance their controllability. Extensive experiments indicate that Paragon can effectively enhance controllability for recommendation through efficient model parameter generation.

  • RESPinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform
    by Xiangyi Chen (Pinterest), Kousik Rajesh (Pinterest), Matthew Lawhon (Pinterest), Zelun Wang (Pinterest), Hanyu Li (Pinterest), Haomiao Li (Pinterest), Saurabh Vishwas Joshi (Pinterest), Pong Eksombatchai (Pinterest), Jaewon Yang (Pinterest), Yi-Ping Hsu (Pinterest), Jiajing Xu (Pinterest), Charles Rosenberg (Pinterest)

    User activity sequences have emerged as one of the most important signals in recommender systems. We present a foundational model, PinFM, for understanding user activity sequences across multiple applications at a billion-scale visual discovery platform. We pretrain a transformer model with 20B+ parameters using extensive user activity data, then fine-tune it for specific applications, efficiently coupling it with existing models. While this pretraining-and-fine-tuning approach has been popular in other domains, such as Vision and NLP, its application in industrial recommender systems presents numerous challenges. The foundational model must be scalable enough to score millions of items every second while meeting tight cost and latency constraints imposed by these systems. Additionally, it should capture the interactions between user activities and other features and handle new items that were not present during the pretraining stage. We developed innovative techniques to address these challenges. Our infrastructure and algorithmic optimizations, such as the Deduplicated Cross-Attention Transformer (DCAT), improved our throughput by 600%. We demonstrate that PinFM can learn interactions between user sequences and candidate items by altering input sequences, leading to a 20% increase in engagement with new items. PinFM is now deployed to help improve the experience of more than half a billion users across various applications.

  • RESPrivacy-Preserving Social Recommendation: Privacy Leakage and Countermeasure
    by Yuyue Chen (Harbin Institute of Technology, Shenzhen), Peng Yang (The University of Hong Kong), Zoe Lin Jiang (Harbin Institute of Technology, Shenzhen), Wenhao Wu (Harbin Institute of Technology, Shenzhen), Junbin Fang (Jinan University), Xuan Wang (Harbin Institute of Technology, Shenzhen), Chuanyi Liu (Harbin Institute of Technology, Shenzhen)

    Social recommendation systems generally utilize two types of data: user-item interaction matrices (R) from a rating platform (P0), and user-user social graphs (S) from a social platform (P1). Because user privacy requires that neither R nor S be directly shared, Chen et al. introduced the Secure Social Recommendation (SeSoRec) framework with the Secret Sharing-based Matrix Multiplication (SSMM) protocol. However, we find that the leakage of intermediate information introduced by SSMM will eventually lead to the leakage of S to P0, which challenges the privacy guarantees of SeSoRec.

    This work firstly identifies that the claimed “innocuous” leakage in SeSoRec originates from reusing the same One-Time Pad key during two randomization phases in SSMM, with formal proof that SSMM violates semi-honest security. Secondly, this work proposes the Two-Time Pad Attack with two reconstruction algorithms to evaluate the severity of the leakage. The Two-Time Pad Attack can extract the column-wise sum of matrices and , and the row-wise difference of matrices and , where such matrices are closely related to R or S. The Sparse Matrix Reconstruction (SMR) algorithm can achieve 99.35%, 83.83%, and 77.14% reconstruction rates for non-zero entries in S on FilmTrust, Epinions, and Douban datasets, respectively. The Grayscale Image Reconstruction (GIR) algorithm can successfully recover MNIST image contours. Thirdly, when the number of columns/rows of the input matrix A/B in SSMM is odd (requiring zero-padding to an even dimension), this work proposes the Zero-Padding Attack which can directly expose the last column/row of A/B. Finally, this work proposes the Privacy-Preserving Matrix Multiplication (PPMM) protocol with experimental demonstration as a replacement for SSMM, which eliminates such leakage while maintaining efficiency.

  • RESPrompt-to-Slate: Diffusion Models for Prompt-Conditioned Slate Generation
    by Federico Tomasi (Spotify), Francesco Fabbri (Spotify), Justin Carter (Spotify), Elias Kalomiris (Spotify), Mounia Lalmas (Spotify), Zhenwen Dai (Spotify)

    Slate generation is a common task in streaming and e-commerce platforms, where multiple items are presented together as a list or “slate”. Traditional systems focus mostly on item-level ranking and often fail to capture the coherence of the slate as a whole. A key challenge lies in the combinatorial nature of selecting multiple items jointly. To manage this, conventional approaches often assume users interact with only one item at a time, an assumption that breaks down when items are meant to be consumed together. In this paper, we introduce DMSG, a generative framework based on diffusion models for prompt-conditioned slate generation. DMSG learns high-dimensional structural patterns and generates coherent, diverse slates directly from natural language prompts. Unlike retrieval-based or autoregressive models, DMSG models the joint distribution over slates, enabling greater flexibility and diversity. We evaluate DMSG in two key domains: music playlist generation and e-commerce bundle creation. In both cases, DMSG produces high-quality slates from textual prompts without explicit personalization signals. Offline and online results show that DMSG outperforms strong baselines in both relevance and diversity, offering a scalable, low-latency solution for prompt-driven recommendation. A live A/B test on a production playlist system further demonstrates increased user engagement and content diversity.

  • RESRecPS: Privacy Risk Scoring for Recommender Systems
    by Jiajie He (University of Maryland, Baltimore County), Yuechun Gu (University of Maryland, Baltimore County), Keke Chen (University of Maryland, Baltimore County)

    Recommender systems (RecSys) have become an essential component of many web applications. The core of the system is a recommendation model trained on highly sensitive user-item interaction data. While privacy-enhancing techniques are actively studied in the research community, the real-world model development still depends on minimum privacy protection, e.g., via controlled access. Users of such systems should have the right to choose not to share highly sensitive interactions. However, there is no method allowing the user to know which interactions are more sensitive than others. Thus, quantifying the privacy risk of RecSys training data is a critical step to enabling privacy-aware RecSys model development and deployment. We propose a membership-inference-attack (MIA) based privacy scoring method, RecPS, to measure privacy risks at the interaction and the user levels. The RecPS interaction-level score definition is motivated and derived from differential privacy, which is then extended to the user-level scoring method. A critical component is the interaction-level MIA method RecLiRA, which gives high-quality membership estimation. We have conducted extensive experiments on well-known benchmark datasets and RecSys models to show the unique features and benefits of RecPS scoring in risk assessment and RecSys model unlearning.

  • RESRecommendation and Temptation
    by Md Sanzeed Anwar (University of Michigan), Paramveer Dhillon (University of Michigan), Grant Schoenebeck (University of Michigan)

    Traditional recommender systems relying on revealed preferences often fail to capture users’ dual-self nature, where consumption choices are driven by both long-term benefits (enrichment) and desire for instant gratification (temptation). Consequently, these systems may generate recommendations that fail to provide long-lasting satisfaction to users. To address this issue, we propose a reimagination of recommender design paradigms. We begin by introducing a novel user model that accounts for dual-self behaviors and the existence of off-platform options. We then propose a novel recommendation objective aligned with long-lasting user satisfaction, and develop the optimal recommendation strategy for this objective. Finally, we present an estimation framework that makes minimal assumptions and leverages the distinction between explicit user feedback and implicit choice data to implement this strategy in practice. We evaluate our approach through both synthetic simulations and simulations based on real-world data from the MovieLens dataset. Results demonstrate that our proposed recommender can deliver superior enrichment compared to several competitive baseline algorithms that operate under the revealed preferences assumption and do not account for dual-self behaviors. Our work opens the door to more nuanced and user-centric recommender design, with significant implications for the development of responsible AI systems.

  • RESR⁴ec: A Reasoning, Reflection, and Refinement Framework for Recommendation Systems
    by Hao Gu (Institute of Automation, Chinese Academy of Sciences), Rui Zhong (Kuaishou Technology), Yu Xia (University of Chinese Academy of Sciences), Wei Yang (Kuaishou Technology), Chi Lu (Kuaishou Technology), Peng Jiang (Kuaishou Technology), Kun Gai (Kuaishou Technology)

    Harnessing Large Language Models (LLMs) for recommendation systems has emerged as a prominent avenue, drawing substantial research interest. However, existing approaches primarily involve basic prompt techniques for knowledge acquisition, which resemble System-1 thinking. This makes these methods highly sensitive to errors in the reasoning path, where even a small mistake can lead to an incorrect inference. To this end, in this paper, we propose R⁴ec, a reasoning, reflection and refinement framework that evolves the recommendation system into a weak System-2 model. Specifically, we introduce two models: an actor model that engages in reasoning, and a reflection model that judges these responses and provides valuable feedback. Then the actor model will refine its response based on the feedback, ultimately leading to improved responses. We employ an iterative reflection and refinement process, enabling LLMs to facilitate slow and deliberate System-2-like thinking. Ultimately, the final refined knowledge will be incorporated into a recommendation backbone for prediction. We conduct extensive experiments on Amazon-Book and MovieLens-1M datasets to demonstrate the superiority of R⁴ec. We also deploy R⁴ec on a large-scale online advertising platform, showing a 2.2% increase in revenue. Furthermore, we investigate the scaling properties of the actor model and reflection model.

  • RESScalable Data Debugging for Neighborhood-based Recommendation with Data Shapley Values
    by Barrie Kersbergen (Bol & University of Amsterdam), Olivier Sprangers (Nixtla), Bojan Karlaš (Harvard University), Maarten de Rijke (University of Amsterdam), Sebastian Schelter (BIFOLD & TU Berlin)

    Machine learning-powered recommendation systems help users find items they like. Issues in the interaction data processed by these systems frequently lead to problems, e.g., to the accidental recommendation of low-quality products or dangerous items. Such data issues are hard to anticipate upfront, and are typically detected post-deployment after they have already impacted the user experience. We argue that a principled data debugging process is required during which human experts identify potentially hurtful data issues and preemptively mitigate them. Recent notions of “data importance”, such as the Data Shapley value (DSV), represent a promising direction to identify training data points likely to cause issues. However, the scale of real-world interaction datasets makes it infeasible to apply existing techniques to compute the DSV in recommendation scenarios. We tackle this problem by introducing the KMC-Shapley algorithm for the scalable estimation of Data Shapley values in neighborhood-based recommendation on sparse interaction data. We conduct an experimental evaluation of the efficiency and scalability of our algorithm on both public and proprietary datasets with millions of interactions, and showcase that the DSV identifies impactful data points for two recommendation tasks in e-commerce. Furthermore, we discuss applications of the DSV on real-world click and purchase data in e-commerce from CompanyX, such as identifying dangerous and low-quality products as well as improving the ecological sustainability of product recommendations.

  • RESTag-augmented Dual-target Cross-domain Recommendation
    by Mingfan Pan (University of Science and Technology of China), Qingyang Mao (University of Science and Technology of China), Xu An (University of Science and Technology of China), Jianhui Ma (University of Science and Technology of China), Gang Zhou (Information Engineering University), Mingyue Cheng (State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China), Enhong Chen (University of Science and Technology of China)

    Cross-domain recommendation (CDR) has been proposed to alleviate the data sparsity issue in recommendation systems and has garnered substantial research interest. In recent years, dual-target CDR has been an increasingly prevalent research topic that emphasizes simultaneous enhancement in both the source and target domains. Many existing approaches rely on overlapping users as bridges between domains, yet in real-world scenarios, the number of such users is often severely limited, restricting their practical applicability. To overcome this limitation, alternative methods for cross-domain connections are needed, and item tags serve as a promising solution. However, real-world tags suffer from severe deficiencies in terms of both quantity and diversity, and existing studies have not fully exploited their potential. In this paper, we introduce Tag-augmented Dual-target Cross-domain Recommendation (TA-DTCDR), which is the first to apply LLM-distilled tag information to CDR. TA-DTCDR utilizes item tags distilled by large language models (LLMs) as an additional channel to facilitate information transfer, thereby mitigating performance decline caused by the lack of overlapping users. Furthermore, to fully leverage the natural language information carried by the distilled tags, we design a series of training tasks to align tag semantics across domains while preserving their semantic independence. The proposed method is validated on multiple tasks using public datasets, showing significant improvements over existing state-of-the-art approaches.

  • RESTest-Time Alignment with State Space Model for Tracking User Interest Shifts in Sequential Recommendation
    by Changshuo Zhang (Gaoling School of Artificial Intelligence, Renmin University of China), Xiao Zhang (Gaoling School of Artificial Intelligence, Renmin University of China), Teng Shi (Gaoling School of Artificial Intelligence, Renmin University of China), Jun Xu (Gaoling School of Artificial Intelligence, Renmin University of China), Ji-Rong Wen (Gaoling School of Artificial Intelligence, Renmin University of China)

    Sequential recommendation is essential in modern recommender systems, aiming to predict the next item a user may interact with based on their historical behaviors. However, real-world scenarios are often dynamic and subject to shifts in user interests. Conventional sequential recommendation models are typically trained on static historical data, limiting their ability to adapt to such shifts and resulting in significant performance degradation during testing. Recently, Test-Time Training (TTT) has emerged as a promising paradigm, enabling pre-trained models to dynamically adapt to test data by leveraging unlabeled examples during testing. However, applying TTT to effectively track and address user interest shifts in recommender systems remains an open and challenging problem. Key challenges include how to capture temporal information effectively and how to explicitly identify shifts in user interests during the testing phase. To address these issues, we propose T2ARec, a novel model leveraging a state space model for TTT by introducing two Test Time Alignment modules tailored for sequential recommendation, effectively capturing the distribution shifts in user interest patterns over time. Specifically, T2ARec aligns absolute time intervals with model-adaptive learning intervals to capture temporal dynamics and introduces an interest state alignment mechanism to effectively and explicitly identify the user interest shifts with theoretical guarantees. These two alignment modules enable efficient and incremental updates to model parameters in a self-supervised manner during testing, enhancing predictions for online recommendation. Extensive evaluations on three benchmark datasets demonstrate that T2ARec achieves state-of-the-art performance and robustly mitigates the challenges posed by user interest shifts.

  • RESUSB-Rec: An Effective Framework for Improving Conversational Recommendation Capability of Large Language Model
    by Jianyu Wen (Harbin Institute of Technology), Jingyun Wang (Beihang University), Cilin Yan (Xiaohongshu Inc.), Jiayin Cai (Xiaohongshu Inc.), Xiaolong Jiang (Xiaohongshu Inc.), Ying Zhang (Harbin Institute of Technology)

    Recently, Large Language Models (LLMs) have been widely employed in Conversational Recommender Systems (CRSs). Unlike traditional language model approaches that focus on training, all existing LLMs-based approaches are mainly centered around how to leverage the summarization and analysis capabilities of LLMs while ignoring the issue of training. Therefore, in this work, we propose an integrated training-inference framework, User-Simulator-Based framework (USB-Rec), for improving the performance of LLMs in conversational recommendation at the model level. Firstly, we design an LLM-based Preference Optimization (PO) dataset construction strategy for RL training, which helps the LLMs understand the strategies and methods in conversational recommendation. Secondly, we propose a Self-Enhancement Strategy (SES) at the inference stage to further exploit the conversational recommendation potential obtained from RL training. Extensive experiments on various datasets demonstrate that our method consistently outperforms previous state-of-the-art methods.

  • RESVL-CLIP: Enhancing Multimodal Recommendations via Visual Grounding and LLM-Augmented CLIP Embeddings
    by Ramin Giahi (Walmart Global Tech), Kehui Yao (Walmart Global Tech), Sriram Kollipara (Walmart Global Tech), Kai Zhao (Walmart Global Tech), Vahid Mirjalili (Walmart Global Tech), Jianpeng Xu (Walmart Global Tech), Topojoy Biswas (Walmart Global Tech), Evren Korpeoglu (Walmart Global Tech), Kannan Achan (Walmart Global Tech)

    Multimodal learning plays a critical role in e-commerce recommendation platforms today, enabling accurate recommendations and product understanding. However, existing vision-language models, such as CLIP, face key challenges in e-commerce recommendation systems: 1) Weak object-level alignment, where global image embeddings fail to capture fine-grained product attributes, leading to suboptimal retrieval performance; 2) Ambiguous textual representations, where product descriptions often lack contextual clarity, affecting cross-modal matching; and 3) Domain mismatch, as generic vision-language models may not generalize well to e-commerce-specific data. To address these limitations, we propose a framework, VL-CLIP, that enhances CLIP embeddings by integrating Visual Grounding for fine-grained visual understanding and an LLM-based agent for generating enriched text embeddings. Visual Grounding refines image representations by localizing key products, while the LLM agent enhances textual features by disambiguating product descriptions. Our approach significantly improves retrieval accuracy, multimodal retrieval effectiveness, and recommendation quality across tens of millions of items on one of the largest e-commerce platforms in the U.S., increasing CTR by 18.6%, ATC by 15.5%, and GMV by 4.0%. Additional experimental results show that our framework outperforms vision-language models, including CLIP, FashionCLIP, and GCL, in both precision and semantic alignment, demonstrating the potential of combining object-aware visual grounding and LLM-enhanced text representation for robust multimodal recommendations.

  • RESYou Don’t Bring Me Flowers: Mitigating Unwanted Recommendations Through Conformal Risk Control
    by Giovanni De Toni (Fondazione Bruno Kessler (FBK)), Erasmo Purificato (European Commission, Joint Research Centre (JRC)), Emilia Gomez (European Commission, Joint Research Centre (JRC)), Andrea Passerini (University of Trento), Bruno Lepri (Fondazione Bruno Kessler (FBK)), Cristian Consonni (European Commission, Joint Research Centre (JRC))

    Recommenders are significantly shaping online information consumption. While effective at personalizing content, these systems increasingly face criticism for propagating irrelevant, unwanted, and even harmful recommendations. Such content degrades user satisfaction and contributes to significant societal issues, including misinformation, radicalization, and erosion of user trust. Although platforms offer mechanisms to mitigate exposure to undesired content, these mechanisms are often insufficiently effective and slow to adapt to users’ feedback. This paper introduces an intuitive, model-agnostic, and distribution-free method that uses conformal risk control to provably bound unwanted content in personalized recommendations by leveraging simple binary feedback on items. We also address a limitation of traditional conformal risk control approaches, i.e., the fact that the recommender can provide a smaller set of recommended items, by leveraging implicit feedback on consumed items to expand the recommendation set while ensuring robust risk mitigation. Our experimental evaluation on data coming from a popular online video-sharing platform demonstrates that our approach ensures an effective and controllable reduction of unwanted recommendations with minimal effort.


This event is supported by the Capital City of Prague