Accepted Contributions

List of all long papers accepted for RecSys 2024 (in alphabetical order).
Check the Presenter Instructions for information about every type of oral presentation.
If you need to print your poster in Bari, follow these instructions.

A Multi-modal Modeling Framework for Cold-start Short-video Recommendation
by Gaode Chen (Kuaishou Technology), Ruina Sun (Kuaishou Technology), Yuezihan Jiang (Kuaishou Technology), Jiangxia Cao (Kuaishou Technology), Qi Zhang (Kuaishou Technology), Jingjian Lin (Kuaishou Technology), Han Li (Kuaishou Technology), Kun Gai (Kuaishou Technology) and Xinghua Zhang (Chinese Academy of Sciences)

Short videos have grown rapidly on multimedia platforms over the past few years. To keep content fresh, platforms receive a large number of user-uploaded videos every day, so collaborative filtering-based recommender methods suffer from the item cold-start problem (e.g., newly uploaded videos struggle to compete with existing ones). Consequently, increasing efforts tackle the cold-start issue from the content perspective, modeling users’ multi-modal preferences, which lets new and existing videos compete on a fair basis. However, recent studies ignore the gap between multi-modal embedding extraction and user interest modeling, as well as the differing intensities of user preferences for different modalities. In this paper, we propose M3CSR, a multi-modal modeling framework for cold-start short video recommendation. Specifically, we preprocess content-oriented multi-modal features for items and obtain trainable category IDs by clustering. In each modality, we combine the modality-specific cluster ID embedding and the mapped original modality feature into a modality-specific representation of the item to bridge this gap. Meanwhile, M3CSR measures each user’s modality-specific intensity based on the correlation between modality-specific interest and behavioral interest, and employs a pairwise loss to further decouple users’ multi-modal interests. Extensive experiments on four real-world datasets demonstrate the superiority of our proposed model. The framework has been deployed in a short video application with a billion-user scale and has shown improvements in various commercial metrics in cold-start scenarios.

Full text in ACM Digital Library
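The clustering step in the M3CSR abstract above can be illustrated with a small Python sketch. It assumes plain k-means (the abstract says only "clustering") and assumes that the cluster-ID embedding and the mapped modality feature are combined by summation; `kmeans`, `modality_representation`, `cluster_emb`, and `proj` are hypothetical names, and in a real model the embedding table and projection would be trainable parameters rather than fixed lists. This is not the authors' implementation.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means; returns one integer cluster ID per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def modality_representation(feature, cluster_id, cluster_emb, proj):
    """Hypothetical combination: cluster-ID embedding + linear map of the raw feature."""
    mapped = [sum(w * x for w, x in zip(row, feature)) for row in proj]
    return [e + m for e, m in zip(cluster_emb[cluster_id], mapped)]


# Toy visual features for six videos forming two obvious content groups.
visual = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
cluster_ids = kmeans(visual, k=2)

# Two-row embedding table (one row per cluster) and an identity projection.
cluster_emb = [[1.0, 0.0], [0.0, 1.0]]
proj = [[1.0, 0.0], [0.0, 1.0]]
rep = modality_representation([2.0, 3.0], 1, cluster_emb, proj)  # [2.0, 4.0]
```

Under this reading, a newly uploaded video that lands in an existing cluster reuses that cluster's embedding row, which is one plausible way content clusters could help with cold start.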

A multimodal single-branch embedding network for recommendation in cold-start and missing modality scenarios
by Christian Ganhör (Johannes Kepler University Linz), Marta Moscati (Johannes Kepler University Linz), Anna Hausberger (Johannes Kepler University Linz), Shah Nawaz (Johannes Kepler University Linz) and Markus Schedl (Johannes Kepler University Linz; Linz Institute of Technology)

Most recommender systems adopt collaborative filtering (CF) and provide recommendations based on past collective interactions. Consequently, the performance of CF algorithms degrades when few or no interactions are available, a scenario referred to as cold-start. To address this issue, previous work relies on models leveraging both collaborative data and side information on the users or items. Similar to multimodal learning, these models aim at combining collaborative and content representations in a shared embedding space. In this work, we propose a novel technique for multimodal recommendation, relying on a multimodal Single-Branch embedding network for Recommendation (SiBraR). Leveraging weight-sharing, SiBraR encodes interaction data as well as multimodal side information using the same single-branch embedding network on different modalities. This makes SiBraR effective in scenarios of missing modality, including cold start. Our extensive experiments on large-scale recommendation datasets from three different recommendation domains (music, movies, and e-commerce) that provide multimodal content information (audio, text, image, labels, and interactions) show that SiBraR significantly outperforms CF as well as state-of-the-art content-based RSs in cold-start scenarios, and is competitive in warm scenarios. We show that SiBraR’s recommendations are accurate in missing modality scenarios, and that the model is able to map different modalities to the same region of the shared embedding space, hence reducing the modality gap.

Full text in ACM Digital Library

A Pre-trained Zero-shot Sequential Recommendation Framework via Popularity Dynamics
by Junting Wang (Urbana-Champaign), Praneet Rathi (Urbana-Champaign) and Hari Sundaram (Urbana-Champaign)

This paper proposes a novel pre-trained framework for zero-shot cross-domain sequential recommendation without auxiliary information. While using auxiliary information (e.g., item descriptions) seems promising for cross-domain transfer, adapting sequential recommenders across domains can be challenging when the target domain differs from the source domain: item descriptions may be in different languages, and metadata modalities (e.g., audio, image, and text) may differ across source and target domains. If we can learn universal item representations independent of the domain type (e.g., groceries, movies), we can achieve zero-shot cross-domain transfer without auxiliary information. Our key insight is that user interaction sequences reveal shifting user preferences through the popularity dynamics of the interacted items. We present PrepRec, a pre-trained sequential recommendation framework that uses a novel popularity dynamics-aware transformer architecture. Through extensive experiments on five real-world datasets, we show that PrepRec, without any auxiliary information, can adapt zero-shot to new application domains and achieve competitive performance compared to state-of-the-art sequential recommender models. In addition, we show that PrepRec complements existing sequential recommenders: with a simple post-hoc interpolation, PrepRec improves the performance of existing sequential recommenders on average by 11.8% in Recall@10 and 22% in NDCG@10. We provide an anonymized implementation of PrepRec at https://github.com/CrowdDynamicsLab/preprec.

Full text in ACM Digital Library
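The post-hoc interpolation mentioned in the PrepRec abstract above can be sketched in Python, together with leave-one-out Recall@k and NDCG@k. The min-max normalization, the mixing weight `alpha`, and the single-held-out-item evaluation protocol are assumptions made for illustration; the abstract does not specify how the two models' scores are combined.

```python
import math


def minmax(scores):
    """Rescale scores to [0, 1] so that two models' scores are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def interpolate(prep_scores, base_scores, alpha=0.5):
    """Post-hoc mix: alpha * PrepRec score + (1 - alpha) * base-model score."""
    a, b = minmax(prep_scores), minmax(base_scores)
    return [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]


def recall_ndcg_at_k(scores, target, k=10):
    """Leave-one-out metrics for a single held-out target item."""
    ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
    top_k = ranking[:k]
    if target not in top_k:
        return 0.0, 0.0
    rank = top_k.index(target)  # 0-based position in the top-k list
    return 1.0, 1.0 / math.log2(rank + 2)


prep = [0.1, 0.9, 0.3, 0.2]  # hypothetical PrepRec scores for 4 candidate items
base = [5.0, 1.0, 4.0, 0.0]  # hypothetical scores from an existing recommender
mixed = interpolate(prep, base, alpha=0.5)
recall, ndcg = recall_ndcg_at_k(mixed, target=2, k=2)
```

With `alpha=1.0` the mix reduces to PrepRec alone and with `alpha=0.0` to the existing recommender; in practice the weight would presumably be tuned on validation data.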

A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation
by Zixuan Yi (University of Glasgow) and Iadh Ounis (University of Glasgow)

With the rapid development of online multimedia services, especially on e-commerce platforms, there is a pressing need for personalised recommender systems that can effectively encode the diverse multi-modal content associated with each item. However, we argue that existing multi-modal recommender systems typically use isolated processes for both feature extraction and modality encoding, and such isolated processes can harm recommendation performance. First, an isolated extraction process underestimates the importance of effective feature extraction in multi-modal recommendation and may incorporate non-relevant information, which is harmful to item representations. Second, an isolated modality encoding process produces disjoint embeddings for item modalities because each modality is processed individually, which leads to a suboptimal fusion of user/item representations and hence less effective user preference prediction. We hypothesise that using a unified model to address both of these isolated processes will enable the consistent extraction and cohesive fusion of joint multi-modal features, thereby enhancing the effectiveness of multi-modal recommender systems. In this paper, we propose a novel model, called Unified multi-modal Graph Transformer (UGT), which first leverages a multi-way transformer to extract aligned multi-modal features from raw data for top-k recommendation. Subsequently, we build a unified graph neural network in our UGT model to jointly fuse the multi-modal user/item representations derived from the output of the multi-way transformer. We show that, with its graph transformer architecture, the UGT model achieves significant effectiveness gains, especially when jointly optimised with the commonly used recommendation losses. Our extensive experiments conducted on three benchmark datasets demonstrate that our proposed UGT model consistently outperforms nine existing state-of-the-art recommendation approaches, by up to 13.97% over the best baseline.

Full text in ACM Digital Library

Accelerating the Surrogate Retraining for Poisoning Attacks against Recommender Systems
by Yunfan Wu (Chinese Academy of Sciences), Qi Cao (Chinese Academy of Sciences), Shuchang Tao (Chinese Academy of Sciences), Kaike Zhang (Chinese Academy of Sciences), Fei Sun (Chinese Academy of Sciences) and Huawei Shen (Chinese Academy of Sciences)

Recent studies have demonstrated that recommender systems are vulnerable to data poisoning attacks, in which adversaries inject carefully crafted fake user interactions into the training data of recommenders to promote target items. Current attack methods iteratively retrain a surrogate recommender on the poisoned data with the latest fake users to optimize the attack. However, this repeated retraining is highly time-consuming, hindering the efficient assessment and optimization of fake users. To mitigate this computational bottleneck and develop a more effective attack within an affordable time, we analyze the retraining process and find that a change in the representation of one user/item causes a cascading effect through the user-item interaction graph. Guided by this theoretical analysis, we introduce Gradient Passing (GP), a novel technique that explicitly passes gradients between interacted user-item pairs during backpropagation, thereby approximating the cascading effect and accelerating retraining. With just a single update, GP can achieve effects comparable to multiple original training iterations. Under the same number of retraining epochs, GP enables the surrogate recommender to approximate the victim more closely. This more accurate approximation provides better guidance for optimizing fake users, ultimately leading to stronger data poisoning attacks. Extensive experiments on real-world datasets demonstrate the efficiency and effectiveness of our proposed GP.

Full text in ACM Digital Library

Adaptive Fusion of Multi-View for Graph Contrastive Recommendation
by Mengduo Yang (Zhejiang University), Yi Yuan (Zhejiang University), Jie Zhou (Zhejiang University), Meng Xi (Zhejiang University), Xiaohua Pan (Zhejiang University), Ying Li (Zhejiang University), Yangyang Wu (Zhejiang University), Jinshan Zhang (Zhejiang University) and Jianwei Yin (Zhejiang University)

Recommendation is a key mechanism that lets modern users access items of interest among massive amounts of entities and information. Recently, graph contrastive learning (GCL) has demonstrated satisfactory results in recommendation, owing to its ability to enhance representations by integrating graph neural networks (GNNs) with contrastive learning. However, these methods often generate contrastive views by randomly perturbing edges or embeddings, which is likely to introduce noise into representation learning. Moreover, all of these methods ignore the degree of user preference for items during representation learning, which may lead to incomplete user/item modeling. To address these limitations, we propose the Adaptive Fusion of Multi-View Graph Contrastive Recommendation (AMGCR) model. Specifically, to generate informative and less noisy views for better contrastive learning, we design four view generators that learn edge weights, focusing on weight adjustment, feature transformation, neighbor aggregation, and attention mechanism, respectively. Then, we employ an adaptive multi-view fusion module to combine the different views at both the view-shared and the view-specific levels. Furthermore, to enable the model to capture preference information during learning, we adopt a preference refinement strategy on the fused contrastive view. Experimental results on three real-world datasets demonstrate that AMGCR consistently outperforms state-of-the-art methods, with average improvements of over 10% in terms of Recall and NDCG. Our code is available at https://github.com/Du-danger/AMGCR.

Full text in ACM Digital Library

AIE: Auction Information Enhanced Framework for CTR Prediction in Online Advertising
by Yang Yang (Huawei Noah’s Ark Lab), Bo Chen (Huawei Noah’s Ark Lab), Chenxu Zhu (Huawei Noah’s Ark Lab), Menghui Zhu (Huawei Noah’s Ark Lab), Xinyi Dai (Huawei Noah’s Ark Lab), Huifeng Guo (Huawei Noah’s Ark Lab), Muyu Zhang (Huawei Noah’s Ark Lab), Zhenhua Dong (Huawei Noah’s Ark Lab) and Ruiming Tang (Huawei Noah’s Ark Lab)

Click-Through Rate (CTR) prediction is a fundamental technique in online advertising recommendation, and the complex, competitive online auction process makes CTR optimization considerably harder. Recent studies have shown that introducing posterior auction information improves CTR prediction performance. However, existing work does not fully capitalize on auction information and overlooks the data bias introduced by the auction, leading to biased and suboptimal results. To address these limitations, we propose the Auction Information Enhanced Framework (AIE) for CTR prediction in online advertising, which tackles the insufficient utilization of auction signals and is the first to reveal auction bias. Specifically, AIE introduces two pluggable modules, the Adaptive Market-price Auxiliary Module (AM2) and the Bid Calibration Module (BCM), which work together to better exploit posterior auction signals and enhance CTR prediction performance. Furthermore, the two proposed modules are lightweight, model-agnostic, and add little inference latency. Extensive experiments on a public dataset and an industrial dataset demonstrate the effectiveness and compatibility of AIE. In addition, a one-month online A/B test on a large-scale advertising platform shows that AIE improves the base model by 5.76% and 2.44% in terms of eCPM and CTR, respectively.

Full text in ACM Digital Library

Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation
by David Austin (University of Waterloo), Anton Korikov (University of Toronto), Armin Toroghi (University of Toronto) and Scott Sanner (University of Toronto)

Designing preference elicitation (PE) methodologies that can quickly ascertain a user’s top item preferences in a cold-start setting is a key challenge for building effective and personalized conversational recommendation (ConvRec) systems. While large language models (LLMs) enable fully natural language (NL) PE dialogues, we hypothesize that monolithic LLM NL-PE approaches lack the multi-turn, decision-theoretic reasoning required to effectively balance the exploration and exploitation of user preferences towards an arbitrary item set. In contrast, traditional Bayesian optimization PE methods define theoretically optimal PE strategies, but cannot generate arbitrary NL queries or reason over content in NL item descriptions, and therefore require users to express preferences via ratings or comparisons of unfamiliar items. To overcome the limitations of both approaches, we formulate NL-PE in a Bayesian Optimization (BO) framework that seeks to actively elicit NL feedback to identify the best recommendation. Key challenges in generalizing BO to natural language feedback include determining: (a) how to leverage LLMs to model the likelihood of NL preference feedback as a function of item utilities, and (b) how to design an acquisition function for NL BO that can elicit preferences in the infinite space of language. We demonstrate our framework in a novel NL-PE algorithm, PEBOL, which uses: 1) Natural Language Inference (NLI) between user preference utterances and NL item descriptions to maintain Bayesian preference beliefs, and 2) BO strategies such as Thompson Sampling (TS) and Upper Confidence Bound (UCB) to guide LLM query generation. We evaluate our methods numerically in controlled simulations and find that after 10 turns of dialogue, PEBOL achieves an MRR@10 of up to 0.27, compared to 0.17 for the best monolithic LLM baseline, despite relying on earlier and smaller LLMs.

Full text in ACM Digital Library
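The two BO strategies named in the PEBOL abstract above, Thompson Sampling and UCB, can be sketched in Python over per-item Beta beliefs. The Beta-Bernoulli belief model and the fractional update with an NLI entailment probability `p_entail` are assumptions made for this illustration, and `BetaBelief`, `thompson_pick`, and `ucb_pick` are hypothetical names. The NLI model and the LLM query generation are not shown; the selected item would only serve as the context from which the next query is generated.

```python
import math
import random


class BetaBelief:
    """Independent Beta(alpha, beta) belief over each item's utility."""

    def __init__(self, n_items):
        self.alpha = [1.0] * n_items  # uniform Beta(1, 1) prior
        self.beta = [1.0] * n_items

    def update(self, item, p_entail):
        # Assumed fractional update: the NLI entailment probability counts as a soft "like".
        self.alpha[item] += p_entail
        self.beta[item] += 1.0 - p_entail

    def mean(self, i):
        return self.alpha[i] / (self.alpha[i] + self.beta[i])

    def std(self, i):
        a, b = self.alpha[i], self.beta[i]
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def thompson_pick(belief, rng):
    """Sample one utility per item from its Beta belief and pick the argmax."""
    samples = [rng.betavariate(a, b) for a, b in zip(belief.alpha, belief.beta)]
    return max(range(len(samples)), key=samples.__getitem__)


def ucb_pick(belief, c=1.0):
    """Pick the item with the highest posterior mean + c * posterior std."""
    return max(range(len(belief.alpha)), key=lambda i: belief.mean(i) + c * belief.std(i))


belief = BetaBelief(3)
for _ in range(3):
    belief.update(0, 0.9)  # utterances keep entailing item 0's description
    belief.update(1, 0.1)  # and rarely entail item 1's
# Item 2 has no evidence yet, so it keeps the widest belief.
greedy_ish = ucb_pick(belief, c=1.0)  # exploits item 0
exploring = ucb_pick(belief, c=3.0)   # a larger bonus favors uncertain item 2
ts_choice = thompson_pick(belief, random.Random(0))
```

The exploration constant `c` controls the trade-off the abstract describes: a small value keeps asking about the current favorite, while a larger one steers the dialogue toward items the system knows little about.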

Biased User History Synthesis for Personalized Long-Tail Item Recommendation
by Keshav Balasubramanian (University of Southern California), Abdulla Alshabanah (University of Southern California), Elan Markowitz (University of Southern California), Greg Ver Steeg (University of California Riverside) and Murali Annavaram (University of Southern California)

Recommendation systems connect users to items and create value chains in the internet economy. They learn from past user-item interaction histories; as a result, items with short interaction histories, either because they are new or unpopular, have been shown to be disproportionately under-recommended. This long-tail item problem can exacerbate model bias and reinforce poor recommendation of tail items. In this paper, we propose biased user history synthesis, which not only addresses this problem but also achieves better personalization in recommendation systems, improving tail and head item recommendation performance at the same time. Our approach is built on a tail-item-biased User Interaction History (UIH) sampling strategy and a synthesis model that produces an augmented user representation from the sampled user history. We provide a theoretical justification for our approach using information theory and demonstrate through extensive experimentation that our model outperforms state-of-the-art baselines on tail, head, and overall recommendation. The source code is available at https://github.com/lkp411/BiasedUserHistorySynthesis.

Full text in ACM Digital Library
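The tail-biased sampling step in the abstract above can be sketched in Python. The inverse-popularity weighting `popularity ** (-beta)` is an assumed choice of bias, and `synthesize` is a hypothetical mean-pooling stand-in for the learned synthesis model the abstract describes; neither is taken from the paper.

```python
import random


def tail_biased_sample(history, popularity, n, beta=1.0, seed=0):
    """Sample n items (with replacement) from a user's history,
    weighting each item by popularity ** -beta.

    Every item in a history has at least one interaction, so popularity > 0.
    """
    rng = random.Random(seed)
    weights = [popularity[item] ** (-beta) for item in history]
    return rng.choices(history, weights=weights, k=n)


def synthesize(user_vec, item_vecs, sampled):
    """Hypothetical stand-in for the learned synthesis model: average the user
    vector with the mean embedding of the sampled items."""
    dim = len(user_vec)
    pooled = [sum(item_vecs[i][d] for i in sampled) / len(sampled) for d in range(dim)]
    return [(u + p) / 2 for u, p in zip(user_vec, pooled)]


history = ["head", "tail"]
popularity = {"head": 1000, "tail": 1}
draws = tail_biased_sample(history, popularity, n=2000)
tail_share = draws.count("tail") / len(draws)  # close to 1000 / 1001

item_vecs = {"a": [2.0, 4.0], "b": [4.0, 0.0]}
augmented = synthesize([0.0, 0.0], item_vecs, ["a", "b"])  # [1.5, 1.0]
```

Raising `beta` strengthens the bias toward tail items, and `beta=0` recovers uniform sampling of the history.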

Bridging Search and Recommendation in Generative Retrieval: Does One Task Help the Other?
by Gustavo Penha (Spotify), Ali Vardasbi (Spotify), Enrico Palumbo (Spotify), Marco De Nadai (Spotify) and Hugues Bouchard (Spotify)

Generative retrieval for search and recommendation is a promising paradigm for retrieving items, offering an alternative to traditional methods that depend on external indexes and nearest-neighbor searches. Instead, generative models directly associate inputs with item IDs. Given the breakthroughs of Large Language Models (LLMs), these generative systems can play a crucial role in centralizing a variety of Information Retrieval (IR) tasks in a single model that performs tasks such as query understanding, retrieval, recommendation, explanation, re-ranking, and response generation. Despite the growing interest in such a unified generative approach for IR systems, the advantages of using a single, multi-task model over multiple specialized models are not well established in the literature. This paper investigates whether and when such a unified approach can outperform task-specific models on two IR tasks, search and recommendation, which co-exist on many industrial online platforms such as Spotify, YouTube, and Netflix. Previous work shows that (1) the latent representations of items learned by generative recommenders are biased towards popularity, and (2) content-based and collaborative-filtering-based information can improve an item’s representations. Motivated by this, our study is guided by two hypotheses: [H1] joint training regularizes the estimation of each item’s popularity, and [H2] joint training regularizes the item’s latent representations, where search captures content-based aspects of an item and recommendation captures collaborative-filtering aspects. Our extensive experiments with both simulated and real-world data support both [H1] and [H2] as key contributors to the effectiveness improvements observed in the unified search and recommendation generative models over the single-task approaches.

Full text in ACM Digital Library

CALRec: Contrastive Alignment of Generative LLMs for Sequential Recommendation
by Yaoyiran Li (University of Cambridge), Xiang Zhai (Google), Moustafa Alzantot (Google Research), Keyi Yu (Google), Ivan Vulić (University of Cambridge), Anna Korhonen (University of Cambridge) and Mohamed Hammad (Google)

Traditional recommender systems such as matrix factorization methods have primarily focused on learning a shared dense embedding space to represent both items and user preferences. Subsequently, sequence models such as RNNs, GRUs, and, more recently, Transformers have emerged and excelled in the task of sequential recommendation. This task requires understanding the sequential structure of users’ historical interactions to predict the next item they may like. Building on the success of Large Language Models (LLMs) in a variety of tasks, researchers have recently explored using LLMs pretrained on vast text corpora for sequential recommendation. To use LLMs for sequential recommendation, both the history of user interactions and the model’s prediction of the next item are expressed in text form. We propose CALRec, a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion using a mixture of two contrastive losses and a language modeling loss: the LLM is first finetuned on a data mixture from multiple domains, followed by another round of target-domain finetuning. Our model significantly outperforms many state-of-the-art baselines (+37% in Recall@1 and +24% in NDCG@10), and our systematic ablation studies reveal that (i) both stages of finetuning are crucial and, when combined, improve performance, and (ii) contrastive alignment is effective across the target domains explored in our experiments.

Full text in ACM Digital Library

ConFit: Improving Resume-Job Matching using Data Augmentation and Contrastive Learning
by Xiao Yu (Columbia University), Jinzhong Zhang (Intellipro Group Inc.) and Zhou Yu (Columbia University)

A reliable resume-job matching system helps a company find suitable candidates from a pool of resumes, and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply to only a few jobs, interaction records in resume-job datasets are sparse. Unlike much prior work, which uses complex modeling techniques, we tackle this sparsity problem using data augmentation and a simple contrastive learning approach. ConFit first formulates resume-job datasets as a sparse bipartite graph, and creates an augmented dataset by paraphrasing specific sections in a resume or a job post. Then, ConFit finetunes pre-trained encoders with contrastive learning to further increase the number of training samples from B pairs per batch to O(B²) per batch. We evaluate ConFit on two real-world datasets and find that it outperforms prior methods (including BM25 and OpenAI text-ada-002) by up to 19% and 31% absolute in nDCG@10 for ranking jobs and ranking resumes, respectively. We believe ConFit’s simple yet highly performant approach lays a strong foundation for future research in modeling person-job fit.

Full text in ACM Digital Library
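The jump from B to O(B²) training signals in the ConFit abstract above comes from scoring every resume in a batch against every job in it. The Python sketch below assumes a generic one-directional in-batch InfoNCE objective with cosine similarity and a temperature; the function names and the temperature value are hypothetical, and the paper's exact loss may differ (for example, it may also score jobs against resumes).

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_infonce(resume_embs, job_embs, temperature=0.1):
    """Resume i's positive is job i; the other B - 1 jobs in the batch are negatives.

    Returns the mean loss and the number of scored resume-job pairs (B * B).
    """
    B = len(resume_embs)
    logits = [[cosine(r, j) / temperature for j in job_embs] for r in resume_embs]
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # log-sum-exp with max subtraction for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        loss += log_z - row[i]  # cross-entropy with target index i
    return loss / B, B * B


resumes = [[1.0, 0.0], [0.0, 1.0]]
matched_jobs = [[1.0, 0.0], [0.0, 1.0]]
swapped_jobs = [[0.0, 1.0], [1.0, 0.0]]
low_loss, n_pairs = in_batch_infonce(resumes, matched_jobs)
high_loss, _ = in_batch_infonce(resumes, swapped_jobs)
```

Only B of the B × B scored pairs are labeled positives; the rest act as free negatives, which is why the objective extracts more signal from a sparse dataset without collecting new labels.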

Cross-Domain Latent Factors Sharing via Implicit Matrix Factorization
by Abdulaziz Samra (Skolkovo Institute of Science and Technology), Evgeny Frolov (AIRI; Skolkovo Institute of Science and Technology), Alexey Vasilev (Sber), Alexander Grigorevskiy (Independent researcher) and Anton Vakhrushev (Independent researcher)

Data sparsity has been one of the long-standing problems for recommender systems. One way to mitigate this issue is to exploit knowledge available in other source domains. However, many cross-domain recommender systems introduce complex architectures that make them less scalable in practice. On the other hand, matrix factorization methods are still considered strong baselines for single-domain recommendation. In this paper, we introduce CDIMF, a model that extends the standard implicit matrix factorization with ALS to cross-domain scenarios. We apply the Alternating Direction Method of Multipliers to learn shared latent factors for overlapping users while factorizing the interaction matrix. In a dual-domain setting, experiments on industrial datasets demonstrate the competitive performance of CDIMF in both cold-start and warm-start scenarios. The proposed model can outperform most other recent cross-domain and single-domain models. We also provide the code to reproduce the experiments on GitHub.

Full text in ACM Digital Library

Discerning Canonical User Representation for Cross-Domain Recommendation
by Siqian Zhao (University at Albany – SUNY) and Sherry Sahebi (University at Albany – SUNY)

Cross-domain recommender systems (CDRs) aim to improve recommendation outcomes by transferring information across domains. Existing CDRs have investigated learning both domain-specific and domain-shared user preferences to enhance recommendation performance. However, these models typically allow the disparities between shared and distinct user preferences to emerge freely in any space, lacking sufficient constraints to identify differences between the two domains and to ensure that both domains are considered simultaneously. Canonical Correlation Analysis (CCA) has shown promise for transferring information between domains. However, CCA only models domain similarities and fails to capture potential differences between user preferences in different domains. We propose Discerning Canonical User Representation for Cross-Domain Recommendation (DiCUR-CDR), which learns domain-shared and domain-specific user representations while simultaneously considering both domains’ latent spaces. DiCUR-CDR introduces Discerning Canonical Correlation (DisCCA) user representation learning, a novel non-linear CCA design for mapping user representations. Unlike prior CCA models, which only model the domain-shared multivariate representations by finding their linear transformations, DisCCA uses the same transformations to discover the domain-specific representations as well. We compare DiCUR-CDR against several state-of-the-art approaches on two real-world datasets and demonstrate the significance of separately learning shared and specific user representations via DisCCA.

Full text in ACM Digital Library

Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Models
by Yu Cui (Zhejiang University), Feng Liu (OPPO Research Institute), Pengbo Wang (University of Electronic Science and Technology of China), Bohao Wang (Zhejiang University), Heng Tang (Zhejiang University), Yi Wan (OPPO Research Institute), Jun Wang (OPPO Research Institute) and Jiawei Chen (Zhejiang University)

Owing to their powerful semantic reasoning capabilities, Large Language Models (LLMs) have been effectively used as recommenders, achieving impressive performance. However, the high inference latency of LLMs significantly restricts their practical deployment. To address this issue, this work investigates knowledge distillation from cumbersome LLM-based recommendation models to lightweight conventional sequential models. This task raises three challenges: 1) the teacher’s knowledge may not always be reliable; 2) the capacity gap between the teacher and the student makes it difficult for the student to assimilate the teacher’s knowledge; 3) the divergence between their semantic spaces makes it difficult to distill knowledge from embeddings.

To tackle these challenges, this work proposes DLLM2Rec, a novel distillation strategy specifically tailored to distilling knowledge from LLM-based recommendation models into conventional sequential models. DLLM2Rec comprises: 1) importance-aware ranking distillation, which filters reliable and student-friendly knowledge by weighting instances according to teacher confidence and student-teacher consistency; and 2) collaborative embedding distillation, which integrates knowledge from teacher embeddings with collaborative signals mined from the data. Extensive experiments demonstrate the effectiveness of DLLM2Rec, which boosts three typical sequential models by an average of 47.97% and even enables them to surpass LLM-based recommenders in some cases.

Full text in ACM Digital Library

DNS-Rec: Data-aware Neural Architecture Search for Recommender Systems
by Sheng Zhang (City University of Hong Kong), Maolin Wang (City University of Hong Kong), Xiangyu Zhao (City University of Hong Kong), Ruocheng Guo (ByteDance Research), Yao Zhao (Ant Group), Chenyi Zhuang (Ant Group), Jinjie Gu (Ant Group), Zijian Zhang (Jilin University) and Hongzhi Yin (The University of Queensland)

In the era of data proliferation, efficiently sifting through vast amounts of information to extract meaningful insights has become increasingly crucial. This paper addresses the computational overhead and resource inefficiency prevalent in existing Sequential Recommender Systems (SRSs). We introduce an innovative approach that combines pruning methods with advanced model designs. Furthermore, we explore resource-constrained Neural Architecture Search (NAS), an emerging technique in recommender systems, to optimize models in terms of FLOPs, latency, and energy consumption while maintaining or improving accuracy. Our principal contribution is the development of a Data-aware Neural Architecture Search for Recommender Systems (DNS-Rec). DNS-Rec is specifically designed to tailor compact network architectures for attention-based SRS models while retaining accuracy. It incorporates data-aware gates that improve the recommendation network by learning information from historical user-item interactions. Moreover, DNS-Rec employs a dynamic resource constraint strategy that stabilizes the search process and yields more suitable architectures. We demonstrate the effectiveness of our approach through rigorous experiments on three benchmark datasets, which highlight the superiority of DNS-Rec among SRSs. Our findings set a new standard for future research in efficient and accurate recommendation systems, marking a significant step forward in this rapidly evolving field.

Full text in ACM Digital Library

Dynamic Stage-aware User Interest Learning for Heterogeneous Sequential Recommendation
by Weixin Li (Shenzhen University), Xiaolin Lin (Shenzhen University), Weike Pan (Shenzhen University) and Zhong Ming (Shenzhen Technology University)

Sequential recommendation has been widely used to predict users’ potential preferences by learning their dynamic interests, and most previous methods focus on capturing item-level dependencies. Despite their great success, they often overlook stage-level interest dependencies. In real-world scenarios, user interests tend to be staged: for example, after purchasing an item, a user’s interests may transition into a subsequent phase, and there are intricate dependencies across different stages. Meanwhile, users’ behaviors are usually heterogeneous, including auxiliary behaviors (e.g., examinations) and target behaviors (e.g., purchases), which imply more fine-grained user interests. However, existing methods are limited in explicitly modeling the relationships between different types of behaviors. To address these issues, we propose a novel framework, dynamic stage-aware user interest learning (DSUIL), for heterogeneous sequential recommendation, which is the first solution to model user interests in a cross-stage manner. Specifically, DSUIL consists of four modules: (1) a dynamic graph construction module that transforms a heterogeneous sequence into several subgraphs to model user interests in a stage-wise manner; (2) a dynamic graph convolution module that dynamically learns item representations in each subgraph; (3) a behavior-aware subgraph representation learning module that learns the heterogeneous dependencies between behaviors and aggregates item representations to represent the staged user interests; and (4) an interest evolving pattern extractor that learns users’ overall interests for item prediction. Extensive experimental results on two public datasets show that DSUIL performs significantly better than state-of-the-art methods.

Full text in ACM Digital Library
RESEffective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
by Tatsuhiro Shimizu (Independent Researcher), Koichi Tanaka (Keio University), Ren Kishimoto (Tokyo Institute of Technology), Haruka Kiyohara (Cornell University), Masahiro Nomura (CyberAgent, Inc.) and Yuta Saito (Cornell University)

We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset of the action space. For example, it might choose a set of furniture pieces (a bed and a drawer) from the available items (bed, drawer, chair, etc.) for interior design sales. This setting is widespread in fields such as recommender systems and healthcare, yet OPE/L of CCB remains unexplored in the relevant literature. Typical OPE/L methods such as regression and importance sampling can be applied to the CCB problem; however, they face significant challenges due to high bias or variance, exacerbated by the exponential growth in the number of available subsets. To address these challenges, we introduce the concept of a factored action space, which allows us to decompose each subset into binary indicators. This formulation allows us to distinguish between the “main effect”, derived from the main actions, and the “residual effect”, originating from the supplemental actions, facilitating more effective OPE. Specifically, our estimator, called OPCB, leverages an importance sampling-based approach to estimate the main effect without bias, while employing a regression-based approach to handle the residual effect with low variance. As our theoretical analysis shows, under certain conditions OPCB achieves substantial variance reduction compared to conventional importance sampling methods and bias reduction relative to regression methods. Experiments demonstrate OPCB’s superior performance over typical methods in both OPE and OPL.
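
To make the factored action space concrete, the sketch below encodes a subset of K candidate items as K binary indicators and computes the vanilla importance sampling (IPS) estimate, i.e., the high-variance baseline the abstract contrasts OPCB with. It assumes, purely for illustration, that both the logging and the target policy include each item independently with a per-item probability; OPCB's main/residual decomposition is not reproduced here.

```python
from itertools import product
from typing import List, Sequence, Tuple


def subset_prob(indicators: Sequence[int], probs: Sequence[float]) -> float:
    """Probability of a binary indicator vector if each item is included independently."""
    p = 1.0
    for a, q in zip(indicators, probs):
        p *= q if a else 1.0 - q
    return p


def ips_estimate(logs: List[Tuple[Sequence[int], float]],
                 target_probs: Sequence[float],
                 logging_probs: Sequence[float]) -> float:
    """Vanilla IPS: mean of reward * pi_target(subset) / pi_logging(subset)."""
    total = 0.0
    for subset, reward in logs:
        weight = subset_prob(subset, target_probs) / subset_prob(subset, logging_probs)
        total += weight * reward
    return total / len(logs)


# With K candidate items there are 2**K possible subsets, which drives up IPS variance.
K = 3
print(len(list(product([0, 1], repeat=K))))  # 8

logs = [((1, 0, 1), 1.0), ((0, 1, 0), 0.0)]
print(ips_estimate(logs, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]))  # 0.5: identical policies give the mean reward
```

Each importance weight is a product of K per-item ratios, so weights (and hence variance) can grow quickly as K increases, which is the failure mode the abstract attributes to conventional importance sampling.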

Full text in ACM Digital Library
RESEmbedding Optimization for Training Large-scale Deep Learning Recommendation Systems with EMBark
by Shijie Liu (NVIDIA Corporation), Nan Zheng (NVIDIA Corporation), Hui Kang (NVIDIA Corporation), Xavier Simmons (NVIDIA Corporation), Junjie Zhang (NVIDIA Corporation), Matthias Langer (NVIDIA Corporation), Wenjing Zhu (NVIDIA Corporation), Minseok Lee (NVIDIA Corporation) and Zehuan Wang (NVIDIA Corporation)

Training large-scale deep learning recommendation models (DLRMs) with embedding tables stretching across multiple GPUs in a cluster presents a unique challenge: embedding operations, which require substantial memory and network bandwidth, must scale efficiently within a hierarchical network of GPUs. To tackle this bottleneck, we introduce EMBark, a comprehensive solution aimed at enhancing embedding performance and overall DLRM training throughput at scale. EMBark empowers users to create and customize sharding strategies, and features a highly automated sharding planner, to accelerate diverse model architectures on different cluster configurations. EMBark groups embedding tables while taking into account their preferred communication compression method, effectively reducing communication overhead. It combines efficient data-parallel category distribution with topology-aware hierarchical communication and pipelining support to maximize DLRM training throughput. Across four representative DLRM variants (DLRM-DCNv2, T180, T200, and T510), EMBark achieves an average end-to-end training throughput speedup of 1.5× and up to 1.77× over traditional table-row-wise sharding approaches.

Full text in ACM Digital Library
RESEnd-to-End Cost-Effective Incentive Recommendation under Budget Constraint with Uplift Modeling
by Zexu Sun (Renmin University of China), Hao Yang (Renmin University of China), Dugang Liu (Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)), Yunpeng Weng (Tencent), Xing Tang (Tencent) and Xiuqiang He (Tencent)

In modern online platforms, incentives (e.g., discounts, bonuses) are essential for enhancing user engagement and increasing platform revenue. In recent years, uplift modeling has been introduced as a strategic approach to assigning incentives to individual customers. In many real-world applications, however, online platforms can only incentivize customers under specific budget constraints. This problem can be reformulated as the multiple-choice knapsack problem (MCKP), whose objective is to select the optimal incentive for each customer to maximize the return on investment (ROI). Recent works in this field frequently tackle the budget allocation problem with a two-stage approach. However, this solution faces the following challenges: (1) causal inference methods often ignore domain knowledge in online marketing, where a customer’s expected response curve should be monotonic and smooth as the incentive increases; (2) there is an optimality gap between the two stages, resulting in sub-optimal allocation because the incentive recommendation information is lost for uplift prediction under the limited budget constraint. To address these challenges, we propose a novel End-to-End Cost-Effective Incentive Recommendation (E3IR) model under the budget constraint. Specifically, our method consists of two modules: an uplift prediction module and a differentiable allocation module. In the uplift prediction module, we construct prediction heads to capture the incremental improvement between adjacent treatments under marketing domain constraints (i.e., monotonicity and smoothness). In the differentiable allocation module, we incorporate integer linear programming (ILP) as a differentiable layer. Furthermore, we conduct extensive experiments on public and real product datasets, demonstrating that E3IR improves allocation performance compared to existing two-stage approaches.
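
The abstract frames budgeted incentive allocation as a multiple-choice knapsack problem (MCKP). The sketch below is a plain dynamic-programming solver for the second stage of a conventional two-stage pipeline, i.e., allocation on top of uplifts that have already been predicted. It assumes integer costs and an explicit zero-cost "no incentive" option for every customer; it is not E3IR's differentiable ILP layer.

```python
from typing import List, Sequence, Tuple


def solve_mckp(values: Sequence[Sequence[float]],
               costs: Sequence[Sequence[int]],
               budget: int) -> Tuple[float, List[int]]:
    """Pick exactly one option per customer to maximize total value within an integer budget.

    values[i][j] and costs[i][j] are the predicted uplift and cost of option j for customer i.
    Option 0 is assumed to be "no incentive" with cost 0, so a feasible plan always exists.
    """
    best = [0.0] * (budget + 1)   # best[b]: best total value using at most b budget so far
    picks: List[List[int]] = []   # picks[i][b]: option chosen for customer i at budget b
    for vals, cs in zip(values, costs):
        new_best = [float("-inf")] * (budget + 1)
        pick = [0] * (budget + 1)
        for b in range(budget + 1):
            for j, (v, c) in enumerate(zip(vals, cs)):
                if c <= b and best[b - c] + v > new_best[b]:
                    new_best[b] = best[b - c] + v
                    pick[b] = j
        best = new_best
        picks.append(pick)
    # Walk back through the recorded choices to recover the allocation plan.
    plan, b = [0] * len(values), budget
    for i in range(len(values) - 1, -1, -1):
        plan[i] = picks[i][b]
        b -= costs[i][plan[i]]
    return best[budget], plan


# Two customers, three options each: no incentive, small coupon, large coupon.
print(solve_mckp([[0, 3, 5], [0, 4, 6]], [[0, 1, 2], [0, 1, 2]], budget=2))  # (7.0, [1, 1])
```

The solver runs in time proportional to customers × budget × options, which is fine for a toy example; a two-stage pipeline would feed it uplift predictions that were trained without any knowledge of this allocation step, which is the gap the abstract describes.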

Full text in ACM Digital Library
RESFair Reciprocal Recommendation in Matching Markets
by Yoji Tomita (CyberAgent Inc.) and Tomohiko Yokoyama (The University of Tokyo)

Recommender systems play an increasingly crucial role in shaping people’s opportunities, particularly in online dating platforms. It is essential from the user’s perspective to increase the probability of matching with a suitable partner while ensuring an appropriate level of fairness in the matching opportunities.

We investigate reciprocal recommendation in two-sided matching markets, where agents are divided into two sides. In our model, a match is considered successful only when both individuals express interest in each other. Additionally, we assume that agents prefer to appear prominently in the recommendation lists presented to those on the other side. We define each agent’s opportunity to be recommended and introduce a fairness criterion for it, envy-freeness, from the perspective of fair division theory. Recommendations that approximately maximize the expected number of matches, empirically obtained by heuristic algorithms, are likely to result in significant unfairness of opportunity. Therefore, there can be a trade-off between maximizing expected matches and ensuring fairness of opportunity. To address this challenge, we propose a method to find a policy that is close to envy-free by leveraging the Nash social welfare function. Experiments on synthetic and real-world datasets demonstrate that our approach achieves both relatively high expected matches and fairness of opportunity for both sides in reciprocal recommender systems.
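
The two fairness notions in this abstract can be illustrated on toy numbers: envy-freeness asks that no agent value another agent's allocation above its own, and the Nash social welfare is the product of agents' utilities (equivalently, the sum of their logarithms). The valuation matrices below are hypothetical, and the sketch only evaluates these criteria; it does not implement the authors' policy search.

```python
import math
from typing import Sequence


def max_envy(valuations: Sequence[Sequence[float]]) -> float:
    """valuations[i][j]: how agent i values the opportunity allocated to agent j.

    Returns the largest envy over all agent pairs; 0 means the allocation is envy-free.
    """
    n = len(valuations)
    return max(valuations[i][j] - valuations[i][i] for i in range(n) for j in range(n))


def log_nash_social_welfare(utilities: Sequence[float]) -> float:
    """Sum of log utilities; maximizing it also maximizes the product of utilities."""
    return sum(math.log(u) for u in utilities)


print(max_envy([[2.0, 1.0], [1.0, 2.0]]))  # 0.0: envy-free
print(max_envy([[1.0, 3.0], [2.0, 2.0]]))  # 2.0: agent 0 envies agent 1
# With the same total utility, a balanced outcome scores higher than an unbalanced one.
print(log_nash_social_welfare([2.0, 2.0]) > log_nash_social_welfare([3.0, 1.0]))  # True
```

The last line shows why Nash social welfare is a natural tool for this trade-off: unlike a plain sum, it rewards spreading utility across agents rather than concentrating it.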

Full text in ACM Digital Library
RESFairCRS: Towards User-oriented Fairness in Conversational Recommendation Systems
by Qin Liu (Jinan University), Xuan Feng (Jinan University), Tianlong Gu (Jinan University) and Xiaoli Liu (Jinan University)

Conversational Recommendation Systems (CRSs) enable recommender systems to explicitly acquire user preferences during multi-turn interactions, providing more accurate and personalized recommendations. However, data imbalance in CRSs, caused by inconsistent interaction histories among users, may lead to disparate treatment of disadvantaged user groups. In this paper, we investigate discrimination problems in CRSs from the user’s perspective, which we call user-oriented fairness. To reveal the unfairness problems of different user groups in CRSs, we conduct extensive empirical analyses. To mitigate user unfairness, we propose a model-agnostic user-oriented fairness framework named FairCRS. In particular, we develop a user-embedding reconstruction mechanism that enriches user embeddings by incorporating more interaction information, and design a user-oriented fairness strategy that optimizes the differences in recommendation quality among user groups while alleviating unfairness. Extensive experimental results on English and Chinese datasets show that FairCRS outperforms state-of-the-art CRSs in terms of both overall recommendation performance and user fairness.

Full text in ACM Digital Library
RESFedLoCA: Low-Rank Coordinated Adaptation with Knowledge Decoupling for Federated Recommendations
by Yuchen Ding (University of Science and Technology of China), Siqing Zhang (University of Science and Technology of China), Boyu Fan (University of Helsinki), Wei Sun (University of Science and Technology of China), Yong Liao (University of Science and Technology of China) and Peng Yuan Zhou (Aarhus University)

Privacy protection in recommendation systems is gaining increasing attention, for which federated learning has emerged as a promising solution. Current federated recommendation systems grapple with high communication overhead due to sharing dense global embeddings, and also poorly reflect user preferences due to data heterogeneity. To overcome these challenges, we propose a two-stage Federated Low-rank Coordinated Adaptation (FedLoCA) framework to decouple global and client-specific knowledge into low-rank embeddings, which significantly reduces communication overhead while enhancing the system’s ability to capture individual user preferences amidst data heterogeneity. Further, to tackle gradient estimation inaccuracies stemming from data sparsity in federated recommendation systems, we introduce an adversarial gradient projected descent approach in low-rank spaces, which significantly boosts model performance while maintaining robustness. Remarkably, FedLoCA also alleviates performance loss even under the stringent constraints of differential privacy. Extensive experiments on various real-world datasets demonstrate that FedLoCA significantly outperforms existing methods in both recommendation accuracy and communication efficiency.

Full text in ACM Digital Library
RESFLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction
by Hangyu Wang (Shanghai Jiao Tong University), Jianghao Lin (Shanghai Jiao Tong University), Xiangyang Li (Huawei Noah’s Ark Lab), Bo Chen (Huawei Noah’s Ark Lab), Chenxu Zhu (Huawei Noah’s Ark Lab), Ruiming Tang (Huawei Noah’s Ark Lab), Weinan Zhang (Shanghai Jiao Tong University) and Yong Yu (Shanghai Jiao Tong University)

Click-through rate (CTR) prediction serves as a core function module in various personalized online services. Traditional ID-based models for CTR prediction take as input one-hot encoded ID features of the tabular modality and capture collaborative signals via feature interaction modeling. However, one-hot encoding discards the semantic information contained in textual features. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as input sentences of the textual modality obtained through hard prompt templates and adopts PLMs to extract semantic knowledge. However, PLMs often struggle to capture field-wise collaborative signals and to distinguish features with subtle textual differences. In this paper, to leverage the benefits of both paradigms while overcoming their limitations, we propose Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. Unlike most methods, which rely solely on global views through instance-level contrastive learning, we design a novel jointly masked tabular/language modeling task to learn fine-grained alignment between tabular IDs and word tokens. Specifically, the masked data of one modality (i.e., IDs or tokens) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose to jointly finetune the ID-based model and the PLM by adaptively combining the outputs of both models, thus achieving superior performance in downstream CTR prediction tasks. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines and is highly compatible with various ID-based models and PLMs. The code is available.

Full text in ACM Digital Library
RESImproving Adversarial Robustness for Recommendation Model via Cross-Domain Distributional Adversarial Training
by Jingyu Chen (Sichuan University), Lilin Zhang (Sichuan University) and Ning Yang (Sichuan University)

Recommendation models based on deep learning are fragile when facing adversarial examples (AE). Adversarial training (AT) is the current mainstream method for improving the adversarial robustness of recommendation models. However, these AT methods often have two drawbacks. First, they may be ineffective due to the ubiquitous sparsity of interaction data. Second, the point-wise perturbation used by these AT methods leads to suboptimal adversarial robustness, because not all examples are equally susceptible to such perturbations. To overcome these issues, we propose a novel method called Cross-domain Distributional Adversarial Training (CDAT), which utilizes a richer auxiliary domain to improve the adversarial robustness of a sparse target domain. CDAT comprises a Domain adversarial network (Dan) and a Cross-domain adversarial example generative network (Cdan). Dan learns a domain-invariant preference distribution, obtained by aligning user embeddings from the two domains, which paves the way for leveraging knowledge from the auxiliary domain in the target domain. Then, by adversarially perturbing the domain-invariant preference distribution under the guidance of a discriminator, Cdan captures an aggressive and imperceptible AE distribution. In this way, CDAT can transfer distributional adversarial robustness from the auxiliary domain to the target domain. Extensive experiments on real datasets demonstrate the remarkable superiority of CDAT in improving the adversarial robustness of the sparse domain. The code and datasets are available at https://github.com/HymanLoveGIN/CDAT.

Full text in ACM Digital Library
RESImproving the Shortest Plank: Vulnerability-Aware Adversarial Training for Robust Recommender System
by Kaike Zhang (Chinese Academy of Sciences), Qi Cao (Chinese Academy of Sciences), Yunfan Wu (Chinese Academy of Sciences), Fei Sun (Chinese Academy of Sciences), Huawei Shen (Chinese Academy of Sciences) and Xueqi Cheng (Chinese Academy of Sciences)

Recommender systems play a pivotal role in mitigating information overload in various fields. Nonetheless, the inherent openness of these systems introduces vulnerabilities: attackers can insert fake users into the system’s training data to skew the exposure of certain items, which is known as a poisoning attack. Adversarial training has emerged as a notable defense mechanism against such poisoning attacks in recommender systems. Existing adversarial training methods apply perturbations of the same magnitude to all users to enhance system robustness against attacks. Yet, in reality, we find that attacks often affect only a subset of vulnerable users. Perturbations of indiscriminate magnitude make it difficult to provide effective protection for vulnerable users without degrading recommendation quality for those who are not affected. To address this issue, our research delves into understanding user vulnerability. Because poisoning attacks pollute the training data, we note that the more closely a recommender system fits a user’s training data, the more likely that user is to incorporate attack information, indicating vulnerability. Leveraging these insights, we introduce Vulnerability-aware Adversarial Training (VAT), designed to defend against poisoning attacks in recommender systems. VAT employs a novel vulnerability-aware function to estimate users’ vulnerability based on the degree to which the system fits them. Guided by this estimation, VAT applies perturbations of adaptive magnitude to each user, not only reducing the success ratio of attacks but also preserving, and potentially enhancing, recommendation quality. Comprehensive experiments confirm VAT’s superior defensive capabilities across different recommendation models and against various types of attacks.

Full text in ACM Digital Library
RESInformation-Controllable Graph Contrastive Learning for Recommendation
by Zirui Guo (Beijing University of Posts and Telecommunications), Yanhua Yu (Beijing University of Posts and Telecommunications), Yuling Wang (Hangzhou Dianzi University), Kangkang Lu (Beijing University of Posts and Telecommunications), Zixuan Yang (Beijing University of Posts and Telecommunications), Liang Pang (Chinese Academy of Sciences) and Tat-Seng Chua (National University of Singapore)

In the evolving landscape of recommender systems, Graph Contrastive Learning (GCL) has become a prominent method for enhancing recommendation performance by alleviating data sparsity. However, existing GCL-based recommendation methods often overlook control of the information shared between contrastive views. In this paper, we first analyze and experimentally demonstrate that these methods often lead to augmented representation collapse, where the representations of different views become excessively similar, diminishing their distinctiveness. To address this issue, we propose the Information-Controllable Graph Contrastive Learning (IGCL) framework, a novel approach that optimizes the information shared between views so that it contains as much task-relevant information as possible while remaining at an appropriate level. In particular, we design the Collaborative Signals Enhanced Augmentation module to infuse the augmented representations with rich, task-relevant collaborative signals. Furthermore, the Information-Controllable Contrastive Learning module is designed to directly control the magnitude of shared information between the contrastive views to avoid over-similarity. Extensive experiments on three public datasets demonstrate the effectiveness of IGCL, showing significant performance improvements and the ability to alleviate augmented representation collapse.

Full text in ACM Digital Library
RESInstructing and Prompting Large Language Models for Explainable Cross-domain Recommendations
by Alessandro Petruzzelli (University of Bari Aldo Moro), Cataldo Musto (University of Bari), Lucrezia Laraspata (University of Bari), Ivan Rinaldi (University of Bari Aldo Moro), Marco de Gemmis (University of Bari Aldo Moro), Pasquale Lops (University of Bari) and Giovanni Semeraro (University of Bari)

In this paper, we present a strategy to provide users with explainable cross-domain recommendations (CDR) that exploits large language models (LLMs). Generally speaking, CDR is a task that is hard to tackle, mainly due to data sparsity issues. Indeed, CDR models require a large amount of data labeled in both source and target domains, which are not easy to collect. Accordingly, our approach relies on the intuition that the knowledge that is already encoded in LLMs can be used to more easily bridge the domains and seamlessly provide users with personalized cross-domain suggestions.

To this end, we designed a pipeline to: (a) instruct an LLM to handle a CDR task; (b) design a personalized prompt based on the user’s preferences in a source domain and a list of items to be ranked in the target domain; (c) feed the LLM with the prompt, in both zero-shot and one-shot settings, and process the answer to extract the recommendations and a natural language explanation. As shown in the experimental evaluation, our approach beats several established state-of-the-art baselines for CDR in most experimental settings, showing the effectiveness of LLMs in this novel and scarcely investigated scenario as well.
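
Steps (b) and (c) of the pipeline can be sketched as plain string handling: build a prompt from the user's source-domain preferences and the target-domain candidates, then parse a ranking and an explanation out of the model's answer. The prompt template, the `Ranking:`/`Explanation:` answer format, and both function names are hypothetical assumptions, not the authors' prompts; the LLM call itself is deliberately left out.

```python
from typing import List, Sequence, Tuple


def build_prompt(liked_source_items: Sequence[str], candidates: Sequence[str],
                 source: str = "books", target: str = "movies") -> str:
    """Hypothetical zero-shot prompt: source-domain preferences plus numbered target candidates."""
    likes = "\n".join(f"- {item}" for item in liked_source_items)
    options = "\n".join(f"{i}. {item}" for i, item in enumerate(candidates, start=1))
    return (f"The user liked these {source}:\n{likes}\n\n"
            f"Rank the following {target} for this user, best first, as a comma-separated list "
            f"of numbers after 'Ranking:', then justify it after 'Explanation:'.\n{options}")


def parse_answer(answer: str, candidates: Sequence[str]) -> Tuple[List[str], str]:
    """Extract ranked candidates and the explanation; skips out-of-range or repeated numbers."""
    ranking_line, _, explanation = answer.partition("Explanation:")
    ranked: List[str] = []
    for token in ranking_line.replace("Ranking:", "").split(","):
        token = token.strip()
        if token.isdigit() and 1 <= int(token) <= len(candidates):
            item = candidates[int(token) - 1]
            if item not in ranked:
                ranked.append(item)
    return ranked, explanation.strip()


candidates = ["Blade Runner", "Notting Hill", "Arrival"]
print(build_prompt(["Dune", "Foundation"], candidates))
# A made-up model answer, parsed back into a ranking and an explanation.
print(parse_answer("Ranking: 3, 1, 2\nExplanation: The user enjoys cerebral science fiction.", candidates))
```

Numbering the candidates and asking for numbers back keeps parsing robust even when the model paraphrases item titles; a one-shot variant would simply prepend a worked example to the same prompt.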

Full text in ACM Digital Library
RESLARR: Large Language Model Aided Real-time Scene Recommendation with Semantic Understanding
by Zhizhong Wan (Meituan), Bin Yin (Meituan), Junjie Xie (Meituan), Fei Jiang (Meituan), Xiang Li (Meituan) and Wei Lin (Meituan)

Click-Through Rate (CTR) prediction is crucial for Recommendation Systems (RS), which aim to provide personalized recommendation services for users in areas such as food delivery and e-commerce. However, traditional RS rely on collaborative signals and lack semantic understanding of real-time scenes. We also noticed that a major challenge in utilizing Large Language Models (LLMs) for practical recommendation is their efficiency when handling long text input. To address these problems, we propose Large Language Model Aided Real-time Scene Recommendation (LARR), which adopts LLMs for semantic understanding and utilizes real-time scene information in RS without requiring the LLM to process the entire real-time scene text directly, thereby improving the efficiency of LLM-based CTR modeling. Specifically, recommendation domain-specific knowledge is injected into the LLM, and the RS then employs an aggregation encoder to build real-time scene information from the LLM’s separate outputs. First, an LLM is continually pretrained on a corpus built from recommendation data with the aid of special tokens. Subsequently, the LLM is fine-tuned via contrastive learning using three kinds of sample construction strategies. Through this step, the LLM is transformed into a text embedding model. Finally, the LLM’s separate outputs for different scene features are aggregated by an encoder and aligned with collaborative signals in the RS, enhancing the performance of the recommendation model.

Full text in ACM Digital Library
RESLow Rank Field-Weighted Factorization Machines for Low Latency Item Recommendation
by Alex Shtoff (Yahoo Research), Michael Viderman (Yahoo Research), Naama Haramaty-Krasne, Oren Somekh (Yahoo Research), Ariel Raviv (Meta) and Tularam Ban (Yahoo Research)

Factorization machine (FM) variants are widely used in recommendation systems that operate under strict throughput and latency requirements, such as online advertising systems. FMs have two prominent strengths. The first is their ability to model pairwise feature interactions while remaining resilient to data sparsity by learning factorized representations. The second is that their computational graphs facilitate fast inference and training. Moreover, when items are ranked as part of a query for each incoming user, these graphs make it possible to compute the portion stemming from the user and context fields only once per query. Thus, the computational cost for each ranked item is proportional only to the number of fields that vary among the ranked items. Consequently, in terms of inference cost, the number of user or context fields is practically unlimited.

More advanced variants of FMs, such as field-aware and field-weighted FMs, provide better accuracy by learning a representation of field-wise interactions, but require computing all pairwise interaction terms explicitly. In particular, the computational cost during inference is proportional to the square of the number of fields, including user, context, and item. When the number of fields is large, this is prohibitive in systems with strict latency constraints, and imposes a limit on the number of user and context fields for a given computational budget. To mitigate this caveat, heuristic pruning of low intensity field interactions is commonly used to accelerate inference.

In this work, we propose an alternative to the pruning heuristic in field-weighted FMs using a diagonal plus symmetric low-rank decomposition. Our technique reduces the computational cost of inference by making it proportional to the number of item fields only. Using a set of experiments on real-world datasets, we show that aggressive rank reduction outperforms similarly aggressive pruning in both accuracy and item recommendation speed. Beyond computational complexity analysis, we corroborate our claim of faster inference experimentally, both via a synthetic test and by deploying our solution to a major online advertising system, where we observed significant ranking latency improvements. We have made the code to reproduce the results on public datasets and synthetic tests available at https://github.com/michaelviderman/pytorch-fm.
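
The algebra behind a low-rank field interaction matrix can be checked on toy numbers. The field-weighted pairwise term is the sum over field pairs i < j of R[i][j] times the dot product of the two field embeddings, which costs O(m²d) for m fields. If the off-diagonal of R equals that of a low-rank matrix L, the sum over k of s[k] times the outer product of U[k] with itself, the same term can be computed in O(rank · m · d), and any diagonal part of R is irrelevant because the pairwise term never uses R[i][i]. The sketch below is an illustration of that identity under these assumptions, with made-up numbers; it is not the authors' implementation from the linked repository, whose exact parameterization may differ.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pairwise_bruteforce(vectors: List[Vector], R: List[List[float]]) -> float:
    """Field-weighted pairwise term: sum over i < j of R[i][j] * <v_i, v_j>; O(m^2 d)."""
    m = len(vectors)
    return sum(R[i][j] * dot(vectors[i], vectors[j])
               for i in range(m) for j in range(i + 1, m))


def pairwise_low_rank(vectors: List[Vector], U: List[List[float]], s: List[float]) -> float:
    """Same term when R's off-diagonal matches sum_k s[k] * outer(U[k], U[k]); O(rank * m * d).

    Uses sum_{i<j} L_ij <v_i, v_j> = 0.5 * (sum_k s_k ||sum_i U[k][i] v_i||^2 - sum_i L_ii ||v_i||^2).
    """
    d = len(vectors[0])
    full = 0.0
    for k, sk in enumerate(s):
        proj = [sum(U[k][i] * v[t] for i, v in enumerate(vectors)) for t in range(d)]
        full += sk * dot(proj, proj)
    diag = sum(sum(sk * U[k][i] ** 2 for k, sk in enumerate(s)) * dot(v, v)
               for i, v in enumerate(vectors))
    return 0.5 * (full - diag)


vectors = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]  # toy embeddings for 3 fields, d = 2
U, s = [[1.0, 2.0, -1.0]], [2.0]                 # hypothetical rank-1 factors
R = [[sum(sk * U[k][i] * U[k][j] for k, sk in enumerate(s)) for j in range(3)] for i in range(3)]
print(pairwise_bruteforce(vectors, R), pairwise_low_rank(vectors, U, s))  # 10.0 10.0
```

Both `proj` and `diag` are sums over fields, so the user and context contributions can be accumulated once per query and only the item-field terms added per ranked item, which is the property that makes per-item cost depend on the item fields alone.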

Full text in ACM Digital Library
RESMARec: Metadata Alignment for cold-start Recommendation
by Julien Monteil (Amazon), Volodymyr Vaskovych (Amazon), Wentao Lu (Amazon), Anirban Majumder (Amazon) and Anton van den Hengel (University of Adelaide)

For many recommender systems, the primary data source is a historical record of user clicks. The associated click matrix is often very sparse, as the number of users × products can be far larger than the number of clicks. Such sparsity is accentuated in cold-start settings, which makes the efficient use of metadata information of paramount importance. In this work, we propose a simple approach to cold-start recommendation that leverages content metadata: Metadata Alignment for cold-start Recommendation (MARec). We show that this approach can readily augment existing matrix factorization and autoencoder approaches, enabling a smooth transition to top-performing algorithms in warmer set-ups. Our experimental results indicate three separate contributions: first, we show that our proposed framework substantially outperforms SOTA results on 4 cold-start datasets with different sparsity and scale characteristics, with gains ranging from +8.4% to +53.8% on reported ranking metrics; second, we provide an ablation study on the utility of semantic features, showing that the additional gain obtained by leveraging such features ranges between +46.8% and +105.5%; and third, our approach is by construction highly competitive in warm set-ups, and we propose a closed-form solution that is outperformed by SOTA results by only 0.8% on average.

Full text in ACM Digital Library
RESMLoRA: Multi-Domain Low-Rank Adaptive Network for CTR Prediction
by Zhiming Yang (Northwestern Polytechnical University), Haining Gao (Alibaba Group), Dehong Gao (Northwestern Polytechnical University), Luwei Yang (Alibaba Group), Libin Yang (Northwestern Polytechnical University), Xiaoyan Cai (Northwestern Polytechnical University), Wei Ning (Alibaba Group) and Guannan Zhang (Alibaba Group)

Click-through rate (CTR) prediction is one of the fundamental tasks in industry, especially in e-commerce, social media, and streaming media. It directly impacts website revenue, user satisfaction, and user retention. However, real-world production platforms often encompass various domains to cater to diverse customer needs. Traditional CTR prediction models struggle in multi-domain recommendation scenarios, facing challenges of data sparsity and disparate data distributions across domains. Existing multi-domain recommendation approaches introduce domain-specific modules for each domain, which partially address these issues but often significantly increase model parameters and lead to insufficient training. In this paper, we propose a Multi-domain Low-Rank Adaptive network (MLoRA) for CTR prediction, in which we introduce a specialized LoRA module for each domain. This approach enhances the model’s performance in multi-domain CTR prediction tasks and can be applied to various deep learning models. We evaluate the proposed method on several multi-domain datasets. Experimental results demonstrate that MLoRA achieves a significant improvement compared with state-of-the-art baselines. Furthermore, we deployed it in the production environment of Alibaba.COM. The online A/B testing results indicate its superiority and flexibility in real-world production environments. The code of MLoRA is publicly available.
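
The core idea of attaching a specialized low-rank module to each domain can be sketched with a single linear layer: a shared weight matrix W plus a per-domain update B_d A_d of rank r. This follows the generic LoRA formulation; the class name, the domain names, and the toy sizes are hypothetical, and the sketch does not reproduce where MLoRA places its adapters inside a CTR model.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(M: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class MultiDomainLoRALinear:
    """A shared linear layer W (out x in) plus a per-domain low-rank update B_d @ A_d.

    A_d is (r x in) and B_d is (out x r); only these two small matrices are domain-specific.
    """

    def __init__(self, W: Matrix, adapters: Dict[str, Tuple[Matrix, Matrix]]):
        self.W = W
        self.adapters = adapters

    def forward(self, x: Sequence[float], domain: str) -> List[float]:
        base = matvec(self.W, x)                 # shared knowledge, used by every domain
        A, B = self.adapters[domain]
        delta = matvec(B, matvec(A, x))          # cheap domain-specific correction
        return [b + d for b, d in zip(base, delta)]


def extra_params_per_domain(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added per domain by a rank-r adapter, versus d_in * d_out for a full copy."""
    return rank * (d_in + d_out)


layer = MultiDomainLoRALinear(
    W=[[1.0, 0.0], [0.0, 1.0]],
    adapters={"search": ([[1.0, 1.0]], [[1.0], [0.0]]),
              "feed": ([[0.0, 0.0]], [[0.0], [0.0]])},
)
print(layer.forward([1.0, 2.0], "search"))  # [4.0, 2.0]
print(layer.forward([1.0, 2.0], "feed"))    # [1.0, 2.0]
print(extra_params_per_domain(512, 256, 8), 512 * 256)  # 6144 131072
```

The parameter count illustrates the motivation stated in the abstract: a full per-domain copy of a 512 × 256 layer adds 131,072 parameters, while a rank-8 adapter adds 6,144, so per-domain modules stay small enough to train on sparse domains.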

Full text in ACM Digital Library
RESMMGCL: Meta Knowledge-Enhanced Multi-view Graph Contrastive Learning for Recommendations
by Yuezihan Jiang (Kuaishou Technology), Changyu Li (Kuaishou Technology), Gaode Chen (Chinese Academy of Sciences), Peiyi Li (Kuaishou Technology), Qi Zhang (Kuaishou Technology), Jingjian Lin (Kuaishou Technology), Peng Jiang (Kuaishou Inc.), Fei Sun (China) and Wentao Zhang (Peking University)

Multi-view Graph Learning is popular in recommendations due to its ability to capture relationships and connections across multiple views. Existing multi-view graph learning methods generally involve constructing graphs of views and performing information aggregation on view representations. Despite their effectiveness, they face two data limitations: Multi-focal Multi-source data noise and multi-source Data Sparsity. The former arises from the combination of noise from individual views and conflicting edges between views when information from all views is combined. The latter occurs because multi-view learning exacerbates the negative influence of data sparsity, since these methods require more model parameters to learn more view information. Motivated by these issues, we propose MMGCL, a meta knowledge-enhanced multi-view graph contrastive learning framework for recommendations. To tackle the data noise issue, MMGCL extracts meta knowledge to preserve important information from all views, forming a meta view representation. It then rectifies every view in multi-view learning frameworks, thus simultaneously removing the view-private noisy edges and the conflicting edges across different views. To address the data sparsity issue, MMGCL performs meta knowledge transfer contrastive learning optimization on all views to reduce the search space for model parameters and add more supervision signals. Besides, we have deployed MMGCL in a real industrial recommender system in China, and we further evaluate it on three benchmark datasets and a practical industry online application. Extensive experiments on these datasets demonstrate the state-of-the-art recommendation performance of MMGCL.

Full text in ACM Digital Library
RESMulti-Objective Recommendation via Multivariate Policy Learning
by Olivier Jeunen (ShareChat), Jatin Mandav (ShareChat), Ivan Potapov (ShareChat), Nakul Agarwal (ShareChat), Sourabh Vaid (ShareChat), Wenzhe Shi (ShareChat) and Aleksei Ustimenko (ShareChat)

Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users. These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness). Scalarisation methods are commonly used to handle this balancing task, where a weighted average of per-objective reward signals determines the final score used for ranking. Naturally, exactly how these weights are computed is key to success for any online platform.

We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g. long-term user retention or growth). We extend existing policy learning methods to the continuous multivariate action domain, proposing to maximise a pessimistic lower bound on the North Star reward that the learnt policy will yield. Typical lower bounds based on normal approximations suffer from insufficient coverage, and we propose an efficient and effective policy-dependent correction for this. We provide guidance to design stochastic data collection policies, as well as highly sensitive reward signals. Empirical observations from simulations, offline and online experiments highlight the efficacy of our deployed approach.
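
Two ingredients from these paragraphs can be made concrete: scalarised ranking, where the weights decide the final order, and the kind of normal-approximation lower bound on a mean reward that the abstract says suffers from insufficient coverage. The item names, reward predictions, and the one-sided z-value of 1.645 are illustrative assumptions; the paper's policy-dependent coverage correction is not implemented here.

```python
import math
import statistics
from typing import Dict, List, Sequence


def scalarised_ranking(rewards: Dict[str, Sequence[float]], weights: Sequence[float]) -> List[str]:
    """Rank items by a weighted sum of their per-objective reward predictions."""
    scores = {item: sum(w * r for w, r in zip(weights, rs)) for item, rs in rewards.items()}
    return sorted(scores, key=scores.get, reverse=True)


def normal_lower_bound(samples: Sequence[float], z: float = 1.645) -> float:
    """One-sided lower confidence bound on the mean from a normal approximation (~95% nominal)."""
    return statistics.mean(samples) - z * statistics.stdev(samples) / math.sqrt(len(samples))


# Hypothetical per-item predictions for two objectives: (click, share).
rewards = {"video_a": [0.9, 0.1], "video_b": [0.4, 0.6]}
print(scalarised_ranking(rewards, [0.8, 0.2]))  # ['video_a', 'video_b']
print(scalarised_ranking(rewards, [0.2, 0.8]))  # ['video_b', 'video_a']
# Pessimistic estimate of a North Star reward from a handful of (made-up) observations.
print(normal_lower_bound([1.0, 0.0, 1.0, 1.0]))  # ~0.339
```

Changing the weights flips the ranking, which is why the abstract treats the weight vector itself as the action a policy should learn; maximizing a lower bound rather than the point estimate favors weight settings whose reward is both high and reliably estimated.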

Full text in ACM Digital Library
RES Not All Videos Become Outdated: Short-Video Recommendation by Learning to Deconfound Release Interval Bias
by Lulu Dong (East China Normal University), Guoxiu He (East China Normal University) and Aixin Sun (Nanyang Technological University)

Short-video recommender systems often exhibit a biased preference for recently released videos. However, not all videos become outdated; certain classic videos can still attract users’ attention. Such bias along the temporal dimension can be further aggravated by the matching model between users and videos, because the model learns from preexisting interactions. From real data, we observe that different videos have varying sensitivities to recency in attracting users’ attention. Our analysis, based on a causal graph modeling short-video recommendation, suggests that the release interval serves as a confounder, establishing a backdoor path between users and videos. To address this confounding effect, we propose a model-agnostic causal architecture called Learning to Deconfound the Release Interval Bias (LDRI). LDRI enables joint learning of the matching model and the video recency sensitivity perceptron. In the inference stage, we apply a backdoor adjustment, effectively blocking the backdoor path by intervening on each video. Extensive experiments on two benchmarks demonstrate that LDRI consistently outperforms backbone models and exhibits superior performance against state-of-the-art models. Additional comprehensive analyses confirm the deconfounding capability of LDRI.
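For reference, the textbook backdoor adjustment formula reads as follows, assuming for illustration that the release interval Z is the only confounder and satisfies the backdoor criterion for the effect of the video V on the interaction Y. LDRI's exact factorisation, which also conditions on the user, may differ.

```latex
P\big(Y \mid do(V = v)\big) \;=\; \sum_{z} P\big(Y \mid V = v,\, Z = z\big)\, P(Z = z)
```

Intuitively, the prediction is averaged over the population distribution of release intervals instead of relying on the interval actually observed for each video.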

Full text in ACM Digital Library
RES Optimal Baseline Corrections for Off-Policy Contextual Bandits
by Shashank Gupta (University of Amsterdam), Olivier Jeunen (ShareChat), Harrie Oosterhuis (Radboud University) and Maarten de Rijke (University of Amsterdam)

The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. With unbiasedness comes potentially high variance, and prevalent methods exist to reduce estimation variance. These methods typically make use of control variates, either additive (i.e., baseline corrections or doubly robust methods) or multiplicative (i.e., self-normalisation).

Our work unifies these approaches by proposing a single framework built on their equivalence in learning scenarios. The foundation of our framework is the derivation of an equivalent baseline correction for all of the existing control variates. Consequently, our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it. This optimal estimator brings significantly improved performance in both evaluation and learning, and minimizes data requirements. Empirical observations corroborate our theoretical findings.
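As a hedged illustration of the additive control variates mentioned above, the Python sketch below implements a baseline-corrected inverse propensity scoring (IPS) estimator and the standard control-variate choice of baseline that minimises the empirical variance of its per-sample terms. This is a generic textbook construction, not necessarily the closed-form estimator derived in the paper; the function names are hypothetical, and estimating the baseline from the same sample introduces a small bias that a real implementation would need to account for.

```python
from statistics import fmean
from typing import Sequence


def ips_with_baseline(rewards: Sequence[float], weights: Sequence[float],
                      baseline: float) -> float:
    """Additive control variate: mean of w * (r - b) + b.

    `weights` are importance weights pi_target(a|x) / pi_logging(a|x). Because
    their expectation under the logging policy is 1 (assuming full support),
    the estimate stays unbiased for any baseline b fixed in advance.
    """
    return fmean(w * (r - baseline) + baseline for r, w in zip(rewards, weights))


def variance_minimising_baseline(rewards: Sequence[float],
                                 weights: Sequence[float]) -> float:
    """Plug-in estimate of b* = Cov(w*r, w) / Var(w).

    This is the control-variate coefficient that minimises the empirical
    variance of the per-sample terms w*r - b*(w - 1).
    """
    wr = [w * r for r, w in zip(rewards, weights)]
    mean_wr, mean_w = fmean(wr), fmean(weights)
    cov = fmean((a - mean_wr) * (w - mean_w) for a, w in zip(wr, weights))
    var = fmean((w - mean_w) ** 2 for w in weights)
    return cov / var if var > 0 else 0.0
```

A baseline of 0 recovers plain IPS, and when all weights equal 1 (on-policy data) every baseline gives the same answer, which is a quick sanity check on the construction.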

Full text in ACM Digital Library
RES Prompt Tuning for Item Cold-start Recommendation
by Yuezihan Jiang (Kuaishou Technology), Gaode Chen (Kuaishou Technology), Wenhan Zhang (Peking University), Jingchi Wang (Peking University), Yinjie Jiang (Kuaishou Technology), Qi Zhang (Kuaishou Technology), Jingjian Lin (Kuaishou Technology), Peng Jiang (Kuaishou Technology) and Kaigui Bian (Peking University)

The item cold-start problem is crucial for online recommender systems, as the success of the cold-start phase determines whether items can transition into popular ones. Prompt learning, a powerful technique used in natural language processing (NLP) to address zero- and few-shot problems, has been adapted for recommender systems to tackle similar challenges. However, existing methods typically rely on content-based properties or text descriptions for prompting, which we argue may be suboptimal for cold-start recommendation due to 1) semantic gaps with recommendation tasks and 2) model bias caused by warm-up items contributing most of the positive feedback to the model, which is the core of the cold-start problem and hinders recommendation quality on cold-start items. We propose to leverage high-value positive feedback, termed pinnacle feedback, as prompt information to resolve both problems simultaneously. We show experimentally that, compared to the content descriptions proposed in existing works, positive feedback is more suitable as prompt information because it bridges the semantic gaps. Besides, we propose item-wise personalized prompt networks to encode pinnacle feedback, relieving the model bias caused by positive-feedback dominance. Extensive experiments on four real-world datasets demonstrate the superiority of our model, PROMO, over state-of-the-art methods. Moreover, PROMO has been successfully deployed on a popular, billion-user scale commercial short-video sharing platform, achieving remarkable performance gains across various commercial metrics within cold-start scenarios.

Full text in ACM Digital Library
RES Putting Popularity Bias Mitigation to the Test: A User-Centric Evaluation in Music Recommenders
by Robin Ungruh (Delft University of Technology), Karlijn Dinnissen (Utrecht University), Anja Volk (Utrecht University), Maria Soledad Pera (Delft University of Technology) and Hanna Hauptmann (Utrecht University)

Popularity bias is a prominent phenomenon in recommender systems (RS), especially in the music domain. Although popularity bias mitigation techniques are known to enhance the fairness of RS while maintaining their high performance, there is a lack of understanding regarding users’ actual perception of the suggested music. To address this gap, we conducted a user study (n=40) exploring user satisfaction and perception of personalized music recommendations generated by algorithms that explicitly mitigate popularity bias. Specifically, we investigate item-centered and user-centered bias mitigation techniques, aiming to ensure fairness for artists or users, respectively. Results show that neither mitigation technique harms the users’ satisfaction with the recommendation lists despite promoting underrepresented items. However, the item-centered mitigation technique impacts user perception; by promoting less popular items, it reduces users’ familiarity with the items. Lower familiarity evokes discovery—the feeling that the recommendations enrich the user’s taste. We demonstrate that this can ultimately lead to higher satisfaction, highlighting the potential of less-popular recommendations to improve the user experience.

Full text in ACM Digital Library
RES Ranking-Aware Unbiased Post-Click Conversion Rate Estimation via AUC Optimization on Entire Exposure Space
by Yu Liu (Nanjing University; Huawei Technologies Co., Ltd.), Qinglin Jia (Huawei Noah’s Ark Lab), Shuting Shi (Huawei Technologies Co., Ltd.), Chuhan Wu (Huawei Noah’s Ark Lab), Zhaocheng Du (Huawei Noah’s Ark Lab), Zheng Xie (Nanjing University), Ruiming Tang (Huawei Noah’s Ark Lab), Muyu Zhang (Huawei Technologies Co., Ltd.) and Ming Li (Nanjing University)

Estimating the post-click conversion rate (CVR) accurately in ranking systems is crucial in industrial applications. However, this task is often challenged by data sparsity and selection bias, which hinder accurate ranking. Previous approaches to these challenges have typically focused either on modeling CVR across the entire exposure space, which includes all exposure events, or on providing unbiased CVR estimation, separately. However, the lack of integration between these objectives has limited the overall performance of CVR estimation. Therefore, there is a pressing need for a method that can simultaneously provide unbiased CVR estimates across the entire exposure space. To achieve this, we formulate the CVR estimation task as an Area Under the Curve (AUC) optimization problem and propose the Entire-space Weighted AUC (EWAUC) framework. EWAUC utilizes sample reweighting techniques to handle selection bias and employs pairwise AUC risk, which incorporates more information from limited clicked data, to handle data sparsity. To model CVR across the entire exposure space without bias, EWAUC treats the exposure data as both conversion data and non-conversion data when calculating the loss. The properties of AUC risk guarantee the unbiased nature of the entire-space modeling. We provide a comprehensive theoretical analysis to validate the unbiased nature of our approach. Additionally, extensive experiments conducted on real-world datasets demonstrate that our approach outperforms state-of-the-art methods in terms of ranking performance for the CVR estimation task.
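The combination of pairwise AUC risk and sample reweighting can be sketched as follows. This is an assumption-laden illustration rather than EWAUC itself: it uses a logistic surrogate for the pairwise ranking loss, treats the per-sample weights (for example, inverse click propensities) as given, and does not reproduce EWAUC's treatment of exposure data as both conversion and non-conversion data. All names are hypothetical.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_pairwise_auc_loss(scores: Sequence[float], labels: Sequence[int],
                               weights: Sequence[float]) -> float:
    """Weighted logistic surrogate of 1 - AUC over all (positive, negative) pairs.

    Each pair is weighted by the product of its two sample weights, e.g.
    inverse propensity weights meant to correct selection bias (an assumption).
    """
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    total, norm = 0.0, 0.0
    for s_p, w_p in pos:
        for s_n, w_n in neg:
            pair_weight = w_p * w_n
            # Penalise a negative that scores close to or above the positive.
            total += pair_weight * softplus(s_n - s_p)
            norm += pair_weight
    return total / norm if norm > 0 else 0.0
```

Because every positive is compared with every negative, each clicked-and-converted sample contributes to many loss terms, which is one way pairwise risks extract more signal from sparse labels.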

Full text in ACM Digital Library
RES ReLand: Integrating Large Language Models’ Insights into Industrial Recommenders via a Controllable Reasoning Pool
by Changxin Tian (Ant Group), Binbin Hu (Ant Group), Chunjing Gan (Ant Group), Haoyu Chen (Ant Group), Zhuo Zhang (Ant Group), Li Yu (Ant Group), Ziqi Liu (Ant Group), Zhiqiang Zhang (Ant Group), Jun Zhou (Ant Group) and Jiawei Chen (Zhejiang University)

Recently, Large Language Models (LLMs) have shown significant potential in addressing the isolation issues faced by recommender systems. However, despite achieving performance comparable to traditional recommenders, current methods are cost-prohibitive for industrial applications. Consequently, existing LLM-based methods still fall short in both effectiveness and efficiency. To tackle these challenges, we present an LLM-enhanced recommendation framework named ReLand, which leverages Retrieval to effortlessly integrate Large language models’ insights into industrial recommenders. Specifically, ReLand employs LLMs to perform generative recommendations on sampled users (a.k.a. seed users), thereby constructing an LLM Reasoning Pool. Subsequently, we leverage retrieval to attach reliable recommendation rationales to the entire user base, ultimately improving recommendation performance. Extensive offline and online experiments validate the effectiveness of ReLand. Since January 2024, ReLand has been deployed in the recommender system of Alipay, achieving statistically significant improvements of 3.19% in CTR and 1.08% in CVR.

Full text in ACM Digital Library
RES Repeated Padding for Sequential Recommendation
by Yizhou Dang (Northeastern University), Yuting Liu (Northeastern University), Enneng Yang (Northeastern University), Guibing Guo (Northeastern University), Linying Jiang (Northeastern University), Xingwei Wang (Northeastern University) and Jianzhe Zhao (Northeastern University)

Sequential recommendation aims to provide users with personalized suggestions based on their historical interactions. When training sequential models, padding is a widely adopted technique for two main reasons: 1) The vast majority of models can only handle fixed-length sequences; 2) Batch-based training needs to ensure that the sequences in each batch have the same length. The special value 0 is usually used as the padding content, which does not contain the actual information and is ignored in the model calculations. This common-sense padding strategy leads us to a problem that has never been explored in the recommendation field: Can we utilize this idle input space by padding other content to improve model performance and training efficiency further?

In this paper, we propose a simple yet effective padding method called Repeated Padding (RepPad). Specifically, we use the original interaction sequences as the padding content and fill them into the padding positions during model training. This operation can be performed a finite number of times or repeated until the input sequence length reaches the maximum limit. RepPad can be considered a sequence-level data augmentation strategy. Unlike most existing works, our method contains no trainable parameters or hyperparameters and is a plug-and-play data augmentation operation. Extensive experiments on various categories of sequential models and five real-world datasets demonstrate the effectiveness and efficiency of our approach. The average recommendation performance improvement is up to 60.3% on GRU4Rec and 24.3% on SASRec. We also provide an in-depth analysis and explanation of what makes RepPad effective from multiple perspectives. Our datasets and code are available at https://github.com/KingGugu/RepPad.
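The padding operation described above can be sketched in a few lines of Python. This is our reading of the abstract, not the released RepPad code: it assumes left padding with item ID 0, prepends only whole copies of the sequence with no separator between them, and `max_repeats` is a hypothetical name for the "finite number of times" option.

```python
from typing import List, Optional, Sequence

PAD = 0  # conventional padding ID; real item IDs are assumed to start at 1


def repeated_pad(seq: Sequence[int], max_len: int,
                 max_repeats: Optional[int] = None) -> List[int]:
    """Left-pad `seq` to `max_len`, first filling idle positions with whole
    copies of the sequence itself, then with PAD for whatever is left."""
    if max_len <= 0:
        return []
    seq = list(seq)[-max_len:]  # keep the most recent interactions
    if not seq:
        return [PAD] * max_len
    out = list(seq)
    repeats = 0
    # Prepend whole copies while they still fit and the repeat budget allows it.
    while len(out) + len(seq) <= max_len and (max_repeats is None or repeats < max_repeats):
        out = seq + out
        repeats += 1
    return [PAD] * (max_len - len(out)) + out
```

Setting `max_repeats=0` recovers ordinary zero padding, which makes it easy to compare the two strategies under the same training loop.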

Full text in ACM Digital Library
RES Right Tool, Right Job: Recommendation for Repeat and Exploration Consumption in Food Delivery
by Jiayu Li (Tsinghua University), Aixin Sun (Nanyang Technological University), Weizhi Ma (Tsinghua University), Peijie Sun (Tsinghua University) and Min Zhang (Tsinghua University)

From e-commerce to music and news, recommender systems are tailored to specific scenarios. While researching generic models applicable to various scenarios is crucial, studying recommendations based on the unique characteristics of a specific and vital scenario holds both research and, more importantly, practical value.

In this paper, we focus on store recommendations in the food delivery scenario, which is an intriguing and significant domain with unique behavior patterns and influential factors. First, we offer an in-depth analysis of real-world food delivery data across platforms and countries, revealing that (i) repeat and exploration orders are both noticeable behaviors and (ii) the influences of historical and collaborative situations on repeat and exploration consumption are distinct. Second, based on the observations, we separately design two simple yet effective recommendation models: RepRec for repeat orders and ExpRec for exploration ones. An ensemble module is further proposed to combine recommendations from two models for a unified recommendation list. Finally, experiments are conducted on three datasets spanning three countries across two food delivery platforms. Results demonstrate the superiority of our proposed recommenders on repeat, exploration, and combined recommendation tasks over various baselines. Such simple yet effective approaches will be beneficial for real applications. This work shows that dedicated analyses and methods for domain-specific characteristics are essential for the recommender system studies.

Full text in ACM Digital Library
RES RPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems
by Shuo Su (Kuaishou Technology), Xiaoshuang Chen (Kuaishou Technology), Yao Wang (Kuaishou Technology), Yulin Wu (Kuaishou Technology), Ziqiang Zhang (Tsinghua University), Kaiqiao Zhan (Kuaishou Technology), Ben Wang (Kuaishou Technology) and Kun Gai

Modern recommender systems are built upon computation-intensive infrastructure, and it is challenging to perform real-time computation for each request, especially during peak periods, due to limited computational resources. Recommending from user-wise result caches is widely used when the system cannot afford real-time recommendation. However, it is challenging to allocate real-time and cached recommendations so as to maximize users’ overall engagement. This paper identifies two key challenges in cache allocation: the value-strategy dependency and streaming allocation. We then propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages. The prediction stage estimates the values of the cache choices while accounting for the value-strategy dependency, and the allocation stage determines the cache choice for each individual request while satisfying the global budget constraint. We show that the challenges of training RPAF include the globality and strictness of the budget constraints, and we propose a relaxed local allocator (RLA) to address them. Moreover, a PoolRank algorithm is used in the allocation stage to handle the streaming allocation problem. Experiments show that RPAF significantly improves users’ engagement under computational budget constraints.

Full text in ACM Digital Library
RES Scalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs
by Gleb Mezentsev (Skolkovo Institute of Science and Technology), Danil Gusak (Skolkovo Institute of Science and Technology; HSE University), Ivan Oseledets (Artificial Intelligence Research Institute; Skolkovo Institute of Science and Technology) and Evgeny Frolov (Artificial Intelligence Research Institute; Skolkovo Institute of Science and Technology; HSE University)

Scalability plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overhead due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying the full Cross-Entropy (CE) loss often yields state-of-the-art recommendation quality, but it suffers from excessive GPU memory utilization when dealing with large item catalogs. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function for the sequential learning setup. It approximates the CE loss for datasets with large catalogs, improving both time efficiency and memory usage without compromising recommendation quality. Unlike traditional negative sampling methods, our approach uses a selective, GPU-efficient computation strategy that focuses on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through maximum inner product search. Experimental results on multiple datasets demonstrate the effectiveness of SCE in reducing peak memory usage by a factor of up to 100 compared to the alternatives while retaining or even exceeding their metric values. The proposed approach also opens new perspectives for large-scale developments in other domains, such as large language models.
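A simplified picture of restricting the softmax to the most informative items is sketched below in Python. Here an exact sort over all scores stands in for maximum inner product search, and only the top-k non-target items enter the softmax; the actual SCE computation strategy is not reproduced, and the function names are hypothetical.

```python
import math
from typing import Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def full_ce(scores: Sequence[float], target: int) -> float:
    """Cross-entropy of the softmax over the whole catalog."""
    return logsumexp(scores) - scores[target]


def top_k_ce(scores: Sequence[float], target: int, k: int) -> float:
    """CE over the target plus the k highest-scoring non-target items.

    Sorting all scores stands in for the maximum inner product search a
    real system would use; only the retrieved items enter the softmax.
    """
    negatives = sorted((i for i in range(len(scores)) if i != target),
                       key=lambda i: scores[i], reverse=True)[:k]
    subset = [scores[target]] + [scores[i] for i in negatives]
    return logsumexp(subset) - scores[target]
```

Because the truncated softmax omits terms from the log-sum-exp, it never exceeds the full loss and equals it once k covers the whole catalog; the gap is small when the omitted items have low scores.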

Full text in ACM Digital Library
RES Scaling Law of Large Sequential Recommendation Models
by Gaowei Zhang (Renmin University of China), Yupeng Hou (University of California San Diego), Hongyu Lu (Tencent), Yu Chen (Tencent), Wayne Xin Zhao (Renmin University of China) and Ji-Rong Wen (Renmin University of China)

Scaling of neural networks has recently shown great potential to improve model capacity in various fields. Specifically, model performance has a power-law relationship with model size or data size, which provides important guidance for the development of large-scale models. However, there is still limited understanding of the scaling effect of user behavior models in recommender systems, where unique data characteristics (e.g., data scarcity and sparsity) pose new challenges in recommendation tasks.

In this work, we focus on investigating the scaling laws in large sequential recommendation models. Specifically, we consider a pure ID-based task formulation, where the interaction history of a user is formatted as a chronological sequence of item IDs. We don’t incorporate any side information (e.g., item text), to delve into the scaling law’s applicability from the perspective of user behavior. We successfully scale up the model size to 0.8B parameters, making it feasible to explore the scaling effect in a diverse range of model sizes. As the major findings, we empirically show that the scaling law still holds for these trained models, even in data-constrained scenarios. We then fit the curve for scaling law, and successfully predict the test loss of the two largest tested model scales.
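Fitting a scaling curve and extrapolating it to larger models can be illustrated with a pure power law, L(N) = a·N^(-α), fitted by least squares in log-log space. This is a sketch under assumptions: the functional form omits the irreducible-loss term that many scaling-law studies include, the paper's exact form is not given in the abstract, and any model sizes and losses used with it here are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of L(N) = a * N**(-alpha) in log-log space.

    Since log L = log a - alpha * log N, an ordinary linear regression of
    log-loss on log-size recovers both parameters. Returns (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope


def predict_loss(a: float, alpha: float, n: float) -> float:
    """Extrapolate the fitted curve to a model with n parameters."""
    return a * n ** (-alpha)
```

In practice one would fit on the smaller models only and then compare `predict_loss` against the held-out larger scales, mirroring the prediction experiment described in the abstract.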

Furthermore, we examine the performance advantage of scaling effect on five challenging recommendation tasks, considering the unique issues (e.g., cold start, robustness, long-term preference) in recommender systems. We find that scaling up the model size can greatly boost the performance on these challenging tasks, which again verifies the benefits of large recommendation models.

Full text in ACM Digital Library
RES Scene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction
by Wenhao Li (Huazhong University of Science and Technology; Meituan), Jie Zhou (Beihang University), Chuan Luo (Beihang University), Chao Tang (Meituan), Kun Zhang (Meituan) and Shixiong Zhao (The University of Hong Kong)

In the realm of modern mobile e-commerce, providing users with nearby commercial service recommendations through location-based online services has become increasingly vital. While machine learning approaches have shown promise in multi-scene recommendation, existing methodologies often struggle to address cold-start problems in unprecedented scenes: the increasing diversity of commercial choices, along with the short online lifespan of scenes, complicates effective recommendation in online and dynamic scenes. In this work, we propose the Scene-wise Adaptive Network (SwAN), a novel approach that emphasizes high-performance cold-start online recommendation for new scenes. Our approach introduces several crucial capabilities, including scene similarity learning, user-specific scene transition cognition, scene-specific information construction for the new scene, and enhancement of the diverged logical information between scenes. We demonstrate SwAN’s potential to optimize dynamic multi-scene recommendation problems by effectively handling cold-start recommendations online for any newly arrived scenes. More encouragingly, SwAN has been successfully deployed in Meituan’s online catering recommendation service, which serves millions of customers per day, where it has achieved a 5.64% CTR index improvement relative to the baselines and a 5.19% increase in daily order volume proportion.

Full text in ACM Digital Library
RES SeCor: Aligning Semantic and Collaborative Representations by Large Language Models for Next-Point-of-Interest Recommendations
by Shirui Wang (Tongji University), Bohan Xie (Tongji University), Ling Ding (Tongji University), Xiaoying Gao (Tongji University), Jianting Chen (Tongji University) and Yang Xiang (Tongji University)

The widespread adoption of location-based applications has created a growing demand for point-of-interest (POI) recommendation, which aims to predict a user’s next POI based on their historical check-in data and current location. However, existing methods often struggle to capture the intricate relationships within check-in data. This is largely due to their limitations in representing temporal and spatial information and underutilizing rich semantic features. While large language models (LLMs) offer powerful semantic comprehension to solve them, they are limited by hallucination and the inability to incorporate global collaborative information. To address these issues, we propose a novel method SeCor, which treats POI recommendation as a multi-modal task and integrates semantic and collaborative representations to form an efficient hybrid encoding. SeCor first employs a basic collaborative filtering model to mine interaction features. These embeddings, as one modal information, are fed into LLM to align with semantic representation, leading to efficient hybrid embeddings. To mitigate the hallucination, SeCor recommends based on the hybrid embeddings rather than directly using the LLM’s output text. Extensive experiments on three public real-world datasets show that SeCor outperforms all baselines, achieving improved recommendation performance by effectively integrating collaborative and semantic information through LLMs.

Full text in ACM Digital Library
RES The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation
by Zekai Qu (China University of Geosciences Beijing), Ruobing Xie (Tencent Inc.), Chaojun Xiao (Tsinghua University), Zhanhui Kang (Tencent Inc.) and Xingwu Sun (Tencent Inc.)

Sequential recommendation (SR) has seen significant advancements with the help of Pre-trained Language Models (PLMs). Some PLM-based SR models directly use PLMs to encode the text sequences of users’ historical behaviors to learn user representations, yet there has been little in-depth exploration of the capability and suitability of PLMs in behavior sequence modeling. In this work, we first conduct extensive model analyses of PLMs and PLM-based SR models, discovering significant underutilization and parameter redundancy of PLMs in behavior sequence modeling. Inspired by this, we explore different lightweight usages of PLMs in SR, aiming to maximally stimulate the ability of PLMs for SR while satisfying the efficiency and usability demands of practical systems. We discover that adopting behavior-tuned PLMs for item initialization in conventional ID-based SR models is the most economical framework for PLM-based SR: it brings no additional inference cost yet achieves a dramatic performance boost over the original version. Extensive experiments on five datasets show that our simple and universal framework leads to significant improvements over classical SR and SOTA PLM-based SR models without additional inference costs. Our code can be found at https://github.com/777pomingzi/Rethinking-PLM-in-RS.

Full text in ACM Digital Library
RES The Fault in Our Recommendations: On the Perils of Optimizing the Measurable
by Omar Besbes (Columbia University), Yash Kanoria (Columbia University) and Akshit Kumar (Columbia University)

Recommendation systems are widespread and, through customized recommendations, promise to match users with options they will like. To that end, data on engagement is collected and used. Most recommendation systems are ranking-based: they rank and recommend items based on their predicted engagement. However, engagement signals are often only a crude proxy for user utility, as data on the latter is rarely collected or available. This paper explores the following question: by optimizing for measurable proxies, are recommendation systems at risk of significantly under-delivering on user utility? If so, how can one improve utility, which is seldom measured? To study these questions, we introduce a model of repeated user consumption in which, at each interaction, users select between an outside option and the best option from a recommendation set. Our model accounts for user heterogeneity, with the majority preferring “popular” content and a minority favoring “niche” content. The system initially lacks knowledge of individual user preferences but can learn these preferences by observing users’ choices over time. Our theoretical and numerical analyses demonstrate that optimizing for engagement signals can lead to significant utility losses. Instead, we propose a utility-aware policy that initially recommends a mix of popular and niche content. We show that such a policy substantially improves utility despite not measuring it. As the platform becomes more forward-looking, our utility-aware policy achieves the best of both worlds: near-optimal user utility and near-optimal engagement simultaneously. Our study elucidates an important feature of recommendation systems: given the ability to suggest multiple items, one can perform significant exploration without incurring significant reductions in short-term engagement. By recommending high-risk, high-reward items alongside popular items, systems can enhance discovery of high-utility items without significantly affecting engagement.

Full text in ACM Digital Library
RES The Role of Unknown Interactions in Implicit Matrix Factorization — A Probabilistic View
by Joey De Pauw (University of Antwerp) and Bart Goethals (University of Antwerp)

Matrix factorization is a well-known and effective methodology for top-k list recommendation. It became widely known during the Netflix challenge in 2006, and since then, many adapted and improved versions have been published. A particularly interesting matrix factorization algorithm called iALS (implicit Alternating Least Squares) adapts the method for implicit feedback, i.e. a setting where only a very small number of positive labels are available along with a majority of unknown labels. Compared to the classical task of rating prediction, learning from implicit feedback is applicable to many more domains, as the data is more abundant and requires less effort to elicit from users. However, the sparsity, imbalance, and implicit nature of the signal also pose unique challenges to retrieving the most relevant items to recommend.

We revisit the role of unknown interactions in implicit matrix factorization. Traditionally, all unknowns are interpreted as negative samples, and their importance in the training objective is then down-weighted to balance them out with the known, positive interactions. Interestingly, by adopting a probabilistic view of matrix factorization, we can retain the unknown nature of these interactions by modelling them as either positive or negative. With this new formulation, which better fits the underlying data, we gain improved performance on the downstream recommendation task without any computational overhead compared to the popular iALS method.
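For reference, the conventional treatment described above, in which every unknown pair is a negative with a down-weighted contribution, can be written in one common form of the iALS objective (our rendering, with regularisation simplified; not the paper's probabilistic formulation):

```latex
\mathcal{L}(W, H) = \sum_{(u,i) \in S} \big( \langle \mathbf{w}_u, \mathbf{h}_i \rangle - 1 \big)^2
  + \alpha_0 \sum_{u \in U} \sum_{i \in I} \langle \mathbf{w}_u, \mathbf{h}_i \rangle^2
  + \lambda \big( \lVert W \rVert_F^2 + \lVert H \rVert_F^2 \big)
```

Here S is the set of observed positive user-item pairs, w_u and h_i are the user and item embeddings, α₀ < 1 down-weights the unknown pairs that are treated as zero targets, and λ is the regularisation strength.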

This paper outlines the key insights needed to adapt iALS to use logistic regression. Furthermore, a logistic version of the popular full-rank EASE model is introduced in a similar fashion. An extensive experimental evaluation on several real-world datasets demonstrates the effectiveness of our approach. Additionally, we find that factorization and autoencoder models differ in their need for weighting, leading towards a better understanding of these methods.

Full text in ACM Digital Library
RES Touch the Core: Exploring Task Dependence Among Hybrid Targets for Recommendation
by Xing Tang (Tencent), Yang Qiao (Tencent), Fuyuan Lyu (McGill University), Dugang Liu (Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)) and Xiuqiang He (Tencent)

As user behaviors become more complicated on business platforms, online recommendations focus more on how to touch the core conversions, which are highly related to the interests of platforms. These core conversions are usually continuous targets, such as watch time and revenue, whose predictions can be enhanced by previous discrete conversion actions. Therefore, multi-task learning (MTL) can be adopted as the paradigm for learning these hybrid targets. However, existing works mainly investigate the sequential dependence among discrete conversion actions, neglecting the complex dependence between discrete conversions and the final continuous conversion. Moreover, simultaneously optimizing hybrid tasks with stronger task dependence suffers from volatility, where the core regression task might have a larger influence on other tasks. In this paper, we study the MTL problem with hybrid targets for the first time and propose a model named Hybrid Targets Learning Network (HTLNet) to explore task dependence and enhance optimization. Specifically, we introduce label embedding for each task to explicitly transfer label information among these tasks, which can effectively explore logical task dependence. We further design a gradient adjustment regime between the final regression task and the other classification tasks to enhance optimization. Extensive experiments on two offline public datasets and one real-world industrial dataset are conducted to validate the effectiveness of HTLNet. Moreover, online A/B tests on the financial recommender system also show significant improvements from our model. Our implementation is available here.

Full text in ACM Digital Library
RES Towards Empathetic Conversational Recommender Systems
by Xiaoyu Zhang (Shandong University), Ruobing Xie (Tencent), Yougang Lyu (Shandong University; University of Amsterdam), Xin Xin (Shandong University), Pengjie Ren (Shandong University), Mingfei Liang (Tencent), Bo Zhang (Tencent), Zhanhui Kang (Tencent), Maarten de Rijke (University of Amsterdam) and Zhaochun Ren (Leiden University)

Conversational recommender systems (CRSs) elicit user preferences through multi-turn dialogues. They typically incorporate external knowledge and pre-trained language models to capture the dialogue context. Most CRS approaches, trained on benchmark datasets, assume that the standard items and responses in these benchmarks are optimal. However, they overlook that users may express negative emotions toward the standard items and may not feel emotionally engaged by the standard responses. As a result, these approaches tend to replicate the logic of the recommenders in the dataset instead of aligning with user needs. To remedy this misalignment, we introduce empathy into a CRS. By empathy, we mean a system’s ability to capture and express emotions. We propose an empathetic conversational recommender (ECR) framework.

ECR contains two main modules: emotion-aware item recommendation and emotion-aligned response generation. Specifically, we use user emotions to refine user preference modeling for more accurate recommendations. To generate human-like emotional responses, ECR uses retrieval-augmented prompts to fine-tune a pre-trained language model so that it aligns with emotions and mitigates hallucination. To address the challenge of insufficient supervision labels, we enlarge our empathetic data using emotion labels annotated by large language models and emotional reviews collected from external resources. We also propose novel evaluation metrics that capture user satisfaction in real-world CRS scenarios. Our experiments on the ReDial dataset validate the efficacy of our framework in enhancing recommendation accuracy and improving user satisfaction.

Full text in ACM Digital Library
RES Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models
by Yunjia Xi (Shanghai Jiao Tong University), Weiwen Liu (Huawei Noah’s Ark Lab), Jianghao Lin (Shanghai Jiao Tong University), Xiaoling Cai (Huawei), Hong Zhu (Huawei), Jieming Zhu (Huawei Noah’s Ark Lab), Bo Chen (Huawei Noah’s Ark Lab), Ruiming Tang (Huawei Noah’s Ark Lab), Weinan Zhang (Shanghai Jiao Tong University) and Yong Yu (Shanghai Jiao Tong University)

Recommender systems play a vital role in various online services. However, because they are trained and deployed separately within a specific closed domain, their access to open-world knowledge is limited. Recently, large language models (LLMs) have shown promise in bridging this gap by encoding extensive world knowledge and demonstrating reasoning capabilities. Nevertheless, previous attempts to use LLMs directly as recommenders cannot meet the inference latency requirements of industrial recommender systems. In this work, we propose an Open-World Knowledge Augmented Recommendation Framework with Large Language Models, dubbed KAR, which acquires two types of external knowledge from LLMs: reasoning knowledge about user preferences and factual knowledge about items. We introduce factorization prompting to elicit accurate reasoning about user preferences. The generated reasoning and factual knowledge are transformed and condensed into augmented vectors by a hybrid-expert adaptor so that they are compatible with the recommendation task. These vectors can then be used directly to enhance the performance of any recommendation model. We also ensure efficient inference by preprocessing and prestoring the knowledge from the LLM. Extensive experiments show that KAR significantly outperforms state-of-the-art baselines and is compatible with a wide range of recommendation algorithms. We deploy KAR on Huawei’s news and music recommendation platforms and achieve improvements of 7% and 1.7%, respectively, in online A/B tests.

Full text in ACM Digital Library
RES Transformers Meet ACT-R: Repeat-Aware and Sequential Listening Session Recommendation
by Viet-Anh Tran (Deezer Research), Guillaume Salha-Galvan (Deezer Research), Bruno Sguerra (Deezer Research) and Romain Hennequin (Deezer Research)

Music streaming services often leverage sequential recommender systems to predict the best music to showcase to users based on past sequences of listening sessions. Nonetheless, most sequential recommendation methods ignore or insufficiently account for repetitive behaviors. This is a crucial limitation for music recommendation, as repeatedly listening to the same song over time is a common phenomenon that can even change the way users perceive this song. In this paper, we introduce PISA (Psychology-Informed Session embedding using ACT-R), a session-level sequential recommender system that overcomes this limitation. PISA employs a Transformer architecture learning embedding representations of listening sessions and users using attention mechanisms inspired by Anderson’s ACT-R (Adaptive Control of Thought-Rational), a cognitive architecture modeling human information access and memory dynamics. This approach enables us to capture dynamic and repetitive patterns from user behaviors, allowing us to effectively predict the songs they will listen to in subsequent sessions, whether they are repeated or new ones. We demonstrate the empirical relevance of PISA using both publicly available listening data from Last.fm and proprietary data from Deezer, a global music streaming service, confirming the critical importance of repetition modeling for sequential listening session recommendation. Along with this paper, we publicly release our proprietary dataset to foster future research in this field, as well as the source code of PISA to facilitate its future use.
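PISA’s attention mechanism is inspired by ACT-R, but the abstract does not give its exact form. As background only: the standard ACT-R base-level learning equation, B = ln(Σ_j t_j^(−d)), where t_j is the time elapsed since the j-th past exposure and d is a decay rate (conventionally 0.5), rewards items heard both often and recently. The Python sketch below computes this textbook quantity from listening timestamps; it is not PISA itself, and the function name and timestamp convention are illustrative assumptions.

```python
import math
from typing import Sequence


def base_level_activation(listen_times: Sequence[float], now: float, decay: float = 0.5) -> float:
    """Textbook ACT-R base-level activation: ln(sum_j (now - t_j) ** -decay).

    listen_times: past timestamps at which a song was played (hypothetical
    input format); only plays strictly before `now` contribute.
    """
    ages = [now - t for t in listen_times if now - t > 0]
    if not ages:
        return float("-inf")  # no past exposure, so no memory trace
    return math.log(sum(age ** -decay for age in ages))


# A song played often and recently outranks one played once, long ago.
frequent_recent = base_level_activation([1.0, 5.0, 9.0], now=10.0)
single_old = base_level_activation([2.0], now=10.0)
print(frequent_recent > single_old)  # True
```

Ranking candidate songs by such an activation score captures repetition (more past plays add more terms) and recency (smaller ages contribute larger terms) in a single number.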

Full text in ACM Digital Library
RES Unified Denoising Training for Recommendation
by Haoyan Chua (Nanyang Technological University), Yingpeng Du (Nanyang Technological University), Zhu Sun (Singapore University of Technology and Design), Ziyan Wang (Nanyang Technological University), Jie Zhang (Nanyang Technological University) and Yew-Soon Ong (Nanyang Technological University)

Most existing denoising recommendation methods alleviate noisy implicit feedback (user behaviors) mainly through empirical studies. However, such studies may lack theoretical explainability and fail to model comprehensive noise patterns, which hinders understanding and capturing the different noise patterns that affect user behaviors. We therefore propose to capture comprehensive noise patterns through theoretical and empirical analysis for more effective denoising, dividing user behaviors into willingness and action phases to disentangle independent noise patterns. Willingness refers to the user’s intent to interact with an item, which may not lead to actual interaction due to factors such as misclicking. Action denotes the user’s actual interaction with an item. Our analysis reveals that (1) in the willingness phase, high uncertainty in the user’s willingness to interact with the item can lead to high expectation loss, which aligns with the findings of existing denoising methods; and (2) in the action phase, higher user-specific inconsistency between willingness and action not only leads to more noise in the user’s overall behaviors but also makes it harder to distinguish between true and noisy behaviors. Inspired by these findings, we propose a Unified Denoising Training (UDT) method for recommendation. To alleviate uncertainty in the willingness phase, we lower the importance of user-item interactions with high willingness uncertainty, as recognized by high loss. To ease the inconsistency in the action phase, we lower the importance of users with high user-specific inconsistency, as it may lead to noisier behaviors. We then increase the importance gap between clean and noisy behaviors for users with low user-specific inconsistency, as their behaviors are more distinguishable. Extensive experiments on three real-world datasets show that UDT outperforms state-of-the-art denoising recommendation methods.

Full text in ACM Digital Library
RES Unleashing the Retrieval Potential of Large Language Models in Conversational Recommender Systems
by Ting Yang (Hong Kong Baptist University) and Li Chen (Hong Kong Baptist University)

Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through interactive natural language conversations. The recent advent of large language models (LLMs) has revolutionized human engagement in natural conversation, driven by their extensive world knowledge and remarkable natural language understanding and generation capabilities. However, introducing LLMs into CRSs presents new technical challenges. Directly prompting LLMs to generate recommendations requires understanding a large and evolving item corpus, as well as grounding the generated recommendations in the real item space. On the other hand, generating recommendations with external recommendation engines, or directly integrating their suggestions into responses, may constrain the overall performance of LLMs, since these engines generally have inferior representation abilities compared to LLMs. To address these challenges, we propose ReFICR, an end-to-end large-scale CRS model: a novel LLM-enhanced conversational recommender that empowers a retrievable large language model to perform conversational recommendation by following retrieval and generation instructions through lightweight tuning. By decomposing the complex CRS task into multiple subtasks, we formulate these subtasks into two types of instructions: retrieval and generation. The hidden states of ReFICR are used to generate text embeddings for retrieval, while ReFICR is simultaneously fine-tuned to handle the generation subtasks. We optimize a contrastive objective to enhance the text embeddings for retrieval and jointly fine-tune the model with the language modeling objective for generation. Our experimental results on public datasets demonstrate that ReFICR significantly outperforms baselines in recommendation accuracy and response quality. Our code is publicly available at https://github.com/yt556677/ReFICR.

Full text in ACM Digital Library
RES Unlocking the Hidden Treasures: Enhancing Recommendations with Unlabeled Data
by Yuhan Zhao (Harbin Engineering University), Rui Chen (Harbin Engineering University), Qilong Han (Harbin Engineering University), Hongtao Song (Harbin Engineering University) and Li Chen (Hong Kong Baptist University)

Collaborative filtering (CF) is a cornerstone of recommender systems, yet effectively leveraging the massive amount of unlabeled data remains a significant challenge. Current research addresses unlabeled data by extracting a subset that closely approximates negative samples. Regrettably, the remaining data are overlooked, so this valuable information is not fully integrated into the construction of user preferences. To address this gap, we introduce a novel positive-neutral-negative (PNN) learning paradigm. PNN introduces a neutral class, encompassing intricate items that are difficult to categorize directly as positive or negative samples. By training a model on this triple-wise partial ranking, PNN offers a promising solution to learning complex user preferences. Through theoretical analysis, we connect PNN to one-way partial AUC (OPAUC) to validate its efficacy. Implementing the PNN paradigm is, however, technically challenging because (1) it is difficult to classify unlabeled data as neutral or negative in the absence of supervised signals, and (2) no existing loss function can handle set-level triple-wise ranking relationships. To address these challenges, we propose a semi-supervised learning method coupled with a user-aware attention model for knowledge acquisition and classification refinement. Additionally, a novel loss function with a two-step centroid ranking approach enables handling set-level rankings. Extensive experiments on four real-world datasets demonstrate that PNN consistently and significantly boosts the performance of a wide range of representative CF models. Even with simple matrix factorization, PNN can achieve performance comparable to sophisticated graph neural networks. Our code is publicly available at https://github.com/Asa9aoTK/PNN-RecBole.
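The abstract does not spell out the two-step centroid ranking loss. One plausible minimal reading, sketched below in Python with NumPy under that assumption, summarizes each labeled set (positive, neutral, negative) by its centroid embedding and applies two BPR-style pairwise terms so that positive > neutral and neutral > negative. The function and argument names are hypothetical, and the paper’s actual loss may differ.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def centroid_ranking_loss(user, pos, neu, neg):
    """Hypothetical set-level loss enforcing positive > neutral > negative.

    user: (d,) user embedding; pos, neu, neg: (n_k, d) item embeddings.
    Each set is summarised by its centroid, so the loss ranks sets, not items.
    """
    s_pos = user @ pos.mean(axis=0)
    s_neu = user @ neu.mean(axis=0)
    s_neg = user @ neg.mean(axis=0)
    # Step 1: positives above neutrals. Step 2: neutrals above negatives.
    return float(softplus(s_neu - s_pos) + softplus(s_neg - s_neu))


user = np.array([1.0, 0.0])
pos = np.array([[2.0, 0.0], [3.0, 0.0]])
neu = np.array([[0.0, 0.0], [1.0, 0.0]])
neg = np.array([[-2.0, 0.0]])
# Correctly ordered sets give a smaller loss than the reversed ordering.
print(centroid_ranking_loss(user, pos, neu, neg) < centroid_ranking_loss(user, neg, neu, pos))  # True
```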

Full text in ACM Digital Library
RES Utilizing Non-click Samples via Semi-supervised Learning for Conversion Rate Prediction
by Jiahui Huang (University of Science and Technology of China), Lan Zhang (University of Science and Technology of China), Junhao Wang (University of Science and Technology of China), Shanyang Jiang (University of Science and Technology of China), Dongbo Huang (Tencent), Cheng Ding (Tencent) and Lan Xu (Tencent)

Conversion rate (CVR) prediction is essential in recommender systems, facilitating precise matching between recommended items and users’ preferences. However, the sample selection bias (SSB) and data sparsity (DS) issues pose challenges to accurate prediction. Existing works have proposed the click-through and conversion rate (CTCVR) prediction task, which models samples from exposure to “click and conversion” in the entire space and incorporates multi-task learning. This approach has proven effective in mitigating these challenges. Nevertheless, it intensifies the false negative sample (FNS) problem. Specifically, the CTCVR task implicitly treats the CVR labels of all non-click samples as negative, overlooking the possibility that some samples might convert if clicked. As empirical analysis confirms, this oversight can harm CVR model performance. To this end, we advocate discarding the CTCVR task and propose a Non-click samples Improved Semi-supErvised (NISE) method for conversion rate prediction, in which non-click samples are treated as unlabeled. Our approach predicts their probability of conversion if clicked and uses these predictions as pseudo-labels for further model training. This strategy helps alleviate the FNS problem, and directly modeling the CVR task across the entire space also mitigates the SSB and DS challenges. Additionally, we conduct multi-task learning by introducing an auxiliary click-through rate prediction task, thereby enhancing the embedding layer representations. Our approach is applicable to various multi-task architectures. Comprehensive experiments on both public and production datasets demonstrate the superiority of our method in mitigating the FNS challenge and improving CVR estimation. The implementation code is available at https://github.com/Hjh233/NISE.
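To make the idea of treating non-click samples as unlabeled concrete, here is a minimal Python/NumPy sketch of an entire-space CVR loss. It assumes soft pseudo-labels for the non-click samples are already available (for example, from a frozen copy of the model); how NISE actually produces and uses them is not described in the abstract. Clicked samples use their observed conversion labels, while non-click samples are fitted to their pseudo-labels, weighted by a hypothetical factor `alpha`, instead of being forced to zero as the CTCVR task implicitly does.

```python
import numpy as np


def bce(p, y, eps=1e-7):
    """Binary cross-entropy that also accepts soft targets y in [0, 1]."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def semi_supervised_cvr_loss(pred_cvr, clicked, converted, pseudo_cvr, alpha=0.5):
    """Hypothetical entire-space CVR loss over all exposed samples.

    pred_cvr:   model CVR predictions for every exposed sample
    clicked:    0/1 click indicators
    converted:  0/1 conversion labels (only meaningful where clicked == 1)
    pseudo_cvr: soft pseudo-labels used for the non-click samples
    """
    clicked = np.asarray(clicked, dtype=bool)
    labeled = bce(pred_cvr[clicked], converted[clicked]).sum()
    unlabeled = bce(pred_cvr[~clicked], pseudo_cvr[~clicked]).sum()
    return float((labeled + alpha * unlabeled) / len(pred_cvr))


pred = np.array([0.9, 0.1, 0.8, 0.7])
clicked = np.array([1, 1, 0, 0])
converted = np.array([1, 0, 0, 0])
pseudo = np.array([0.0, 0.0, 0.8, 0.7])
# Pseudo-labels penalise confident non-click predictions less than "all negative" does.
print(semi_supervised_cvr_loss(pred, clicked, converted, pseudo)
      < semi_supervised_cvr_loss(pred, clicked, converted, np.zeros(4)))  # True
```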

Full text in ACM Digital Library

List of all short papers accepted for RecSys 2024 (in alphabetical order).

RES A Dataset for Adapting Recommender Systems to the Fashion Rental Economy
by Karl Audun Kagnes Borgersen (Universitetet i Agder), Morten Goodwin (University of Agder), Morten Grundetjern (Universitetet i Agder) and Jivitesh Sharma (University of Agder)

In response to the escalating ecological challenges that threaten global sustainability, there is a need to investigate alternative methods of commerce, such as rental economies. As with most online commerce, rental or otherwise, a functioning recommender system is crucial to the success of such businesses. Yet the domain has, until now, been largely neglected by the recommender system research community.

Our dataset, derived from our collaboration with the leading Norwegian fashion rental company Vibrent, encompasses 77.1k transactions, rental histories from 7.4k anonymized users, and 15.6k unique outfits, with each physical item’s attributes and rental history meticulously tracked. All outfits are listed as individual items or as their corresponding item groups, where an item group refers to a design shared by several individual items. This distinction underlines the novel challenges of rental compared with more traditional recommender system problems, where items are generally interchangeable. For example, a recommender system for rental items must track each physical item to ensure it is not rented to several customers for the same time period, whereas in retail, tracking or recommending individual items is largely unnecessary. Each outfit is accompanied by a set of tags describing some of its attributes. We also provide a total of 50.1k images across all items, along with a set of precomputed zero-shot embeddings.

We apply a range of common recommender system methods to the dataset to provide performance baselines. These baselines are computed both for the traditional fashion recommender system problem of recommending outfit groups and for the novel problem of predicting individual item rentals. To our knowledge, this is the first published article to directly discuss fashion rental recommender systems, and the first published dataset intended for this purpose. We hope that the publication of this dataset will serve as a catalyst for a new branch of research on specialized fashion rental recommender systems.

The dataset has been made freely available at https://www.kaggle.com/datasets/kaborg15/vibrent-clothes-rental-dataset

All code associated with the project has been made available at: https://github.com/cair/Vibrent_Clothes_Rental_Dataset_Collection

Full text in ACM Digital Library
RES Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
by Anima Singh (Google), Trung Vu (Google), Nikhil Mehta (Google DeepMind), Raghunandan Keshavan (Google), Maheswaran Sathiamoorthy (Google DeepMind), Yilin Zheng (Google), Lichan Hong (Google DeepMind), Lukasz Heldt (Google), Li Wei (Google), Devansh Tandon (Google), Ed Chi (Google DeepMind) and Xinyang Yi (Google DeepMind)

Randomly hashed item IDs are used ubiquitously in recommendation models. However, representations learned from random hashing prevent generalization across similar items, causing problems in learning unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random IDs. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance between memorization and generalization, we propose using Semantic IDs, a compact and discrete item representation, as a replacement for random item IDs. Semantic IDs are learned from frozen content embeddings using RQ-VAE and can thus capture the hierarchy of concepts in items. Similar to content embeddings, the compactness of Semantic IDs poses a problem for adaptation in recommendation models. We propose novel methods for adapting Semantic IDs in industry-scale ranking models by hashing sub-pieces of the Semantic-ID sequences. In particular, we find that the SentencePiece model commonly used in LLM tokenization outperforms manually crafted pieces such as N-grams. Finally, we evaluate our approaches in a real-world ranking model for YouTube recommendations. Our experiments demonstrate that Semantic IDs can replace the direct use of video IDs, improving generalization on new and long-tail item slices without sacrificing overall model quality.
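Semantic IDs produced by an RQ-VAE are short tuples of codes, one per quantization level, with earlier codes capturing coarser concepts. The Python/NumPy sketch below illustrates the N-gram baseline that the abstract compares against (not the SentencePiece variant it finds better): each n-gram of codes, together with its starting level, is turned into an integer key, hashed into an embedding table by modulo, and the looked-up rows are summed. Items sharing a coarse prefix therefore share embedding rows, which is where generalization comes from. The table sizes, the choice of unigrams plus bigrams, and all names are hypothetical.

```python
import numpy as np


def ngram_keys(sem_id, n, codebook_size):
    """Exact integer key for every n-gram of a Semantic ID.

    The starting level is encoded too, so the same code at different levels
    of the hierarchy yields different keys.
    """
    keys = []
    for start in range(len(sem_id) - n + 1):
        key = start
        for code in sem_id[start:start + n]:
            key = key * codebook_size + code
        keys.append(key)
    return keys


def item_embedding(sem_id, tables, codebook_size):
    """Sum hashed n-gram embeddings; tables maps n -> (num_buckets, dim) array."""
    parts = []
    for n, table in tables.items():
        rows = [k % table.shape[0] for k in ngram_keys(sem_id, n, codebook_size)]
        parts.append(table[rows].sum(axis=0))
    return np.sum(parts, axis=0)


rng = np.random.default_rng(0)
tables = {1: rng.normal(size=(1000, 8)), 2: rng.normal(size=(5000, 8))}  # unigrams, bigrams
video_a = (3, 17, 42)  # hypothetical 3-level Semantic ID, codebook of 64 codes
video_b = (3, 17, 7)   # shares its two coarsest codes with video_a
print(item_embedding(video_a, tables, codebook_size=64).shape)  # (8,)
```

Because `video_a` and `video_b` share their first two codes, they share two unigram rows and one bigram row, so training signal for one also updates part of the other’s representation.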

Full text in ACM Digital Library
RES Calibrating the Predictions for Top-N Recommendations
by Masahiro Sato (FUJIFILM)

Well-calibrated predictions of user preferences are essential for many applications. Since recommender systems typically select the top-N items for users, calibration for those top-N items, rather than for all items, is important. We show that previous calibration methods result in miscalibrated predictions for the top-N items, despite their excellent calibration performance when evaluated on all items. In this work, we address the miscalibration in the top-N recommended items. We first define evaluation metrics for this objective and then propose a generic method to optimize calibration models focusing on the top-N items. It groups the top-N items by their ranks and optimizes distinct calibration models for each group with rank-dependent training weights. We verify the effectiveness of the proposed method for both explicit and implicit feedback datasets, using diverse classes of recommender models.
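The rank-grouping idea can be sketched in Python with NumPy as follows. We assume Platt scaling (a logistic map of the raw score) as the per-group calibration model and a DCG-style weight 1/log2(rank + 1) as the rank-dependent training weight; both are illustrative choices rather than the paper’s stated configuration, and the group boundaries are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_platt(scores, labels, weights, iters=50):
    """Weighted Platt scaling p = sigmoid(a * score + b), fitted by Newton's method."""
    X = np.column_stack([scores, np.ones_like(scores)])
    theta = np.zeros(2)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (weights * (p - labels))  # gradient of the weighted log loss
        hess = X.T @ (X * (weights * p * (1 - p))[:, None]) + 1e-9 * np.eye(2)
        theta -= np.linalg.solve(hess, grad)
    return theta  # (a, b)


def rank_weight(ranks):
    """Illustrative rank-dependent training weight (DCG-style discount)."""
    return 1.0 / np.log2(ranks + 1.0)


def fit_rank_group_calibrators(scores, labels, ranks, group_edges):
    """One calibrator per rank group; edges [1, 6, 11] mean ranks 1-5 and 6-10."""
    models = []
    for lo, hi in zip(group_edges[:-1], group_edges[1:]):
        mask = (ranks >= lo) & (ranks < hi)
        theta = fit_platt(scores[mask], labels[mask], rank_weight(ranks[mask]))
        models.append(((lo, hi), theta))
    return models


def calibrate(score, rank, models):
    """Calibrated probability for an item shown at the given top-N rank."""
    for (lo, hi), (a, b) in models:
        if lo <= rank < hi:
            return float(sigmoid(a * score + b))
    raise ValueError("rank is outside the calibrated top-N range")


# Synthetic data: items at ranks 1-5 are positive more often than at ranks 6-10.
rng = np.random.default_rng(0)
ranks = np.tile(np.arange(1, 11), 2000)
scores = rng.uniform(size=ranks.size)
labels = (rng.uniform(size=ranks.size) < np.where(ranks <= 5, 0.6, 0.3) * scores).astype(float)
models = fit_rank_group_calibrators(scores, labels, ranks, [1, 6, 11])
print(calibrate(0.5, 2, models) > calibrate(0.5, 8, models))  # True
```

Fitting separately per group lets the same raw score map to different probabilities at different ranks, which a single calibrator fitted on all items cannot do.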

Full text in ACM Digital Library
RES Can editorial decisions impair journal recommendations? Analysing the impact of journal characteristics on recommendation systems
by Elias Entrup (TIB Leibniz Information Centre for Science and Technology), Ralph Ewerth (TIB Leibniz Information Centre for Science and Technology) and Anett Hoppe (TIB Leibniz Information Centre for Science and Technology)

Recommendation services for journals help scientists choose appropriate publication venues for their research results. They often use a semantic matching process to compare, e.g., an abstract against already published articles. As these services can guide a researcher’s decision, their fairness and neutrality are critical qualities. However, the impact of journal characteristics (such as abstract length) on recommendations is understudied. In this paper, we investigate whether editorial journal characteristics can lead to biased rankings from recommendation services, i.e., whether editorial choices can systematically lead to a better ranking of one’s own journal. Our experiments show that longer abstracts or a higher number of articles per journal can boost a journal’s rank in the recommendations. We apply these insights to an active, open-source journal recommendation system. The adapted algorithm increases accuracy for smaller journals.

Full text in ACM Digital Library
RES CAPRI-FAIR: Integration of Multi-sided Fairness in Contextual POI Recommendation Framework
by Francis Zac Dela Cruz (University of New South Wales), Flora D. Salim (University of New South Wales), Yonchanok Khaokaew (University of New South Wales) and Jeffrey Chan (RMIT University)

Point-of-interest (POI) recommendation considers spatio-temporal factors such as distance, peak hours, and user check-ins. Given their influence on both consumer experience and POI businesses, it is crucial to consider fairness from multiple perspectives. Unfortunately, these systems often provide less accurate recommendations to inactive users and less exposure to unpopular POIs. This paper develops a post-filter method that incorporates provider and consumer fairness into existing models, aiming to balance fairness metrics such as item exposure with performance metrics such as precision and distance. Experiments show that a linear scoring model for provider fairness when re-scoring items offers the best balance between performance and long-tail exposure, sometimes without much loss in precision. Addressing consumer fairness by recommending more popular POIs to inactive users increased precision in some models and datasets. However, combinations that reached the Pareto front of consumer and provider fairness resulted in the lowest precision values, highlighting that the tradeoffs depend greatly on the model and dataset.
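A linear re-scoring post-filter of the kind described above can be sketched in a few lines of Python with NumPy. Here we assume the provider-fairness term is simply one minus the min-max normalized POI popularity and that a single hypothetical weight `lam` trades it off against normalized relevance; the paper’s exact scoring function and normalization may differ.

```python
import numpy as np


def minmax(x):
    """Scale to [0, 1]; constant inputs map to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def rescore_for_provider_fairness(relevance, popularity, lam=0.3, k=10):
    """Hypothetical linear post-filter: blend relevance with a long-tail bonus.

    lam = 0 keeps the base model's ranking; larger lam favours less popular POIs.
    Returns the indices of the top-k POIs after re-scoring.
    """
    fair_score = (1 - lam) * minmax(relevance) + lam * (1 - minmax(popularity))
    return np.argsort(-fair_score, kind="stable")[:k]


relevance = np.array([0.90, 0.85, 0.20])  # base model scores for three POIs
popularity = np.array([1000, 10, 5])      # e.g. check-in counts
print(rescore_for_provider_fairness(relevance, popularity, lam=0.0, k=1))  # [0]
print(rescore_for_provider_fairness(relevance, popularity, lam=0.5, k=1))  # [1]
```

Because it is a post-filter, this step needs only the base model’s scores, so it can be applied on top of existing POI recommenders without retraining them.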

Full text in ACM Digital Library
RES Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems
by Yan-Martin Tamm (University of Tartu) and Anna Aljanaki (University of Tartu)

Over the years, Music Information Retrieval (MIR) research has proposed various models pretrained on large amounts of music data. Transfer learning has demonstrated the effectiveness of pretrained backend models on a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network learning over these models. Our research addresses this gap and evaluates the applicability of six pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, and MusiCNN) in the context of MRS. We assess their performance using three recommendation models: K-nearest neighbours (KNN), a shallow neural network, and BERT4Rec. Our findings suggest that pretrained audio representations exhibit significant performance variability between traditional MIR tasks and MRS, indicating that valuable aspects of musical information captured by backend models may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.

Full text in ACM Digital Library
RES CoST: Contrastive Quantization based Semantic Tokenization for Generative Recommendation
by Jieming Zhu (Huawei Noah’s Ark Lab), Mengqun Jin (Tsinghua University), Qijiong Liu (The HK PolyU), Zexuan Qiu (The Chinese University of Hong Kong), Zhenhua Dong (Huawei Noah’s Ark Lab) and Xiu Li (Tsinghua University)

Embedding-based retrieval is a dominant approach to candidate item matching in industrial recommender systems. With the success of generative AI, generative retrieval has recently emerged as a new retrieval paradigm for recommendation, which casts item retrieval as a generation problem. Such a model consists of two stages: semantic tokenization and autoregressive generation. The first stage, item tokenization, constructs discrete semantic tokens to index items, while the second stage autoregressively generates the semantic tokens of candidate items. Semantic tokenization is therefore a crucial preliminary step for training generative recommendation models. Existing research usually employs a vector quantizer with a reconstruction loss (e.g., RQ-VAE) to obtain semantic tokens of items, but this method fails to capture the essential neighborhood relationships that are vital for effective item modeling in recommender systems. In this paper, we propose a contrastive quantization-based semantic tokenization approach, named CoST, which harnesses both item relationships and semantic information to learn semantic tokens. Our experimental results highlight the significant impact of semantic tokenization on generative recommendation performance, with CoST achieving up to a 43% improvement in Recall@5 and a 44% improvement in NDCG@5 on the MIND dataset over previous baselines.

Full text in ACM Digital Library
RES Data Augmentation using Reverse Prompt for Cost-Efficient Cold-Start Recommendation
by Genki Kusano (NEC)

Recommendation systems that use auxiliary information such as product names and categories have been proposed to address the cold-start problem. However, these methods do not perform well when only limited warm-start training data is available. On the other hand, large language models (LLMs) can act as effective cold-start recommendation systems even with limited warm-start data. However, they require numerous API calls for inference, which leads to high operational costs in terms of time and money, a significant concern in industrial applications. In this paper, we introduce a new method, RevAug, which uses LLMs for data augmentation to enhance cost-efficient cold-start recommendation systems. To generate pseudo-samples, we reverse the commonly used LLM prompt from “Would this user like this item?” to “What kind of items would this user like?”. The outputs generated by this reverse prompt serve as pseudo-auxiliary information that enhances recommendation systems during the training phase. In numerical experiments on four real-world datasets, RevAug outperformed existing methods in cold-start settings with limited warm-start data. Moreover, RevAug significantly reduced API fees and processing time compared to an LLM-based recommendation method.

Full text in ACM Digital Library
RES Do Not Wait: Learning Re-Ranking Model Without User Feedback At Serving Time in E-Commerce
by Yuan Wang (Alibaba Group), Zhiyu Li (Alibaba Group), Changshuo Zhang (Renmin University of China), Sirui Chen (Renmin University of China), Xiao Zhang (Renmin University of China), Jun Xu (Renmin University of China) and Quan Lin (Alibaba Group)

Recommender systems are widely used in e-commerce, and re-ranking models, which leverage inter-item influence and determine the final recommendation lists, play an increasingly significant role in this domain. Online learning methods keep updating a deployed model with the latest available samples to capture shifts in the underlying data distribution in e-commerce. However, they depend on the availability of real user feedback, such as item purchases, which may be delayed by hours or even days, leading to a lag in model enhancement. In this paper, we propose a novel extension of online learning methods for re-ranking modeling, which we term LAST, an acronym for Learning At Serving Time. It circumvents the need for user feedback by using a surrogate model to provide the instructional signal needed to steer model improvement. Upon receiving an online request, LAST finds and applies a model modification on the fly before generating a recommendation result for the request. The modification is request-specific and transient: it is tailored solely to the current request to capture the request’s specific context. After the request, the modification is discarded, which helps prevent error propagation and stabilizes the online learning procedure, since the predictions of the surrogate model may be inaccurate. Most importantly, as a complement to feedback-based online learning methods, LAST can be seamlessly integrated into existing online learning systems to create a more adaptive and responsive recommendation experience. Comprehensive offline and online experiments confirm that LAST outperforms state-of-the-art re-ranking models.

Full text in ACM Digital Library
RES Does It Look Sequential? An Analysis of Datasets for Evaluation of Sequential Recommendations
by Anton Klenitskiy (Sber AI Lab), Anna Volodkevich (Sber AI Lab), Anton Pembek (Lomonosov Moscow State University (MSU)) and Alexey Vasilev (Sber AI Lab)

Sequential recommender systems are an important and in-demand area of research. Such systems aim to use the order of interactions in a user’s history to predict future interactions. The premise is that the order of interactions and sequential patterns play an essential role. Therefore, it is crucial to evaluate sequential recommenders on datasets that exhibit a sequential structure.

We apply several methods based on random shuffling of users’ interaction sequences to assess the strength of sequential structure across 15 datasets frequently used for evaluating sequential recommender systems in recent research papers presented at top-tier conferences. As shuffling explicitly breaks the sequential dependencies inherent in the datasets, we estimate the strength of sequential patterns by comparing metrics on the shuffled and original versions of each dataset. Our findings show that several popular datasets have a rather weak sequential structure.
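The shuffling idea can be illustrated with a deliberately simple Python sketch. We assume a first-order “most frequent successor” model and leave-last-out evaluation, neither of which is taken from the paper: if a dataset has real sequential structure, the hit rate on the original sequences should clearly exceed the hit rate after each user’s sequence is randomly shuffled. Every sequence is assumed to have at least two items.

```python
import random
from collections import Counter, defaultdict


def hit_rate_markov(sequences, k=1):
    """Leave-last-out HitRate@k of a first-order 'most frequent successor' model.

    The model is trained on every sequence minus its last item; the held-out
    last item counts as a hit if it is among the top-k successors of the
    item just before it.
    """
    successors = defaultdict(Counter)
    for seq in sequences:
        train = seq[:-1]
        for a, b in zip(train, train[1:]):
            successors[a][b] += 1
    hits = 0
    for seq in sequences:
        prev, target = seq[-2], seq[-1]
        top_k = [item for item, _ in successors[prev].most_common(k)]
        hits += target in top_k
    return hits / len(sequences)


def shuffle_comparison(sequences, k=1, seed=0):
    """Return (original, shuffled) hit rates; a small gap suggests weak sequential structure."""
    rng = random.Random(seed)
    shuffled = [rng.sample(seq, len(seq)) for seq in sequences]
    return hit_rate_markov(sequences, k), hit_rate_markov(shuffled, k)


# Toy data with a strong cyclic pattern 0 -> 1 -> 2 -> 0 in every user history.
toy = [[0, 1, 2, 0, 1, 2, 0, 1, 2] for _ in range(100)]
original, shuffled = shuffle_comparison(toy)
print(original, shuffled < original)  # 1.0 True
```

On a real dataset, a shuffled score close to the original one would indicate that item order carries little signal, which is the kind of finding the abstract reports for several popular datasets.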

Full text in ACM Digital Library
RES Efficient Inference of Sub-Item Id-based Sequential Recommendation Models with Millions of Items
by Aleksandr Vladimirovich Petrov (University of Glasgow), Craig Macdonald (University of Glasgow) and Nicola Tonellotto (University of Pisa)

Transformer-based recommender systems, such as BERT4Rec or SASRec, achieve state-of-the-art results in sequential recommendation. However, it is challenging to use these models in production environments with catalogues of millions of items: scaling Transformers beyond a few thousand items is problematic for several reasons, including high model memory consumption and slow inference. In this respect, RecJPQ is a state-of-the-art method for reducing the models’ memory consumption; RecJPQ compresses item catalogues by decomposing item IDs into a small number of shared sub-item IDs. Despite reporting a reduction in memory consumption by a factor of up to 50×, the original RecJPQ paper did not report inference efficiency improvements over the baseline Transformer-based models. Upon analysing RecJPQ’s scoring algorithm, we find that its efficiency is limited by its use of score accumulators for each item, which prevents parallelisation. In contrast, LightRec (a non-sequential method that uses a similar idea of sub-IDs) reported large inference efficiency improvements using an algorithm we call PQTopK. We show that it is also possible to improve the inference efficiency of RecJPQ-based models using the PQTopK algorithm. In particular, we speed up RecJPQ-enhanced SASRec by a factor of 4.5× compared to the original SASRec’s inference method and by a factor of 1.56× compared to the method implemented in the RecJPQ code on the large-scale Gowalla dataset with more than a million items. Further, using simulated data, we show that PQTopK remains efficient with catalogues of up to tens of millions of items, removing one of the last obstacles to using Transformer-based models in production environments with large catalogues.
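The core of sub-ID-based scoring can be sketched in Python with NumPy. Assuming an item’s embedding is the concatenation of its m sub-item embeddings (as in product quantization), its score against a sequence embedding is the sum of m per-split dot products. Computing the m × K sub-item scores once, then gathering and summing them for all items in a single vectorized step, avoids per-item accumulators. This follows the general PQTopK idea as we read it from the abstract, not the authors’ implementation; the array shapes and names are assumptions.

```python
import numpy as np


def pq_topk(query, centroids, codes, k):
    """Top-k items scored through sub-item IDs.

    query:     (m * d_sub,) sequence embedding produced by the Transformer
    centroids: (m, K, d_sub) sub-item embeddings for each of the m splits
    codes:     (num_items, m) sub-item ID of every item in every split
    """
    m, _, d_sub = centroids.shape
    q = query.reshape(m, d_sub)
    # 1) Only m * K dot products: the score of every sub-item ID in every split.
    sub_scores = np.einsum("md,mkd->mk", q, centroids)
    # 2) Item score = sum of its m sub-item scores (one vectorised gather + sum).
    item_scores = sub_scores[np.arange(m), codes].sum(axis=1)
    # 3) Partial selection of the k best items, then sort just those k.
    top = np.argpartition(-item_scores, k - 1)[:k]
    return top[np.argsort(-item_scores[top])], item_scores


rng = np.random.default_rng(0)
m, K, d_sub, num_items = 4, 256, 16, 20_000
centroids = rng.normal(size=(m, K, d_sub))
codes = rng.integers(0, K, size=(num_items, m))
query = rng.normal(size=m * d_sub)
top_items, scores = pq_topk(query, centroids, codes, k=10)
print(top_items.shape)  # (10,)
```

The scores are identical to a dot product with fully reconstructed item embeddings, but the per-query work on the item axis is reduced to an integer gather and a sum, and `np.argpartition` avoids sorting the whole catalogue.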

Full text in ACM Digital Library
RESEmbSum: Leveraging the Summarization Capabilities of Large Language Models for Content-Based Recommendations
by Chiyu Zhang (University of British Columbia), Yifei Sun (Meta), Minghao Wu (Monash University), Jun Chen (Meta), Jie Lei (Meta), Muhammad Abdul-Mageed (The University of British Columbia), Rong Jin (Meta), Angli Liu (Meta), Ji Zhu (Meta), Sem Park (Meta), Ning Yao (Meta) and Bo Long (Meta)

Content-based recommendation systems play a crucial role in delivering personalized content to users in the digital world. In this work, we introduce EmbSum, a novel framework that enables offline pre-computation of user and candidate-item representations while capturing the interactions within the user engagement history. By utilizing a pretrained encoder-decoder model and poly-attention layers, EmbSum derives a User Poly-Embedding (UPE) and a Content Poly-Embedding (CPE) to calculate relevance scores between users and candidate items. EmbSum actively learns from long user engagement histories by generating user-interest summaries under supervision from a large language model (LLM). The effectiveness of EmbSum is validated on two datasets from different domains, where it surpasses state-of-the-art (SoTA) methods with higher accuracy and fewer parameters. Additionally, the model’s ability to generate summaries of user interests serves as a valuable by-product, enhancing its usefulness for personalized content recommendation.

Full text in ACM Digital Library
RESEnhancing Sequential Music Recommendation with Negative Feedback-informed Contrastive Learning
by Pavan Seshadri (Georgia Institute of Technology), Shahrzad Shashaani (TU Wien) and Peter Knees (TU Wien)

Modern music streaming services rely heavily on recommendation engines to serve content to users. Sequential recommendation, that is, continuously providing new items within a single session in a contextually coherent manner, is an emerging topic in the current literature. User feedback (a positive or negative response to the item presented) is used to drive content recommendations by learning user preferences. We extend this idea to session-based recommendation to provide context-coherent music recommendations by modelling negative user feedback, i.e., skips, in the loss function.

We propose a sequence-aware contrastive sub-task to structure item embeddings in session-based music recommendation, such that true next-positive items (ignoring skipped items) are placed closer together in the session embedding space, while skipped tracks are pushed farther away from all items in the session. This directly affects item rankings when a K-nearest-neighbors search is used for next-item recommendation, while also promoting the rank of the true next item. Experiments incorporating this task into SoTA methods for sequential item recommendation show consistent performance gains in terms of next-item hit rate, item ranking, and skip down-ranking on three music recommendation datasets, with the methods benefiting strongly from the increasing presence of user feedback.
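The abstract does not give the exact loss. A hypothetical InfoNCE-style formulation of the idea, written in NumPy: within one session, each non-skipped track is pulled towards the next non-skipped track, while every skipped track in the session acts as a negative. The function name, the use of cosine similarity, and the temperature `tau` are assumptions for illustration, not the authors' definition.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def skip_aware_contrastive_loss(item_embs, skipped, tau=0.1):
    """Hypothetical skip-aware contrastive loss for one session.

    item_embs: (T, d) embeddings of the tracks in play order
    skipped:   (T,) bool array, True where the user skipped the track
    """
    kept = np.flatnonzero(~skipped)   # positions of non-skipped tracks
    negatives = item_embs[skipped]    # every skipped track is a negative
    losses = []
    # Anchor = a non-skipped track, positive = the next non-skipped track.
    for a, p in zip(kept[:-1], kept[1:]):
        pos = np.exp(cosine(item_embs[a], item_embs[p]) / tau)
        neg = sum(np.exp(cosine(item_embs[a], n) / tau) for n in negatives)
        losses.append(-np.log(pos / (pos + neg)))
    return float(np.mean(losses)) if losses else 0.0
```

The loss is small when consecutive kept tracks are similar and skipped tracks point away from them, and large in the opposite configuration.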

Full text in ACM Digital Library
RESEvaluation and simplification of text difficulty using LLMs in the context of recommending texts in French to facilitate language learning
by Henri Jamet (University of Lausanne), Maxime Manderlier (University of Mons (UMONS)), Yash Raj Shrestha (University of Lausanne) and Michalis Vlachos (University of Lausanne)

Learning a new language can be challenging. To help learners, we built a recommendation system that suggests texts and videos based on the learners’ language skill level and topic interests. Our system analyzes content to determine its difficulty and topic and, if needed, can simplify complex texts while preserving their meaning. Our work explores the holistic use of Large Language Models (LLMs) for the various sub-tasks involved in accurate recommendation: difficulty estimation and simplification, the graph recommender engine, and topic estimation. We present a comprehensive evaluation comparing zero-shot and fine-tuned LLMs, demonstrating significant improvements in French content difficulty prediction (18–56%), topic prediction accuracy (27%), and recommendation relevance (up to 18% NDCG increase).

Full text in ACM Digital Library
RESFairness Matters: A look at LLM-generated group recommendations
by Antonela Tommasel (CONICET-UNCPBA, ISISTAN)

Recommender systems play a crucial role in how users consume information, and group recommendation has received considerable attention. Ensuring fairness in group recommender systems entails providing recommendations that are useful and relevant to all group members rather than solely reflecting the majority’s preferences, while also addressing fairness concerns related to sensitive attributes (e.g., gender). Recently, advancements in Large Language Models (LLMs) have enabled the development of new kinds of recommender systems. However, LLMs can perpetuate social biases present in their training data, posing risks of unfair outcomes and harmful impacts. We investigated the impact of LLMs on group recommendation fairness, establishing and instantiating a framework that encompasses group definition, sensitive attribute combinations, and evaluation methodology. Our findings revealed the interaction patterns between sensitive attributes and LLMs and how they affected recommendation. This study advances the understanding of fairness considerations in group recommendation systems, laying the groundwork for future research.

Full text in ACM Digital Library
RESGLAMOR: Graph-based LAnguage MOdel embedding for citation Recommendation
by Zafar Ali (Southeast University), Guilin Qi (Southeast University), Irfan Ullah (Shaheed Benazir Bhutto University), Adam A. Q. Mohammed (Southeast University), Pavlos Kefalas (Aristotle University of Thessaloniki) and Khan Muhammad (Sungkyunkwan University)

The exponential growth of digital publishing has created vast scholarly collections. Guiding researchers to relevant resources is crucial, and knowledge graphs (KGs) are key tools for unlocking hidden knowledge. However, current methods focus on external links between concepts, ignoring the rich information within individual papers. Challenges such as insufficient multi-relational data, name ambiguity, and cold-start issues further limit existing KG-based methods, which fail to capture the intricate attributes of diverse entities. To solve these issues, we propose GLAMOR, a robust KG framework encompassing entities (e.g., authors, papers, fields of study, and concepts) along with their semantic interconnections. GLAMOR uses a novel random-walk-based KG text generation method and then fine-tunes the language model on the generated text. The resulting context-preserving embeddings then facilitate superior top-k predictions. Evaluation results on two public benchmark datasets demonstrate GLAMOR’s superiority over state-of-the-art methods, especially in solving the cold-start problem.

Full text in ACM Digital Library
RESIt’s (not) all about that CTR: A Multi-Stakeholder Perspective on News Recommender Metrics
by Hanne Vandenbroucke (imec-SMIT Vrije Universiteit Brussel) and Annelien Smets (imec-SMIT Vrije Universiteit Brussel)

Recommender systems are increasingly used by news media organizations. Existing literature examines various aspects of news recommender systems (NRS) from a computational, user-centric, or normative perspective. Yet researchers advocate studying the complexities of real-world applications of NRS. Recently, a multi-stakeholder approach to NRS has been adopted, which makes it possible to understand different stakeholder perspectives on NRS development and evaluation within the news organization. However, little research has been done on the different key performance indicators (KPIs) and metrics considered valuable by different stakeholders. Based on 11 interviews with professionals from two commercial news publishers, this paper demonstrates that stakeholders prioritize distinct KPIs and metrics related to the reach-engagement-conversion-retention funnel. The evaluation of NRS performance is often limited to short-term metrics like CTR, overlooking the multiplicity of stakeholders involved. Our findings reveal how different purposes, KPIs, and metrics are valued from the perspectives of journalistic, commercial, and tech logic. In doing so, this paper contributes to the multi-stakeholder approach to NRS, advancing our understanding of the real-world complexity of NRS development and evaluation.

Full text in ACM Digital Library
RESIt’s Not You, It’s Me: The Impact of Choice Models and Ranking Strategies on Gender Imbalance in Music Recommendation
by Andres Ferraro (Pandora/SiriusXM), Michael D. Ekstrand (Drexel University) and Christine Bauer (Paris Lodron University Salzburg)

As recommender systems are prone to various biases, mitigation approaches are needed to ensure that recommendations are fair to various stakeholders. One particular concern in music recommendation is artist gender fairness. Recent work has shown that the gender imbalance in the sector translates to the output of music recommender systems, creating a feedback loop that can reinforce gender biases over time.

In this work, we examine that feedback loop to study whether algorithmic strategies or user behavior contribute more to ongoing improvement (or loss) in fairness as models are repeatedly re-trained on new user feedback data. We simulate user interaction and re-training to investigate the effects of ranking strategies and user choice models on gender fairness metrics. We find that re-ranking strategies have a greater effect than user choice models on recommendation fairness over time.

Full text in ACM Digital Library
RESKnowledge-Enhanced Multi-Behaviour Contrastive Learning for Effective Recommendation
by Zeyuan Meng (University of Glasgow), Zixuan Yi (University of Glasgow) and Iadh Ounis (University of Glasgow)

Real-world recommendation scenarios usually need to handle diverse user-item interaction behaviours, including page views, adding items to carts, and purchasing. The interactions that precede the actual target behaviour (e.g., purchasing an item) capture the user’s preferences from different angles, and are used as auxiliary information (e.g., page views) to enrich the system’s knowledge about the users’ preferences, thereby helping to enhance recommendation for the target behaviour. Despite efforts in modelling the users’ multi-behaviour interaction information, existing multi-behaviour recommenders still face two challenges: (1) Data sparsity across multiple user behaviours is a common issue that limits recommendation performance, particularly for the target behaviour, which typically exhibits fewer interactions than the auxiliary behaviours; (2) Auxiliary behaviours can be noisy, i.e., the information they carry might be irrelevant for recommendation. In this case, directly applying contrastive learning between the target behaviour and the auxiliary behaviours amplifies the noise in the auxiliary behaviours, thereby distorting the real semantics that can be derived from the target behaviour. To address these two challenges, we propose a new model called Knowledge-Enhanced Multi-behaviour Contrastive Learning for Recommendation (KEMCL). In particular, to address the sparsity of user multi-behaviour interaction information, we leverage a dual-perspective knowledge encoding component that enriches the semantic representations of items, and generate supervision signals through self-supervised learning to enhance recommendation. In addition, we develop a cross-behaviour learning component, which includes two contrastive learning (CL) methods, inter CL and intra CL, to alleviate the problem of noisy auxiliary interactions.
Extensive experiments on three public recommendation datasets show that our proposed KEMCL model significantly outperforms seven existing state-of-the-art methods. In particular, our KEMCL model outperforms the best baseline, namely KMCLR, by 5.42% on the large Tmall dataset.

Full text in ACM Digital Library
RESLearned Ranking Function: From Short-term Behavior Predictions to Long-term User Satisfaction
by Yi Wu (Google), Daryl Chang (Google), Jennifer She (Google), Zhe Zhao (Google), Li Wei (Google) and Lukasz Heldt (Google)

We present the Learned Ranking Function (LRF), a system that takes short-term user-item behavior predictions as input and outputs a slate of recommendations that directly optimizes for long-term user satisfaction. Most previous work is based on optimizing the hyperparameters of a heuristic function. We propose to model the problem directly as a slate optimization problem with the objective of maximizing long-term user satisfaction. We also develop a novel constraint optimization algorithm that stabilizes objective tradeoffs for multi-objective optimization. We evaluate our approach with live experiments and describe its deployment on YouTube.

Full text in ACM Digital Library
RESLLMs for User Interest Exploration in Large-scale Recommendation Systems
by Jianling Wang (Google DeepMind), Haokai Lu (Google DeepMind), Yifan Liu (Google), He Ma (Google), Yueqi Wang (Google), Yang Gu (Google), Shuzhou Zhang (Google), Ningren Han (Google), Shuchao Bi (Google), Lexi Baugher (Google), Ed H. Chi (Google DeepMind) and Minmin Chen (Google DeepMind)

Traditional recommendation systems are subject to a strong feedback loop by learning from and reinforcing past user-item interactions, which in turn limits the discovery of novel user interests. To address this, we introduce a hybrid hierarchical framework combining Large Language Models (LLMs) and classic recommendation models for user interest exploration. The framework controls the interfacing between the LLMs and the classic recommendation models through “interest clusters”, the granularity of which can be explicitly determined by algorithm designers. It recommends the next novel interests by first representing “interest clusters” using language and then employing a fine-tuned LLM to generate novel interest descriptions that are strictly within these predefined clusters. At the low level, it grounds these generated interests to an item-level policy by restricting classic recommendation models, in this case a transformer-based sequence recommender, to return items that fall within the novel clusters generated at the high level. We showcase the efficacy of this approach on an industrial-scale commercial platform serving billions of users. Live experiments show a significant increase in both exploration of novel interests and overall user enjoyment of the platform.

Full text in ACM Digital Library
RESMAWI Rec: Leveraging Severe Weather Data in Recommendation
by Brendan Andrew Duncan (UC San Diego), Surya Kallumadi (Lowe’s Companies, Inc.), Taylor Berg-Kirkpatrick (UC San Diego) and Julian Mcauley (University of California San Diego)

Inferring user intent in recommender systems can help performance but is difficult because intent is personal and not directly observable. Previous work has leveraged signals that stand as proxies for intent (e.g., user interactions with resource pages), but such signals are not always available. In this paper, we instead recognize that certain observable events directly influence user intent. For example, after a flood, home improvement customers are more likely to undertake a renovation project to dry out their basement. We introduce MAWI Rec, a recommender system that leverages severe weather data to improve recommendation. Our weather-aware system achieves a significant improvement over a state-of-the-art baseline for online and in-store datasets of home improvement customers. This gain is most significant for weather-related product categories such as roof panels and flashings.

Full text in ACM Digital Library
RESMODEM: Decoupling User Behavior for Shared-Account Video Recommendations on Large Screen Devices
by Jiang Li (University of Science and Technology of China), Zhen Zhang (Kuaishou Technology Co., Ltd.), Xiang Feng (Kuaishou Technology Co., Ltd.), Muyang Li (Kuaishou Technology Co., Ltd.), Yongqi Liu (Kuaishou Technology Co., Ltd.) and Lantao Hu (Kuaishou Technology Co., Ltd.)

In sequential recommendation scenarios on large-screen devices, such as tablets or TVs, the device is often shared among multiple users. This sharing leads to a mixture of behaviors from different users, posing significant challenges to recommendation systems, especially when clear supervisory signals for distinguishing among users are absent. Current solutions tend to either operate in an unsupervised manner or rely on constructed supervisory signals that are not entirely reliable. Moreover, the peculiarities of short video recommendation in this context have not been thoroughly explored in existing research. In response to these challenges, this paper introduces the Multi-User Contrastive Decoupling Model (MODEM), a novel short video recommendation model specifically designed for large-screen devices. MODEM leverages an attention mechanism, grounded in session segmentation, to disentangle the intertwined user behavior histories. It also discriminates between the impacts of long and short viewing behaviors on short video recommendations by cross-analyzing sequences of both. Furthermore, we have developed a contrastive learning method to effectively supervise the decoupling of user behaviors. Our evaluations demonstrate noticeable improvements through both offline assessments on public datasets and online A/B testing within Kuaishou’s short video recommendation environment on large-screen devices. Specifically, our online A/B tests resulted in a 0.55% increase in watch time. These results underscore MODEM’s efficacy in enhancing recommendation quality in shared-account contexts.

Full text in ACM Digital Library
RESMulti-Behavioral Sequential Recommendation
by Shereen Elsayed (University of Hildesheim), Ahmed Rashed (Volkswagen Financial Services AG) and Lars Schmidt-Thieme (University of Hildesheim)

Sequential recommendation models are crucial for next-item prediction tasks in various online platforms, yet many focus on a single behavior, neglecting valuable implicit interactions. While multi-behavioral models address this using graph-based approaches, they often fail to capture sequential patterns simultaneously. Our proposed Multi-Behavioral Sequential Recommendation framework (MBSRec) captures the multi-behavior dependencies between heterogeneous historical interactions via multi-head self-attention. Furthermore, we use a weighted binary cross-entropy loss for precise behavior control. Experimental results on four datasets demonstrate that MBSRec significantly outperforms state-of-the-art approaches. The implementation code is available here.
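The abstract does not define MBSRec's weighting scheme. As a hypothetical sketch of what a behaviour-weighted binary cross-entropy can look like: each training target carries a behaviour type (e.g., click, cart, purchase), each behaviour type has its own weight, and the weighted per-example losses are normalised by the total weight. The function name, argument layout, and normalisation are assumptions.

```python
import numpy as np

def weighted_bce(logits, labels, behaviour, weights):
    """Behaviour-weighted binary cross-entropy (hypothetical formulation).

    logits, labels: (N,) predicted logits and 0/1 targets
    behaviour:      (N,) integer behaviour type of each target
    weights:        (B,) weight for each behaviour type
    """
    # Numerically stable BCE from logits:
    #   -log(sigmoid(x))     = logaddexp(0, -x)
    #   -log(1 - sigmoid(x)) = logaddexp(0,  x)
    per_example = (labels * np.logaddexp(0.0, -logits)
                   + (1 - labels) * np.logaddexp(0.0, logits))
    w = weights[behaviour]
    # Normalise by the total weight so the loss scale does not depend on N.
    return float(np.sum(w * per_example) / np.sum(w))
```

Setting a behaviour's weight to zero removes its targets from the loss entirely; equal weights recover the ordinary mean binary cross-entropy.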

Full text in ACM Digital Library
RESNeighborhood-Based Collaborative Filtering for Conversational Recommendation
by Zhouhang Xie (University of California San Diego), Junda Wu (University of California San Diego), Hyunsik Jeon (University of California San Diego), Zhankui He (University of California San Diego), Harald Steck (Netflix Inc.), Rahul Jha (Netflix Inc.), Dawen Liang (Netflix Inc.), Nathan Kallus (Cornell University) and Julian Mcauley (University of California San Diego)

Conversational recommender systems (CRS) should understand users’ expressed interests, which are frequently semantically rich and knowledge-intensive. Prior works attempt to address this challenge by using external knowledge bases or parametric knowledge in large language models (LLMs). In this paper, we study a complementary solution, exploiting item knowledge in the training data. We hypothesize that many inference-time user requests can be answered by reusing popular crowd-written answers associated with similar training queries. Following this intuition, we define a class of neighborhood-based CRS that makes recommendations by identifying items commonly associated with similar training dialogue contexts. Experiments on Inspired, Redial, and Reddit-Movie benchmarks show our method outperforms state-of-the-art LLMs with 2 billion parameters, and offers on-par performance to 7 billion parameter models while using over 170 times less GPU memory. We also show neighborhood and model-based predictions can be combined to achieve further performance improvements.
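To illustrate the neighborhood idea only (this is not the authors' implementation), the sketch below assumes each training dialogue context has already been embedded by some text encoder and is paired with the items mentioned in its crowd-written response. An inference-time query is answered by scoring items with the summed cosine similarities of the k most similar training contexts that mention them. All names, the encoder, and the scoring rule are hypothetical.

```python
from collections import Counter

import numpy as np

def knn_recommend(query_vec, train_vecs, train_items, k=2, n_rec=3):
    """Hypothetical neighborhood-based recommender for dialogue contexts.

    query_vec:   (d,) embedding of the current dialogue context
    train_vecs:  (N, d) embeddings of training dialogue contexts
    train_items: list of N item lists from the matching training responses
    """
    q = query_vec / np.linalg.norm(query_vec)
    t = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    sims = t @ q                          # cosine similarity to each context
    neighbours = np.argsort(-sims)[:k]    # k most similar training contexts
    scores = Counter()
    for idx in neighbours:
        for item in train_items[idx]:
            scores[item] += sims[idx]     # similarity-weighted item votes
    return [item for item, _ in scores.most_common(n_rec)]

train_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
train_items = [["Alien", "Aliens"], ["Aliens", "The Thing"], ["Notting Hill"]]
recs = knn_recommend(np.array([1.0, 0.05]), train_vecs, train_items)
```

Items mentioned by several close neighbours accumulate the most weight, which mirrors the abstract's intuition of reusing popular crowd-written answers to similar queries.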

Full text in ACM Digital Library
RESOh, Behave! Country Representation Dynamics Created by Feedback Loops in Music Recommender Systems
by Oleg Lesota (Johannes Kepler University Linz and Linz Institute of Technology), Jonas Geiger (Johannes Kepler University Linz and Linz Institute of Technology), Max Walder (Johannes Kepler University Linz and Linz Institute of Technology), Dominik Kowald (Know-Center GmbH and TU Graz) and Markus Schedl (Johannes Kepler University Linz and Linz Institute of Technology)

Recent work suggests that music recommender systems are prone to disproportionately frequent recommendations of music from countries more prominently represented in the training data, notably the US. However, it remains unclear to what extent feedback loops in music recommendation influence the dynamics of such imbalance. In this work, we investigate the dynamics of representation of local (i.e., country-specific) and US-produced music in user profiles and recommendations. To this end, we conduct a feedback loop simulation study using the LFM-2b dataset. The results suggest that most of the investigated recommendation models decrease the proportion of music from local artists in their recommendations. Furthermore, we find that models preserving average proportions of US and local music do not necessarily provide country-calibrated recommendations. We also look into popularity calibration and, surprisingly, find that the most popularity-calibrated model in our study (ItemKNN) provides the least country-calibrated recommendations. In addition, users from less represented countries (e.g., Finland) are, in the long term, most affected by the under-representation of their local music in recommendations.

Full text in ACM Digital Library
RESOn Interpretability of Linear Autoencoders
by Martin Spišák (Recombee), Radek Bartyzal (GLAMI), Antonín Hoskovec (GLAMI; Czech Technical University in Prague) and Ladislav Peška (Charles University)

We derive a novel graph-based interpretation of the linear autoencoder models EASE^R, SLIM, and their approximate variants. Contrary to popular belief, we reveal that the weights of these models should not be interpreted as dichotomic item similarity but merely as its magnitude. Consequently, we propose a simple modification that considerably improves retrieval ability in sparse domains and yields interpretable inference with negative inputs, as demonstrated by both offline and online experiments. Experiment codes and extended results are available at https://osf.io/bjmuv/.
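For background, the standard closed-form EASE^R solution fits in a few lines of NumPy and makes the item-item weight matrix under discussion concrete. It is not the modification proposed in this paper, which the abstract does not specify. `X` is assumed to be a binary user-item interaction matrix and `lam` the L2 regularisation strength; the function name is hypothetical.

```python
import numpy as np

def ease_r(X, lam=100.0):
    """Closed-form EASE^R item-item weights B with a zero diagonal.

    X:   (n_users, n_items) interaction matrix
    lam: L2 regularisation strength
    """
    G = X.T @ X + lam * np.eye(X.shape[1])   # regularised Gram matrix
    P = np.linalg.inv(G)
    B = -P / np.diag(P)                      # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)                 # constraint: no self-similarity
    return B

rng = np.random.default_rng(0)
X = (rng.random((50, 6)) < 0.3).astype(float)
B = ease_r(X, lam=5.0)
scores = X @ B    # predicted scores for every user-item pair
```

Each column j of B equals a ridge regression of item j on all other items, so off-diagonal weights can take either sign; this is relevant to the abstract's point about how these weights should be interpreted.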

Full text in ACM Digital Library
RESOne-class recommendation systems with the hinge pairwise distance loss and orthogonal representations
by Ramin Raziperchikolaei (Rakuten Group, Inc.) and Young-joo Chung (Rakuten Group, Inc.)

In one-class recommendation systems, the goal is to learn a model from a small set of observed user-item interactions and then identify the positively related (i.e., similar) user-item pairs among a large number of pairs with unknown interactions. Most loss functions in the literature rely on dissimilar pairs of users and items, sampled from the pairs with unknown interactions, to obtain better prediction performance. The main issue with this strategy is that it needs a large number of dissimilar pairs, which increases the training time significantly. In this paper, our goal is to use only the similar set to train the models and to discard the dissimilar set. We highlight three trivial solutions that recommendation system models converge to when they are trained only on similar pairs: collapsed and dimensionally collapsed solutions. We propose a hinge pairwise loss and an orthogonality term that can be added to the objective functions in the literature to avoid these trivial solutions. We conduct experiments on various tasks on public and real-world datasets, which show that our approach using only similar pairs can be trained several times faster than the state-of-the-art methods while achieving competitive results.

Full text in ACM Digital Library
RESPay Attention to Attention for Sequential Recommendation
by Yuli Liu (Qinghai University), Min Liu (Qinghai University) and Xiaojing Liu (Qinghai University)

Transformer-based approaches have demonstrated remarkable success in various sequence-based tasks. However, traditional self-attention models may not sufficiently capture the intricate dependencies among items in sequential recommendation scenarios. This is due to the lack of explicit emphasis on attention weights, which play a critical role in allocating attention and understanding item-to-item correlations. To better exploit the potential of attention weights and improve the capability of sequential recommendation in learning high-order dependencies, we propose a novel sequential recommendation (SR) approach called attention weight refinement (AWRSR). AWRSR enhances the effectiveness of self-attention by additionally paying attention to attention weights, allowing for more refined attention distributions of correlations among items. We conduct comprehensive experiments on multiple real-world datasets, demonstrating that our approach consistently outperforms state-of-the-art SR models. Moreover, we provide a thorough analysis of AWRSR’s effectiveness in capturing higher-level dependencies. These findings suggest that AWRSR offers a promising new direction for enhancing the performance of self-attention architectures in SR tasks, with potential applications in other sequence-based problems as well.

Full text in ACM Digital Library
RESPositive-Sum Impact of Multistakeholder Recommender Systems for Urban Tourism Promotion and User Utility
by Pavel Merinov (Free University of Bozen-Bolzano) and Francesco Ricci (Free University of Bozen-Bolzano)

When a multistakeholder recommender system (MRS) is designed to produce sustainable urban tourism promotion, two conflicting goals are of practical interest: (i) to cut down the number of visitors at popular sites and (ii) to satisfy tourists’ preferences, which are often biased towards popular sites. By modelling the tourists’ limited knowledge of the visited city (an important but often overlooked detail), we simulate interactions between tourists and an MRS that jointly optimises tourists’ utility and promotes less popular sites. Experiments based on data logs collected in three tourist cities reveal that such an MRS can lift tourists’ utility and at the same time reduce the number of visitors at popular sites, manifesting a so-called positive-sum impact. However, a delicate balance is crucial: under- or over-promotion of unpopular sites in the recommendation lists can be adverse to both the destination and tourists’ utility.

Full text in ACM Digital Library
RESPromoting Two-sided Fairness with Adaptive Weights for Providers and Customers in Recommendation
by Lanling Xu (Renmin University of China), Zihan Lin (KuaiShou Inc.), Jinpeng Wang (Meituan Group), Sheng Chen (Meituan Group), Wayne Xin Zhao (Renmin University of China) and Ji-Rong Wen (Renmin University of China)

At present, most recommender systems involve two stakeholders, providers and customers. Apart from maximizing recommendation accuracy, the fairness issue for both sides should also be considered. Most previous studies try to improve two-sided fairness with post-processing algorithms or fairness-aware loss constraints, which depend heavily on heuristic adjustments without regard to the accuracy optimization goal. In contrast, we propose a novel training framework, adaptive weighting towards two-sided fairness-aware recommendation (named Ada2Fair), which extends the accuracy-focused objective to a controllable preference learning loss over the interaction data. Specifically, we adjust the optimization scale of each interaction sample with an adaptive weight generator, and estimate the two-sided fairness-aware weights within model training. During training, the recommender is trained with these two-sided fairness-aware weights to boost the utility of niche providers and inactive customers in a unified way. Extensive experiments on three public datasets verify the effectiveness of Ada2Fair, which can achieve Pareto efficiency in two-sided fairness-aware recommendation.

Full text in ACM Digital Library
RESRecommending Healthy and Sustainable Meals exploiting Food Retrieval and Large Language Models
by Alessandro Petruzzelli (University of Bari Aldo Moro), Cataldo Musto (University of Bari Aldo Moro), Michele Ciro Di Carlo (University of Bari Aldo Moro), Giovanni Tempesta (University of Bari Aldo Moro) and Giovanni Semeraro (University of Bari Aldo Moro)

Given the rising global concerns about healthy nutrition and environmental sustainability, individuals need more and more support in making good choices concerning their daily meals. To this end, in this paper we introduce HeaSE, a framework for Healthy And Sustainable Eating. Given an input recipe, HeaSE identifies healthier and more sustainable meals by exploiting retrieval techniques and large language models. The framework works in two steps. First, it uses food retrieval strategies based on macro-nutrient information to identify candidate alternative meals. This ensures that the substitutions maintain a similar nutritional profile. Next, HeaSE employs large language models to re-rank these potential replacements while considering factors beyond just nutrition, such as the recipe’s environmental impact. In the experimental evaluation, we showed the capabilities of LLMs in identifying more sustainable and healthier alternatives within a set of candidate options. This highlights the potential of these models to guide users towards food choices that are both nutritious and environmentally responsible.

Full text in ACM Digital Library
RESRecommending Personalised Targeted Training Adjustments for Marathon Runners
by Ciara Feely (University College Dublin), Brian Caulfield (University College Dublin), Aonghus Lawlor (University College Dublin) and Barry Smyth (University College Dublin)

Preparing for a marathon involves many weeks of dedicated training. Achieving the right balance between building strength and endurance and the need for rest and recovery is a must if a runner is to arrive at the start-line injury-free and ready to achieve their desired finish-time. However, because most recreational runners rely on generic training plans, they can struggle to find this balance, which can impact their motivation, health, and performance. In this paper, we describe a novel case-based reasoning approach to fine-tuning a runner’s training by recommending training adjustments based on the patterns of similar runners at corresponding points in their marathon training. The approach targets training adjustments based on similar runners with varying race goals, allowing runners to adjust their training for slower or faster finish-times as their training progresses and motivations change. We evaluate the recommendations produced using a large-scale real-world dataset according to several factors: (i) the plausibility of the recommended training adjustment, (ii) the effectiveness of the adjustment when it comes to achieving a particular performance goal, and (iii) the safety of the adjustment in terms of the degree of risk that it will lead to an injury or otherwise disrupt training. Our findings suggest that plausible, effective, and safe recommendations can be generated for runners when evaluated against a range of race goals.

Full text in ACM Digital Library
[RES] Revisiting LightGCN: Unexpected Inflexibility, Inconsistency, and A Remedy Towards Improved Recommendation
by Geon Lee (KAIST), Kyungho Kim (KAIST) and Kijung Shin (KAIST)

Graph Neural Networks (GNNs) have emerged as effective tools in recommender systems. Among various GNN models, LightGCN is distinguished by its simplicity and outstanding performance. Its efficiency has led to widespread adoption across different domains, including social, bundle, and multimedia recommendations. In this paper, we thoroughly examine the mechanisms of LightGCN, focusing on its strategies for scaling embeddings, aggregating neighbors, and pooling embeddings across layers. Our analysis reveals that, contrary to expectations based on its design, LightGCN suffers from inflexibility and inconsistency when applied to real-world data.

We introduce LightGCN++, an enhanced version of LightGCN designed to address the identified limitations. LightGCN++ incorporates flexible scaling of embedding norms and neighbor weighting, along with a tailored approach for pooling layer-wise embeddings to resolve the identified inconsistencies. Despite the simplicity of this remedy, extensive experimental results demonstrate that LightGCN++ significantly outperforms LightGCN, achieving an improvement of up to 17.81% in terms of NDCG@20. Furthermore, state-of-the-art models utilizing LightGCN as a backbone for item, bundle, multimedia, and knowledge-graph-based recommendations exhibit improved performance when equipped with LightGCN++.
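For orientation, LightGCN propagates embeddings by summing neighbour embeddings with symmetric normalisation $1/(\sqrt{|N_u|}\sqrt{|N_i|})$ and pools the layers with uniform weights. The sketch below makes the two normalisation exponents (`alpha`, `beta`) and the weight on layer 0 (`gamma`) free parameters; `alpha = beta = 0.5` with `gamma=None` recovers LightGCN. This parameterisation is an illustrative assumption in the spirit of "flexible neighbor weighting" and "tailored pooling", not necessarily the exact LightGCN++ formulation, and it does not model the embedding-norm scaling.

```python
from collections import defaultdict


def lightgcn_embeddings(edges, emb0, num_layers=2, alpha=0.5, beta=0.5, gamma=None):
    """LightGCN-style propagation on a bipartite user-item graph (pure Python).

    edges: (user, item) pairs. emb0: layer-0 embeddings keyed by ("u", id) / ("i", id).
    Layer k+1: e_v = sum_{w in N(v)} e_w / (|N(v)|**alpha * |N(w)|**beta).
    Pooling: gamma=None averages all layers uniformly (as in LightGCN); otherwise
    layer 0 gets weight gamma and layers 1..num_layers share 1 - gamma (num_layers >= 1).
    """
    neigh = defaultdict(list)
    for u, i in edges:
        neigh[("u", u)].append(("i", i))
        neigh[("i", i)].append(("u", u))
    layers = [emb0]
    for _ in range(num_layers):
        prev, nxt = layers[-1], {}
        for v in prev:
            nbrs = neigh[v]  # empty list for isolated nodes -> zero vector
            acc = [0.0] * len(prev[v])
            for w in nbrs:
                scale = 1.0 / (len(nbrs) ** alpha * len(neigh[w]) ** beta)
                acc = [a + scale * x for a, x in zip(acc, prev[w])]
            nxt[v] = acc
        layers.append(nxt)
    if gamma is None:
        weights = [1.0 / len(layers)] * len(layers)
    else:
        weights = [gamma] + [(1.0 - gamma) / num_layers] * num_layers
    return {v: [sum(wt * layer[v][d] for wt, layer in zip(weights, layers))
                for d in range(len(emb0[v]))]
            for v in emb0}


edges = [(0, 0), (0, 1), (1, 1)]
emb0 = {("u", 0): [1.0, 0.0], ("u", 1): [0.0, 1.0],
        ("i", 0): [0.5, 0.5], ("i", 1): [1.0, 1.0]}
print(lightgcn_embeddings(edges, emb0, alpha=0.6, beta=0.4, gamma=0.2))
```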

Full text in ACM Digital Library
[RES] Self-Attentive Sequential Recommendations with Hyperbolic Representations
by Evgeny Frolov (AIRI), Tatyana Matveeva (HSE University), Leyla Mirvakhabova (Skolkovo Institute of Science and Technology) and Ivan Oseledets (AIRI)

In recent years, self-attentive sequential learning models have surpassed conventional collaborative filtering techniques in next-item recommendation tasks. However, the Euclidean geometry used in these models may not be optimal for capturing the complex structure of behavioral data. Building on recent advances in the application of hyperbolic geometry to collaborative filtering tasks, we propose a novel approach that leverages hyperbolic geometry in the sequential learning setting. Our approach replaces the final output of the Euclidean models with a linear predictor in the non-linear hyperbolic space, which increases the representational capacity and improves recommendation quality.

Full text in ACM Digital Library
[RES] Societal Sorting as a Systemic Risk of Recommenders
by Luke Thorburn (King’s College London), Maria Polukarov (King’s College London) and Carmine Ventre (King’s College London)

Political scientists distinguish between polarization (loosely, people moving further apart along a single dimension) and sorting (an increase in the probabilistic dependence between multiple dimensions of individual difference). Among other harms, sorting can increase the risk of conflict escalation by reinforcing us-and-them group identities and reducing the prevalence of cross-cutting affiliations. In this paper, we (i) review normative arguments for high or low sortedness, (ii) summarize the mechanisms by which sortedness can change, and (iii) show that under a simple model of social media recommender-driven preference change, personalized engagement-based ranking creates a systematic tendency towards sorting, while ranking by diverse engagement (sometimes called “bridging-based ranking”) mitigates this tendency. We conclude by considering the implications for those conducting systemic risk assessments of very large online platforms under the EU Digital Services Act.
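Sorting is defined above as increased probabilistic dependence between dimensions of individual difference. One standard way to quantify dependence between two categorical attributes is mutual information; the sketch below estimates it from a hypothetical population. The paper may use a different sortedness measure.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between two categorical attributes,
    estimated from (a, b) observations, one per individual."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Hypothetical (political leaning, residence) pairs.
cross_cutting = [("left", "city"), ("left", "rural"), ("right", "city"), ("right", "rural")]
fully_sorted = [("left", "city"), ("left", "city"), ("right", "rural"), ("right", "rural")]
print(mutual_information(cross_cutting))  # 0.0
print(mutual_information(fully_sorted))   # 1.0
```

Higher values mean that knowing one attribute tells you more about the other, i.e. fewer cross-cutting affiliations.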

Full text in ACM Digital Library
[RES] The MovieLens Beliefs Dataset: Collecting Pre-Choice Data for Online Recommender Systems
by Guy Aridor (Northwestern University), Duarte Goncalves (University College London), Ruoyan Kong (University of Minnesota), Daniel Kluver (University of Minnesota) and Joseph Konstan (University of Minnesota)

An increasingly important aspect of designing recommender systems involves considering how recommendations will influence consumer choices. This paper addresses this issue by introducing a method for collecting user beliefs about goods they have not yet experienced, a critical predictor of choice behavior. We implemented this method on the MovieLens platform, resulting in a rich dataset that combines user ratings, beliefs, and observed recommendations. We document challenges to such data collection, including selection bias in response and limited coverage of the product space. This unique resource empowers researchers to delve deeper into user behavior and analyze user choices in the absence of recommendations, measure the effectiveness of recommendations, and prototype algorithms that leverage user belief data, ultimately leading to more impactful recommender systems. The dataset can be found at https://grouplens.org/datasets/movielens/ml_belief_2024/.

Full text in ACM Digital Library
[RES] Towards Green Recommender Systems: Investigating the Impact of Data Reduction on Carbon Footprint and Algorithm Performances
by Giuseppe Spillo (University of Bari Aldo Moro), Allegra De Filippo (DISI Università di Bologna), Cataldo Musto (University of Bari Aldo Moro), Michela Milano (DISI Università di Bologna) and Giovanni Semeraro (University of Bari Aldo Moro)

This work investigates the path toward green recommender systems by examining the impact of data reduction on both model performance and carbon footprint. In the pursuit of developing energy-efficient recommender systems, we investigated whether and how reducing the training data affects the performance of several representative recommendation models. To obtain a fair comparison, all the models were run using the implementations available in a popular recommendation library, RecBole, with the same experimental settings. Results indicate that: (a) data reduction can be a promising strategy to make recommender systems more sustainable, at the cost of lower accuracy; (b) training recommender systems with less data makes the suggestions more diverse and less biased. Overall, this study contributes to the ongoing discourse on developing recommendation models that meet the principles of the Sustainable Development Goals (SDGs), laying the groundwork for the adoption of more sustainable practices in the field.

Full text in ACM Digital Library
[RES] Δ-OPE: Off-Policy Estimation with Pairs of Policies
by Olivier Jeunen (ShareChat) and Aleksei Ustimenko (ShareChat)

The off-policy paradigm casts recommendation as a counterfactual decision-making task, allowing practitioners to unbiasedly estimate online metrics using offline data. This leads to effective evaluation metrics, as well as learning procedures that directly optimise online success. Nevertheless, the high variance that comes with unbiasedness is typically the crux that complicates practical applications. An important insight is that the difference between policy values can often be estimated with significantly reduced variance, if said policies have positive covariance. This allows us to formulate a pairwise off-policy estimation task: Δ-OPE.

Δ-OPE subsumes the common use-case of estimating improvements of a learnt policy over a production policy, using data collected by a stochastic logging policy. We introduce Δ-OPE methods based on the widely used Inverse Propensity Scoring estimator and its extensions. Moreover, we characterise a variance-optimal additive control variate that further enhances efficiency. Simulated, offline, and online experiments show that our methods significantly improve performance for both evaluation and learning tasks.
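A minimal sketch of the pairwise idea using plain Inverse Propensity Scoring. Instead of estimating each policy's value separately, the difference is estimated in one pass by weighting each logged reward with $(\pi_t(a\mid x)-\pi_p(a\mid x))/\pi_0(a\mid x)$. These weights have expectation zero under the logging policy (given full support), so subtracting any constant `baseline` from the reward leaves the estimate unbiased; that is what an additive control variate exploits. The variance-optimal choice derived in the paper is not reproduced here, and the function and argument names are hypothetical.

```python
def delta_ips(logs, target, production, baseline=0.0):
    """Estimate V(target) - V(production) from logged bandit data.

    logs: (context, action, reward, logging_prob) tuples, where logging_prob
          is pi_0(action | context) under the stochastic logging policy.
    target, production: functions (context, action) -> probability.
    baseline: additive control variate (any constant keeps the estimate
              unbiased; the paper's variance-optimal value is not derived here).
    """
    total = 0.0
    for x, a, r, p0 in logs:
        weight = (target(x, a) - production(x, a)) / p0
        total += weight * (r - baseline)
    return total / len(logs)


# Toy example: one context, two actions, uniform logging policy.
logs = [(None, 0, 1.0, 0.5), (None, 1, 0.0, 0.5)]
target = lambda x, a: 1.0 if a == 0 else 0.0  # always picks action 0
production = lambda x, a: 0.5                 # uniform policy
print(delta_ips(logs, target, production))                # 0.5
print(delta_ips(logs, target, production, baseline=0.5))  # 0.5
```

Here the true difference is 1.0 - 0.5 = 0.5; with realistic stochastic logs the baseline changes the variance of the estimate, not its expectation.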

Full text in ACM Digital Library

List of all reproducibility papers accepted for RecSys 2024 (in alphabetical order).

[REPR] A Comparative Analysis of Text-Based Explainable Recommender Systems
by Alejandro Ariza-Casabona (University of Barcelona), Ludovico Boratto (University of Cagliari) and Maria Salamó (University of Barcelona)

One way to increase users’ trust in recommender systems is to provide each recommendation along with a textual explanation. In the literature, extraction-based, generation-based, and, more recently, hybrid solutions based on retrieval-augmented generation have been proposed to tackle the problem of text-based explainable recommendation. However, the use of different datasets, preprocessing steps, target explanations, baselines, and evaluation metrics hinders reproducibility and makes it difficult to assess the state of the art across model categories, which slows progress in the field. Our aim is to provide a comprehensive analysis of text-based explainable recommender systems by setting up a well-defined benchmark that accommodates generation-based, extraction-based, and hybrid approaches. We also enrich the existing evaluation of the explainability and text quality of the explanations with a novel definition of feature hallucination. Our experiments on three real-world datasets unveil hidden behaviors and confirm several claims about model patterns. Our source code and preprocessed datasets are available at https://github.com/alarca94/text-exp-recsys24.

Full text in ACM Digital Library
[REPR] A Novel Evaluation Perspective on GNNs-based Recommender Systems through the Topology of the User-Item Graph
by Daniele Malitesta (Université Paris-Saclay, CentraleSupélec, Inria), Claudio Pomo (Politecnico di Bari), Vito Walter Anelli (Politecnico di Bari), Alberto Carlo Maria Mancino (Politecnico di Bari), Tommaso Di Noia (Politecnico di Bari) and Eugenio Di Sciascio (Politecnico di Bari)

Recently, graph neural network (GNN)-based recommender systems have achieved great success in recommendation. As the number of GNN-based approaches rises, some works have started questioning the theoretical and empirical reasons behind their superior performance. Nevertheless, this investigation still disregards that GNNs treat the recommendation data as a topological graph structure. Building on this observation, in this work we provide a novel evaluation perspective on GNN-based recommendation, which investigates the impact of the graph topology on recommendation performance. To this end, we select some (topological) properties of the recommendation data and three GNN-based recommender systems (i.e., LightGCN, DGCF, and SVD-GCN). Then, starting from three popular recommendation datasets (i.e., Yelp2018, Gowalla, and Amazon-Book), we sample them to obtain 1,800 size-reduced datasets that still resemble the original ones but encompass a wider range of topological structures. We use this procedure to build a large pool of samples for which the data characteristics and the recommendation performance of the selected GNN models are measured. Through an explanatory framework, we find strong correspondences between graph topology and GNN performance, offering a novel evaluation perspective on these models.
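The abstract does not list which topological properties were measured. As an assumption for illustration, the sketch computes a few common characteristics of a user-item graph (density and average user and item degree) from a hypothetical interaction list.

```python
from collections import Counter


def graph_stats(interactions):
    """Simple topological statistics of a user-item interaction graph.
    interactions: iterable of (user, item) pairs; duplicates are ignored."""
    edges = set(interactions)
    users = Counter(u for u, _ in edges)
    items = Counter(i for _, i in edges)
    return {
        "num_users": len(users),
        "num_items": len(items),
        "density": len(edges) / (len(users) * len(items)),
        "avg_user_degree": len(edges) / len(users),
        "avg_item_degree": len(edges) / len(items),
    }


print(graph_stats([(0, "a"), (0, "b"), (1, "a"), (2, "c")]))
```

Computing such statistics for many sub-sampled datasets and regressing performance on them is one way to relate topology to accuracy.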

Full text in ACM Digital Library
[REPR] AMBAR: A dataset for Assessing Multiple Beyond-Accuracy Recommenders
by Elizabeth Gómez (Universitat de Barcelona), David Contreras (Universidad Arturo Prat), Ludovico Boratto (University of Cagliari) and Maria Salamo (Universitat de Barcelona)

Nowadays, a recommendation model should exploit additional information about both users and items, beyond user-item interaction data. Datasets are central in offering the information required to evaluate new models or algorithms. Although there are many datasets in the literature with user and item properties, several issues are not yet covered: (i) it is difficult to perform cross-analysis of properties at the user and item level, as they are not related in most cases; and (ii) on top of that, in many cases datasets do not allow analysis at different granularity levels. In this paper, we propose a new dataset in the music domain, named AMBAR, that tackles the above-mentioned issues. Besides detailing in depth the structure of the new dataset, we also show its application in contexts (i.e., multi-objective, fair, and calibrated recommendations) where both the effectiveness and the beyond-accuracy perspectives of recommendation are assessed.

Full text in ACM Digital Library
[REPR] Context-based Entity Recommendation for Knowledge Workers: Establishing a Benchmark on Real-life Data
by Mahta Bakhshizadeh (German Research Center for Artificial Intelligence (DFKI); University of Kaiserslautern-Landau (RPTU)), Heiko Maus (German Research Center for Artificial Intelligence (DFKI)) and Andreas Dengel (German Research Center for Artificial Intelligence (DFKI); University of Kaiserslautern-Landau (RPTU))

In recent decades, Recommender Systems (RS) have undergone significant advancements, particularly in popular domains like movies, music, and product recommendations. Yet, progress has been notably slower in leveraging these systems for personal information management and knowledge assistance. In addition to challenges that complicate the adoption of RS in this domain (such as privacy concerns, heterogeneous recommendation items, and frequent context switching), a significant barrier to progress in this area has been the absence of a standardized benchmark for researchers to evaluate their approaches. In response to this gap, this paper presents a benchmark built upon a publicly available dataset of Real-Life Knowledge Work in Context (RLKWiC). This benchmark focuses on evaluating context-based entity recommendation, a use case for leveraging RS to support knowledge workers in their daily digital tasks. By providing this benchmark, this work aims to facilitate and accelerate research on enhancing personal knowledge assistance through RS.

Full text in ACM Digital Library
[REPR] Do Recommender Systems Promote Local Music? A Reproducibility Study Using Music Streaming Data
by Kristina Matrosova (Geographie-Cités), Lilian Marey (Télécom Paris), Guillaume Salha-Galvan (Deezer Research), Thomas Louail (Geographie-Cités), Olivier Bodini (Université Sorbonne Paris Nord) and Manuel Moussallam (Deezer Research)

This paper examines the influence of recommender systems on local music representation, discussing prior findings from an empirical study on the LFM-2b public dataset. This prior study argued that different recommender systems exhibit algorithmic biases shifting music consumption either towards or against local content. However, LFM-2b users do not reflect the diverse audience of music streaming services. To assess the robustness of this study’s conclusions, we conduct a comparative analysis using proprietary listening data from a global music streaming service, which we publicly release alongside this paper. We observe significant differences in local music consumption patterns between our dataset and LFM-2b, suggesting that caution should be exercised when drawing conclusions on local music based solely on LFM-2b. Moreover, we show that the algorithmic biases exhibited in the original work vary in our dataset, and that several unexplored model parameters can significantly influence these biases and affect the study’s conclusion on both datasets. Finally, we discuss the complexity of accurately labeling local music, emphasizing the risk of misleading conclusions due to unreliable, biased, or incomplete labels. To encourage further research and ensure reproducibility, we have publicly shared our dataset and code.

Full text in ACM Digital Library
[REPR] Fair Augmentation for Graph Collaborative Filtering
by Ludovico Boratto (University of Cagliari), Francesco Fabbri (Spotify), Gianni Fenu (University of Cagliari), Mirko Marras (University of Cagliari) and Giacomo Medda (University of Cagliari)

Recent developments in recommendation have harnessed the collaborative power of graph neural networks (GNNs) in learning users’ preferences from user-item networks. Despite emerging regulations addressing the fairness of automated systems, unfairness issues in graph collaborative filtering remain underexplored, especially from the consumer’s perspective. Although there have been numerous contributions on consumer unfairness, only a few of these works have delved into GNNs. A notable gap exists in the formalization of the latest mitigation algorithms, as well as in their effectiveness and reliability on cutting-edge models. This paper serves as a solid response to recent research highlighting unfairness issues in graph collaborative filtering by reproducing one of the latest mitigation methods. The reproduced technique adjusts the system fairness level by learning a fair graph augmentation. Under an experimental setup based on 11 GNNs, 5 non-GNN models, and 5 real-world networks across diverse domains, our investigation reveals that fair graph augmentation is consistently effective on high-utility models and large datasets. Experiments on the transferability of the fair augmented graph open new issues for future recommendation studies. Source code: https://github.com/jackmedda/FA4GCF.

Full text in ACM Digital Library
[REPR] From Clicks to Carbon: The Environmental Toll of Recommender Systems
by Tobias Vente (University of Siegen), Lukas Wegmeth (University of Siegen), Alan Said (University of Gothenburg) and Joeran Beel (University of Siegen)

As global warming soars, the need to assess the environmental impact of research is becoming increasingly urgent. Despite this, few recommender systems research papers address their environmental impact. In this study, we estimate the environmental impact of recommender systems research by reproducing typical experimental pipelines. Our analysis spans 79 full papers from the 2013 and 2023 ACM RecSys conferences, comparing traditional “good old-fashioned AI” algorithms with modern deep learning algorithms. We designed and reproduced representative experimental pipelines for both years, measuring energy consumption with a hardware energy meter and converting it to CO2 equivalents. Our results show that papers using deep learning algorithms emit approximately 42 times more CO2 equivalents than papers using traditional methods. On average, a single deep learning-based paper generates 3,297 kilograms of CO2 equivalents, more than the carbon emissions of one person flying from New York City to Melbourne or the amount of CO2 one tree sequesters over 300 years.
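Converting metered energy to CO2 equivalents amounts to multiplying by the carbon intensity of the electricity grid. The inputs in the first call below are hypothetical placeholders, not figures from the paper. The second line is simple arithmetic on the reported numbers, assuming the 42x ratio compares per-paper averages.

```python
def co2e_kg(energy_kwh, grid_intensity_g_per_kwh):
    """Convert measured energy use (kWh) to kilograms of CO2 equivalents."""
    return energy_kwh * grid_intensity_g_per_kwh / 1000.0


# Hypothetical inputs for illustration only (not figures from the paper).
print(co2e_kg(energy_kwh=10.0, grid_intensity_g_per_kwh=400.0))  # 4.0

# If 42x compares per-paper averages, a traditional-methods paper emits roughly:
print(round(3297 / 42, 1))  # 78.5 (kg CO2e)
```

Because the grid intensity varies by country and time of day, the same experiment can have a very different footprint depending on where it runs.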

Full text in ACM Digital Library
[REPR] Informfully – Research Platform for Reproducible User Studies
by Lucien Heitz (University of Zurich), Julian Andrea Croci (University of Zurich), Madhav Sachdeva (University of Zurich) and Abraham Bernstein (University of Zurich)

This paper presents Informfully, a research platform for content distribution and user studies. Informfully allows researchers to push algorithmically curated text, image, audio, and video content to users and automatically generates a detailed log of their consumption history. As such, it serves as an open-source platform for conducting user experiments to investigate the impact of item recommendations on users’ consumption behavior. The platform was designed to accommodate different experiment types through versatility, ease of use, and scalability. It features three core components: 1) a front end for displaying and interacting with recommended items, 2) a back end for researchers to create and maintain user experiments, and 3) a simple JSON-based exchange format for ranked item recommendations to interface with third-party frameworks. We provide a system overview and outline the three core components of the platform. A sample workflow is shown for conducting field studies incorporating multiple user groups, personalizing recommendations, and measuring the effect of algorithms on user engagement. We present evidence for the versatility, ease of use, and scalability of Informfully by showcasing previous studies that used our platform.

Full text in ACM Digital Library
[REPR] Large Language Models as Evaluators for Recommendation Explanations
by Xiaoyu Zhang (Tsinghua University), Yishan Li (Tsinghua University), Jiayin Wang (Tsinghua University), Bowen Sun (Tsinghua University), Weizhi Ma (Tsinghua University), Peijie Sun (Tsinghua University) and Min Zhang (Tsinghua University)

The explainability of recommender systems has attracted significant attention in academia and industry. Many efforts have been made towards explainable recommendations, yet evaluating the quality of the explanations remains a challenging and unresolved issue. In recent years, using LLMs as evaluators has emerged as a promising avenue in Natural Language Processing tasks (e.g., sentiment classification, information extraction), given their strong capabilities in instruction following and common-sense reasoning. However, evaluating recommendation explanation texts differs from these tasks, as its criteria relate to human perception and are usually subjective.

In this paper, we investigate whether LLMs can serve as evaluators of recommendation explanations. To answer this question, we use real user feedback on explanations collected in previous work and additionally collect third-party annotations and LLM evaluations. We design and apply a 3-level meta-evaluation strategy to measure the correlation between evaluator labels and the ground truth provided by users. Our experiments reveal that LLMs, such as GPT-4, can provide comparable evaluations with appropriate prompts and settings. We also provide further insights into combining human labels with the LLM evaluation process and using ensembles of multiple heterogeneous LLM evaluators to enhance the accuracy and stability of evaluations. Our study verifies that using LLMs as evaluators can be an accurate, reproducible, and cost-effective solution for evaluating recommendation explanation texts. Our code is publicly available.
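The 3-level meta-evaluation strategy is not detailed in the abstract. The sketch shows only its basic building block, a Pearson correlation between an evaluator's scores and users' own ratings, on hypothetical data. Averaging several evaluators' scores before correlating them is one simple way to form the kind of ensemble mentioned above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between evaluator scores and user ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical 1-5 scores for five explanations.
user_ratings = [5, 3, 4, 1, 2]
llm_scores = [4, 3, 5, 1, 2]
print(round(pearson(llm_scores, user_ratings), 3))  # 0.9
```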

Full text in ACM Digital Library
[REPR] One-class Matrix Factorization: Point-Wise Regression-Based or Pair-Wise Ranking-Based?
by Sheng-Wei Chen (National Taiwan University) and Chih-Jen Lin (National Taiwan University)

One-class matrix factorization (MF) is an important technique for recommender systems with implicit feedback. In one widely used setting, a regression function is fit in a point-wise manner on observed and some unobserved (user, item) entries. Recently, in AAAI 2019, Chen et al. [2] proposed a pair-wise ranking-based approach in which observed (user, item) entries are compared against unobserved ones. They concluded that the pair-wise setting performs consistently better than the more traditional point-wise setting. However, after a detailed investigation, we explain through mathematical derivations that their method may perform only similarly to the point-wise ones. We also identified some problems when reproducing their experimental results. After considering suitable settings, we rigorously compare point-wise and pair-wise one-class MFs and show that the pair-wise method is actually not better. Therefore, for one-class MF, the more traditional and mature point-wise setting should still be considered. Our findings contradict the conclusions in [2] and serve as a call for caution when researchers compare two machine learning methods.
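To make the contrast concrete, the sketch below writes one generic form of each objective for a single user, given MF scores $\hat r_{ui} = p_u^\top q_i$: a weighted squared loss that regresses observed entries to 1 and sampled unobserved entries to 0, and a logistic loss over every observed/unobserved pair. These are illustrative textbook forms, not the exact objectives of this paper or of Chen et al.; `neg_weight` and all names are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pointwise_loss(scores, positives, negatives, neg_weight=0.1):
    """Weighted squared loss: observed entries regress to 1,
    sampled unobserved entries regress to 0 with a smaller weight."""
    loss = sum((1.0 - scores[i]) ** 2 for i in positives)
    loss += neg_weight * sum(scores[j] ** 2 for j in negatives)
    return loss


def pairwise_loss(scores, positives, negatives):
    """Logistic loss log(1 + exp(-(s_i - s_j))) over every observed/unobserved pair."""
    return sum(softplus(-(scores[i] - scores[j]))
               for i in positives for j in negatives)


# Hypothetical scores r_ui = p_u . q_i for one user over four items.
scores = {0: 0.9, 1: 0.2, 2: 0.6, 3: 0.1}
print(pointwise_loss(scores, positives=[0, 2], negatives=[1, 3]))
print(pairwise_loss(scores, positives=[0, 2], negatives=[1, 3]))
```

The point-wise loss touches each entry once, while the pair-wise loss grows with the number of (observed, unobserved) pairs, which is one reason efficient implementations of the two settings differ.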

Full text in ACM Digital Library
[REPR] ReChorus2.0: A Modular and Task-Flexible Recommendation Library
by Jiayu Li (Tsinghua University), Hanyu Li (Tsinghua University), Zhiyu He (Tsinghua University), Weizhi Ma (Tsinghua University), Peijie Sun (Tsinghua University), Min Zhang (Tsinghua University) and Shaoping Ma (Tsinghua University)

With the applications of recommendation systems rapidly expanding, an increasing number of studies have focused on every aspect of recommender systems with different data inputs, models, and task settings. Therefore, a flexible library is needed to help researchers implement the experimental strategies they require. Existing open libraries for recommendation scenarios have enabled reproducing various recommendation methods and provided standard implementations. However, these libraries often impose certain restrictions on data and seldom allow the same model to handle different tasks and input formats, limiting users’ customized explorations. To fill the gap, we propose ReChorus2.0, a modular and task-flexible library for recommendation researchers. Based on ReChorus, we upgrade the supported input formats, models, and training and evaluation strategies to help realize more recommendation tasks with more data types. The main contributions of ReChorus2.0 include: (1) Realization of complex and practical tasks, including re-ranking and CTR prediction tasks; (2) Inclusion of various context-aware and re-ranking recommenders; (3) Extension of existing and new models to support different tasks with the same models; (4) Support of highly-customized input with impression logs, negative items, or click labels, as well as user, item, and situation contexts. To summarize, ReChorus2.0 serves as a comprehensive and flexible library that better addresses the practical problems in the recommendation scenario and caters to more diverse research needs. The implementation and detailed tutorials of ReChorus2.0 can be found at https://github.com/THUwangcy/ReChorus.

Full text in ACM Digital Library
[REPR] Reproducibility and Analysis of Scientific Dataset Recommendation Methods
by Ornella Irrera (University of Padua), Matteo Lissandrini (University of Verona), Daniele Dell’Aglio (Aalborg University) and Gianmaria Silvello (University of Padua)

Datasets play a central role in scholarly communications. However, scholarly graphs are often incomplete, particularly due to the lack of connections between publications and datasets. Therefore, the importance of dataset recommendation—identifying relevant datasets for a scientific paper, an author, or a textual query—is increasing. Although various methods have been proposed for this task, their reproducibility remains unexplored, making it difficult to compare them with new approaches. We reviewed current recommendation methods for scientific datasets, focusing on the most recent and competitive approaches, including an SVM-based model, a bi-encoder retriever, a method leveraging co-authors and citation network embeddings, and a heterogeneous variational graph autoencoder. These approaches underwent a comprehensive analysis under consistent experimental conditions. Our reproducibility efforts show that three methods can be reproduced, while the graph variational autoencoder is challenging due to unavailable code and test datasets. Hence, we re-implemented this method and performed a component-based analysis to examine its strengths and limitations. Furthermore, our study indicated that three out of four considered methods produce subpar results when applied to real-world data instead of specialized datasets with ad-hoc features.

Full text in ACM Digital Library
[REPR] Reproducibility of LLM-based Recommender Systems: the Case Study of P5 Paradigm
by Pasquale Lops (University of Bari Aldo Moro), Antonio Silletti (University of Bari Aldo Moro), Marco Polignano (University of Bari Aldo Moro), Cataldo Musto (University of Bari Aldo Moro) and Giovanni Semeraro (University of Bari Aldo Moro)

Recommender systems can significantly benefit from the availability of pre-trained large language models (LLMs), which can serve as a basic mechanism for generating recommendations based on detailed user and item data, such as text descriptions, user reviews, and metadata. On the one hand, this new generation of LLM-based recommender systems paves the way for dealing with traditional limitations, such as cold-start and data sparsity. On the other hand, it poses fundamental challenges for their accountability. Reproducing experiments in the new context of LLM-based recommender systems is challenging for several reasons. New approaches are published at an unprecedented pace, which makes it difficult to get a clear picture of the main protocols and good practices in experimental evaluation. Moreover, the lack of proper frameworks for LLM-based recommendation development and evaluation makes the process of benchmarking models complex and uncertain.

In this work, we discuss the main issues encountered when trying to reproduce P5 (Pretrain, Personalized Prompt, and Prediction Paradigm), one of the first works unifying different recommendation tasks in a shared language modeling and natural language generation framework. Starting from this study, we have developed LaikaLLM, a framework for training and evaluating LLMs, specifically for the recommendation task. It has been used to perform several experiments to assess the impact of using different LLMs, different personalization strategies, and a novel set of more informative prompts on the overall performance of recommendations in a fully reproducible environment.

Full text in ACM Digital Library
[REPR] Revisiting BPR: A Replicability Study of a Common Recommender System Baseline
by Aleksandr Milogradskii (National Research University Higher School of Economics; TBank), Oleg Lashinin (Moscow Institute of Physics and Technology; TBank), Alexander P (Independent Researcher), Marina Ananyeva (National Research University Higher School of Economics; TBank) and Sergey Kolesnikov (TBank)

Bayesian Personalized Ranking (BPR), a collaborative filtering approach based on matrix factorization, frequently serves as a benchmark for recommender systems research. However, numerous studies overlook the nuances of BPR implementation, claiming that it performs worse than newly proposed methods across various tasks. In this paper, we thoroughly examine the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations. Furthermore, through extensive experiments on real-world datasets under modern evaluation settings, we demonstrate that with proper tuning of its hyperparameters, the BPR model can achieve performance levels close to state-of-the-art methods on top-n recommendation tasks and even outperform them on specific datasets. Specifically, on the Million Song Dataset, the BPR model with hyperparameter tuning statistically significantly outperforms Mult-VAE by 10% in NDCG@100 with a binary relevance function.
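For reference, the standard BPR-MF update performs stochastic gradient ascent on $\ln\sigma(\hat x_{uij})$ minus an L2 penalty, where $\hat x_{uij} = \langle w_u, h_i\rangle - \langle w_u, h_j\rangle$ for an observed item $i$ and a sampled unobserved item $j$. The sketch below implements one such step. It uses a single regularisation constant, whereas the original formulation allows separate ones for users, positive items and negative items, and the learning-rate and regularisation values are placeholders. Details like these are exactly what the study above finds implementations disagreeing on.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bpr_step(w_u, h_i, h_j, lr=0.05, reg=0.01):
    """One SGD ascent step of BPR-MF on a triple (u, i, j); vectors are
    updated in place. Returns x_uij computed before the update."""
    x_uij = sum(a * (b - c) for a, b, c in zip(w_u, h_i, h_j))
    g = 1.0 - sigmoid(x_uij)  # d/dx ln sigmoid(x) = 1 - sigmoid(x)
    for f in range(len(w_u)):
        wu, hi, hj = w_u[f], h_i[f], h_j[f]
        w_u[f] += lr * (g * (hi - hj) - reg * wu)
        h_i[f] += lr * (g * wu - reg * hi)
        h_j[f] += lr * (-g * wu - reg * hj)
    return x_uij


# Hypothetical 2-factor embeddings for one user, a positive and a negative item.
w, hi, hj = [0.1, 0.2], [0.3, 0.1], [0.2, 0.4]
before = bpr_step(w, hi, hj)
after = sum(a * (b - c) for a, b, c in zip(w, hi, hj))
print(before, after)  # the pairwise margin should grow after the step
```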

Full text in ACM Digital Library

List of all Late-Breaking Result (LBR) papers accepted for RecSys 2024 (in alphabetical order).

[LBR] Are We Explaining the Same Recommenders? Incorporating Recommender Performance for Evaluating Explainers
by Amir Reza Mohammadi (University of Innsbruck), Andreas Peintner (University of Innsbruck), Michael Müller (University of Innsbruck) and Eva Zangerle (University of Innsbruck)

Explainability in recommender systems is both crucial and challenging. Among the state-of-the-art explanation strategies, counterfactual explanation provides intuitive and easily understandable insights into model predictions by illustrating how a small change in the input can lead to a different outcome. Recently, this approach has garnered significant attention, with various studies employing different metrics to evaluate the performance of these explanation methods. In this paper, we investigate the metrics used for evaluating counterfactual explainers for recommender systems. Through extensive experiments, we demonstrate that the performance of recommenders has a direct effect on counterfactual explainers and ignoring it results in inconsistencies in the evaluation results of explainer methods. Our findings highlight an additional challenge in evaluating counterfactual explainer methods and underscore the need to report the recommender performance or consider it in evaluation metrics.

Full text in ACM Digital Library
LBR Balancing Habit Repetition and New Activity Exploration: A Longitudinal Micro-Randomized Trial in Physical Activity Recommendations
by Ine Coppens (WAVES – imec – Ghent University), Toon De Pessemier (WAVES – imec – Ghent University) and Luc Martens (WAVES – imec – Ghent University)

As repetition of activities can establish habits and exploration of new ones can provide healthy variety, we investigate how a recommender system for physical activities can optimally balance these two approaches. We conducted an eight-week user study with 62 physically inactive participants who received personalized repetition and exploration recommendations in random order. We distinguish between location, workout, and general activities, and collect participants’ subjective perceptions. Our findings indicate that participants initially preferred exploring general activities but rated repetition recommendations higher after two weeks. By exploring the optimal transition point from exploration to repetition in personalized recommendations, this study contributes to designing more effective recommender systems for health improvement and healthy habit formation.

Full text in ACM Digital Library
LBR beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems
by Vojtěch Vančura (Czech Technical University), Pavel Kordík (Czech Technical University) and Milan Straka (Charles University)

Recommender systems often use text-side information to improve their predictions, especially in cold-start or zero-shot recommendation scenarios, where traditional collaborative filtering approaches cannot be used. Many approaches to text-mining side information for recommender systems have been proposed over recent years, with sentence Transformers being the most prominent one. However, these models are trained to predict semantic similarity without utilizing interaction data with hidden patterns specific to recommender systems. In this paper, we propose beeFormer, a framework for training sentence Transformer models with interaction data. We demonstrate that our models trained with beeFormer can transfer knowledge between datasets while outperforming not only semantic similarity sentence Transformers but also traditional collaborative filtering methods. We also show that training on multiple datasets from different domains accumulates knowledge in a single model, unlocking the possibility of training universal, domain-agnostic sentence Transformer models to mine text representations for recommender systems. We release the source code, trained models, and additional details allowing replication of our experiments at https://github.com/recombee/beeformer.

Full text in ACM Digital Library
LBR Democratizing Urban Mobility Through an Open-Source, Multi-Criteria Route Recommendation System
by Alexander Eggerth (ETH Zurich), Javier Argota Sánchez-Vaquerizo (ETH Zurich), Dirk Helbing (ETH Zurich) and Sachit Mahajan (ETH Zurich)

Urban navigation systems traditionally optimize for efficiency, thereby overlooking environmental factors and personal preferences. This paper introduces Routify, a novel multi-criteria recommender system for personalized urban route selection. Routify advances the state of the art in route recommendation through: (1) integration of diverse environmental data (air quality, noise levels, green spaces) with real-time IoT inputs; (2) a dynamic weighting algorithm allowing fine-grained user control over routing parameters; (3) transparent decision-making via an interactive interface displaying edge-level information; and (4) a community-driven feedback mechanism that continuously refines recommendations. We have implemented Routify as an open-source platform, facilitating further research and development. Experimental results from 1000 origin-destination pairs demonstrate significant improvements in user-defined environmental metrics (up to 32.98% increase in green index for walking and 16.91% reduction in noise levels for cycling) compared to traditional routing systems. These improvements come with slight trade-offs in distance, especially in walking mode. Our research contributes a comprehensive framework for multi-criteria route recommendation, balancing individual preferences, environmental factors, and community insights in urban navigation.
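The abstract does not spell out Routify's dynamic weighting algorithm. As a rough illustration of user-weighted multi-criteria routing in general, the sketch below folds several per-edge criteria into one scalar cost using user-chosen weights, then runs Dijkstra's algorithm on that cost. The criterion names, the assumption that all criteria are pre-scaled to [0, 1], and the functions `edge_cost` and `best_route` are hypothetical.

```python
import heapq

def edge_cost(attrs, weights):
    """Weighted sum of per-edge criteria, all assumed pre-scaled to [0, 1].

    'green' is a benefit, so it enters the cost as (1 - green). Nonnegative
    weights keep every cost nonnegative, as Dijkstra's algorithm requires.
    """
    return (weights["distance"] * attrs["distance"]
            + weights["noise"] * attrs["noise"]
            + weights["green"] * (1.0 - attrs["green"]))

def best_route(graph, source, target, weights):
    """Dijkstra over a dict-of-dicts graph: graph[u][v] = {criterion: value}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, attrs in graph.get(u, {}).items():
            nd = d + edge_cost(attrs, weights)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")  # target unreachable
    # Walk predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]
```

With a short, noisy path and a longer, greener one between the same two nodes, changing only the weights switches which path is returned. This mirrors the distance trade-off the abstract reports.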

Full text in ACM Digital Library
LBR Enhancing Sequential Music Recommendation with Personalized Popularity Awareness
by Davide Abbattista (Politecnico di Bari), Vito Walter Anelli (Politecnico di Bari), Tommaso Di Noia (Politecnico di Bari), Craig Macdonald (University of Glasgow) and Aleksandr Vladimirovich Petrov (University of Glasgow)

In the realm of music recommendation, sequential recommender systems have shown promise in capturing the dynamic nature of music consumption. Nevertheless, traditional Transformer-based models, such as SASRec and BERT4Rec, while effective, encounter challenges due to the unique characteristics of music listening habits. In fact, existing models struggle to create a coherent listening experience due to rapidly evolving preferences. Moreover, music consumption is characterized by a prevalence of repeated listening, i.e. users frequently return to their favourite tracks, an important signal that could be framed as individual or personalized popularity. This paper addresses these challenges by introducing a novel approach that incorporates personalized popularity information into sequential recommendation. By combining user-item popularity scores with model-generated scores, our method effectively balances the exploration of new music with the satisfaction of user preferences. Experimental results demonstrate that a Personalized Most Popular recommender, a method solely based on user-specific popularity, outperforms existing state-of-the-art models. Furthermore, augmenting Transformer-based models with personalized popularity awareness yields superior performance, showing improvements ranging from 25.2% to 69.8%. The code for this paper is available at https://github.com/sisinflab/personalized-popularity-awareness.
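A Personalized Most Popular recommender ranks items by how often the individual user has already consumed them. The sketch below computes this user-specific popularity from a listening history. As one possible way of "combining user-item popularity scores with model-generated scores", it then takes a linear blend of min-max normalized scores. The blending formula, the weight `alpha`, and all function names are assumptions for illustration, not the paper's exact method.

```python
from collections import Counter

def personalized_popularity(history, n_items):
    """User-specific popularity: share of the user's listening events per item."""
    counts = Counter(history)
    total = len(history)
    return [counts[i] / total if total else 0.0 for i in range(n_items)]

def min_max(scores):
    """Rescale scores to [0, 1]; constant inputs map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def blend(model_scores, pop_scores, alpha=0.5):
    """Hypothetical linear blend; alpha=1 reduces to Personalized Most Popular."""
    m, p = min_max(model_scores), min_max(pop_scores)
    return [alpha * pi + (1 - alpha) * mi for mi, pi in zip(m, p)]

def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```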

Full text in ACM Digital Library
LBR Exploratory Analysis of Recommending Urban Parks for Health-Promoting Activities
by Linus W. Dietz (King’s College London), Sanja Šćepanović (Nokia Bell Labs), Ke Zhou (Nokia Bell Labs) and Daniele Quercia (Nokia Bell Labs)

Parks are essential spaces for promoting urban health, and recommender systems could assist individuals in discovering parks for leisure and health-promoting activities. This is particularly important in large cities like London, which has over 1,500 named parks, making it challenging to understand what each park offers. Due to the lack of datasets and the diverse health-promoting activities parks can support (e.g., physical, social, nature-appreciation), it is unclear which recommendation algorithms are best suited for this task. To explore the dynamics of recommending parks for specific activities, we created two datasets: one from a survey of over 250 London residents, and another by inferring visits from over 1 million geotagged Flickr images taken in London parks. Analyzing the geographic patterns of these visits revealed that recommending nearby parks is ineffective, suggesting that this recommendation task is distinct from Point of Interest recommendation. We then tested various recommendation models, identifying a significant popularity bias in the results. Additionally, we found that personalized models have advantages in recommending parks beyond the most popular ones. The data and findings from this study provide a foundation for future research on park recommendations.

Full text in ACM Digital Library
LBR Exploring Coresets for Efficient Training and Consistent Evaluation of Recommender Systems
by Zheng Ju (University College Dublin), Honghui Du (University College Dublin), Elias Tragos (University College Dublin), Neil Hurley (University College Dublin) and Aonghus Lawlor (University College Dublin)

Recommender systems have achieved remarkable success in various web applications, such as e-commerce, online advertising, and social media, harnessing the power of big data. To attain optimal model performance, recommender systems are typically trained on very large datasets, with substantial numbers of users and items. However, large datasets often present challenges in terms of processing time and computational resources. Coreset selection offers a method for obtaining a reduced yet representative subset from vast datasets, thereby enhancing the efficiency of training machine learning algorithms. Nevertheless, little research has been conducted to explore the practical implications of different coreset selection approaches on the performance of recommender systems algorithms. In this paper, we systematically investigate the impact of various coreset selection techniques. We evaluate the performance of the resulting coresets using inductive recommendation models, which allow for consistent evaluations to be performed. The experimental results demonstrate that coreset methods are a powerful and useful approach for obtaining reduced datasets that preserve the properties of the large original dataset and achieve competitive performance while requiring less training time than the full dataset.
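The abstract does not list which coreset selection techniques were compared. One widely used method is greedy k-center selection, sketched below with NumPy. It assumes each user (or interaction) is represented by a feature vector, for example a row of the user-item matrix. The method repeatedly adds the point farthest from those already chosen, so the subset spans the extent of the data. The function name and the random starting point are illustrative choices.

```python
import numpy as np

def k_center_greedy(X, k, seed=0):
    """Greedy k-center coreset: repeatedly add the point farthest from the chosen set."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    selected = [int(rng.integers(n))]  # random first center
    # Distance from every point to its nearest selected center.
    min_dist = np.linalg.norm(X - X[selected[0]], axis=1)
    while len(selected) < min(k, n):
        if min_dist.max() == 0:
            break  # every remaining point duplicates a selected one
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```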

Full text in ACM Digital Library
LBR Informed Dataset Selection with ‘Algorithm Performance Spaces’
by Joeran Beel (University of Siegen), Lukas Wegmeth (University of Siegen), Lien Michiels (University of Antwerp) and Steffen Schulz (University of Siegen)

When designing recommender-systems experiments, a key question that has been largely overlooked is the choice of datasets. In a brief survey of ACM RecSys papers, we found that authors typically justified their dataset choices by labelling them as public, benchmark, or ‘real-world’ without further explanation. We propose the Algorithm Performance Space (APS) as a novel method for informed dataset selection. The APS is an n-dimensional space where each dimension represents the performance of a different algorithm. Each dataset is depicted as an n-dimensional vector, with greater distances between datasets indicating higher diversity. In our experiment, we ran 29 algorithms on 95 datasets to construct an actual APS. Our findings show that many datasets, including most Amazon datasets, are clustered closely in the APS, i.e. they are not diverse. However, other datasets, such as MovieLens and Docear, are more dispersed. The APS also enables the grouping of datasets based on the solvability of the underlying problem. Datasets in the top right corner of the APS are considered ‘solved problems’ because all algorithms perform well on them. Conversely, datasets in the bottom left corner lack well-performing algorithms, making them ideal candidates for new recommender-system research due to the challenges they present.
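Constructing an APS amounts to treating each dataset's per-algorithm scores as a vector. The minimal sketch below assumes every dataset has been evaluated with the same algorithms in the same order. Pairwise Euclidean distances then measure how different two datasets are. Simple thresholds flag datasets where every algorithm does well ("solved") or where none does. The threshold values and function names are hypothetical.

```python
import math
from itertools import combinations

def aps_distances(perf):
    """perf maps dataset name -> list of scores, one per algorithm (same order).

    Returns the Euclidean distance between every pair of datasets in the APS.
    """
    return {
        (a, b): math.dist(perf[a], perf[b])
        for a, b in combinations(sorted(perf), 2)
    }

def solved_datasets(perf, threshold):
    """Datasets on which every algorithm reaches the (hypothetical) threshold."""
    return sorted(d for d, scores in perf.items() if min(scores) >= threshold)

def unsolved_datasets(perf, threshold):
    """Datasets on which no algorithm reaches the (hypothetical) threshold."""
    return sorted(d for d, scores in perf.items() if max(scores) < threshold)
```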

Full text in ACM Digital Library
LBR Is It Really Complementary? Revisiting Behavior-based Labels for Complementary Recommendation
by Kai Sugahara (The University of Electro-Communications), Chihiro Yamasaki (The University of Electro-Communications) and Kazushi Okamoto (The University of Electro-Communications)

Complementary recommendation is a type of item-to-item recommendation that recommends what should be purchased together for an item. Previous studies have traditionally used behavior-based labels (BBLs) that are constructed from the co-purchase logs of users for training and evaluation because rigorous label construction for complements is inefficient. However, the fact that many item pairs in BBLs are not functionally complementary, even though they are frequently co-purchased, has been overlooked. This study aimed to re-evaluate the validity of BBLs through functional relationships and provide directions for their improvement. Quantitative analysis using manually annotated function-based labels (FBLs) as correct labels revealed that the accuracy of the complementary recommendations generated by BBLs was below 50%, suggesting potential functional incompatibility within BBLs. Existing models that were trained on BBLs were similarly inaccurate, indicating the unreliability of the evaluations in existing studies. Finally, we proposed a label correction method for BBLs using a small set of FBLs, thereby providing a direction for reliable complementary recommendations.
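The sketch below makes the distinction concrete. It builds behavior-based labels as unordered item pairs co-purchased in at least `min_count` baskets. It then measures the share of those pairs that also appear in a set of function-based labels. The threshold, item names, and helper functions are invented for illustration, and the paper's own BBL construction and annotation protocol may differ.

```python
from collections import Counter
from itertools import combinations

def behavior_based_labels(baskets, min_count=2):
    """Unordered item pairs co-purchased in at least `min_count` baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes (a, b) and (b, a) the same key; set() ignores repeats.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return {pair for pair, c in counts.items() if c >= min_count}

def label_precision(predicted_pairs, functional_pairs):
    """Share of behavior-based pairs that are also functionally complementary."""
    if not predicted_pairs:
        return 0.0
    return len(predicted_pairs & functional_pairs) / len(predicted_pairs)
```

For example, a camera and milk may be bought together often without being functionally complementary, whereas a camera and an SD card are. If both pairs pass the co-purchase threshold but only the second is a functional label, the precision of the behavior-based labels is 0.5.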

Full text in ACM Digital Library
LBR KGGLM: A Generative Language Model for Generalizable Knowledge Graph Representation Learning in Recommendation
by Giacomo Balloccu (University of Cagliari), Ludovico Boratto (University of Cagliari), Gianni Fenu (University of Cagliari), Mirko Marras (University of Cagliari) and Alessandro Soccol (University of Cagliari)

Current recommendation methods based on knowledge graphs rely on entity and relation representations for several steps along the pipeline, with knowledge completion and path reasoning being the most influential. Despite their similarities, the most effective representation methods for these steps differ, leading to inefficiencies, limited representativeness, and reduced interpretability. In this paper, we introduce KGGLM, a decoder-only Transformer model designed for generalizable knowledge representation learning to support recommendation. The model is trained on generic paths sampled from the knowledge graph to capture foundational patterns, and then fine-tuned on paths specific to the downstream step (knowledge completion and path reasoning in our case). Experiments on ML1M and LFM1M show that KGGLM beats twenty-two baselines in effectiveness under both knowledge completion and recommendation. Source code and pre-processed data sets are available at https://github.com/mirkomarras/kgglm.

Full text in ACM Digital Library
LBR Less is More: Towards Sustainability-Aware Persuasive Explanations in Recommender Systems
by Thi Ngoc Trang Tran (Graz University of Technology), Seda Polat Erdeniz (Graz University of Technology), Alexander Felfernig (Graz University of Technology), Sebastian Lubos (Graz University of Technology), Merfat El Mansi (Graz University of Technology) and Viet-Man Le (Graz University of Technology)

Recommender systems play an important role in supporting the achievement of the United Nations sustainable development goals (SDGs). In recommender systems, explanations can support different goals, such as increasing a user’s trust in a recommendation, persuading a user to purchase specific items, or increasing the understanding of the reasons behind a recommendation. In this paper, we discuss the concept of “sustainability-aware persuasive explanations” which we regard as a major concept to support the achievement of the mentioned SDGs. Such explanations are orthogonal to most existing explanation approaches since they focus on a “less is more” principle, which per se is not included in existing e-commerce platforms. Based on a user study in three item domains, we analyze the potential impacts of sustainability-aware persuasive explanations. The study results are promising regarding user acceptance and the potential impacts of such explanations.

Full text in ACM Digital Library
LBR Leveraging Monte Carlo Tree Search for Group Recommendation
by Antonela Tommasel (CONICET-UNCPBA, ISISTAN) and J. Andres Diaz-Pace (CONICET-UNCPBA, ISISTAN)

Group recommenders aim to provide recommendations that satisfy the collective preferences of multiple users, a challenging task due to the diverse individual tastes and conflicting interests to be balanced. This is often accomplished by using aggregation techniques that select items on which the group can agree. Traditional aggregators struggle with these complexities, as items are chosen independently, leading to sub-optimal recommendations lacking diversity, novelty, or fairness. In this paper, we propose an aggregation technique that leverages Monte Carlo Tree Search (MCTS) to enhance group recommendations. MCTS is used to explore and evaluate candidate recommendation sequences to optimize overall group satisfaction. We also investigate the integration of MCTS with LLMs aiming at better understanding interactions between user preferences and recommendation sequences to inform the search. Experimental evaluations, although preliminary, showed that our proposal outperforms existing aggregation techniques in terms of relevance and beyond-accuracy aspects of recommendations. The LLM integration achieved positive results for recommendations’ relevance. Overall, this work highlights the potential of heuristic search techniques to tackle the complexities of group recommendations.
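The sketch below illustrates how MCTS can search over whole recommendation lists rather than scoring items one at a time. It applies the standard UCT variant (selection with UCB1, expansion, random rollout, backpropagation) to choose k items for a group. Several parts are assumptions: the least-misery reward (the worst-off member's mean rating of the list), the exploration constant, and all names. The paper's satisfaction objective and its LLM integration are not modeled.

```python
import math
import random

def list_reward(items, ratings):
    """Least-misery satisfaction: the worst-off member's mean rating of the list."""
    return min(sum(r[i] for i in items) / len(items) for r in ratings)

class Node:
    def __init__(self, chosen, parent=None):
        self.chosen = chosen      # tuple of items picked so far
        self.parent = parent
        self.children = {}        # item -> Node
        self.visits = 0
        self.value = 0.0          # sum of rollout rewards

def mcts_group_list(ratings, k, n_iter=2000, c=1.4, seed=0):
    """UCT search for a k-item list; ratings[u][i] is member u's rating of item i."""
    rng = random.Random(seed)
    n_items = len(ratings[0])  # assumes k <= n_items
    root = Node(())
    for _ in range(n_iter):
        node = root
        # Selection and expansion: descend with UCB1 until an untried item appears.
        while len(node.chosen) < k:
            untried = [i for i in range(n_items)
                       if i not in node.chosen and i not in node.children]
            if untried:
                item = rng.choice(untried)
                child = Node(node.chosen + (item,), node)
                node.children[item] = child
                node = child
                break
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        # Simulation: fill the rest of the list with random unused items.
        items = list(node.chosen)
        remaining = [i for i in range(n_items) if i not in items]
        rng.shuffle(remaining)
        items += remaining[: k - len(items)]
        reward = list_reward(items, ratings)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the most-visited path from the root.
    node, result = root, []
    while node.children and len(result) < k:
        node = max(node.children.values(), key=lambda ch: ch.visits)
        result.append(node.chosen[-1])
    return result
```

For instance, take two members with ratings [5, 1, 4, 3] and [1, 5, 4, 3] and k = 2. Any list containing item 0 or item 1 leaves one member with a mean of at most 3. Items 2 and 3 give both members a mean of 3.5, so that is the list the search should converge to.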

Full text in ACM Digital Library
LBR Recommender Systems Algorithm Selection for Ranking Prediction on Implicit Feedback Datasets
by Lukas Wegmeth (University of Siegen), Tobias Vente (University of Siegen) and Joeran Beel (University of Siegen)

The recommender systems algorithm selection problem for ranking prediction on implicit feedback datasets is under-explored. Traditional approaches in recommender systems algorithm selection focus predominantly on rating prediction on explicit feedback datasets, leaving a research gap for ranking prediction on implicit feedback datasets. Algorithm selection is a critical challenge for nearly every practitioner in recommender systems. In this work, we take the first steps toward addressing this research gap.

We evaluate the NDCG@10 of 24 recommender systems algorithms, each with two hyperparameter configurations, on 72 recommender systems datasets. We train four optimized machine-learning meta-models and one automated machine-learning meta-model with three different settings on the resulting meta-dataset.

Our results show that the predictions of all tested meta-models exhibit a median Spearman correlation ranging from 0.857 to 0.918 with the ground truth. We show that the median Spearman correlation between meta-model predictions and the ground truth increases by an average of 0.124 when the meta-model is optimized to predict the ranking of algorithms instead of their performance. Furthermore, in terms of predicting the best algorithm for an unknown dataset, we demonstrate that the best optimized traditional meta-model, i.e., XGBoost, achieves a recall of 48.6%, outperforming the best tested automated machine learning meta-model, i.e., AutoGluon, which achieves a recall of 47.2%.
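The evaluation compares, per dataset, the meta-model's predicted NDCG@10 values for the algorithms with the measured ones, then takes the median Spearman correlation over datasets. A dependency-free sketch of that computation is below; tied values receive average ranks. The data layout and function names are assumptions. Because Spearman's rho depends only on ranks, a meta-model optimized to predict the algorithm ranking directly targets exactly what this metric rewards, which may help explain the gain reported above.

```python
from statistics import median

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def median_spearman(predicted, actual):
    """predicted/actual: dataset -> per-algorithm NDCG@10 list (same algorithm order)."""
    return median(spearman(predicted[d], actual[d]) for d in actual)
```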

Full text in ACM Digital Library
LBR Social Choice for Heterogeneous Fairness in Recommendation
by Amanda Aird (University of Colorado Boulder), Elena Štefancová (Comenius University Bratislava), Cassidy All (University of Colorado Boulder), Amy Voida (University of Colorado Boulder), Martin Homola (Comenius University Bratislava), Nicholas Mattei (Tulane University) and Robin Burke (University of Colorado Boulder)

Algorithmic fairness in recommender systems requires close attention to the needs of a diverse set of stakeholders that may have competing interests. Previous work in this area has often been limited by fixed, single-objective definitions of fairness, built into algorithms or optimization criteria that are applied to a single fairness dimension or, at most, applied identically across dimensions. These narrow conceptualizations limit the ability to adapt fairness-aware solutions to the wide range of stakeholder needs and fairness definitions that arise in practice. Our work approaches recommendation fairness from the standpoint of computational social choice, using a multi-agent framework. In this paper, we explore the properties of different social choice mechanisms and demonstrate the successful integration of multiple, heterogeneous fairness definitions across multiple data sets.

Full text in ACM Digital Library
LBR TLRec: A Transfer Learning Framework to Enhance Large Language Models for Sequential Recommendation Tasks
by Jiaye Lin (Tsinghua University), Shuang Peng (Zhejiang Lab), Zhong Zhang (Tencent AI Lab) and Peilin Zhao (Tencent AI Lab)

Recently, Large Language Models (LLMs) have garnered significant attention in recommendation systems, improving recommendation performance through in-context learning or parameter-efficient fine-tuning. However, cross-domain generalization, i.e., model training in one scenario (source domain) but inference in another (target domain), is underexplored. In this paper, we present TLRec, a transfer learning framework aimed at enhancing LLMs for sequential recommendation tasks. TLRec specifically focuses on text inputs to mitigate the challenge of limited transferability across diverse domains, offering promising advantages over traditional recommendation models that heavily depend on unique identifiers (IDs) like user IDs and item IDs. Moreover, we leverage the source domain data to further enhance LLMs’ performance in the target domain. Initially, we employ powerful closed-source LLMs (e.g., GPT-4) and chain-of-thought techniques to construct instruction tuning data from the third-party scenario (source domain). Subsequently, we apply curriculum learning to fine-tune LLMs for effective knowledge injection and perform recommendations in the target domain. Experimental results demonstrate that TLRec achieves superior performance under the zero-shot and few-shot settings.

Full text in ACM Digital Library
LBR Understanding Fairness in Recommender Systems: A Healthcare Perspective
by Veronica Kecki (University of Gothenburg) and Alan Said (University of Gothenburg)

Fairness in AI-driven decision-making systems has become a critical concern, especially when these systems directly affect human lives. This paper explores the public’s comprehension of fairness in healthcare recommendations. We conducted a survey where participants selected from four fairness metrics – Demographic Parity, Equal Accuracy, Equalized Odds, and Positive Predictive Value – across different healthcare scenarios to assess their understanding of these concepts. Our findings reveal that fairness is a complex and often misunderstood concept, with a generally low level of public understanding regarding fairness metrics in recommender systems. This study highlights the need for enhanced information and education on algorithmic fairness to support informed decision-making in using these systems. Furthermore, the results suggest that a one-size-fits-all approach to fairness may be insufficient, pointing to the importance of context-sensitive designs in developing equitable AI systems.
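For reference, the four metrics offered to participants can be written as between-group gaps computed from binary outcomes and recommendations. The sketch below is a generic textbook formulation, not the survey's materials. It compares two groups on four quantities:

- positive-recommendation rate (demographic parity)
- accuracy (equal accuracy)
- true- and false-positive rates (equalized odds)
- positive predictive value

Returning 0.0 for undefined rates is an assumption.

```python
def group_rates(y_true, y_pred):
    """Confusion-matrix rates for one group's binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = len(y_true)
    return {
        "positive_rate": (tp + fp) / n,
        "accuracy": (tp + tn) / n,
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
    }

def fairness_gaps(group_a, group_b):
    """Absolute between-group gaps; each group is a (y_true, y_pred) pair of 0/1 lists."""
    a, b = group_rates(*group_a), group_rates(*group_b)
    return {
        "demographic_parity": abs(a["positive_rate"] - b["positive_rate"]),
        "equal_accuracy": abs(a["accuracy"] - b["accuracy"]),
        "equalized_odds": max(abs(a["tpr"] - b["tpr"]), abs(a["fpr"] - b["fpr"])),
        "predictive_parity": abs(a["ppv"] - b["ppv"]),
    }
```

A gap of zero on one metric does not imply zero gaps on the others. This is part of why the choice between these definitions is context-dependent.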

Full text in ACM Digital Library
LBR User knowledge prompt for sequential recommendation
by Yuuki Tachioka (Denso IT Laboratory)

Large language model (LLM)-based recommender systems are effective for sequential recommendation because LLMs already contain general knowledge of popular items. To add domain knowledge of items, the conventional method uses a knowledge prompt obtained from item knowledge graphs and has achieved SOTA performance. However, personalized recommendation also requires user knowledge, which the conventional method does not fully consider because user knowledge is not included in item knowledge graphs; thus, we propose a user knowledge prompt, which converts a user knowledge graph into a prompt using relationship templates. The existing prompt denoising framework is extended to prevent hallucination caused by undesirable interactions between knowledge graph prompts. We propose user knowledge prompts of user traits and user preferences and associate relevant items. Experiments on three types of datasets (movie, music, and book) show significant and consistent improvements from our proposed user knowledge prompt.
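A user knowledge prompt of this kind verbalizes (head, relation, tail) triples from a user knowledge graph, using one text template per relation. The sketch below shows only that conversion step. The relation names and templates are hypothetical, and the paper's denoising framework is not modeled.

```python
# Hypothetical relationship templates; the paper's actual templates are not given here.
TEMPLATES = {
    "has_trait": "{head} is {tail}.",
    "prefers_genre": "{head} prefers {tail} items.",
    "interacted_with": "{head} has interacted with {tail}.",
}

def triples_to_prompt(triples, templates=TEMPLATES):
    """Verbalize (head, relation, tail) triples; relations without a template are skipped."""
    sentences = []
    for head, relation, tail in triples:
        template = templates.get(relation)
        if template is not None:
            sentences.append(template.format(head=head, tail=tail))
    return " ".join(sentences)
```

The resulting text would be placed alongside the item knowledge prompt and the interaction history in the LLM input.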

Full text in ACM Digital Library
LBR What to compare? Towards understanding user sessions on price comparison platforms
by Ahmadou Wagne (TU Wien) and Julia Neidhardt (TU Wien)

E-commerce and online shopping have become integral to the lives of many, with various user behavior types historically identified. Beyond deciding what to buy, determining where to make a purchase has led to the importance of price comparison platforms. However, user behavior on these platforms remains underexplored. Furthermore, web analytics often struggle with tracking users over time and deriving meaningful user types from data. This paper addresses these gaps by defining session types through the analysis and clustering of user logs from a major price comparison platform. The study identifies six distinct session clusters: quick peek, major purchase, constraint-based browsing, knowledge seeking, search and browse, and heavy browsing. These findings are intended to inform the design and development of a conversational recommender system (CRS). Often, CRS development occurs without adequate consideration of the existing system into which it will be integrated. The study’s findings, derived from both quantitative analysis and expert interviews, provide valuable contributions, including identified session clusters, their interpretation, and indicators of which users might benefit from a CRS on these platforms.

Full text in ACM Digital Library

List of all Demo papers accepted for RecSys 2024 (in alphabetical order).

DEMO A Tool for Explainable Pension Fund Recommendations using Large Language Models
by Eduardo Alves da Silva (IComp – Institute of Computing, Federal University of Amazonas; University of Vale do Itajaí; Saks Global), Leandro Balby Marinho (Federal University of Campina Grande), Edleno Silva de Moura (IComp – Institute of Computing, Federal University of Amazonas) and Altigran Soares da Silva (IComp – Institute of Computing, Federal University of Amazonas)

In this demo, we present a prototype tool designed to help financial advisors recommend private pension funds to investors based on their preferences, offering personalized investment suggestions. The tool leverages Large Language Models (LLMs) to enhance explainability by providing clear and understandable rationales for recommendations, and it effectively handles both sequential and cold-start scenarios. We outline the design, implementation, and results of a user-based evaluation using real-world data. The evaluation shows a high recommendation acceptance rate among financial advisors, highlighting the tool’s potential to improve decision-making in financial advisory services.

Full text in ACM Digital Library
DEMO GenUI(ne) CRS: UI Elements and Retrieval-Augmented Generation in Conversational Recommender Systems with LLMs
by Ulysse Maes (Vrije Universiteit Brussel), Lien Michiels (Vrije Universiteit Brussel; University of Antwerp) and Annelien Smets (Vrije Universiteit Brussel)

Previous research has used Large Language Models (LLMs) to develop personalized Conversational Recommender Systems (CRS) with text-based user interfaces (UIs). However, the potential of LLMs to generate interactive graphical elements that enhance user experience remains largely unexplored. To address this gap, we introduce “GenUI(ne) CRS,” a novel framework designed to leverage LLMs for adaptive and interactive UIs. Our framework supports domain-specific graphical elements such as buttons and cards, in addition to text-based inputs. It also addresses the common LLM issue of outdated knowledge, known as the “knowledge cut-off,” by implementing Retrieval-Augmented Generation (RAG). To illustrate its potential, we developed a prototype movie CRS. This work demonstrates the feasibility of LLM-powered interactive UIs and paves the way for future CRS research, including user experience validation, transparent explanations, and addressing LLM biases.

Full text in ACM Digital Library
DEMO Multi-Preview Recommendation via Reinforcement Learning
by Yang Xu (North Carolina State University), Kuan-Ting Lai (Microsoft), Pengcheng Xiong (Microsoft) and Zhong Wu (Microsoft)

Preview recommendations serve as a crucial shortcut for attracting users’ attention on various systems, platforms, and webpages, significantly boosting user engagement. However, the variability of preview types and the flexibility of preview duration make it challenging to use an integrated framework for multi-preview recommendations under resource constraints. In this paper, we present an approach that incorporates constrained Q-learning into a notification recommendation system, effectively handling both multi-preview ranking and duration orchestration by targeting long-term user retention. Our method bridges the gap between combinatorial reinforcement learning, which often remains too theoretical for practical use, and segmented modules in production, where model performance is typically compromised due to over-simplification. We demonstrate the superiority of our approach through off-policy evaluation and online A/B testing using Microsoft data.

Full text in ACM Digital Library
DEMO RePlay: a Recommendation Framework for Experimentation and Production Use
by Alexey Vasilev (Sber AI Lab), Anna Volodkevich (Sber AI Lab), Denis Kulandin (Sber AmazMe), Tatiana Bysheva (Sber AmazMe) and Anton Klenitskiy (Sber AI Lab)

Using a single tool to build and compare recommender systems significantly reduces the time to market for new models. In addition, the comparison results when using such tools look more consistent. This is why many different tools and libraries for researchers in the field of recommendations have recently appeared. Unfortunately, most of these frameworks are aimed primarily at researchers and require modification for production use because they cannot handle large datasets or have an unsuitable architecture. In this demo, we present our open-source toolkit RePlay, a framework containing an end-to-end pipeline for building recommender systems, which is ready for production use. RePlay also allows you to choose a suitable stack for each stage of the pipeline: Pandas, Polars, or Spark. This allows the library to scale computations and deploy to a cluster. Thus, RePlay allows data scientists to easily move from research mode to production mode using the same interfaces.

Full text in ACM Digital Library
DEMO Rs4rs: Semantically Find Recent Publications from Top Recommendation System-Related Venues
by Tri Kurniawan Wijaya (Huawei Ireland Research Centre), Edoardo D’Amico (Huawei Ireland Research Centre), Gabor Fodor (Huawei Ireland Research Centre) and Manuel V. Loureiro (Huawei Ireland Research Centre)

Rs4rs is a web application designed to perform semantic search on recent papers from top conferences and journals related to Recommender Systems. Current scholarly search engine tools like Google Scholar, Semantic Scholar, and ResearchGate often yield broad results that fail to target the most relevant high-quality publications. Moreover, manually visiting individual conference and journal websites is a time-consuming process that primarily supports only syntactic searches. Rs4rs addresses these issues by providing a user-friendly platform where researchers can input their topic of interest and receive a list of recent, relevant papers from top Recommender Systems venues. Utilizing semantic search techniques, Rs4rs ensures that the search results are not only precise and relevant but also comprehensive, capturing papers regardless of variations in wording. This tool significantly enhances research efficiency and accuracy, thereby benefitting the research community and public by facilitating access to high-quality, pertinent academic resources in the field of Recommender Systems. Rs4rs is available at https://rs4rs.com.

Full text in ACM Digital Library
DEMO Stalactite: toolbox for fast prototyping of vertical federated learning systems
by Anastasiia Zakharova (ITMO University), Dmitriy Alexandrov (ITMO University), Maria Khodorchenko (ITMO University), Nikolay Butakov (ITMO University), Alexey Vasilev (Sber AI Lab), Maxim Savchenko (Sber AI Lab) and Alexander Grigorievskiy (Independent Researcher)

Machine learning (ML) models trained on datasets owned by different organizations and physically located in remote databases offer benefits in many real-world use cases. State regulations or business requirements often prevent data transfer to a central location, making it difficult to utilize standard machine learning algorithms. Federated Learning (FL) is a technique that enables models to learn from distributed datasets without revealing the original data. Vertical Federated Learning (VFL) is a type of FL where data samples are divided by features across several data owners. For instance, in a recommendation task, a user can interact with various sets of items, and the logs of these interactions are stored by different organizations. In this demo paper, we present Stalactite – an open-source framework for VFL that provides the necessary functionality for building prototypes of VFL systems. It has several advantages over existing frameworks. In particular, it allows researchers to focus on the algorithmic side rather than engineering and to easily deploy learning in a distributed environment. It implements several VFL algorithms and has a built-in homomorphic encryption layer. We demonstrate its use on real-world recommendation datasets.

Full text in ACM Digital Library

List of all doctoral symposium papers accepted for RecSys 2024 (in alphabetical order).
Check the Doctoral Symposium Guidelines for information about organization and presentation.
If you need to print your poster in Bari, follow these instructions.

DSA New Perspective in Health Recommendations: Integration of Human Pose Estimation
by Gaetano Dibenedetto (University of Bari Aldo Moro)

In recent years, there has been a growing interest in multimodal and multi-source data due to their ability to introduce heterogeneous information. Studies have demonstrated that combining such information enhances the performance of Recommender Systems across various scenarios. In the context of Health Recommendation Systems (HRS), different types of data are used, primarily patient-based information, but data from Pose Estimation (PE) have not been incorporated.

The objective of my Ph.D. is to investigate methods to design and develop HRS that treat PE as one of the input sources, taking into account aspects such as privacy concerns and the trade-off between system quality and responsiveness. By combining diverse information sources, I intend to create a new HRS model capable of providing more precise and explainable recommendations.

Full text in ACM Digital Library
DSAI-based Human-Centered Recommender Systems: Empirical Experiments and Research Infrastructure
by Ruixuan Sun (University of Minnesota)

This is a dissertation plan built around human-centered empirical experiments evaluating recommender systems (RecSys). We see this as an important research theme because many AI-based RecSys algorithmic studies lack real human assessment; as a result, we do not know how these algorithms work in the wild, which only human experiments can tell us. We split this extended abstract into two parts: 1) a series of individual studies focusing on open questions about different human values or recommendation algorithms. Our completed works include user control over content diversity, user appreciation of DL-RecSys algorithms, and a human-LLMRec interaction study. We also propose three future works to understand news recommendation depolarization, personalized news podcasts, and interactive user representation; 2) an experimentation infrastructure named POPROX. As a personalized news recommendation platform, it aims to support the longitudinal study needs of the general AI and RecSys research community.

Full text in ACM Digital Library
DSBias in Book Recommendation
by Savvina Daniil (CWI)

Recommender systems are prevalent in many applications, but hide risks; issues like bias propagation have been the focus of related studies in recent years. My own research revolves around tracking bias in the book recommendation domain. Specifically, I am interested in whether the incorporation of recommender systems in a library’s loaning system serves the library’s social responsibility and purpose, with bias being the main point of concern. To this end, I engage with the topic in three ways: by mapping the area of ethics in book recommendation, by investigating and reflecting on challenges with studying bias in recommender systems in general, and by showcasing a set of social implications of statistical bias in the book recommendation domain in particular. In this doctoral symposium paper, I further elaborate on the problem at hand, the outline of my thesis, the progress I have made so far, and my plans for future work, along with specific questions that have arisen from my research efforts.

Full text in ACM Digital Library
DSBridging Viewpoints in News with Recommender Systems
by Jia Hua Jeng (MediaFutures, University of Bergen)

News recommender systems (NRSs) aid in decision-making in news media. However, undesired effects can emerge. Among these is selective exposure, which may contribute to polarization by reinforcing existing attitudes through belief perseverance, that is, discounting contrary evidence because of the strength of one’s opposing attitude. This can be harmful, as it makes it difficult for people to accept information objectively. A crucial issue in news recommender system research is how to mitigate these undesired effects by designing recommender interfaces and machine learning models that encourage people to be more open to different perspectives. Alongside accurate models, the user experience is an equally important measure. Indeed, the core statistics in this research project are based on users’ behaviors and experiences. Therefore, this research agenda aims to steer readers’ choices by altering their attitudes. The core methods will concentrate on interface design and ML model building, involving manipulation of cues, prediction of users’ behaviors, NRS algorithms, and changes to nudges. In sum, the project aims to provide insight into the extent to which news recommender systems can be effective in mitigating polarized opinions.

Full text in ACM Digital Library
DSCEERS: Counterfactual Evaluations of Explanations in Recommender Systems
by Mikhail Baklanov (Tel Aviv University)

The increasing focus on explainability within ethical AI, mandated by frameworks such as GDPR, highlights the critical need for robust explanation mechanisms in Recommender Systems (RS). A fundamental aspect of advancing such methods involves developing reproducible and quantifiable evaluation metrics. Traditional evaluation approaches involving human subjects are inherently non-reproducible, costly, subjective, and context-dependent. Furthermore, the complexity of AI models often transcends human comprehension capabilities, rendering it challenging for evaluators to ascertain the accuracy of explanations. Consequently, there is an urgent need for objective and scalable metrics that can accurately assess explanation methods in RS. Drawing inspiration from established practices in computer vision, this research introduces a counterfactual methodology to evaluate the accuracy of explanations in RS.

Although counterfactual methods are well recognized in other fields, they remain relatively unexplored within the domain of recommender systems. This study aims to establish quantifiable metrics that objectively evaluate the correctness of local explanations. In this work, we wish to adopt these methods for recommendation systems, thereby enabling an easy and reproducible approach to evaluating the correctness of explanations for recommendation systems.

Full text in ACM Digital Library
DSEnhancing Cross-Domain Recommender Systems with LLMs: Evaluating Bias and Beyond-Accuracy Measures
by Thomas Elmar Kolb (TU Wien)

The research domain of recommender systems is rapidly evolving. Initially, optimization efforts focused primarily on accuracy. However, recent research has highlighted the importance of addressing bias and beyond-accuracy measures such as novelty, diversity, and serendipity. With the rise of multi-domain recommender systems, the need to re-examine bias and beyond-accuracy measures in cross-domain settings has become crucial. Traditional methods face challenges such as cold-start problems, which can potentially be mitigated by leveraging LLMs. This proposed work investigates how LLM-based recommendation methods can enhance cross-domain recommender systems, focusing on identifying, measuring, and mitigating bias while evaluating the impact of beyond-accuracy measures. We aim to provide new insights by comparing traditional and LLM-based systems within a real-world environment encompassing the domains of news, books, and various lifestyle areas. Our research seeks to address the outlined gaps and develop effective evaluation strategies for the unique challenges posed by LLMs in cross-domain recommender systems.

Full text in ACM Digital Library
DSEnhancing Privacy in Recommender Systems through Differential Privacy Techniques
by Angela Di Fazio (Politecnico di Bari)

Recommender systems have become essential tools for addressing information overload in the digital age. However, the collection and usage of user data for personalized recommendations raise significant privacy concerns. This research focuses on enhancing privacy in recommender systems through the application of differential privacy techniques, particularly in the domain of privacy-preserving data publishing.

Our study aims to address three key research questions: (1) developing standardized metrics to characterize and compare recommendation datasets in the context of privacy-preserving data publishing, (2) designing differential privacy algorithms for private data publishing that preserve recommendation quality, and (3) examining the impact of differential privacy on beyond-accuracy objectives in recommender systems.

We propose to develop domain-specific metrics for evaluating the similarity between recommendation datasets, analogous to those used in other domains such as trajectory data publication. Additionally, we will investigate methods to balance the trade-off between privacy guarantees and recommendation accuracy, considering the potential disparate impacts on different user subgroups. Finally, we aim to assess the broader implications of implementing differential privacy on beyond-accuracy objectives such as diversity, popularity bias, and fairness.

By addressing these challenges, our research seeks to contribute to the advancement of privacy-preserving techniques in recommender systems, facilitating the responsible and secure use of recommendation data while maintaining the utility of personalized suggestions. The outcomes of this study have the potential to significantly benefit the field by enabling the reuse of existing algorithms with minimal adjustments while ensuring robust privacy guarantees.

Full text in ACM Digital Library
DSEvaluating the Pros and Cons of Recommender Systems Explanations
by Kathrin Wardatzky (University of Zurich)

Despite the growing interest in explainable AI in the RecSys community, the evaluation of explanations is still an open research topic. Typically, explanations are evaluated using offline metrics, with a case study, or through a user study. In my research, I will have a closer look at the evaluation of the effects of explanations on users. I investigate two possible factors that can impact the effects reported in recent publications, namely the explanation design and content as well as the users themselves. I further address the problem of determining promising explanations for an application scenario from a seemingly endless pool of options. Lastly, I propose a user study to close some of the research gaps established in the surveys and investigate how recommender systems explanations impact the understanding of users with different backgrounds.

Full text in ACM Digital Library
DSExplainability in Music Recommender System
by Shahrzad Shashaani (TU Wien)

Recommendation systems play a crucial role in our daily lives, influencing many of our major and minor decisions. These systems have also become integral to the music industry, guiding users to discover new content based on their tastes. However, the lack of transparency in these systems often leaves users questioning the rationale behind recommendations. Adding transparency and explainability to recommender systems is a promising way to address this issue, as more explainable systems can significantly improve user trust and satisfaction. This research explores transparency and explainability in recommendation systems, focusing on the music domain. It can help identify the gaps in explainability in music recommender systems in order to create more engaging and trustworthy music recommendations.

Full text in ACM Digital Library
DSExplainable and Faithful Educational Recommendations through Causal Language Modelling via Knowledge Graphs
by Neda Afreen (University of Cagliari)

The rapid expansion of digital education has significantly increased the need for recommender systems to help learners navigate the extensive variety of available learning resources. Recent advancements in these systems have notably improved the personalization of course recommendations. However, many existing systems fail to provide clear explanations for their recommendations, making it difficult for learners to understand why a particular suggestion was made. Researchers have emphasized the importance of explanations in various domains such as e-commerce, media, and entertainment, demonstrating how explanations can enhance system transparency, foster user trust, and improve decision-making processes. Despite these insights, such approaches have been rarely applied to the educational domain, and their effectiveness in practical use remains largely unexamined. My research focuses on developing explainable recommender systems for digital education. First, I aim to design knowledge graphs that can support high-quality recommendations in the educational context. Second, I will create models backed by these knowledge graphs that not only deliver accurate recommendations but also provide faithful explanations for each suggestion. Third, I will evaluate the effectiveness of these explainable recommender systems in real-world educational environments. Ultimately, this research aims to advance the development of more transparent and user-centric educational technologies.

Full text in ACM Digital Library
DSExplainable Multi-Stakeholder Job Recommender Systems
by Roan Schellingerhout (Maastricht University)

Public opinion on recommender systems has become increasingly wary in recent years. In line with this trend, lawmakers have also started to become more critical of such systems, resulting in the introduction of new laws focusing on aspects such as privacy, fairness, and explainability for recommender systems and AI at large. These concepts are especially crucial in high-risk domains such as recruitment. In recruitment specifically, decisions carry substantial weight, as the outcomes can significantly impact individuals’ careers and companies’ success. Additionally, there is a need for a multi-stakeholder approach, as these systems are used by job seekers, recruiters, and companies simultaneously, each with its own requirements and expectations. In this paper, I summarize my current research on the topic of explainable, multi-stakeholder job recommender systems and set out a number of future research directions.

Full text in ACM Digital Library
DSFairness and Transparency in Music Recommender Systems: Improvements for Artists
by Karlijn Dinnissen (Utrecht University)

Music streaming services have become one of the main sources of music consumption in the last decade, with recommender systems playing a crucial role. Since these systems partially determine which songs listeners hear, they significantly influence the artists behind the music. However, when assessing the performance and fairness of music recommender systems, the perspectives of artists and others working in the music industry are often overlooked. Additionally, artists express a desire for greater transparency regarding why certain songs are recommended while others are not. This research project adopts a multi-stakeholder approach to close the gap between music recommender systems and the artists whose music they recommend. First, we gather insights from artists and music industry professionals through interviews and questionnaires. Building on those insights, we then aim to improve matching between end users and music from lesser-known artists by generating rich item and user representations. Results will be evaluated both quantitatively and qualitatively. Lastly, we plan to effectively communicate music recommender system fairness by increasing transparency for both end users and artists.

Full text in ACM Digital Library
DSFairness Explanations in Recommender Systems
by Luan Souza (University of São Paulo)

Fairness in recommender systems is an emerging area that aims to study and mitigate discrimination against individuals and/or groups of individuals in recommendation engines. These mitigation strategies rely on bias detection, a non-trivial task that requires complex analysis and interventions to ensure fairness in these engines. Furthermore, fairness interventions in recommender systems involve a trade-off between fairness and the performance of the recommendation lists, affecting the user experience with potentially less accurate lists. In this context, fairness interventions with explanations have recently been proposed in the literature, mitigating discrimination in recommendation lists and providing explainability about the recommendation process and the impact of the fairness interventions on the outcomes. However, despite the variety of approaches, it is still not clear how these proposals compare with each other, even those that aim to mitigate the same kind of bias. In addition, the contribution of these different explainable algorithmic fairness approaches to users’ fairness perceptions has not yet been explored. Given these gaps, our doctoral project aims to investigate how these explainable fairness proposals compare to each other and how they are perceived by users, in order to identify which fairness interventions and explanation strategies are most promising for increasing transparency and fairness perceptions of recommendation lists.

Full text in ACM Digital Library
DSHow to Evaluate Serendipity in Recommender Systems: the Need for a Serendiptionnaire
by Brett Binst (imec-SMIT, Vrije Universiteit Brussel)

Recommender systems can assist in various user tasks and serve diverse values, including exploring the item space. Serendipity has recently received considerable attention, often seen as a way to broaden users’ tastes and counteract filter bubbles. However, the field of research on serendipity is fragmented regarding its evaluation methods, which impedes the progress of knowledge accumulation. This research plan proposes two studies to address these issues. First, a systematic literature review will be conducted to provide insights into how serendipity is currently studied in the field. This review will serve as a reference for novice researchers and help mitigate fragmentation by presenting a thorough overview of the field. This systematic literature review has already revealed a significant gap: the lack of a validated, widely accepted method for evaluating serendipity. Therefore, the second part of this research plan is to develop a validated questionnaire, the serendiptionnaire, to measure serendipity. This tool will provide a ground truth for evaluating serendipity, aiding in answering fundamental questions within the field and validating offline metrics.

Full text in ACM Digital Library
DSIntegrating Matrix Factorization with Graph based Models
by Rachana Mehta (Pandit Deendayal Energy University)

Graph based recommender models use user-item ratings and user-user social relationships to improve recommendation performance by extracting inherent geometrical knowledge. In a social graph scenario, user-user trust plays a significant role in reducing sparsity and has varied characteristics that can be exploited. Existing models limit themselves to learning from either a high-order interaction graph of user-item ratings or a user-user social graph built from trust values, and they explore other trust characteristics only in a very limited setting. A graph based model designed using the entire user-user social information affects performance and increases the complexity of model learning. To alleviate these issues of graph learning, graph recommenders seek assistance from matrix factorization techniques. Integrating a graph based model with matrix factorization brings its own set of challenges in model integration, leveraging trust, graph learning, and optimization. This article presents the existing work along these lines, as well as future possibilities and challenges to be addressed through novel developments.

Full text in ACM Digital Library
DSLearning Personalized Health Recommendations via Offline Reinforcement Learning
by Larry Preuett (University of Washington)

The healthcare industry is strained and would benefit from personalized treatment plans for treating various health conditions (e.g., HIV and diabetes). Reinforcement Learning is a promising approach to learning such sequential recommendation systems. However, applying reinforcement learning in the medical domain is challenging due to the lack of adequate evaluation metrics, partial observability, and the inability to explore due to safety concerns. In this line of work, we identify three research directions to improve the applicability of treatment plans learned using offline reinforcement learning.

Full text in ACM Digital Library
DSMultimodal Representation Learning for High-Quality Recommendations in Cold-Start and Beyond-Accuracy
by Marta Moscati (Johannes Kepler University Linz)

Recommender systems (RS) traditionally leverage the large amount of user–item interaction data. This exposes RS to a lower recommendation quality in cold-start scenarios, as well as to a low recommendation quality in terms of beyond-accuracy evaluation metrics. State-of-the-art (SotA) models for cold-start scenarios rely on the use of side information on the items or the users, therefore relating recommendation to multimodal machine learning (ML). However, the most recent techniques from multimodal ML are often not applied to the domain of recommendation. Additionally, the evaluation of SotA multimodal RS often neglects beyond-accuracy aspects of recommendation. In this work, we outline research into designing novel multimodal RS based on SotA multimodal ML architectures for cold-start recommendation, and their evaluation and benchmark with preexisting multimodal RS in terms of accuracy and beyond-accuracy aspects of recommendation quality.

Full text in ACM Digital Library
DSPersonal Values and Community-Centric Environmental Recommender Systems: Enhancing Sustainability Through User Engagement
by Bianca Maria Deconcini (University of Turin)

The concept of sustainability has become a central focus across multiple sectors, driven by the urgent need to address climate change and protect the environment. Technological advancements and capabilities, together with the emergence of new ecological issues [25], are leading to growing awareness and influencing shifts in multiple areas such as energy, transportation, and waste management. Within this context, recommender systems represent a promising solution, since people need guidance and occasionally a gentle push to translate their intentions into actions or to bring goals to life [9]. However, existing literature reveals a fragmented landscape, with solutions often addressing specific aspects or recommendation contributions in isolation. Many sustainability interventions focus solely on providing consumption data and environmental insights, while others emphasize learning and behavior change strategies. My doctoral project aims to address this gap by leveraging various approaches to recommender systems and applying them in sustainability contexts, with the goal of building a holistic system that maximizes the contributions of these diverse methods while integrating user-centric and value-driven perspectives. This research project delves into two distinct facets: energy sustainability and sustainable mobility. The first case centers on enhancing energy efficiency within energy communities through personalized recommendations and engagement strategies. The second facet focuses on reshaping user commuting patterns towards sustainable alternatives by recommending suitable and more sustainable modes of transportation, such as cycling, carpooling, and public transportation. Both cases share the same objective: aligning user behaviors with sustainability goals, thereby reducing individual environmental impact and enhancing the sense of belonging to a community, whether this is confined to a group of individuals or pertains to society at large.
An innovative, comprehensive recommender system approach is highly beneficial because it can combine all the existing contributions in a single framework that simultaneously makes different types of recommendations (explainable, educative, behavioral, and social-aware), addressing the complexities of this multifaceted domain.

Full text in ACM Digital Library
DSSupporting Knowledge Workers through Personal Information Assistance with Context-aware Recommender Systems
by Mahta Bakhshizadeh (German Research Center for Artificial Intelligence – DFKI)

Recommender systems are extensively employed across various domains to mitigate information overload by providing personalized content. Despite their widespread use in sectors such as streaming, e-commerce, and social networks, utilizing them for personal information assistance is a comparatively novel application. This emerging application aims to develop intelligent systems capable of proactively providing knowledge workers with the most relevant information based on their context to enhance productivity. In this paper, we explore this innovative application by first defining the scope of our study, outlining the key objectives, and introducing the main challenges. We then present our current results and progress, including a comprehensive literature review, the proposal of a framework, the collection of a pioneering dataset, and the establishment of a benchmark for evaluating a recommendation scenario on our published dataset. We also discuss our ongoing efforts and future research directions.

Full text in ACM Digital Library
DSTowards Sustainable Recommendations in Urban Tourism
by Pavel Merinov (Free University of Bozen-Bolzano)

Full text in ACM Digital Library
DSTowards Symbiotic Recommendations: Leveraging LLMs for Conversational Recommendation Systems
by Alessandro Petruzzelli (University of Bari Aldo Moro)

Traditional recommender systems (RSs) generate suggestions by relying on user preferences and item characteristics. However, they do not properly involve the user in the decision-making process. This gap is particularly evident in Conversational Recommender Systems (CRSs), where existing methods struggle to facilitate meaningful dialogue and dynamic user interactions.

To address this limitation, in my Ph.D. project I will build on the principles of Symbiotic AI (SAI) to propose a novel approach for CRSs. Rather than treating users as passive recipients, this approach aims to engage them in an adaptive dialogue based on their preferences, previous interactions, and personal characteristics, thus fostering collaborative decision-making. To achieve this objective, my research unfolds in three phases. First, I will adapt Large Language Models (LLMs) to effectively handle recommendation tasks in a number of different domains, also introducing knowledge injection techniques. Second, I will develop a CRS that not only provides accurate recommendations but also offers natural language explanations and responds to user queries, thereby promoting transparency and building user trust. Finally, I will consider users’ personal characteristics to personalize the CRS’s response strategy, ensuring adaptive and effective communication in line with SAI principles.

Full text in ACM Digital Library

List of all industry track contributions accepted for RecSys 2024 (in alphabetical order).
Check the Presenter Instructions for information about every type of oral presentation.
If you need to print your poster in Bari, follow these instructions.

INDA Hybrid Multi-Agent Conversational Recommender System with LLM and Search Engine in E-commerce
by Guangtao Nie (JD.com), Rong Zhi (JD.com), Xiaofan Yan (JD.com), Yufan Du (JD.com), Xiangyang Zhang (JD.com), Jianwei Chen (JD.com), Mi Zhou (JD.com), Hongshen Chen (JD.com), Tianhao Li (JD.com), Ziguang Cheng (JD.com), Sulong Xu (JD.com) and Jinghe Hu (JD.com)

Multi-agent collaboration is an increasingly popular method for building conversational recommender systems (CRS), especially given the recent widespread use of Large Language Models (LLMs). Typically, these systems employ several LLM agents, each serving a distinct role to meet user needs. In an industrial setting, it’s essential for a CRS to exhibit low first token latency (i.e., the time taken from a user’s input until the system outputs its first response token) and high scalability, for instance by minimizing the number of LLM inferences per user request, to enhance user experience and boost platform profit. For example, JD.com’s baseline CRS features two LLM agents and a search API but suffers from high first token latency and requires two LLM inferences per request (LIPR), hindering its performance. To address these issues, we introduce a Hybrid Multi-Agent Collaborative Recommender System (Hybrid-MACRS). It includes a central agent powered by a fine-tuned proprietary LLM and a search agent combining a related search module with a search engine. This hybrid system reduces first token latency by about 70% and cuts the LIPR from 2 to 1. We conducted thorough online A/B testing to confirm this approach’s efficiency.

Full text in ACM Digital Library
INDAI-assisted Coding with Cody: Lessons from Context Retrieval and Evaluation for Code Recommendations
by Jan Hartman (Sourcegraph), Hitesh Sagtani (Sourcegraph), Julie Tibshirani (Sourcegraph) and Rishabh Mehrotra (Sourcegraph)

In this work, we discuss a recently popular type of recommender system: an LLM-based coding assistant. Connecting the task of providing code recommendations in multiple formats to traditional RecSys challenges, we outline several similarities and differences due to domain specifics. We emphasize the importance of providing relevant context to an LLM for this use case and discuss lessons learned from context enhancements & offline and online evaluation of such AI-assisted coding systems.

Full text in ACM Digital Library
INDAnalyzing User Preferences and Quality Improvement on Bing’s WebPage Recommendation Experience with Large Language Models
by Jaidev Shah (Microsoft AI), Gang Luo (Microsoft AI), Jialin Liu (Microsoft AI), Amey Barapatre (Microsoft AI), Fan Wu (Microsoft AI), Chuck Wang (Microsoft AI) and Hongzhi Li (Microsoft AI)

Explore Further @ Bing (Web Recommendations) is a web-scale, query-independent webpage-to-webpage recommendation system with an index size of over 200 billion webpages. Due to the significant variability in webpage quality across the web and our system’s reliance on learning solely from user behavior (clicks), our production system was susceptible to serving clickbait and low-quality recommendations. Our team invested several months in developing and shipping several improvements that use LLM-generated recommendation quality labels to enhance our ranking stack and improve the nature of the recommendations we show to our users. Another key motivation behind our efforts was to go beyond merely surfacing relevant webpages, focusing instead on prioritizing more useful and authoritative content that delivers value to users based on their implied intent. We demonstrate how large language models (LLMs) offer a powerful tool for product teams to gain deeper insights into shifts in product experience and user behavior following significant improvements or changes to a production system. In this work, to enable our analysis, we also showcase the use of a small language model (SLM) to generate better-quality webpage text features and summaries at scale and describe our approach to mitigating position bias in user interaction logs.

Full text in ACM Digital Library
INDBootstrapping Conditional Retrieval for User-to-Item Recommendations
by Hongtao Lin (Pinterest), Haoyu Chen (Pinterest), Jaewon Yang (Pinterest) and Jiajing Xu (Pinterest)

User-to-item retrieval has been an active research area in recommendation systems, and two-tower models are widely adopted due to their simplicity and serving efficiency. In this work, we focus on a variant called conditional retrieval, where retrieved items are expected to be relevant to a condition (e.g., a topic). We propose a method that uses the same training data as standard two-tower models but incorporates item-side information as conditions in the query. This allows us to bootstrap new conditional retrieval use cases and encourages feature interactions between user and condition. Experiments show that our method retrieves highly relevant items and outperforms standard two-tower models with filters on engagement metrics. The proposed model is deployed to power a topic-based notification feed at Pinterest and led to a 0.26% increase in weekly active users.
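To make the idea of conditioning the query concrete, here is a minimal sketch, not Pinterest's model. It assumes a toy query tower that adds a user embedding to a condition (topic) embedding and scores items by dot product. It also assumes that, during training, the condition is read from the engaged item's own metadata, which is how ordinary (user, item) logs could be reused. `query_embedding`, `make_training_example`, `retrieve`, and the vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def query_embedding(user_vec: Vector, condition_vec: Vector) -> Vector:
    # Hypothetical query tower: add the user and condition embeddings.
    # A real tower would be a learned network over both inputs.
    return [u + c for u, c in zip(user_vec, condition_vec)]


def make_training_example(user_id: str, item_id: str,
                          item_topic: Dict[str, str]) -> Tuple[str, str, str]:
    # Bootstrapping (assumed): the condition is taken from the engaged item's
    # own metadata, so standard (user, item) engagement logs can be reused.
    return user_id, item_topic[item_id], item_id


def retrieve(user_vec: Vector, condition_vec: Vector,
             item_vecs: Dict[str, Vector], k: int) -> List[str]:
    # Score every item against the conditioned query and keep the top k.
    q = query_embedding(user_vec, condition_vec)
    ranked = sorted(item_vecs, key=lambda i: dot(q, item_vecs[i]), reverse=True)
    return ranked[:k]
```

For example, `retrieve([1.0, 0.0], [0.0, 2.0], {"pasta": [0.1, 1.0], "sneakers": [1.0, 0.0]}, k=1)` returns `["pasta"]`, while the same user with a zero condition vector gets `["sneakers"]`. The condition changes what is retrieved without a post-hoc filter.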

Full text in ACM Digital Library
INDBridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
by Nikhil Khani (Google LLC), Li Wei (Google LLC), Aniruddh Nath (Google LLC), Shawn Andrews (Google LLC), Shuo Yang (Google LLC), Yang Liu (Google LLC), Pendo Abbo (Google LLC), Maciej Kula (Google LLC), Jarrod Kahn (Google LLC), Zhe Zhao (University of California), Lichan Hong (Google LLC) and Ed Chi (Google LLC)

Knowledge Distillation (KD) is a powerful approach for compressing a large model into a smaller, more efficient model, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring consistent and reliable generation of high-quality teacher labels from a continuous data stream.
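For readers new to KD, the sketch below shows the generic distillation objective for a binary engagement task. The student is trained against a blend of the observed label and the teacher's predicted probability. This is the textbook recipe, not the system described in the abstract, and the mixing weight `alpha` is an assumed hyperparameter.

```python
import math


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy; target may be a hard (0/1) or soft label."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(student_prob: float, hard_label: float,
                      teacher_prob: float, alpha: float = 0.5) -> float:
    # alpha (assumed) weights the teacher's soft label against the observed label.
    return ((1.0 - alpha) * bce(student_prob, hard_label)
            + alpha * bce(student_prob, teacher_prob))
```

With `alpha = 0` this reduces to ordinary training on logged labels. Because teacher probabilities are just per-example numbers, they can be computed once and shared by several students, which is the kind of label sharing the abstract targets.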

Full text in ACM Digital Library
INDCo-optimize Content Generation and Consumption in a Large Scale Video Recommendation System
by Zhen Zhang (Google Inc.), Qingyun Liu (Google DeepMind), Yuening Li (Google Inc.), Sourabh Bansod (Google Inc.), Mingyan Gao (Google Inc.), Yaping Zhang (Google Inc.), Zhe Zhao (Google DeepMind), Lichan Hong (Google DeepMind), Ed H. Chi (Google DeepMind), Shuchao Bi (Google Inc.) and Liang Liu (Google Inc.)

Multi-task prediction models and value models are the de facto standard ranking components in modern large-scale content recommendation systems. However, they are typically optimized to model users’ passive consumption behaviors and rank content to grow only consumption-centric value. In this talk, we discuss the key insight that it is possible to also model sparse, participatory content-generation actions and to grow ecosystem value through a new ranking system. We made the following key technical contributions in this system: (1) introducing ranking for content generation based on a categorization of user participation actions of different sparsity, including proxy intent actions or access point clicks; (2) improving sparse task prediction quality and stability through causal task relationship modeling, conditional loss modeling, and a ResNet-based shared bottom network; (3) personalizing the value model to minimize conflicts between different values, e.g., by ranking inspiring content higher for users who actively generate content; and (4) conducting a systematic evaluation of the proposed approach on a large short-form video UGC (User-Generated Content) platform.

Full text in ACM Digital Library
INDCountry-diverted experiments for mitigation of network effects
by Lina Lin (Google), Changping Meng (Google), Jennifer Brennan (Google), Jean Pouget-Abadie (Google), Ningren Han (Google), Shuchao Bi (Google) and Yajun Peng (Google)

Full text in ACM Digital Library
INDDynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce
by Ádám Tibor Czapp (Taboola Budapest), Mátyás Jani (Taboola Budapest), Bálint Domián (Taboola Budapest) and Balázs Hidasi (Taboola Budapest)

Coupling latent-diffusion-based image generation with contextual bandits enables the creation of eye-catching personalized product images at a scale that was previously either impossible or too expensive. In this paper, we showcase how we used these technologies to increase user engagement with recommendations in online retargeting campaigns for e-commerce.

Full text in ACM Digital Library
INDEmbedding based retrieval for long tail search queries in ecommerce
by Akshay Kekuda (Best Buy), Yuyang Zhang (Best Buy) and Arun Udayashankar (Best Buy)

In this abstract, we present a series of optimizations we performed on the two-tower model architecture and on the training and evaluation datasets to implement semantic product search at Best Buy. Search queries on bestbuy.com follow a Pareto distribution, whereby a minority of queries account for most searches. This leaves a long tail of infrequently issued queries, which suffer from very sparse interaction signals. Our current work focuses on building a model to serve these long-tail queries, and we describe the optimizations we made to this model to maximize conversion when retrieving from the catalog.

The first optimization uses a large language model to mitigate the sparsity of conversion signals. The second is pretraining an off-the-shelf transformer-based model on Best Buy catalog data. The third concerns finetuning: we use query-to-query pairs in addition to query-to-product pairs and combine the above strategies when finetuning the model. We also demonstrate how merging the weights of these finetuned models improves the evaluation metrics. Finally, we provide a recipe for curating an evaluation dataset for continuous monitoring of model performance with human-in-the-loop evaluation. Adding this recall mechanism to our existing term-match-based recall improved conversion by 3% in an online A/B test.
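The abstract does not say how the finetuned weights are merged. A common choice is an element-wise (optionally weighted) average of parameters from models finetuned from the same pretrained checkpoint. The sketch below assumes that approach and uses plain dictionaries of float lists in place of real framework tensors. `merge_weights` is hypothetical.

```python
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def merge_weights(models: List[Weights],
                  coeffs: Optional[List[float]] = None) -> Weights:
    """Element-wise weighted average of parameters with identical names/shapes."""
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)  # uniform average
    if len(coeffs) != len(models):
        raise ValueError("need one coefficient per model")
    merged: Weights = {}
    for name, first in models[0].items():
        if any(len(m[name]) != len(first) for m in models):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [sum(c * m[name][i] for c, m in zip(coeffs, models))
                        for i in range(len(first))]
    return merged
```

Averaging only makes sense when the models share an architecture and a common initialization, which is why it is usually applied to siblings finetuned from one pretrained model.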

Full text in ACM Digital Library
INDEncouraging Exploration in Spotify Search through Query Recommendations
by Henrik Lindstrom (Spotify), Humberto Jesus Corona Pampin (Spotify), Enrico Palumbo (Spotify) and Alva Liu (Spotify)

At Spotify, search has traditionally been seen as a tool for retrieving content, with the search system optimized for when the user has a specific target in mind. In particular, we have relied on an instant search system that provides results on each keystroke, which works well for known-item search, when queries are straightforward, and when the catalog is small. However, as Spotify’s catalog grows in size and variety, it becomes increasingly difficult for users to define their search intents accurately. Furthermore, as we expand our offering, we need to help users discover more content, both for new content types (e.g., audiobooks) and for new content and creators within existing content types. To solve this, we have introduced a hybrid Query Recommendation system (QR) that helps users formulate more complex exploratory search intents while still serving known-item lookups efficiently. This experience has been rolled out worldwide to all mobile users, resulting in a 9% increase in exploratory-intent queries in A/B tests.

Full text in ACM Digital Library
INDEnhancing Performance and Scalability of Large-Scale Recommendation Systems with Jagged Flash Attention
by Rengan Xu (Meta Platforms), Junjie Yang (Meta Platforms), Yifan Xu (Meta Platforms), Hong Li (Meta Platforms), Xing Liu (Meta Platforms), Devashish Shankar (Meta Platforms), Haoci Zhang (Meta Platforms), Meng Liu (Meta Platforms), Boyang Li (Meta Platforms), Yuxi Hu (Meta Platforms), Mingwei Tang (Meta Platforms), Zehua Zhang (Meta Platforms), Tunhou Zhang (Meta Platforms), Dai Li (Meta Platforms), Sijia Chen (Meta Platforms), Gian-Paolo Musumeci (Meta Platforms), Jiaqi Zhai (Meta Platforms), Bill Zhu (Meta Platforms), Hong Yan (Meta Platforms) and Srihari Reddy (Meta Platforms)

The integration of hardware accelerators has significantly advanced the capabilities of modern recommendation systems, enabling the exploration of complex ranking paradigms previously deemed impractical. However, GPU-based computational costs present substantial challenges. In this paper, we present an efficiency-driven approach to exploring these paradigms that moves beyond traditional reliance on native PyTorch modules. We address the specific challenges posed by ranking models’ dependence on categorical features, which vary in length and complicate GPU utilization. We introduce Jagged Feature Interaction Kernels, a novel method designed to extract fine-grained insights from long categorical features through efficient handling of dynamically sized tensors. We further enhance the performance of attention mechanisms by integrating Jagged tensors with Flash Attention. Our novel Jagged Flash Attention achieves up to 9× speedup and 22× memory reduction compared to dense attention. Notably, it also outperforms dense flash attention, with up to 3× speedup and 53% greater memory efficiency. In production models, we observe a 10% QPS improvement and 18% memory savings, enabling us to scale our recommendation systems with longer features and more complex architectures.
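To illustrate what "jagged" means here, the following pure-Python reference sketch assumes the common values-plus-offsets layout. Variable-length rows are stored back to back, an offsets array marks where each row starts, and attention is computed per row without padding. It shows only the semantics. The paper's contribution is fused GPU kernels, which this does not attempt. The function names and the single-query-per-row setup are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def jagged_attention(queries: List[Vector], keys: List[Vector],
                     values: List[Vector], offsets: List[int]) -> List[Vector]:
    """Single-query scaled dot-product attention per row of a jagged batch.

    keys/values hold all rows back to back; row i occupies the index range
    [offsets[i], offsets[i + 1]). Values are assumed to share the query
    dimension d. No padding is ever materialized.
    """
    d = len(queries[0])
    out: List[Vector] = []
    for i, q in enumerate(queries):
        start, end = offsets[i], offsets[i + 1]
        if start == end:
            out.append([0.0] * d)  # empty row: nothing to attend to
            continue
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in range(start, end)]
        weights = softmax(scores)
        out.append([sum(w * values[start + t][c] for t, w in enumerate(weights))
                    for c in range(d)])
    return out
```

With `offsets = [0, 1, 3]`, the batch holds rows of length 1 and 2 in three stored elements, whereas a padded dense layout would need 2 × 2 = 4. The savings grow with the spread of feature lengths.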

Full text in ACM Digital Library
INDEnhancing Recommendation Quality of the SASRec Model by Mitigating Popularity Bias
by Venkata Harshit Koneru (ZDF), Xenija Neufeld (Accso – Accelerated Solutions GmbH), Sebastian Loth (ZDF) and Andreas Grün (ZDF)

ZDF is a Public Service Media (PSM) broadcaster in Germany that uses recommender systems on its streaming platform ZDFmediathek. One of the main use cases within the ZDFmediathek is Next Video, which is currently based on a Self-Attention based Sequential Recommendation model (SASRec). For this use case, we modified the loss function and the negative-item sampling method, introducing a top-k negative sampling strategy, and compared the result to the vanilla SASRec model. We show that this not only reduces popularity bias but also increases clicks and viewing volume compared to the vanilla version.

Full text in ACM Digital Library
INDEntity-Aware Collections Ranking: A Joint Scoring Approach
by Sihao Chen (Shopee Pte. Ltd.), Sheng Li (Shopee Pte. Ltd.), Youhe Chen (Shopee Pte. Ltd.) and Dong Yang (Shopee Pte. Ltd.)

Recommender systems in academia and industry typically predict Click-Through Rate (CTR) at the item or entity level. In practical scenarios, products can take on various forms and designs. We present a novel joint scoring framework that supports the listwise ranking of a collection composed of multiple entities. It learns the best combination of entities to be displayed to the user. We also introduce a novel dual attention mechanism that better captures the user’s interest in the collection. Our approach demonstrated superior performance through offline and online experiments and has been deployed to Shopee’s Shop Ads across all markets.

Full text in ACM Digital Library
INDExplore versus repeat: insights from an online supermarket
by Mariagiorgia Agnese Tandoi (Picnic Technologies) and Daniela Solis Morales (Picnic Technologies)

At the online supermarket Picnic, we implemented both traditional collaborative filtering and a hybrid method to provide recipe recommendations at scale. This case study presents findings from the online evaluation of these algorithms, focusing on the repeat-explore trade-off. Our findings give other online retailers insight into the importance of thoughtful model design in navigating this balance. We argue that even when exploiting known preferences proves highly beneficial in the short term, prioritizing exploratory content is essential for long-term customer satisfaction and sustained growth. Our research lays the groundwork for a discussion on how to define success when balancing the familiar and the novel in online grocery shopping.

Full text in ACM Digital Library
INDImproving Data Efficiency for Recommenders and LLMs
by Noveen Sachdeva (Google DeepMind), Benjamin Coleman (Google DeepMind), Wang-Cheng Kang (Google DeepMind), Jianmo Ni (Google DeepMind), James Caverlee (Texas A&M University), Lichan Hong (Google DeepMind), Ed Chi (Google DeepMind) and Derek Zhiyuan Cheng (Google DeepMind)

In recent years, massive transformer-based architectures have driven breakthrough performance in practical applications like autoregressive text generation (LLMs) and click prediction (recommenders). A common recipe for success is to train large models on massive web-scale datasets; e.g., modern recommenders are trained on billions of user-item click events, and LLMs are trained on trillions of tokens extracted from the public internet. We are close to hitting the computational and economic limits of scaling up these models, and we expect the next frontier of gains to come from improving (i) the data quality of the training dataset and (ii) the data efficiency of the extremely expensive training procedure. Inspired by this shift, we present a set of “data-centric” techniques for recommendation and language models that summarize a dataset into a terse data summary that (i) is high-quality, i.e., trains better models, and (ii) improves the data efficiency of the overall training procedure. We propose techniques from two disparate data frameworks: (i) data selection (a.k.a. coreset construction) methods that sample portions of the dataset using grounded heuristics, and (ii) data distillation techniques that generate synthetic examples optimized to retain the signals needed for training high-quality models. Overall, this work sheds light on the challenges and opportunities offered by data optimization in web-scale systems, a particularly relevant focus as the recommendation community grapples with the grand challenge of leveraging LLMs.

Full text in ACM Digital Library
INDJoint Modeling of Search and Recommendations Via an Unified Contextual Recommender (UniCoRn)
by Moumita Bhattacharya (Netflix), Vito Ostuni (Netflix) and Sudarshan Lamkhede (Netflix)

Full text in ACM Digital Library
INDLeveraging LLM generated labels to reduce bad matches in job recommendations
by Yingchi Pei (Indeed.com), Yi Wei Pang (Indeed.com), Warren Cai (Indeed.com), Nilanjan Sengupta (Indeed.com) and Dheeraj Toshniwal (Indeed.com)

Negative signals are increasingly employed to enhance recommendation quality. However, explicit negative feedback is often sparse and may disproportionately reflect the preferences of more vocal users. Commonly used implicit negative feedback, such as impressions without positive interactions, has the limitation of not accurately capturing users’ true negative preferences because users mainly pursue information they consider interesting. In this work, we present an approach that leverages fine-tuned Large Language Models (LLMs) to evaluate recommendation quality and generate negative signals at scale while maintaining cost efficiency. We demonstrate significant improvements in our recommendation systems by deploying a traditional classifier trained using LLM-generated labels.

Full text in ACM Digital Library
INDLyricLure: Mining Catchy Hooks in Song Lyrics to Enhance Music Discovery and Recommendation
by Siddharth Sharma (Amazon Inc.), Akshay Shukla (Amazon Inc.), Ajinkya Walimbe (Amazon Inc), Tarun Sharma (Amazon Inc) and Joaquin Delgado (Amazon)

Music Search faces a significant challenge as users increasingly rely on catchy lines from lyrics to search for both new releases and other popular songs. Integrating lyrics into an existing lexical search index or using a lyrics vector index is difficult because lyrics are long. Lexical scoring mechanisms like BM25 are inadequate for long text and necessitate complex query planning and index schemas, while text-embedding-similarity techniques often retrieve noisy lyrics with near-similar meaning, resulting in low precision. This paper introduces a proactive approach to extracting catchy phrases from song lyrics, overcoming the limitations of conventional graph-based phrase extractors and deep learning models, which are primarily designed for extractive summarization or task-specific key phrase extraction from domain-specific corpora. Additionally, we employ a multi-step mechanism to mine search query logs for potentially unresolved user queries containing catchy phrases from lyrics. This involves creating word and character k-gram indexes for lyric chunks, careful query- and lyrics-domain-centric normalization (and expansion), and a re-ranking layer incorporating lexical as well as semantic similarity. Together, these strategies helped us create a dedicated retrieval source that serves lyrics-intent queries with high recall.

Full text in ACM Digital Library
INDMore to Read at the Los Angeles Times: Solving a Cold Start Problem with LLMs to Improve Story Discovery
by Franklin Horn (Los Angeles Times), Aurelia Alston (Los Angeles Times) and Won J. You (Los Angeles Times)

News publishers seeking to grow their digital audience face a challenge in providing relevant content recommendations for unregistered users arriving directly on article pages. In these cold-start scenarios, classic techniques, like asking a user to register and select topics of interest, fall short. We present a contextual targeting approach that leverages the user’s current article choice as an implicit signal of user interests. We designed and developed an interface with recommendations to help users discover more articles. Our A/B testing showed that our models increased click-through rates by 39.4% over a popularity baseline. One of them, a large language model (LLM), generates relevant recommendations that balance immersion and novelty. We discuss the implications of using LLMs for responsibly enhancing user experiences while upholding editorial standards. We identify key opportunities in detecting nuanced user preferences and identifying and interrupting filter bubbles on news publisher sites.

Full text in ACM Digital Library
INDOff-Policy Selection for Optimizing Ad Display Timing in Mobile Games (Samsung Instant Plays)
by Katarzyna Siudek-Tkaczuk (Samsung R&D Institute Poland), Sławomir Kapka (Samsung R&D Institute Poland), Jędrzej Alchimowicz (Samsung R&D Institute Poland), Bartłomiej Swoboda (Samsung R&D Institute Poland) and Michał Romaniuk (Samsung R&D Institute Poland)

Off-Policy Selection (OPS) aims to select the best policy from a set of policies trained using offline Reinforcement Learning. In this work, we describe our custom OPS method and its successful application in Samsung Instant Plays for optimizing ad delivery timings. The motivation behind proposing our custom OPS method is the fact that traditional Off-Policy Evaluation (OPE) methods often exhibit enormous variance leading to unreliable results. We applied our OPS method to initialize policies for our custom pseudo-online training pipeline. The final policy resulted in a substantial 49% lift in the number of watched ads while maintaining similar retention rate.
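For context on the variance problem, a standard OPE baseline is inverse propensity scoring (IPS). IPS reweights each logged reward by the ratio of the target policy's probability of the logged action to the logging policy's probability. When logging probabilities are small, these ratios become large, which is one source of the variance the abstract mentions. This is a textbook estimator, not Samsung's OPS method. The log format of (context, action, reward, logging probability) is an assumption.

```python
from typing import Callable, Hashable, List, Tuple

# (context, action, reward, probability that the logging policy chose the action)
LogEntry = Tuple[Hashable, int, float, float]


def ips_estimate(logs: List[LogEntry],
                 target_prob: Callable[[Hashable, int], float]) -> float:
    """Inverse propensity scoring estimate of a target policy's mean reward."""
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_prob(context, action) / logging_prob  # importance weight
        total += weight * reward
    return total / len(logs)
```

Each importance weight can be as large as `1 / logging_prob`, so a few rarely logged actions can dominate the estimate. This is why comparing candidate policies by raw OPE scores can be unreliable.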

Full text in ACM Digital Library
INDOptimizing for Participation in Recommender System
by Yuan Shao (Google), Bibang Liu (Google), Sourabh Bansod (Google), Arnab Bhadury (Google), Mingyan Gao (Google) and Yaping Zhang (Google)

Full text in ACM Digital Library
INDPareto Front Approximation for Multi-Objective Session-Based Recommender Systems
by Timo Wilm (OTTO (GmbH & Co KG)), Philipp Normann (OTTO (GmbH & Co KG)) and Felix Stepprath (OTTO (GmbH & Co KG))

This work introduces MultiTRON, an approach that adapts Pareto front approximation techniques to multi-objective session-based recommender systems using a transformer neural network. Our approach optimizes trade-offs between key metrics such as click-through and conversion rates by training on sampled preference vectors. A significant advantage is that after training, a single model can access the entire Pareto front, allowing it to be tailored to meet the specific requirements of different stakeholders by adjusting an additional input vector that weights the objectives. We validate the model’s performance through extensive offline and online evaluation. For broader application and research, the source code is made available. The results confirm the model’s ability to manage multiple recommendation objectives effectively, offering a flexible tool for diverse business needs.
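Training on sampled preference vectors can be sketched as follows. At each step, draw a weight vector over the objectives, feed it to the model as an extra input, and weight the per-objective losses by it. The Dirichlet sampling (via normalized gamma draws) and the linear scalarization are assumptions for illustration. MultiTRON's released source code is the authoritative reference. The function names are hypothetical.

```python
import random
from typing import List, Optional


def sample_preference(num_objectives: int, alpha: float = 1.0,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Draw a preference vector on the simplex (assumed Dirichlet(alpha))."""
    rng = rng or random.Random()
    # Normalizing independent Gamma(alpha, 1) draws yields a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarized_loss(losses: List[float], preference: List[float]) -> float:
    # Weighted sum of per-objective losses (e.g., click and conversion losses).
    return sum(w * l for w, l in zip(preference, losses))
```

Because the preference vector is also a model input, the same trained network can be queried at serving time with different weightings to move along the approximated Pareto front.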

Full text in ACM Digital Library
INDPlaylist Search Reinvented: LLMs Behind the Curtain
by Geetha Sai Aluri (Amazon), Siddharth Sharma (Amazon), Tarun Sharma (Amazon) and Joaquin Delgado (Amazon)

Improving search functionality poses challenges such as data scarcity for model training, metadata enrichment for comprehensive document indexing, and the labor-intensive manual annotation for evaluation. Traditionally, iterative methods relying on human annotators and customer feedback have been used. However, recent advancements in Large Language Models (LLMs) offer new solutions. This paper focuses on applying LLMs to playlist search. Leveraging LLMs’ contextual understanding and generative capabilities automates metadata enrichment, reducing manual efforts and expediting training. LLMs also address data scarcity by generating synthetic training data and serve as scalable judges for evaluation, enhancing search performance assessment. We demonstrate how these innovations enhance playlist search, overcoming traditional limitations to improve search result accuracy and relevance.

Full text in ACM Digital Library
INDPowerful A/B-Testing Metrics and Where to Find Them
by Olivier Jeunen (ShareChat), Shubham Baweja (ShareChat), Neeti Pokharna (ShareChat) and Aleksei Ustimenko (ShareChat)

Online controlled experiments, colloquially known as A/B tests, are the bread and butter of real-world recommender system evaluation. Typically, end users are randomly assigned to a system variant, and a plethora of metrics are then tracked, collected, and aggregated throughout the experiment. A North Star metric (e.g., long-term growth or revenue) is used to assess which system variant should be deemed superior. As a result, most collected metrics are supporting in nature and serve to either (i) provide an understanding of how the experiment impacts user experience, or (ii) allow for confident decision-making when the North Star metric moves insignificantly (i.e., a false negative or type-II error). The latter is not straightforward: suppose a treatment variant leads to fewer but longer sessions, with more views but fewer engagements; should this be considered a positive or negative outcome?

The question then becomes: how do we assess a supporting metric’s utility when it comes to decision-making using A/B-testing? Online platforms typically run dozens of experiments at any given time. This provides a wealth of information about interventions and treatment effects that can be used to evaluate metrics’ utility for online evaluation. We propose to collect this information and leverage it to quantify type-I, type-II, and type-III errors for the metrics of interest, alongside a distribution of measurements of their statistical power (e.g. z-scores and p-values). We present results and insights from building this pipeline at scale for two large-scale short-video platforms: ShareChat and Moj; leveraging hundreds of past experiments to find online metrics with high statistical power.
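The bookkeeping behind these error rates can be sketched per metric reading. Compute a two-sample z-score, convert it to a two-sided p-value, and compare the result with the effect direction of a past experiment that is treated as known. A type III error here means a significant result in the wrong direction. The z-test with unpooled variances, the 0.05 threshold, and the `true_sign` labels are assumptions for illustration, not ShareChat's pipeline.

```python
import math


def z_score(mean_t: float, mean_c: float, var_t: float, var_c: float,
            n_t: int, n_c: int) -> float:
    """Two-sample z statistic for a difference in means (unpooled variance)."""
    se = math.sqrt(var_t / n_t + var_c / n_c)
    return (mean_t - mean_c) / se


def two_sided_p(z: float) -> float:
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def classify_outcome(z: float, true_sign: int, alpha: float = 0.05) -> str:
    """Label one metric reading given an assumed-known true effect
    direction: +1, -1, or 0 (no effect)."""
    significant = two_sided_p(z) < alpha
    if true_sign == 0:
        return "type I" if significant else "correct"
    if not significant:
        return "type II"
    if (z > 0) != (true_sign > 0):
        return "type III"  # significant, but in the wrong direction
    return "correct"
```

Tallying these labels over hundreds of past experiments gives each candidate metric an empirical error profile. The collected z-scores give a distribution of its statistical power.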

Full text in ACM Digital Library
INDPrivacy Preserving Conversion Modeling in Data Clean Room
by Kungang Li (Pinterest), Xiangyi Chen (Pinterest), Ling Leng (Pinterest), Jiajing Xu (Pinterest), Jiankai Sun (Pinterest) and Behnam Rezaei (Pinterest)

In the realm of online advertising, accurately predicting the conversion rate (CVR) is crucial for enhancing advertising efficiency and user satisfaction. This paper addresses the challenge of CVR prediction while adhering to user privacy preferences and advertiser requirements. Traditional methods face obstacles such as the reluctance of advertisers to share sensitive conversion data and the limitations of model training in secure environments like data clean rooms. We propose a novel model training framework that enables collaborative model training without sharing sample-level gradients with the advertising platform. Our approach introduces several innovative components: (1) utilizing batch-level aggregated gradients instead of sample-level gradients to minimize privacy risks; (2) applying adapter-based parameter-efficient fine-tuning and gradient compression to reduce communication costs; and (3) employing de-biasing techniques to train the model under label differential privacy, thereby maintaining accuracy despite privacy-enhanced label perturbations. Our experimental results, conducted on industrial datasets, demonstrate that our method achieves competitive ROC-AUC performance while significantly decreasing communication overhead and complying with both advertisers’ privacy requirements and user privacy choices. This framework establishes a new standard for privacy-preserving, high-performance CVR prediction in the digital advertising landscape.
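The abstract does not name its label-DP mechanism. Randomized response is the textbook ε-label-DP mechanism for binary labels, and it shows why de-biasing is needed. Each label is kept with probability e^ε / (1 + e^ε) and flipped otherwise. This shifts the observed positive rate toward 0.5 in a way that can be inverted. The sketch corrects the bias only at the aggregate level. The paper de-biases during model training. The function names are hypothetical.

```python
import math
import random
from typing import List


def keep_probability(epsilon: float) -> float:
    # Randomized response keeps the true label with prob e^eps / (1 + e^eps).
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(label: int, epsilon: float, rng: random.Random) -> int:
    """Release a binary label under epsilon-label differential privacy."""
    return label if rng.random() < keep_probability(epsilon) else 1 - label


def debiased_rate(noisy_labels: List[int], epsilon: float) -> float:
    """Unbiased estimate of the true positive rate from randomized labels.

    E[observed] = (1 - keep) + rate * (2 * keep - 1), solved for rate.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    keep = keep_probability(epsilon)
    observed = sum(noisy_labels) / len(noisy_labels)
    return (observed - (1.0 - keep)) / (2.0 * keep - 1.0)
```

Smaller ε means more flips and a smaller denominator `2 * keep - 1`, so the corrected estimate becomes noisier. This is the accuracy cost that de-biasing tries to contain.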

Full text in ACM Digital Library
INDRanking Across Different Content Types: The Robust Beauty of Multinomial Blending
by Jan Malte Lichtenberg (Amazon), Giuseppe Di Benedetto (Amazon) and Matteo Ruffini (Albatross AI)

An increasing number of media streaming services have expanded their offerings to include entities of multiple content types. For instance, audio streaming services that started by offering only music now also offer podcasts, merchandise items, and videos. Ranking items across different content types into a single slate poses a significant challenge for traditional learning-to-rank (LTR) algorithms due to differing user engagement patterns for different content types. We explore a simple method for cross-content-type ranking, called multinomial blending (MB), which can be used in conjunction with most existing LTR algorithms. We compare MB to existing baselines not only in terms of ranking quality but also from other industry-relevant perspectives such as interpretability, ease of use, and stability in dynamic environments with changing user behavior and ranking model retraining. Finally, we report the results of an A/B test from an Amazon Music ranking use case.
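As we understand the method, MB keeps each content type's own LTR ranking. It then fills each slate position by drawing a content type from a multinomial distribution and taking that type's best remaining item. The sketch assumes the type probabilities are given. How they are chosen or tuned is not shown, and `multinomial_blend` is a hypothetical name.

```python
import random
from typing import Dict, List


def multinomial_blend(ranked_by_type: Dict[str, List[str]],
                      type_probs: Dict[str, float],
                      slate_size: int, rng: random.Random) -> List[str]:
    """Fill a slate by sampling a content type per position (assumed MB form)."""
    queues = {t: list(items) for t, items in ranked_by_type.items()}
    slate: List[str] = []
    while len(slate) < slate_size:
        # Only types with remaining items and positive probability are eligible;
        # random.choices renormalizes the weights over that subset.
        available = [t for t in queues if queues[t] and type_probs.get(t, 0.0) > 0]
        if not available:
            break
        chosen = rng.choices(available, weights=[type_probs[t] for t in available], k=1)[0]
        slate.append(queues[chosen].pop(0))  # best remaining item of that type
    return slate
```

One design consequence is that within-type order always follows the underlying LTR model, so only the type mix is randomized. That is part of what makes the approach easy to interpret.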

Full text in ACM Digital Library
INDScale-Invariant Learning-to-Rank
by Alessio Petrozziello (Expedia Group), Christian Sommeregger (Expedia Group) and Ye-Sheen Lim (Expedia Group)

At Expedia, learning-to-rank (LTR) models play a key role on our website in sorting and presenting the information most relevant to users, such as search filters, property rooms, amenities, and images. A major challenge in deploying these models is ensuring consistent feature scaling between training and production data, as discrepancies can lead to unreliable rankings when deployed. Normalization techniques like feature standardization and batch normalization could address these issues but are impractical in production due to latency impacts and the difficulty of distributed real-time inference. To address this feature-scaling inconsistency, we introduce a scale-invariant LTR framework that combines a deep and a wide neural network to mathematically guarantee scale invariance in the model at both training and prediction time. We evaluate our framework in simulated real-world scenarios with injected feature scale issues, created by perturbing the test set at prediction time, and show that even with inconsistent train-test scaling, a model using the framework outperforms one without it.

Full text in ACM Digital Library
INDSelf-Auxiliary Distillation for Sample Efficient Learning in Google-Scale Recommenders
by Yin Zhang (Google DeepMind), Ruoxi Wang (Google DeepMind), Xiang Li (Google, Inc), Tiansheng Yao (Google, Inc), Andrew Evdokimov (Google, Inc), Jonathan Valverde (Google DeepMind), Yuan Gao (Google, Inc), Jerry Zhang (Google, Inc), Evan Ettinger (Google, Inc), Ed H. Chi (Google DeepMind) and Derek Zhiyuan Cheng (Google DeepMind)

Industrial recommendation systems process billions of user feedback signals daily, and these signals are complex and noisy. Efficiently uncovering user preferences from these signals is crucial for high-quality recommendation. We argue that these signals are not inherently equal in informative value and usefulness for training, which is particularly salient in industrial applications with multi-stage processes (e.g., augmentation, retrieval, ranking). With this in mind, we propose a novel self-auxiliary distillation framework that prioritizes training on high-quality labels and improves the resolution of low-quality labels through distillation by adding a bilateral branch-based auxiliary task. This approach enables flexible learning from diverse labels without additional computational costs, making it highly scalable and effective for Google-scale recommenders. Our framework consistently improved both offline and online key business metrics across three major Google products. Notably, self-auxiliary distillation proves highly effective in addressing the severe signal loss caused by changes such as Apple’s iOS policy. It further delivered significant improvements in both offline (+17% AUC) and online metrics for a Google Apps recommendation system. This highlights the opportunities of addressing real-world signal loss problems through self-auxiliary distillation techniques.

Full text in ACM Digital Library
INDShort-form Video Needs Long-term Interests: An Industrial Solution for Serving Large User Sequence Models
by Yuening Li (Google), Diego Uribe (Google), Chuan He (Google), Jiaxi Tang (Google DeepMind), Qingyun Liu (Google DeepMind), Junjie Shan (Google), Ben Most (Google), Kaushik Kalyan (Google), Shuchao Bi (Google), Xinyang Yi (Google DeepMind), Lichan Hong (Google DeepMind), Ed Chi (Google DeepMind) and Liang Liu (Google)

Sequential models are invaluable for powering personalized recommendation systems. In the context of short-form video (SFV) feeds, where user behavior histories are typically longer, systems must be able to understand users’ long-term interests. However, deploying large sequence models in extensive web-scale applications is challenging due to high serving costs. To address this, we propose an industrial framework designed for efficiently serving large user sequence models. Specifically, the proposed infrastructure decouples serving of the user sequence model from the main recommendation model, with the user sequence model served offline (asynchronously) and refreshed periodically. The proposed infrastructure is also model-agnostic; thus, it can support any type of user sequence model (even LLMs) with controllable costs. Empirical results show that large user models deployed with our framework significantly and consistently enhance the quality of the main recommendation model with only a minimal increase in serving cost.
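The decoupling can be sketched as a store written by an offline job and read by the online ranker. The expensive sequence model runs only inside `refresh`, and request-time serving is a dictionary lookup with a fallback. Everything here is a hypothetical illustration: the class, the refresh policy, and the in-memory dict standing in for a real feature store. It is not Google's infrastructure.

```python
import time
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class UserEmbeddingCache:
    """Hypothetical store filled asynchronously and read at serving time."""

    def __init__(self, compute_fn: Callable[[str], Vector],
                 refresh_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute_fn  # the large user sequence model (offline)
        self._interval = refresh_interval_s
        self._clock = clock
        self._store: Dict[str, Tuple[Vector, float]] = {}

    def refresh(self, user_ids: List[str]) -> None:
        # Offline/asynchronous path: expensive inference happens here only.
        now = self._clock()
        for uid in user_ids:
            self._store[uid] = (self._compute(uid), now)

    def stale_users(self) -> List[str]:
        # Users whose embeddings are due for the next periodic refresh.
        now = self._clock()
        return [uid for uid, (_, ts) in self._store.items()
                if now - ts >= self._interval]

    def lookup(self, uid: str, default: Vector) -> Vector:
        # Online path: cheap lookup, no sequence-model inference per request.
        entry = self._store.get(uid)
        return entry[0] if entry is not None else default
```

The trade-off is freshness. Embeddings can be up to one refresh interval old, in exchange for keeping the large model off the latency-critical path.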

Full text in ACM Digital Library
INDSliding Window Training – Utilizing Historical Recommender Systems Data for Foundation Models
by Swanand Joshi (Netflix), Yesu Feng (Netflix), Ko-Jen Hsiao (Netflix), Zhe Zhang (Netflix) and Sudarshan Lamkhede (Netflix)

Long-lived recommender systems (RecSys) often encounter lengthy user-item interaction histories that span many years. To learn long-term user preferences effectively, large RecSys foundation models (FMs) need to encode this information during pretraining. Usually, this is done either by using a sequence length long enough to take the entire history as input, at the cost of a large model input dimension, or by dropping parts of the user history to meet model-size and latency requirements on the production serving side. In this paper, we introduce a sliding window training technique that incorporates long user history sequences at training time without increasing the model input dimension. We show the quantitative and qualitative improvements this technique brings to the RecSys FM in learning users’ long-term preferences. We additionally show that the average quality of the catalog items learned in pretraining also improves.
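As an illustration of the general idea, here is a minimal sketch, assuming next-item prediction and a fixed stride; the paper's actual window selection may differ. The function `sliding_windows` and its `window` and `stride` parameters are hypothetical. Every example has the same input length, yet together the windows cover the whole history instead of only its most recent tail.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(history: Sequence[int], window: int,
                    stride: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (input_window, next_item) training pairs from one user history.

    Inputs never exceed `window` items, so the model input dimension stays
    fixed no matter how long the history is.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # The last start position must leave room for a next-item target.
    last_start = len(history) - window - 1
    if last_start < 0:
        # History shorter than the window: use what exists (a real model
        # would pad this input up to `window`).
        if len(history) >= 2:
            yield list(history[:-1]), history[-1]
        return
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)  # always include the most recent window
    for s in starts:
        yield list(history[s:s + window]), history[s + window]
```

For a history of items 0 through 9 with `window=4` and `stride=3`, this yields inputs starting at positions 0, 3 and 5, whereas simple truncation would keep only the last window.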

Full text in ACM Digital Library
IND Taming the One-Epoch Phenomenon in Online Recommendation System by Two-stage Contrastive ID Pre-training
by Yi-Ping Hsu (Pinterest), Po-Wei Wang (Pinterest), Chantat Eksombatchai (Pinterest) and Jiajing Xu (Pinterest)

ID-based embeddings are widely used in web-scale online recommendation systems. However, their susceptibility to overfitting, particularly due to the long-tail nature of data distributions, often limits training to a single epoch, a phenomenon known as the “one-epoch problem.” This challenge has driven research efforts to optimize performance within the first epoch by enhancing convergence speed or feature sparsity. In this study, we introduce a novel two-stage training strategy that incorporates a pre-training phase using a minimal model with contrastive loss, enabling broader data coverage for the embedding system. Our offline experiments demonstrate that multi-epoch training during the pre-training phase does not lead to overfitting, and the resulting embeddings improve online generalization when fine-tuned for more complex downstream recommendation tasks. We deployed the proposed system in live traffic at Pinterest, achieving significant site-wide engagement gains.
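The abstract does not specify which contrastive loss is used. As an assumption, the sketch below uses an in-batch InfoNCE loss, a common choice for pretraining ID embeddings with a minimal model. The function `in_batch_infonce` and its `temperature` parameter are hypothetical, and only the loss of the pre-training stage is shown, not the fine-tuning stage.

```python
import math
from typing import List


def in_batch_infonce(query_emb: List[List[float]], item_emb: List[List[float]],
                     temperature: float = 0.1) -> float:
    """Mean InfoNCE loss with in-batch negatives.

    Row i of item_emb is the positive for row i of query_emb; every other
    item row in the batch acts as a negative for that query.
    """
    n = len(query_emb)
    if n == 0 or n != len(item_emb):
        raise ValueError("need two non-empty batches of equal size")
    total = 0.0
    for i in range(n):
        # Similarity logits of query i against all items in the batch.
        logits = [sum(q * v for q, v in zip(query_emb[i], item_emb[j])) / temperature
                  for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n
```

The loss is lowest when each query is most similar to its own item and rises to log(n) when all embeddings collapse to the same vector, which is what pushes ID embeddings apart during multi-epoch pre-training.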

Full text in ACM Digital Library
IND Toward 100TB Recommendation Models with Embedding Offloading
by Intaik Park (Meta), Ehsan Ardestani (Meta), Damian Reeves (Meta), Sarunya Pumma (Meta), Henry Tsang (Meta), Levy Zhao (Meta), Jian He (Meta), Joshua Deng (Meta), Dennis Van der Staay (Meta), Yu Guo (Meta) and Paul Zhang (Meta)

Training recommendation models becomes memory-bound with large embedding tables, and fast GPU memory is scarce. In this paper, we explore embedding caches and prefetch pipelines to effectively leverage large but slow host memory for embedding tables. We introduce Locality-Aware Sharding and iterative planning, which automatically size caches optimally and produce effective sharding plans. Embedding Offloading, a system that combines all of these components and techniques, is implemented on top of Meta’s open-source libraries, FBGEMM GPU and TorchRec, and is used to improve the scalability and efficiency of industry-scale production models. Embedding Offloading scaled models 37x, up to a 100TB model size, with only a 26% regression in training speed.
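The cache-plus-prefetch idea can be illustrated with a toy model in plain Python. This is a minimal sketch under stated assumptions, not the FBGEMM GPU or TorchRec API: a dictionary stands in for host memory, an LRU-ordered `OrderedDict` stands in for GPU memory, and the class `OffloadedEmbeddingTable` and its `cache_rows` parameter are hypothetical. Gradient updates, write-back of evicted rows, and Locality-Aware Sharding are deliberately left out.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List


class OffloadedEmbeddingTable:
    """Toy model of embedding offloading with an LRU cache.

    The full table lives in `host` (standing in for large, slow host memory);
    at most `cache_rows` rows are kept in `cache` (standing in for scarce,
    fast GPU memory). Read-only: updates and write-back are not modeled.
    """

    def __init__(self, host_table: Dict[int, List[float]], cache_rows: int) -> None:
        if cache_rows <= 0:
            raise ValueError("cache_rows must be positive")
        self.host = host_table
        self.cache_rows = cache_rows
        self.cache: "OrderedDict[int, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _load(self, row_id: int) -> None:
        """Bring one row into the cache, evicting the least recently used row if full."""
        if row_id in self.cache:
            self.cache.move_to_end(row_id)
            return
        if len(self.cache) >= self.cache_rows:
            self.cache.popitem(last=False)
        self.cache[row_id] = self.host[row_id]

    def prefetch(self, next_batch_ids: Iterable[int]) -> None:
        """Pipeline stage: warm the cache for the next batch before it is needed."""
        for row_id in next_batch_ids:
            self._load(row_id)

    def lookup(self, batch_ids: Iterable[int]) -> List[List[float]]:
        """Return embedding rows for a batch, counting cache hits and misses."""
        rows = []
        for row_id in batch_ids:
            if row_id in self.cache:
                self.hits += 1
            else:
                self.misses += 1
            self._load(row_id)  # refreshes recency on a hit, copies from host on a miss
            rows.append(self.cache[row_id])
        return rows
```

Rows requested through `prefetch` count as hits when the batch is later looked up; in a real system the prefetch would overlap with computation on the current batch, hiding the slow host-to-GPU copy.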

Full text in ACM Digital Library
IND Towards Understanding The Gaps of Offline And Online Evaluation Metrics: Impact of Series vs. Movie Recommendations
by Bora Edizel (Warner Bros. Discovery), Tim Sweetser (StubHub), Ashok Chandrashekar (Warner Bros. Discovery), Kamilia Ahmadi (Warner Bros. Discovery) and Puja Das (Warner Bros. Discovery)

Full text in ACM Digital Library
IND Why the Shooting in the Dark Method Dominates Recommender Systems Practice
by David Rohde (Criteo)

The introduction of A/B testing represented a great leap forward in recommender systems research. Like the randomized controlled trial for evaluating drug efficacy, A/B testing has equipped recommender systems practitioners with a protocol for measuring performance, as defined by actual business metrics, with minimal assumptions. While A/B testing provides a way to measure the performance of two or more candidate systems, it offers no guidance on which policy we should test. This industry talk aims to better understand why the development of A/B testing was the last great leap forward in the development of reward-optimizing recommender systems, despite more than a decade of effort in both industry and academia. The talk will survey industry best practice and standard theories and tools, including collaborative filtering (MovieLens RecSys), contextual bandits, attribution, off-policy estimation, causal inference, and click-through rate models, and will explain why we have converged on a fundamentally heuristic, guess-and-check method. The talk will offer opinions on which of these theories are useful and which are not, and will make a concrete proposal for progress based on a non-standard use of deep learning tools.

Full text in ACM Digital Library

RecSys 2024 (Bari)
