Accepted Contributions

 

List of all long papers accepted for RecSys 2024 (in alphabetical order).
Check the Presenter Instructions for information about every type of oral presentation.
If you need to print your poster in Bari, follow these instructions.
 

  • A Multi-modal Modeling Framework for Cold-start Short-video Recommendation
    by Gaode Chen (Kuaishou Technology), Ruina Sun (Kuaishou Technology), Yuezihan Jiang (Kuaishou Technology), Jiangxia Cao (Kuaishou Technology), Qi Zhang (Kuaishou Technology), Jingjian Lin (Kuaishou Technology), Han Li (Kuaishou Technology), Kun Gai (Kuaishou Technology) and Xinghua Zhang (Chinese Academy of Sciences)

    Short video has witnessed rapid growth in the past few years in multimedia platforms. To ensure the freshness of the videos, platforms receive a large number of user-uploaded videos every day, making collaborative filtering-based recommender methods suffer from the item cold-start problem (e.g., the new-coming videos are difficult to compete with existing videos). Consequently, increasing efforts tackle the cold-start issue from the content perspective, focusing on modeling the multi-modal preferences of users, a fair way to compete with new-coming and existing videos. However, recent studies ignore the existing gap between multi-modal embedding extraction and user interest modeling as well as the discrepant intensities of user preferences for different modalities. In this paper, we propose M3CSR, a multi-modal modeling framework for cold-start short video recommendation. Specifically, we preprocess content-oriented multi-modal features for items and obtain trainable category IDs by performing clustering. In each modality, we combine modality-specific cluster ID embedding and the mapped original modality feature as modality-specific representation of the item to address the gap. Meanwhile, M3CSR measures the user modality-specific intensity based on the correlation between modality-specific interest and behavioral interest and employs pairwise loss to further decouple user multi-modal interests. Extensive experiments on four real-world datasets demonstrate the superiority of our proposed model. The framework has been deployed on a billion-user scale short video application and has shown improvements in various commercial metrics within cold-start scenarios.

    Full text in ACM Digital Library

  • A multimodal single-branch embedding network for recommendation in cold-start and missing modality scenarios
    by Christian Ganhör (Johannes Kepler University Linz), Marta Moscati (Johannes Kepler University Linz), Anna Hausberger (Johannes Kepler University Linz), Shah Nawaz (Johannes Kepler University Linz) and Markus Schedl (Johannes Kepler University Linz; Linz Institute of Technology)

    Most recommender systems adopt collaborative filtering (CF) and provide recommendations based on past collective interactions. Therefore, the performance of CF algorithms degrades when few or no interactions are available, a scenario referred to as cold-start. To address this issue, previous work relies on models leveraging both collaborative data and side information on the users or items. Similar to multimodal learning, these models aim at combining collaborative and content representations in a shared embedding space. In this work we propose a novel technique for multimodal recommendation, relying on a multimodal Single-Branch embedding network for Recommendation (SiBraR). Leveraging weight-sharing, SiBraR encodes interaction data as well as multimodal side information using the same single-branch embedding network on different modalities. This makes SiBraR effective in scenarios of missing modality, including cold start. Our extensive experiments on large-scale recommendation datasets from three different recommendation domains (music, movie, and e-commerce) and providing multimodal content information (audio, text, image, labels, and interactions) show that SiBraR significantly outperforms CF as well as state-of-the-art content-based RSs in cold-start scenarios, and is competitive in warm scenarios. We show that SiBraR’s recommendations are accurate in missing modality scenarios, and that the model is able to map different modalities to the same region of the shared embedding space, hence reducing the modality gap.

    Full text in ACM Digital Library

  • A Pre-trained Zero-shot Sequential Recommendation Framework via Popularity Dynamics
    by Junting Wang (Urbana-Champaign), Praneet Rathi (Urbana-Champaign) and Hari Sundaram (Urbana-Champaign)

    This paper proposes a novel pre-trained framework for zero-shot cross-domain sequential recommendation without auxiliary information. While using auxiliary information (e.g., item descriptions) seems promising for cross-domain transfer, a cross-domain adaptation of sequential recommenders can be challenging when the target domain differs from the source domain—item descriptions are in different languages; metadata modalities (e.g., audio, image, and text) differ across source and target domains. If we can learn universal item representations independent of the domain type (e.g., groceries, movies), we can achieve zero-shot cross-domain transfer without auxiliary information. Our critical insight is that user interaction sequences highlight shifting user preferences via the popularity dynamics of interacted items. We present a pre-trained sequential recommendation framework: PrepRec, which utilizes a novel popularity dynamics-aware transformer architecture. Through extensive experiments on five real-world datasets, we show that PrepRec, without any auxiliary information, can zero-shot adapt to new application domains and achieve competitive performance compared to state-of-the-art sequential recommender models. In addition, we show that PrepRec complements existing sequential recommenders. With a simple post-hoc interpolation, PrepRec improves the performance of existing sequential recommenders on average by 11.8% in Recall@10 and 22% in NDCG@10. We provide an anonymized implementation of PrepRec at https://github.com/CrowdDynamicsLab/preprec.

    Full text in ACM Digital Library
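
The post-hoc interpolation mentioned in the PrepRec abstract above can be pictured as a weighted blend of two models' item scores. The abstract does not give the formula, so this is only a minimal sketch under assumptions: min-max normalisation followed by a convex combination. The function names, the weight `alpha`, and the toy scores are all hypothetical and are not taken from the paper.

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def interpolate(scores_a, scores_b, alpha=0.5):
    """Convex combination of two models' normalised scores over the same items.

    Assumption: both lists score the same candidates in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same candidate items")
    a, b = min_max(scores_a), min_max(scores_b)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]


def top_k(scores, k):
    """Indices of the k highest-scoring items, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy scores for four candidate items (illustrative values only).
sequential_scores = [2.0, 0.5, 1.0, 0.0]   # e.g. an existing sequential recommender
popularity_scores = [0.1, 0.9, 0.8, 0.2]   # e.g. a popularity-dynamics model
blended = interpolate(sequential_scores, popularity_scores, alpha=0.5)
ranking = top_k(blended, k=2)
```

In a setup like this, `alpha` would normally be tuned on validation data; the value 0.5 is a placeholder.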

  • A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation
    by Zixuan Yi (University of Glasgow) and Iadh Ounis (University of Glasgow)

    With the rapid development of online multimedia services, especially in e-commerce platforms, there is a pressing need for personalised recommender systems that can effectively encode the diverse multi-modal content associated with each item. However, we argue that existing multi-modal recommender systems typically use isolated processes for both feature extraction and modality encoding. Such isolated processes can harm the recommendation performance. Firstly, an isolated extraction process underestimates the importance of effective feature extraction in multi-modal recommendations, potentially incorporating non-relevant information, which is harmful to item representations. Secondly, an isolated modality encoding process produces disjoint embeddings for item modalities due to the individual processing of each modality, which leads to a suboptimal fusion of user/item representations for effective user preference prediction. We hypothesise that the use of a unified model for addressing both aforementioned isolated processes will enable the consistent extraction and cohesive fusion of joint multi-modal features, thereby enhancing the effectiveness of multi-modal recommender systems. In this paper, we propose a novel model, called Unified multi-modal Graph Transformer (UGT), which firstly leverages a multi-way transformer to extract aligned multi-modal features from raw data for top-k recommendation. Subsequently, we build a unified graph neural network in our UGT model to jointly fuse the multi-modal user/item representations derived from the output of the multi-way transformer. Using the graph transformer architecture of our UGT model, we show that the UGT model achieves significant effectiveness gains, especially when jointly optimised with the commonly used recommendation losses. Our extensive experiments conducted on three benchmark datasets demonstrate that our proposed UGT model consistently outperforms nine existing state-of-the-art recommendation approaches, by up to 13.97% over the best baseline.

    Full text in ACM Digital Library

  • Accelerating the Surrogate Retraining for Poisoning Attacks against Recommender Systems
    by Yunfan Wu (Chinese Academy of Sciences), Qi Cao (Chinese Academy of Sciences), Shuchang Tao (Chinese Academy of Sciences), Kaike Zhang (Chinese Academy of Sciences), Fei Sun (Chinese Academy of Sciences) and Huawei Shen (Chinese Academy of Sciences)

    Recent studies have demonstrated the vulnerability of recommender systems to data poisoning attacks, where adversaries inject carefully crafted fake user interactions into the training data of recommenders to promote target items. Current attack methods involve iteratively retraining a surrogate recommender on the poisoned data with the latest fake users to optimize the attack. However, this repetitive retraining is highly time-consuming, hindering the efficient assessment and optimization of fake users. To mitigate this computational bottleneck and develop a more effective attack in an affordable time, we analyze the retraining process and find that a change in the representation of one user/item will cause a cascading effect through the user-item interaction graph. Under theoretical guidance, we introduce Gradient Passing (GP), a novel technique that explicitly passes gradients between interacted user-item pairs during backpropagation, thereby approximating the cascading effect and accelerating retraining. With just a single update, GP can achieve effects comparable to multiple original training iterations. Under the same number of retraining epochs, GP enables a closer approximation of the surrogate recommender to the victim. This more accurate approximation provides better guidance for optimizing fake users, ultimately leading to enhanced data poisoning attacks. Extensive experiments on real-world datasets demonstrate the efficiency and effectiveness of our proposed GP.

    Full text in ACM Digital Library

  • Adaptive Fusion of Multi-View for Graph Contrastive Recommendation
    by Mengduo Yang (Zhejiang University), Yi Yuan (Zhejiang University), Jie Zhou (Zhejiang University), Meng Xi (Zhejiang University), Xiaohua Pan (Zhejiang University), Ying Li (Zhejiang University), Yangyang Wu (Zhejiang University), Jinshan Zhang (Zhejiang University) and Jianwei Yin (Zhejiang University)

    Recommendation is a key mechanism for modern users to access items of their interests from massive entities and information. Recently, graph contrastive learning (GCL) has demonstrated satisfactory results on recommendation, due to its ability to enhance representation by integrating graph neural networks (GNNs) with contrastive learning. However, those methods often generate contrastive views by performing random perturbation on edges or embeddings, which is likely to bring noise in representation learning. Besides, in all these methods, the degree of user preference on items is omitted during the representation learning process, which may cause incomplete user/item modeling. To address these limitations, we propose the Adaptive Fusion of Multi-View Graph Contrastive Recommendation (AMGCR) model. Specifically, to generate the informative and less noisy views for better contrastive learning, we design four view generators to learn the edge weights focusing on weight adjustment, feature transformation, neighbor aggregation, and attention mechanism, respectively. Then, we employ an adaptive multi-view fusion module to combine different views from both the view-shared and the view-specific levels. Moreover, to make the model capable of capturing preference information during the learning process, we further adopt a preference refinement strategy on the fused contrastive view. Experimental results on three real-world datasets demonstrate that AMGCR consistently outperforms the state-of-the-art methods, with average improvements of over 10% in terms of Recall and NDCG. Our code is available on https://github.com/Du-danger/AMGCR.

    Full text in ACM Digital Library

  • AIE: Auction Information Enhanced Framework for CTR Prediction in Online Advertising
    by Yang Yang (Huawei Noah’s Ark Lab), Bo Chen (Huawei Noah’s Ark Lab), Chenxu Zhu (Huawei Noah’s Ark Lab), Menghui Zhu (Huawei Noah’s Ark Lab), Xinyi Dai (Huawei Noah’s Ark Lab), Huifeng Guo (Huawei Noah’s Ark Lab), Muyu Zhang (Huawei Noah’s Ark Lab), Zhenhua Dong (Huawei Noah’s Ark Lab) and Ruiming Tang (Huawei Noah’s Ark Lab)

    Click-Through Rate (CTR) prediction is a fundamental technique for online advertising recommendation and the complex online competitive auction process also brings many difficulties to CTR optimization. Recent studies have shown that introducing posterior auction information contributes to the performance of CTR prediction. However, existing work doesn’t fully capitalize on the benefits of auction information and overlooks the data bias brought by the auction, leading to biased and suboptimal results. To address these limitations, we propose Auction Information Enhanced Framework (AIE) for CTR prediction in online advertising, which delves into the problem of insufficient utilization of auction signals and first reveals the auction bias. Specifically, AIE introduces two pluggable modules, namely Adaptive Market-price Auxiliary Module (AM2) and Bid Calibration Module (BCM), which work collaboratively to excavate the posterior auction signals better and enhance the performance of CTR prediction. Furthermore, the two proposed modules are lightweight, model-agnostic, and friendly to inference latency. Extensive experiments are conducted on a public dataset and an industrial dataset to demonstrate the effectiveness and compatibility of AIE. Besides, a one-month online A/B test in a large-scale advertising platform shows that AIE improves the base model by 5.76% and 2.44% in terms of eCPM and CTR, respectively.

    Full text in ACM Digital Library

  • Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation
    by David Austin (University of Waterloo), Anton Korikov (University of Toronto), Armin Toroghi (University of Toronto) and Scott Sanner (University of Toronto)

    Designing preference elicitation (PE) methodologies that can quickly ascertain a user’s top item preferences in a cold-start setting is a key challenge for building effective and personalized conversational recommendation (ConvRec) systems. While large language models (LLMs) enable fully natural language (NL) PE dialogues, we hypothesize that monolithic LLM NL-PE approaches lack the multi-turn, decision-theoretic reasoning required to effectively balance the exploration and exploitation of user preferences towards an arbitrary item set. In contrast, traditional Bayesian optimization PE methods define theoretically optimal PE strategies, but cannot generate arbitrary NL queries or reason over content in NL item descriptions – requiring users to express preferences via ratings or comparisons of unfamiliar items. To overcome the limitations of both approaches, we formulate NL-PE in a Bayesian Optimization (BO) framework that seeks to actively elicit NL feedback to identify the best recommendation. Key challenges in generalizing BO to deal with natural language feedback include determining: (a) how to leverage LLMs to model the likelihood of NL preference feedback as a function of item utilities, and (b) how to design an acquisition function for NL BO that can elicit preferences in the infinite space of language. We demonstrate our framework in a novel NL-PE algorithm, PEBOL, which uses: 1) Natural Language Inference (NLI) between user preference utterances and NL item descriptions to maintain Bayesian preference beliefs, and 2) BO strategies such as Thompson Sampling (TS) and Upper Confidence Bound (UCB) to guide LLM query generation. We numerically evaluate our methods in controlled simulations, finding that after 10 turns of dialogue, PEBOL can achieve an MRR@10 of up to 0.27 compared to the best monolithic LLM baseline’s MRR@10 of 0.17, despite relying on earlier and smaller LLMs.

    Full text in ACM Digital Library
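
The PEBOL abstract above combines Bayesian preference beliefs with Thompson Sampling (TS) and Upper Confidence Bound (UCB) selection. The sketch below shows only those two generic acquisition rules over per-item Beta beliefs. It is an illustration under assumptions, not the paper's algorithm: the soft Beta update, the class and function names, and the example probabilities (standing in for NLI entailment scores) are hypothetical.

```python
import math
import random


class BetaBelief:
    """Beta(alpha, beta) belief over one item's utility in [0, 1]."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha = alpha
        self.beta = beta

    def update(self, p_like):
        """Soft update with a signal in [0, 1] (assumed to come from an NLI model)."""
        self.alpha += p_like
        self.beta += 1.0 - p_like

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def std(self):
        n = self.alpha + self.beta
        return math.sqrt(self.alpha * self.beta / (n * n * (n + 1.0)))


def thompson_pick(beliefs, rng):
    """Sample one utility per item and return the index of the largest sample."""
    samples = [rng.betavariate(b.alpha, b.beta) for b in beliefs]
    return max(range(len(beliefs)), key=lambda i: samples[i])


def ucb_pick(beliefs, c=1.0):
    """Return the index with the highest mean + c * std."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i].mean() + c * beliefs[i].std())


rng = random.Random(0)
beliefs = [BetaBelief() for _ in range(3)]
# Hypothetical entailment probabilities after one user utterance.
for belief, p in zip(beliefs, [0.9, 0.2, 0.5]):
    belief.update(p)
ts_choice = thompson_pick(beliefs, rng)   # stochastic: depends on the sampled utilities
ucb_choice = ucb_pick(beliefs, c=1.0)     # deterministic given the beliefs
```

In the framework described by the abstract, the selected item would then guide LLM query generation; that step is outside this sketch.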

  • Biased User History Synthesis for Personalized Long-Tail Item Recommendation
    by Keshav Balasubramanian (University of Southern California), Abdulla Alshabanah (University of Southern California), Elan Markowitz (University of Southern California), Greg Ver Steeg (University of California Riverside) and Murali Annavaram (University of Southern California)

    Recommendation systems connect users to items and create value chains in the internet economy. Recommendation systems learn from past user-item interaction histories. As such, items that have short interaction histories, either because they are new or not popular, have been shown to be disproportionately under-recommended. This long-tail item problem can exacerbate model bias, and reinforce poor recommendation of tail items. In this paper, we propose biased user history synthesis, to not only address this problem but also achieve better personalization in recommendation systems. As a result, we concurrently improve tail and head item recommendation performance. Our approach is built on a tail item biased User Interaction History (UIH) sampling strategy and a synthesis model that produces an augmented user representation from the sampled user history. We provide a theoretical justification for our approach using information theory and demonstrate through extensive experimentation, that our model outperforms state-of-the-art baselines on tail, head, and overall recommendation. The source code is available at https://github.com/lkp411/BiasedUserHistorySynthesis.

    Full text in ACM Digital Library

  • Bridging Search and Recommendation in Generative Retrieval: Does One Task Help the Other?
    by Gustavo Penha (Spotify), Ali Vardasbi (Spotify), Enrico Palumbo (Spotify), Marco De Nadai (Spotify) and Hugues Bouchard (Spotify)

    Generative retrieval for search and recommendation is a promising paradigm for retrieving items, offering an alternative to traditional methods that depend on external indexes and nearest-neighbor searches. Instead, generative models directly associate inputs with item IDs. Given the breakthroughs of Large Language Models (LLMs), these generative systems can play a crucial role in centralizing a variety of Information Retrieval (IR) tasks in a single model that performs tasks such as query understanding, retrieval, recommendation, explanation, re-ranking, and response generation. Despite the growing interest in such a unified generative approach for IR systems, the advantages of using a single, multi-task model over multiple specialized models are not well established in the literature. This paper investigates whether and when such a unified approach can outperform task-specific models in the IR tasks of search and recommendation, broadly co-existing in multiple industrial online platforms, such as Spotify, YouTube, and Netflix. Previous work shows that (1) the latent representations of items learned by generative recommenders are biased towards popularity, and (2) content-based and collaborative-filtering-based information can improve an item’s representations. Motivated by this, our study is guided by two hypotheses: [H1] the joint training regularizes the estimation of each item’s popularity, and [H2] the joint training regularizes the item’s latent representations, where search captures content-based aspects of an item and recommendation captures collaborative-filtering aspects. Our extensive experiments with both simulated and real-world data support both [H1] and [H2] as key contributors to the effectiveness improvements observed in the unified search and recommendation generative models over the single-task approaches.

    Full text in ACM Digital Library

  • CALRec: Contrastive Alignment of Generative LLMs For Sequential Recommendation
    by Yaoyiran Li (University of Cambridge), Xiang Zhai (Google), Moustafa Alzantot (Google Research), Keyi Yu (Google), Ivan Vulić (University of Cambridge), Anna Korhonen (University of Cambridge) and Mohamed Hammad (Google)

    Traditional recommender systems such as matrix factorization methods have primarily focused on learning a shared dense embedding space to represent both items and user preferences. Subsequently, sequence models such as RNN, GRUs, and, recently, Transformers have emerged and excelled in the task of sequential recommendation. This task requires understanding the sequential structure present in users’ historical interactions to predict the next item they may like. Building upon the success of Large Language Models (LLMs) in a variety of tasks, researchers have recently explored using LLMs that are pretrained on vast corpora of text for sequential recommendation. To use LLMs for sequential recommendation, both the history of user interactions and the model’s prediction of the next item are expressed in text form. We propose CALRec, a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion using a mixture of two contrastive losses and a language modeling loss: the LLM is first finetuned on a data mixture from multiple domains followed by another round of target domain finetuning. Our model significantly outperforms many state-of-the-art baselines (+37% in Recall@1 and +24% in NDCG@10) and our systematic ablation studies reveal that (i) both stages of finetuning are crucial, and, when combined, we achieve improved performance, and (ii) contrastive alignment is effective among the target domains explored in our experiments.

    Full text in ACM Digital Library

  • ConFit: Improving Resume-Job Matching using Data Augmentation and Contrastive Learning
    by Xiao Yu (Columbia University), Jinzhong Zhang (Intellipro Group Inc.) and Zhou Yu (Columbia University)

    A reliable resume-job matching system helps a company find suitable candidates from a pool of resumes, and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply only to a few jobs, interaction records in resume-job datasets are sparse. Unlike much prior work that uses complex modeling techniques, we tackle this sparsity problem using data augmentations and a simple contrastive learning approach. ConFit first formulates resume-job datasets as a sparse bipartite graph, and creates an augmented dataset by paraphrasing specific sections in a resume or a job post. Then, ConFit finetunes pre-trained encoders with contrastive learning to further increase training samples from B pairs per batch to O(B^2) per batch. We evaluate ConFit on two real-world datasets and find it outperforms prior methods (including BM25 and OpenAI text-ada-002) by up to 19% and 31% absolute in nDCG@10 for ranking jobs and ranking resumes, respectively. We believe ConFit’s simple yet highly performant approach lays a strong foundation for future research in modeling person-job fit.

    Full text in ACM Digital Library
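
The step from B pairs to O(B^2) comparisons per batch in the ConFit abstract above is characteristic of in-batch contrastive learning, where every other item in the batch acts as a negative. The following is a generic sketch of that idea in one direction only (resume to job). It is not the authors' implementation: the dot-product similarity, the `temperature` parameter, the function names, and the toy embeddings are assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(resumes, jobs, temperature=1.0):
    """Mean cross-entropy over the rows of the B x B similarity matrix.

    Row i treats jobs[i] as the positive for resumes[i] and every other
    job in the batch as a negative. Returns (loss, number of comparisons).
    """
    if len(resumes) != len(jobs):
        raise ValueError("expected one job embedding per resume embedding")
    total = 0.0
    comparisons = 0
    for i, r in enumerate(resumes):
        logits = [dot(r, j) / temperature for j in jobs]
        comparisons += len(logits)
        # Numerically stable log-sum-exp over the row.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(resumes), comparisons


# Toy batch of B = 3 matched pairs with 2-dimensional embeddings (illustrative values).
resumes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
jobs = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
loss, n_comparisons = in_batch_contrastive_loss(resumes, jobs)
```

With B = 3 matched pairs, the loss is computed over 3 × 3 = 9 resume-job comparisons, which is where the quadratic growth in training signal comes from.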

  • Cross-Domain Latent Factors Sharing via Implicit Matrix Factorization
    by Abdulaziz Samra (Skolkovo Institute of Science and Technology), Evgeny Frolov (AIRI; Skolkovo Institute of Science and Technology), Alexey Vasilev (Sber), Alexander Grigorevskiy (Independent researcher) and Anton Vakhrushev (Independent researcher)

    Data sparsity has been one of the long-standing problems for recommender systems. One of the solutions to mitigate this issue is to exploit knowledge available in other source domains. However, many cross-domain recommender systems introduce a complex architecture that makes them less scalable in practice. On the other hand, matrix factorization methods are still considered to be strong baselines for single-domain recommendations. In this paper, we introduce the CDIMF, a model that extends the standard implicit matrix factorization with ALS to cross-domain scenarios. We apply the Alternating Direction Method of Multipliers to learn shared latent factors for overlapped users while factorizing the interaction matrix. In a dual-domain setting, experiments on industrial datasets demonstrate a competing performance of CDIMF for both cold-start and warm-start. The proposed model can outperform most other recent cross-domain and single-domain models. We also provide the code to reproduce experiments on GitHub.

    Full text in ACM Digital Library

  • Discerning Canonical User Representation for Cross-Domain Recommendation
    by Siqian Zhao (University at Albany – SUNY) and Sherry Sahebi (University at Albany – SUNY)

    Cross-domain recommender systems (CDRs) aim to enhance recommendation outcomes by information transfer across different domains. Existing CDRs have investigated the learning of both domain-specific and domain-shared user preferences to enhance recommendation performance. However, these models typically allow the disparities between shared and distinct user preferences to emerge freely in any space, lacking sufficient constraints to identify differences between two domains and to ensure that both domains are considered simultaneously. Canonical Correlation Analysis (CCA) has shown promise for transferring information between domains. However, CCA only models domain similarities and fails to capture the potential differences between user preferences in different domains. We propose Discerning Canonical User Representation for Cross-Domain Recommendation (DiCUR-CDR) that learns domain-shared and domain-specific user representations simultaneously considering both domains’ latent spaces. DiCUR-CDR introduces Discerning Canonical Correlation (DisCCA) user representation learning, a novel design of non-linear CCA for mapping user representations. Unlike prior CCA models that only model the domain-shared multivariate representations by finding their linear transformations, DisCCA uses the same transformations to discover the domain-specific representations too. We compare DiCUR-CDR against several state-of-the-art approaches using two real-world datasets and demonstrate the significance of separately learning shared and specific user representations via DisCCA.

    Full text in ACM Digital Library

  • Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Models
    by Yu Cui (Zhejiang University), Feng Liu (OPPO Research Institute), Pengbo Wang (University of Electronic Science and Technology of China), Bohao Wang (Zhejiang University), Heng Tang (Zhejiang University), Yi Wan (OPPO Research Institute), Jun Wang (OPPO Research Institute) and Jiawei Chen (Zhejiang University)

    Owing to their powerful semantic reasoning capabilities, Large Language Models (LLMs) have been effectively utilized as recommenders, achieving impressive performance. However, the high inference latency of LLMs significantly restricts their practical deployment. To address this issue, this work investigates knowledge distillation from cumbersome LLM-based recommendation models to lightweight conventional sequential models. It encounters three challenges: 1) the teacher’s knowledge may not always be reliable; 2) the capacity gap between the teacher and student makes it difficult for the student to assimilate the teacher’s knowledge; 3) divergence in semantic space poses a challenge to distill the knowledge from embeddings.

    To tackle these challenges, this work proposes a novel distillation strategy, DLLM2Rec, specifically tailored for knowledge distillation from LLM-based recommendation models to conventional sequential models. DLLM2Rec comprises: 1) Importance-aware ranking distillation, which filters reliable and student-friendly knowledge by weighting instances according to teacher confidence and student-teacher consistency; 2) Collaborative embedding distillation integrates knowledge from teacher embeddings with collaborative signals mined from the data. Extensive experiments demonstrate the effectiveness of the proposed DLLM2Rec, boosting three typical sequential models with an average improvement of 47.97%, even enabling them to surpass LLM-based recommenders in some cases.

    Full text in ACM Digital Library

  • DNS-Rec: Data-aware Neural Architecture Search for Recommender Systems
    by Sheng Zhang (City University of Hong Kong), Maolin Wang (City University of Hong Kong), Xiangyu Zhao (City University of Hong Kong), Ruocheng Guo (ByteDance Research), Yao Zhao (Ant Group), Chenyi Zhuang (Ant Group), Jinjie Gu (Ant Group), Zijian Zhang (Jilin University) and Hongzhi Yin (The University of Queensland)

    In the era of data proliferation, efficiently sifting through vast information to extract meaningful insights has become increasingly crucial. This paper addresses the computational overhead and resource inefficiency prevalent in existing Sequential Recommender Systems (SRSs). We introduce an innovative approach combining pruning methods with advanced model designs. Furthermore, we delve into resource-constrained Neural Architecture Search (NAS), an emerging technique in recommender systems, to optimize models in terms of FLOPs, latency, and energy consumption while maintaining or enhancing accuracy. Our principal contribution is the development of a Data-aware Neural Architecture Search for Recommender System (DNS-Rec). DNS-Rec is specifically designed to tailor compact network architectures for attention-based SRS models, thereby ensuring accuracy retention. It incorporates data-aware gates to enhance the performance of the recommendation network by learning information from historical user-item interactions. Moreover, DNS-Rec employs a dynamic resource constraint strategy, stabilizing the search process and yielding more suitable architectural solutions. We demonstrate the effectiveness of our approach through rigorous experiments conducted on three benchmark datasets, which highlight the superiority of DNS-Rec in SRSs. Our findings set a new standard for future research in efficient and accurate recommendation systems, marking a significant step forward in this rapidly evolving field.

    Full text in ACM Digital Library

  • Dynamic Stage-aware User Interest Learning for Heterogeneous Sequential Recommendation
    by Weixin Li (Shenzhen University), Xiaolin Lin (Shenzhen University), Weike Pan (Shenzhen University) and Zhong Ming (Shenzhen Technology University)

    Sequential recommendation has been widely used to predict users’ potential preferences by learning their dynamic user interests, for which most previous methods focus on capturing item-level dependencies. Despite the great success, they often overlook the stage-level interest dependencies. In real-world scenarios, user interests tend to be staged, e.g., following an item purchase, a user’s interests may undergo a transition into the subsequent phase. And there are intricate dependencies across different stages. Meanwhile, users’ behaviors are usually heterogeneous, including auxiliary behaviors (e.g., examinations) and target behaviors (e.g., purchases), which imply more fine-grained user interests. However, existing methods have limitations in explicitly modeling the relationships between the different types of behaviors. To address the above issues, we propose a novel framework, i.e., dynamic stage-aware user interest learning (DSUIL), for heterogeneous sequential recommendation, which is the first solution to model user interests in a cross-stage manner. Specifically, our DSUIL consists of four modules: (1) a dynamic graph construction module transforms a heterogeneous sequence into several subgraphs to model user interests in a stage-wise manner; (2) a dynamic graph convolution module dynamically learns item representations in each subgraph; (3) a behavior-aware subgraph representation learning module learns the heterogeneous dependencies between behaviors and aggregates item representations to represent the staged user interests; and (4) an interest evolving pattern extractor learns the users’ overall interests for the item prediction. Extensive experimental results on two public datasets show that our DSUIL performs significantly better than the state-of-the-art methods.

    Full text in ACM Digital Library

  • RESEffective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
    by Tatsuhiro Shimizu (Independent Researcher), Koichi Tanaka (Keio University), Ren Kishimoto (Tokyo Institute of Technology), Haruka Kiyohara (Cornell University), Masahiro Nomura (CyberAgent, Inc.) and Yuta Saito (Cornell University)

    We explore off-policy evaluation and learning (OPE/L) in contextual combinatorial bandits (CCB), where a policy selects a subset in the action space. For example, it might choose a set of furniture pieces (a bed and a drawer) from available items (bed, drawer, chair, etc.) for interior design sales. This setting is widespread in fields such as recommender systems and healthcare, yet OPE/L of CCB remains unexplored in the relevant literature. Typical OPE/L methods such as regression and importance sampling can be applied to the CCB problem; however, they face significant challenges due to high bias or variance, exacerbated by the exponential growth in the number of available subsets. To address these challenges, we introduce the concept of a factored action space, which allows us to decompose each subset into binary indicators. This formulation allows us to distinguish between the “main effect” derived from the main actions, and the “residual effect”, originating from the supplemental actions, facilitating more effective OPE. Specifically, our estimator, called OPCB, leverages an importance sampling-based approach to unbiasedly estimate the main effect, while employing a regression-based approach to deal with the residual effect with low variance. OPCB achieves substantial variance reduction compared to conventional importance sampling methods and bias reduction relative to regression methods under certain conditions, as illustrated in our theoretical analysis. Experiments demonstrate OPCB’s superior performance over typical methods in both OPE and OPL.

    Full text in ACM Digital Library
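
    The abstract above contrasts importance sampling with regression for off-policy evaluation. As a rough illustration of those two baseline families only (not of OPCB itself), here is a minimal sketch on a toy log; the function names, the log layout `(context, action, reward, logging_prob)`, and the toy policies are all assumptions made for this example.

```python
def ips_estimate(logs, target_policy):
    """Vanilla inverse propensity scoring: mean of w * r with w = pi(a|x) / pi0(a|x)."""
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = target_policy(context)[action] / logging_prob
        total += weight * reward
    return total / len(logs)


def direct_method_estimate(logs, target_policy, reward_model):
    """Regression-based estimate: expected predicted reward under the target policy."""
    total = 0.0
    for context, _action, _reward, _logging_prob in logs:
        probs = target_policy(context)
        total += sum(p * reward_model(context, a) for a, p in probs.items())
    return total / len(logs)


# Toy data: each "action" is a subset of items, encoded as a string (hypothetical).
toy_logs = [
    ("u1", "bed", 1.0, 0.5),
    ("u1", "bed+drawer", 0.0, 0.5),
    ("u2", "bed", 0.0, 0.5),
    ("u2", "bed+drawer", 1.0, 0.5),
]


def toy_target_policy(context):
    # Same action distribution for every context, purely for illustration.
    return {"bed": 0.8, "bed+drawer": 0.2}


def toy_reward_model(context, action):
    # A deliberately crude regression model that predicts 0.5 everywhere.
    return 0.5


ips_value = ips_estimate(toy_logs, toy_target_policy)
dm_value = direct_method_estimate(toy_logs, toy_target_policy, toy_reward_model)
```

    With subsets as actions, the number of keys in the policy dictionary grows exponentially in the number of items, which is the source of the variance problem the abstract mentions for importance sampling.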

  • RESEmbedding Optimization for Training Large-scale Deep Learning Recommendation Systems with EMBark
    by Shijie Liu (NVIDIA Corporation), Nan Zheng (NVIDIA Corporation), Hui Kang (NVIDIA Corporation), Xavier Simmons (NVIDIA Corporation), Junjie Zhang (NVIDIA Corporation), Matthias Langer (NVIDIA Corporation), Wenjing Zhu (NVIDIA Corporation), Minseok Lee (NVIDIA Corporation) and Zehuan Wang (NVIDIA Corporation)

    Training large-scale deep learning recommendation models (DLRMs) with embedding tables stretching across multiple GPUs in a cluster presents a unique challenge, demanding the efficient scaling of embedding operations that require substantial memory and network bandwidth within a hierarchical network of GPUs. To tackle this bottleneck, we introduce EMBark, a comprehensive solution aimed at enhancing embedding performance and overall DLRM training throughput at scale. EMBark empowers users to create and customize sharding strategies, and features a highly automated sharding planner, to accelerate diverse model architectures on different cluster configurations. EMBark groups embedding tables, considering their preferred communication compression method to reduce communication overheads effectively. It embraces efficient data-parallel category distribution, combined with topology-aware hierarchical communication, and pipelining support to maximize the DLRM training throughput. Across four representative DLRM variants (DLRM-DCNv2, T180, T200, and T510), EMBark achieves an average end-to-end training throughput speedup of 1.5× and up to 1.77× over traditional table-row-wise sharding approaches.

    Full text in ACM Digital Library

  • RESEnd-to-End Cost-Effective Incentive Recommendation under Budget Constraint with Uplift Modeling
    by Zexu Sun (Renmin University of China), Hao Yang (Renmin University of China), Dugang Liu (Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)), Yunpeng Weng (Tencent), Xing Tang (Tencent) and Xiuqiang He (Tencent)

    In modern online platforms, incentives (e.g., discounts, bonuses) are essential factors that enhance user engagement and increase platform revenue. Over recent years, uplift modeling has been introduced as a strategic approach to assign incentives to individual customers. In many real-world applications, however, online platforms can only incentivize customers under specific budget constraints. This problem can be reformulated as the multi-choice knapsack problem (MCKP). The objective of this optimization is to select the optimal incentive for each customer to maximize the return on investment (ROI). Recent works in this field frequently tackle the budget allocation problem using a two-stage approach. However, this solution is confronted with the following challenges: (1) The causal inference methods often ignore the domain knowledge in online marketing, where the expected response curve of a customer should be monotonic and smooth as the incentive increases. (2) There is an optimality gap between the two stages, resulting in sub-optimal allocation performance due to the loss of the incentive recommendation information for the uplift prediction under the limited budget constraint. To address these challenges, we propose a novel End-to-End Cost-Effective Incentive Recommendation (E3IR) model under the budget constraint. Specifically, our method consists of two modules, i.e., the uplift prediction module and the differentiable allocation module. In the uplift prediction module, we construct prediction heads to capture the incremental improvement between adjacent treatments with the marketing domain constraints (i.e., monotonic and smooth). We incorporate integer linear programming (ILP) as a differentiable layer input in the differentiable allocation module. Furthermore, we conduct extensive experiments on public and real product datasets, demonstrating that our E3IR improves allocation performance compared to existing two-stage approaches.

    Full text in ACM Digital Library
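
    The E3IR abstract frames budgeted incentive assignment as a multi-choice knapsack problem (MCKP). The sketch below solves a tiny textbook MCKP by dynamic programming, to make the formulation concrete. It is not the paper's differentiable allocation module: it assumes integer costs, maximizes total predicted value rather than ROI, and the function name and data layout are hypothetical.

```python
def solve_mckp(options_per_customer, budget):
    """Choose exactly one (cost, value) option per customer with total cost <= budget.

    Costs must be non-negative integers. Returns (best_total_value, chosen_indices).
    """
    neg_inf = float("-inf")
    # best[b]: maximum value over the customers processed so far with total cost exactly b
    best = [0.0] + [neg_inf] * budget
    choice_tables = []
    for options in options_per_customer:
        new_best = [neg_inf] * (budget + 1)
        choice = [None] * (budget + 1)
        for spent in range(budget + 1):
            if best[spent] == neg_inf:
                continue
            for idx, (cost, value) in enumerate(options):
                total = spent + cost
                if total <= budget and best[spent] + value > new_best[total]:
                    new_best[total] = best[spent] + value
                    choice[total] = (idx, spent)  # remember option and previous spend
        best = new_best
        choice_tables.append(choice)

    end = max(range(budget + 1), key=lambda b: best[b])
    if best[end] == neg_inf:
        raise ValueError("no feasible assignment within budget")

    # Walk the choice tables backwards to recover one option index per customer.
    picks = []
    spent = end
    for choice in reversed(choice_tables):
        idx, spent = choice[spent]
        picks.append(idx)
    picks.reverse()
    return best[end], picks


# Two customers, each with options (cost, predicted uplift); option 0 is "no incentive".
toy_options = [
    [(0, 0.0), (2, 3.0), (4, 4.0)],
    [(0, 0.0), (2, 1.0), (4, 5.0)],
]
toy_value, toy_picks = solve_mckp(toy_options, budget=4)
```

    In a two-stage pipeline the `(cost, value)` pairs would come from a separately trained uplift model, which is where the optimality gap described in the abstract can arise.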

  • RESFair Reciprocal Recommendation in Matching Markets
    by Yoji Tomita (CyberAgent Inc.) and Tomohiko Yokoyama (The University of Tokyo)

    Recommender systems play an increasingly crucial role in shaping people’s opportunities, particularly in online dating platforms. It is essential from the user’s perspective to increase the probability of matching with a suitable partner while ensuring an appropriate level of fairness in the matching opportunities.

    We investigate reciprocal recommendation in two-sided matching markets between agents divided into two sides. In our model, a match is considered successful only when both individuals express interest in each other. Additionally, we assume that agents prefer to appear prominently in the recommendation lists presented to those on the other side. We define each agent’s opportunity to be recommended and introduce its fairness criterion, envy-freeness, from the perspective of fair division theory. The recommendations that approximately maximize the expected number of matches, empirically obtained by heuristic algorithms, are likely to result in significant unfairness of opportunity. Therefore, there can be a trade-off between maximizing the expected matches and ensuring fairness of opportunity. To address this challenge, we propose a method to find a policy that is close to being envy-free by leveraging the Nash social welfare function. Experiments on synthetic and real-world datasets demonstrate the effectiveness of our approach in achieving both relatively high expected matches and fairness for opportunities of both sides in reciprocal recommender systems.

    Full text in ACM Digital Library
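
    The abstract above relies on the Nash social welfare function to steer policies toward fairness. As a small illustration of why that objective favours balanced outcomes, the sketch below computes one common form of it, the geometric mean of agents' utilities. The function name and the toy utility vectors are assumptions, and the paper's exact objective and optimization procedure may differ.

```python
import math


def nash_social_welfare(utilities):
    """Geometric mean of utilities; returns 0.0 if any agent has non-positive utility."""
    if any(u <= 0 for u in utilities):
        return 0.0
    # Summing logs avoids overflow when there are many agents.
    return math.exp(sum(math.log(u) for u in utilities) / len(utilities))


# Two hypothetical allocations of "opportunity to be recommended" with the same total (6.0).
balanced = [2.0, 2.0, 2.0]
skewed = [5.0, 0.5, 0.5]

nsw_balanced = nash_social_welfare(balanced)
nsw_skewed = nash_social_welfare(skewed)
```

    Both allocations have the same sum, but the balanced one scores higher, so maximizing this quantity penalizes concentrating opportunity on a few agents.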

  • RESFairCRS: Towards User-oriented Fairness in Conversational Recommendation Systems
    by Qin Liu (Jinan University), Xuan Feng (Jinan University), Tianlong Gu (Jinan University) and Xiaoli Liu (Jinan University)

    Conversational Recommendation Systems (CRSs) enable recommender systems to explicitly acquire user preferences during multi-turn interactions, providing more accurate and personalized recommendations. However, the data imbalance in CRSs, due to inconsistent interaction history among users, may lead to disparate treatment for disadvantaged user groups. In this paper, we investigate discrimination problems in CRSs from the user’s perspective, which we call user-oriented fairness. To reveal the unfairness problems of different user groups in CRSs, we conduct extensive empirical analyses. To mitigate user unfairness, we propose a model-agnostic, user-oriented fairness framework named FairCRS. In particular, we develop a user-embedding reconstruction mechanism that enriches user embeddings by incorporating more interaction information, and design a user-oriented fairness strategy that optimizes the recommendation quality differences among user groups while alleviating unfairness. Extensive experimental results on English and Chinese datasets show that FairCRS outperforms state-of-the-art CRSs in terms of overall recommendation performance and user fairness.

    Full text in ACM Digital Library

  • RESFedLoCA: Low-Rank Coordinated Adaptation with Knowledge Decoupling for Federated Recommendations
    by Yuchen Ding (University of Science and Technology of China), Siqing Zhang (University of Science and Technology of China), Boyu Fan (University of Helsinki), Wei Sun (University of Science and Technology of China), Yong Liao (University of Science and Technology of China) and Peng Yuan Zhou (Aarhus University)

    Privacy protection in recommendation systems is gaining increasing attention, for which federated learning has emerged as a promising solution. Current federated recommendation systems grapple with high communication overhead due to sharing dense global embeddings, and also poorly reflect user preferences due to data heterogeneity. To overcome these challenges, we propose a two-stage Federated Low-rank Coordinated Adaptation (FedLoCA) framework to decouple global and client-specific knowledge into low-rank embeddings, which significantly reduces communication overhead while enhancing the system’s ability to capture individual user preferences amidst data heterogeneity. Further, to tackle gradient estimation inaccuracies stemming from data sparsity in federated recommendation systems, we introduce an adversarial gradient projected descent approach in low-rank spaces, which significantly boosts model performance while maintaining robustness. Remarkably, FedLoCA also alleviates performance loss even under the stringent constraints of differential privacy. Extensive experiments on various real-world datasets demonstrate that FedLoCA significantly outperforms existing methods in both recommendation accuracy and communication efficiency.

    Full text in ACM Digital Library

  • RESFLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction
    by Hangyu Wang (Shanghai Jiao Tong University), Jianghao Lin (Shanghai Jiao Tong University), Xiangyang Li (Huawei Noah’s Ark Lab), Bo Chen (Huawei Noah’s Ark Lab), Chenxu Zhu (Huawei Noah’s Ark Lab), Ruiming Tang (Huawei Noah’s Ark Lab), Weinan Zhang (Shanghai Jiao Tong University) and Yong Yu (Shanghai Jiao Tong University)

    Click-through rate (CTR) prediction serves as a core functional module in various personalized online services. The traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality, which capture the collaborative signals via feature interaction modeling. But the one-hot encoding discards the semantic information included in the textual features. Recently, the emergence of Pretrained Language Models (PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality obtained by hard prompt templates and adopts PLMs to extract the semantic knowledge. However, PLMs often face challenges in capturing field-wise collaborative signals and distinguishing features with subtle textual differences. In this paper, to leverage the benefits of both paradigms while overcoming their limitations, we propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction. Unlike most methods that solely rely on global views through instance-level contrastive learning, we design a novel jointly masked tabular/language modeling task to learn fine-grained alignment between tabular IDs and word tokens. Specifically, the masked data of one modality (i.e., IDs and tokens) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment via sufficient mutual information extraction between dual modalities. Moreover, we propose to jointly finetune the ID-based model and PLM by adaptively combining the output of both models, thus achieving superior performance in downstream CTR prediction tasks. Extensive experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines, and is highly compatible with various ID-based models and PLMs. The code is available.

    Full text in ACM Digital Library

  • RESImproving Adversarial Robustness for Recommendation Model via Cross-Domain Distributional Adversarial Training
    by Jingyu Chen (Sichuan University), Lilin Zhang (Sichuan University) and Ning Yang (Sichuan University)

    Recommendation models based on deep learning are fragile when facing adversarial examples (AE). Adversarial training (AT) is the existing mainstream method to promote the adversarial robustness of recommendation models. However, these AT methods often have two drawbacks. First, they may be ineffective due to the ubiquitous sparsity of interaction data. Second, point-wise perturbation used by these AT methods leads to suboptimal adversarial robustness, because not all examples are equally susceptible to such perturbations. To overcome these issues, we propose a novel method called Cross-domain Distributional Adversarial Training (CDAT) which utilizes a richer auxiliary domain to improve the adversarial robustness of a sparse target domain. CDAT comprises a Domain adversarial network (Dan) and a Cross-domain adversarial example generative network (Cdan). Dan learns a domain-invariant preference distribution which is obtained by aligning user embeddings from two domains and paves the way to leverage the knowledge from another domain for the target domain. Then, by adversarially perturbing the domain-invariant preference distribution under the guidance of a discriminator, Cdan captures an aggressive and imperceptible AE distribution. In this way, CDAT can transfer distributional adversarial robustness from the auxiliary domain to the target domain. The extensive experiments conducted on real datasets demonstrate the remarkable superiority of the proposed CDAT in improving the adversarial robustness of the sparse domain. The codes and datasets are available on https://github.com/HymanLoveGIN/CDAT.

    Full text in ACM Digital Library

  • RESImproving the Shortest Plank: Vulnerability-Aware Adversarial Training for Robust Recommender System
    by Kaike Zhang (Chinese Academy of Sciences), Qi Cao (Chinese Academy of Sciences), Yunfan Wu (Chinese Academy of Sciences), Fei Sun (Chinese Academy of Sciences), Huawei Shen (Chinese Academy of Sciences) and Xueqi Cheng (Chinese Academy of Sciences)

    Recommender systems play a pivotal role in mitigating information overload in various fields. Nonetheless, the inherent openness of these systems introduces vulnerabilities, allowing attackers to insert fake users into the system’s training data to skew the exposure of certain items, known as poisoning attacks. Adversarial training has emerged as a notable defense mechanism against such poisoning attacks within recommender systems. Existing adversarial training methods apply perturbations of the same magnitude across all users to enhance system robustness against attacks. Yet, in reality, we find that attacks often affect only a subset of users who are vulnerable. These perturbations of indiscriminate magnitude make it difficult to balance effective protection for vulnerable users without degrading recommendation quality for those who are not affected. To address this issue, our research delves into understanding user vulnerability. Considering that poisoning attacks pollute the training data, we note that the higher degree to which a recommender system fits users’ training data correlates with an increased likelihood of users incorporating attack information, indicating their vulnerability. Leveraging these insights, we introduce the Vulnerability-aware Adversarial Training (VAT), designed to defend against poisoning attacks in recommender systems. VAT employs a novel vulnerability-aware function to estimate users’ vulnerability based on the degree to which the system fits them. Guided by this estimation, VAT applies perturbations of adaptive magnitude to each user, not only reducing the success ratio of attacks but also preserving, and potentially enhancing, the quality of recommendations. Comprehensive experiments confirm VAT’s superior defensive capabilities across different recommendation models and against various types of attacks.

    Full text in ACM Digital Library

  • RESInformation-Controllable Graph Contrastive Learning for Recommendation
    by Zirui Guo (Beijing University of Posts and Telecommunications), Yanhua Yu (Beijing University of Posts and Telecommunications), Yuling Wang (Hangzhou Dianzi University), Kangkang Lu (Beijing University of Posts and Telecommunications), Zixuan Yang (Beijing University of Posts and Telecommunications), Liang Pang (Chinese Academy of Sciences) and Tat-Seng Chua (National University of Singapore)

    In the evolving landscape of recommender systems, Graph Contrastive Learning (GCL) has become a prominent method for enhancing recommendation performance by alleviating the issue of data sparsity. However, existing GCL-based recommendation methods often overlook the control of shared information between the contrastive views. In this paper, we first analyze and experimentally demonstrate that these methods often lead to the issue of augmented representation collapse, where the representations between views become excessively similar, diminishing their distinctiveness. To address this issue, we propose the Information-Controllable Graph Contrastive Learning (IGCL) framework, a novel approach that focuses on optimizing the shared information between views to include as much relevant information for the recommendation task as possible while maintaining an appropriate level. In particular, we design the Collaborative Signals Enhanced Augmentation module to infuse the augmented representation with rich, task-relevant collaborative signals. Furthermore, the Information-Controllable Contrastive Learning module is designed to directly control the magnitude of shared information between the contrastive views to avoid over-similarity. Extensive experiments on three public datasets demonstrate the effectiveness of IGCL, showcasing significant improvements in performance and the capability to alleviate augmented representation collapse.

    Full text in ACM Digital Library

  • RESInstructing and Prompting Large Language Models for Explainable Cross-domain Recommendations
    by Alessandro Petruzzelli (University of Bari Aldo Moro), Cataldo Musto (University of Bari), Lucrezia Laraspata (University of Bari), Ivan Rinaldi (University of Bari Aldo Moro), Marco de Gemmis (University of Bari Aldo Moro), Pasquale Lops (University of Bari) and Giovanni Semeraro (University of Bari)

    In this paper, we present a strategy to provide users with explainable cross-domain recommendations (CDR) that exploits large language models (LLMs). Generally speaking, CDR is a task that is hard to tackle, mainly due to data sparsity issues. Indeed, CDR models require a large amount of data labeled in both source and target domains, which are not easy to collect. Accordingly, our approach relies on the intuition that the knowledge that is already encoded in LLMs can be used to more easily bridge the domains and seamlessly provide users with personalized cross-domain suggestions.

    To this end, we designed a pipeline to: (a) instruct an LLM to handle a CDR task; (b) design a personalized prompt, based on the preferences of the user in a source domain, and a list of items to be ranked in the target domain; (c) feed the LLM with the prompt, in both zero-shot and one-shot settings, and process the answer in order to extract the recommendations and a natural language explanation. As shown in the experimental evaluation, our approach beats several established state-of-the-art baselines for CDR in most of the experimental settings, thus showing the effectiveness of LLMs also in this novel and scarcely investigated scenario.

    Full text in ACM Digital Library

  • RESLARR: Large Language Model Aided Real-time Scene Recommendation with Semantic Understanding
    by Zhizhong Wan (Meituan), Bin Yin (Meituan), Junjie Xie (Meituan), Fei Jiang (Meituan), Xiang Li (Meituan) and Wei Lin (Meituan)

    Click-Through Rate (CTR) prediction is crucial for Recommendation Systems (RS), which aim to provide personalized recommendation services for users in many settings such as food delivery and e-commerce. However, traditional RS rely on collaborative signals, which lack semantic understanding of real-time scenes. We also noticed that a major challenge in utilizing Large Language Models (LLMs) for practical recommendation purposes is their efficiency in dealing with long text input. To address these problems, we propose Large Language Model Aided Real-time Scene Recommendation (LARR), which adopts LLMs for semantic understanding and utilizes real-time scene information in RS without requiring the LLM to process the entire real-time scene text directly, thereby enhancing the efficiency of LLM-based CTR modeling. Specifically, recommendation domain-specific knowledge is injected into the LLM, and then the RS employs an aggregation encoder to build real-time scene information from the LLM’s separate outputs. First, an LLM is continually pretrained on a corpus built from recommendation data with the aid of special tokens. Subsequently, the LLM is fine-tuned via contrastive learning on three kinds of sample construction strategies. Through this step, the LLM is transformed into a text embedding model. Finally, the LLM’s separate outputs for different scene features are aggregated by an encoder and aligned to collaborative signals in RS, enhancing the performance of the recommendation model.

    Full text in ACM Digital Library

  • RESLow Rank Field-Weighted Factorization Machines for Low Latency Item Recommendation
    by Alex Shtoff (Yahoo Research), Michael Viderman (Yahoo Research), Naama Haramaty-Krasne, Oren Somekh (Yahoo Research), Ariel Raviv (Meta) and Tularam Ban (Yahoo Research)

    Factorization machine (FM) variants are widely used in recommendation systems that operate under strict throughput and latency requirements, such as online advertising systems. FMs have two prominent strengths. First is their ability to model pairwise feature interactions while being resilient to data sparsity by learning factorized representations. Second, their computational graphs facilitate fast inference and training. Moreover, when items are ranked as a part of a query for each incoming user, these graphs facilitate computing the portion stemming from the user and context fields only once per query. Thus, the computational cost for each ranked item is proportional only to the number of fields that vary among the ranked items. Consequently, in terms of inference cost, the number of user or context fields is practically unlimited.

    More advanced variants of FMs, such as field-aware and field-weighted FMs, provide better accuracy by learning a representation of field-wise interactions, but require computing all pairwise interaction terms explicitly. In particular, the computational cost during inference is proportional to the square of the number of fields, including user, context, and item. When the number of fields is large, this is prohibitive in systems with strict latency constraints, and imposes a limit on the number of user and context fields for a given computational budget. To mitigate this caveat, heuristic pruning of low intensity field interactions is commonly used to accelerate inference.

    In this work we propose an alternative to the pruning heuristic in field-weighted FMs using a diagonal plus symmetric low-rank decomposition. Our technique reduces the computational cost of inference, by allowing it to be proportional to the number of item fields only. Using a set of experiments on real-world datasets, we show that aggressive rank reduction outperforms similarly aggressive pruning in both accuracy and item recommendation speed. Beyond computational complexity analysis, we corroborate our claim of faster inference experimentally, both via a synthetic test, and by having deployed our solution to a major online advertising system, where we observed significant ranking latency improvements. We have made the code to reproduce the results on public datasets and synthetic tests available at https://github.com/michaelviderman/pytorch-fm.

    Full text in ACM Digital Library
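
    The abstract above replaces explicit pairwise field interactions with a low-rank structure. The sketch below checks the underlying algebraic identity on toy numbers: when the off-diagonal field weights have the form r_ij = sum_l e[l] * U[l][i] * U[l][j], the weighted pairwise sum can be computed from a few projected vectors instead of all field pairs. This is one reading of a "symmetric low-rank" structure; the names `U`, `e`, and both functions are assumptions, and the paper's exact parameterization (including its diagonal term) may differ.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pairwise_naive(vectors, U, e):
    """Sum over field pairs i < j of r_ij * <v_i, v_j>, cost quadratic in the number of fields."""
    n = len(vectors)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r_ij = sum(e[l] * U[l][i] * U[l][j] for l in range(len(e)))
            total += r_ij * dot(vectors[i], vectors[j])
    return total


def pairwise_low_rank(vectors, U, e):
    """Same quantity via rank-many projections, cost linear in the number of fields."""
    n, k = len(vectors), len(vectors[0])
    total = 0.0
    for l in range(len(e)):
        # Projection of all field embeddings onto the l-th low-rank direction.
        proj = [sum(U[l][i] * vectors[i][t] for i in range(n)) for t in range(k)]
        # Remove the i == j terms that the squared norm of the projection includes.
        self_terms = sum(U[l][i] ** 2 * dot(vectors[i], vectors[i]) for i in range(n))
        total += e[l] * (dot(proj, proj) - self_terms)
    return 0.5 * total


# Three fields with 2-dimensional embeddings and a rank-1 interaction structure (toy values).
toy_vectors = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
toy_U = [[1.0, 0.5, -1.0]]
toy_e = [2.0]

naive_value = pairwise_naive(toy_vectors, toy_U, toy_e)
fast_value = pairwise_low_rank(toy_vectors, toy_U, toy_e)
```

    Because each projection is a sum over fields, the user and context part of that sum can be cached once per query and only the item fields added per ranked item, which matches the per-item cost described in the abstract.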

  • RESMARec: Metadata Alignment for cold-start Recommendation
    by Julien Monteil (Amazon), Volodymyr Vaskovych (Amazon), Wentao Lu (Amazon), Anirban Majumder (Amazon) and Anton van den Hengel (University of Adelaide)

    For many recommender systems, the primary data source is a historical record of user clicks. The associated click matrix is often very sparse, as the number of users × products can be far larger than the number of clicks. Such sparsity is accentuated in cold-start settings, which makes the efficient use of metadata information of paramount importance. In this work, we propose a simple approach to address cold-start recommendations by leveraging content metadata, Metadata Alignment for cold-start Recommendation (MARec). We show that this approach can readily augment existing matrix factorization and autoencoder approaches, enabling a smooth transition to top performing algorithms in warmer set-ups. Our experimental results indicate three separate contributions: first, we show that our proposed framework largely beats SOTA results on 4 cold-start datasets with different sparsity and scale characteristics, with gains ranging from +8.4% to +53.8% on reported ranking metrics; second, we provide an ablation study on the utility of semantic features, and show that the additional gain obtained by leveraging such features ranges between +46.8% and +105.5%; and third, our approach is by construction highly competitive in warm set-ups, and we propose a closed-form solution outperformed by SOTA results by only 0.8% on average.

    Full text in ACM Digital Library

  • RESMLoRA: Multi-Domain Low-Rank Adaptive Network for CTR Prediction
    by Zhiming Yang (Northwestern Polytechnical University), Haining Gao (Alibaba Group), Dehong Gao (Northwestern Polytechnical University), Luwei Yang (Alibaba Group), Libin Yang (Northwestern Polytechnical University), Xiaoyan Cai (Northwestern Polytechnical University), Wei Ning (Alibaba Group) and Guannan Zhang (Alibaba Group)

    Click-through rate (CTR) prediction is one of the fundamental tasks in the industry, especially in e-commerce, social media, and streaming media. It directly impacts website revenues, user satisfaction, and user retention. However, real-world production platforms often encompass various domains to cater for diverse customer needs. Traditional CTR prediction models struggle in multi-domain recommendation scenarios, facing challenges of data sparsity and disparate data distributions across domains. Existing multi-domain recommendation approaches introduce specific-domain modules for each domain, which partially address these issues but often significantly increase model parameters and lead to insufficient training. In this paper, we propose a Multi-domain Low-Rank Adaptive network (MLoRA) for CTR prediction, where we introduce a specialized LoRA module for each domain. This approach enhances the model’s performance in multi-domain CTR prediction tasks and can be applied to various deep-learning models. We evaluate the proposed method on several multi-domain datasets. Experimental results demonstrate that our MLoRA approach achieves a significant improvement compared with state-of-the-art baselines. Furthermore, we deploy it in the production environment of Alibaba.com. The online A/B testing results indicate its superiority and flexibility in real-world production environments. The code of our MLoRA is publicly available.

    Full text in ACM Digital Library
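
    The MLoRA abstract describes a specialized LoRA module per domain. A minimal sketch of that idea for a single linear layer follows: a shared weight matrix `W` plus a domain-specific low-rank update `B_d A_d`. Where the adapters sit in the actual network, and all names used here (`mlora_forward`, `adapters`, the domain keys), are assumptions for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def mlora_forward(x, W, adapters, domain):
    """Compute W x + B_d (A_d x), where (A_d, B_d) is the low-rank adapter of `domain`."""
    A, B = adapters[domain]
    base = matvec(W, x)              # shared across all domains
    delta = matvec(B, matvec(A, x))  # rank-limited, domain-specific correction
    return [b + d for b, d in zip(base, delta)]


# Shared 2x3 layer and two rank-1 adapters (toy values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
adapters = {
    "domain_a": ([[1.0, 1.0, 1.0]], [[0.5], [0.0]]),
    "domain_b": ([[1.0, 1.0, 1.0]], [[0.0], [0.0]]),  # zero B leaves the base layer unchanged
}
x = [1.0, 2.0, 3.0]

out_a = mlora_forward(x, W, adapters, "domain_a")
out_b = mlora_forward(x, W, adapters, "domain_b")
```

    Each adapter here adds 3 + 2 parameters instead of a full 2x3 matrix per domain, which is the parameter saving that motivates low-rank domain modules.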

  • RESMMGCL: Meta Knowledge-Enhanced Multi-view Graph Contrastive Learning for Recommendations
    by Yuezihan Jiang (Kuaishou Technology), Changyu Li (Kuaishou Technology), Gaode Chen (Chinese Academy of Sciences), Peiyi Li (Kuaishou Technology), Qi Zhang (Kuaishou Technology), Jingjian Lin (Kuaishou Technology), Peng Jiang (Kuaishou Inc.), Fei Sun (China) and Wentao Zhang (Peking University)

    Multi-view Graph Learning is popular in recommendations due to its ability to capture relationships and connections across multiple views. Existing multi-view graph learning methods generally involve constructing graphs of views and performing information aggregation on view representations. Despite their effectiveness, they face two data limitations: Multi-focal Multi-source data noise and multi-source Data Sparsity. The former arises from the combination of noise from individual views and conflicting edges between views when information from all views is combined. The latter occurs because multi-view learning exacerbates the negative influence of data sparsity, as these methods require more model parameters to learn more view information. Motivated by these issues, we propose MMGCL, a meta knowledge-enhanced multi-view graph contrastive learning framework for recommendations. To tackle the data noise issue, MMGCL extracts meta knowledge to preserve important information from all views to form a meta view representation. It then rectifies every view in multi-learning frameworks, thus simultaneously removing the view-private noisy edges and conflicting edges across different views. To address the data sparsity issue, MMGCL performs meta knowledge transfer contrastive learning optimization on all views to reduce the search space for model parameters and add more supervised signals. Besides, we have deployed MMGCL in a real industrial recommender system in China, and we further evaluate it on three benchmark datasets and a practical industry online application. Extensive experiments on these datasets demonstrate the state-of-the-art recommendation performance of MMGCL.

    Full text in ACM Digital Library

  • RESMulti-Objective Recommendation via Multivariate Policy Learning
    by Olivier Jeunen (ShareChat), Jatin Mandav (ShareChat), Ivan Potapov (ShareChat), Nakul Agarwal (ShareChat), Sourabh Vaid (ShareChat), Wenzhe Shi (ShareChat) and Aleksei Ustimenko (ShareChat)

    Real-world recommender systems often need to balance multiple objectives when deciding which recommendations to present to users. These include behavioural signals (e.g. clicks, shares, dwell time), as well as broader objectives (e.g. diversity, fairness). Scalarisation methods are commonly used to handle this balancing task, where a weighted average of per-objective reward signals determines the final score used for ranking. Naturally, exactly how these weights are computed is key to success for any online platform.

    We frame this as a decision-making task, where the scalarisation weights are actions taken to maximise an overall North Star reward (e.g. long-term user retention or growth). We extend existing policy learning methods to the continuous multivariate action domain, proposing to maximise a pessimistic lower bound on the North Star reward that the learnt policy will yield. Typical lower bounds based on normal approximations suffer from insufficient coverage, and we propose an efficient and effective policy-dependent correction for this. We provide guidance to design stochastic data collection policies, as well as highly sensitive reward signals. Empirical observations from simulations, offline and online experiments highlight the efficacy of our deployed approach.

    Full text in ACM Digital Library

  • RESNot All Videos Become Outdated: Short-Video Recommendation by Learning to Deconfound Release Interval Bias
    by Lulu Dong (East China Normal University), Guoxiu He (East China Normal University) and Aixin Sun (Nanyang Technological University)

    Short-video recommender systems often exhibit a biased preference toward recently released videos. However, not all videos become outdated; certain classic videos can still attract users’ attention. Such bias along the temporal dimension can be further aggravated by the matching model between users and videos, because the model learns from preexisting interactions. From real data, we observe that different videos have varying sensitivities to recency in attracting users’ attention. Our analysis, based on a causal graph modeling short-video recommendation, suggests that the release interval serves as a confounder, establishing a backdoor path between users and videos. To address this confounding effect, we propose a model-agnostic causal architecture called Learning to Deconfound the Release Interval Bias (LDRI). LDRI enables joint learning of the matching model and the video recency sensitivity perceptron. In the inference stage, we apply a backdoor adjustment, effectively blocking the backdoor path by intervening on each video. Extensive experiments on two benchmarks demonstrate that LDRI consistently outperforms backbone models and exhibits superior performance against state-of-the-art models. Additional comprehensive analyses confirm the deconfounding capability of LDRI.
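
    For readers unfamiliar with backdoor adjustment, the generic formula is P(Y | do(X)) = Σ_z P(Y | X, z) P(z): the prediction is averaged over the confounder's marginal distribution instead of conditioning on the observed value. The sketch below applies that generic formula with the release interval as z. It is an assumption-laden toy, not LDRI itself: `toy_predict`, the interval buckets and the prior are all hypothetical.

```python
from typing import Callable, Dict


def backdoor_adjusted_score(
    predict: Callable[[str, str, str], float],
    user: str,
    video: str,
    interval_prior: Dict[str, float],
) -> float:
    """Average the prediction over the confounder's prior: sum_z P(z) * f(user, video, z)."""
    return sum(p * predict(user, video, z) for z, p in interval_prior.items())


def toy_predict(user: str, video: str, interval: str) -> float:
    """Hypothetical matching model: base affinity scaled by a recency factor."""
    affinity = 0.8
    recency_factor = {"new": 1.0, "old": 0.5}
    return affinity * recency_factor[interval]


# Hypothetical marginal distribution over release-interval buckets.
prior = {"new": 0.5, "old": 0.5}
adjusted = backdoor_adjusted_score(toy_predict, "user_1", "video_1", prior)
```

    The adjusted score no longer depends on which interval bucket the video happened to be observed in, which is the sense in which the backdoor path is blocked.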

    Full text in ACM Digital Library

  • RESOptimal Baseline Corrections for Off-Policy Contextual Bandits
    by Shashank Gupta (University of Amsterdam), Olivier Jeunen (ShareChat), Harrie Oosterhuis (Radboud University) and Maarten de Rijke (University of Amsterdam)

    The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. With unbiasedness comes potentially high variance, and prevalent methods exist to reduce estimation variance. These methods typically make use of control variates, either additive (i.e., baseline corrections or doubly robust methods) or multiplicative (i.e., self-normalisation).

    Our work unifies these approaches by proposing a single framework built on their equivalence in learning scenarios. The foundation of our framework is the derivation of an equivalent baseline correction for all of the existing control variates. Consequently, our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it. This optimal estimator brings significantly improved performance in both evaluation and learning, and minimizes data requirements. Empirical observations corroborate our theoretical findings.
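
    As a rough illustration of an additive baseline correction, the sketch below compares a plain inverse-propensity-scored (IPS) estimate with a baseline-corrected one. The coefficient used is the textbook control-variate choice Cov(w·r, w) / Var(w); it is shown as an assumption and is not claimed to be the closed-form solution derived in the paper. All function names are hypothetical.

```python
from statistics import mean


def ips_estimate(weights, rewards):
    """Plain IPS estimate: mean of w_i * r_i, with w_i the importance weights."""
    return mean(w * r for w, r in zip(weights, rewards))


def baseline_corrected_estimate(weights, rewards, beta):
    """Additive baseline correction: beta + mean(w_i * (r_i - beta)).

    Algebraically equal to ips - beta * (mean(w) - 1), so it stays unbiased
    for a fixed beta because importance weights have expectation 1.
    """
    return beta + mean(w * (r - beta) for w, r in zip(weights, rewards))


def control_variate_beta(weights, rewards):
    """Textbook variance-minimising coefficient Cov(w*r, w) / Var(w).

    Estimating beta from the same sample introduces a small bias in practice.
    """
    weighted = [w * r for w, r in zip(weights, rewards)]
    mean_w, mean_wr = mean(weights), mean(weighted)
    cov = mean((wr - mean_wr) * (w - mean_w) for wr, w in zip(weighted, weights))
    var = mean((w - mean_w) ** 2 for w in weights)
    return cov / var if var > 0 else 0.0
```

    With `beta = 0` the corrected estimator reduces to plain IPS; a non-zero baseline only changes the estimate when the sample mean of the weights deviates from 1.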

    Full text in ACM Digital Library

  • RESPrompt Tuning for Item Cold-start Recommendation
    by Yuezihan Jiang (Kuaishou Technology), Gaode Chen (Kuaishou Technology), Wenhan Zhang (Peking University), Jingchi Wang (Peking University), Yinjie Jiang (Kuaishou Technology), Qi Zhang (Kuaishou Technology), Jingjian Lin (Kuaishou Technology), Peng Jiang (Kuaishou Technology) and Kaigui Bian (Peking University)

    The item cold-start problem is crucial for online recommender systems, as the success of the cold-start phase determines whether items can transition into popular ones. Prompt learning, a powerful technique used in natural language processing (NLP) to address zero- or few-shot problems, has been adapted for recommender systems to tackle similar challenges. However, existing methods typically rely on content-based properties or text descriptions for prompting, which we argue may be suboptimal for cold-start recommendations due to 1) semantic gaps with recommender tasks, and 2) model bias caused by warm-up items contributing most of the positive feedback to the model, which is the core of the cold-start problem that hinders the recommender quality on cold-start items. We propose to leverage high-value positive feedback, termed pinnacle feedback, as prompt information to simultaneously resolve the above two problems. We experimentally show that, compared to the content descriptions proposed in existing works, positive feedback is more suitable to serve as prompt information by bridging the semantic gaps. Besides, we propose item-wise personalized prompt networks to encode pinnacle feedback to relieve the model bias caused by the positive feedback dominance problem. Extensive experiments on four real-world datasets demonstrate the superiority of our model over state-of-the-art methods. Moreover, PROMO has been successfully deployed on a popular short-video sharing platform, a billion-user scale commercial short-video application, achieving remarkable performance gains across various commercial metrics within cold-start scenarios.

    Full text in ACM Digital Library

  • RESPutting Popularity Bias Mitigation to the Test: A User-Centric Evaluation in Music Recommenders
    by Robin Ungruh (Delft University of Technology), Karlijn Dinnissen (Utrecht University), Anja Volk (Utrecht University), Maria Soledad Pera (Delft University of Technology) and Hanna Hauptmann (Utrecht University)

    Popularity bias is a prominent phenomenon in recommender systems (RS), especially in the music domain. Although popularity bias mitigation techniques are known to enhance the fairness of RS while maintaining their high performance, there is a lack of understanding regarding users’ actual perception of the suggested music. To address this gap, we conducted a user study (n=40) exploring user satisfaction and perception of personalized music recommendations generated by algorithms that explicitly mitigate popularity bias. Specifically, we investigate item-centered and user-centered bias mitigation techniques, aiming to ensure fairness for artists or users, respectively. Results show that neither mitigation technique harms the users’ satisfaction with the recommendation lists despite promoting underrepresented items. However, the item-centered mitigation technique impacts user perception; by promoting less popular items, it reduces users’ familiarity with the items. Lower familiarity evokes discovery—the feeling that the recommendations enrich the user’s taste. We demonstrate that this can ultimately lead to higher satisfaction, highlighting the potential of less-popular recommendations to improve the user experience.

    Full text in ACM Digital Library

  • RESRanking-Aware Unbiased Post-Click Conversion Rate Estimation via AUC Optimization on Entire Exposure Space
    by Yu Liu (Nanjing University;Huawei Technologies Co., Ltd.), Qinglin Jia (Huawei Noah’s Ark Lab), Shuting Shi (Huawei Technologies Co., Ltd.), Chuhan Wu (Huawei Noah’s Ark Lab), Zhaocheng Du (Huawei Noah’s Ark Lab), Zheng Xie (Nanjing University), Ruiming Tang (Huawei Noah’s Ark Lab), Muyu Zhang (Huawei Technologies Co., Ltd.) and Ming Li (Nanjing University)

    Estimating the post-click conversion rate (CVR) accurately in ranking systems is crucial in industrial applications. However, this task is often challenged by data sparsity and selection bias, which hinder accurate ranking. Previous approaches to address these challenges have typically focused on either modeling CVR across the entire exposure space, which includes all exposure events, or providing unbiased CVR estimation separately. However, the lack of integration between these objectives has limited the overall performance of CVR estimation. Therefore, there is a pressing need for a method that can simultaneously provide unbiased CVR estimates across the entire exposure space. To achieve this, we formulate the CVR estimation task as an Area Under the Curve (AUC) optimization problem and propose the Entire-space Weighted AUC (EWAUC) framework. EWAUC utilizes sample reweighting techniques to handle selection bias and employs pairwise AUC risk, which incorporates more information from limited clicked data, to handle data sparsity. In order to model CVR across the entire exposure space unbiasedly, EWAUC treats the exposure data as both conversion data and non-conversion data to calculate the loss. The properties of AUC risk guarantee the unbiased nature of the entire space modeling. We provide comprehensive theoretical analysis to validate the unbiased nature of our approach. Additionally, extensive experiments conducted on real-world datasets demonstrate that our approach outperforms state-of-the-art methods in terms of ranking performance for the CVR estimation task.
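
    To make the idea of a reweighted pairwise AUC concrete, here is a minimal sketch of a sample-weighted AUC over positive/negative pairs. It only illustrates pairwise AUC with per-sample weights (for example, inverse propensities); it does not reproduce EWAUC's loss or its treatment of exposure data, and the function name and inputs are hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """Weighted fraction of (positive, negative) pairs ranked correctly.

    Each pair contributes with weight w_pos * w_neg; ties count as half.
    """
    numerator = 0.0
    denominator = 0.0
    samples = list(zip(scores, labels, weights))
    for s_pos, y_pos, w_pos in samples:
        if y_pos != 1:
            continue
        for s_neg, y_neg, w_neg in samples:
            if y_neg != 0:
                continue
            pair_weight = w_pos * w_neg
            denominator += pair_weight
            if s_pos > s_neg:
                numerator += pair_weight
            elif s_pos == s_neg:
                numerator += 0.5 * pair_weight
    return numerator / denominator if denominator else float("nan")
```

    With uniform weights this reduces to the ordinary AUC; up-weighting a badly ranked positive lowers the value, which is how reweighting shifts the objective toward under-represented samples.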

    Full text in ACM Digital Library

  • RESReLand: Integrating Large Language Models’ Insights into Industrial Recommenders via a Controllable Reasoning Pool
    by Changxin Tian (Ant Group), Binbin Hu (Ant Group), Chunjing Gan (Ant Group), Haoyu Chen (Ant Group), Zhuo Zhang (Ant Group), Li Yu (Ant Group), Ziqi Liu (Ant Group), Zhiqiang Zhang (Ant Group), Jun Zhou (Ant Group) and Jiawei Chen (Zhejiang University)

    Recently, Large Language Models (LLMs) have shown significant potential in addressing the isolation issues faced by recommender systems. However, despite performance comparable to traditional recommenders, the current methods are cost-prohibitive for industrial applications. Consequently, existing LLM-based methods still need to catch up regarding effectiveness and efficiency. To tackle the above challenges, we present an LLM-enhanced recommendation framework named ReLand, which leverages Retrieval to effortlessly integrate Large language models’ insights into industrial recommenders. Specifically, ReLand employs LLMs to perform generative recommendations on sampled users (a.k.a., seed users), thereby constructing an LLM Reasoning Pool. Subsequently, we leverage retrieval to attach reliable recommendation rationales for the entire user base, ultimately effectively improving recommendation performance. Extensive offline and online experiments validate the effectiveness of ReLand. Since January 2024, ReLand has been deployed in the recommender system of Alipay, achieving statistically significant improvements of 3.19% in CTR and 1.08% in CVR.

    Full text in ACM Digital Library

  • RESRepeated Padding for Sequential Recommendation
    by Yizhou Dang (Northeastern University), Yuting Liu (Northeastern University), Enneng Yang (Northeastern University), Guibing Guo (Northeastern University), Linying Jiang (Northeastern University), Xingwei Wang (Northeastern University) and Jianzhe Zhao (Northeastern University)

    Sequential recommendation aims to provide users with personalized suggestions based on their historical interactions. When training sequential models, padding is a widely adopted technique for two main reasons: 1) The vast majority of models can only handle fixed-length sequences; 2) Batch-based training needs to ensure that the sequences in each batch have the same length. The special value 0 is usually used as the padding content, which does not contain the actual information and is ignored in the model calculations. This common-sense padding strategy leads us to a problem that has never been explored in the recommendation field: Can we utilize this idle input space by padding other content to improve model performance and training efficiency further?

    In this paper, we propose a simple yet effective padding method called Repeated Padding (RepPad). Specifically, we use the original interaction sequences as the padding content and fill it to the padding positions during model training. This operation can be performed a finite number of times or repeated until the input sequences’ length reaches the maximum limit. Our RepPad can be considered as a sequence-level data augmentation strategy. Unlike most existing works, our method contains no trainable parameters or hyperparameters and is a plug-and-play data augmentation operation. Extensive experiments on various categories of sequential models and five real-world datasets demonstrate the effectiveness and efficiency of our approach. The average recommendation performance improvement is up to 60.3% on GRU4Rec and 24.3% on SASRec. We also provide in-depth analysis and explanation of what makes RepPad effective from multiple perspectives. Our datasets and codes are available at https://github.com/KingGugu/RepPad.
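
    The padding idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the released RepPad code: left padding, whole-copy repetition, the absence of a delimiter between copies, and the `max_repeats` parameter name are all choices made here for clarity.

```python
from typing import List, Optional


def rep_pad(
    seq: List[int],
    max_len: int,
    max_repeats: Optional[int] = None,
    pad_value: int = 0,
) -> List[int]:
    """Fill the idle padding positions with repeated copies of the sequence.

    Whole copies are added while they fit (or up to max_repeats extra copies);
    any remaining positions are filled with pad_value on the left.
    """
    if not seq:
        return [pad_value] * max_len
    if len(seq) >= max_len:
        # Long sequences are truncated to the most recent max_len items.
        return seq[-max_len:]
    copies = max_len // len(seq)  # counts the original sequence as one copy
    if max_repeats is not None:
        copies = min(copies, 1 + max_repeats)
    body = seq * copies
    return [pad_value] * (max_len - len(body)) + body


padded = rep_pad([1, 2, 3], max_len=8)
zero_padded = rep_pad([1, 2, 3], max_len=8, max_repeats=0)
```

    Setting `max_repeats=0` recovers conventional zero padding, so the two strategies can be compared with the same downstream model.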

    Full text in ACM Digital Library

  • RESRight Tool, Right Job: Recommendation for Repeat and Exploration Consumption in Food Delivery
    by Jiayu Li (Tsinghua University), Aixin Sun (Nanyang Technological University), Weizhi Ma (Tsinghua University), Peijie Sun (Tsinghua University) and Min Zhang (Tsinghua University)

    From e-commerce to music and news, recommender systems are tailored to specific scenarios. While researching generic models applicable to various scenarios is crucial, studying recommendations based on the unique characteristics of a specific and vital scenario holds both research and, more importantly, practical value.

    In this paper, we focus on store recommendations in the food delivery scenario, which is an intriguing and significant domain with unique behavior patterns and influential factors. First, we offer an in-depth analysis of real-world food delivery data across platforms and countries, revealing that (i) repeat and exploration orders are both noticeable behaviors and (ii) the influences of historical and collaborative situations on repeat and exploration consumption are distinct. Second, based on the observations, we separately design two simple yet effective recommendation models: RepRec for repeat orders and ExpRec for exploration ones. An ensemble module is further proposed to combine recommendations from the two models into a unified recommendation list. Finally, experiments are conducted on three datasets spanning three countries across two food delivery platforms. Results demonstrate the superiority of our proposed recommenders on repeat, exploration, and combined recommendation tasks over various baselines. Such simple yet effective approaches will be beneficial for real applications. This work shows that dedicated analyses and methods for domain-specific characteristics are essential for recommender system studies.

    Full text in ACM Digital Library

  • RESRPAF: A Reinforcement Prediction-Allocation Framework for Cache Allocation in Large-Scale Recommender Systems
    by Shuo Su (Kuaishou Technology), Xiaoshuang Chen (Kuaishou Technology), Yao Wang (Kuaishou Technology), Yulin Wu (Kuaishou Technology), Ziqiang Zhang (Tsinghua University), Kaiqiao Zhan (Kuaishou Technology), Ben Wang (Kuaishou Technology) and Kun Gai

    Modern recommender systems are built upon computation-intensive infrastructure, and it is challenging to perform real-time computation for each request, especially in peak periods, due to the limited computational resources. Recommending by user-wise result caches is widely used when the system cannot afford a real-time recommendation. However, it is challenging to allocate real-time and cached recommendations to maximize the users’ overall engagement. This paper shows two key challenges to cache allocation, i.e., the value-strategy dependency and the streaming allocation. Then, we propose a reinforcement prediction-allocation framework (RPAF) to address these issues. RPAF is a reinforcement-learning-based two-stage framework containing prediction and allocation stages. The prediction stage estimates the values of the cache choices considering the value-strategy dependency, and the allocation stage determines the cache choices for each individual request while satisfying the global budget constraint. We show that the challenge of training RPAF includes globality and the strictness of budget constraints, and a relaxed local allocator (RLA) is proposed to address this issue. Moreover, a PoolRank algorithm is used in the allocation stage to deal with the streaming allocation problem. Experiments show that RPAF significantly improves users’ engagement under computational budget constraints.

    Full text in ACM Digital Library

  • RESScalable Cross-Entropy Loss for Sequential Recommendations with Large Item Catalogs
    by Gleb Mezentsev (Skolkovo Institute of Science and Technology), Danil Gusak (Skolkovo Institute of Science and Technology; HSE University), Ivan Oseledets (Artificial Intelligence Research Institute; Skolkovo Institute of Science and Technology) and Evgeny Frolov (Artificial Intelligence Research Institute; Skolkovo Institute of Science and Technology; HSE University)

    Scalability plays a crucial role in productionizing modern recommender systems. Even lightweight architectures may suffer from high computational overload due to intermediate calculations, limiting their practicality in real-world applications. Specifically, applying full Cross-Entropy (CE) loss often yields state-of-the-art performance in terms of recommendation quality. Still, it suffers from excessive GPU memory utilization when dealing with large item catalogs. This paper introduces a novel Scalable Cross-Entropy (SCE) loss function in the sequential learning setup. It approximates the CE loss for datasets with large-size catalogs, enhancing both time efficiency and memory usage without compromising recommendation quality. Unlike traditional negative sampling methods, our approach utilizes a selective GPU-efficient computation strategy, focusing on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through the maximum inner product search. Experimental results on multiple datasets demonstrate the effectiveness of SCE in reducing peak memory usage by a factor of up to 100 compared to the alternatives, retaining or even exceeding their metric values. The proposed approach also opens new perspectives for large-scale developments in different domains, such as large language models.
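
    The general idea of computing cross-entropy over a subset of high-scoring items can be illustrated as below. This is a toy sketch and not the SCE algorithm: it finds the top-scoring negatives by brute force over the whole catalog (so it saves no memory), whereas the abstract describes selecting them through maximum inner product search. The function name and inputs are hypothetical.

```python
import math
from typing import List, Sequence


def subset_cross_entropy(
    query: Sequence[float],
    item_embeddings: List[Sequence[float]],
    target: int,
    k: int,
) -> float:
    """Cross-entropy over the target plus the k highest-scoring other items."""
    logits = [sum(q * e for q, e in zip(query, emb)) for emb in item_embeddings]
    # Hardest negatives: non-target items with the largest inner products.
    negatives = sorted(
        (i for i in range(len(logits)) if i != target),
        key=lambda i: logits[i],
        reverse=True,
    )[:k]
    subset = [logits[target]] + [logits[i] for i in negatives]
    # Numerically stable log-sum-exp over the subset only.
    top = max(subset)
    log_z = top + math.log(sum(math.exp(x - top) for x in subset))
    return log_z - logits[target]


items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.5]
full_loss = subset_cross_entropy(query, items, target=0, k=3)
approx_loss = subset_cross_entropy(query, items, target=0, k=1)
```

    Because the dropped items have the smallest logits, the subset loss is a lower bound on the full loss and approaches it as `k` grows.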

    Full text in ACM Digital Library

  • RESScaling Law of Large Sequential Recommendation Models
    by Gaowei Zhang (Renmin University of China), Yupeng Hou (University of California San Diego), Hongyu Lu (Tencent), Yu Chen (Tencent), Wayne Xin Zhao (Renmin University of China) and Ji-Rong Wen (Renmin University of China)

    Scaling of neural networks has recently shown great potential to improve the model capacity in various fields. Specifically, model performance has a power-law relationship with model size or data size, which provides important guidance for the development of large-scale models. However, there is still limited understanding on the scaling effect of user behavior models in recommender systems, where the unique data characteristics (e.g., data scarcity and sparsity) pose new challenges in recommendation tasks.

    In this work, we focus on investigating the scaling laws in large sequential recommendation models. Specifically, we consider a pure ID-based task formulation, where the interaction history of a user is formatted as a chronological sequence of item IDs. We don’t incorporate any side information (e.g., item text), to delve into the scaling law’s applicability from the perspective of user behavior. We successfully scale up the model size to 0.8B parameters, making it feasible to explore the scaling effect in a diverse range of model sizes. As the major findings, we empirically show that the scaling law still holds for these trained models, even in data-constrained scenarios. We then fit the curve for scaling law, and successfully predict the test loss of the two largest tested model scales.

    Furthermore, we examine the performance advantage of scaling effect on five challenging recommendation tasks, considering the unique issues (e.g., cold start, robustness, long-term preference) in recommender systems. We find that scaling up the model size can greatly boost the performance on these challenging tasks, which again verifies the benefits of large recommendation models.
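
    Fitting a power-law curve and extrapolating it to a larger model can be sketched as follows. The data points are synthetic and the functional form L = a · N^(-b) is an assumption made for illustration; the paper's fitted form and values may differ (for example, by including an irreducible-loss term).

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Fit L = a * N**(-b) by ordinary least squares in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a: float, b: float, size: float) -> float:
    """Extrapolate the fitted curve to a new model size."""
    return a * size ** (-b)


# Synthetic losses generated from a known power law (not results from the paper).
sizes = [1e5, 1e6, 1e7, 1e8]
losses = [5.0 * n ** (-0.1) for n in sizes]
a, b = fit_power_law(sizes, losses)
predicted = predict_loss(a, b, 8e8)
```

    Fitting on the smaller sizes and checking the prediction against held-out larger sizes mirrors the validation described in the abstract, here with made-up numbers.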

    Full text in ACM Digital Library

  • RESScene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction
    by Wenhao Li (Huazhong University of Science and Technology; Meituan), Jie Zhou (Beihang University), Chuan Luo (Beihang University), Chao Tang (Meituan), Kun Zhang (Meituan) and Shixiong Zhao (The University of Hong Kong)

    In the realm of modern mobile E-commerce, providing users with nearby commercial service recommendations through location-based online services has become increasingly vital. While machine learning approaches have shown promise in multi-scene recommendation, existing methodologies often struggle to address cold-start problems in unprecedented scenes: the increasing diversity of commercial choices, along with the short online lifespan of scenes, gives rise to the complexity of effective recommendations in online and dynamic scenes. In this work, we propose Scene-wise Adaptive Network (SwAN), a novel approach that emphasizes high-performance cold-start online recommendations for new scenes. Our approach introduces several crucial capabilities, including scene similarity learning, user-specific scene transition cognition, scene-specific information construction for the new scene, and enhancing the diverged logical information between scenes. We demonstrate SwAN’s potential to optimize dynamic multi-scene recommendation problems by effectively handling cold-start recommendations online for any newly arrived scenes. More encouragingly, SwAN has been successfully deployed in Meituan’s online catering recommendation service, which serves millions of customers per day, and SwAN has achieved a 5.64% CTR index improvement relative to the baselines and a 5.19% increase in daily order volume proportion.

    Full text in ACM Digital Library

  • RESSeCor: Aligning Semantic and Collaborative Representations by Large Language Models for Next-Point-of-Interest Recommendations
    by Shirui Wang (Tongji University), Bohan Xie (Tongji University), Ling Ding (Tongji University), Xiaoying Gao (Tongji University), Jianting Chen (Tongji University) and Yang Xiang (Tongji University)

    The widespread adoption of location-based applications has created a growing demand for point-of-interest (POI) recommendation, which aims to predict a user’s next POI based on their historical check-in data and current location. However, existing methods often struggle to capture the intricate relationships within check-in data. This is largely due to their limitations in representing temporal and spatial information and underutilizing rich semantic features. While large language models (LLMs) offer powerful semantic comprehension to solve them, they are limited by hallucination and the inability to incorporate global collaborative information. To address these issues, we propose a novel method SeCor, which treats POI recommendation as a multi-modal task and integrates semantic and collaborative representations to form an efficient hybrid encoding. SeCor first employs a basic collaborative filtering model to mine interaction features. These embeddings, as one modal information, are fed into LLM to align with semantic representation, leading to efficient hybrid embeddings. To mitigate the hallucination, SeCor recommends based on the hybrid embeddings rather than directly using the LLM’s output text. Extensive experiments on three public real-world datasets show that SeCor outperforms all baselines, achieving improved recommendation performance by effectively integrating collaborative and semantic information through LLMs.

    Full text in ACM Digital Library

  • RESThe Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation
    by Zekai Qu (China University of Geosciences Beijing), Ruobing Xie (Tencent Inc.), Chaojun Xiao (Tsinghua University), Zhanhui Kang (Tencent Inc.) and Xingwu Sun (Tencent Inc.)

    Sequential recommendation (SR) has seen significant advancements with the help of Pre-trained Language Models (PLMs). Some PLM-based SR models directly use PLM to encode user historical behavior’s text sequences to learn user representations, while there is seldom an in-depth exploration of the capability and suitability of PLM in behavior sequence modeling. In this work, we first conduct extensive model analyses between PLMs and PLM-based SR models, discovering great underutilization and parameter redundancy of PLMs in behavior sequence modeling. Inspired by this, we explore different lightweight usages of PLMs in SR, aiming to maximally stimulate the ability of PLMs for SR while satisfying the efficiency and usability demands of practical systems. We discover that adopting behavior-tuned PLMs for item initializations of conventional ID-based SR models is the most economical framework of PLM-based SR, which would not bring in any additional inference cost but could achieve a dramatic performance boost compared with the original version. Extensive experiments on five datasets show that our simple and universal framework leads to significant improvement compared to classical SR and SOTA PLM-based SR models without additional inference costs. Our code can be found in https://github.com/777pomingzi/Rethinking-PLM-in-RS.

    Full text in ACM Digital Library

  • RESThe Fault in Our Recommendations: On the Perils of Optimizing the Measurable
    by Omar Besbes (Columbia University), Yash Kanoria (Columbia University) and Akshit Kumar (Columbia University)

    Recommendation systems are widespread, and through customized recommendations, promise to match users with options they will like. To that end, data on engagement is collected and used. Most recommendation systems are ranking-based, where they rank and recommend items based on their predicted engagement. However, the engagement signals are often only a crude proxy for user utility, as data on the latter is rarely collected or available. This paper explores the following question: By optimizing for measurable proxies, are recommendation systems at risk of significantly under-delivering on user utility? If that is indeed the case, how can one improve utility which is seldom measured? To study these questions, we introduce a model of repeated user consumption in which, at each interaction, users select between an outside option and the best option from a recommendation set. Our model accounts for user heterogeneity, with the majority preferring “popular” content, and a minority favoring “niche” content. The system initially lacks knowledge of individual user preferences but can learn these preferences through observations of users’ choices over time. Our theoretical and numerical analyses demonstrate that optimizing for engagement signals can lead to significant utility losses. Instead, we propose a utility-aware policy that initially recommends a mix of popular and niche content. We show that such a policy substantially improves utility despite not measuring it. As the platform becomes more forward-looking, our utility-aware policy achieves the best of both worlds: near-optimal user utility and near-optimal engagement simultaneously. Our study elucidates an important feature of recommendation systems; given the ability to suggest multiple items, one can perform significant exploration without incurring significant reductions in short term engagement. By recommending high-risk, high-reward items alongside popular items, systems can enhance discovery of high utility items without significantly affecting engagement.

    Full text in ACM Digital Library

  • RESThe Role of Unknown Interactions in Implicit Matrix Factorization — A Probabilistic View
    by Joey De Pauw (University of Antwerp) and Bart Goethals (University of Antwerp)

    Matrix factorization is a well-known and effective methodology for top-k list recommendation. It became widely known during the Netflix challenge in 2006, and since then, many adapted and improved versions have been published. A particularly interesting matrix factorization algorithm called iALS (for implicit Alternating Least Squares) adapts the method for implicit feedback, i.e. a setting where only a very small amount of positive labels are available along with a majority of unknown labels. Compared to the classical task of rating prediction, learning from implicit feedback is applicable to many more domains, as the data is more abundant and requires less effort to elicit from users. However, the sparsity, imbalance, and implicit nature of the signal also pose unique challenges to retrieving the most relevant items to recommend.

    We revisit the role of unknown interactions in implicit matrix factorization. Traditionally, all unknowns are interpreted as negative samples and their importance in the training objective is then down-weighted to balance them out with the known, positive interactions. Interestingly, by adopting a probabilistic view of matrix factorization, we can retain the unknown nature of these interactions by modelling them as either positive or negative. With this new formulation that better fits the underlying data, we gain improved performance on the downstream recommendation task without any computational overhead compared to the popular iALS method.

    This paper outlines the key insights needed to adapt iALS to use logistic regression. Furthermore, a logistic version of the popular full-rank EASE model is introduced in a similar fashion. An extensive experimental evaluation on several real-world datasets demonstrates the effectiveness of our approach. Additionally, a discrepancy between the need for weighting between factorization and autoencoder models is discovered, leading towards a better understanding of these methods.

    Full text in ACM Digital Library

  • RESTouch the Core: Exploring Task Dependence Among Hybrid Targets for Recommendation
    by Xing Tang (Tencent), Yang Qiao (Tencent), Fuyuan Lyu (McGill University), Dugang Liu (Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)) and Xiuqiang He (Tencent)

    As user behaviors become complicated on business platforms, online recommendations focus more on how to touch the core conversions, which are highly related to the interests of platforms. These core conversions are usually continuous targets, such as watch time, revenue, and so on, whose predictions can be enhanced by previous discrete conversion actions. Therefore, multi-task learning (MTL) can be adopted as the paradigm to learn these hybrid targets. However, existing works mainly emphasize investigating the sequential dependence among discrete conversion actions, which neglects the complexity of dependence between discrete conversions and the final continuous conversion. Moreover, simultaneously optimizing hybrid tasks with stronger task dependence will suffer from volatile issues where the core regression task might have a larger influence on other tasks. In this paper, we study the MTL problem with hybrid targets for the first time and propose the model named Hybrid Targets Learning Network (HTLNet) to explore task dependence and enhance optimization. Specifically, we introduce label embedding for each task to explicitly transfer the label information among these tasks, which can effectively explore logical task dependence. We also further design the gradient adjustment regime between the final regression task and other classification tasks to enhance the optimization. Extensive experiments on two offline public datasets and one real-world industrial dataset are conducted to validate the effectiveness of HTLNet. Moreover, online A/B tests on the financial recommender system also show that our model has improved significantly. Our implementation is available here.

    Full text in ACM Digital Library

  • RES Towards Empathetic Conversational Recommender Systems
    by Xiaoyu Zhang (Shandong University), Ruobing Xie (Tencent), Yougang Lyu (Shandong University; University of Amsterdam), Xin Xin (Shandong University), Pengjie Ren (Shandong University), Mingfei Liang (Tencent), Bo Zhang (Tencent), Zhanhui Kang (Tencent), Maarten de Rijke (University of Amsterdam) and Zhaochun Ren (Leiden University)

    Conversational recommender systems (CRSs) are able to elicit user preferences through multi-turn dialogues. They typically incorporate external knowledge and pre-trained language models to capture the dialogue context. Most CRS approaches, trained on benchmark datasets, assume that the standard items and responses in these benchmarks are optimal. However, they overlook that users may express negative emotions toward the standard items and may not feel emotionally engaged by the standard responses. This issue leads to a tendency to replicate the logic of recommenders in the dataset instead of aligning with user needs. To remedy this misalignment, we introduce empathy within a CRS. By empathy, we refer to a system’s ability to capture and express emotions. We propose an empathetic conversational recommender (ECR) framework.

    ECR contains two main modules: emotion-aware item recommendation and emotion-aligned response generation. Specifically, we employ user emotions to refine user preference modeling for accurate recommendations. To generate human-like emotional responses, ECR applies retrieval-augmented prompts to fine-tune a pre-trained language model aligning with emotions and mitigating hallucination. To address the challenge of insufficient supervision labels, we enlarge our empathetic data using emotion labels annotated by large language models and emotional reviews collected from external resources. We propose novel evaluation metrics to capture user satisfaction in real-world CRS scenarios. Our experiments on the ReDial dataset validate the efficacy of our framework in enhancing recommendation accuracy and improving user satisfaction.

    Full text in ACM Digital Library

  • RES Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models
    by Yunjia Xi (Shanghai Jiao Tong University), Weiwen Liu (Huawei Noah’s Ark Lab), Jianghao Lin (Shanghai Jiao Tong University), Xiaoling Cai (Huawei), Hong Zhu (Huawei), Jieming Zhu (Huawei Noah’s Ark Lab), Bo Chen (Huawei Noah’s Ark Lab), Ruiming Tang (Huawei Noah’s Ark Lab), Weinan Zhang (Shanghai Jiao Tong University) and Yong Yu (Shanghai Jiao Tong University)

    Recommender systems play a vital role in various online services. However, their insulated nature of training and deploying separately within a specific closed domain limits their access to open-world knowledge. Recently, the emergence of large language models (LLMs) has shown promise in bridging this gap by encoding extensive world knowledge and demonstrating reasoning capabilities. Nevertheless, previous attempts to directly use LLMs as recommenders cannot meet the inference latency demand of industrial recommender systems. In this work, we propose an Open-World Knowledge Augmented Recommendation Framework with Large Language Models, dubbed KAR, to acquire two types of external knowledge from LLMs: the reasoning knowledge on user preferences and the factual knowledge on items. We introduce factorization prompting to elicit accurate reasoning on user preferences. The generated reasoning and factual knowledge are effectively transformed and condensed into augmented vectors by a hybrid-expert adaptor in order to be compatible with the recommendation task. The obtained vectors can then be directly used to enhance the performance of any recommendation model. We also ensure efficient inference by preprocessing and prestoring the knowledge from the LLM. Extensive experiments show that KAR significantly outperforms the state-of-the-art baselines and is compatible with a wide range of recommendation algorithms. We deploy KAR to Huawei’s news and music recommendation platforms and gain a 7% and 1.7% improvement in the online A/B test, respectively.

    Full text in ACM Digital Library

  • RES Transformers Meet ACT-R: Repeat-Aware and Sequential Listening Session Recommendation
    by Viet-Anh Tran (Deezer Research), Guillaume Salha-Galvan (Deezer Research), Bruno Sguerra (Deezer Research) and Romain Hennequin (Deezer Research)

    Music streaming services often leverage sequential recommender systems to predict the best music to showcase to users based on past sequences of listening sessions. Nonetheless, most sequential recommendation methods ignore or insufficiently account for repetitive behaviors. This is a crucial limitation for music recommendation, as repeatedly listening to the same song over time is a common phenomenon that can even change the way users perceive this song. In this paper, we introduce PISA (Psychology-Informed Session embedding using ACT-R), a session-level sequential recommender system that overcomes this limitation. PISA employs a Transformer architecture learning embedding representations of listening sessions and users using attention mechanisms inspired by Anderson’s ACT-R (Adaptive Control of Thought-Rational), a cognitive architecture modeling human information access and memory dynamics. This approach enables us to capture dynamic and repetitive patterns from user behaviors, allowing us to effectively predict the songs they will listen to in subsequent sessions, whether they are repeated or new ones. We demonstrate the empirical relevance of PISA using both publicly available listening data from Last.fm and proprietary data from Deezer, a global music streaming service, confirming the critical importance of repetition modeling for sequential listening session recommendation. Along with this paper, we publicly release our proprietary dataset to foster future research in this field, as well as the source code of PISA to facilitate its future use.

    Full text in ACM Digital Library

  • RES Unified Denoising Training for Recommendation
    by Haoyan Chua (Nanyang Technological University), Yingpeng Du (Nanyang Technological University), Zhu Sun (Singapore University of Technology and Design), Ziyan Wang (Nanyang Technological University), Jie Zhang (Nanyang Technological University) and Yew-Soon Ong (Nanyang Technological University)

    Most existing denoising recommendation methods alleviate noisy implicit feedback (user behaviors) through mainly empirical studies. However, such studies may lack theoretical explainability and fail to model comprehensive noise patterns, which hinders the understanding and capturing of different noise patterns that affect users’ behaviors. Thus, we propose to capture comprehensive noise patterns through theoretical and empirical analysis for more effective denoising, where users’ behaviors are divided into willingness and action phases to disentangle independent noise patterns. Willingness refers to the user’s intent to interact with an item, which may not lead to actual interaction due to different factors such as misclicking. Action denotes the user’s actual interaction with an item. Our analysis unveils that (1) in the willingness phase, high uncertainty in the user’s willingness to interact with the item can lead to high expectation loss which aligns with the findings of existing denoising methods; and (2) in the action phase, higher user-specific inconsistency between willingness and action not only leads to more noise in the user’s overall behaviors but also makes it harder to distinguish between true and noisy behaviors. Inspired by these findings, we propose a Unified Denoising Training (UDT) method for recommendation. To alleviate uncertainty in the willingness phase, we lower the importance of the user-item interaction with high willingness uncertainty recognized by high loss. To ease the inconsistency in the action phase, we lower the importance for users with high user-specific inconsistency as it may lead to noisier behaviors. Then, we increase the importance gap between the clean and noisy behaviors for users with low user-specific inconsistency as their behaviors are more distinguishable. Extensive experiments on three real-world datasets show that our proposed UDT outperforms state-of-the-art denoising recommendation methods.

    Full text in ACM Digital Library

  • RES Unleashing the Retrieval Potential of Large Language Models in Conversational Recommender Systems
    by Ting Yang (Hong Kong Baptist University) and Li Chen (Hong Kong Baptist University)

    Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through natural language interaction. The recent advent of large language models (LLMs) has revolutionized human engagement in natural conversation, driven by their extensive world knowledge and remarkable natural language understanding and generation capabilities. However, introducing LLMs into CRSs presents new technical challenges. Directly prompting LLMs for recommendation generation requires understanding a large and evolving item corpus, as well as grounding the generated recommendations in the real item space. On the other hand, generating recommendations based on external recommendation engines or directly integrating their suggestions into responses may constrain the overall performance of LLMs, since these engines generally have inferior representation abilities compared to LLMs. To address these challenges, we propose an end-to-end large-scale CRS model named ReFICR, a novel LLM-enhanced conversational recommender that empowers a retrievable large language model to perform conversational recommendation by following retrieval and generation instructions through lightweight tuning. By decomposing the complex CRS task into multiple subtasks, we formulate these subtasks into two types of instruction formats: retrieval and generation. The hidden states of ReFICR are utilized for generating text embeddings for retrieval, and simultaneously ReFICR is fine-tuned to handle generation subtasks. We optimize the contrastive objective to enhance text embeddings for retrieval and jointly fine-tune the large language model objective for generation. Our experimental results on public datasets demonstrate that ReFICR significantly outperforms baselines in terms of recommendation accuracy and response quality. Our code is publicly available at https://github.com/yt556677/ReFICR.

    Full text in ACM Digital Library

  • RES Unlocking the Hidden Treasures: Enhancing Recommendations with Unlabeled Data
    by Yuhan Zhao (Harbin Engineering University), Rui Chen (Harbin Engineering University), Qilong Han (Harbin Engineering University), Hongtao Song (Harbin Engineering University) and Li Chen (Hong Kong Baptist University)

    Collaborative filtering (CF) stands as a cornerstone in recommender systems, yet effectively leveraging the massive unlabeled data presents a significant challenge. Current research focuses on addressing the challenge of unlabeled data by extracting a subset that closely approximates negative samples. Regrettably, the remaining data are overlooked, failing to fully integrate this valuable information into the construction of user preferences. To address this gap, we introduce a novel positive-neutral-negative (PNN) learning paradigm. PNN introduces a neutral class, encompassing intricate items that are challenging to categorize directly as positive or negative samples. By training a model based on this triple-wise partial ranking, PNN offers a promising solution to learning complex user preferences. Through theoretical analysis, we connect PNN to one-way partial AUC (OPAUC) to validate its efficacy. Implementing the PNN paradigm is, however, technically challenging because: (1) it is difficult to classify unlabeled data into neutral or negative in the absence of supervised signals; (2) there does not exist any loss function that can handle set-level triple-wise ranking relationships. To address these challenges, we propose a semi-supervised learning method coupled with a user-aware attention model for knowledge acquisition and classification refinement. Additionally, a novel loss function with a two-step centroid ranking approach enables handling set-level rankings. Extensive experiments on four real-world datasets demonstrate that, when combined with PNN, a wide range of representative CF models can consistently and significantly boost their performance. Even with a simple matrix factorization, PNN can achieve comparable performance to sophisticated graph neural networks. Our code is publicly available at https://github.com/Asa9aoTK/PNN-RecBole.

    Full text in ACM Digital Library

  • RES Utilizing Non-click Samples via Semi-supervised Learning for Conversion Rate Prediction
    by Jiahui Huang (University of Science and Technology of China), Lan Zhang (University of Science and Technology of China), Junhao Wang (University of Science and Technology of China), Shanyang Jiang (University of Science and Technology of China), Dongbo Huang (Tencent), Cheng Ding (Tencent) and Lan Xu (Tencent)

    Conversion rate (CVR) prediction is essential in recommender systems, facilitating precise matching between recommended items and users’ preferences. However, the sample selection bias (SSB) and data sparsity (DS) issues pose challenges to accurate prediction. Existing works have proposed the click-through and conversion rate (CTCVR) prediction task, which models samples from exposure to “click and conversion” in the entire space and incorporates multi-task learning. This approach has shown efficacy in mitigating these challenges. Nevertheless, it intensifies the false negative sample (FNS) problem. To be more specific, the CTCVR task implicitly treats all the CVR labels of non-click samples as negative, overlooking the possibility that some samples might convert if clicked. This oversight can negatively impact CVR model performance, as empirical analysis has confirmed. To this end, we advocate discarding the CTCVR task and propose a Non-click samples Improved Semi-supErvised (NISE) method for conversion rate prediction, where the non-click samples are treated as unlabeled. Our approach aims to predict their probabilities of conversion if clicked, utilizing these predictions as pseudo-labels for further model training. This strategy can help alleviate the FNS problem, and direct modeling of the CVR task across the entire space also mitigates the SSB and DS challenges. Additionally, we conduct multi-task learning by introducing an auxiliary click-through rate prediction task, thereby enhancing embedding layer representations. Our approach is applicable to various multi-task architectures. Comprehensive experiments are conducted on both public and production datasets, demonstrating the superiority of our proposed method in mitigating the FNS challenge and improving the CVR estimation. The implementation code is available at https://github.com/Hjh233/NISE.

    Full text in ACM Digital Library
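
The labeling difference described in the last abstract above (NISE) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Impression` record, the function names, and the `predict_cvr` callable are hypothetical and exist only for illustration. The sketch contrasts a CTCVR-style label, which is 0 for every non-click sample, with a semi-supervised target that keeps observed labels for clicked samples and uses a predicted conversion probability as a pseudo-label for non-click samples.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Impression:
    """One exposed item (hypothetical record type for this sketch)."""
    clicked: bool
    converted: bool  # only observable when the item was clicked


def ctcvr_label(s: Impression) -> int:
    """CTCVR-style label: 1 only when the sample was clicked AND converted.

    Every non-click sample gets 0, even if it might have converted
    had it been clicked (the false negative sample problem).
    """
    return int(s.clicked and s.converted)


def cvr_target(
    s: Impression,
    predict_cvr: Callable[[Impression], float],
) -> Tuple[float, bool]:
    """Return (training target, is_pseudo_label) for the CVR task.

    Clicked samples keep their observed conversion label.
    Non-click samples are treated as unlabeled: the target is the
    probability supplied by `predict_cvr` (an assumed stand-in for
    a model's "conversion if clicked" estimate).
    """
    if s.clicked:
        return float(s.converted), False
    return predict_cvr(s), True
```

For a non-click impression, `ctcvr_label` always returns 0, whereas `cvr_target` returns whatever soft value the supplied predictor gives, flagged as a pseudo-label so a training loop could weight it differently from observed labels.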
