Accepted Contributions
List of all long papers accepted for RecSys 2023 (in alphabetical order).
- A Lightweight Method for Modeling Confidence in Recommendations with Learned Beta Distributions
by Norman Knyazev (Radboud University) and Harrie Oosterhuis (Radboud University). Most recommender systems (RecSys) do not provide an indication of confidence in their decisions. Therefore, they do not distinguish between recommendations of which they are certain and those of which they are not. Existing confidence methods for RecSys are either inaccurate heuristics, conceptually complex, or very computationally expensive. Consequently, real-world RecSys applications rarely adopt these methods and thus provide no confidence insights into their behavior. In this work, we propose learned beta distributions (LBD) as a simple and practical recommendation method with an explicit measure of confidence. Our main insight is that beta distributions predict user preferences as probability distributions that naturally model confidence on a closed interval, yet can be implemented with minimal model complexity. Our results show that LBD maintains accuracy competitive with existing methods while also having a significantly stronger correlation between its accuracy and confidence. Furthermore, LBD has higher performance when applied to a high-precision targeted recommendation task. Our work thus shows that confidence in RecSys is possible without sacrificing simplicity or accuracy, and without introducing heavy computational complexity. Thereby, we hope it enables better insight into real-world RecSys and opens the door for novel future applications.
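As a rough illustration of why a beta distribution can carry confidence, the sketch below compares two Beta(α, β) distributions with the same mean but different spread. It is not the authors' LBD model: the function names are hypothetical, and treating a lower variance as "higher confidence" is an assumption made here for illustration, since the abstract does not specify how confidence is derived.

```python
def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution on [0, 1]."""
    return alpha / (alpha + beta)


def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Two hypothetical predictions with the same expected preference (0.5).
# Assumption: a smaller variance is read as a more confident prediction.
vague = (2.0, 2.0)
sharp = (20.0, 20.0)

for name, (a, b) in (("vague", vague), ("sharp", sharp)):
    print(name, beta_mean(a, b), beta_variance(a, b))
```

Both distributions predict the same preference, but the second concentrates far more mass around it, which is the kind of distinction a point estimate alone cannot express.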
- A Multi-view Graph Contrastive Learning Framework for Cross-Domain Sequential Recommendation
by Zitao Xu (Shenzhen University), Weike Pan (Shenzhen University) and Zhong Ming (Shenzhen University). Sequential recommendation methods play an irreplaceable role in recommender systems, as they can capture users’ dynamic preferences from behavior sequences. Despite their success, these works usually suffer from the sparsity problem that commonly exists in real applications. Cross-domain sequential recommendation aims to alleviate this problem by introducing relatively richer source-domain data. However, most existing methods capture the users’ preferences independently in each domain, which may neglect the item transition patterns across sequences from different domains, i.e., a user’s interaction in one domain may influence his/her next interaction in other domains. Moreover, the data sparsity problem still exists since some items in the target and source domains are interacted with only a limited number of times. To address these issues, in this paper we propose a generic framework named multi-view graph contrastive learning (MGCL). Specifically, we adopt the contrastive mechanism in an intra-domain item representation view and an inter-domain user preference view. The former is to jointly learn the dynamic sequential information in the user sequence graph and the static collaborative information in the cross-domain global graph, while the latter is to capture the complementary information of the user’s preferences from different domains. Extensive empirical studies on three real-world datasets demonstrate that our MGCL significantly outperforms the state-of-the-art methods.
- Adversarial Collaborative Filtering for Free
by Huiyuan Chen (Visa Research), Xiaoting Li (Visa Research), Vivian Lai (Visa Research), Chin-Chia Michael Yeh (Visa Research), Yujie Fan (Visa Research), Yan Zheng (Visa Research), Mahashweta Das (Visa Research) and Hao Yang (Visa Research). Collaborative Filtering (CF) has been successfully applied to help users discover items of interest. Nevertheless, existing CF methods suffer from the noisy-data issue, which negatively impacts the quality of personalized recommendation. To tackle this problem, many prior studies leverage the adversarial learning principle to regularize the representations of users and items, which has shown great ability in improving both generalizability and robustness. Generally, those methods learn adversarial perturbations and model parameters using a min-max optimization framework. However, there are still two major limitations: 1) Existing methods lack theoretical guarantees of why adding perturbations improves the model generalizability and robustness, since noisy data is naturally different from adversarial attacks; 2) Solving min-max optimization is time-consuming. In addition to updating the model parameters, each iteration requires additional computations to update the perturbations, making these methods not scalable for industry-scale datasets.
In this paper, we present Sharpness-aware Matrix Factorization (SharpMF), a simple yet effective method that conducts adversarial training without extra computational cost over the base optimizer. To achieve this goal, we first revisit the existing adversarial collaborative filtering and discuss its connection with recent Sharpness-aware Minimization. This analysis shows that adversarial training actually seeks model parameters that lie in neighborhoods having uniformly low loss values, resulting in better generalizability. To reduce the computational overhead, SharpMF introduces a novel trajectory loss to measure sharpness between current weights and past weights. Experimental results on real-world datasets demonstrate that our SharpMF achieves superior performance with almost zero additional computational cost compared to adversarial training.
- Alleviating the Long-Tail Problem in Conversational Recommender Systems
by Zhipeng Zhao (Singapore Management University), Kun Zhou (School of Information, Renmin University of China), Xiaolei Wang (Gaoling School of Artificial Intelligence, Renmin University of China), Wayne Xin Zhao (Gaoling School of Artificial Intelligence, Renmin University of China), Fan Pan (Poisson Lab, Huawei), Zhao Cao (Poisson Lab, Huawei) and Ji-Rong Wen (Gaoling School of Artificial Intelligence, Renmin University of China). Conversational recommender systems (CRS) aim to provide the recommendation service via natural language conversations. To develop an effective CRS, high-quality CRS datasets are crucial. However, existing CRS datasets suffer from the long-tail issue, i.e., a large proportion of items are rarely (or even never) mentioned in the conversations, which are called long-tail items. As a result, the CRSs trained on these datasets tend to recommend frequent items, and the diversity of the recommended items would be largely reduced, making users more likely to get bored.
To address this issue, this paper presents LOT-CRS, a novel framework that focuses on simulating and utilizing a balanced CRS dataset (i.e., covering all the items evenly) for improving LOng-Tail recommendation performance of CRSs. In our approach, we design two pre-training tasks to enhance the understanding of simulated conversation for long-tail items, and adopt retrieval-augmented fine-tuning with a label smoothness strategy to further improve the recommendation of long-tail items. Extensive experiments on two public CRS datasets have demonstrated the effectiveness and extensibility of our approach, especially on long-tail recommendation. All the experimental codes will be released after the review period.
- Augmented Negative Sampling for Collaborative Filtering
by Yuhan Zhao (Harbin Engineering University), Rui Chen (Harbin Engineering University), Riwei Lai (Harbin Engineering University), Qilong Han (Harbin Engineering University), Hongtao Song (Harbin Engineering University) and Li Chen (Hong Kong Baptist University). Negative sampling is essential for implicit-feedback-based collaborative filtering, which is used to constitute negative signals from massive unlabeled data to guide supervised learning. The state-of-the-art idea is to utilize hard negative samples that carry more useful information to form a better decision boundary. To balance efficiency and effectiveness, the vast majority of existing methods follow the two-pass approach, in which the first pass samples a fixed number of unobserved items by a simple static distribution and then the second pass selects the final negative items using a more sophisticated negative sampling strategy. However, selecting negative samples from the original items in a dataset is inherently limited due to the limited available choices, and thus may not be able to contrast positive samples well. In this paper, we confirm this observation via carefully designed experiments and introduce two major limitations of existing solutions: ambiguous trap and information discrimination.
Our response to such limitations is to introduce “augmented” negative samples that may not exist in the original dataset. This direction renders a substantial technical challenge because constructing unconstrained negative samples may introduce excessive noise that eventually distorts the decision boundary. To this end, we introduce a novel generic augmented negative sampling (ANS) paradigm and provide a concrete instantiation. First, we disentangle the hard and easy factors of negative items. Next, we generate new candidate negative samples by augmenting only the easy factors in a regulated manner: the direction and magnitude of the augmentation are carefully calibrated. Finally, we design an advanced negative sampling strategy to identify the final augmented negative samples, which considers not only the score used in existing methods but also a new metric called augmentation gain. Extensive experiments on five real-world datasets demonstrate that our method significantly outperforms state-of-the-art baselines. Our code is publicly available at https://anonymous.4open.science/r/ANS-Recbole-B070/.
- AutoOpt: Automatic Hyperparameter Scheduling and Optimization for Deep Click-through Rate Prediction
by Yujun Li (Noah’s Ark Lab), Xing Tang (Noah’s Ark Lab), Bo Chen (Noah’s Ark Lab), Yimin Huang (Noah’s Ark Lab), Ruiming Tang (Noah’s Ark Lab) and Zhenguo Li (Noah’s Ark Lab). Click-through Rate (CTR) prediction is essential for commercial recommender systems. Recently, to improve the prediction accuracy, plenty of deep learning-based CTR models have been proposed, which are sensitive to hyperparameters and difficult to optimize well. General hyperparameter optimization methods fix these hyperparameters across the entire model training and repeat them multiple times. This trial-and-error process not only leads to suboptimal performance but also requires non-trivial computation efforts. In this paper, we propose an automatic hyperparameter scheduling and optimization method for deep CTR models, AutoOpt, making the optimization process more stable and efficient. Specifically, the whole training regime is first divided into several consecutive stages, where a data-efficient model is learned to model the relation between model states and prediction performance. To optimize the stage-wise hyperparameters, AutoOpt uses the global and local scheduling modules to propose proper hyperparameters for the next stage based on the training in the current stage. Extensive experiments on three public benchmarks are conducted to validate the effectiveness of AutoOpt. Moreover, AutoOpt has been deployed onto an advertising platform and a music platform, where online A/B tests also demonstrate superior improvement.
- BVAE: Behavior-aware Variational Autoencoder for Multi-Behavior Multi-Task Recommendation
by Qianzhen Rao (Shenzhen University), Yang Liu (Shenzhen University), Weike Pan (Shenzhen University) and Zhong Ming (Shenzhen University). A practical recommender system should be able to handle heterogeneous behavioral feedback as inputs and produce multi-task outputs. Although the heterogeneous one-class collaborative filtering (HOCCF) and multi-task learning (MTL) methods have been well studied, there is still a lack of targeted methods for their combination, i.e., Multi-behavior Multi-task Recommendation (MMR). To fill the gap, we propose a novel recommendation framework called Behavior-aware Variational AutoEncoder (BVAE), which meliorates the parameter sharing and loss minimization method with the VAE structure to address the MMR problem. Specifically, our BVAE includes behavior-aware semi-encoders and decoders, and a target feature fusion network with a global feature filtering network, while using standard deviation to weigh loss. These modules generate the behavior-aware recommended item list via constructing better semantic feature vectors for users, i.e., from dual perspectives of behavioral preference and global interaction. In addition, we optimize our BVAE in terms of adaptability and robustness, i.e., it is concise and flexible enough to consume any number of behaviors with different distributions. Extensive empirical studies on two real and widely used datasets confirm the validity of our design and show that our BVAE can outperform the state-of-the-art related baseline methods under multiple evaluation metrics.
- Contrastive Learning with Frequency-Domain Interest Trends for Sequential Recommendation
by Yichi Zhang (Harbin Engineering University), Guisheng Yin (Harbin Engineering University) and Yuxin Dong (Harbin Engineering University). Recently, contrastive learning for sequential recommendation has demonstrated its powerful ability to learn high-quality user representations. However, constructing augmented samples in the time domain poses challenges due to various reasons, such as fast-evolving trends, interest shifts, and system factors. Furthermore, the F-principle indicates that deep learning preferentially fits the low-frequency part, resulting in poor performance on high-frequency tasks. The complexity of time series and the low-frequency preference limit the utility of sequence encoders. To address these challenges, we need to construct augmented samples from the frequency domain, thus improving the ability to accommodate events of different frequency sizes. To this end, we propose a novel Contrastive Learning with Frequency-Domain Interest Trends for Sequential Recommendation (CFIT4SRec). We treat the embedding representations of historical interactions as “images” and introduce the second-order Fourier transform to construct augmented samples. The components of different frequency sizes reflect the interest trends between attributes and their surroundings in the hidden space. We introduce three data augmentation operations to accommodate events of different frequency sizes: low-pass augmentation, high-pass augmentation, and band-stop augmentation. Extensive experiments on four public benchmark datasets demonstrate the superiority of CFIT4SRec over the state-of-the-art baselines. The implementation code is available at https://github.com/zhangyichi1Z/CFIT4SRec.
- Correcting for Interference in Experiments: A Case Study at Douyin
by Vivek Farias (MIT), Hao Li (Bytedance), Tianyi Peng (MIT), Xinyuyang Ren (Bytedance), Huawei Zhang (Bytedance) and Andrew Zheng (MIT). Interference is a ubiquitous problem in experiments conducted on two-sided content marketplaces, such as Douyin (China’s analog of TikTok). In many cases, creators are the natural unit of experimentation, but creators interfere with each other through competition for viewers’ limited time and attention. “Naive” estimators currently used in practice simply ignore the interference, but in doing so incur bias on the order of the treatment effect. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, are impractically high variance. We introduce a novel Monte-Carlo estimator, based on “Differences-in-Qs” (DQ) techniques, which achieves bias which is second-order in the treatment effect, while remaining sample-efficient to estimate. On the theoretical side, our contribution is to develop a generalized theory of Taylor expansions for policy evaluation, which extends DQ theory to all major MDP formulations. On the practical side, we implement our estimator on Douyin’s experimentation platform, and in the process develop DQ into a truly “plug-and-play” estimator for interference in real-world settings: one which provides robust, low-bias, low-variance treatment effect estimates; admits computationally cheap, asymptotically exact uncertainty quantification; and reduces MSE by 99% compared to the best existing alternatives in our applications.
- Data-free Knowledge Distillation for Reusing Recommendation Models
by Cheng Wang (Huazhong University of Science and Technology), Jiacheng Sun (Huawei Noah’s Ark Lab), Zhenhua Dong (Huawei Noah’s Ark Lab), Jieming Zhu (Huawei Noah’s Ark Lab), Zhenguo Li (Huawei Noah’s Ark Lab), Ruixuan Li (Huazhong University of Science and Technology) and Rui Zhang (ruizhang.info). A common practice to keep the freshness of an offline Recommender System (RS) is to train models that fit the user’s most recent behaviours while directly replacing the outdated historical model. However, substantial feature engineering and computing resources are used to train these historical models, yet they are underutilized in the downstream RS model training. In this paper, to turn these historical models into treasures, we introduce a model-inversed data synthesis framework, which can recover training data information from the historical model and use it for knowledge transfer. This framework synthesizes a new form of data from the historical model. Specifically, we ‘invert’ an off-the-shelf pretrained model to synthesize binary class user-item pairs beginning from random noise without requiring any additional information from the training dataset. To synthesize new data from a pretrained model, we update the input from random float initialization rather than one- or multi-hot vectors. An additional statistical regularization is added to further improve the quality of the synthetic data inverted from the deep model with batch normalization. The experimental results show that our framework can generalize across different types of models. We can efficiently train different types of classical Click-Through-Rate (CTR) prediction models from scratch with significantly less inversed synthetic data (2 orders of magnitude). Moreover, our framework can also work well in knowledge transfer scenarios such as continual updating and data-free knowledge distillation.
- Deep Situation-Aware Interaction Network for Click-Through Rate Prediction
by Yimin Lv (Institute of Software, Chinese Academy of Sciences), Shuli Wang (Meituan), Beihong Jin (Institute of Software, Chinese Academy of Sciences), Yisong Yu (Institute of Software, Chinese Academy of Sciences), Yapeng Zhang (Meituan), Jian Dong (Meituan), Yongkang Wang (Meituan), Xingxing Wang (Meituan) and Dong Wang (Meituan). User behavior sequence modeling plays a significant role in Click-Through Rate (CTR) prediction on e-commerce platforms. Besides the interacted items, user behaviors contain rich interaction information, such as the behavior type, time, location, etc. However, so far, the information related to user behaviors has not yet been fully exploited. In this paper, we propose the concept of a situation and situational features for distinguishing interaction behaviors and then design a CTR model named Deep Situation-Aware Interaction Network (DSAIN). DSAIN first adopts the reparameterization trick to reduce noise in the original user behavior sequences. Then it learns the embeddings of situational features by feature embedding parameterization and tri-directional correlation fusion. Finally, it obtains the embedding of the behavior sequence via heterogeneous situation aggregation. We conduct extensive offline experiments on three real-world datasets. Experimental results demonstrate the superiority of the proposed DSAIN model. More importantly, DSAIN has increased the CTR by 2.70%, the CPM by 2.62%, and the GMV by 2.16% in the online A/B test. Now, DSAIN has been deployed on the Meituan food delivery platform and serves the main traffic of the Meituan takeout app. Our source code is available at https://github.com/W-void/DSAIN
- Disentangling Motives behind Item Consumption and Social Connection for Mutually-enhanced Joint Prediction
by Youchen Sun (Nanyang Technological University), Zhu Sun (A*STAR), Xiao Sha (Nanyang Technological University), Jie Zhang (Nanyang Technological University) and Yew Soon Ong (Nanyang Technological University). Item consumption and social connection, as common user behaviors in many web applications, have been extensively studied. However, most current works separately perform either item or social link prediction tasks, possibly with the help of the other as an auxiliary signal. Moreover, they merely consider the behaviors in a holistic manner yet neglect the multi-faceted motives behind them (e.g., watching movies to kill time or with friends; connecting with others due to friendships or colleagues). To fill the gap, we propose to disentangle the multi-faceted motives in each network, defined respectively by the two types of behaviors, for mutually-enhanced joint prediction (DMJP). Specifically, we first learn the disentangled user representations driven by motives of multi-facets in both networks. Thereafter, the mutual influence of the two networks is subtly discriminated at the facet-to-facet level. The fine-grained mutual influence, proven to be asymmetric, is then exploited to help refine user representations in both networks, with the goal of achieving a mutually-enhanced joint item and social link prediction. Empirical studies on three public datasets showcase the superiority of DMJP against state-of-the-art methods (SOTAs) on both tasks.
- Distribution-based Learnable Filters with Side Information for Sequential Recommendation
by Haibo Liu (School of Cyber Security and Computer, HeBei university), Zhixiang Deng (School of Cyber Security and Computer, HeBei university), Liang Wang (School of Cyber Security and Computer, HeBei university), Jinjia Peng (School of Cyber Security and Computer, HeBei university) and Shi Feng (School of Computer Science & Engineering, Northeastern University). Sequential Recommendation aims to predict the next item by mining the dynamic preference from users’ previous interactions. However, most methods represent each item as a single fixed vector, which is incapable of capturing the uncertainty of item-item transitions that result from time-dependent and multifarious interests of users. Besides, they fail to effectively exploit side information that helps to better express user preferences. Finally, the noise in a user’s access sequence, which is due to accidental clicks, can interfere with the next item prediction and lead to lower recommendation performance. To deal with these issues, we propose DLFS-Rec, a novel model that combines Distribution-based Learnable Filters with Side information for sequential Recommendation. Specifically, items and their side information are represented by stochastic Gaussian distributions, which are described by mean and covariance embeddings, and then the corresponding embeddings are fused to generate a final representation for each item. To attenuate noise, stacked learnable filter layers are applied to smooth the fused embeddings. The similarities between the distributions inferred from the last filter layer and candidates are measured by 2-Wasserstein distance to generate the recommendation list. Extensive experiments on four public real-world datasets demonstrate the superiority of the proposed model over state-of-the-art baselines, especially on cold start users and items.
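For readers unfamiliar with the 2-Wasserstein distance mentioned in this abstract, here is a minimal sketch of its closed form between two Gaussians. It assumes diagonal covariances (so each distribution is given by a mean vector and a vector of per-dimension variances); the abstract does not state the covariance structure, and the function name is hypothetical rather than taken from DLFS-Rec.

```python
import math
from typing import Sequence


def w2_distance_diag(
    mu1: Sequence[float],
    var1: Sequence[float],
    mu2: Sequence[float],
    var2: Sequence[float],
) -> float:
    """2-Wasserstein distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).

    For diagonal covariances the squared distance reduces to
    ||mu1 - mu2||^2 + sum_i (sqrt(var1_i) - sqrt(var2_i))^2.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    spread_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var1, var2)
    )
    return math.sqrt(mean_term + spread_term)


# Same spread, different means: the distance equals the Euclidean
# distance between the means.
print(w2_distance_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))
```

Unlike a dot product between fixed vectors, this distance also grows when two items have the same mean but different uncertainty, which is what makes it a natural fit for distribution-valued embeddings.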
- Domain Disentanglement with Interpolative Data Augmentation for Dual-Target Cross-Domain Recommendation
by Jiajie Zhu (Macquarie University), Yan Wang (Macquarie University), Feng Zhu (Ant Group) and Zhu Sun (Macquarie University). The conventional single-target Cross-Domain Recommendation (CDR) aims to improve the recommendation performance on a sparser target domain by transferring the knowledge from a source domain that contains relatively richer information. By contrast, in recent years, dual-target CDR has been proposed to improve the recommendation performance on both domains simultaneously. However, to this end, there are two challenges in dual-target CDR: (1) how to generate both relevant and diverse augmented user representations, and (2) how to effectively decouple domain-independent information from domain-specific information, in addition to domain-shared information, to capture comprehensive user preferences. To address the above two challenges, we propose a Disentanglement-based framework with Interpolative Data Augmentation for dual-target Cross-Domain Recommendation, called DIDA-CDR. In DIDA-CDR, we first propose an interpolative data augmentation approach to generating both relevant and diverse augmented user representations to augment the sparser domain and explore potential user preferences. We then propose a disentanglement module to effectively decouple domain-specific and domain-independent information to capture comprehensive user preferences. Both steps significantly contribute to capturing more comprehensive user preferences, thereby improving the recommendation performance on each domain. Extensive experiments conducted on five real-world datasets show the significant superiority of DIDA-CDR over the state-of-the-art methods.
- DREAM: Decoupled Representation via Extraction Attention Module and Supervised Contrastive Learning for Cross-Domain Sequential Recommender
by Xiaoxin Ye (School of Computer Science and Engineering, UNSW), Yun Li (School of Computer Science and Engineering, UNSW) and Lina Yao (CSIRO Data61, School of Computer Science and Engineering UNSW). Cross-Domain Sequential Recommendation (CDSR) aims to generate accurate predictions for future interactions by leveraging users’ cross-domain historical interactions. One major challenge of CDSR is how to jointly learn the single- and cross-domain user preferences efficiently. To enhance the target domain’s performance, most existing solutions start by learning the single-domain user preferences within each domain and then transferring the acquired knowledge from the rich domain to the target domain. However, this approach ignores the inter-sequence item relationship and also limits the opportunities for target domain knowledge to enhance the rich domain performance. Moreover, it also ignores the information within the cross-domain sequence. Despite cross-domain sequences being generally noisy and hard to learn directly, they contain valuable user behavior patterns with great potential to enhance performance. Another key challenge of CDSR is data sparsity, which also exists in other recommendation system problems. In the real world, the data distribution of the recommendation system is highly skewed toward popular products, especially on large-scale datasets with millions of users and items. One more challenge is the class imbalance problem, inherited from the Sequential Recommendation problem. Generally, each sample only has one positive and thousands of negative samples. To address the above problems together, an innovative Decoupled Representation via Extraction Attention Module (DREAM) is proposed for CDSR to simultaneously learn single- and cross-domain user preference via decoupled representations. A novel Supervised Contrastive Learning framework is introduced to model the inter-sequence relationship as well as address the data sparsity via data augmentations.
DREAM also leverages Focal Loss to put more weight on misclassified samples to address the class-imbalance problem, with another uplift on the overall model performance. Extensive experiments were conducted on two cross-domain recommendation datasets, demonstrating that DREAM outperforms various SOTA cross-domain recommendation algorithms, achieving up to a 75% uplift in Movie-Book Scenarios.
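The focal loss mentioned here can be sketched for a single binary prediction as follows. This is the standard formulation (cross-entropy scaled by a factor of (1 - p_t)^gamma), not DREAM's training code; the function name, the default gamma of 2.0, and the clamping constant are assumptions chosen for illustration.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """Binary focal loss for predicted probability p and label y in {0, 1}.

    p_t is the probability assigned to the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified samples,
    so misclassified samples dominate the total. gamma = 0 recovers
    plain cross-entropy.
    """
    eps = 1e-12  # assumption: clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# A confident correct prediction versus a confident wrong one,
# both for a positive label.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
print(easy, hard, hard / easy)
```

With one positive and thousands of negatives per sample, most negatives are easy; the modulating factor keeps them from swamping the gradient.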
- Equivariant Contrastive Learning for Sequential Recommendation
by Peilin Zhou (HKUST (Guangzhou)), Jingqi Gao (Upstage), Yueqi Xie (HKUST), Qichen Ye (Peking University), Yining Hua (Harvard Medical School), Jaeboum Kim (The University of Hong Kong Science and Technology, Upstage), Shoujin Wang (Data Science Institute, University of Technology Sydney) and Sunghun Kim (The University of Hong Kong Science and Technology). Contrastive learning (CL) benefits the training of sequential recommendation models with informative self-supervision signals. Existing solutions apply general sequential data augmentation strategies to generate positive pairs and encourage their representations to be invariant. However, due to the inherent properties of user behavior sequences, some augmentation strategies, such as item substitution, can lead to changes in user intent. Learning indiscriminately invariant representations for all augmentation strategies might be sub-optimal. Therefore, we propose Equivariant Contrastive Learning for Sequential Recommendation (ECL-SR), which endows SR models with great discriminative power, making the learned user behavior representations sensitive to invasive augmentations (e.g., item substitution) and insensitive to mild augmentations (e.g., feature-level dropout masking). In detail, we use the conditional discriminator to capture differences in behavior due to item substitution, which encourages the user behavior encoder to be equivariant to invasive augmentations. Comprehensive experiments on four benchmark datasets show that the proposed ECL-SR framework achieves competitive performance compared to state-of-the-art SR models. The source code will be released.
- Exploring False Hard Negative Sample in Cross-Domain Recommendation
by Haokai Ma (Shandong University), Ruobing Xie (WeChat, Tencent), Lei Meng (School of software, Shandong University), Xin Chen (tencent), Xu Zhang (WeChat Search Application Department, Tencent Inc.), Leyu Lin (WeChat Search Application Department, Tencent) and Jie Zhou (Wechat, Tencent). Negative Sampling in recommendation aims to capture informative negative instances for the sparse user-item interactions to improve the performance. Conventional negative sampling methods tend to select informative hard negative samples (HNS) besides the default random samples. However, these hard negative sampling methods usually struggle with false hard negative samples (FHNS), which occur when a user-item interaction has not been observed yet and is picked as a negative sample, while the user will actually interact with this item once exposed to it. Such FHNS issues may seriously confuse the model training, while most conventional hard negative sampling methods do not systematically explore and distinguish FHNS from HNS. To address this issue, we propose a novel model-agnostic Real Hard Negative Sampling (RealHNS) framework specifically for cross-domain recommendation (CDR), which aims to discover the false and refine the real from all HNS via both general and cross-domain real hard negative sample selectors. For the general part, we conduct the coarse-grained and fine-grained real HNS selectors sequentially, armed with a dynamic item-based FHNS filter to find high-quality HNS. For the cross-domain part, we further design a new cross-domain HNS for alleviating negative transfer in CDR and discover its corresponding FHNS via a dynamic user-based FHNS filter to keep its power. We conduct experiments on four datasets based on three representative model-agnostic hard negative sampling methods, along with extensive model analyses, ablation studies, and universality analyses.
The consistent improvements indicate the effectiveness, robustness, and universality of RealHNS, which is also easy to deploy in real-world systems as a plug-and-play strategy. The source code will be released in the future.
- RESFast and Examination-agnostic Reciprocal Recommendation in Matching Markets
by Yoji Tomita (CyberAgent, Inc.), Riku Togashi (CyberAgent, Inc.), Yuriko Hashizume (CyberAgent, Inc.) and Naoto Ohsaka (CyberAgent, Inc.). In matching markets such as job posting and online dating platforms, the recommender system plays a critical role in the success of the platform. Unlike standard recommender systems that suggest items to users, reciprocal recommender systems (RRSs) that suggest other users must take into account the mutual interests of users. In addition, ensuring that recommendation opportunities do not disproportionately favor popular users is essential for the total number of matches and for fairness among users. Existing recommendation methods in matching markets, however, face computational challenges on large-scale platforms and depend on specific examination functions in the position-based model (PBM). In this paper, we introduce the reciprocal recommendation method based on the matching with transferable utility (TU matching) model in the context of ranking recommendations in matching markets and propose a fast and examination-model-free algorithm. Furthermore, we evaluate our approach on experiments with synthetic data and real-world data from an online dating platform in Japan. Our method performs better than or as well as existing methods in terms of the number of total matches and works well even in a large-scale dataset for which one existing method does not work.
- RESFull Index Deep Retrieval: End-to-End User and Item Structures for Cold-start and Long-tail Item Recommendation
by Zhen Gong (Shanghai Jiao Tong University), Xin Wu (Bytedance Inc.), Lei Chen (Bytedance Inc.), Zhenzhe Zheng (Shanghai Jiao Tong University), Shengjie Wang (Bytedance Inc.), Anran Xu (Shanghai Jiao Tong University), Chong Wang (Bytedance Inc.) and Fan Wu (Shanghai Jiao Tong University). End-to-end retrieval models, such as Tree-based Models (TDM) and Deep Retrieval (DR), have attracted a lot of attention, but they are flawed in cold-start and long-tail item recommendation scenarios. Specifically, DR learns a compact indexing structure, enabling efficient and accurate retrieval for large recommendation systems. However, it has been found that DR largely fails to retrieve cold-start and long-tail items. This is because DR only utilizes user-item interaction data, which is rare and often noisy for cold-start and long-tail items. Moreover, end-to-end retrieval models are unable to make use of the rich item content features. To address this issue while maintaining the efficiency of the DR indexing structure, we propose Full Index Deep Retrieval (FIDR), which learns indices for the full corpus of items, including cold-start and long-tail items. In addition to the original structure in DR (called User Structure in FIDR) that learns with user-item interaction data (e.g., clicks), we add an Item Structure to embed items directly based on item content features (e.g., categories). With joint efforts of User Structure and Item Structure, FIDR makes cold-start items retrievable and also improves the recommendation quality of long-tail items. To the best of our knowledge, FIDR is the first to solve the cold-start and long-tail recommendation problem for end-to-end retrieval models. Through extensive experiments on three real-world datasets, we demonstrate that FIDR can effectively recommend cold-start and long-tail items and largely promote overall recommendation performance without sacrificing inference efficiency.
According to the experiments, the recall of FIDR is improved by 8.8% to 11.9%, while the inference of FIDR is as efficient as DR.
- RESGenerative Learning Plan Recommendation for Employees: A Performance-aware Reinforcement Learning Approach
by Zhi Zheng (University of Science and Technology of China), Ying Sun (The Hong Kong University of Science and Technology (Guangzhou)), Xin Song (Baidu), Hengshu Zhu (BOSS Zhipin) and Hui Xiong (The Hong Kong University of Science and Technology (Guangzhou)).With the rapid development of enterprise Learning Management Systems (LMS), more and more companies are trying to build enterprise training and course learning platforms for promoting the career development of employees. Indeed, through course learning, many employees have the opportunity to improve their knowledge and skills. For these systems, a major issue is how to recommend learning plans, i.e., a set of courses arranged in the order they should be learned, that can help employees improve their work performance. Existing studies mainly focus on recommending courses that users are most likely to click on by capturing their learning preferences. However, the learning preference of employees may not be the right fit for their career development, and thus it may not necessarily mean their work performance can be improved accordingly. Furthermore, how to capture the mutual correlation and sequential effects between courses, and ensure the rationality of the generated results, is also a major challenge. To this end, in this paper, we propose the Generative Learning plAn recommenDation (GLAD) framework, which can generate personalized learning plans for employees to help them improve their work performance. Specifically, we first design a performance predictor and a rationality discriminator, which have the same transformer-based model architecture, but with totally different parameters and functionalities. In particular, the performance predictor is trained for predicting the work performance of employees based on their work profiles and historical learning records, while the rationality discriminator aims to evaluate the rationality of the generated results. 
Then, we design a learning plan generator based on the gated transformer and the cross-attention mechanism for learning plan generation. We calculate the weighted sum of the output from the performance predictor and the rationality discriminator as the reward, and we use Self-Critical Sequence Training (SCST) based policy gradient methods to train the generator following the Generative Adversarial Network (GAN) paradigm. Finally, extensive experiments on real-world data clearly validate the effectiveness of our GLAD framework compared with state-of-the-art baseline methods and reveal some interesting findings for talent management.
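As a rough illustration of the reward described in the abstract above, the following sketch combines a performance score and a rationality score into a weighted reward and forms a self-critical (SCST) advantage by subtracting the reward of a greedily decoded plan. The function names, the mixing weight `alpha`, and the example scores are hypothetical and are not taken from the paper.

```python
def combined_reward(performance_score: float, rationality_score: float,
                    alpha: float = 0.5) -> float:
    """Weighted sum of the two signals; alpha is a hypothetical mixing weight."""
    return alpha * performance_score + (1.0 - alpha) * rationality_score


def scst_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """Self-critical baseline: reward of the sampled plan minus reward of the greedy plan."""
    return sampled_reward - greedy_reward


def scst_loss(sampled_log_prob: float, sampled_reward: float,
              greedy_reward: float) -> float:
    """Policy-gradient surrogate loss for one sampled plan: -(advantage * log-probability)."""
    return -scst_advantage(sampled_reward, greedy_reward) * sampled_log_prob


# Hypothetical scores for a sampled plan and a greedily decoded plan.
reward_sampled = combined_reward(performance_score=0.8, rationality_score=0.6)
reward_greedy = combined_reward(performance_score=0.5, rationality_score=0.7)
loss = scst_loss(sampled_log_prob=-2.0,
                 sampled_reward=reward_sampled,
                 greedy_reward=reward_greedy)
```

With this baseline, a sampled plan is reinforced only when it scores higher than the plan the generator would produce greedily, which is the general idea behind self-critical training.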
- RESGoal-Oriented Multi-Modal Interactive Recommendation with Verbal and Non-Verbal Relevance Feedback
by Yaxiong Wu (University of Glasgow), Craig Macdonald (University of Glasgow) and Iadh Ounis (University of Glasgow).Interactive recommendation enables users to provide verbal and non-verbal relevance feedback (such as natural-language critiques and likes/dislikes) when viewing a ranked list of recommendations (such as images of fashion products) to guide the recommender system towards their desired items (i.e. goals) across multiple interaction turns. The multi-modal interactive recommendation (MMIR) task has been successfully formulated with deep reinforcement learning (DRL) algorithms by simulating the interactions between an environment (i.e. a user) and an agent (i.e. a recommender system). However, it is typically challenging and unstable to optimise the agent to improve the recommendation quality associated with implicit learning of multi-modal representations in an end-to-end fashion in DRL. This is known as the coupling of policy optimisation and representation learning. To address this coupling issue, we propose a novel goal-oriented multi-modal interactive recommendation model (GOMMIR) that uses both verbal and non-verbal relevance feedback to effectively incorporate the users’ preferences over time. Specifically, our GOMMIR model employs a multi-task learning approach to explicitly learn the multi-modal representations using a multi-modal composition network when optimising the recommendation agent. Moreover, we formulate the MMIR task using goal-oriented reinforcement learning and enhance the optimisation objective by leveraging non-verbal relevance feedback for hard negative sampling and providing extra goal-oriented rewards to effectively optimise the recommendation agent. Following previous work, we train and evaluate our GOMMIR model by using user simulators that can generate natural-language feedback about the recommendations as a surrogate for real human users. 
Experiments conducted on four well-known fashion datasets demonstrate that our proposed GOMMIR model yields significant improvements in comparison to the existing state-of-the-art baseline models.
- RESGoing Beyond Local: Global Graph-Enhanced Personalized News Recommendations
by Boming Yang (The University of Tokyo), Dairui Liu (University College Dublin), Toyotaro Suzumura (The University of Tokyo), Ruihai Dong (University College Dublin) and Irene Li (The University of Tokyo). Precisely recommending candidate news articles to users has always been a core challenge for personalized news recommendation systems. Most recent work primarily focuses on using advanced natural language processing (NLP) techniques to extract semantic information from rich textual data, employing content-based methods derived from locally viewed historical clicked news. However, this approach lacks a global perspective, failing to account for users’ hidden motivations and behaviors beyond semantic information. To address this challenge, we propose a novel model called GLORY (Global-LOcal news Recommendation sYstem), which combines global news representations learned from other users with local news representations to enhance personalized recommendation systems. We accomplish this by constructing a Global Clicked News Encoder, which includes a global news graph and employs gated graph neural networks to fuse news representations, thereby enriching clicked news representations. Similarly, we extend this approach to a Global Candidate News Encoder, utilizing a global entity graph and candidate news fusion to enhance candidate news representation. Evaluation results on two public news datasets demonstrate that our method outperforms existing approaches. Furthermore, our model offers more diverse recommendations.
- RESGradient Matching for Categorical Data Distillation in CTR Prediction
by Cheng Wang (School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan), Jiacheng Sun (Huawei Noah’s Ark Lab), Zhenhua Dong (Huawei Noah’s Ark Lab), Ruixuan Li (School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan) and Rui Zhang (ruizhang.info). The cost of hardware and energy consumption on training a click-through rate (CTR) model is highly prohibitive. A recent promising direction for reducing such costs is data distillation with gradient matching, which aims to synthesize a small distilled dataset to guide the model to a similar parameter space as those trained on real data. However, there are two main challenges to implementing such a method in the recommendation field: (1) The categorical recommendation data are high-dimensional, sparse one- or multi-hot data that block the gradient flow, rendering backpropagation-based data distillation invalid. (2) The data distillation process with gradient matching is computationally expensive due to the bi-level optimization. To this end, we investigate efficient data distillation tailored for recommendation data with plenty of side information, where we reformulate the discrete data into a dense and continuous format. Then, we further introduce a one-step gradient matching scheme, which performs gradient matching for only a single step to overcome the inefficient training process. The overall proposed method is called Categorical data distillation with Gradient Matching (CGM), which is capable of distilling a large dataset into a small set of informative synthetic data for training CTR models from scratch. Experimental results show that our proposed method not only outperforms the state-of-the-art coreset selection and data distillation methods but also has remarkable cross-architecture performance. Moreover, we explore the application of CGM on continual updating and mitigate the effect of different random seeds on the training results.
- RESgSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling
by Aleksandr V. Petrov (University of Glasgow) and Craig Macdonald (University of Glasgow). Large catalogue size is one of the central challenges in training recommendation models: a large number of items makes it infeasible to compute scores for all items during training, forcing models to deploy negative sampling. However, negative sampling increases the proportion of positive interactions in the training data. Therefore models trained with negative sampling tend to overestimate the probabilities of positive interactions, a phenomenon we call overconfidence. While the absolute values of the predicted scores/probabilities are unimportant for ranking retrieved recommendations, overconfident models may fail to estimate nuanced differences in the top-ranked items, resulting in degraded performance. This paper shows that overconfidence explains why the popular SASRec model underperforms when compared to BERT4Rec (contrary to the BERT4Rec authors’ attribution to the bi-directional attention mechanism). We propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) to mitigate overconfidence and theoretically prove that it can mitigate overconfidence. We further propose the gSASRec model, an improvement over SASRec that deploys an increased number of negatives and gBCE loss. We show through detailed experiments on three datasets that gSASRec does not exhibit the overconfidence problem. As a result, gSASRec can outperform BERT4Rec (e.g., +9.47% NDCG on MovieLens-1M), while requiring less training time (e.g., -73% training time on MovieLens-1M). Moreover, in contrast to BERT4Rec, gSASRec is suitable for large datasets that contain more than 1 million items.
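The claim that negative sampling increases the proportion of positive interactions can be checked with simple arithmetic. The sketch below assumes, purely for illustration, one positive item per training example; the function names and the catalogue size are hypothetical and do not come from the paper.

```python
def positive_fraction(num_negatives: int) -> float:
    """Share of positives among the scored items, assuming one positive per example."""
    return 1.0 / (1 + num_negatives)


def inflation_factor(catalogue_size: int, num_sampled_negatives: int) -> float:
    """How much larger the positive share is with sampling than with full scoring."""
    full_share = positive_fraction(catalogue_size - 1)      # every other item is a negative
    sampled_share = positive_fraction(num_sampled_negatives)
    return sampled_share / full_share


catalogue_size = 1_000_000  # hypothetical catalogue
for k in (1, 256):
    share = positive_fraction(k)
    factor = inflation_factor(catalogue_size, k)
    print(f"{k} sampled negatives: positive share {share:.4f}, inflated {factor:,.0f}x")
```

Under these assumptions the positive share with sampling is 1/(k+1) instead of 1/N, so a model that fits the sampled distribution sees positives far more often than they occur across the full catalogue.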
- RESHow Should We Measure Filter Bubbles? A Regression Model and Evidence for Online News
by Lien Michiels (UAntwerpen), Jorre Vannieuwenhuyze (Statistiek Vlaanderen), Jens Leysen (University of Antwerp), Robin Verachtert (Froomle NV), Annelien Smets (imec-SMIT, Vrije Universiteit Brussel) and Bart Goethals (University of Antwerp).News media play an important role in democratic societies. Central to fulfilling this role is the premise that users should be exposed to diverse news. However, news recommender systems are gaining popularity on news websites, which has sparked concerns over filter bubbles. Editors, policy-makers and scholars are worried that news recommender systems may expose users to less diverse content over time. To the best of our knowledge, this hypothesis has not been tested in a longitudinal observational study of real users that interact with a real news website. Such observational studies require the use of research methods that are robust and can account for the many covariates that may influence the diversity of recommendations at any given time. In this work, we propose an analysis model to study whether the variety of articles recommended to a user decreases over time, in observational studies of real news websites with real users. Further, we present results from two case studies using aggregated and anonymized data that were collected by two western European news websites employing a collaborative filtering-based news recommender system to serve (personalized) recommendations to their users. Through these case studies we validate empirically that our modeling assumptions are sound and supported by the data, and that our model obtains more reliable and interpretable results than analysis methods used in prior empirical work on filter bubbles. Our case studies provide evidence of a small decrease in the topic variety of a user’s recommendations in the first weeks after they sign up, but no evidence of a decrease in political variety.
- RESIncentivizing Exploration in Linear Contextual Bandits under Information Gap
by Huazheng Wang (Oregon State University), Haifeng Xu (University of Chicago), Chuanhao Li (University of Virginia), Zhiyuan Liu (University of Colorado, Boulder) and Hongning Wang (University of Virginia). Contextual bandit algorithms have been popularly used to address interactive recommendation, where the users are assumed to be cooperative to explore all recommendations from a system. In this paper, we relax this strong assumption and study the problem of incentivized exploration with myopic users, where the users are only interested in recommendations with their currently highest estimated reward. As a result, in order to obtain long-term optimality, the system needs to offer compensation to incentivize the users to take the exploratory recommendations. We consider a new and practically motivated setting where the context features employed by the user are more informative than those used by the system: for example, features based on users’ private information are not accessible by the system. We develop an effective solution for incentivized exploration under such an information gap, and prove that the method achieves a sublinear rate in both regret and compensation. We theoretically and empirically analyze the added compensation due to the information gap, compared with the case where the system has access to the same context features as the user does, i.e., without information gap. Moreover, we also provide a compensation lower bound of this problem.
- RESInTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models
by Kabir Nagrecha (University of California, San Diego), Lingyi Liu (Netflix, Inc.), Pablo Delgado (Netflix, Inc.) and Prasanna Padmanabhan (Netflix, Inc.). Deep learning-based recommendation models (DLRMs) have become an essential component of many modern recommender systems. Several companies are now building large compute clusters reserved only for DLRM training, driving new interest in cost- and time-saving optimizations. The systems challenges faced in this setting are unique; while typical deep learning (DL) training jobs are dominated by model execution times, the most important factor in DLRM training performance is often online data ingestion.
In this paper, we explore the unique characteristics of this data ingestion problem and provide insights into the specific bottlenecks and challenges of the DLRM training pipeline at scale. We study real-world DLRM data processing pipelines taken from our compute cluster to both observe the performance impacts of online ingestion and to identify shortfalls in existing data pipeline optimizers. We find that current tooling either yields sub-optimal performance, frequent crashes, or else requires impractical cluster re-organization to adopt. Our studies lead us to design and build a new solution for data pipeline optimization, InTune. InTune employs a reinforcement learning (RL) agent to learn how to distribute CPU resources across a DLRM data pipeline to more effectively parallelize data-loading and improve throughput. Our experiments show that InTune can build an optimized data pipeline configuration within only a few minutes, and can easily be integrated into existing training workflows. By exploiting the responsiveness and adaptability of RL, InTune achieves significantly higher online data ingestion rates than existing optimizers, thus reducing idle times in model execution and increasing efficiency. We apply InTune to our real-world cluster, and find that it increases data ingestion throughput by as much as 2.29X versus current state-of-the-art data pipeline optimizers while also improving both CPU & GPU utilization.
- RESKGTORe: Tailored Recommendations through Knowledge-aware GNN Models
by Alberto Carlo Maria Mancino (Politecnico di Bari), Antonio Ferrara (Politecnico di Bari), Salvatore Bufi (Polytechnic University of Bari), Daniele Malitesta (Polytechnic University of Bari), Tommaso Di Noia (Polytechnic University of Bari) and Eugenio Di Sciascio (Polytechnic University of Bari). Knowledge graphs (KG) have been proven to be a powerful source of side information to enhance the performance of recommendation algorithms. Their graph-based structure paves the way for the adoption of graph-aware learning models such as Graph Neural Networks (GNNs). In this respect, state-of-the-art models achieve good performance and interpretability via user-level combinations of intents leading users to their choices. Unfortunately, such results often come from an end-to-end learning process that considers a combination of the whole set of features contained in the KG without any analysis of the user decisions. In this paper, we introduce KGTORe, a GNN-based model that exploits KG to learn latent representations for the semantic features, and consequently, interpret the user decisions as a personal distillation of the item feature representations. Differently from previous models, KGTORe does not need to process the whole KG at training time but relies on a selection of the most discriminative features for the users, thus resulting in improved performance and personalization. Experimental results on three well-known datasets show that KGTORe achieves remarkable accuracy performance and several ablation studies demonstrate the effectiveness of its components.
- RESKnowledge-based Multiple Adaptive Spaces Fusion for Recommendation
by Meng Yuan (Institute of Artificial Intelligence, Beihang University, Beijing 100191, China), Fuzhen Zhuang (Institute of Artificial Intelligence, Beihang University, Beijing 100191, China), Zhao Zhang (University of Chinese Academy of Sciences, Beijing 100191, China), Deqing Wang (School of Computer Science and Engineering, Beihang University, Beijing 100191, China) and Jin Dong (Beijing Academy of Blockchain and Edge Computing). Since Knowledge Graphs (KGs) contain rich semantic information, recently there has been an influx of KG-enhanced recommendation methods. Most existing methods are designed entirely in Euclidean space without considering curvature. However, recent studies have revealed that a tremendous amount of graph-structured data exhibits highly non-Euclidean properties. Motivated by these observations, in this work, we propose a knowledge-based multiple adaptive spaces fusion method for recommendation, namely MCKG. Unlike existing methods that solely adopt a specific manifold, we introduce the unified space that is compatible with hyperbolic, Euclidean and spherical spaces. Furthermore, we fuse the multiple unified spaces in an attention manner to obtain the high-quality embeddings for better knowledge propagation. In addition, we propose a geometry-aware optimization strategy which enables the pull and push processes to benefit from both hyperbolic and spherical spaces. Specifically, in hyperbolic space, we set smaller margins in the area near to the origin, which is conducive to distinguishing between highly similar positive items and negative ones. At the same time, we set larger margins in the area far from the origin to ensure the model has sufficient error tolerance. A similar approach also applies to spherical spaces. Extensive experiments on three real-world datasets demonstrate that MCKG achieves a significant improvement over state-of-the-art recommendation methods.
Further ablation experiments verify the importance of multi-space fusion and geometry-aware optimization strategy, justifying the rationality and effectiveness of MCKG.
- RESMasked and Swapped Sequence Modeling for Next Novel Basket Recommendation in Grocery Shopping
by Ming Li (University of Amsterdam), Mozhdeh Ariannezhad (University of Amsterdam), Andrew Yates (University of Amsterdam) and Maarten de Rijke (University of Amsterdam).Next basket recommendation (NBR) is the task of predicting the next set of items based on a sequence of already purchased baskets. It is a recommendation task that has been widely studied, especially in the context of grocery shopping. In NBR, it is useful to distinguish between repeat items, i.e., items that a user has consumed before, and explore items, i.e., items that a user has not consumed before. Most NBR work either ignores this distinction or focuses on repeat items.
We formulate the next novel basket recommendation (NNBR) task, i.e., the task of recommending a basket that only consists of novel items, which is valuable for both real-world application and NBR evaluation. We evaluate how existing NBR methods perform on the NNBR task and find that, so far, limited progress has been made w.r.t. the NNBR task. To address the NNBR task, we propose a simple bi-directional transformer basket recommendation model (BTBR), which is focused on directly modeling item-to-item correlations within and across baskets instead of learning complex basket representations. To properly train BTBR, we propose and investigate several masking strategies and training objectives: (i) item-level random masking, (ii) item-level select masking, (iii) basket-level all masking, (iv) item basket-level explore masking, and (v) joint masking. In addition, an item-basket swapping strategy is proposed to enrich the item interactions within the same baskets.
We conduct extensive experiments on three open datasets with various characteristics. The results demonstrate the effectiveness of BTBR and our masking and swapping strategies for the NNBR task. BTBR with a properly selected masking and swapping strategy can substantially improve the NNBR performance.
- RESMulti-Relational Contrastive Learning for Recommendation
by Wei Wei (University of Hong Kong), Lianghao Xia (University of Hong Kong) and Chao Huang (University of Hong Kong).Dynamic behavior modeling has become a crucial task for personalized recommender systems that aim to learn users’ time-evolving preferences on online platforms. However, many recommendation models rely on a single type of behavior learning, which significantly limits their ability to represent user-item relationships in real-life applications where interactions between users and items often come in multiple types (e.g., click, tag-as-favorite, review, and purchase). To offer better recommendations, this paper proposes the Evolving Graph Contrastive Memory Network (EGCM) to model dynamic interaction heterogeneity. Firstly, we develop a multi-relational graph encoder to capture short-term preference heterogeneity and preserve the dedicated relation semantics for different types of user-item interactions. Additionally, we design a dynamic cross-relational memory network that enables EGCM to capture users’ long-term multi-behavior preferences and the underlying evolving cross-type behavior dependencies over time. To obtain robust and informative user representations with both commonality and diversity across multi-behavior interactions, we design a multi-relational contrastive learning paradigm with heterogeneous short- and long-term interest modeling. We further provide theoretical analyses to support the modeling of commonality and diversity from the perspective of enhancing model optimization. Experiments on several real-world datasets demonstrate the superiority of our recommender system over various state-of-the-art baselines.
- RESMulti-task Item-attribute Graph Pre-training for Strict Cold-start Item Recommendation
by Yuwei Cao (University of Illinois at Chicago), Liangwei Yang (University of Illinois Chicago), Chen Wang (University of Illinois Chicago), Zhiwei Liu (Salesforce Inc.), Hao Peng (Beihang University), Chenyu You (Yale University) and Philip Yu (University of Illinois Chicago).Recommendation systems suffer in the strict cold-start (SCS) scenario, where the user-item interactions are entirely unavailable. The well-established, dominating identity (ID)-based approaches completely fail to work. Cold-start recommenders, on the other hand, leverage item contents (brand, title, descriptions, etc.) to map the new items to the existing ones. However, the existing SCS recommenders explore item contents in coarse-grained manners that introduce noise or information loss. Moreover, informative data sources other than item contents, such as users’ purchase sequences and review texts, are largely ignored. In this work, we explore the role of the fine-grained item attributes in bridging the gaps between the existing and the SCS items and pre-train a knowledgeable item-attribute graph for SCS item recommendation. Our proposed framework, ColdGPT, models item-attribute correlations into an item-attribute graph by extracting fine-grained attributes from item contents. ColdGPT then transfers knowledge into the item-attribute graph from various available data sources, i.e., item contents, historical purchase sequences, and review texts of the existing items, via multi-task learning. To facilitate the positive transfer, ColdGPT designs specific submodules according to the natural forms of the data sources and proposes to coordinate the multiple pre-training tasks via unified alignment-and-uniformity losses. Our pre-trained item-attribute graph acts as an implicit, extendable item embedding matrix, which enables the SCS item embeddings to be easily acquired by inserting these items into the item-attribute graph and propagating their attributes’ embeddings. 
We carefully process three public datasets, i.e., Yelp, Amazon-home, and Amazon-sports, to guarantee the SCS setting for evaluation. Extensive experiments show that ColdGPT consistently outperforms the existing SCS recommenders by large margins and even surpasses models that are pre-trained on 75 – 224 times more cross-domain data on two out of four datasets. Our code and pre-processed datasets for SCS evaluations are publicly available to help future SCS studies.
- RESOnline Matching: A Real-time Bandit System for Large-scale Recommendations
by Xinyang Yi (Google), Shao-Chuan Wang (Google), Ruining He (Google), Hariharan Chandrasekaran (Google), Charles Wu (Google), Lukasz Heldt (Google), Lichan Hong (Google), Minmin Chen (Google) and Ed Chi (Google). The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While being effective in capturing users’ past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution shift and explore new items or user interests. Although online learning-based approaches (e.g., multi-armed bandits) have demonstrated promising theoretical results in tackling these challenges, their practical real-time implementation in large-scale recommender systems remains limited. First, the scalability of online approaches in serving massive online traffic while ensuring timely updates of bandit parameters poses a significant challenge. Additionally, exploring uncertainty in recommender systems can easily result in unfavorable user experience, highlighting the need for devising intricate strategies that effectively balance the trade-off between exploitation and exploration. In this paper, we introduce Online Matching: a scalable closed-loop bandit system learning from users’ direct feedback on items in real time. We present a hybrid offline + online approach for constructing this system, accompanied by a comprehensive exposition of the end-to-end system architecture. We propose Diag-LinUCB, a novel extension of the LinUCB algorithm, to enable distributed updates of bandit parameters in a scalable and timely manner. We conduct live experiments in YouTube and show that Online Matching is able to enhance the capabilities of fresh content discovery and item exploration in the present platform.
- RESPairwise Intent Graph Embedding Learning for Context-Aware Recommendation
by Dugang Liu (Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)), Yuhao Wu (Shenzhen University), Weixin Li (Shenzhen University), Xiaolian Zhang (Huawei 2012 Lab), Hao Wang (Huawei 2012 Lab), Qinjuan Yang (Huawei 2012 Lab) and Zhong Ming (College of Computer Science and Software Engineering, Shenzhen University).Although knowledge graphs have shown their effectiveness in mitigating data sparsity in many recommendation tasks, they remain underutilized in context-aware recommender systems (CARS) with the specific sparsity challenges associated with the contextual features, i.e., feature sparsity and interaction sparsity. To bridge this gap, in this paper, we propose a novel pairwise intent graph embedding learning (PING) framework to efficiently integrate knowledge graphs into CARS. Specifically, our PING contains three modules: 1) a graph construction module is used to obtain a pairwise intent graph (PIG) containing nodes for users, items, entities and enhanced intent, where enhanced intent nodes are generated by applying user intent fusion (UIF) on relational intent and contextual intent, and two sub-intents are derived from the semantic information and contextual information, respectively; 2) a pairwise intent joint graph convolution module is used to obtain the refined embeddings of all the features by executing a customized convolution strategy on PIG, where each enhanced intent node acts as a hub to efficiently propagate information among different features and between all the features and the knowledge graph; 3) a recommendation module with the refined embeddings is used to replace the randomly initialized embeddings of downstream recommendation models to improve model performance. Finally, we conduct extensive experiments on three public datasets to verify the effectiveness and compatibility of our PING.
- RESReciprocal Sequential Recommendation
by Bowen Zheng (Renmin University of China), Yupeng Hou (Renmin University of China), Wayne Xin Zhao (Renmin University of China), Yang Song (BOSS Zhipin) and Hengshu Zhu (BOSS Zhipin).Reciprocal recommender systems (RRS), which consider a two-way matching between two parties, have been widely applied in online platforms like online dating and recruitment. Existing RRS models mainly capture static user preferences, neglecting the evolving user tastes and the dynamic matching relation between the two parties. Although dynamic user modeling has been well-studied in sequential recommender systems, existing solutions are developed in a user-oriented manner. Therefore, it is non-trivial to adapt sequential recommendation algorithms to reciprocal recommendation. In this paper, we formulate RRS as a distinctive sequence matching task, and further propose a new approach ReSeq for RRS, which is short for Reciprocal Sequential recommendation. To capture dual-perspective matching, we propose to learn fine-grained sequence similarities with a co-attention mechanism across different time steps. Further, to improve the inference efficiency, we introduce the self-distillation technique to distill knowledge from the fine-grained matching module into the more efficient student module. In the deployment stage, only the efficient student module is used, greatly speeding up the similarity computation. Extensive experiments on five real-world datasets from two scenarios demonstrate the effectiveness and efficiency of the proposed method. Our code is available at https://anonymous.4open.science/r/ReSeq/.
- RESRethinking Multi-Interest Learning for Candidate Matching in Recommender Systems
by Yueqi Xie (HKUST), Jingqi Gao (Upstage), Peilin Zhou (HKUST (gz)), Qichen Ye (Peking University), Yining Hua (Massachusetts Institute of Technology), Jae Boum Kim (Hong Kong University of Science and Technology), Fangzhao Wu (MSRA) and Sunghun Kim (Hong Kong University of Science and Technology).Existing research efforts for multi-interest candidate matching in recommender systems mainly focus on improving model architecture or incorporating additional information, neglecting the importance of training schemes. This work revisits the training framework and uncovers two major problems hindering the expressiveness of learned multi-interest representations. First, the current training objective (i.e., uniformly sampled softmax) fails to effectively train discriminative representations in a multi-interest learning scenario due to the severe increase in easy negative samples. Second, a routing collapse problem is observed where each learned interest may collapse to express information only from a single item, resulting in information loss. To address these issues, we propose the REMI framework, consisting of an Interest-aware Hard Negative mining strategy (IHN) and a Routing Regularization (RR) method. IHN emphasizes interest-aware hard negatives by proposing an ideal sampling distribution and developing a Monte-Carlo strategy for efficient approximation. RR prevents routing collapse by introducing a novel regularization term on the item-to-interest routing matrices. These two components enhance the learned multi-interest representations from both the optimization objective and the composition information. REMI is a general framework that can be readily applied to various existing multi-interest candidate matching methods. Experiments on three real-world datasets show our method can significantly improve state-of-the-art methods with easy implementation and negligible computational overhead. 
The source code is available at https://anonymous.4open.science/r/ReMIRec-B64C/.
- RESSPARE: Shortest Path Global Item Relations for Efficient Session-based Recommendation
by Andreas Peintner (Universität Innsbruck), Amir Reza Mohammadi (Universität Innsbruck) and Eva Zangerle (Universität Innsbruck).Session-based recommendation aims to predict the next item based on a set of anonymous sessions. Capturing user intent from a short interaction sequence imposes a variety of challenges since no user profiles are available and interaction data is naturally sparse. Recent approaches relying on graph neural networks (GNNs) for session-based recommendation use global item relations to explore collaborative information from different sessions. These methods capture the topological structure of the graph and rely on multi-hop information aggregation in GNNs to exchange information along edges. Consequently, graph-based models suffer from noisy item relations in the training data and introduce high complexity for large item catalogs. We propose to explicitly model the multi-hop information aggregation mechanism over multiple layers via shortest-path edges based on knowledge from the sequential recommendation domain. Our approach does not require multiple layers to exchange information and ignores unreliable item-item relations. Furthermore, to address inherent data sparsity, we are the first to apply supervised contrastive learning by mining data-driven positive and hard negative item samples from the training data. Extensive experiments on three different datasets show that the proposed approach outperforms almost all of the state-of-the-art methods.
- RESSTAN: Stage-Adaptive Network for Multi-Task Recommendation by Learning User Lifecycle-Based Representation
by Wanda Li (Tsinghua University), Wenhao Zheng (Shopee Company), Xuanji Xiao (Shopee Company) and Suhang Wang (Penn State University).Recommendation systems play a vital role in many online platforms, with their primary objective being to satisfy and retain users. As directly optimizing user retention is challenging, multiple evaluation metrics are often employed. Existing methods generally formulate the optimization of these evaluation metrics as a multi-task learning problem, but often overlook the fact that user preferences for different tasks are personalized and change over time. Identifying and tracking the evolution of user preferences can lead to better user retention. To address this issue, we introduce the concept of “user lifecycle,” consisting of multiple stages characterized by users’ varying preferences for different tasks. We propose a novel Stage-Adaptive Network (STAN) framework for modeling user lifecycle stages. STAN first identifies latent user lifecycle stages based on learned user preferences, and then employs the stage representation to enhance multi-task learning performance. Our experimental results using both public and industrial datasets demonstrate that the proposed model significantly improves multi-task prediction performance compared to state-of-the-art methods, highlighting the importance of considering user lifecycle stages in recommendation systems. Furthermore, online A/B testing reveals that our model outperforms the existing model, achieving a significant improvement of 3.05% in staytime per user and 0.88% in CVR. These results indicate that our approach effectively improves the overall efficiency of the multi-task recommendation system.
- RESSTRec: Sparse Transformer for Sequential Recommendations
by Chengxi Li (City University of Hong Kong), Xiangyu Zhao (City University of Hong Kong), Yejing Wang (City University of Hong Kong), Qidong Liu (Xi’an Jiaotong University, City University of Hong Kong), Wanyu Wang (City University of Hong Kong), Yiqi Wang (Michigan State University), Lixin Zou (Wuhan University), Wenqi Fan (The Hong Kong Polytechnic University) and Qing Li (The Hong Kong Polytechnic University).With the rapid evolution of transformer architectures, an increasing number of researchers are exploring their application in sequential recommender systems (SRSs). Compared with earlier SRS models, transformer-based models achieve promising performance on SRS tasks. Existing transformer-based SRS frameworks, however, retain the vanilla attention mechanism, which calculates the attention scores between all item-item pairs in each layer, i.e., item interactions. Consequently, redundant item interactions may slow down inference and cause high memory costs for the model. In this paper, we first identify the sparse information phenomenon in transformer-based SRS scenarios and propose an efficient model, i.e., Sparse Transformer sequential Recommendation model (STRec). First, we devise a cross-attention-based sparse transformer for efficient sequential recommendation. Then, a novel sampling strategy is derived to preserve the necessary interactions. Extensive experimental results validate the effectiveness of our framework, which can exceed state-of-the-art accuracy while reducing inference time by 54% and memory cost by 70%. Besides, we provide extensive additional experiments to further investigate the properties of our framework. Our code is available to ease reproducibility.
- RESTask Aware Feature Extraction Framework for Sequential Dependence Multi-Task Learning
by Xuewen Tao (Mybank, Ant Group), Mingming Ha (School of Automation and Electrical Engineering, University of Science and Technology Beijing; Mybank, Ant Group), Qiongxu Ma (Mybank, Ant Group), Hongwei Cheng (Mybank, Ant Group), Wenfang Lin (Mybank, Ant Group) and Xiaobo Guo (Institute of Information Science, Beijing Jiaotong University; Mybank, Ant Group).In online recommendation, financial services, etc., the most common application of multi-task learning (MTL) is multi-step conversion estimation. A core property of multi-step conversion is the sequential dependence among tasks. Most existing works focus far more on the specific post-view click-through rate (CTR) and post-click conversion rate (CVR) estimations, neglecting the generalization of sequential dependence multi-task learning (SDMTL). Besides, the performance of the SDMTL framework also deteriorates due to interference from implicitly conflicting information passed between adjacent tasks. In this paper, a systematic learning paradigm for the SDMTL problem is established for the first time, which can transform the SDMTL problem into a general MTL problem and is applicable to more general multi-step conversion scenarios with longer conversion paths or stronger task dependence. Also, the distribution dependence between adjacent task spaces is illustrated from a theoretical point of view. In addition, an SDMTL architecture, named Task Aware Feature Extraction (TAFE), is developed to enable dynamic task representation learning from a sample-wise view. TAFE selectively reconstructs the implicit shared information corresponding to each sample case and performs explicit task-specific extraction under dependence constraints. Extensive experiments on offline public and real-world industrial datasets, and online A/B implementations demonstrate the effectiveness and applicability of the proposed theoretical and implementation frameworks.
- RESTowards Robust Fairness-aware Recommendation
by Hao Yang (Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China), Zhining Liu (Ant Group), Zeyu Zhang (Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China), Chenyi Zhuang (Ant Group) and Xu Chen (Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China).Due to the progressive advancement of trustworthy machine learning algorithms, fairness in recommender systems is attracting increasing attention and is often considered from the perspective of users. Conventional fairness-aware recommendation models make the assumption that user preferences remain the same between the training set and the testing set. However, this assumption disagrees with reality, where user preferences can shift in the testing set due to natural spatial or temporal heterogeneity. It is concerning that conventional fairness-aware models may be unaware of such distribution shifts, leading to a sharp decline in model performance. To address the distribution shift problem, we propose a robust fairness-aware recommendation framework based on the Distributionally Robust Optimization (DRO) technique. Specifically, we assign learnable weights to each sample to approximate the distribution that leads to the worst-case model performance, and then optimize the fairness-aware recommendation model to improve the worst-case performance in terms of both fairness and recommendation accuracy. By iteratively updating the weights and the model parameters, our framework can be robust to unseen testing sets. To ease the learning difficulty of DRO, we use a hard clustering technique to reduce the number of learnable sample weights. To optimize our framework in a fully differentiable manner, we soften the above clustering strategy. Empirically, we conduct extensive experiments based on four real-world datasets to verify the effectiveness of our proposed framework.
To benefit the research community, we have released our project at https://anonyrobfair.github.io/.
- RESTrending Now: Modeling Trend Recommendations
by Hao Ding (AWS AI Labs), Branislav Kveton (AWS AI Labs), Yifei Ma (AWS AI Labs), Youngsuk Park (AWS AI Labs), Venkataramana Kini (AWS AI Labs), Yupeng Gu (AWS AI Labs), Ravi Divvela (AWS AI Labs), Fei Wang (AWS AI Labs), Anoop Deoras (AWS AI Labs) and Hao Wang (AWS AI Labs).Modern recommender systems usually include separate recommendation carousels such as ‘trending now’ to list trending items and further boost their popularity, thereby attracting active users. Though widely useful, such ‘trending now’ carousels typically generate item lists based on simple heuristics, e.g., the number of interactions within a time interval, and therefore still leave much room for improvement. This paper aims to systematically study this under-explored but important problem from the new perspective of time series forecasting. We first provide a set of rigorous definitions related to item trendiness with associated evaluation protocols, and then propose a deep latent variable model, dubbed Trend Recommender (TrendRec), to forecast items’ future trend and generate trending item lists. Experiments on real-world datasets from various domains show that our TrendRec significantly outperforms the baselines, verifying our model’s effectiveness.
- RESTwo-sided Calibration for Quality-aware Responsible Recommendation
by Chenyang Wang (Tsinghua University), Yankai Liu (China Mobile Research), Yuanqing Yu (Tsinghua University), Weizhi Ma (Tsinghua University), Min Zhang (Tsinghua University), Yiqun Liu (Tsinghua University), Haitao Zeng (China Mobile Research), Junlan Feng (China Mobile Research) and Chao Deng (China Mobile Research).Calibration in recommender systems ensures that the user’s interests distribution over groups of items is reflected with their corresponding proportions in the recommendation, which has gained increasing attention recently. For example, a user who watched 80 entertainment videos and 20 knowledge videos is expected to receive recommendations comprising about 80% entertainment and 20% knowledge videos as well. However, with the increasing calls for quality-aware responsible recommendation, it has become inadequate to just match users’ historical behaviors, which could still lead to undesired effects at the system level (e.g., overwhelming clickbaits). In this paper, we envision the two-sided calibration task that not only matches the users’ past interests distribution (user-level calibration) but also guarantees an overall target exposure distribution of different item groups (system-level calibration). The target group exposure distribution can be explicitly pursued by users, platform owners, and even the law (e.g., the platform owners expect about 50% knowledge video recommendation on the whole). To support this scenario, we propose a post-processing method named PCT. PCT first solves personalized calibration targets that minimize the changes in users’ historical interest distributions while ensuring the overall target group exposure distribution. Then, PCT reranks the original recommendation lists according to personalized calibration targets to generate both relevant and two-sided calibrated recommendations. 
Extensive experiments demonstrate the superior performance of the proposed method compared to calibrated and fairness-aware recommendation approaches. We also release a new dataset with item quality annotations to support further studies about quality-aware responsible recommendation.
- RESUncovering User Interest from Biased and Noised Watch Time in Video Recommendation
by Haiyuan Zhao (Renmin University of China), Lei Zhang (Renmin University of China), Jun Xu (Renmin University of China), Guohao Cai (Huawei Noah’s ark lab), Zhenhua Dong (Huawei Noah’s ark lab) and Ji-Rong Wen (Renmin University of China).In micro-video recommendation scenarios, watch time is commonly adopted as an indicator of users’ interest. However, watch time is not only determined by how well a video matches users’ interests but is also affected by other factors. These factors are mainly twofold: on the one hand, users tend to spend more time on appealing videos as the duration (i.e., video length) grows, which is named duration bias; on the other hand, it takes people a period of time to judge whether they like a video, which is named noisy watching. Consequently, the existence of duration bias and noisy watching makes watch time an inadequate label for training a reliable recommendation model. Moreover, current methods focus only on the duration bias and ignore noisy watching, so they do not really uncover the user interest from watch time. In this study, we first analyze the generation mechanism of users’ watch time from a unified causal viewpoint. Unlike current methods, which only consider the duration bias in watch time, we consider the watch time as a mixture of the user’s actual interest, the duration-biased watch time, and the noisy watch time. To mitigate both the duration bias and noisy watching, we propose Debiased and Denoised watch time Correction (D²Co), which can be divided into two steps: first, we employ a duration-wise Gaussian Mixture Model plus a frequency-weighted moving average to estimate the bias and noise terms; then, we utilize a sensitivity-controlled correction function to separate the user interest from the watch time, which is robust to the estimation error of the bias and noise terms. The experiments on two public video recommendation datasets indicate the effectiveness of the proposed method.
- RESUnderstanding and Modeling Passive-Negative Feedback for Short-video Sequential Recommendation
by Yunzhu Pan (UESTC), Chen Gao (Tsinghua University), Yang Song (Kuaishou Inc.), Kun Gai (Unaffiliated), Depeng Jin (Department of Electronic Engineering, Tsinghua University) and Yong Li (Tsinghua University).Sequential recommendation is one of the most important tasks in recommender systems, which aims to recommend the next interacted item with historical behaviors as input. Traditional sequential recommendation always mainly considers the collected positive feedback such as click, purchase, etc. However, in short-video platforms such as TikTok, video viewing behavior may not always represent positive feedback. Specifically, the videos are played automatically, and users passively receive the recommended videos. In this new scenario, users passively express negative feedback by skipping over videos they do not like, which provides valuable information about their preferences. Different from the negative feedback studied in traditional recommender systems, this passive-negative feedback can reflect users’ interests and serve as an important supervision signal in extracting users’ preferences. Therefore, it is essential to carefully design and utilize it in this novel recommendation scenario. In this work, we first conduct analyses based on a large-scale real-world short-video behavior dataset and illustrate the significance of leveraging passive feedback. We then propose a novel method that deploys the sub-interest encoder, which incorporates positive feedback and passive-negative feedback as supervision signals to learn the user’s current active sub-interest. Moreover, we introduce an adaptive fusion layer to integrate various sub-interests effectively. To enhance the robustness of our model, we then introduce a multi-task learning module to simultaneously optimize two kinds of feedback – passive-negative feedback and traditional randomly-sampled negative feedback. 
The experiments on two large-scale datasets verify that the proposed method can significantly outperform state-of-the-art approaches. The codes and collected datasets are anonymously released at https://anonymous.4open.science/r/SINE-2047/ to benefit the community.
- RESWhat We Evaluate When We Evaluate Recommender Systems: Understanding Recommender Systems’ Performance using Item Response Theory
by Yang Liu (University of Helsinki), Alan Medlar (University of Helsinki) and Dorota Glowacka (University of Helsinki).Current practices in offline evaluation use rank-based metrics to measure the quality of recommendation lists. This approach has practical benefits as it centers assessment on the output of the recommender system and, therefore, measures performance from the perspective of end-users. However, this methodology neglects how recommender systems more broadly model user preferences, which is not captured by only considering the top-n recommendations. In this article, we use item response theory (IRT), a family of latent variable models used in psychometric assessment, to gain a comprehensive understanding of offline evaluation. We used IRT to jointly estimate the latent abilities of 51 recommendation algorithms and the characteristics of 3 commonly used benchmark data sets. For all data sets, the latent abilities estimated by IRT suggest that higher scores from traditional rank-based metrics do not reflect improvements in modeling user preferences. Furthermore, we show the top-n recommendations with the most discriminatory power are biased towards lower difficulty items, leaving much room for improvement. Lastly, we highlight the role of popularity in evaluation by investigating how user engagement and item popularity influence recommendation difficulty.
- RESWhen Fairness meets Bias: a Debiased Framework for Fairness aware Top-N Recommendation
by Jiakai Tang (Gaoling School of Artificial Intelligence, Renmin University of China), Shiqi Shen (Wechat, Tencent, Beijing), Zhipeng Wang (Wechat, Tencent, Beijing), Zhi Gong (Wechat, Tencent, Beijing), Jingsen Zhang (Gaoling School of Artificial Intelligence, Renmin University of China) and Xu Chen (Gaoling School of Artificial Intelligence, Renmin University of China).Fairness in the recommendation domain has recently attracted increasing attention due to growing concerns about algorithmic discrimination and ethics. While recent years have witnessed many promising fairness-aware recommender models, an important problem has been largely ignored: the fairness can be biased due to users’ personalized selection tendencies or non-uniform item exposure probabilities. To study this problem, in this paper, we formally define a novel task named unbiased fairness-aware Top-N recommendation. To solve this task, we first define an ideal loss function based on all the user-item pairs. Considering that, in real-world datasets, only a small number of user-item interactions can be observed, we then approximate the above ideal loss with a more tractable objective based on the inverse propensity score (IPS). Since recommendation datasets can be noisy and quite sparse, which makes it difficult to estimate the IPS accurately, we propose to optimize the objective in an IPS range instead of at a specific point, which improves the model’s fault tolerance. In order to make our model more applicable to the commonly studied Top-N recommendation, we soften the ranking metrics such as Precision, Hit-Ratio and NDCG to derive a fully differentiable framework. We conduct extensive experiments to demonstrate the effectiveness of our model based on four real-world datasets.
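As a rough illustration of the inverse propensity score (IPS) idea mentioned in the last abstract above, the sketch below reweights the losses of observed user-item pairs by the inverse of their observation probability, so that their sum estimates the average loss over all pairs. It is a generic textbook-style estimator and not the authors' implementation: the function name, the `min_propensity` floor and the toy numbers are assumptions, and the paper's optimization over an IPS range is not reproduced here.

```python
def ips_weighted_loss(losses, propensities, num_pairs, min_propensity=0.05):
    """Estimate the mean loss over all user-item pairs from observed pairs only.

    losses         -- loss of each observed user-item pair
    propensities   -- assumed probability that each of those pairs was observed
    num_pairs      -- total number of user-item pairs (observed or not)
    min_propensity -- hypothetical floor that keeps tiny propensities from
                      producing huge weights
    """
    if len(losses) != len(propensities):
        raise ValueError("losses and propensities must have the same length")
    if num_pairs <= 0:
        raise ValueError("num_pairs must be positive")

    total = 0.0
    for loss, propensity in zip(losses, propensities):
        # Pairs that are rarely observed count more, to stand in for the
        # similar pairs that were never observed.
        total += loss / max(propensity, min_propensity)
    return total / num_pairs


# Toy usage: 4 user-item pairs exist, 2 were observed.
estimate = ips_weighted_loss([0.2, 0.8], [0.5, 0.25], num_pairs=4)
```

With these toy numbers the estimate is approximately 0.9, since (0.2 / 0.5 + 0.8 / 0.25) / 4 = 0.9.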
List of all short papers accepted for RecSys 2023 (in alphabetical order).
- RESA Probabilistic Position Bias Model for Short-Video Recommendation Feeds
by Olivier Jeunen (ShareChat UK).Modern web-based platforms often show ranked lists of recommendations to users, in an attempt to maximise user satisfaction or business metrics. Typically, the goal of such systems boils down to maximising the exposure probability — conversely, minimising the rank— for items that are deemed “reward-maximising” according to some metric of interest. This general framing comprises music or movie streaming applications, as well as e-commerce, restaurant or job recommendations, and even web search. Position bias or user models can be used to estimate exposure probabilities for each use-case, specifically tailored to how users interact with the presented rankings. A unifying factor in these diverse problem settings is that typically only one or several items will be engaged with (clicked, streamed, purchased, et cetera) before a user leaves the ranked list. Short-video feeds on social media platforms diverge from this general framing in several ways, most notably that users do not tend to leave the feed after, for example, liking a post. Indeed, seemingly infinite feeds invite users to scroll further down the ranked list. For this reason, existing position bias or user models tend to fall short in such settings, as they do not accurately capture users’ interaction modalities. In this work, we propose a novel and probabilistically sound personalised position bias model for feed recommendations. We focus on a 1st-level feed in a hierarchical structure, where users may enter a 2nd-level feed via any given 1st-level item. We posit that users come to the platform with a given scrolling budget that is drawn according to a discrete power-law distribution, and show how the survival function of said distribution can be used to obtain closed-form estimates for personalised exposure probabilities. 
Empirical insights gained through data from a large-scale social media platform show how our probabilistic position bias model more accurately captures empirical exposure than existing models, and paves the way for improved unbiased evaluation and learning-to-rank.
- RESADRNet: A Generalized Collaborative Filtering Framework Combining Clinical and Non-Clinical Data for Adverse Drug Reaction Prediction
by Haoxuan Li (Center for Data Science, Peking University), Taojun Hu (Peking University), Zetong Xiong (Zhongnan University of Economics and Law), Chunyuan Zheng (University of California, San Diego), Fuli Feng (University of Science and Technology of China), Xiangnan He (University of Science and Technology of China) and Xiao-Hua Zhou (Peking University).Adverse drug reaction (ADR) prediction plays a crucial role in both health care and drug discovery for reducing patient mortality and enhancing drug safety. Recently, many studies have been devoted to effectively predicting drug-ADR incidence rates. However, these methods either did not effectively utilize non-clinical data, i.e., physical, chemical, and biological information about the drug, or did little to establish a link between content-based and pure collaborative filtering during the training phase. In this paper, we first formulate the prediction of multi-label ADRs as a drug-ADR collaborative filtering problem, and to the best of our knowledge, this is the first work to provide extensive benchmark results of previous collaborative filtering methods on two large publicly available clinical datasets. Then, by exploiting the easily accessible drug characteristics from non-clinical data, we propose ADRNet, a generalized collaborative filtering framework combining clinical and non-clinical data for drug-ADR prediction. Specifically, ADRNet has a shallow collaborative filtering module and a deep drug representation module, which can exploit the high-dimensional drug descriptors to further guide the learning of low-dimensional ADR latent embeddings, thereby incorporating the benefits of both collaborative filtering and representation learning. Extensive experiments are conducted on two publicly available real-world drug-ADR clinical datasets and two non-clinical datasets to demonstrate the accuracy and efficiency of the proposed ADRNet.
- RESUsing Learnable Physics for Real-Time Exercise Form Recommendations
by Abhishek Jaiswal (Indian Institute of Technology Kanpur), Gautam Chauhan (Indian Institute of Technology Kanpur) and Nisheeth Srivastava (Indian Institute of Technology Kanpur).Good posture and form are essential for safe and productive exercising. Even in gym settings, trainers may not be readily available for feedback. Rehabilitation therapies and fitness workouts can thus benefit from recommender systems that provide real-time evaluation. In this paper, we present an algorithmic pipeline that can diagnose problems in exercise technique and offer corrective recommendations, with high sensitivity and specificity, in real-time. We use MediaPipe for pose recognition, count repetitions using peak-prominence detection, and use a learnable physics simulator to track motion evolution for each exercise. A test video is diagnosed based on deviations from the prototypical learned motion using statistical learning. The system is evaluated on six full and upper body exercises. These real-time interactive suggestions, delivered via low-cost equipment like smartphones, will allow exercisers to rectify potential mistakes, making self-practice feasible while reducing the risk of workout injuries.
- RESReCon: Reducing Congestion in Job Recommendation using Optimal Transport
by Yoosof Mashayekhi (Ghent University), Bo Kang (Ghent University), Jefrey Lijffijt (Ghent University) and Tijl de Bie (Ghent University).Recommender systems may suffer from congestion, meaning that there is an unequal distribution of the items in how often they are recommended. Some items may be recommended much more than others. Recommenders are increasingly used in domains where items have limited availability, such as the job market, where congestion is especially problematic: Recommending a vacancy—for which typically only one person will be hired—to a large number of job seekers may lead to frustration for job seekers, as they may be applying for jobs where they are not hired. This may also leave vacancies unfilled and result in job market inefficiency. We propose a novel approach to job recommendation called ReCon, accounting for the congestion problem. Our approach is to use an optimal transport component to ensure a more equal spread of vacancies over job seekers, combined with a job recommendation model in a multi-objective optimization problem. We evaluated our approach on two real-world job market datasets. The evaluation results show that ReCon has good performance on both congestion-related (e.g., Congestion) and desirability (e.g., NDCG) measures.
- RESOptimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning
by Ruiyang Xu (Meta AI), Jalaj Bhandari (Meta AI), Dmytro Korenkevych (Meta AI), Fan Liu (Meta), Yuchen He (Meta), Alex Nikulkov (Meta AI) and Zheqing Zhu (Meta AI).Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement a one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is based on dynamic programming ideas which show that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system, which handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.
- RESAnalysis Operations for Constraint-based Recommender Systems
by Sebastian Lubos (Institute of Software Technology – Graz University of Technology), Viet-Man Le (Graz University of Technology), Alexander Felfernig (TU Graz) and Thi Ngoc Trang Tran (Graz University of Technology).Constraint-based recommender systems support users in the identification of complex items such as financial services and digital cameras. Such recommender systems enable users to find an appropriate item within the scope of a conversational process. In this context, relevant items are determined by matching user preferences with a corresponding product (item) assortment on the basis of a pre-defined set of constraints. The development and maintenance of constraint-based recommenders is often an error-prone activity – specifically with regard to the scoping of the offered item assortment. In this paper, we propose a set of offline analysis operations that provide insights to assess the quality of a constraint-based recommender system before the system is deployed for productive use. The operations include, among others, automated analysis of feature restrictiveness and item (product) accessibility. We analyze usage scenarios of the proposed analysis operations on the basis of a simplified example digital camera recommender.
- RESBootstrapped Personalized Popularity for Cold Start Recommender Systems
by Iason Chaimalas (University College London), Duncan Walker (British Broadcasting Corporation), Edoardo Gruppi (University College London), Ben Clark (British Broadcasting Corporation) and Laura Toni (University College London).Recommender Systems are severely hampered by the well-known Cold Start problem, identified by the lack of information on new items and users. This has led to research efforts focused on data imputation and augmentation models as predominantly data pre-processing strategies, yet their improvement of cold-user performance is largely indirect and often comes at the price of a reduction in accuracy for warmer users. To address these limitations, we propose Bootstrapped Personalized Popularity (B2P), a novel framework that improves performance for cold users (directly) and cold items (implicitly) via popularity models personalized with item metadata. B2P is scalable to very large datasets and directly addresses the Cold Start problem, so it can complement existing Cold Start strategies. Experiments on a real-world Enterprise dataset (anonymized) and a public dataset demonstrate that B2P (1) significantly improves cold-user performance, (2) boosts warm-user performance for bootstrapped models by lowering their training sparsity, and (3) improves total recommendation accuracy at a competitive diversity level relative to existing high-performing Collaborative Filtering models. We demonstrate that B2P is a powerful and scalable framework for strongly cold datasets.
- RESBeyond the Sequence: Statistics-driven Pre-training for Stabilizing Sequential Recommendation Model
by Sirui Wang (Meituan Group), Peiguang Li (Meituan Group), Yunsen Xian (Meituan Group) and Hongzhi Zhang (Meituan Group).The sequential recommendation task aims to predict the item that a user is interested in according to his/her historical action sequence. However, inevitable random actions, i.e., a user randomly accessing an item among multiple candidates or clicking several items in random order, cause the sequence to fail to provide stable and high-quality signals. To alleviate this issue, we propose the StatisTics-Driven Pre-training framework (STDP for short). The main idea of the work lies in utilizing statistical information along with the pre-training paradigm to stabilize the optimization of the recommendation model. Specifically, we derive two types of statistical information: item co-occurrence across sequences and attribute frequency within the sequence. We design the following pre-training tasks: 1) The co-occurred items prediction task, which encourages the model to distribute its attention over multiple suitable targets instead of focusing only on the next item, which may be unstable. 2) We generate a paired sequence by replacing items with their co-occurred items and enforce its representation to be close to that of the original one, thus enhancing the model’s robustness to random noise. 3) To reduce the impact of randomness on the user’s long-term preferences, we encourage the model to capture sequence-level frequent attributes. The significant improvement over six datasets demonstrates the effectiveness and superiority of the proposal, and further analysis verifies the generalization of the STDP framework to other models.
- RESPersonalized Category Frequency prediction for Buy It Again recommendations
by Amit Pande (Target), Kunal Ghosh (Target) and Rankyung Park (Target).Buy It Again (BIA) recommendations are crucial to retailers to help improve user experience and site engagement by suggesting items that customers are likely to buy again based on their own repeat purchasing patterns. Most existing BIA studies analyze guests’ personalized behaviour at item granularity. This finer level of granularity might be appropriate for small businesses or small datasets for search purposes. However, this approach can be infeasible for big retailers like Amazon, Walmart, or Target which have hundreds of millions of guests and tens of millions of items. For such data sets, it is more practical to have a coarse-grained model that captures customer behaviour at the item category level. In addition, customers commonly explore variants of items within the same categories, e.g., trying different brands or flavors of yogurt. A category-based model may be more appropriate in such scenarios. We propose a recommendation system called a hierarchical PCIC model that consists of a personalized category model (PC model) and a personalized item model within categories (IC model). The PC model generates a personalized list of categories that customers are likely to purchase again. The IC model ranks the items that guests are likely to reconsume within a category. The hierarchical PCIC model captures the general consumption rate of products using survival models. Trends in consumption are captured using time series models. Features derived from these models are used in training a category-grained neural network. We compare PCIC to twelve existing baselines on four standard open datasets. PCIC improves NDCG up to 16% while improving recall by around 2%. We were able to scale and train (over 8 hours) PCIC on a large dataset of 100M guests and 3M items where repeat categories of a guest outnumber repeat items. PCIC was deployed and A/B tested on the site of a major retailer, leading to significant gains in guest engagement.
- RESGenerative Next-Basket Recommendation
by Wenqi Sun (Renmin University of China), Ruobing Xie (WeChat, Tencent), Junjie Zhang (Renmin University of China), Wayne Xin Zhao (Renmin University of China), Leyu Lin (WeChat Search Application Department, Tencent) and Ji-Rong Wen (Renmin University of China).Next-basket Recommendation (NBR) refers to the task of predicting a set of items that a user will purchase in the next basket. However, most existing works merely focus on the relevance between user preferences and predicted items, ignoring the essential relationships among items in the next basket, which often results in over-homogenization of items. In this work, we present a novel Generative next-basket Recommendation model (GeRec), a new NBR paradigm that generates the recommended items one by one to form the next basket via an autoregressive decoder. This generative NBR paradigm helps capture and consider item relationships inside each basket in both training and serving. Moreover, we jointly consider the user’s item-level and basket-level contextual information to better capture the user’s multi-granularity preferences. Extensive experiments on three real-world datasets demonstrate the effectiveness of our model.
- RESAdversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application
by Jianjun Yuan (Expedia Group), Wei Lee Woon (Expedia Group) and Ludovik Coba (Expedia Group).This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by $O(kN^2\sqrt{T\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.
- RESCollaborative filtering algorithms are prone to mainstream-taste bias
by Pantelis Analytis (University of Southern Denmark) and Philipp Hager (University of Amsterdam).Collaborative filtering has been the main steam engine of the recommender systems community since the early 1990s. Collaborative filtering (and other) algorithms, however, have been predominantly evaluated by aggregating results across users or user groups. These performance averages hide large disparities: an algorithm may perform very well for some users (or groups) and very poorly for others. We show that performance variation is large and systematic. In experiments on three large scale datasets and using an array of collaborative filtering algorithms, we demonstrate the large performance disparities for different users across algorithms and datasets. We then show that performance variation is systematic and that two key features that characterize users, their mean taste similarity with other users and the dispersion in taste similarity, can explain performance variation better than previously identified features. We use these two features to visualize algorithm performance for different users, and point out that this mapping can be used to capture different categories of users that have been proposed before. Our results demonstrate an extensive mainstream-taste bias in all collaborative filtering algorithms, and they imply a fundamental fairness limitation that needs to be mitigated.
- RESHessian-aware Quantized Node Embeddings for Recommendation
by Huiyuan Chen (Visa Research), Kaixiong Zhou (Rice University), Kwei-Herng Lai (Rice University), Chin-Chia Michael Yeh (Visa Research), Yan Zheng (Visa Research), Xia Hu (Rice University) and Hao Yang (Visa Research).Graph Neural Networks (GNNs) have achieved state-of-the-art performance in recommender systems. Nevertheless, the process of searching and ranking from a large item corpus usually incurs high latency, which limits the widespread deployment of GNNs in industry-scale applications. To address this issue, many methods quantize user/item representations into the binary embedding space to reduce space requirements and accelerate inference. Also, they use the Straight-through Estimator (STE) to prevent zero gradients during back-propagation. However, the STE often causes a gradient mismatch problem, leading to sub-optimal results.
In this work, we present the Hessian-aware Quantized GNN (HQ-GNN) as an effective solution for discrete representations of users/items that enable fast retrieval. HQ-GNN is composed of two components: a GNN encoder for learning continuous node embeddings and a quantized module for compressing full-precision embeddings into low-bit ones. Consequently, HQ-GNN benefits from both lower memory requirements and faster inference speeds compared to vanilla GNNs. To address the gradient mismatch problem in STE, we further consider the quantized errors and their second-order derivatives for better stability. The experimental results on several large-scale datasets show that HQ-GNN achieves a good balance between latency and performance.
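As background for the Straight-through Estimator mentioned in the abstract above, the following sketch shows the basic idea on a single embedding vector: the forward pass binarizes with a sign function, while the backward pass passes the upstream gradient through as if the quantizer were the identity. The clipping of gradients outside `[-clip, clip]` is one common variant and is an assumption here; the function names are hypothetical, and the Hessian-aware correction of HQ-GNN is not shown.

```python
def binarize(embedding):
    """Forward pass: map each coordinate to +1.0 or -1.0 (zero maps to +1.0)."""
    return [1.0 if x >= 0 else -1.0 for x in embedding]


def ste_backward(embedding, upstream_grad, clip=1.0):
    """Backward pass of a clipped straight-through estimator.

    The true derivative of the sign function is zero almost everywhere, so the
    estimator copies the upstream gradient wherever |x| <= clip instead.
    """
    return [g if abs(x) <= clip else 0.0 for x, g in zip(embedding, upstream_grad)]


# Made-up full-precision embedding and upstream gradient.
embedding = [0.3, -1.7, 0.0]
upstream_grad = [0.5, 0.5, -2.0]
codes = binarize(embedding)
grads = ste_backward(embedding, upstream_grad)
```

The forward pass uses the binarized codes but the backward pass differentiates a different (identity-like) function, and this discrepancy is the gradient mismatch the abstract refers to.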
- RESScalable Approximate NonSymmetric Autoencoder for Collaborative Filtering
by Martin Spišák (GLAMI.cz and Faculty of Mathematics and Physics, Charles University, Prague, Czechia), Radek Bartyzal (GLAMI.cz), Antonín Hoskovec (GLAMI.cz and Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Czechia), Ladislav Peška (Faculty of Mathematics and Physics, Charles University, Prague, Czechia) and Miroslav Tůma (Faculty of Mathematics and Physics, Charles University, Prague, Czechia).In the field of recommender systems, shallow autoencoders have recently gained significant attention. One of the most highly acclaimed shallow autoencoders is EASE, favored for its competitive recommendation accuracy and simultaneous simplicity. However, the poor scalability of EASE (both in time and especially in memory) severely restricts its use in production environments with vast item sets. In this paper, we propose a hyperefficient factorization technique for sparse approximate inversion of the data-Gram matrix used in EASE. The resulting autoencoder, SANSA, is an end-to-end sparse solution with prescribable density and almost arbitrarily low memory requirements (even for training). As such, SANSA allows us to effortlessly scale the concept of EASE to millions of items and beyond.
- RESM3REC: A Meta-based Multi-scenario Multi-task Recommendation Framework
by Zerong Lan (Dalian University of Technology), Yingyi Zhang (Dalian University of Technology) and Xianneng Li (Dalian University of Technology).Users in recommender systems exhibit multi-behavior in multiple business scenarios on real-world e-commerce platforms. A crucial challenge in such systems is to make recommendations for each business scenario at the same time. On top of this, multiple predictions (e.g., Click Through Rate and Conversion Rate) need to be made simultaneously in order to improve the platform revenue. Research on making recommendations for several business scenarios falls in the field of Multi-Scenario Recommendation (MSR), while Multi-Task Recommendation (MTR) mainly attempts to solve the problems that arise in collaboratively executing different recommendation tasks. However, existing research has paid attention to either MSR or MTR, ignoring the integration of the two, which faces the issue of conflict between scenarios and tasks. To address this issue, we propose a Meta-based Multi-scenario Multi-task RECommendation framework (M3REC) to serve multiple tasks in multiple business scenarios with a unified model. Integrating MSR and MTR in a proper manner is non-trivial due to: 1) Unified representation problem: Users’ and items’ representations are non-i.i.d. across scenarios and tasks, which introduces inconsistency into recommendations. 2) Synchronous optimization problem: Task distributions vary across scenarios, and a unified optimization method is needed to optimize multiple tasks in multiple scenarios. Thus, to represent users and items in a unified way, we design a Meta-Item-Embedding Generator (MIEG) and a User-Preference Transformer (UPT). The MIEG module generates initialized item embeddings from item features through meta-learning, and the UPT module transfers user preferences from other scenarios. Besides, the M3REC framework uses a specifically designed backbone network together with a task-specific aggregate gate to optimize multiple tasks in multiple business scenarios within one model. Experiments on two public datasets show that M3REC outperforms the compared state-of-the-art MSR and MTR methods.
- RESLarge Language Model Augmented Narrative Driven Recommendations
by Sheshera Mysore (University of Massachusetts Amherst), Andrew McCallum (University of Massachusetts) and Hamed Zamani (University of Massachusetts Amherst).Narrative-driven recommendation (NDR) presents an information access problem where users solicit recommendations with verbose descriptions of their preferences and context, for example, travelers soliciting recommendations for points of interest while describing their likes/dislikes and travel circumstances. These requests are increasingly important with the rise of natural language-based conversational interfaces for search and recommendation systems. However, NDR lacks abundant training data for models, and current platforms commonly do not support these requests. Fortunately, classical user-item interaction datasets contain rich textual data, e.g., reviews, which often describe user preferences and context; this may be used to bootstrap training for NDR models. In this work, we explore using large language models (LLMs) for data augmentation to train NDR models. We use LLMs for authoring synthetic narrative queries from user-item interactions with few-shot prompting and train retrieval models for NDR on synthetic queries and user-item interaction data. Our experiments demonstrate that this is an effective strategy for training small-parameter retrieval models that outperform other retrieval and LLM baselines for narrative-driven recommendation.
- RESIncorporating Time in Sequential Recommendation Models
by Mostafa Rahmani (Amazon), James Caverlee (Amazon) and Fei Wang (Amazon).Sequential models are designed to learn sequential patterns in data based on the chronological order of user interactions. However, they often ignore the timestamps of these interactions. Incorporating time is crucial because many sequential patterns are time-dependent, and the model cannot make time-aware recommendations without considering time. This article demonstrates that providing a rich representation of time can significantly improve the performance of sequential models. The existing literature treats time as a one-dimensional time-series obtained by quantizing time. In this study, we propose treating time as a multi-dimensional time-series and explore representation learning methods, including a kernel based method and an embedding-based algorithm. Experiments on multiple datasets show that the inclusion of time significantly enhances the model’s performance, and multi-dimensional methods outperform the one-dimensional method by a substantial margin.
- RESEnhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation
by Vivian Lai (Visa Research), Huiyuan Chen (Visa Research), Chin-Chia Michael Yeh (Visa Research), Minghua Xu (Visa Research), Yiwei Cai (Visa Research) and Hao Yang (Visa Research).Transformers have become the favored model for sequential recommendation. However, previous studies rely on extensive data, such as massive pre-training or repeated data augmentation, leading to optimization-related problems, such as initialization sensitivity and large batch-size memory bottleneck. In this work, we examine Transformers’ loss geometry to improve the models’ data efficiency during training and generalization. By utilizing a newly introduced sharpness-aware optimizer to promote smoothness, we significantly enhance the accuracy and robustness of SASRec, a Transformer model, on various datasets. When trained on sequential data without significant pre-training or data augmentation, the resulting SASRec outperforms S3Rec and CL4Rec, both of which are of comparable size and throughput.
- RESAdaptive Collaborative Filtering with Personalized Time Decay Functions for Financial Product Recommendation
by Ashraf Ghiye (École Polytechnique), Baptiste Barreau (BNP Paribas CIB – Global Markets), Laurent Carlier (BNP Paribas CIB – Global Markets) and Michalis Vazirgiannis (École Polytechnique).Classical recommender systems often assume that historical data are stationary and fail to account for the dynamic nature of user preferences, limiting their ability to provide reliable recommendations in time-sensitive settings. This assumption is particularly problematic in finance, where financial products exhibit continuous changes in valuations, leading to frequent shifts in client interests. These evolving interests, summarized in the past client-product interactions, see their utility fade over time with a degree that might differ from one client to another. To address this challenge, we propose a time-dependent collaborative filtering algorithm that can adaptively discount distant client-product interactions using personalized decay functions. Our approach is designed to handle the non-stationarity of financial data and produce reliable recommendations by modeling the dynamic collaborative signals between clients and products. We evaluate our method using a proprietary dataset from BNP Paribas and demonstrate significant improvements over state-of-the-art benchmarks from relevant literature. Our findings emphasize the importance of incorporating time explicitly in the model to enhance the accuracy of financial product recommendation.
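The idea of discounting distant interactions with a client-specific decay can be sketched as follows. The exponential form, the per-client `decay_rate` values, and the function name are assumptions chosen for illustration; the abstract does not specify the shape of the decay functions, and in the paper they are part of a collaborative filtering model rather than fixed by hand.

```python
import math


def decayed_interest(timestamps, now, decay_rate):
    """Sum of exponentially discounted past interactions with one product.

    Each interaction at time t contributes exp(-decay_rate * (now - t)),
    so decay_rate = 0 counts all interactions equally.
    """
    return sum(math.exp(-decay_rate * (now - t)) for t in timestamps)


# Two hypothetical clients with identical histories (in days) but different decay rates.
history = [0, 5, 9]
now = 10
slow_forgetting = decayed_interest(history, now, decay_rate=0.1)
fast_forgetting = decayed_interest(history, now, decay_rate=0.5)
no_decay = decayed_interest(history, now, decay_rate=0.0)
```

With the same history, the client with the larger decay rate ends up with a lower score, because older interactions contribute less to their current interest.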
- RESPrivate Matrix Factorization with Public Item Features
by Mihaela Curmei (University of California, Berkeley), Walid Krichene (Google Research) and Li Zhang (Google Research).We consider the problem of training private recommendation models with access to public item features. Training with Differential Privacy (DP) offers strong privacy guarantees, at the expense of loss in recommendation quality. We show that incorporating public item features during training can help mitigate this loss in quality. We propose a general approach based on collective matrix factorization, that works by simultaneously factorizing two matrices: the user feedback matrix (representing sensitive data) and an item feature matrix that encodes publicly available (non-sensitive) item information.
The method is conceptually simple, easy to tune, and highly scalable. It can be applied to different types of public data, including: (1) categorical item features; (2) item-item similarities learned from public sources; and (3) publicly available user feedback.
Evaluating our method on a standard DP recommendation benchmark, we find that using public item features significantly narrows the quality gap between the private models and their non-private counterpart. As privacy constraints become more stringent, the increased reliance on public side features leads to recommendations becoming more depersonalized, resulting in a smooth transition from collaborative filtering to item-based contextual recommendations.
- RESDeliberative Diversity for News Recommendations: Operationalization and Experimental User Study
by Lucien Heitz (University of Zurich), Juliane A. Lischka (University of Hamburg), Rana Abdullah (University of Hamburg), Laura Laugwitz (University of Hamburg), Hendrik Meyer (University of Hamburg) and Abraham Bernstein (University of Zurich).News recommender systems are an increasingly popular field of study that attracts a growing, interdisciplinary research community. As these systems play an important role in our daily lives, the mechanisms behind their curation processes are under close scrutiny. In the domain of personalized news, many platforms make design choices that are driven by economic incentives. In contrast to such systems that optimize for financial gains, there exist norm-driven diversity objectives, which put normative and democratic goals first. Their impact on users, however, in terms of triggering behavioral changes or affecting knowledgeability, is still under-researched. In this paper, we contribute to the field of news recommender system design by conducting a user study that looks at the impact of these normative approaches. We (a) operationalize the notion of deliberative democracy for news recommendations, (b) show the impact on political knowledgeability, and (c) show the influence on voting behavior. We found that exposure to small parties is associated with an increase in knowledge about their candidates and that intensive news consumption about a party can change the direction of attitudes towards their issues.
- RESCo-occurrence Embedding Enhancement for Long-tail Problem in Multi-Interest Recommendation
by Yaokun Liu (Tianjin University), Xiaowang Zhang (Tianjin University), Minghui Zou (Tianjin University) and Zhiyong Feng (Tianjin University).Multi-interest recommendation methods extract multiple interest vectors to represent the user comprehensively. Despite their success in the matching stage, previous works overlook the long-tail problem. This results in the model excelling at suggesting head items, while the performance for tail items, which make up more than 70% of all items, remains suboptimal. Hence, enhancing the tail item recommendation capability holds great potential for improving the performance of the multi-interest model.
Through experimental analysis, we reveal that the insufficient context for embedding learning is the reason behind the underperformance of tail items. Meanwhile, we face two challenges in addressing this issue: the absence of supplementary item features and the need to maintain head item performance. To tackle these challenges, we propose a CoLT module (Co-occurrence embedding enhancement for Long-Tail problem) that replaces the embedding layer of existing multi-interest frameworks. By linking co-occurring items to establish “assistance relationships”, CoLT aggregates information from relevant head items into tail item embeddings and enables joint gradient updates. Experiments on three datasets show our method outperforms SOTA models by 21.86% Recall@50 and improves the Recall@50 of tail items by 14.62% on average.
- RESExtended conversion: Capturing successful interactions in voice shopping
by Elad Haramaty (Amazon), Zohar Karnin (Amazon), Arnon Lazerson (Amazon), Liane Lewin-Eytan (Amazon Research) and Yoelle Maarek (Amazon).Being able to measure the success of online shopping interactions is crucial in order to evaluate and optimize the performance of e-commerce systems. We consider the domain of voice shopping, supported by digital voice-based assistants, where measuring successful interactions poses a challenge. Unlike Web shopping, which offers a rich amount of behavioral signals such as clicks, in voice shopping a non-negligible amount of shopping interactions frequently end without any immediate explicit or implicit user behavioral signal. Moreover, users may start their journey using voice, but finish elsewhere, for example by using their mobile app or Web. We explore the challenge of measuring successful interactions in voice product search based on users’ feedback, and propose a medium-term reward metric named Extended ConVersion (ECVR). ECVR extends the notion of conversion (purchase action), which is a clear and natural indication of success for an e-commerce system. The strength of this new metric is that it captures not only immediate conversion, but also a conversion that is part of the same user shopping journey but is performed at a later stage, possibly using a different medium. We provide multiple ways of evaluating the quality of a metric, and use these to explore different parameters leading to different variants of ECVR. After finalizing these parameters, we show that a ranking system optimized for the proposed ECVR leads to an improvement in long-term engagement and revenue, without compromising immediate gains.
- RESOn the Consistency of Average Embeddings for Item Recommendation
by Walid Bendada (Deezer Research & LAMSADE, Université Paris Dauphine – PSL), Guillaume Salha-Galvan (Deezer Research), Romain Hennequin (Deezer Research), Thomas Bouabça (Deezer Research) and Tristan Cazenave (LAMSADE Université Paris Dauphine PSL – CNRS).A prevalent practice in recommender systems consists of averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.
- RESIntegrating the ACT-R Framework with Collaborative Filtering for Explainable Sequential Music Recommendation
by Marta Moscati (Johannes Kepler University Linz), Christian Wallmann (Johannes Kepler University Linz), Markus Reiter-Haas (Graz University of Technology), Dominik Kowald (Know-Center GmbH and Graz University of Technology), Elisabeth Lex (Graz University of Technology) and Markus Schedl (Johannes Kepler University Linz).Music listening sessions often consist of sequences including repeating tracks. Modeling such relistening behavior with models of human memory has been proven effective in predicting the next track of a session. However, these models intrinsically lack the capability of recommending novel tracks that the target user has not listened to in the past. Collaborative filtering strategies, on the contrary, provide novel recommendations by leveraging past collective behaviors but are often limited in their ability to provide explanations. To narrow this gap, we propose four hybrid algorithms that integrate collaborative filtering with the cognitive architecture ACT-R. We compare their performance in terms of accuracy, novelty, diversity, and popularity bias, to baselines of different types, including pure ACT-R, kNN-based, and neural-networks-based approaches. We show that the proposed algorithms are able to achieve the best performances in terms of novelty and diversity, and simultaneously achieve a higher accuracy of recommendation with respect to pure ACT-R models. Furthermore, we illustrate how the proposed models can provide explainable recommendations.
- RESWidespread flaws in offline evaluation of recommender systems
by Balázs Hidasi (Gravity R&D, a Taboola company) and Ádám Tibor Czapp (Gravity R&D, a Taboola company).Even though offline evaluation is just an imperfect proxy of online performance, due to the interactive nature of recommenders, it will probably remain the primary way of evaluation in recommender systems research for the foreseeable future, since the proprietary nature of production recommenders prevents independent validation of A/B test setups and verification of online results. Therefore, it is imperative that offline evaluation setups are as realistic and as flawless as they can be. Unfortunately, evaluation flaws are quite common in recommender systems research nowadays, due to later works copying flawed evaluation setups from their predecessors without questioning their validity. In the hope of improving the quality of offline evaluation of recommender systems, we discuss four of these widespread flaws and why researchers should avoid them.
- RESTowards Sustainability-aware Recommender Systems: Analyzing the Trade-off Between Algorithms Performance and Carbon Footprint
by Giuseppe Spillo (University of Bari), Allegra De Filippo (University of Bologna), Cataldo Musto (Dipartimento di Informatica – University of Bari), Michela Milano (University of Bologna) and Giovanni Semeraro (University of Bari).In this paper, we present a comparative analysis of the trade-off between the performance of state-of-the-art recommendation algorithms and their sustainability. In particular, we compared 18 popular recommendation algorithms in terms of both standard metrics (i.e., accuracy and diversity of the recommendations) as well as in terms of energy consumption and carbon footprint on three different datasets. In order to obtain a fair comparison, all the algorithms were run based on the implementations available in a popular recommendation library, i.e., RecBole, and used the same experimental settings. The outcomes of the experiments clearly showed that the choice of the optimal recommendation algorithm requires a thorough analysis, since more sophisticated algorithms often led to tiny improvements at the cost of an exponential increase of carbon emissions. Through this paper, we aim to shed light on the problem of carbon footprint and energy consumption of recommender systems, and we make the first step towards the development of sustainability-aware recommendation algorithms.
- RESGroup Fairness for Content Creators: the Role of Human and Algorithmic Biases under Popularity-based Recommendations
by Stefania Ionescu (University of Zurich), Aniko Hannak (University of Zurich) and Nicolo Pagan (University of Zurich). The Creator Economy faces concerning levels of unfairness. Content creators (CCs) publicly accuse platforms of purposefully reducing the visibility of their content based on protected attributes, while platforms place the blame on viewer biases. Meanwhile, prior work warns about the “rich-get-richer” effect perpetuated by existing popularity biases in recommender systems: any initial advantage in visibility will likely be exacerbated over time. What remains unclear is how the biases based on protected attributes from platforms and viewers interact and contribute to the observed inequality in the context of popularity-biased recommender systems. The difficulty of the question lies in the complexity and opacity of the system. To overcome this challenge, we create a simple agent-based model (ABM) that unifies the platform systems which allocate the visibility of CCs (e.g., recommender systems, moderation) into a single popularity-based function, which we call the visibility allocation system (VAS). Through simulations, we find that although viewer homophilic biases do alone create inequalities, small levels of additional biases in VAS are more harmful. From the perspective of interventions, our results suggest that (a) attempts to reduce attribute-biases in moderation and recommendations should precede those reducing viewer homophilic tendencies, (b) decreasing the popularity-biases in VAS decreases but does not eliminate inequalities, (c) boosting the visibility of protected CCs to overcome viewer homophily with respect to one metric is unlikely to produce fair outcomes with respect to all metrics, and (d) the process is also unfair for viewers and this unfairness could be overcome through the same interventions. More generally, this work demonstrates the potential of using ABMs to better understand the causes and effects of biases and interventions within complex sociotechnical systems.
- RESProviding Previously Unseen Users Fair Recommendations Using Variational Autoencoders
by Bjørnar Vassøy (Norwegian University of Science and Technology (NTNU)), Helge Langseth (Norwegian University of Science and Technology (NTNU)) and Benjamin Kille (Norwegian University of Science and Technology (NTNU)).An emerging definition of fairness in machine learning requires that models are oblivious to demographic user information, e.g., a user’s gender or age should not influence the model. Personalized recommender systems are particularly prone to violating this definition through their explicit user focus and user modelling. Explicit user modelling is also an aspect that makes many recommender systems incapable of providing hitherto unseen users with recommendations. We propose novel approaches for mitigating discrimination in Variational Autoencoder-based recommender systems by limiting the encoding of demographic information. The approaches are capable of, and evaluated on, providing entirely new users with fair recommendations.
- RESScalable Deep Q-Learning for Session-Based Slate Recommendation
by Aayush Singha Roy (Insight Centre for Data Analytics, University College Dublin), Edoardo D’Amico (Insight Centre for Data Analytics, University College Dublin), Elias Tragos (Insight Centre for Data Analytics, University College Dublin), Aonghus Lawlor (Insight Centre for Data Analytics, University College Dublin) and Neil Hurley (Insight Centre for Data Analytics, University College Dublin).Reinforcement learning (RL) has demonstrated great potential to improve slate-based recommender systems by optimizing recommendations for long-term user engagement. To handle the combinatorial action space in slate recommendation, recent works decompose the Q-value of a slate into item-wise Q-values, using an item-wise value-based policy. However, the common case where the value function is a parameterized function taking state and action as input results in a linearly increasing number of evaluations required to select an action, proportional to the number of candidate items. While slow training may be acceptable, this becomes intractable when considering the costly evaluation of the parameterized function, such as with deep neural networks, during model serving time. To address this issue, we propose an actor-based policy that reduces the evaluation of the Q-function to a subset of items, significantly reducing inference time and enabling practical deployment in real-world industrial settings. In our empirical evaluation, we demonstrate that our proposed approach achieves equivalent user session engagement to a value-based policy, while significantly reducing the slate serving time by at least 4 times.
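The idea of evaluating the Q-function on only a subset of candidates can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' architecture: the helper names (`select_slate`, `toy_actor`, `toy_q`), the dot-product shortlist, and the two-dimensional embeddings are all hypothetical. The shortlist step here is still a linear scan with a cheap dot product; a real system would likely replace it with an approximate nearest-neighbour index.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def select_slate(state, item_embeddings, actor, q_function, subset_size, slate_size):
    """Shortlist items near the actor's proposal, then run the costly Q-function only on them."""
    proto = actor(state)  # actor proposes a point in item-embedding space
    shortlist = sorted(
        item_embeddings,
        key=lambda item: dot(proto, item_embeddings[item]),
        reverse=True,
    )[:subset_size]
    # The expensive Q-function is evaluated subset_size times, not once per candidate.
    scored = sorted(
        ((q_function(state, item_embeddings[item]), item) for item in shortlist),
        reverse=True,
    )
    return [item for _, item in scored[:slate_size]]

# Toy stand-ins (assumptions): an identity actor and a dot-product "Q-network" that counts its calls.
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
calls = {"q": 0}

def toy_actor(state):
    return state

def toy_q(state, embedding):
    calls["q"] += 1
    return dot(state, embedding)

slate = select_slate([1.0, 0.2], items, toy_actor, toy_q, subset_size=3, slate_size=2)
```

With four candidates and `subset_size=3`, the Q-function runs three times instead of four. The saving grows with catalogue size because the number of Q evaluations is fixed by `subset_size`.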
- RESCR-SoRec: BERT driven Consistency Regularization for Social Recommendation
by Tushar Prakash (Sony Research India), Raksha Jalan (Sony Research India), Brijraj Singh (Sony Research India) and Naoyuki Onoe (Sony). In the real world, when we seek our friends’ opinions on various items or events, we request verbal social recommendations. It has been observed that we often turn to our friends for recommendations on a daily basis. The emergence of online social platforms has enabled users to share their opinions with their social connections. Therefore, we should consider users’ social connections to enhance online recommendation performance. Social recommendation aims to fuse social links with user-item interactions to offer more relevant recommendations. Several efforts have been made to develop an effective social recommendation system. However, there are two significant limitations to current methods: First, they haven’t thoroughly explored the intricate relationships between the diverse influences of neighbours on users’ preferences. Second, existing models are vulnerable to overfitting due to the relatively low number of user-item interaction records in the interaction space. For the aforementioned problems, this paper offers a novel framework called CR-SoRec, an effective recommendation model based on BERT and Consistency Regularization. This model incorporates Bidirectional Encoder Representations from Transformers (BERT) to learn bidirectional context-aware user and item embeddings with neighbourhood sampling. The neighbourhood sampling technique samples the most influential neighbours for all the users/items. Further, to effectively use the available user-item interaction data and social ties, we leverage diverse perspectives via consistency regularisation to harness the underlying information. The main objective of our model is to predict the next item that a user would interact with based on their interaction behaviour and social connections. Experimental results show that our model defines a new state-of-the-art on various datasets and outperforms previous work by a significant margin. Extensive experiments are conducted to analyse the method. We release the source code of our model at https://anonymous.4open.science/r/CR-SoRec-68F4
- RESLarge Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
by Scott Sanner (Google), Krisztian Balog (Google), Filip Radlinski (Google), Ben Wedin (Google) and Lucas Dixon (Google).Traditional recommender systems leverage users’ item preference history to recommend novel content that users may like. However, dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.
- RESInterface Design to Mitigate Inflation in Recommender Systems
by Rana Shahout (Technion), Yehonatan Peisakhovsky (Technion), Sasha Stoikov (Cornell Tech) and Nikhil Garg (Cornell Tech).Recommendation systems rely on user-provided data to learn about item quality and provide personalized recommendations. An implicit assumption when aggregating ratings into item quality is that ratings are strong indicators of item quality. In this work, we analyze this assumption using data collected from a music discovery application. Our study focuses on two factors that cause rating inflation: heterogeneous user rating behavior and the dynamics of personalized recommendations. We show that user rating behavior is significantly variable, leading to item quality estimates that reflect the users who rated an item more than the item quality itself. Additionally, items that are more likely to be shown via personalized recommendations can experience a substantial increase in their exposure and potential bias toward them. To mitigate these effects, we conducted a randomized controlled trial where the rating interface was modified. This resulted in a substantial improvement in user rating behavior and a reduction in item quality inflation. These findings highlight the importance of carefully considering the assumptions underlying recommendation systems and designing interfaces that encourage accurate rating behavior.
- RESTowards Self-Explaining Sequence-Aware Recommendation
by Alejandro Ariza-Casabona (University of Barcelona), Maria Salamo (Universitat de Barcelona), Ludovico Boratto (University of Cagliari) and Gianni Fenu (University of Cagliari).Self-explaining models are becoming an important perk of recommender systems, as they help users understand the reason behind certain recommendations, which encourages them to interact more often with the platform. In order to personalize recommendations, modern recommender approaches make the model aware of the user behavior history for interest evolution representation. However, existing explainable recommender systems do not consider the past user history to further personalize the explanation based on the user interest fluctuation. In this work, we propose a SEQuence-Aware Explainable Recommendation model (SEQUER) that is able to leverage the sequence of user-item review interactions to generate better explanations while maintaining recommendation performance. Experiments validate the effectiveness of our proposal on multiple recommendation scenarios. Our source code and preprocessed datasets are available at https://tinyurl.com/SEQUER-RECSYS23.
- RESLooks Can Be Deceiving: Linking User-Item Interactions and User’s Propensity Towards Multi-Objective Recommendations
by Patrik Dokoupil (Department of Software Engineering, Charles University), Ladislav Peska (Faculty of Mathematics and Physics, Charles University, Prague, Czechia) and Ludovico Boratto (University of Cagliari). Multi-objective recommender systems (MORS) provide suggestions to users according to multiple (and possibly conflicting) goals. When a system optimizes its results at the individual-user level, it tailors them to a user’s propensity towards the different objectives. Hence, the capability to understand users’ fine-grained needs towards each goal is crucial. In this paper, we present the results of a user study in which we monitored the way users interacted with recommended items, as well as their self-proclaimed propensities towards relevance, novelty and diversity objectives. The study was divided into several sessions, where users evaluated recommendation lists originating from a relevance-only single-objective baseline as well as MORS. We show that although MORS-based recommendations attracted fewer selections, their presence in the early sessions is crucial for users’ satisfaction in the later stages. Surprisingly, the self-proclaimed willingness of users to interact with novel and diverse items is not always reflected in the recommendations they accept. Post-study questionnaires provide insights on how to deal with this matter, suggesting that MORS-based results should be accompanied by elements that allow users to understand the recommendations, so as to facilitate their acceptance.
- RESTi-DC-GNN: Incorporating Time-Interval Dual Graphs for Recommender Systems
by Nikita Severin (HSE University), Andrey Savchenko (Sber AI Lab), Dmitrii Kiselev (Artificial Intelligence Research Institute (AIRI)), Maria Ivanova (Sber AI Lab), Ivan Kireev (Sber AI Lab) and Ilya Makarov (Artificial Intelligence Research Institute (AIRI)). Recommender systems are essential for personalized content delivery and have become increasingly popular in recent years. However, traditional recommender systems are limited in their ability to capture complex relationships between users and items. Recently, dynamic graph neural networks (DGNNs) have emerged as a promising solution for improving recommender systems by incorporating temporal and sequential information in dynamic graphs. In this paper, we propose a novel method, “Ti-DC-GNN” (Time-Interval Dual Causal Graph Neural Networks), based on an intermediate representation of graph evolution as a sequence of time-interval graphs. The main parts of the method are two novel forms of interval graphs, the graph of causality and the graph of consequence, which explicitly preserve inter-relationships between edges (user-item interactions). Local and global message passing are developed based on edge memory to identify both short-term and long-term dependencies. Experiments on several well-known datasets show that our method consistently outperforms modern temporal GNNs with node memory alone in dynamic edge prediction tasks.
- RESOf Spiky SVDs and Music Recommendation
by Darius Afchar (Deezer Research), Romain Hennequin (Deezer Research) and Vincent Guigue (AgroParisTech).The truncated singular value decomposition is a widely used methodology in music recommendation for direct similar-item retrieval or embedding musical items for downstream tasks. This paper investigates a curious effect that we show naturally occurring on many recommendation datasets: spiking formations in the embedding space. We first propose a metric to quantify this spiking organization’s strength, then mathematically prove its origin tied to underlying communities of items of varying internal popularity. With this new-found theoretical understanding, we finally open the topic with an industrial use case of estimating how music embeddings’ top-k similar items will change over time under the addition of data.
- RESTopic-Level Bayesian Surprise and Serendipity for Recommender Systems
by Tonmoy Hasan (UNC Charlotte) and Razvan Bunescu (UNC Charlotte).A recommender system that optimizes its recommendations solely to fit a user’s history of ratings for consumed items can create a filter bubble, wherein the user does not get to experience items from novel, unseen categories. One approach to mitigate this undesired behavior is to recommend items with high potential for serendipity, namely surprising items that are likely to be highly rated. In this paper, we propose a content-based formulation of serendipity that is rooted in Bayesian surprise and use it to measure the serendipity of items after they are consumed and rated by the user. When coupled with a collaborative-filtering component that identifies similar users, this enables recommending items with high potential for serendipity. To facilitate the evaluation of topic-level models for surprise and serendipity, we introduce a dataset of book reading histories extracted from Goodreads, containing over 26 thousand users and close to 1.3 million books, where we manually annotate 450 books read by 4 users in terms of their time-dependent, topic-level surprise. Experimental evaluations show that models that use Bayesian surprise correlate much better with the manual annotations of topic-level surprise than distance-based heuristics, and also obtain better serendipitous item recommendation performance.
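Bayesian surprise is commonly defined as the KL divergence between a posterior and a prior belief. The sketch below is a deliberately simplified illustration of that idea at the topic level, not the paper's model: it compares the smoothed topic distribution of a user's reading history before and after a new book is added. The function names, the additive smoothing constant `alpha`, and the three-topic counts are assumptions. A fuller treatment would compare distributions over model parameters rather than their means.

```python
import math

def topic_distribution(counts, alpha=1.0):
    """Smoothed topic proportions (additive smoothing with pseudo-count alpha)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same topics."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def surprise(history_counts, item_counts, alpha=1.0):
    """Simplified topic-level surprise: KL(belief after the item || belief before it)."""
    prior = topic_distribution(history_counts, alpha)
    updated = [h + i for h, i in zip(history_counts, item_counts)]
    posterior = topic_distribution(updated, alpha)
    return kl_divergence(posterior, prior)

# Hypothetical user: 40 units of topic 0, 10 of topic 1, none of topic 2.
history = [40, 10, 0]
familiar = surprise(history, [5, 0, 0])  # a book on the user's dominant topic
novel = surprise(history, [0, 0, 5])     # a book on a topic the user has never read
```

Under these assumptions the book on the unseen topic yields a much larger divergence than the book on the familiar topic, which is the behaviour a surprise measure is meant to capture.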
- RESProgressive Horizon Learning: Adaptive Long Term Optimization for Personalized Recommendation
by Congrui Yi (Amazon), David Zumwalt (Amazon), Zijian Ni (Amazon) and Shreya Chakrabarti (Amazon). As B2C companies such as Amazon, Netflix, and Spotify scale, personalized recommender systems are often needed to further drive long-term business growth in acquisition, engagement, and retention of customers. However, long-term metrics associated with these goals can require several months to mature. Additionally, deep personalization also demands a large volume of training data that takes a long time to collect. These factors incur substantial lead time for training a model to optimize a long-term metric. Before such a model is deployed, a recommender system has to rely on a simple policy (e.g., random) to collect customer feedback data for training, inflicting high opportunity cost and delaying optimization of the target metric. Besides, as customer preferences can shift over time, a large temporal gap between inputs and outcome poses a high risk of data staleness and suboptimal learning. Existing approaches involve various compromises. For instance, contextual bandits often optimize short-term surrogate metrics with simple model structure, which can be suboptimal in the long run, while Reinforcement Learning approaches rely on an abundance of historical data for offline training, which essentially means long lead time before deployment. To address these problems, we propose Progressive Horizon Learning Recommender (PHLRec), a personalized model that can progressively learn metric patterns and adaptively evolve from short- to long-term optimization over time. Through simulations and real data experiments, we demonstrate that PHLRec outperforms competing methods, achieving optimality in both deployment speed and long-term metric performance.
- RESStability of Explainable Recommendation
by Sairamvinay Vijayaraghavan (Department of Computer Science, University of California, Davis) and Prasant Mohapatra (Department of Computer Science, University of California, Davis). Explainable Recommendation has been gaining attention over the last few years in industry and academia. Explanations provided along with recommendations for each user in a recommender system framework have many uses, particularly reasoning about why a suggestion is provided and how well an item aligns with a user’s personalized preferences. Hence, explanations can play a huge role in influencing users to purchase products. However, the reliability of the explanations under varying scenarios has not been rigorously verified empirically. Unreliable explanations can have serious consequences, such as attackers leveraging explanations to manipulate and tempt users into purchasing target items that the attackers want to promote. In this paper, we study the vulnerability of existing feature-oriented explainable recommenders, particularly analyzing their performance under different levels of external noise added to model parameters. We conducted experiments by analyzing three important state-of-the-art explainable recommenders when trained on two widely used e-commerce based recommendation datasets of different scales. We observe that all the explainable models are vulnerable to increased noise levels. Experimental results verify our hypothesis that the ability to explain recommendations does decrease along with increasing noise levels and that adversarial noise in particular contributes to a much stronger decrease. Our study presents an empirical verification on the topic of robust explanations in recommender systems which can be extended to different types of explainable recommenders in RS.
- RESInterpretable User Retention Modeling in Recommendation
by Rui Ding (Northeastern University), Ruobing Xie (WeChat, Tencent), Xiaobo Hao (WeChat, Tencent), Xiaochun Yang (Northeastern University), Kaikai Ge (WeChat, Tencent), Xu Zhang (WeChat, Tencent), Jie Zhou (WeChat, Tencent) and Leyu Lin (WeChat, Tencent).Recommendation usually focuses on immediate accuracy metrics like CTR as training objectives. User retention rate, which reflects the percentage of today’s users that will return to the recommender system in the next few days, should be paid more attention to in real-world systems. User retention is the most intuitive and accurate reflection of user long-term satisfaction. However, most existing recommender systems are not focused on user retention-related objectives, since their complexity and uncertainty make it extremely hard to discover why a user will or will not return to a system and which behaviors affect user retention. In this work, we conduct a series of preliminary explorations on discovering and making full use of the reasons for user retention in recommendation. Specifically, we make a first attempt to design a rationale contrastive multi-instance learning framework to explore the rationale and improve the interpretability of user retention. Extensive offline and online evaluations with detailed analyses of a real-world recommender system verify the effectiveness of our user retention modeling. We further reveal the real-world interpretable factors of user retention from both user surveys and explicit negative feedback quantitative analyses to facilitate future model designs.
- RESDeep Exploration for Recommendation Systems
by Zheqing Zhu (Meta AI, Stanford University) and Benjamin Van Roy (Stanford University).Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user’s response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user’s subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are delayed and sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators and establish large improvements over existing algorithms.
- RESEx2Vec: Characterizing Users and Items from the Mere Exposure Effect
by Bruno Sguerra (Deezer Research) and Romain Hennequin (Deezer Research). The traditional recommendation framework seeks to connect user and content by finding the best match possible based on users’ past interactions. However, a good content recommendation is not necessarily similar to what the user has chosen in the past. One limitation of basing future interaction on what happened in the past is that it ignores the fact that both sides of the problem are dynamic. As humans, users naturally evolve, learn, forget, and get bored; they change their perspective of the world and, in consequence, of the recommendable content. In this work we present Ex2Vec, our framework for accounting for the dynamics of the human side of the recommendation problem. We introduce the Mere Exposure Effect as a common phenomenon in music streaming platforms. We then present our model, which leverages the effect to jointly characterize users and music. We validate our model through predicting future music consumption based on repetition and discuss its implications.
- RESTALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation
by Keqin Bao (University of Science and Technology of China), Jizhi Zhang (University of Science and Technology of China), Yang Zhang (University of Science and Technology of China), Wenjie Wang (National University of Singapore), Fuli Feng (University of Science and Technology of China) and Xiangnan He (University of Science and Technology of China). The impressive performance of Large Language Models (LLMs) across various fields has encouraged researchers to investigate their potential in recommendation tasks. To harness the LLMs’ extensive knowledge and powerful generalization abilities, initial efforts have tried to design instructions for recommendation tasks through In-context Learning. However, the recommendation performance of LLMs remains limited due to (i) significant differences between LLMs’ language-related pre-training tasks and recommendation tasks, and (ii) inadequate recommendation data during the LLMs’ pre-training. To fill the gap, we consider further tuning LLMs for recommendation tasks. To this end, we propose a lightweight tuning framework for LLM-based recommendation, namely LLM4Rec, which constructs the recommendation data as tuning samples and utilizes LoRA for lightweight tuning. We conduct experiments on two datasets, validating that LLM4Rec is highly efficient w.r.t. computing costs (e.g., a single RTX 3090 is sufficient for tuning LLaMA-7B), and meanwhile, it can significantly enhance the recommendation capabilities of LLMs in the movie and book domains, even with limited tuning samples (< 100 samples). Furthermore, LLM4Rec exhibits strong generalization ability in cross-domain recommendation. Our code and data are available at https://anonymous.4open.science/r/LLM4rec.
- RESInitiative transfer in conversational recommender systems
by Yuan Ma (University of Duisburg-Essen) and Jürgen Ziegler (University of Duisburg-Essen). Conversational recommender systems (CRS) are increasingly designed to offer mixed-initiative dialogs in which the user and the system can take turns in starting a communicative exchange, for example, by asking questions or stating preferences. However, whether and when users make use of the mixed-initiative capabilities in a CRS and which factors influence their behavior is as yet not well understood. We report an online study investigating user interaction behavior, especially the transfer of initiative between user and system in a real-time online CRS. We assessed the impact of dialog initiative at the system start as well as of several psychological user characteristics. To collect interaction data, we developed a CRS framework and implementation for the domain of smartphones. Two groups of participants on Prolific (total n=143) used the system, which started with either a system-initiated or a user-initiated dialog. In addition to interaction data, we measured several psychological factors as well as users’ subjective assessment of the system through questionnaires. We found that: (1) most users tended to take over the initiative from the system or stay in user-initiated mode when it was offered initially; (2) starting the dialog in user-initiated mode led to fewer interactions needed for selecting a product than starting in system-initiated mode; (3) the user’s initiative transfer was mainly affected by their personal interaction preferences (especially initiative preference); and (4) the initial mode of the mixed-initiative CRS did not affect the user experience, but the occurrence of initiative transfers in the dialog negatively affected the degree of user interest and excitement. The results can inform the design and potential personalization of CRS.
- RESTime-Aware Item Weighting for the Next Basket Recommendations
by Aleksey Romanov (National Research University Higher School of Economics), Oleg Lashinin (Tinkoff), Marina Ananyeva (National Research University Higher School of Economics) and Sergey Kolesnikov (Tinkoff.AI).In this paper we study the next basket recommendation problem. Recent methods use different approaches to achieve better performance. However, many of them do not use information about the time of prediction and time intervals between baskets. To fill this gap, we propose a novel method, Time-Aware Item-based Weighting (TAIW), which takes timestamps and intervals into account. We provide experiments on three real-world datasets, and TAIW outperforms well-tuned state-of-the-art baselines for next-basket recommendations. In addition, we show the results of an ablation study and a case study of a few items.
- RESIs ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation
by Jizhi Zhang (University of Science and Technology of China), Keqin Bao (University of Science and Technology of China), Yang Zhang (University of Science and Technology of China), Wenjie Wang (National University of Singapore), Fuli Feng (University of Science and Technology of China) and Xiangnan He (University of Science and Technology of China).The resounding triumph of the Large Language Models (LLMs) has ushered in a novel LLM for recommendation (LLM4rec) paradigm. Notwithstanding, the capacity of LLM4rec to provide equitable recommendations remains uncharted due to the potential presence of societal prejudices in LLMs. In order to avert the plausible hazard of employing LLM4rec, we scrutinize the fairness of LLM4rec with respect to the users’ sensitive attributes. Owing to the disparity between LLM4rec and the conventional recommendation paradigm, there are challenges in utilizing the conventional recommendation fairness benchmark directly. To explore the fairness of recommendations under the LLM4rec, we propose a new benchmark Fairness in Large language models for Recommendation (FairLR), which consists of carefully designed metrics and a dataset that considers eight sensitive attributes in two recommendation scenarios: music and movie. We utilize our FairLR benchmark to examine ChatGPT and expose that it still demonstrates bias towards certain sensitive attributes while making recommendations. Our code and dataset can be found at https://anonymous.4open.science/r/FairLR-751D/.
- RESMultiple Connectivity Views for Session-based Recommendation
by Yaming Yang (School of Artificial Intelligence, Peking University), Jieyu Zhang (University of Washington), Yujing Wang (School of Artificial Intelligence, Peking University), Zheng Miao (School of Artificial Intelligence, Peking University) and Yunhai Tong (Peking University).Session-based recommendation (SBR), which makes the next-item recommendation based on previous anonymous actions, has drawn increasing attention. The last decade has seen multiple deep learning-based modeling choices applied on SBR successfully, e.g., recurrent neural networks (RNNs), convolutional neural networks (CNNs), graph neural networks (GNNs), and each modeling choice has its intrinsic superiority and limitation. We argue that these modeling choices differentiate from each other by (1) the way they capture the interactions between items within a session and (2) the operators they adopt for composing the neural network, e.g., convolutional operator or self-attention operator.
In this work, we dive deep into the former as it is relatively unique to the SBR scenario, while the latter is shared by general neural network modeling techniques. We first introduce the concept of connectivity view to describe the different item interaction patterns at the input level. Then, we develop the Multiple Connectivity Views for Session-based Recommendation (MCV-SBR), a unified framework that incorporates different modeling choices in a single model through the lens of connectivity view. In addition, MCV-SBR allows us to effectively and efficiently explore the search space of the combinations of connectivity views by the Tree-structured Parzen Estimator Approach (TPE) algorithm. Finally, on three widely used SBR datasets, we verify the superiority of MCV-SBR by comparing the searched models with state-of-the-art baselines. We also conduct a series of studies to demonstrate the efficacy and practicability of the proposed connectivity view search algorithm, as well as other components in MCV-SBR.
List of all Reproducibility papers accepted for RecSys 2023 (in alphabetical order).
- REPChallenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis
by Vito Walter Anelli (Politecnico di Bari), Daniele Malitesta (Polytechnic University of Bari), Claudio Pomo (Politecnico di Bari), Alejandro Bellogin (Universidad Autonoma de Madrid), Eugenio Di Sciascio (Politecnico di Bari) and Tommaso Di Noia (Politecnico di Bari)Among the most successful research directions in recommender systems, there are undoubtedly graph neural network-based models (GNNs). Through the natural modeling of users and items as a bipartite, undirected graph, GNNs have pushed up the performance bar for modern recommenders.
Unfortunately, most of the original graph-based works cherry-pick results from previous baseline papers without bothering to check whether the results are valid for the configuration under analysis. Thus, our work stands first and foremost as a work on the replicability of results. We provide a code that succeeds in replicating the results proposed in the articles introducing six of the most popular and recent graph recommendation models (i.e., NGCF, DGCF, LightGCN, SGL, UltraGCN, and GFCF). In our experimental setup, we test these six models on three common benchmarking datasets (i.e., Gowalla, Yelp 2018, and Amazon Book). In addition, to understand how these models perform with respect to traditional models for collaborative filtering, we compare the graph models under analysis with some models that have historically emerged as the best performers in an offline evaluation context. Then, the study is extended on two new datasets (i.e., Allrecipes and BookCrossing) for which no known setup exists in the literature. Since the performance on such datasets is not entirely aligned with the previous benchmarking one, we further analyze the possible impact of specific dataset characteristics on the recommendation accuracy performance. By investigating the information flow to the users from their neighborhoods, the analysis aims to identify for which models these intrinsic features in the dataset structure impact accuracy performance. The code to reproduce the experiments is available at: https://split.to/Graph-Reproducibility.
- REPEveryone’s a Winner! On Hyperparameter Tuning of Recommendation Models
by Faisal Shehzad (University of Klagenfurt) and Dietmar Jannach (University of Klagenfurt)The performance of a recommender system algorithm in terms of common offline accuracy measures often strongly depends on the chosen hyperparameters. Therefore, when comparing algorithms in offline experiments, we can obtain reliable insights regarding the effectiveness of a newly proposed algorithm only if we compare it to a number of state-of-the-art baselines that are carefully tuned for each of the considered datasets. While this fundamental principle of any area of applied machine learning is undisputed, we find that the tuning process for the baselines in the current literature is barely documented in much of today’s published research. Ultimately, in case the baselines are actually not carefully tuned, progress may remain unclear. In this paper, we showcase how every method in such an unsound comparison can be reported to be outperforming the state-of-the-art. Finally, we iterate appropriate research practices to avoid unreliable algorithm comparisons in the future.
- REPHUMMUS: A Linked, Healthiness-Aware, User-centered and Argument-Enabling Recipe Data Set for Recommendation
by Felix Bölz (INSA Lyon & University of Passau), Diana Nurbakova (INSA Lyon), Sylvie Calabretto (INSA Lyon), Armin Gerl (University of Passau), Lionel Brunie (INSA Lyon) and Harald Kosch (University of Passau)The overweight and obesity rate has been increasing for decades worldwide. Healthy nutrition is, besides education and physical activity, one of the keys to tackling this issue. In an effort to increase the availability of digital, healthy recommendations, the scientific area of food recommendation extends its focus from the accuracy of the recommendations to beyond-accuracy goals like transparency and healthiness. To address this issue a data basis is required, which in the ideal case encompasses user-item interactions like ratings and reviews, food-related information like recipe details, nutritional data, and in the best case additional data which describes the food items and their relations semantically. Though several recipe recommendation data sets exist, to the best of our knowledge, a holistic, large-scale, healthiness-aware and connected data set has not been made available yet. The lack of such data could partially explain the poor popularity of the topic of healthy food recommendation when compared to the domain of movie recommendation. In this paper, we show that taking into account only user-item interactions is not sufficient for a recommendation. To close this gap, we propose a connected data set called HUMMUS (Health-aware User-centered recoMMendation and argUment enabling data Set) collected from Food.com containing multiple features including rich nutrient information, text reviews, and ratings, enriched by the authors with extra features such as Nutri-scores and connections to semantic data like the FoodKG and the FoodOn ontology. We hope that these data will contribute to the healthy food recommendation domain.
- REPRecAD: Towards A Unified Library for Recommender Attack and Defense
by Changsheng Wang (University of Science and Technology of China), Jianbai Ye (University of Science and Technology of China), Wenjie Wang (National University of Singapore), Chongming Gao (University of Science and Technology of China), Fuli Feng (University of Science and Technology of China) and Xiangnan He (University of Science and Technology of China)In recent years, recommender systems have become a ubiquitous part of our daily lives, while they suffer from a high risk of being attacked due to their growing commercial and social value. Despite significant research progress in recommender attack and defense, there is a lack of a widely-recognized benchmarking standard in the field, leading to unfair performance comparison and limited credibility of experiments. To address this, we propose RecAD, a unified library aiming at establishing an open benchmark for recommender attack and defense. RecAD takes an initial step to set up a unified benchmarking pipeline for reproducible research by integrating diverse datasets, standard source codes, hyper-parameter settings, running logs, attack knowledge, attack budget, and evaluation results. The benchmark is designed to be comprehensive and sustainable, covering attack, defense, and evaluation tasks, enabling more researchers to easily follow and contribute to this promising field. RecAD will drive more solid and reproducible research on recommender systems attack and defense, reduce the redundant efforts of researchers, and ultimately increase the credibility and practical value of recommender attack and defense. The project and documents are released at https://github.com/gusye1234/recad.
- REPReproducibility Analysis of Recommender Systems relying on Visual Features: traps, pitfalls, and countermeasures
by Pasquale Lops (University of Bari), Elio Musacchio (Università degli Studi di Bari Aldo Moro), Cataldo Musto (Dipartimento di Informatica – University of Bari), Marco Polignano (Università degli Studi di Bari Aldo Moro), Antonio Silletti (Dipartimento di Informatica – University of Bari) and Giovanni Semeraro (Dipartimento di Informatica – University of Bari)Reproducibility is an important requirement for scientific progress, and the lack of reproducibility for a large amount of published research can hinder the progress over the state-of-the-art. This concerns several research areas, and recommender systems are witnessing the same reproducibility crisis. Even solid works published at prestigious venues might not be reproducible for several reasons: data might not be public, source code for recommendation algorithms might not be available or well documented, and evaluation metrics might be computed using parameters not explicitly provided. In addition, recommendation pipelines are becoming increasingly complex due to the use of deep neural architectures or representations for multimodal side information involving text, images, audio, or video. This makes the reproducibility of experiments even more challenging. In this work, we describe an extension of an already existing open-source recommendation framework, called ClayRS, with the aim of providing the foundation for future reproducibility of recommendation processes involving images as side information. This extension, called ClayRS Can See, is the starting point for reproducing state-of-the-art recommendation algorithms exploiting images. We have provided our implementation of one of these algorithms, namely VBPR – Visual Bayesian Personalized Ranking from Implicit Feedback, and we have discussed all the issues related to the reproducibility of the study to deeply understand the main traps and pitfalls, along with solutions to deal with such complex environments. 
We conclude the work by proposing a checklist for recommender systems reproducibility as a guide for the research community.
- REPReproducibility of Multi-Objective Reinforcement Learning Recommendation: Interplay between Effectiveness and Beyond-Accuracy Perspectives
by Vincenzo Paparella (Politecnico di Bari), Vito Walter Anelli (Politecnico di Bari), Ludovico Boratto (University of Cagliari) and Tommaso Di Noia (Politecnico di Bari)Providing effective suggestions is of predominant importance for successful Recommender Systems (RSs). Nonetheless, the need of accounting for additional multiple objectives has become prominent, from both the final users’ and the item providers’ points of view. This need has led to a new class of RSs, called Multi-Objective Recommender Systems (MORSs). These systems are designed to provide suggestions by considering multiple (conflicting) objectives simultaneously, such as diverse, novel, and fairness-aware recommendations. In this work, we reproduce a state-of-the-art study on MORSs that exploits a reinforcement learning agent to satisfy three objectives, i.e., accuracy, diversity, and novelty of recommendations. The selected study is one of the few MORSs where the source code and datasets are released to ensure the reproducibility of the proposed approach. Interestingly, we find that some challenges arise when replicating the results of the original work, due to the nature of multiple-objective problems. We also extend the evaluation of the approach to analyze the impact of improving user-centred objectives of recommendations (i.e., diversity and novelty) in terms of algorithmic bias. To this end, we take into consideration both popularity and category of the items. We discover some interesting trends in the recommendation performance according to different evaluation metrics. In addition, we see that the multi-objective reinforcement learning approach is responsible for increasing the bias disparity in the output of the recommendation algorithm for those items belonging to positively/negatively biased categories. We publicly release datasets and codes in the following GitHub repository: https://anonymous.4open.science/r/MORS_reproducibility-BD60
- REPThe effect of third party implementations on reproducibility
by Balázs Hidasi (Gravity R&D, a Taboola company) and Ádám Tibor Czapp (Gravity R&D, a Taboola company). Reproducibility of recommender systems research has come under scrutiny during recent years. Along with works focusing on repeating experiments with certain algorithms, the research community has also started discussing various aspects of evaluation and how these affect reproducibility. We add a novel angle to this discussion by examining how unofficial third-party implementations could benefit or hinder reproducibility. Besides giving a general overview, we thoroughly examine six third-party implementations of a popular recommender algorithm and compare them to the official version on five public datasets. In the light of our alarming findings we aim to draw the attention of the research community to this neglected aspect of reproducibility.
List of all Late-breaking Results papers accepted for RecSys 2023 (in alphabetical order).
- LBRA Model-Agnostic Framework for Recommendation via Interest-aware Item Embeddings
by Amit Kumar Jaiswal (University of Surrey) and Yu Xiong (University of Surrey).Item representation holds significant importance in recommendation systems, which encompass domains such as news, retail, and videos. Retrieval and ranking models utilise item representation to capture the user-item relationship based on user behaviours. Existing representation learning methods primarily focus on optimising item-based mechanisms, such as attention and sequential modelling. However, these methods lack a modelling mechanism to directly reflect user interests within the learned item representations. Consequently, these methods may be less effective in capturing user interests indirectly. To address this challenge, we propose a novel Interest-aware Capsule network (IaCN) recommendation model, a model-agnostic framework that directly learns interest-oriented item representations. IaCN serves as an auxiliary task, enabling the joint learning of both item-based and interest-based representations. This framework adopts existing recommendation models without requiring substantial redesign. We evaluate the proposed approach on benchmark datasets, exploring various scenarios involving different deep neural networks, behaviour sequence lengths, and joint learning ratios of interest-oriented item representations. Experimental results demonstrate significant performance enhancements across diverse recommendation models, validating the effectiveness of our approach.
- LBRAn Exploration of Sentence-Pair Classification for Algorithmic Recruiting
by Mesut Kaya (Aalborg University Copenhagen) and Toine Bogers (IT University of Copenhagen).Recent years have seen a rapid increase in the application of computational approaches to different HR tasks, such as algorithmic hiring, skill extraction, and monitoring of employee satisfaction. Much of the recent work on estimating the fit between a person and a job has used representation learning to represent both resumes and job vacancies computationally and determine the degree to which they match. A common approach to this task is Sentence-BERT, which uses a Siamese network to encode resumes and job descriptions into fixed-length vectors and estimates how well they match based on the similarity between those vectors. In our paper, we adapt BERT’s next-sentence prediction task—predicting whether one sentence is likely to follow another in a given context—to the task of matching resumes with job descriptions. Using historical data on past (mis)matches between job-resume pairs, we fine-tune BERT for this downstream task. Through a combination of offline and online experiments on data from a large Scandinavian job portal, we show that this approach performs significantly better than Sentence-BERT and other state-of-the-art approaches for determining person-job fit.
- LBRAnalyzing Accuracy versus Diversity in a Health Recommender System for Physical Activities: a Longitudinal User Study
by Ine Coppens (WAVES – imec – Ghent University), Luc Martens (WAVES – imec – Ghent University) and Toon De Pessemier (WAVES – imec – Ghent University).Although personalization has great potential to improve mobile health apps, analyzing the effect of different recommender algorithms in the health domain is still in its infancy. As such, this paper investigates whether more accurate recommendations from a content-based recommender or more diverse recommendations from a user-based collaborative filtering recommender will lead to more motivation to move. An eight-week longitudinal between-subject user study is being conducted with an Android app in which participants receive personalized recommendations for physical activities and tips to reduce sedentary behavior. The objective manipulation check confirmed that the group with collaborative filtering received significantly more diverse recommendations. The subjective manipulation check showed that the content-based group assigned more positive feedback for perceived accuracy and star rating to the recommendations they chose and executed. However, perceived diversity and inspiringness were significantly higher in the content-based group, suggesting that users might experience the recommendations differently. Lastly, momentary motivation for the executed activities and tips was significantly higher in the content-based group. As such, the preliminary results of this longitudinal study suggest that more accurate and less diverse recommendations have better effects on motivating users to move more.
- LBRBroadening the Scope: Evaluating the Potential of Recommender Systems beyond prioritizing Accuracy
by Vincenzo Paparella (Politecnico di Bari), Dario Di Palma (Politecnico di Bari), Vito Walter Anelli (Politecnico di Bari) and Tommaso Di Noia (Politecnico di Bari).Although beyond-accuracy metrics have gained attention in the last decade, the accuracy of recommendations is still considered the gold standard to evaluate Recommender Systems (RSs). This approach prioritizes the accuracy of recommendations, neglecting the quality of suggestions to enhance user needs, such as diversity and novelty, as well as trustworthiness regulations in RSs for user and provider fairness. As a result, single metrics determine the success of RSs, but this approach fails to consider other criteria simultaneously. A downside of this method is that the most accurate model configuration may not excel in addressing the remaining criteria. This study seeks to broaden RS evaluation by introducing a multi-objective evaluation that considers all model configurations simultaneously under several perspectives. To achieve this, several hyper-parameter configurations of an RS model are trained, and the Pareto-optimal ones are retrieved. The Quality Indicators (QI) of Pareto frontiers, which are gaining interest in Multi-Objective Optimization research, are adapted to RSs. QI enables evaluating the model’s performance by considering various configurations and giving the same importance to each metric. The experiments show that this multi-objective evaluation overturns the ranking of performance among RSs, paving the way to revisit the evaluation approaches of the RecSys research community. We release codes and datasets in the following GitHub repository: https://anonymous.4open.science/r/RecMOE-3ED3.
- LBRClimbing crags repetitive choices and recommendations
by Iustina Ivanova (Independent Researcher).Outdoor sport climbing in Northern Italy attracts climbers from around the world. While this country has many rock formations, it offers enormous possibilities for adventurous people to explore the mountains. Unfortunately, this great potential causes a problem in finding suitable destinations (crags) to visit for climbing activity. Existing recommender systems in this domain address this issue and suggest potentially interesting items to climbers utilizing a content-based approach. These systems understand users’ preferences from their past logs recorded in an electronic training diary. At the same time, some sports people have a behavioral tendency to revisit the same place for subjective reasons. It might be related to weather and seasonality (for instance, some crags are suitable for climbing in winter/summer only), the users’ preferences (when climbers like specific destinations more than others), or personal goals to be achieved in sport (when climbers plan to try some routes again). Unfortunately, current climbing crags recommendations do not adapt when users demonstrate these repetitive behavior patterns. Sequential recommender systems can capture such users’ habits since their architectures were designed to model users’ next item choice by learning from their previous decision manners. To understand to which extent these sequential recommendations can predict the following crags choices in sport climbing, we analyzed a scenario when climbers show repetitious decisions. Further, we present a data set from collected climbers’ e-logs in the Arco region (Italy) and applied several sequential recommender systems architectures for predicting climbers’ following crags’ visits from their past logs. We evaluated these recommender systems offline and compared ranking metrics with the other reported results on the different data sets. 
The work concludes that sequential models obtain comparably accurate results as in the other studies and have the prospect for climbers’ subsequent visit prediction and crags recommendations.
- LBRContinual Collaborative Filtering Through Gradient Alignment
by Hieu Do (Singapore Management University) and Hady Lauw (Singapore Management University).A recommender system operates in a dynamic environment where new items emerge and new users join the system, resulting in ever-growing user-item interactions over time. Existing works either assume a model trained offline on a static dataset (requiring periodic re-training with ever larger datasets); or an online learning setup that favors recency over history. As privacy-aware users could hide their histories, the loss of older information means that periodic retraining may not always be feasible, while online learning may lose sight of users’ long-term preferences. In this work, we adopt a continual learning perspective to collaborative filtering, by compartmentalizing users and items over time into a notion of tasks. Of particular concern is to mitigate catastrophic forgetting that occurs when the model would reduce in performance for older users and items in prior tasks even as it tries to fit the newer users and items in the current task. To alleviate this, we propose a method that leverages gradient alignment to deliver a model that is more compatible across tasks and maximizes user agreement for better user representations to improve long-term recommendations.
- LBREvaluating The Effects of Calibrated Popularity Bias Mitigation: A Field Study
by Anastasiia Klimashevskaia (MediaFutures, University of Bergen), Mehdi Elahi (MediaFutures, University of Bergen), Dietmar Jannach (University of Klagenfurt), Lars Skjærven (TV 2), Astrid Tessem (TV 2) and Christoph Trattner (MediaFutures, University of Bergen).Despite their various proven benefits, Recommender Systems can cause or amplify certain undesired effects. In this paper, we focus on Popularity Bias, i.e., the tendency of a recommender system to utilize the effect of recommending popular items to the user. Prior research has studied the negative impact of this type of bias on individuals and society as a whole and proposed various approaches to mitigate this in various domains. However, almost all works adopted offline methodologies to evaluate the effectiveness of the proposed approaches. Unfortunately, such offline simulations can potentially be rather simplified and unable to capture the full picture. To contribute to this line of research and given a particular lack of knowledge about how debiasing approaches work not only offline, but online as well, we present in this paper the results of a user study on a national broadcaster movie streaming platform in [country], i.e., [platform], following the A/B testing methodology. We deployed an effective mitigation approach for popularity bias, called Calibrated Popularity (CP), and monitored its performance in comparison to the platform’s existing collaborative filtering recommendation approach as a baseline over a period of almost four months. The results obtained from a large user base interacting in real-time with the recommendations indicate that the evaluated debiasing approach can be effective in addressing popularity bias while still maintaining the level of user interest and engagement.
- LBRHow Users Ride the Carousel: Exploring the Design of Multi-List Recommender Interfaces From a User Perspective
by Benedikt Loepp (University of Duisburg-Essen) and Jürgen Ziegler (University of Duisburg-Essen).Multi-list interfaces are widely used in recommender systems, especially in industry, showing collections of recommendations, one below the other, with items that have certain commonalities. The composition and order of these “carousels” are usually optimized by simulating user interaction based on probabilistic models learned from item click data. Research that actually involves users is rare, with only few studies investigating general user experience in comparison to conventional recommendation lists. Hence, it is largely unknown how specific design aspects such as carousel type and length influence the individual perception and usage of carousel-based interfaces. This paper seeks to fill this gap through an exploratory user study. The results confirm previous assumptions about user behavior and provide first insights into the differences in decision making in the presence of multiple recommendation carousels.
- LBRIntegrating Item Relevance in Training Loss for Sequential Recommender Systems
by Andrea Bacciu (Sapienza University of Rome), Federico Siciliano (Sapienza University of Rome), Nicola Tonellotto (University of Pisa) and Fabrizio Silvestri (University of Rome).Sequential Recommender Systems (SRSs) are a popular type of recommender system that leverages user history to predict the next item of interest. However, the presence of noise in user interactions, stemming from account sharing, inconsistent preferences, or accidental clicks, can significantly impact the robustness and performance of SRSs, particularly when the entire item set to be predicted is noisy. This situation is more prevalent when only one item is used to train and evaluate the SRSs. To tackle this challenge, we propose a novel approach that addresses the issue of noise in SRSs. First, we propose a sequential multi-relevant future items training objective, leveraging a loss function aware of item relevance, thereby enhancing their robustness against noise in the training data. Additionally, to mitigate the impact of noise at evaluation time, we propose multi-relevant future items evaluation (MRFI-evaluation), aiming to improve overall performance. Our relevance-aware models obtain an improvement of ~1.58% of NDCG@10 and 0.96% in terms of HR@10 in the traditional evaluation protocol, the one which utilizes one relevant future item. In the MRFI-evaluation protocol, using multiple future items, the improvement is ~2.82% of NDCG@10 and ~0.64% of HR@10 w.r.t. the best baseline model.
- LBRIntegrating Offline Reinforcement Learning with Transformers for Sequential Recommendation
by Xumei Xi (Cornell University), Yuke Zhao (Bloomberg LP), Quan Liu (Bloomberg), Liwen Ouyang (Bloomberg) and Yang Wu (Independent Researcher).We consider the problem of sequential recommendation, where the current recommendation is made based on past interactions. This recommendation task requires efficient processing of the sequential data and aims to provide recommendations that maximize the long-term reward. To this end, we train a farsighted recommender by using an offline RL algorithm with the policy network in our model architecture that has been initialized from a pre-trained transformer model. The pre-trained model leverages the superb ability of the transformer to process sequential information. Compared to prior works that rely on online interaction via simulation, we focus on implementing a fully offline RL framework that is able to converge in a fast and stable way. Through extensive experiments on public datasets, we show that our method is robust across various recommendation regimes, including e-commerce and movie suggestions. Compared to state-of-the-art supervised learning algorithms, our algorithm yields recommendations of higher quality, demonstrating the clear advantage of combining RL and transformers.
- LBRLearning the True Objectives of Multiple Tasks in Sequential Behavior Modeling
by Jiawei Zhang (Peking University).Multi-task optimization is an emerging research field in recommender systems that focuses on improving the recommendation performance of multiple tasks. Various methods have been proposed in the past to address task weight balancing, gradient conflict resolution, Pareto optimality, etc, yielding promising results in specific contexts. However, when it comes to real-world scenarios involving user sequential behaviors, these methods are not well suited. To address this gap, we propose AcouRec, a novel and effective approach for sequential behavior modeling in multi-task recommender systems inspired by acoustic attenuation. Specifically, AcouRec introduces an impact attenuation mechanism to mitigate the uncertain task interference in multi-task optimization. Extensive experiments on public datasets demonstrate the effectiveness of AcouRec.
- LBRLeveraging Large Language Models for Sequential Recommendation
by Jesse Harte (Delivery Hero SE), Wouter Zorgdrager (Delivery Hero SE), Panos Louridas (Athens University of Economics & Business), Asterios Katsifodimos (Delft University of Technology), Dietmar Jannach (University of Klagenfurt) and Marios Fragkoulis (Delivery Hero SE).Sequential recommendation problems have received increasing attention in research during the past few years, leading to the inception of a large variety of algorithmic approaches. In this work, we explore how large language models (LLMs), which are nowadays introducing disruptive effects in many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we devise and evaluate three approaches to leverage the power of LLMs in different ways. Our results from experiments on two datasets show that initializing the state-of-the-art sequential recommendation model BERT4Rec with embeddings obtained from an LLM improves NDCG by 15-20% compared to the vanilla BERT4Rec model. Furthermore, we find that a simple approach that leverages LLM embeddings for producing recommendations, can provide competitive performance by highlighting semantically related items. We publicly share the code and data of our experiments to ensure reproducibility.
- LBROn the Consistency, Discriminative Power and Robustness of Sampled Metrics in Offline Top-N Recommender System Evaluation
by Yang Liu (University of Helsinki), Alan Medlar (University of Helsinki) and Dorota Glowacka (University of Helsinki).Negative item sampling in offline top-n recommendation evaluation has become increasingly widespread, but remains controversial. While several studies have warned against using sampled evaluation metrics on the basis of being a poor approximation of the full ranking (i.e., using all negative items), others have highlighted their improved discriminative power and potential to make evaluation more robust. Unfortunately, empirical studies on negative item sampling are based on relatively few methods (between 3 and 12) and, therefore, lack the statistical power to assess the impact of negative item sampling in practice.
In this article, we present preliminary findings from a comprehensive benchmarking study of negative item sampling based on 52 recommendation algorithms and 3 benchmark data sets. We show how the number of sampled negative items and different sampling strategies affect the consistency and discriminative power of sampled evaluation metrics. Furthermore, we investigate the impact of sparsity bias and popularity bias on the robustness of these metrics. In brief, we show that the optimal parameterizations for negative item sampling are dependent on data set characteristics and the goals of the investigator, suggesting a need for greater transparency in related experimental design decisions.
- LBROutRank: Speeding up AutoML-based Model Search for Large Sparse Data sets with Cardinality-aware Feature Ranking
by Blaž Škrlj (Outbrain) and Blaž Mramor (Outbrain).The design of modern recommender systems relies on understanding which parts of the feature space are relevant for solving a given recommendation task. However, real-world data sets in this domain are often characterized by their large size, sparsity, and noise, making it challenging to identify meaningful signals. Feature ranking represents an efficient branch of algorithms that can help address these challenges by identifying the most informative features and facilitating the automated search for more compact and better-performing models (AutoML). We introduce OutRank, a system for versatile feature ranking and data quality-related anomaly detection. OutRank was built with categorical data in mind, utilizing a variant of mutual information that is normalized with regard to the noise produced by features of the same cardinality. We further extend the similarity measure by incorporating information on feature similarity and combined relevance. The proposed approach’s feasibility is demonstrated by speeding up the state-of-the-art AutoML system on a synthetic data set with no performance loss. Furthermore, we considered a real-life click-through-rate prediction data set where it outperformed strong baselines such as random forest-based approaches. The proposed approach enables exploration of up to 300% larger feature spaces compared to AutoML-only approaches, enabling faster search for better models on off-the-shelf hardware.
- LBRPower Loss Function in Neural Networks for Predicting Click-Through Rate
by Ergun Biçici (Huawei R&D Center Turkey).Loss functions guide machine learning models towards concentrating on the error most important to improve upon. We introduce power loss functions for neural networks and apply them to imbalanced click-through rate datasets. Power loss functions decrease the loss for confident predictions and increase the loss for error-prone predictions. They improve both AUC and F1 and produce better calibrated results. We obtain improvements in the results on four different classifiers and on two different datasets. We obtain significant improvements in AUC that reach 0.44% for DeepFM on the Avazu dataset.
- LBRTowards Health-Aware Fairness in Food Recipe Recommendation
by Mehrdad Rostami (University of Oulu), Mohammad Aliannejadi (University of Amsterdam) and Mourad Oussalah (University of Oulu).Food recommendation systems play a crucial role in suggesting personalized recommendations designed to help users find food and recipes that align with their preferences. However, many existing food recommendation systems have overlooked the important aspect of considering the health and nutritional value of recommended foods, thereby limiting their effectiveness in generating truly healthy recommendations. Our preliminary analysis indicates that users tend to respond positively to unhealthy food and recipes. As a result, existing food recommender systems that neglect health considerations often assign high scores to popular items, inadvertently encouraging unhealthy choices among users. In this study, we propose the development of a fairness-based model that prioritizes health considerations. Our model incorporates fairness constraints from both the user and item perspectives, integrating them into a joint objective framework. Experimental results conducted on real-world food datasets demonstrate that the proposed system not only maintains the ability of food recommendation systems to suggest users’ favorite foods but also improves the health factor compared to unfair models, with an average enhancement of approximately 35%.
- LBRTurning Dross Into Gold Loss: is BERT4Rec really better than SASRec?
by Anton Klenitskiy (Sber, AI Lab) and Alexey Vasilev (Sber, AI Lab).Recently, sequential recommendation and the next-item prediction task have become increasingly popular in the field of recommender systems. Currently, two state-of-the-art baselines are the Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of the publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses cross-entropy over softmax for all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, which is used by BERT4Rec, then SASRec will significantly outperform BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec could be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.
- LBRUncertainty-adjusted Inductive Matrix Completion with Graph Neural Networks
by Petr Kasalicky (Singapore Management University, School of Computing and Information Systems), Antoine Ledent (Singapore Management University, School of Computing and Information Systems) and Rodrigo Alves (Czech Technical University, Faculty of Information Technology).We propose a robust recommender systems model which performs matrix completion and a rating-wise uncertainty estimation jointly. Whilst the prediction module is purely based on an implicit low-rank assumption imposed via nuclear norm regularization, our loss function is augmented by an uncertainty estimation module which learns an anomaly score for each individual rating via a Graph Neural Network: data points deemed more anomalous by the GNN are downregulated in the loss function used to train the low-rank module. The whole model is trained in an end-to-end fashion, allowing the anomaly detection module to tap into the supervised information available in the form of ratings. Thus, our model’s predictors enjoy the favourable generalization properties that come with being chosen from a small function space (i.e., low-rank matrices), whilst exhibiting the robustness to outliers and flexibility that come with deep learning methods. Furthermore, the anomaly scores themselves contain valuable qualitative information. Experiments on various real-life datasets demonstrate that our model outperforms standard matrix completion and other baselines, confirming the usefulness of the anomaly detection module.
- LBRUncovering ChatGPT’s Capabilities in Recommender Systems
by Sunhao Dai (Renmin University of China), Ninglu Shao (Renmin University of China), Haiyuan Zhao (Renmin University of China), Weijie Yu (University of International Business and Economics), Zihua Si (Renmin University of China), Chen Xu (Renmin University of China), Zhongxiang Sun (Renmin University of China), Xiao Zhang (Renmin University of China) and Jun Xu (Renmin University of China).The debut of ChatGPT has recently attracted significant attention from the natural language processing (NLP) community and beyond. Existing studies have demonstrated that ChatGPT shows significant improvement in a range of downstream NLP tasks, but the capabilities and limitations of ChatGPT in terms of recommendations remain unclear. In this study, we aim to enhance ChatGPT’s recommendation capabilities by aligning it with traditional information retrieval (IR) ranking capabilities, including point-wise, pair-wise, and list-wise ranking. To achieve this goal, we re-formulate the aforementioned three recommendation policies into prompt formats tailored specifically to the domain at hand. Through extensive experiments on four datasets from different domains, we analyze the distinctions among the three recommendation policies. Our findings indicate that ChatGPT achieves an optimal balance between cost and performance when equipped with list-wise ranking. This research sheds light on a promising direction for aligning ChatGPT with recommendation tasks. To facilitate further explorations in this area, the full code and detailed original results are open-sourced at https://anonymous.4open.science/r/LLM4RS-532C/.
List of all demonstration papers accepted for RecSys 2023 (in alphabetical order).
- DEMEasyStudy: Framework for Easy Deployment of User Studies on Recommender Systems
by Patrik Dokoupil (Department of Software Engineering, Charles University) and Ladislav Peska (Faculty of Mathematics and Physics, Charles University, Prague, Czechia).Improvements in the recommender systems (RS) domain are not possible without a thorough way to evaluate and compare newly proposed approaches. User studies represent a viable alternative to online and offline evaluation schemes, but despite their numerous benefits, they are only rarely used. One of the main reasons behind this fact is that preparing a user study from scratch involves a lot of extra work on top of a simple algorithm proposal. To simplify this task, we propose EasyStudy, a modular framework built on the credo “Make simple things fast and hard things possible”. It features ready-to-use datasets, preference elicitation methods, incrementally tuned baseline algorithms, study flow plugins, and evaluation metrics. As a result, a simple study comparing several RS can be deployed with just a few clicks, while more complex study designs can still benefit from a range of reusable components, such as preference elicitation. Overall, EasyStudy dramatically decreases the gap between the laboriousness of offline evaluation vs. user studies and, therefore, may contribute towards the more reliable and insightful user-centric evaluation of next-generation RS.
- DEMImproving Group Recommendations using Personality, Dynamic Clustering and Multi-Agent MicroServices
by Patrícia Alves (GECAD/LASI – ISEP, Polytechnic of Porto), André Martins (GECAD/LASI – ISEP, Polytechnic of Porto), Paulo Novais (ALGORITMI/LASI, University of Minho) and Goreti Marreiros (GECAD/LASI, ISEP, Polytechnic of Porto).The complexity associated with group recommendations calls for strategies to mitigate several problems, such as the group’s heterogeneity and conflicting preferences, the emotional contagion phenomenon, the cold-start problem, and the group members’ needs and concerns while providing recommendations that satisfy all members at once. In this demonstration, we show how we implemented a Multi-Agent Microservice to represent the tourists in a mobile Group Recommender System for Tourism prototype. A novel dynamic clustering process is presented to help minimize the group’s heterogeneity and conflicting preferences. To help solve the cold-start problem, the preliminary tourist attractions preference and travel-related preferences & concerns are predicted using the tourists’ personality, while taking the tourists’ disabilities and fears into account. Although there is no need for previous interaction data to build the tourists’ profile since we predict the tourists’ preferences, the tourist agents learn with each other by using association rules to find patterns in the tourists’ profile and in the ratings given to Points of Interest to refine the recommendations.
- DEMIntroducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit
by Tobias Vente (University of Siegen), Michael Ekstrand (Boise State University) and Joeran Beel (University of Siegen).LensKit is one of the first and most popular Recommender System Libraries. While LensKit offers a wide variety of features, it does not include any optimization strategies or guidelines on how to select and tune LensKit algorithms. LensKit developers have to manually include third-party libraries into their experimental setup or implement optimization strategies by hand to optimize hyperparameters. We found that 65.5% (19 out of 29) of papers using LensKit algorithms for their experiments did not select algorithms or tune hyperparameters. Non-optimized models represent poor baselines and produce less meaningful research results. This demo introduces LensKit-Auto. LensKit-Auto automates the entire Recommender System pipeline and enables LensKit developers to automatically select, optimize, and ensemble LensKit algorithms.
- DEMLLM Based Generation of Item-Description for Recommendation System
by Arkadeep Acharya (Sony Research India), Brijraj Singh (Sony Research India) and Naoyuki Onoe (Sony Research India).The description of an item plays a pivotal role in providing concise and informative summaries to captivate potential viewers and is essential for recommendation systems. Traditionally, such descriptions were obtained through manual web scraping techniques, which are time-consuming and susceptible to data inconsistencies. In recent years, Large Language Models (LLMs), such as GPT-3.5, and open source LLMs like Alpaca have emerged as powerful tools for natural language processing tasks. In this paper, we have explored how we can use LLMs to generate detailed descriptions of the items. To conduct the study, we have used the MovieLens 1M dataset comprising movie titles and the Goodreads Dataset consisting of names of books and subsequently, an open-sourced LLM, Alpaca, was prompted with few-shot prompting on this dataset to generate detailed movie descriptions considering multiple features like the names of the cast and directors for the ML dataset and the names of the author and publisher for the Goodreads dataset. The generated description was then compared with the scraped descriptions using a combination of Top Hits, MRR, and NDCG as evaluation metrics. The results demonstrated that LLM-based movie description generation exhibits significant promise, with results comparable to the ones obtained by web-scraped descriptions.
- DEMLocalify.org: Locally-focus Music Artist and Event Recommendation
by Douglas Turnbull (Ithaca College), April Trainor (Ithaca College), Griffin Homan (Ithaca College), Elizabeth Richards (Ithaca College), Kieran Bentley (Ithaca College), Victoria Conrad (Ithaca College), Paul Gagliano (Ithaca College) and Cassandra Raineault (Ithaca College).Cities with strong local music scenes enjoy many social and economic benefits. To this end, we are interested in developing a locally-focused artist and event recommendation system called Localify.org that supports and promotes local music scenes. Local artists tend to be relatively obscure and reside in the long tail of the artist’s popularity distribution. In this demo paper, we describe both the overall system architecture as well as our core recommender system that uses artist-artist similarity information as opposed to user-artist preference information. We also discuss the role of popularity bias and how we attempt to ameliorate it in the context of local music recommendation.
- DEMRe2Dan: Retrieval of medical documents for e-Health in Danish
by Antonela Tommasel (ISISTAN Research Institute, CONICET-UNCPBA), Rafael Pablos (Aarhus Universitet) and Ira Assent (Aarhus Universitet).With the clinical environment becoming more data-reliant, healthcare professionals now have unparalleled access to comprehensive clinical information from numerous sources. Then, one of the main issues is how to avoid overloading practitioners with large amounts of (irrelevant) information while guiding them to the relevant documents for specific patient cases. Additional challenges appear due to the shortness of queries and the presence of long (and maybe noisy) contextual information. This demo presents Re2Dan, a web Retrieval and recommender of Danish medical documents. Re2Dan leverages several techniques to improve the quality of retrieved documents. First, it combines lexical and semantic searches to understand the meaning and context of user queries, allowing the retrieval of documents that are conceptually similar to the user’s query. Second, it recommends similar queries, allowing users to discover related documents and insights. Third, when given contextual information (e.g., from patients’ clinical notes), it suggests medical concepts to expand the user query, enabling a more focused search scope and thus obtaining more accurate recommendations. Preliminary analyses showed the effectiveness of the recommender in improving the relevance and comprehensiveness of recommendations, thereby assisting healthcare professionals in finding relevant information for informed decision-making.
List of all doctoral symposium papers accepted for RecSys 2023 (in alphabetical order).
- DSAcknowledging dynamic aspects of trust in recommender systems
by Imane Akdim (School of Computer Science – Mohammed VI Polytechnic University).Trust-based recommender systems emerged as a solution to different limitations of traditional recommender systems. These systems rely on the assumption that users will adopt the preferences of users they deem trustworthy in an online social setting. However, most trust-based recommender systems consider trust to be a static notion, thereby disregarding crucial dynamic factors that influence the value of trust between users and the performance of the recommender system. In this work, we intend to address several challenges regarding the dynamics of trust within a trust-based recommender system. These issues include the temporal evolution of trust between users and change detection and prediction in users’ interactions. By exploring the factors that influence the evolution of human trust, a complex and abstract concept, this work will contribute to a better understanding of how trust operates in recommender systems.
- DSAdvancing Automation of Design Decisions in Recommender System Pipelines
by Tobias Vente (University of Siegen).Recommender systems have become essential in domains like streaming services, social media platforms, and e-commerce websites. However, the development of a recommender system involves a complex pipeline with preprocessing, data splitting, algorithm and model selection, and postprocessing stages, requiring critical design decisions. Every stage of the recommender systems pipeline requires design decisions that influence the performance of the recommender system. To ease design decisions, automated machine learning (AutoML) techniques have been adapted to the field of recommender systems, resulting in various AutoRecSys libraries. Nevertheless, these libraries lack library independence and limit flexibility in integrating automation techniques from different sources. In response, our research aims to enhance the usability of AutoML techniques for design decisions in recommender system pipelines. We focus on developing flexible and library-independent automation techniques for algorithm selection, model selection, and postprocessing steps. By enabling developers to make informed choices and ease the recommender system development process, we decrease the developer’s effort while improving the performance of the recommender systems. Moreover, we want to analyze the cost-to-benefit ratio of automation techniques in recommender systems, evaluating the computational overhead and the resulting improvements in predictive performance. Our objective is to leverage AutoML concepts to automate design decisions in recommender system pipelines, reduce manual effort, and enhance the overall performance and usability of recommender systems.
- DSChallenges for Anonymous Session-Based Recommender Systems in Indoor Environments
by Alessio Ferrato (Roma TRE).Recommender Systems (RSs) have gained widespread popularity for providing personalized recommendations in manifold domains. However, considering the growing user privacy concerns, the development of recommender systems that prioritize data protection has become increasingly important. In indoor environments, RSs face unique challenges, and ongoing research is being conducted to address them. Anonymous Session-Based Recommender Systems (ASBRSs) can represent a possible solution to address these challenges while ensuring user privacy. This paper aims to bridge the gap between existing RS research and the demand for privacy-preserving recommender systems, especially in indoor settings, where significant research efforts are underway. Therefore, it proposes three research questions: How does user modeling based on implicit feedback impact on ASBRSs, considering different embedding extraction networks? How can short sessions be leveraged to start the recommendation process in ASBRSs? To what extent can ASBRSs generate fair recommendations? By investigating these questions, this study establishes the foundations for applying ASBRSs in indoor environments, safeguarding user privacy, and contributing to the ongoing research in this field.
- DSComplementary Product Recommendation for Long-tail Products
by Rastislav Papso (Kempelen Institute of Intelligent Technologies).Identifying complementary relations between products plays a key role in e-commerce Recommender Systems (RS). Existing methods in Complementary Product Recommendation (CPR), however, focus only on identifying complementary relations in huge and data-rich catalogs, while none of them considers real-world scenarios of small and medium e-commerce platforms with a limited number of interactions. In this paper, we discuss our research proposal that addresses the problem of identifying complementary relations in such sparse settings. To overcome the data sparsity problem, we propose to first learn complementary relations in large and data-rich catalogs and then transfer the learned knowledge to small and scarce ones. To be able to map individual products across different catalogs and thus transfer learned relations between them, we propose to create a Product Universal Embedding Space (PUES) using textual and visual product meta-data, which serves as a common ground for the products from an arbitrary catalog.
- DSDemystifying Recommender Systems: A Multi-faceted Examination of Explanation Generation, Impact, and Perception
by Giacomo Balloccu (Università degli Studi di Cagliari).Recommender systems have become an integral component of the digital landscape, impacting a multitude of services and industries ranging from e-commerce to entertainment and beyond. By offering personalised suggestions, these systems challenge a fundamental problem in our modern information society named information overload. As users face a deluge of choices, recommender systems help sift through this immense sea of possibilities, delivering a personalised subset of options that align with user preferences and historical behaviour.
However, despite their considerable utility, recommender systems often operate as “black boxes,” obscuring the rationale behind recommendations. This opacity can engender mistrust and undermine user engagement, thus attenuating the overall effectiveness of the system. Researchers have emphasized the importance of explanations in recommender systems, highlighting how explanations can enhance system transparency, foster user trust, and improve decision-making processes, thereby enriching user experiences and yielding potential business benefits. Yet, a significant gap persists in the current state of human-understandable explanations research. While recommender systems have grown increasingly complex, our capacity to generate clear, concise, and relevant explanations that reflect this complexity remains limited. Crafting explanations that are both understandable and reflective of sophisticated algorithmic decision-making processes poses a significant challenge, especially in a manner that meets the user’s cognitive and contextual needs.
- DSDenoising Explicit Social Signals for Robust Recommendation
by Youchen Sun (Nanyang Technological University).Social recommender systems assume that users’ preferences can be influenced by their social connections. However, social networks are inherently noisy and contain redundant signals that are not helpful or even harmful for the recommendation task. In this extended abstract, we classify the noise in the explicit social links into intrinsic noise and extrinsic noise. Intrinsic noises are those edges that are natural in the social network but do not have an influence on the user preference modeling; extrinsic noises, on the other hand, are those social links that are introduced intentionally through malicious attacks such that the attackers can manipulate the social influence to bias the recommendation outcome. To tackle this issue, we first propose a denoising framework that utilizes the information bottleneck principle and contrastive learning to filter out the noisy social edges and use the edges that are socially influential to enhance item prediction. Experiments will be conducted on real-world datasets for Top-K ranking evaluation as well as for the model’s robustness to simulated social noise. Finally, we discuss future plans for defending against extrinsic noise, which results from malicious attacks.
- DSEnhanced Privacy Preservation for Recommender Systems
by Ziqing Wu (NTU).My research focuses on privacy preservation for recommender systems specifically in the following aspects: first, how to better address users’ realistic privacy concerns and offer enhanced privacy control by considering what and with whom to share sensitive information for decentralized recommender systems; second, how to enhance the privacy preservation capability of LLM-based recommender systems; last, how to formulate uniform metrics to compare the privacy-preservation efficacy of the recommender system.
- DSExplainable Graph Neural Network Recommenders; Challenges and Opportunities
by Amir Reza Mohammadi (Universität Innsbruck).Graph Neural Networks (GNNs) have demonstrated significant potential in recommendation tasks by effectively capturing intricate connections among users, items, and their associated features. Given the escalating demand for interpretability, current research endeavors in the domain of GNNs for Recommender Systems (RecSys) necessitate the development of explainer methodologies to elucidate the decision-making process underlying GNN-based recommendations. In this work, we aim to present our research focused on techniques to extend beyond the existing approaches for addressing interpretability in GNN-based RecSys.
- DSExploring Unlearning Methods to Ensure the Privacy, Security, and Usability of Recommender Systems
by Jens Leysen (University of Antwerp).Machine learning algorithms have proven highly effective in analyzing large amounts of data and identifying complex patterns and relationships. One application of machine learning that has received significant attention in recent years is recommender systems, which are algorithms that analyze user behavior and other data to suggest items or content that a user may be interested in. However useful, these systems may unintentionally retain sensitive, outdated, or faulty information, posing a risk to user privacy and system security and limiting a system’s usability. In this research proposal, we aim to address these challenges by investigating methods for machine “unlearning”, which would allow information to be efficiently “forgotten” or “unlearned” from machine learning models. The main objective of this proposal is to develop the foundation for future machine unlearning methods. We first evaluate current unlearning methods and explore novel adversarial attacks on these methods’ verifiability, efficiency, and accuracy to gain new insights and further develop the theory of machine unlearning. Using our gathered insights, we seek to create novel unlearning methods that are verifiable, efficient, and limit unnecessary accuracy degradation. Through this research, we seek to make significant contributions to the theoretical foundations of machine unlearning while also developing unlearning methods that can be applied to real-world problems.
- DSImproving Recommender Systems Through the Automation of Design Decisions
by Lukas Wegmeth (University of Siegen).Recommender systems developers are constantly faced with difficult design decisions. Additionally, the number of options that a recommender systems developer has to consider continually grows over time with new innovations. The machine learning community is in a similar situation and has come together to tackle the problem. They invented concepts and tools to make machine learning development both easier and faster. These developments are categorized as automated machine learning (AutoML). As a result, the AutoML community formed and continuously innovates new approaches. Inspired by AutoML, the recommender systems community has recently understood the need for automation and sparsely introduced AutoRecSys. The goal of AutoRecSys is not to replace recommender systems developers but to improve performance through the automation of design decisions. With AutoRecSys, recommender systems engineers do not have to focus on easy but time-consuming tasks and are free to pursue difficult engineering tasks instead. Additionally, AutoRecSys enables easier access to recommender systems for beginners as it reduces the amount of knowledge required to get started with the development of recommender systems. AutoRecSys, like AutoML, is still early in its development and does not yet cover the whole development pipeline. Additionally, it is not yet clear under which circumstances AutoML approaches can be transferred to recommender systems. Our research intends to close this gap by improving AutoRecSys both with regard to the transfer of AutoML and novel approaches. Furthermore, we focus specifically on the development of novel automation approaches for data processing and training. We note that the realization of AutoRecSys is going to be a community effort.
Our part in this effort is to research AutoRecSys fundamentals, build practical tools for the community, raise awareness of the advantages of automation, and catalyze AutoRecSys development.
- DSKnowledge-Aware Recommender Systems based on Multi-Modal Information Sources
by Giuseppe Spillo (University of Bari ‘Aldo Moro’).The last few years saw a growing interest in Knowledge-Aware Recommender Systems (KARSs), given their capability in encoding and exploiting several data sources, both structured (such as knowledge graphs) and unstructured (such as plain text); indeed, several pieces of research show the competitiveness of these models. Nowadays, many state-of-the-art KARS models use deep learning, enabling them to exploit large amounts of information, including knowledge graphs (KGs), user reviews, plain text, and multimedia content (pictures, audio, videos). In my Ph.D. I will explore and study techniques for designing KARSs leveraging embeddings deriving from multi-modal information sources; the models I will design will aim at providing fair, accurate, and explainable recommendations.
- DS Overcoming Recommendation Limitations with Neuro-Symbolic Integration
by Tommaso Carraro (University of Padova / Fondazione Bruno Kessler). Despite being studied for over twenty years, Recommender Systems (RSs) still suffer from important issues that limit their applicability in real-world scenarios. Data sparsity, cold start, and explainability are some of the most significant problems. Intuitively, these historical limitations can be mitigated by injecting prior knowledge into recommendation models. Neuro-Symbolic (NeSy) approaches are suitable candidates for achieving this goal. Specifically, they aim to integrate learning (e.g., neural networks) with symbolic reasoning (e.g., logical reasoning). Generally, the integration lets a neural model interact with a logical knowledge base, enabling reasoning capabilities. In particular, NeSy approaches have been shown to deal well with poor training data, and their symbolic component could enhance model transparency. This suggests that NeSy systems could potentially mitigate the aforementioned RS limitations. However, the application of such systems to RSs is still in its early stages, and most of the proposed architectures do not really exploit the advantages of a NeSy approach. To this end, we conducted preliminary experiments with a Logic Tensor Network (LTN), a novel NeSy framework. We used the LTN to train a vanilla Matrix Factorization model using a First-Order Logic knowledge base as an objective. In particular, we encoded facts to enable the regularization of the latent factors using content information, obtaining promising results. In this paper, we review existing NeSy recommenders, discuss their limitations, show our preliminary results with the LTN, and propose directions for future work in this novel research area. In particular, we show how the LTN can be intuitively used to regularize models, perform cross-domain recommendation, ensemble learning, and explainable recommendation, reduce popularity bias, and easily define the loss function of a model.
- DS Retrieval-augmented Recommender System: Enhancing Recommender Systems with Large Language Models
by Dario Di Palma (Politecnico di Bari). Recommender Systems (RSs) play a pivotal role in delivering personalized recommendations across various domains, from e-commerce to content streaming platforms. Recent advancements in natural language processing have introduced Large Language Models (LLMs) that exhibit remarkable capabilities in understanding and generating human-like text. RSs are renowned for their effectiveness and proficiency within clearly defined domains; nevertheless, they are limited in adaptability and incapable of providing recommendations for unexplored data. Conversely, LLMs exhibit contextual awareness and strong adaptability to unseen data. Combining these technologies creates a potent tool for delivering contextual and relevant recommendations, even in cold scenarios characterized by high data sparsity. The proposal aims to explore the possibilities of integrating LLMs into RSs, introducing a novel approach called Retrieval-augmented Recommender Systems, which combines the strengths of retrieval-based and generation-based models to enhance the ability of RSs to provide relevant suggestions.
- DS Sequential Recommendation Models: A Graph-based Perspective
by Andreas Peintner (University of Innsbruck). Recommender systems (RecSys) traditionally leverage the users’ rich interaction data with the system, but ignore the sequential dependency of items. Sequential recommender systems aim to predict the next item the user will interact with (e.g., click on, purchase, or listen to) based on the preceding interactions of the user. Current state-of-the-art approaches focus on transformer-based architectures and graph neural networks. Specifically, the modeling of sequences as graphs has been shown to be a promising approach to introduce a structured bias into the recommendation learning framework. In this work, we outline our research exploring different applications of graphs in sequential recommendation.
- DS User-Centric Conversational Recommendation: Adapting the Need of User with Large Language Models
by Gangyi Zhang (University of Science and Technology of China).Conversational recommender systems (CRS) promise to provide a more natural user experience for exploring and discovering items of interest through ongoing conversation. However, effectively modeling user preferences during conversations and generating personalized recommendations in real time remain challenging problems. Users often express their needs in a vague and evolving manner, and CRS must adapt to capture the dynamics and uncertainty in user preferences to have productive interactions.
This research develops user-centric methods for building conversational recommendation systems that can understand complex and changing user needs. We propose a graph-based conversational recommendation framework that represents multi-turn conversations as reasoning over a user-item-attribute graph. Enhanced conversational path reasoning incorporates graph neural networks to improve representation learning in this framework. To address uncertainty and dynamics in user preferences, we present the vague preference multi-round conversational recommendation scenario and an adaptive vague preference policy learning solution that employs reinforcement learning to determine recommendation and preference elicitation strategies tailored to the user.
Looking to the future, large language models offer promising opportunities to enhance various aspects of CRS, including user modeling, policy learning, and response generation. Overall, this research takes a user-centered perspective in designing conversational agents that can adapt to the inherent ambiguity involved in natural language dialogues with people.
List of all industry track contributions accepted for RecSys 2023 (in alphabetical order).
- IND Accelerating Creator Audience Building through Centralized Exploration
by Buket Baran (Spotify), Guilherme Dinis Junior (Spotify), Antonina Danylenko (Spotify), Olayinka S. Folorunso (Spotify), Gösta Forsum (Spotify), Maksym Lefarov (Spotify), Lucas Maystre (Spotify) and Yu Zhao (Spotify).On Spotify, multiple recommender systems enable personalized user experiences across a wide range of product features. These systems are owned by different teams and serve different goals, but all of these systems need to explore and learn about new content as it appears on the platform. In this work, we describe ongoing efforts at Spotify to develop an efficient solution to this problem, by centralizing content exploration and providing signals to existing, decentralized recommendation systems (a.k.a. exploitation systems). We take a creator-centric perspective, and argue that this approach can dramatically reduce the time it takes for new content to reach its full potential.
- IND AdaptEx: a self-service contextual bandit platform
by William Black (Expedia Group), Ercument Ilhan (Expedia Group), Andrea Marchini (Expedia Group) and Vilda Markeviciute (Expedia Group). This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group, that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution to improve user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate towards optimal product solutions quickly and to handle ever-changing content and continuous “cold start” situations gracefully.
- IND An Industrial Framework for Personalized Serendipitous Recommendation in E-commerce
by Zongyi Wang (JD.com), Yanyan Zou (JD.com), Anyu Dai (JD.com), Linfang Hou (JD.com), Nan Qiao (JD.com), Luobao Zou (JD.com), Mian Ma (JD.com), Zhuoye Ding (JD.com) and Sulong Xu (JD). Classical recommendation methods typically face the filter bubble problem, where users mostly receive recommendations of items they are already familiar with, making them bored and dissatisfied. To alleviate this issue, this applied paper introduces a novel framework for personalized serendipitous recommendation in an e-commerce platform (i.e., JD.com), which presents users with unexpected and satisfying items that deviate from their prior behaviors, considering both accuracy and novelty. To achieve this goal, it is crucial yet challenging to recognize when a user is willing to receive serendipitous items and how many novel items are expected. To address these two challenges, a two-stage framework is designed. First, a DNN-based scorer is deployed to quantify the novelty degree of a product category based on user behavior history. Then, we resort to a potential outcome framework to decide the optimal timing to recommend serendipitous items to a user and the novelty degree of the recommendation. An online A/B test on the e-commerce recommender platform in JD.com demonstrates that our model achieves significant gains on various metrics: a 0.54% relative increase in impressive depth, 0.8% in average user click count, and 3.23% and 1.38% in the number of novel impressive and clicked items, respectively.
- IND Beyond Labels: Leveraging Deep Learning and LLMs for Content Metadata
by Saurabh Agrawal (Tubi), John Trenkle (Tubi) and Jaya Kawale (Tubi). Content metadata plays a very important role in movie recommender systems as it provides valuable information about various aspects of a movie such as genre, cast, plot synopsis, box office summary, etc. Analyzing the metadata can help understand the user preferences and generate personalized recommendations catering to the niche tastes of the users. It can also help with content cold starting when the recommender system has little or no interaction data available to perform collaborative filtering. In this talk, we will focus on one particular type of metadata – genre labels. Genre labels associated with a movie or a TV series such as “horror” or “comedy” or “romance” help categorize a collection of movies into different themes and correspondingly set audience expectations for a title. We present some of the challenges associated with using genre label information via traditional methods and propose a new way of examining the genre information that we call the Genre Spectrum. The Genre Spectrum helps capture the various nuanced genres in a title, and our offline and online experiments corroborate the effectiveness of the approach.
- IND Contextual Multi-Armed Bandit for Email Layout Recommendation
by Yan Chen (Wayfair), Emilian Vankov (Wayfair), Linas Baltrunas (Netflix), Preston Donovan (Wayfair), Akash Mehta (Wayfair) and Benjamin Schroeder (Wayfair).We present the use of a contextual multi-armed bandit approach to improve the personalization of marketing emails sent to Wayfair’s customers. Emails are a critical outreach tool as they economically unlock a significant amount of revenue. We describe how we formulated our problem of selecting the optimal personalized email layout to use as a contextual multi-armed bandit problem. We also explain how we approximated a solution with an Epsilon-greedy strategy. We detail the thorough evaluations we ran, including offline experiments, an off-policy evaluation, and an online A/B test. Our results demonstrate that our approach is able to select personalized email layouts that lead to significant gains in topline business metrics including engagement and conversion rates.
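As a rough illustration of the Epsilon-greedy strategy mentioned in this abstract, the following minimal Python sketch keeps a running mean reward per (context, layout) pair. It is not the authors' implementation: the class name `EpsilonGreedyLayoutBandit`, the tabular treatment of contexts, and the layout names are all assumptions made for the example.

```python
import random
from collections import defaultdict


class EpsilonGreedyLayoutBandit:
    """Hypothetical tabular epsilon-greedy bandit over email layouts.

    Assumption: contexts are discrete, hashable keys (e.g. a customer segment).
    """

    def __init__(self, layouts, epsilon=0.1, seed=0):
        self.layouts = list(layouts)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, layout) -> observations
        self.means = defaultdict(float)   # (context, layout) -> mean reward

    def select(self, context):
        # Explore: with probability epsilon pick a layout uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.layouts)
        # Exploit: pick the layout with the highest estimated reward so far.
        return max(self.layouts, key=lambda layout: self.means[(context, layout)])

    def update(self, context, layout, reward):
        key = (context, layout)
        self.counts[key] += 1
        # Incremental update of the running mean reward.
        self.means[key] += (reward - self.means[key]) / self.counts[key]
```

In use, `select` would be called when an email is composed and `update` once the outcome (for example, a click coded as 1.0) is observed; a production system would typically replace the lookup table with a learned reward model.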
- IND Creating the next generation of news experience on ekstrabladet.dk with recommender systems
by Johannes Kruse (DTU Compute & Ekstra Bladet), Kasper Lindskow (Ekstra Bladet), Michael Riis Andersen (DTU Compute) and Jes Frellsen (DTU Compute).With the uptake of algorithmic personalization, news organizations have to increasingly trust automated systems with previously considered editorial values, e.g., prioritizing news to readers. In the case study carried out by Ekstra Bladet, the Platform Intelligent News project demonstrates how recommender systems successfully enhanced the click-through rates (CTR) for multiple segments at ekstrabladet.dk while still prioritizing the news organization’s editorial values.
- IND Delivery Hero Recommendation Dataset: A Novel Dataset for Benchmarking Recommendation Algorithms
by Yernat Assylbekov (Delivery Hero), Raghav Bali (Delivery Hero), Luke Bovard (Delivery Hero) and Christian Klaue (Delivery Hero).In this paper, we propose a new dataset, Delivery Hero Recommendation Dataset (DHRD), which provides a diverse real-world dataset for researchers. DHRD comprises over a million food delivery orders from three distinct cities, encompassing thousands of vendors and an extensive range of dishes, serving a combined customer base of over a million individuals. We discuss the challenges associated with such real-world datasets. By releasing DHRD, researchers are empowered with a valuable resource for building and evaluating recommender systems, paving the way for advancements in this domain.
- IND Efficient Data Representation Learning in Google-scale Systems
by Derek Cheng (Google DeepMind), Ruoxi Wang (Google DeepMind), Wang-Cheng Kang (Google DeepMind), Benjamin Coleman (Google DeepMind), Yin Zhang (Google DeepMind), Jianmo Ni (Google DeepMind), Jonathan Valverde (Google DeepMind), Lichan Hong (Google DeepMind) and Ed Chi (Google DeepMind). “Garbage in, garbage out” is a familiar maxim to ML practitioners and researchers, because the quality of a learned data representation is crucial to the quality of any ML model that consumes it as an input. To handle systems that serve billions of users at millions of queries per second (QPS), we need representation learning algorithms with significantly improved efficiency. At Google, we have dedicated thousands of iterations to develop a set of powerful techniques that efficiently learn high-quality data representations. We have thoroughly validated these methods through offline evaluation and online A/B testing, and deployed them in over 50 models across major Google products. In this paper, we consider a generalized data representation learning problem that allows us to identify feature embeddings and crosses as common challenges. We propose two solutions: 1. Multi-size Unified Embedding to learn high-quality embeddings; and 2. Deep Cross Network V2 for learning effective feature crosses. We discuss the practical challenges we encountered and solutions we developed during deployment to production systems, compare with SOTA methods, and report offline and online experimental results. This work sheds light on the challenges and opportunities for developing next-gen algorithms for web-scale systems.
- IND From Research to Production: Towards Scalable and Sustainable Neural Recommendation Models on Commodity CPU Hardware
by Vihan Lakshman (ThirdAI), Anshumali Shrivastava (Rice University/ThirdAI), Tharun Medini (ThirdAI), Nicholas Meisburger (ThirdAI Corp), Joshua Engels (ThirdAI), David Torres Ramos (ThirdAI), Benito Geordie (ThirdAI), Pratik Pranav (ThirdAI), Shubh Gupta (ThirdAI), Yashwanth Adunukota (ThirdAI) and Siddharth Jain (ThirdAI).In the last decade, large-scale deep learning has fundamentally transformed industrial recommendation systems. However, this revolutionary technology remains prohibitively expensive due to the need for costly and scarce specialized hardware, such as GPUs, to train and serve models. In this talk, we share our multi-year journey at ThirdAI in developing efficient neural recommendation models that can be trained and deployed on commodity CPU machines without the need for costly accelerators like GPUs. In particular, we discuss the limitations of the current GPU-based ecosystem in machine learning, why recommendation systems are amenable to the strengths of CPU devices, and present results from our efforts to translate years of academic research into a deployable system that fundamentally shifts the economics of training and operating large-scale machine learning models.
- IND Heterogeneous Knowledge Fusion: A Novel Approach for Personalized Recommendation via LLM
by Bin Yin (Meituan), Junjie Xie (Meituan), Yu Qin (Meituan), Zixiang Ding (Meituan), Zhichao Feng (Meituan), Xiang Li (Unaffiliated) and Wei Lin (Unaffiliated).The analysis and mining of user heterogeneous behavior are of paramount importance in recommendation systems. However, the conventional approach of incorporating various types of heterogeneous behavior into recommendation models leads to feature sparsity and knowledge fragmentation issues. To address this challenge, we propose a novel approach for personalized recommendation via Large Language Model (LLM), by extracting and fusing heterogeneous knowledge from user heterogeneous behavior information. In addition, by combining heterogeneous knowledge and recommendation tasks, instruction tuning is performed on LLM for personalized recommendations. The experimental results demonstrate that our method can effectively integrate user heterogeneous behavior and significantly improve recommendation performance.
- IND Identifying Controversial Pairs in Item-to-Item Recommendations
by Junyi Shen (Apple), Dayvid Rodrigues de Oliveira (Apple), Jin Cao (Apple), Brian Knott (Apple), Goodman Gu (Apple), Sindhu Vijaya Raghavan (Apple) and Rob Monarch (Apple).Recommendation systems in large-scale online marketplaces are essential to aiding users in discovering new content. However, state-of-the-art systems for item-to-item recommendation tasks are often based on a shallow level of contextual relevance, which can make the system insufficient for tasks where item relationships are more nuanced. Contextually relevant item pairs can sometimes have controversial or problematic relationships, and they could degrade user experiences and brand perception when recommended to users. For example, a recommendation of a divorce and co-parenting book can create a disturbing experience for someone who is downloading or viewing a marriage therapy book. In this paper, we propose a classifier to identify and prevent such problematic item-to-item recommendations and to enhance overall user experiences. The proposed approach utilizes active learning to sample hard examples effectively across sensitive item categories and uses human raters for data labeling. We also perform offline experiments to demonstrate the efficacy of this system for identifying and filtering controversial recommendations while maintaining recommendation quality.
- IND Investigating the effects of incremental training on neural ranking models
by Benedikt Schifferer (NVIDIA), Wenzhe Shi (ShareChat), Gabriel de Souza Pereira Moreira (NVIDIA), Even Oldridge (NVIDIA), Chris Deotte (NVIDIA), Gilberto Titericz (NVIDIA), Kazuki Onodera (NVIDIA), Praveen Dhinwa (ShareChat), Vishal Agrawal (ShareChat) and Chris Green (ShareChat). Recommender systems are an essential component of online systems, providing users with a personalized experience. Some recommendation scenarios such as social networks or news are very dynamic, with new items added continuously and the interest of users changing over time due to breaking news or popular events. Incremental training is a popular technique to keep recommender models up-to-date in those dynamic platforms. In this paper, we provide an empirical analysis of a large industry dataset from the ShareChat app MOJ, a social media platform for short videos, to answer relevant questions such as: How often should I retrain the model? Do different models, features and dataset sizes benefit from incremental training? Do all users and items benefit equally from incremental training?
- IND Learning From Negative User Feedback and Measuring Responsiveness for Sequential Recommenders
by Yueqi Wang (Google), Yoni Halpern (Google), Shuo Chang (Google), Jingchen Feng (Google), Elaine Ya Le (Google), Longfei Li (Google), Xujian Liang (Google), Min-Cheng Huang (Google), Shane Li (Google), Alex Beutel (Google), Yaping Zhang (Google) and Shuchao Bi (Google). Sequential recommenders have been widely used in industry due to their strength in modeling user preferences. While these models excel at learning a user’s positive interests, less attention has been paid to learning from negative user feedback. Negative user feedback is an important lever of user control, and comes with an expectation that recommenders should respond quickly and reduce similar recommendations to the user. However, negative feedback signals are often ignored in the training objective of sequential recommenders, which primarily aim at predicting positive user interactions. In this work, we incorporate explicit and implicit negative user feedback into the training objective of sequential recommenders using a “not-to-recommend” loss function that optimizes for the log likelihood of not recommending items with negative feedback. We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system. Furthermore, we address a challenge in measuring recommender responsiveness to negative feedback by developing a counterfactual simulation framework to compare recommender responses between different user actions, showing improved responsiveness from the modeling change.
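One way to read the “not-to-recommend” loss described in this abstract is as an extra term, -log(1 - p(item)), added for each item with negative feedback. The Python sketch below illustrates that reading with a softmax over item logits; it is an assumption-laden toy, not the authors' code, and the function names `softmax` and `sequence_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of item logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_loss(logits, positive_ids, negative_ids):
    """Hypothetical combined loss for one prediction step.

    positive_ids: items the user engaged with positively.
    negative_ids: items that received negative feedback.
    """
    probs = softmax(logits)
    # Usual term: maximise the log likelihood of recommending positive items.
    loss = -sum(math.log(probs[i]) for i in positive_ids)
    # "Not-to-recommend" term: maximise the log likelihood of NOT recommending
    # negative items; the clamp avoids log(0) when a probability reaches 1.
    loss += -sum(math.log(max(1.0 - probs[i], 1e-12)) for i in negative_ids)
    return loss
```

Under this formulation, lowering the logit of an item with negative feedback reduces the loss, which pushes the model away from recommending similar items.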
- IND Leveling Up the Peloton Homescreen: A System and Algorithm for Dynamic Row Ranking
by Natalia Chen (Peloton Interactive), Nganba Meetei (Peloton Interactive), Nilothpal Talukder (Peloton Interactive) and Alexey Zankevich (Peloton Interactive).At Peloton, we constantly strive to improve the member experience by highlighting personalized content that speaks to each individual user. One area of focus is our landing page, the homescreen, consisting of numerous rows of class recommendations used to captivate our users and guide them through our growing catalog of workouts. In this paper, we discuss a strategy we have used to increase the rate of workouts started from our homescreen through a Thompson sampling approach to row ranking, enhanced further with a collaborative filtering method based on user similarity calculated from workout history.
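To make the Thompson sampling idea in this abstract concrete, here is a minimal Beta-Bernoulli sketch in Python that ranks rows by a sampled success rate. It covers only the sampling step, not the collaborative filtering enhancement, and the function name `rank_rows`, the uniform prior, and the success/failure counts are assumptions made for illustration.

```python
import random


def rank_rows(stats, rng):
    """Rank homescreen rows by Thompson sampling (illustrative sketch).

    stats: {row_name: (successes, failures)}, e.g. workouts started vs. not
    started after the row was shown. Returns row names, best sample first.
    """
    sampled = {
        # Draw from the Beta(1 + successes, 1 + failures) posterior, which
        # assumes a uniform Beta(1, 1) prior on each row's success rate.
        row: rng.betavariate(1 + successes, 1 + failures)
        for row, (successes, failures) in stats.items()
    }
    return sorted(sampled, key=sampled.get, reverse=True)
```

Because each call draws fresh samples, rows with little data are occasionally ranked high, which is how the approach explores while still favoring rows with strong evidence.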
- IND LightSAGE: Graph Neural Networks for Large Scale Item Retrieval in Shopee’s Advertisement Recommendation
by Dang Minh Nguyen (Shopee, SEA Group), Chenfei Wang (Shopee, SEA Group), Yan Shen (Shopee, SEA Group) and Yifan Zeng (Shopee, SEA Group). Graph Neural Network (GNN) is a trending solution for item retrieval in recommendation problems. Most recent reports, however, focus heavily on new model architectures. This may leave gaps when applying GNNs in an industrial setup, where, besides the model, constructing the graph and handling data sparsity also play critical roles in the overall success of the project. In this work, we report how we apply GNN for large-scale e-commerce item retrieval at Shopee. We detail our simple yet novel and impactful techniques in graph construction, modeling, and handling data skewness. Specifically, we construct high-quality item graphs by combining strong-signal user behaviors with a high-precision collaborative filtering (CF) algorithm. We then develop a new GNN architecture named LightSAGE to produce high-quality item embeddings for vector search. Finally, we develop multiple strategies to handle cold-start and long-tail items, which are critical in an advertisement (ads) system. Our models bring improvements in offline evaluations and online A/B tests, and are deployed to the main traffic of Shopee’s Recommendation Advertisement system.
- IND Loss Harmonizing for Multi-Scenario CTR Prediction
by Congcong Liu (JD.com), Liang Shi (JD.com), Pei Wang (JD.com), Fei Teng (JD.com), Xue Jiang (JD.com), Changping Peng (JD.com), Zhangang Lin (JD.com) and Jingping Shao (JD.com). Large-scale industrial systems often include multiple scenarios to satisfy diverse user needs. The common approach of using one model per scenario does not scale well and is not suitable for minor scenarios with limited samples. A solution is to train a model on all scenarios, but this can introduce domination and bias from the main scenario. MMoE-like structures have been proposed for multi-scenario prediction, but they do not explicitly address the issue of gradient unbalancing. This work proposes an adaptive loss harmonizing (ALH) algorithm for multi-scenario CTR prediction. It balances training by dynamically adjusting the learning speed, resulting in improved prediction performance. Experiments on a real production dataset and a rigorous A/B test demonstrate the superiority of our method.
- IND MCM: A Multi-task Pre-trained Customer Model for Personalization
by Rui Luo (Amazon), Tianxin Wang (Amazon), Jingyuan Deng (Amazon) and Peng Wan (Amazon). Personalization plays a critical role in helping customers discover the products and contents they prefer for e-commerce stores. Personalized recommendations differ in contents, target customers, and UI. However, they require a common core capability – the ability to deeply understand customers’ preferences and shopping intents. In this paper, we introduce the MLCM (Multi-task Large pre-trained Customer Model), a large pre-trained BERT-based multi-task customer model with 10 million trainable parameters for e-commerce stores. This model aims to empower all personalization projects by providing commonly used preference scores for recommendations, customer embeddings for transfer learning, and a pre-trained model for fine-tuning. In this work, we improve the SOTA BERT4Rec framework to handle heterogeneous customer signals and multi-task training, and introduce a new data augmentation method suitable for recommendation tasks. Experimental results show that MLCM outperforms the original BERT4Rec by 17% on preference prediction tasks. Additionally, we demonstrate that the model can be easily fine-tuned to assist a specific recommendation task. For instance, after fine-tuning MLCM for an incentive-based recommendation project, performance improves by 60% on the conversion prediction task and 25% on the click-through prediction task compared to the production baseline model.
- IND Navigating the Feedback Loop in Recommender Systems: Insights and Strategies from Industry Practice
by Ding Tong (Netflix), Qifeng Qiao (Netflix), Ting-Po Lee (Netflix), James McInerney (Netflix) and Justin Basilico (Netflix).Understanding and measuring the impact of feedback loops in industrial recommender systems is challenging, leading to the underestimation of the deterioration. In this study, we define open and closed feedback loops and investigate the unique reasons behind the emergence of feedback loops in the industry, drawing from real-world examples that have received limited attention in prior research. We highlight the measurement challenges associated with capturing the full impact of feedback loops using traditional online A/B tests. To address this, we propose the use of offline evaluation frameworks as surrogates for long-term feedback loop bias, supported by a practical simulation system using real data. Our findings provide valuable insights for optimizing the performance of recommender systems operating under feedback loop conditions.
- IND Nonlinear Bandits Exploration for Recommendations
by Yi Su (Google) and Minmin Chen (Google). The paradigm of framing recommendations as (sequential) decision-making processes has gained significant interest. To achieve long-term user satisfaction, these interactive systems need to strike a balance between exploitation (recommending high-reward items) and exploration (exploring uncertain regions for potentially better items). Classical bandit algorithms like Upper-Confidence-Bound and Thompson Sampling, and their contextual extensions with linear payoffs, have exhibited strong theoretical guarantees and empirical success in managing the exploration-exploitation trade-off. Building efficient exploration-based systems for deep neural network powered real-world, large-scale industrial recommender systems remains understudied. In addition, these systems are often multi-stage, multi-objective and response time sensitive. In this talk, we share our experience in addressing these challenges in building exploration-based industrial recommender systems. Specifically, we adopt the Neural Linear Bandit algorithm, which effectively combines the representation power of deep neural networks with the simplicity of linear bandits to incorporate exploration in DNN-based recommender systems. We introduce exploration capability to both the nomination and ranking stages of the industrial recommender system. In the context of the ranking stage, we delve into the extension of this algorithm to accommodate the multi-task setup, enabling exploration in systems with multiple objectives. Moving on to the nomination stage, we will address the development of efficient bandit algorithms tailored to factorized bi-linear models. These algorithms play a crucial role in facilitating maximum inner product search, which is commonly employed in large-scale retrieval systems. We validate our algorithms and present findings from real-world live experiments.
- IND Optimizing Podcast Discovery: Unveiling Amazon Music’s Retrieval and Ranking Framework
by Geetha Aluri (Amazon), Paul Greyson (Amazon) and Joaquin Delgado (Amazon).This work presents the search architecture of Amazon Music, which is a highly efficient system designed to retrieve relevant content for users. The architecture consists of three key stages: indexing, retrieval, and ranking. During the indexing stage, data is meticulously parsed and processed to create a comprehensive index that contains dense representations and essential information about each document (such as a music or podcast entity) in the collection, including its title, metadata, and relevant attributes. This indexing process enables fast and efficient data access during retrieval. The retrieval stage utilizes multi-faceted retrieval strategies, resulting in improved identification of candidate matches compared to traditional structured search methods. Subsequently, candidates are ranked based on their relevance to the customer’s query, taking into account document features and personalized factors. With a specific focus on the podcast use case, this paper highlights the deployment of the architecture and demonstrates its effectiveness in enhancing podcast search capabilities, providing tailored and engaging content experiences.
- IND Personalised Recommendations for the BBC iPlayer: Initial approach and current challenges
by Benjamin R. Clark (British Broadcasting Corporation), Kristine Grivcova (British Broadcasting Corporation), Polina Proutskova (British Broadcasting Corporation) and Duncan M. Walker (British Broadcasting Corporation). BBC iPlayer is one of the most important digital products of the BBC, offering live and on-demand television for audiences in the UK with over 10 million weekly active users. The BBC’s role as a public service broadcaster, broadcasting over traditional linear channels as well as online, presents a number of challenges for a recommender system. In addition to having substantially different objectives to a commercial service, we show that the diverse content offered by the BBC, including news and sport, factual, drama and live events, leads to a catalogue with a diversity of consumption patterns, depending on genre. Our research shows that simple models represent strong baselines in this system. We discuss our initial attempts to improve upon these baselines, and conclude with our current challenges.
- IND RecQR: Using Recommendation Systems for Query Reformulation to correct unseen errors in spoken dialog systems
by Manik Bhandari (Amazon.com), Mingxian Wang (Amazon), Oleg Poliannikov (Amazon) and Kanna Shimizu (Amazon). As spoken dialog systems like Siri, Alexa and Google Assistant become widespread, it becomes apparent that relying solely on global, one-size-fits-all models of Automatic Speech Recognition (ASR), Natural Language Understanding (NLU) and Entity Resolution (ER) is inadequate for delivering a frictionless customer experience. To address this issue, Query Reformulation (QR) has emerged as a crucial technique for personalizing these systems and reducing customer friction. However, existing QR models, trained on historical personal rephrases, face a critical drawback – they are unable to reformulate unseen queries to unseen targets. To alleviate this, we present RecQR, a novel system based on collaborative filters, designed to reformulate unseen defective requests to target requests that a customer may never have requested in the past. RecQR anticipates a customer’s future requests and rewrites them using state-of-the-art, large-scale collaborative filtering and query reformulation models. Based on experiments, we find that it reduces errors by nearly 40% (relative) on the reformulated utterances.
- IND Reward innovation for long-term member satisfaction
by Gary Tang (Netflix), Jiangwei Pan (Netflix), Henry Wang (Netflix) and Justin Basilico (Netflix). Many large-scale recommender systems train on engagements because of their data abundance, immediacy of feedback, and correlation to user preferences. At Netflix and many digital products, engagement is an imperfect proxy to the overall goal of long-term user satisfaction. One way we address this misalignment is via reward innovation. In this paper, we provide a high-level description of the problem and motivate our approach. Finally, we present some practical insights into this track of work including challenges, lessons learned, and systems we’ve built to support the effort.
- IND Scaling Session-Based Transformer Recommendations using Optimized Negative Sampling and Loss Functions
by Timo Wilm (OTTO (GmbH & Co KG)), Philipp Normann (OTTO (GmbH & Co KG)), Sophie Baumeister (OTTO (GmbH & Co KG)) and Paul-Vincent Kobow (OTTO (GmbH & Co KG)). This work introduces TRON, a scalable session-based Transformer Recommender using Optimized Negative-sampling. Motivated by the scalability and performance limitations of prevailing models such as SASRec and GRU4Rec+, TRON integrates top-k negative sampling and listwise loss functions to enhance its recommendation accuracy. Evaluations on relevant large-scale e-commerce datasets show that TRON improves upon the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting the potential of TRON in practical settings. For further research, we provide access to our source code and an anonymized dataset.
- IND Station and Track Attribute-Aware Music Personalization
by M. Jeffrey Mei (SiriusXM Radio Inc.), Oliver Bembom (SiriusXM Radio Inc.) and Andreas Ehmann (SiriusXM Radio Inc.). We present a transformer for music personalization that recommends tracks given a station seed (artist) and improves the accuracy vs. a baseline matrix factorization method by 10%. Adding more embeddings to capture track and station attributes further improves the accuracy of our recommendations, and also improves recommendation diversity, i.e. mitigates popularity bias. We analyze the learned embeddings and find they learn both explicit attributes provided at training and implicit attributes that may inform listener preferences. We also find that unlike matrix factorization, our model can identify and transfer relevant listener preferences across different genres and artists.
- IND Towards Companion Recommenders Assisting Users’ Long-Term Journeys
by Konstantina Christakopoulou (Google) and Minmin Chen (Google). Nowadays, with the abundance of internet content, users expect recommendation platforms not only to help them with one-off decisions and short-term tasks, but also to support their persistent and overarching interest journeys, including their real-life goals that last days, months or even years. In order for recommender systems to truly assist users through their real-life journeys, they need to first be able to understand and reason about the interests, needs, and goals users want to pursue, and then plan taking those into account. However, the task presents several challenges. In this talk, we will present the key steps and elements needed to tackle the problem, particularly (1) user research for interest journeys; (2) personalized and interpretable user profiles; (3) adapting large language models, and other foundational models, for better user understanding; (4) better planning at a macro-level through reinforcement learning and reason-and-act conversational agents; (5) novel journey-powered front-end user experiences, allowing for more user control. We hope that the talk will help inspire other researchers, and will pave the way towards companion recommenders that can truly assist users throughout their interest journeys.
- IND Track Mix Generation on Music Streaming Services using Transformers
by Walid Bendada (Deezer Research), Théo Bontempelli (Deezer Research), Mathieu Morlon (Deezer Research), Benjamin Chapus (Deezer Research), Thibault Cador (Deezer Research), Thomas Bouabça (Deezer Research) and Guillaume Salha-Galvan (Deezer Research). This paper introduces Track Mix, a personalized playlist generation system released in 2022 on the music streaming service Deezer. Track Mix automatically generates “mix” playlists inspired by initial music tracks, allowing users to discover music similar to their favorite content. To generate these mixes, we consider a Transformer model trained on millions of track sequences from user playlists. In light of the growing popularity of Transformers in recent years, we analyze the advantages, drawbacks, and technical challenges of using such a model for mix generation on the service, compared to a more traditional collaborative filtering approach. Since its release, Track Mix has been generating playlists for millions of users daily, enhancing their music discovery experience on Deezer.
- IND Transparently Serving the Public: Enhancing Public Service Media Values through Exploration
by Andreas Grün (ZDF) and Xenija Neufeld (Accso – Accelerated Solutions GmbH). In the last few years, we have repeatedly underlined the importance of the Public Service Media Remit for ZDF as a Public Service Media (PSM) provider. Offering fair, diverse, and useful recommendations to users is just as important for us as being transparent about our understanding of these values, the metrics that we are using to evaluate their extent, and the algorithms in our system that produce such recommendations. This year, we have made a major step towards transparency by describing our algorithms and metrics for a broader audience, offering the audience the possibility to learn details about our systems and to provide direct feedback to us. Having the possibility to measure and track PSM metrics, we have started to improve our algorithms towards PSM values. In this work, we describe these steps and the results of actively debiasing and adding exploration into our recommendations to achieve more fairness.
- IND Unleash the Power of Context: Enhancing Large-Scale Recommender Systems with Context-Based Prediction Models
by Jan Hartman (Outbrain), Assaf Klein (Outbrain), Davorin Kopič (Outbrain) and Natalia Silberstein (Outbrain). In this work, we introduce the notion of Context-Based Prediction Models. A Context-Based Prediction Model determines the probability of a user’s action (such as a click or a conversion) solely by relying on user and contextual features, without considering any specific features of the item itself. We have identified numerous valuable applications for this modeling approach, including training an auxiliary context-based model to estimate click probability and incorporating its prediction as a feature in CTR prediction models. Our experiments indicate that this enhancement brings significant improvements in offline and online business metrics while having minimal impact on the cost of serving. Overall, our work offers a simple and scalable, yet powerful approach for enhancing the performance of large-scale commercial recommender systems, with broad implications for the field of personalized recommendations.
- IND Visual Representation for Capturing Creator Theme in Brand-Creator Marketplace
by Asnat Greenstein-Messica (Lightricks), Keren Gaiger (Lightricks), Sarel Duanis (Lightricks), Ravid Cohen (Lightricks) and Shaked Zychlinski (Lightricks). Providing cold start recommendations in a brand-creator marketplace is challenging, as brands’ preferences extend beyond the mere objects depicted in the creator’s content and encompass the creator’s individual theme that consistently resonates across images shared on her social media profile. Furthermore, brands often use textual keywords to describe their campaign’s aesthetic appeal, with which creators must align. To address these challenges, we propose two methods: SAME (Same Account Media Embedding), a novel creator representation employing a Siamese network to capture the unique creator theme, and OAAR (Object-Agnostic Adjective Representation), which enables filtering creators based on textual adjectives that relate to aesthetic qualities through zero-shot learning. These two methods utilize CLIP, a state-of-the-art language-image model, and improve it in addressing the aforementioned challenges.