Posters Day 2

Date: Thursday, September 21
Room: Hall 405

  • [RES] Incorporating Time in Sequential Recommendation Models
    by Mostafa Rahmani (Amazon), James Caverlee (Amazon) and Fei Wang (Amazon).

    Sequential models are designed to learn sequential patterns in data based on the chronological order of user interactions. However, they often ignore the timestamps of these interactions. Incorporating time is crucial because many sequential patterns are time-dependent, and the model cannot make time-aware recommendations without considering time. This article demonstrates that providing a rich representation of time can significantly improve the performance of sequential models. The existing literature treats time as a one-dimensional time series obtained by quantizing time. In this study, we propose treating time as a multi-dimensional time series and explore representation learning methods, including a kernel-based method and an embedding-based algorithm. Experiments on multiple datasets show that the inclusion of time significantly enhances the model’s performance, and that the multi-dimensional methods outperform the one-dimensional method by a substantial margin. A minimal sketch of the embedding-based idea follows this entry.

    Full text in ACM Digital Library
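
    A minimal sketch of the multi-dimensional, embedding-based time representation described above, assuming a PyTorch, SASRec-style pipeline; the calendar dimensions, bucket sizes, and module names are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): decompose each interaction
# timestamp into several calendar dimensions and embed each dimension
# separately, giving the sequential model a multi-dimensional view of time.
from datetime import datetime, timezone

import torch
import torch.nn as nn


def time_features(ts: int) -> list:
    """Map a Unix timestamp to discrete calendar dimensions."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return [dt.hour, dt.weekday(), dt.day - 1, dt.month - 1]  # 24 / 7 / 31 / 12 buckets


class MultiDimTimeEmbedding(nn.Module):
    """One embedding table per time dimension; outputs their concatenation."""

    def __init__(self, dim_sizes=(24, 7, 31, 12), d=16):
        super().__init__()
        self.tables = nn.ModuleList([nn.Embedding(n, d) for n in dim_sizes])

    def forward(self, time_ids):
        # time_ids: (batch, seq_len, num_dims) integer tensor
        parts = [table(time_ids[..., i]) for i, table in enumerate(self.tables)]
        return torch.cat(parts, dim=-1)  # (batch, seq_len, num_dims * d)
```

    In a SASRec-style model, the resulting time vector would typically be added to or concatenated with the item embedding before the self-attention layers.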

  • [RES] Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation
    by Vivian Lai (Visa Research), Huiyuan Chen (Visa Research), Chin-Chia Michael Yeh (Visa Research), Minghua Xu (Visa Research), Yiwei Cai (Visa Research) and Hao Yang (Visa Research).

    Transformers have become the favored model for sequential recommendation. However, previous studies rely on extensive data, such as massive pre-training or repeated data augmentation, leading to optimization-related problems such as initialization sensitivity and large batch-size memory bottlenecks. In this work, we examine Transformers’ loss geometry to improve the models’ data efficiency during training and their generalization. By utilizing a newly introduced sharpness-aware optimizer to promote smoothness, we significantly enhance the accuracy and robustness of SASRec, a Transformer model, on various datasets. When trained on sequential data without significant pre-training or data augmentation, the resulting SASRec outperforms S3Rec and CL4Rec, both of which are of comparable size and throughput. A generic sharpness-aware training step is sketched after this entry.

    Full text in ACM Digital Library
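
    The paper's key ingredient is a sharpness-aware optimizer. Below is a generic sharpness-aware minimization (SAM-style) training step for a PyTorch model such as SASRec; the two-pass structure and the rho radius follow the standard SAM recipe and are not the authors' exact implementation.

```python
# Generic sharpness-aware (SAM-style) update: one pass to find an adversarial
# weight perturbation within an L2 ball of radius rho, a second pass to compute
# the gradient at the perturbed weights before stepping the base optimizer.
import torch


def sharpness_aware_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    # First pass: gradients at the current weights.
    loss = loss_fn(model, batch)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12

    # Climb to the (approximate) worst point inside the rho-ball.
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)
    model.zero_grad()

    # Second pass: gradients at the perturbed weights, then undo and update.
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```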

  • [RES] Adaptive Collaborative Filtering with Personalized Time Decay Functions for Financial Product Recommendation
    by Ashraf Ghiye (École Polytechnique), Baptiste Barreau (BNP Paribas CIB – Global Markets), Laurent Carlier (BNP Paribas CIB – Global Markets) and Michalis Vazirgiannis (École Polytechnique).

    Classical recommender systems often assume that historical data are stationary and fail to account for the dynamic nature of user preferences, limiting their ability to provide reliable recommendations in time-sensitive settings. This assumption is particularly problematic in finance, where financial products exhibit continuous changes in valuations, leading to frequent shifts in client interests. These evolving interests, summarized in past client-product interactions, see their utility fade over time at a rate that may differ from one client to another. To address this challenge, we propose a time-dependent collaborative filtering algorithm that can adaptively discount distant client-product interactions using personalized decay functions. Our approach is designed to handle the non-stationarity of financial data and produce reliable recommendations by modeling the dynamic collaborative signals between clients and products. We evaluate our method using a proprietary dataset from BNP Paribas and demonstrate significant improvements over state-of-the-art benchmarks from the relevant literature. Our findings emphasize the importance of incorporating time explicitly in the model to enhance the accuracy of financial product recommendation. A toy illustration of the decay mechanism follows this entry.

    Full text in ACM Digital Library
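
    A toy illustration of the core mechanism above: down-weight older client-product interactions with a per-client decay before scoring. The exponential form and the per-client half-life are stand-ins for the learned, personalized decay functions in the paper.

```python
# Toy illustration: exponentially decay the weight of past client-product
# interactions with a client-specific half-life, yielding a decayed profile
# that a collaborative filtering scorer could consume.
import math
from collections import defaultdict


def decayed_profile(interactions, now, half_life_days):
    """interactions: list of (product_id, time_in_days); returns product -> weight."""
    lam = math.log(2) / half_life_days
    profile = defaultdict(float)
    for product, ts in interactions:
        profile[product] += math.exp(-lam * (now - ts))
    return dict(profile)


history = [("bond_A", 0.0), ("bond_A", 20.0), ("swap_X", 55.0)]
print(decayed_profile(history, now=60.0, half_life_days=7.0))   # fast-fading interests
print(decayed_profile(history, now=60.0, half_life_days=90.0))  # slow-fading interests
```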

  • [RES] Private Matrix Factorization with Public Item Features
    by Mihaela Curmei (University of California, Berkeley), Walid Krichene (Google Research) and Li Zhang (Google Research).

    We consider the problem of training private recommendation models with access to public item features. Training with Differential Privacy (DP) offers strong privacy guarantees, at the expense of a loss in recommendation quality. We show that incorporating public item features during training can help mitigate this loss in quality. We propose a general approach based on collective matrix factorization that works by simultaneously factorizing two matrices: the user feedback matrix (representing sensitive data) and an item feature matrix that encodes publicly available (non-sensitive) item information.

    The method is conceptually simple, easy to tune, and highly scalable. It can be applied to different types of public data, including: (1) categorical item features; (2) item-item similarities learned from public sources; and (3) publicly available user feedback.

    Evaluating our method on a standard DP recommendation benchmark, we find that using public item features significantly narrows the quality gap between the private models and their non-private counterparts. As privacy constraints become more stringent, the increased reliance on public side features leads to recommendations becoming more depersonalized, resulting in a smooth transition from collaborative filtering to item-based contextual recommendations. A sketch of the (non-private) joint factorization objective follows this entry.

    Full text in ACM Digital Library
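
    A compact sketch of the collective factorization objective described above, without the differential-privacy mechanism applied during training; the variable names, squared-error form, and weighting are assumptions.

```python
# Non-private sketch of a collective matrix factorization objective: the
# sensitive feedback matrix R and the public item-feature matrix F are
# factorized jointly and share the item factors V. The DP mechanism used
# during training is omitted here.
import numpy as np


def collective_mf_loss(R, F, U, V, W, alpha=1.0, reg=0.1):
    """
    R: (n_users, n_items) feedback with np.nan for unobserved entries.
    F: (n_items, n_features) public item features.
    U, V, W: user, item, and feature factors; V is shared by both terms.
    """
    observed = ~np.isnan(R)
    diff = np.where(observed, R - U @ V.T, 0.0)   # only observed feedback contributes
    feedback_err = np.sum(diff ** 2)
    feature_err = np.sum((F - V @ W.T) ** 2)
    penalty = reg * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(W ** 2))
    return feedback_err + alpha * feature_err + penalty
```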

  • [RES] Deliberative Diversity for News Recommendations: Operationalization and Experimental User Study
    by Lucien Heitz (University of Zurich), Juliane A. Lischka (University of Hamburg), Rana Abdullah (University of Hamburg), Laura Laugwitz (University of Hamburg), Hendrik Meyer (University of Hamburg) and Abraham Bernstein (University of Zurich).

    News recommender systems are an increasingly popular field of study that attracts a growing, interdisciplinary research community. As these systems play an important role in our daily lives, the mechanisms behind their curation processes are under close scrutiny. In the domain of personalized news, many platforms make design choices that are driven by economic incentives. In contrast to such systems that optimize for financial gains, there exist norm-driven diversity objectives that put normative and democratic goals first. Their impact on users, however, in terms of triggering behavioral changes or affecting knowledgeability, is still under-researched. In this paper, we contribute to the field of news recommender system design by conducting a user study that looks at the impact of these normative approaches. We (a) operationalize the notion of deliberative democracy for news recommendations, (b) show the impact on political knowledgeability, and (c) examine the influence on voting behavior. We found that exposure to small parties is associated with an increase in knowledge about their candidates and that intensive news consumption about a party can change the direction of attitudes towards its issues.

    Full text in ACM Digital Library

  • [RES] Co-occurrence Embedding Enhancement for Long-tail Problem in Multi-Interest Recommendation
    by Yaokun Liu (Tianjin University), Xiaowang Zhang (Tianjin University), Minghui Zou (Tianjin University) and Zhiyong Feng (Tianjin University).

    Multi-interest recommendation methods extract multiple interest vectors to represent the user comprehensively. Despite their success in the matching stage, previous works overlook the long-tail problem. This results in the model excelling at suggesting head items, while the performance for tail items, which make up more than 70% of all items, remains suboptimal. Hence, enhancing the tail item recommendation capability holds great potential for improving the performance of the multi-interest model.

    Through experimental analysis, we reveal that insufficient context for embedding learning is the reason behind the underperformance of tail items. Meanwhile, we face two challenges in addressing this issue: the absence of supplementary item features and the need to maintain head item performance. To tackle these challenges, we propose a CoLT module (Co-occurrence embedding enhancement for Long-Tail problem) that replaces the embedding layer of existing multi-interest frameworks. By linking co-occurring items to establish “assistance relationships”, CoLT aggregates information from relevant head items into tail item embeddings and enables joint gradient updates. Experiments on three datasets show our method outperforms SOTA models by 21.86% in Recall@50 and improves the Recall@50 of tail items by 14.62% on average. A schematic version of the co-occurrence aggregation follows this entry.

    Full text in ACM Digital Library
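
    A schematic version of the idea described above, assuming a PyTorch embedding layer: a tail item's embedding is enriched with information aggregated from items it co-occurs with, so gradients also flow to those neighbors. The mean aggregation, the 50/50 mix, and the single linking hop are simplifying assumptions, not the CoLT module itself.

```python
# Sketch: enrich each item embedding with the mean embedding of its
# co-occurring items (typically head items for a tail item), enabling joint
# gradient updates through the shared table.
import torch
import torch.nn as nn


class CoOccurrenceEmbedding(nn.Module):
    def __init__(self, n_items, d, cooccur):
        """cooccur: dict mapping item_id -> list of co-occurring item_ids."""
        super().__init__()
        self.emb = nn.Embedding(n_items, d)
        self.cooccur = cooccur

    def forward(self, item_ids):
        # item_ids: 1-D LongTensor of items to embed.
        base = self.emb(item_ids)
        out = []
        for row, item in zip(base, item_ids.tolist()):
            neighbors = self.cooccur.get(item, [])
            if neighbors:
                neigh = self.emb(torch.tensor(neighbors, device=row.device))
                out.append(0.5 * row + 0.5 * neigh.mean(dim=0))
            else:
                out.append(row)
        return torch.stack(out)
```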

  • [RES] Extended conversion: Capturing successful interactions in voice shopping
    by Elad Haramaty (Amazon), Zohar Karnin (Amazon), Arnon Lazerson (Amazon), Liane Lewin-Eytan (Amazon Research) and Yoelle Maarek (Amazon).

    Being able to measure the success of online shopping interactions is crucial in order to evaluate and optimize the performance of e-commerce systems. We consider the domain of voice shopping, supported by digital voice-based assistants, where measuring successful interactions poses a challenge. Unlike Web shopping, which offers a rich amount of behavioral signals such as clicks, in voice shopping a non-negligible share of shopping interactions ends without any immediate explicit or implicit user behavioral signal. Moreover, users may start their journey using voice but finish it elsewhere, for example using their mobile app or the Web. We explore the challenge of measuring successful interactions in voice product search based on users’ feedback, and propose a medium-term reward metric named Extended ConVersion (ECVR). ECVR extends the notion of conversion (purchase action), which is a clear and natural indication of success for an e-commerce system. The strength of this new metric is that it captures not only immediate conversion, but also conversion that is part of the same user shopping journey yet performed at a later stage, possibly using a different medium. We provide multiple ways of evaluating the quality of a metric, and use these to explore different parameters leading to different variants of ECVR. After finalizing these parameters, we show that a ranking system optimized for the proposed ECVR leads to an improvement in long-term engagement and revenue, without compromising immediate gains. A sketch of an extended-conversion label follows this entry.

    Full text in ACM Digital Library
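
    A sketch of how an extended-conversion style label might be computed from logs. The 7-day follow-up window and the "same product line" journey-matching rule are illustrative assumptions, not the paper's finalized parameters.

```python
# Sketch: a voice shopping interaction counts as an extended conversion if a
# purchase belonging to the same shopping journey happens within a follow-up
# window, on any medium (voice, app, or Web).
from datetime import timedelta


def extended_conversion(interaction, purchases, window=timedelta(days=7)):
    """interaction / purchases: dicts with 'ts' (datetime) and 'product_line'."""
    return any(
        p["product_line"] == interaction["product_line"]
        and interaction["ts"] <= p["ts"] <= interaction["ts"] + window
        for p in purchases
    )
```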

  • [RES] On the Consistency of Average Embeddings for Item Recommendation
    by Walid Bendada (Deezer Research & LAMSADE, Université Paris Dauphine – PSL), Guillaume Salha-Galvan (Deezer Research), Romain Hennequin (Deezer Research), Thomas Bouabça (Deezer Research) and Tristan Cazenave (LAMSADE Université Paris Dauphine PSL – CNRS).

    A prevalent practice in recommender systems consists of averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation than our theoretical setting would suggest, which paves the way for future research to better align real-world embeddings with the assumptions of that setting. A simple empirical proxy for this consistency check follows this entry.

    Full text in ACM Digital Library
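
    A simple empirical proxy for the consistency question studied above: average a set of item embeddings and check how many of its nearest neighbours are the constituent items. This is an illustrative stand-in, not the paper's formal expected precision score.

```python
# Empirical proxy: average a set of item vectors and measure which fraction of
# the constituent items appear among the nearest neighbours of that average.
import numpy as np


def average_embedding_precision(catalog, member_ids, k=None):
    """catalog: (n_items, d) row-normalized item embeddings; member_ids: list of ints."""
    members = list(member_ids)
    k = k or len(members)
    avg = catalog[members].mean(axis=0)
    scores = catalog @ avg                    # cosine similarity for unit-norm rows
    top_k = set(np.argsort(-scores)[:k].tolist())
    return len(top_k & set(members)) / len(members)
```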

  • [RES] Integrating the ACT-R Framework with Collaborative Filtering for Explainable Sequential Music Recommendation
    by Marta Moscati (Johannes Kepler University Linz), Christian Wallmann (Johannes Kepler University Linz), Markus Reiter-Haas (Graz University of Technology), Dominik Kowald (Know-Center GmbH and Graz University of Technology), Elisabeth Lex (Graz University of Technology) and Markus Schedl (Johannes Kepler University Linz).

    Music listening sessions often consist of sequences that include repeated tracks. Modeling such relistening behavior with models of human memory has proven effective in predicting the next track of a session. However, these models intrinsically lack the capability of recommending novel tracks that the target user has not listened to in the past. Collaborative filtering strategies, by contrast, provide novel recommendations by leveraging past collective behaviors, but are often limited in their ability to provide explanations. To narrow this gap, we propose four hybrid algorithms that integrate collaborative filtering with the cognitive architecture ACT-R. We compare their performance in terms of accuracy, novelty, diversity, and popularity bias to baselines of different types, including pure ACT-R, kNN-based, and neural-network-based approaches. We show that the proposed algorithms achieve the best performance in terms of novelty and diversity, and simultaneously reach higher recommendation accuracy than pure ACT-R models. Furthermore, we illustrate how the proposed models can provide explainable recommendations. A sketch of a memory/CF hybrid score follows this entry.

    Full text in ACM Digital Library
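
    One simple way to combine a memory model with collaborative filtering, in the spirit of the hybrids above: mix an ACT-R-style base-level activation (favouring recently and frequently relistened tracks) with a CF score that can surface novel tracks. The decay value and the linear mix are illustrative choices, not one of the paper's four algorithms.

```python
# Sketch of a memory/CF hybrid: ACT-R base-level activation captures recency
# and frequency of past listens; the CF score supplies novel candidates.
import math


def base_level_activation(listen_times, now, decay=0.5):
    """listen_times: past listening timestamps (in days) of one track for one user."""
    return math.log(sum((now - t) ** (-decay) for t in listen_times if now > t) + 1e-12)


def hybrid_score(listen_times, cf_score, now, weight=0.5):
    return weight * base_level_activation(listen_times, now) + (1 - weight) * cf_score
```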

  • [RES] Widespread flaws in offline evaluation of recommender systems
    by Balázs Hidasi (Gravity R&D, a Taboola company) and Ádám Tibor Czapp (Gravity R&D, a Taboola company).

    Even though offline evaluation is just an imperfect proxy of online performance — due to the interactive nature of recommenders — it will probably remain the primary way of evaluation in recommender systems research for the foreseeable future, since the proprietary nature of production recommenders prevents independent validation of A/B test setups and verification of online results. Therefore, it is imperative that offline evaluation setups are as realistic and as flawless as they can be. Unfortunately, evaluation flaws are quite common in recommender systems research nowadays, due to later works copying flawed evaluation setups from their predecessors without questioning their validity. In the hope of improving the quality of offline evaluation of recommender systems, we discuss four of these widespread flaws and why researchers should avoid them.

    Full text in ACM Digital Library

  • [RES] Towards Sustainability-aware Recommender Systems: Analyzing the Trade-off Between Algorithms Performance and Carbon Footprint
    by Giuseppe Spillo (University of Bari), Allegra De Filippo (University of Bologna), Cataldo Musto (Dipartimento di Informatica – University of Bari), Michela Milano (University of Bologna) and Giovanni Semeraro (University of Bari).

    In this paper, we present a comparative analysis of the trade-off between the performance of state-of-the-art recommendation algorithms and their sustainability. In particular, we compared 18 popular recommendation algorithms on three different datasets, both in terms of standard metrics (i.e., accuracy and diversity of the recommendations) and in terms of energy consumption and carbon footprint. In order to obtain a fair comparison, all the algorithms were run based on the implementations available in a popular recommendation library, i.e., RecBole, and used the same experimental settings. The outcomes of the experiments clearly showed that the choice of the optimal recommendation algorithm requires a thorough analysis, since more sophisticated algorithms often led to tiny improvements at the cost of an exponential increase in carbon emissions. Through this paper, we aim to shed light on the problem of carbon footprint and energy consumption of recommender systems, and we take a first step towards the development of sustainability-aware recommendation algorithms.

    Full text in ACM Digital Library

  • [RES] Group Fairness for Content Creators: the Role of Human and Algorithmic Biases under Popularity-based Recommendations
    by Stefania Ionescu (University of Zurich), Aniko Hannak (University of Zurich) and Nicolo Pagan (University of Zurich).

    The Creator Economy faces concerning levels of unfairness. Content creators (CCs) publicly accuse platforms of purposefully reducing the visibility of their content based on protected attributes, while platforms place the blame on viewer biases. Meanwhile, prior work warns about the “rich-get-richer” effect perpetuated by existing popularity biases in recommender systems: any initial advantage in visibility will likely be exacerbated over time. What remains unclear is how the biases based on protected attributes from platforms and viewers interact and contribute to the observed inequality in the context of popularity-biased recommender systems. The difficulty of the question lies in the complexity and opacity of the system. To overcome this challenge, we create a simple agent-based model (ABM) that unifies the platform systems which allocate the visibility of CCs (e.g., recommender systems, moderation) into a single popularity-based function, which we call the visibility allocation system (VAS). Through simulations, we find that although viewer homophilic biases alone do create inequalities, small levels of additional bias in the VAS are more harmful. From the perspective of interventions, our results suggest that (a) attempts to reduce attribute biases in moderation and recommendations should precede those reducing viewer homophilic tendencies, (b) decreasing the popularity biases in the VAS decreases but does not eliminate inequalities, (c) boosting the visibility of protected CCs to overcome viewer homophily with respect to one metric is unlikely to produce fair outcomes with respect to all metrics, and (d) the process is also unfair for viewers, and this unfairness could be overcome through the same interventions. More generally, this work demonstrates the potential of using ABMs to better understand the causes and effects of biases and interventions within complex sociotechnical systems.

    Full text in ACM Digital Library

  • [RES] Providing Previously Unseen Users Fair Recommendations Using Variational Autoencoders
    by Bjørnar Vassøy (Norwegian University of Science and Technology (NTNU)), Helge Langseth (Norwegian University of Science and Technology (NTNU)) and Benjamin Kille (Norwegian University of Science and Technology (NTNU)).

    An emerging definition of fairness in machine learning requires that models are oblivious to demographic user information, e.g., a user’s gender or age should not influence the model. Personalized recommender systems are particularly prone to violating this definition through their explicit user focus and user modelling. Explicit user modelling is also an aspect that makes many recommender systems incapable of providing hitherto unseen users with recommendations. We propose novel approaches for mitigating discrimination in Variational Autoencoder-based recommender systems by limiting the encoding of demographic information. The approaches are capable of, and evaluated on, providing entirely new users with fair recommendations.

    Full text in ACM Digital Library

  • [RES] Scalable Deep Q-Learning for Session-Based Slate Recommendation
    by Aayush Singha Roy (Insight Centre for Data Analytics, University College Dublin), Edoardo D’Amico (Insight Centre for Data Analytics, University College Dublin), Elias Tragos (Insight Centre for Data Analytics, University College Dublin), Aonghus Lawlor (Insight Centre for Data Analytics, University College Dublin) and Neil Hurley (Insight Centre for Data Analytics, University College Dublin).

    Reinforcement learning (RL) has demonstrated great potential to improve slate-based recommender systems by optimizing recommendations for long-term user engagement. To handle the combinatorial action space in slate recommendation, recent works decompose the Q-value of a slate into item-wise Q-values, using an item-wise value-based policy. However, in the common case where the value function is a parameterized function taking state and action as input, the number of evaluations required to select an action grows linearly with the number of candidate items. While slow training may be acceptable, this becomes intractable at model serving time when the parameterized function, such as a deep neural network, is costly to evaluate. To address this issue, we propose an actor-based policy that reduces the evaluation of the Q-function to a subset of items, significantly reducing inference time and enabling practical deployment in real-world industrial settings. In our empirical evaluation, we demonstrate that our proposed approach achieves user session engagement equivalent to a value-based policy, while reducing slate serving time by a factor of at least four.

    Full text in ACM Digital Library

  • [RES] CR-SoRec: BERT driven Consistency Regularization for Social Recommendation
    by Tushar Prakash (Sony Research India), Raksha Jalan (Sony Research India), Brijraj Singh (Sony Research India) and Naoyuki Onoe (Sony).

    In the real world, when we seek our friends’ opinions on various items or events, we request verbal social recommendations. It has been observed that we often turn to our friends for recommendations on a daily basis. The emergence of online social platforms has enabled users to share their opinions with their social connections. Therefore, we should consider users’ social connections to enhance online recommendation performance. Social recommendation aims to fuse social links with user-item interactions to offer more relevant recommendations. Several efforts have been made to develop an effective social recommendation system. However, there are two significant limitations to current methods: first, they have not thoroughly explored the intricate relationships between the diverse influences of neighbours on users’ preferences; second, existing models are vulnerable to overfitting due to the relatively low number of user-item interaction records in the interaction space. To address these problems, this paper offers a novel framework called CR-SoRec, an effective recommendation model based on BERT and consistency regularization. The model incorporates Bidirectional Encoder Representations from Transformers (BERT) to learn bidirectional, context-aware user and item embeddings with neighbourhood sampling. The neighbourhood sampling technique samples the most influential neighbours for all users and items. Further, to effectively use the available user-item interaction data and social ties, we leverage diverse perspectives via consistency regularization to harness the underlying information. The main objective of our model is to predict the next item that a user would interact with based on their interaction behaviour and social connections. Experimental results show that our model defines a new state of the art on various datasets and outperforms previous work by a significant margin. Extensive experiments are conducted to analyse the method. We release the source code of our model at https://anonymous.4open.science/r/CR-SoRec-68F4

    Full text in ACM Digital Library

  • [RES] Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
    by Scott Sanner (Google), Krisztian Balog (Google), Filip Radlinski (Google), Ben Wedin (Google) and Lucas Dixon (Google).

    Traditional recommender systems leverage users’ item preference history to recommend novel content that users may like. However, dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input.  Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods.  To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.

    Full text in ACM Digital Library

  • [LBR] Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?
    by Anton Klenitskiy (Sber, AI Lab) and Alexey Vasilev (Sber, AI Lab).

    Recently, sequential recommendation and the next-item prediction task have become increasingly popular in the field of recommender systems. Currently, the two state-of-the-art baselines are the Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of these publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses full softmax cross-entropy over all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss that BERT4Rec uses, then SASRec significantly outperforms BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec can be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one. The loss change is sketched after this entry.

    Full text in ACM Digital Library
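
    The loss change studied above, in a few lines: score every catalog item at each sequence position and train SASRec with full softmax cross-entropy (BERT4Rec's loss) instead of binary cross-entropy over one sampled negative. Tensor shapes and the padding convention are assumptions.

```python
# Sketch of full softmax cross-entropy for a SASRec-style model: the item
# embedding table is reused as the output layer and every item is scored.
import torch.nn.functional as F


def full_cross_entropy_loss(seq_hidden, item_emb, targets, pad_id=0):
    """
    seq_hidden: (batch, seq_len, d) SASRec transformer outputs.
    item_emb:   (n_items, d) item embedding table reused as the output layer.
    targets:    (batch, seq_len) next-item ids, pad_id where there is no target.
    """
    logits = seq_hidden @ item_emb.T                 # (batch, seq_len, n_items)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,
    )
```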

  • [LBR] Uncovering ChatGPT’s Capabilities in Recommender Systems
    by Sunhao Dai (Renmin University of China), Ninglu Shao (Renmin University of China), Haiyuan Zhao (Renmin University of China), Weijie Yu (University of International Business and Economics), Zihua Si (Renmin University of China), Chen Xu (Renmin University of China), Zhongxiang Sun (Renmin University of China), Xiao Zhang (Renmin University of China) and Jun Xu (Renmin University of China).

    The debut of ChatGPT has recently attracted significant attention from the natural language processing (NLP) community and beyond. Existing studies have demonstrated that ChatGPT shows significant improvement on a range of downstream NLP tasks, but its capabilities and limitations in terms of recommendation remain unclear. In this study, we aim to enhance ChatGPT’s recommendation capabilities by aligning it with traditional information retrieval (IR) ranking capabilities, including point-wise, pair-wise, and list-wise ranking. To achieve this goal, we re-formulate the aforementioned three recommendation policies into prompt formats tailored specifically to the domain at hand. Through extensive experiments on four datasets from different domains, we analyze the distinctions among the three recommendation policies. Our findings indicate that ChatGPT achieves an optimal balance between cost and performance when equipped with list-wise ranking. This research sheds light on a promising direction for aligning ChatGPT with recommendation tasks. To facilitate further exploration in this area, the full code and detailed original results are open-sourced at https://anonymous.4open.science/r/LLM4RS-532C/. An illustrative list-wise prompt follows this entry.

    Full text in ACM Digital Library
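
    An illustrative list-wise re-formulation in the spirit of the paper; the exact templates in the paper differ, and the movie domain and titles are placeholders.

```python
# Illustrative list-wise ranking prompt: the model is asked to order all
# candidates in one call, rather than scoring items one at a time (point-wise)
# or comparing pairs (pair-wise).
def listwise_prompt(liked_items, candidates):
    return (
        "You are a movie recommender.\n"
        f"A user recently enjoyed: {', '.join(liked_items)}.\n"
        "Rank the following candidates from most to least suitable for this "
        "user and answer with a numbered list only:\n"
        + "\n".join(f"- {c}" for c in candidates)
    )


print(listwise_prompt(["Inception", "Interstellar"], ["Tenet", "Notting Hill", "Dunkirk"]))
```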

  • [LBR] Continual Collaborative Filtering Through Gradient Alignment
    by Hieu Do (Singapore Management University) and Hady Lauw (Singapore Management University).

    A recommender system operates in a dynamic environment where new items emerge and new users join the system, resulting in ever-growing user-item interactions over time. Existing works either assume a model trained offline on a static dataset (requiring periodic re-training with ever larger datasets), or an online learning setup that favors recency over history. As privacy-aware users could hide their histories, the loss of older information means that periodic retraining may not always be feasible, while online learning may lose sight of users’ long-term preferences. In this work, we adopt a continual learning perspective on collaborative filtering, by compartmentalizing users and items over time into a notion of tasks. Of particular concern is mitigating the catastrophic forgetting that occurs when the model’s performance degrades for older users and items from prior tasks even as it tries to fit the newer users and items of the current task. To alleviate this, we propose a method that leverages gradient alignment to deliver a model that is more compatible across tasks and maximizes user agreement for better user representations, improving long-term recommendations.

    Full text in ACM Digital Library

  • [LBR] Broadening the Scope: Evaluating the Potential of Recommender Systems beyond prioritizing Accuracy
    by Vincenzo Paparella (Politecnico di Bari), Dario Di Palma (Politecnico di Bari), Vito Walter Anelli (Politecnico di Bari) and Tommaso Di Noia (Politecnico di Bari).

    Although beyond-accuracy metrics have gained attention in the last decade, the accuracy of recommendations is still considered the gold standard for evaluating Recommender Systems (RSs). This approach prioritizes accuracy while neglecting qualities of the suggestions that serve broader user needs, such as diversity and novelty, as well as trustworthiness requirements in RSs, such as user and provider fairness. As a result, a single metric determines the success of an RS, and other criteria are not considered simultaneously. A downside of this method is that the most accurate model configuration may not excel in addressing the remaining criteria. This study seeks to broaden RS evaluation by introducing a multi-objective evaluation that considers all model configurations simultaneously under several perspectives. To achieve this, several hyper-parameter configurations of an RS model are trained, and the Pareto-optimal ones are retrieved. The Quality Indicators (QIs) of Pareto frontiers, which are gaining interest in Multi-Objective Optimization research, are adapted to RSs. QIs enable evaluating a model’s performance by considering various configurations and giving the same importance to each metric. The experiments show that this multi-objective evaluation overturns the performance ranking among RSs, paving the way to revisiting the evaluation approaches of the RecSys research community. We release code and datasets in the following GitHub repository: https://anonymous.4open.science/r/RecMOE-3ED3. A small example of one such quality indicator follows this entry.

    Full text in ACM Digital Library
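
    A small example of one Pareto-front quality indicator: the hypervolume dominated by a two-objective front (e.g., accuracy and novelty, both to be maximized) relative to a reference point. The objectives, values, and reference point are illustrative; the paper adapts several indicators.

```python
# Hypervolume of a 2-D Pareto front for two objectives to be maximized,
# measured against a reference point (here the origin).
def hypervolume_2d(front, reference=(0.0, 0.0)):
    """front: list of (obj1, obj2) points, all dominating the reference point."""
    pts = sorted(front, key=lambda p: p[0], reverse=True)  # descending first objective
    hv, prev_y = 0.0, reference[1]
    for x, y in pts:
        if y > prev_y:
            hv += (x - reference[0]) * (y - prev_y)
            prev_y = y
    return hv


# Pareto-optimal configurations of a hypothetical RS model (accuracy, novelty):
print(hypervolume_2d([(0.30, 0.50), (0.25, 0.70), (0.35, 0.20)]))  # -> 0.21
```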

  • [LBR] Analyzing Accuracy versus Diversity in a Health Recommender System for Physical Activities: a Longitudinal User Study
    by Ine Coppens (WAVES – imec – Ghent University), Luc Martens (WAVES – imec – Ghent University) and Toon De Pessemier (WAVES – imec – Ghent University).

    Although personalization has great potential to improve mobile health apps, the analysis of how different recommender algorithms affect outcomes in the health domain is still in its infancy. As such, this paper investigates whether more accurate recommendations from a content-based recommender or more diverse recommendations from a user-based collaborative filtering recommender lead to more motivation to move. An eight-week longitudinal between-subject user study is being conducted with an Android app in which participants receive personalized recommendations for physical activities and tips to reduce sedentary behavior. The objective manipulation check confirmed that the collaborative filtering group received significantly more diverse recommendations. The subjective manipulation check showed that the content-based group gave more positive feedback for perceived accuracy and star rating to the recommendations they chose and executed. However, perceived diversity and inspiringness were significantly higher in the content-based group, suggesting that users might experience the recommendations differently. Lastly, momentary motivation for the executed activities and tips was significantly higher in the content-based group. As such, the preliminary results of this longitudinal study suggest that more accurate and less diverse recommendations have a better effect on motivating users to move more.

    Full text in ACM Digital Library

  • [LBR] On the Consistency, Discriminative Power and Robustness of Sampled Metrics in Offline Top-N Recommender System Evaluation
    by Yang Liu (University of Helsinki), Alan Medlar (University of Helsinki) and Dorota Glowacka (University of Helsinki).

    Negative item sampling in offline top-n recommendation evaluation has become increasingly widespread, but remains controversial. While several studies have warned against using sampled evaluation metrics on the basis that they are a poor approximation of the full ranking (i.e., using all negative items), others have highlighted their improved discriminative power and potential to make evaluation more robust. Unfortunately, empirical studies on negative item sampling are based on relatively few methods (between 3 and 12) and, therefore, lack the statistical power to assess the impact of negative item sampling in practice.

    In this article, we present preliminary findings from a comprehensive benchmarking study of negative item sampling based on 52 recommendation algorithms and 3 benchmark data sets. We show how the number of sampled negative items and different sampling strategies affect the consistency and discriminative power of sampled evaluation metrics. Furthermore, we investigate the impact of sparsity bias and popularity bias on the robustness of these metrics. In brief, we show that the optimal parameterizations for negative item sampling depend on data set characteristics and the goals of the investigator, suggesting a need for greater transparency in related experimental design decisions. The sampled versus full-ranking contrast is sketched after this entry.

    Full text in ACM Digital Library
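
    A sketch of the evaluation choice under study: hit-rate@k computed by ranking the held-out positive against n sampled negatives versus against the full catalog. Function signatures and the uniform sampling are illustrative.

```python
# Sampled vs. full-ranking hit-rate@k for one test user: the sampled variant
# ranks the held-out positive against n random unseen negatives instead of the
# whole catalog.
import random


def hit_at_k(scores, positive, candidates, k=10):
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return int(positive in ranked[:k])


def sampled_hit_at_k(scores, positive, all_items, seen, n_neg=100, k=10):
    pool = [i for i in all_items if i != positive and i not in seen]
    negatives = random.sample(pool, n_neg)
    return hit_at_k(scores, positive, negatives + [positive], k)


def full_hit_at_k(scores, positive, all_items, seen, k=10):
    candidates = [i for i in all_items if i == positive or i not in seen]
    return hit_at_k(scores, positive, candidates, k)
```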

  • [DEM] LLM Based Generation of Item-Description for Recommendation System
    by Arkadeep Acharya (Sony Research India), Brijraj Singh (Sony Research India) and Naoyuki Onoe (Sony Research India).

    The description of an item plays a pivotal role in providing concise and informative summaries to captivate potential viewers and is essential for recommendation systems. Traditionally, such descriptions were obtained through manual web scraping techniques, which are time-consuming and susceptible to data inconsistencies. In recent years, Large Language Models (LLMs), such as GPT-3.5, and open-source LLMs like Alpaca have emerged as powerful tools for natural language processing tasks. In this paper, we explore how LLMs can be used to generate detailed item descriptions. For the study, we used the MovieLens 1M dataset, comprising movie titles, and the Goodreads dataset, consisting of book names. An open-source LLM, Alpaca, was then prompted with few-shot prompting on these datasets to generate detailed descriptions that take multiple features into account: the names of the cast and directors for the MovieLens dataset, and the names of the author and publisher for the Goodreads dataset. The generated descriptions were then compared with the scraped descriptions using a combination of Top Hits, MRR, and NDCG as evaluation metrics. The results demonstrate that LLM-based description generation exhibits significant promise, with results comparable to those obtained with web-scraped descriptions. An illustrative few-shot prompt follows this entry.

    Full text in ACM Digital Library
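
    An illustrative few-shot prompt of the kind described above for generating a movie description from its metadata; the field names and wording are placeholders, not the prompt used in the paper.

```python
# Few-shot prompt builder: prepend a handful of (metadata -> description)
# examples, then ask the model to complete the description for the new item.
def description_prompt(title, cast, director, examples):
    shots = "\n\n".join(
        f"Title: {e['title']}\nCast: {e['cast']}\nDirector: {e['director']}\n"
        f"Description: {e['description']}"
        for e in examples
    )
    return (
        f"{shots}\n\n"
        f"Title: {title}\nCast: {cast}\nDirector: {director}\nDescription:"
    )
```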

  • [DEM] Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit
    by Tobias Vente (University of Siegen), Michael Ekstrand (Boise State University) and Joeran Beel (University of Siegen).

    LensKit is one of the first and most popular Recommender System Libraries. While LensKit offers a wide variety of features, it does not include any optimization strategies or guidelines on how to select and tune LensKit algorithms. LensKit developers have to manually include third-party libraries into their experimental setup or implement optimization strategies by hand to optimize hyperparameters. We found that 65.5% (19 out of 29) of papers using LensKit algorithms for their experiments did not select algorithms or tune hyperparameters. Non-optimized models represent poor baselines and produce less meaningful research results. This demo introduces LensKit-Auto. LensKit-Auto automates the entire Recommender System pipeline and enables LensKit developers to automatically select, optimize, and ensemble LensKit algorithms.

    Full text in ACM Digital Library

  • [IND] Station and Track Attribute-Aware Music Personalization
    by M. Jeffrey Mei (SiriusXM Radio Inc.), Oliver Bembom (SiriusXM Radio Inc.) and Andreas Ehmann (SiriusXM Radio Inc.).

    We present a transformer for music personalization that recommends tracks given a station seed (artist) and improves accuracy by 10% over a baseline matrix factorization method. Adding more embeddings to capture track and station attributes further improves the accuracy of our recommendations, and also improves recommendation diversity, i.e., it mitigates popularity bias. We analyze the learned embeddings and find that they capture both explicit attributes provided at training time and implicit attributes that may inform listener preferences. We also find that, unlike matrix factorization, our model can identify and transfer relevant listener preferences across different genres and artists.

    Full text in ACM Digital Library

  • [IND] Optimizing Podcast Discovery: Unveiling Amazon Music’s Retrieval and Ranking Framework
    by Geetha Aluri (Amazon), Paul Greyson (Amazon) and Joaquin Delgado (Amazon).

    This work presents the search architecture of Amazon Music, which is a highly efficient system designed to retrieve relevant content for users. The architecture consists of three key stages: indexing, retrieval, and ranking. During the indexing stage, data is meticulously parsed and processed to create a comprehensive index that contains dense representations and essential information about each document (such as a music or podcast entity) in the collection, including its title, metadata, and relevant attributes. This indexing process enables fast and efficient data access during retrieval. The retrieval stage utilizes multi-faceted retrieval strategies, resulting in improved identification of candidate matches compared to traditional structured search methods. Subsequently, candidates are ranked based on their relevance to the customer’s query, taking into account document features and personalized factors. With a specific focus on the podcast use case, this paper highlights the deployment of the architecture and demonstrates its effectiveness in enhancing podcast search capabilities, providing tailored and engaging content experiences.

    Full text in ACM Digital Library

  • [IND] Towards Companion Recommenders Assisting Users’ Long-Term Journeys
    by Konstantina Christakopoulou (Google) and Minmin Chen (Google).

    Nowadays, with the abundance of internet content, users expect recommendation platforms not only to help them with one-off decisions and short-term tasks, but also to support their persistent and overarching interest journeys, including real-life goals that last days, months, or even years. In order for recommender systems to truly assist users through their real-life journeys, they first need to be able to understand and reason about the interests, needs, and goals users want to pursue, and then plan while taking those into account. However, the task presents several challenges. In this talk, we will present the key steps and elements needed to tackle the problem, particularly (1) user research for interest journeys; (2) personalized and interpretable user profiles; (3) adapting large language models, and other foundation models, for better user understanding; (4) better planning at a macro level through reinforcement learning and reason-and-act conversational agents; and (5) novel journey-powered front-end user experiences that allow for more user control. We hope that the talk will help inspire other researchers and will pave the way towards companion recommenders that can truly assist users throughout their interest journeys.

    Full text in ACM Digital Library

  • [IND] Delivery Hero Recommendation Dataset: A Novel Dataset for Benchmarking Recommendation Algorithms
    by Yernat Assylbekov (Delivery Hero), Raghav Bali (Delivery Hero), Luke Bovard (Delivery Hero) and Christian Klaue (Delivery Hero).

    In this paper, we propose a new dataset, the Delivery Hero Recommendation Dataset (DHRD), which provides researchers with a diverse, real-world food delivery dataset. DHRD comprises over a million food delivery orders from three distinct cities, encompassing thousands of vendors and an extensive range of dishes, serving a combined customer base of over a million individuals. We discuss the challenges associated with such real-world datasets. By releasing DHRD, researchers are empowered with a valuable resource for building and evaluating recommender systems, paving the way for advancements in this domain.

    Full text in ACM Digital Library

  • [IND] Transparently Serving the Public: Enhancing Public Service Media Values through Exploration
    by Andreas Grün (ZDF) and Xenija Neufeld (Accso – Accelerated Solutions GmbH).

    In the last few years, we have repeatedly underlined the importance of the Public Service Media Remit for ZDF as a Public Service Media provider. Offering fair, diverse, and useful recommendations to users is just as important for us as being transparent about our understanding of these values, the metrics that we are using to evaluate their extent, and the algorithms in our system that produce such recommendations. This year, we have made a major step towards transparency of our algorithms and metrics by describing them for a broader audience, offering the audience the possibility to learn details about our systems and to provide direct feedback to us. Having the ability to measure and track PSM metrics, we have started to improve our algorithms towards PSM values. In this work, we describe these steps and the results of actively debiasing our recommendations and adding exploration to them in order to achieve more fairness.

    Full text in ACM Digital Library

  • [IND] Learning From Negative User Feedback and Measuring Responsiveness for Sequential Recommenders
    by Yueqi Wang (Google), Yoni Halpern (Google), Shuo Chang (Google), Jingchen Feng (Google), Elaine Ya Le (Google), Longfei Li (Google), Xujian Liang (Google), Min-Cheng Huang (Google), Shane Li (Google), Alex Beutel (Google), Yaping Zhang (Google) and Shuchao Bi (Google).

    Sequential recommenders have been widely used in industry due to their strength in modeling user preferences. While these models excel at learning a user’s positive interests, less attention has been paid to learning from negative user feedback. Negative user feedback is an important lever of user control, and comes with an expectation that recommenders should respond quickly and reduce similar recommendations to the user. However, negative feedback signals are often ignored in the training objective of sequential recommenders, which primarily aim at predicting positive user interactions. In this work, we incorporate explicit and implicit negative user feedback into the training objective of sequential recommenders using a “not-to-recommend” loss function that optimizes the log-likelihood of not recommending items with negative feedback. We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system. Furthermore, we address a challenge in measuring recommender responsiveness to negative feedback by developing a counterfactual simulation framework to compare recommender responses between different user actions, showing improved responsiveness from the modeling change. A sketch of such a loss term follows this entry.

    Full text in ACM Digital Library
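
    A sketch of a "not-to-recommend" objective as described above: alongside the usual log-likelihood of the positive next item, maximize log(1 - p(item)) for items that received negative feedback. The softmax parameterization, shapes, and weighting are assumptions, not the production model.

```python
# "Not-to-recommend" loss sketch: the positive term is the standard negative
# log-likelihood; the negative-feedback term maximizes log(1 - p(item)).
import torch
import torch.nn.functional as F


def training_loss(logits, positive_ids, negative_ids, neg_weight=1.0):
    """logits: (batch, n_items); positive_ids / negative_ids: (batch,) item ids."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()

    pos_ll = log_probs.gather(1, positive_ids.unsqueeze(1)).squeeze(1)
    neg_p = probs.gather(1, negative_ids.unsqueeze(1)).squeeze(1)
    not_rec_ll = torch.log1p(-neg_p.clamp(max=1 - 1e-6))

    return -(pos_ll + neg_weight * not_rec_ll).mean()
```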

  • [IND] AdaptEx: a self-service contextual bandit platform
    by William Black (Expedia Group), Ercument Ilhan (Expedia Group), Andrea Marchini (Expedia Group) and Vilda Markeviciute (Expedia Group).

    This paper presents AdaptEx, a self-service contextual bandit platform widely used at Expedia Group that leverages multi-armed bandit algorithms to personalize user experiences at scale. AdaptEx considers the unique context of each visitor to select the optimal variants and learns quickly from every interaction they make. It offers a powerful solution for improving user experiences while minimizing the costs and time associated with traditional testing methods. The platform unlocks the ability to iterate quickly towards optimal product solutions, gracefully handling ever-changing content and continuous “cold start” situations.

    Full text in ACM Digital Library

  • [IND] Identifying Controversial Pairs in Item-to-Item Recommendations
    by Junyi Shen (Apple), Dayvid Rodrigues de Oliveira (Apple), Jin Cao (Apple), Brian Knott (Apple), Goodman Gu (Apple), Sindhu Vijaya Raghavan (Apple) and Rob Monarch (Apple).

    Recommendation systems in large-scale online marketplaces are essential to aiding users in discovering new content. However, state-of-the-art systems for item-to-item recommendation tasks are often based on a shallow level of contextual relevance, which can make the system insufficient for tasks where item relationships are more nuanced. Contextually relevant item pairs can sometimes have controversial or problematic relationships, and they could degrade user experiences and brand perception when recommended to users. For example, a recommendation of a divorce and co-parenting book can create a disturbing experience for someone who is downloading or viewing a marriage therapy book. In this paper, we propose a classifier to identify and prevent such problematic item-to-item recommendations and to enhance overall user experiences. The proposed approach utilizes active learning to sample hard examples effectively across sensitive item categories and uses human raters for data labeling. We also perform offline experiments to demonstrate the efficacy of this system for identifying and filtering controversial recommendations while maintaining recommendation quality.

    Full text in ACM Digital Library

  • [IND] Investigating the effects of incremental training on neural ranking models
    by Benedikt Schifferer (NVIDIA), Wenzhe Shi (ShareChat), Gabriel de Souza Pereira Moreira (NVIDIA), Even Oldridge (NVIDIA), Chris Deotte (NVIDIA), Gilberto Titericz (NVIDIA), Kazuki Onodera (NVIDIA), Praveen Dhinwa (ShareChat), Vishal Agrawal (ShareChat) and Chris Green (ShareChat).

    Recommender systems are an essential component of online systems, providing users with a personalized experience. Some recommendation scenarios such as social networks or news are very dynamic, with new items added continuously and the interests of users changing over time due to breaking news or popular events. Incremental training is a popular technique for keeping recommender models up-to-date on such dynamic platforms. In this paper, we provide an empirical analysis of a large industry dataset from the ShareChat app MOJ, a social media platform for short videos, to answer relevant questions such as: How often should the model be retrained? Do different models, features, and dataset sizes benefit from incremental training? Do all users and items benefit equally from incremental training?

    Full text in ACM Digital Library

  • [IND] Reward innovation for long-term member satisfaction
    by Gary Tang (Netflix), Jiangwei Pan (Netflix), Henry Wang (Netflix) and Justin Basilico (Netflix).

    Many large-scale recommender systems train on engagements because of their data abundance, immediacy of feedback, and correlation to user preferences. At Netflix and many digital products, engagement is an imperfect proxy for the overall goal of long-term user satisfaction. One way we address this misalignment is via reward innovation. In this paper, we provide a high-level description of the problem and motivate our approach. Finally, we present some practical insights into this track of work, including challenges, lessons learned, and systems we’ve built to support the effort.

    Full text in ACM Digital Library

  • [IND] Heterogeneous Knowledge Fusion: A Novel Approach for Personalized Recommendation via LLM
    by Bin Yin (Meituan), Junjie Xie (Meituan), Yu Qin (Meituan), Zixiang Ding (Meituan), Zhichao Feng (Meituan), Xiang Li (Unaffiliated) and Wei Lin (Unaffiliated).

    The analysis and mining of heterogeneous user behavior are of paramount importance in recommendation systems. However, the conventional approach of incorporating various types of heterogeneous behavior into recommendation models leads to feature sparsity and knowledge fragmentation issues. To address this challenge, we propose a novel approach for personalized recommendation via a Large Language Model (LLM), by extracting and fusing heterogeneous knowledge from heterogeneous user behavior information. In addition, by combining heterogeneous knowledge and recommendation tasks, instruction tuning is performed on the LLM for personalized recommendations. The experimental results demonstrate that our method can effectively integrate heterogeneous user behavior and significantly improve recommendation performance.

    Full text in ACM Digital Library

  • [DS] Overcoming Recommendation Limitations with Neuro-Symbolic Integration
    by Tommaso Carraro (University of Padova / Fondazione Bruno Kessler).

    Despite being studied for over twenty years, Recommender Systems (RSs) still suffer from important issues that limit their applicability in real-world scenarios. Data sparsity, cold start, and explainability are some of the most impactful problems. Intuitively, these historical limitations can be mitigated by injecting prior knowledge into recommendation models. Neuro-Symbolic (NeSy) approaches are suitable candidates for achieving this goal. Specifically, they aim to integrate learning (e.g., neural networks) with symbolic reasoning (e.g., logical reasoning). Generally, the integration lets a neural model interact with a logical knowledge base, enabling reasoning capabilities. In particular, NeSy approaches have been shown to deal well with poor training data, and their symbolic component could enhance model transparency. This suggests that NeSy systems could potentially mitigate the aforementioned RS limitations. However, the application of such systems to RSs is still in its early stages, and most of the proposed architectures do not really exploit the advantages of a NeSy approach. To this end, we conducted preliminary experiments with a Logic Tensor Network (LTN), a novel NeSy framework. We used the LTN to train a vanilla Matrix Factorization model using a First-Order Logic knowledge base as an objective. In particular, we encoded facts to enable the regularization of the latent factors using content information, obtaining promising results. In this paper, we review existing NeSy recommenders, discuss their limitations, present our preliminary results with the LTN, and propose interesting future works in this novel research area. In particular, we show how the LTN can be intuitively used to regularize models, perform cross-domain recommendation, ensemble learning, and explainable recommendation, reduce popularity bias, and easily define the loss function of a model.

    Full text in ACM Digital Library

  • [DS] Improving Recommender Systems Through the Automation of Design Decisions
    by Lukas Wegmeth (University of Siegen).

    Recommender systems developers are constantly faced with difficult design decisions. Additionally, the number of options that a recommender systems developer has to consider continually grows over time with new innovations. The machine learning community is in a similar situation and has come together to tackle the problem. They invented concepts and tools to make machine learning development both easier and faster. These developments are categorized as automated machine learning (AutoML). As a result, the AutoML community formed and continuously innovates new approaches. Inspired by AutoML, the recommender systems community has recently understood the need for automation and sparsely introduced AutoRecSys. The goal of AutoRecSys is not to replace recommender systems developers but to improve performance through the automation of design decisions. With AutoRecSys, recommender systems engineers do not have to focus on easy but time-consuming tasks and are free to pursue difficult engineering tasks instead. Additionally, AutoRecSys enables easier access to recommender systems for beginners, as it reduces the amount of knowledge required to get started with the development of recommender systems. AutoRecSys, like AutoML, is still early in its development and does not yet cover the whole development pipeline. Additionally, it is not yet clear under which circumstances AutoML approaches can be transferred to recommender systems. Our research intends to close this gap by improving AutoRecSys both with regard to the transfer of AutoML and novel approaches. Furthermore, we focus specifically on the development of novel automation approaches for data processing and training. We note that the realization of AutoRecSys is going to be a community effort. Our part in this effort is to research AutoRecSys fundamentals, build practical tools for the community, raise awareness of the advantages of automation, and catalyze AutoRecSys development.

    Full text in ACM Digital Library

  • [DS] Challenges for Anonymous Session-Based Recommender Systems in Indoor Environments
    by Alessio Ferrato (Roma TRE).

    Recommender Systems (RSs) have gained widespread popularity for providing personalized recommendations in manifold domains. However, considering growing user privacy concerns, the development of recommender systems that prioritize data protection has become increasingly important. In indoor environments, RSs face unique challenges, and ongoing research is being conducted to address them. Anonymous Session-Based Recommender Systems (ASBRSs) can represent a possible solution to address these challenges while ensuring user privacy. This paper aims to bridge the gap between existing RS research and the demand for privacy-preserving recommender systems, especially in indoor settings, where significant research efforts are underway. Therefore, it proposes three research questions: How does user modeling based on implicit feedback impact ASBRSs, considering different embedding extraction networks? How can short sessions be leveraged to start the recommendation process in ASBRSs? To what extent can ASBRSs generate fair recommendations? By investigating these questions, this study establishes the foundations for applying ASBRSs in indoor environments, safeguarding user privacy, and contributing to the ongoing research in this field.

    Full text in ACM Digital Library

  • [DS] Acknowledging dynamic aspects of trust in recommender systems
    by Imane Akdim (School of Computer Science – Mohammed VI Polytechnic University).

    Trust-based recommender systems emerged as a solution to different limitations of traditional recommender systems. These systems rely on the assumption that users will adopt the preferences of users they deem trustworthy in an online social setting. However, most trust-based recommender systems consider trust to be a static notion, thereby disregarding crucial dynamic factors that influence the value of trust between users and the performance of the recommender system. In this work, we intend to address several challenges regarding the dynamics of trust within a trust-based recommender system. These issues include the temporal evolution of trust between users and change detection and prediction in users’ interactions. By exploring the factors that influence the evolution of human trust, a complex and abstract concept, this work will contribute to a better understanding of how trust operates in recommender systems.

    Full text in ACM Digital Library

  • [DS] Denoising Explicit Social Signals for Robust Recommendation
    by Youchen Sun (Nanyang Technological University).

    Social recommender systems assume that users’ preferences can be influenced by their social connections. However, social networks are inherently noisy and contain redundant signals that are not helpful, or are even harmful, for the recommendation task. In this extended abstract, we classify the noise in explicit social links into intrinsic noise and extrinsic noise. Intrinsic noise consists of edges that occur naturally in the social network but have no influence on user preference modeling; extrinsic noise, on the other hand, consists of social links introduced intentionally through malicious attacks so that attackers can manipulate social influence to bias the recommendation outcome. To tackle this issue, we first propose a denoising framework that utilizes the information bottleneck principle and contrastive learning to filter out noisy social edges and uses the socially influential edges to enhance item prediction. Experiments will be conducted on real-world datasets for Top-K ranking evaluation as well as the model’s robustness to simulated social noise. Finally, we discuss future plans for defending against extrinsic noise resulting from malicious attacks.

    Full text in ACM Digital Library
