Wednesday Posters

Date: Wednesday October 16
Room: Chamber of Commerce

  • [RES] A Dataset for Adapting Recommender Systems to the Fashion Rental Economy
    by Karl Audun Kagnes Borgersen (University of Agder), Morten Goodwin (University of Agder), Morten Grundetjern (University of Agder) and Jivitesh Sharma (University of Agder)

    In response to the escalating ecological challenges that threaten global sustainability, there’s a need to investigate alternative methods of commerce, such as rental economies. As with most online commerce, rental or otherwise, a functioning recommender system is crucial to the success of such businesses. Yet the domain has, until this point, been largely neglected by the recommender system research community.

    Our dataset, derived from our collaboration with the leading Norwegian fashion rental company Vibrent, encompasses 77.1k transactions, rental histories from 7.4k anonymized users, and 15.6k unique outfits in which each physical item’s attributes and rental history are meticulously tracked. All outfits are listed as individual items or their corresponding item groups, referring to shared designs between the individual items. This notation underlines the novel challenges of rental as compared to more traditional recommender system problems where items are generally interchangeable. For example, an RS for rental items requires tracking each physical item to ensure it isn’t rented for the same time period to several different customers, as compared to retail, in which tracking or recommending individual items is largely unnecessary. Each outfit is accompanied by a set of tags describing some of their attributes. We also provide a total of 50.1k images spanning all items, along with a set of precomputed zero-shot embeddings.
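    The double-booking constraint that distinguishes rental from retail can be sketched as an availability check over half-open rental intervals. This is a minimal illustration, not the dataset's API; the item identifiers and function names are hypothetical.

```python
from datetime import date

def is_available(rental_history, start, end):
    """True if a physical item has no rental overlapping the window [start, end)."""
    return all(r_end <= start or r_start >= end
               for r_start, r_end in rental_history)

def rentable_items(item_group, rentals_by_item, start, end):
    """Of all physical copies sharing one design (an item group), keep those
    free in the requested window -- the per-item tracking retail RSs skip."""
    return [item for item in item_group
            if is_available(rentals_by_item.get(item, []), start, end)]

# Two physical copies of the same design; only one is free for the request.
history = {"dress-001": [(date(2024, 5, 1), date(2024, 5, 5))], "dress-002": []}
free = rentable_items(["dress-001", "dress-002"], history,
                      date(2024, 5, 3), date(2024, 5, 6))
```

    A rental recommender would filter or re-rank candidates through such a check before surfacing them, whereas a retail recommender can treat all copies of a design as one interchangeable item.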

    We apply a myriad of common recommender system methods to the dataset to provide a performance baseline. This baseline is calculated for both the traditional fashion recommender system problem of recommending outfit groups and the novel problem of predicting individual item rental. To our knowledge, this is the first published article to directly discuss fashion rental recommender systems, as well as the first published dataset intended for this purpose. We hope that the publication of this dataset will serve as a catalyst for a new branch of research for specialized fashion rental recommender systems.

    The dataset has been made freely available at https://www.kaggle.com/datasets/kaborg15/vibrent-clothes-rental-dataset

    All code associated with the project has been made available at: https://github.com/cair/Vibrent_Clothes_Rental_Dataset_Collection

    Full text in ACM Digital Library

  • [RES] A multimodal single-branch embedding network for recommendation in cold-start and missing modality scenarios
    by Christian Ganhör (Johannes Kepler University Linz), Marta Moscati (Johannes Kepler University Linz), Anna Hausberger (Johannes Kepler University Linz), Shah Nawaz (Johannes Kepler University Linz) and Markus Schedl (Johannes Kepler University Linz; Linz Institute of Technology)

    Most recommender systems adopt collaborative filtering (CF) and provide recommendations based on past collective interactions. Therefore, the performance of CF algorithms degrades when few or no interactions are available, a scenario referred to as cold-start. To address this issue, previous work relies on models leveraging both collaborative data and side information on the users or items. Similar to multimodal learning, these models aim at combining collaborative and content representations in a shared embedding space. In this work we propose a novel technique for multimodal recommendation, relying on a multimodal Single-Branch embedding network for Recommendation (SiBraR). Leveraging weight-sharing, SiBraR encodes interaction data as well as multimodal side information using the same single-branch embedding network on different modalities. This makes SiBraR effective in scenarios of missing modality, including cold start. Our extensive experiments on large-scale recommendation datasets from three different recommendation domains (music, movie, and e-commerce) and providing multimodal content information (audio, text, image, labels, and interactions) show that SiBraR significantly outperforms CF as well as state-of-the-art content-based RSs in cold-start scenarios, and is competitive in warm scenarios. We show that SiBraR’s recommendations are accurate in missing modality scenarios, and that the model is able to map different modalities to the same region of the shared embedding space, hence reducing the modality gap.
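    The weight-sharing idea can be illustrated roughly as follows: modality-specific adapters project each input to a common width, after which one shared branch (the single branch) produces the embedding. This is not the authors' architecture; all dimensions and the adapter design are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
D_SHARED = 32

# Modality-specific input adapters project each modality to a common width...
adapters = {
    "interactions": rng.normal(size=(100, 64)),
    "audio": rng.normal(size=(128, 64)),
    "text": rng.normal(size=(300, 64)),
}
# ...then ONE shared branch maps all of them into the same embedding space.
shared_branch = rng.normal(size=(64, D_SHARED))

def embed(modality, x):
    h = np.maximum(x @ adapters[modality], 0.0)  # adapter + ReLU
    return h @ shared_branch                     # identical weights per modality

audio_emb = embed("audio", rng.normal(size=(128,)))
text_emb = embed("text", rng.normal(size=(300,)))
```

    Because every modality passes through the same final weights, any available modality can stand in for a missing one at inference time, which is what makes the single-branch design attractive for cold-start.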

    Full text in ACM Digital Library

  • [RES] A Pre-trained Zero-shot Sequential Recommendation Framework via Popularity Dynamics
    by Junting Wang (University of Illinois Urbana-Champaign), Praneet Rathi (University of Illinois Urbana-Champaign) and Hari Sundaram (University of Illinois Urbana-Champaign)

    This paper proposes a novel pre-trained framework for zero-shot cross-domain sequential recommendation without auxiliary information. While using auxiliary information (e.g., item descriptions) seems promising for cross-domain transfer, a cross-domain adaptation of sequential recommenders can be challenging when the target domain differs from the source domain—item descriptions are in different languages; metadata modalities (e.g., audio, image, and text) differ across source and target domains. If we can learn universal item representations independent of the domain type (e.g., groceries, movies), we can achieve zero-shot cross-domain transfer without auxiliary information. Our critical insight is that user interaction sequences highlight shifting user preferences via the popularity dynamics of interacted items. We present a pre-trained sequential recommendation framework: PrepRec, which utilizes a novel popularity dynamics-aware transformer architecture. Through extensive experiments on five real-world datasets, we show that PrepRec, without any auxiliary information, can zero-shot adapt to new application domains and achieve competitive performance compared to state-of-the-art sequential recommender models. In addition, we show that PrepRec complements existing sequential recommenders. With a simple post-hoc interpolation, PrepRec improves the performance of existing sequential recommenders on average by 11.8% in Recall@10 and 22% in NDCG@10. We provide an anonymized implementation of PrepRec at https://github.com/CrowdDynamicsLab/preprec.
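    The post-hoc interpolation step is plain score blending between two recommenders. A minimal sketch follows; the α weight and the dictionary representation are assumptions for illustration, not PrepRec's interface.

```python
def interpolate_scores(scores_a, scores_b, alpha=0.5):
    """Blend two recommenders' item scores; alpha weights the first model.
    Items scored by only one model get 0.0 from the other."""
    items = set(scores_a) | set(scores_b)
    return {i: alpha * scores_a.get(i, 0.0) + (1 - alpha) * scores_b.get(i, 0.0)
            for i in items}

# A popularity-dynamics model and an existing sequential recommender
# score the same candidates; the blend is re-ranked as usual.
blended = interpolate_scores({"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0},
                             alpha=0.75)
```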

    Full text in ACM Digital Library

  • [LBR] Are We Explaining the Same Recommenders? Incorporating Recommender Performance for Evaluating Explainers
    by Amir Reza Mohammadi (University of Innsbruck), Andreas Peintner (University of Innsbruck), Michael Müller (University of Innsbruck) and Eva Zangerle (University of Innsbruck)

    Explainability in recommender systems is both crucial and challenging. Among the state-of-the-art explanation strategies, counterfactual explanation provides intuitive and easily understandable insights into model predictions by illustrating how a small change in the input can lead to a different outcome. Recently, this approach has garnered significant attention, with various studies employing different metrics to evaluate the performance of these explanation methods. In this paper, we investigate the metrics used for evaluating counterfactual explainers for recommender systems. Through extensive experiments, we demonstrate that the performance of recommenders has a direct effect on counterfactual explainers and ignoring it results in inconsistencies in the evaluation results of explainer methods. Our findings highlight an additional challenge in evaluating counterfactual explainer methods and underscore the need to report the recommender performance or consider it in evaluation metrics.

    Full text in ACM Digital Library

  • [RES] Calibrating the Predictions for Top-N Recommendations
    by Masahiro Sato (FUJIFILM)

    Well-calibrated predictions of user preferences are essential for many applications. Since recommender systems typically select the top-N items for users, calibration for those top-N items, rather than for all items, is important. We show that previous calibration methods result in miscalibrated predictions for the top-N items, despite their excellent calibration performance when evaluated on all items. In this work, we address the miscalibration in the top-N recommended items. We first define evaluation metrics for this objective and then propose a generic method to optimize calibration models focusing on the top-N items. It groups the top-N items by their ranks and optimizes distinct calibration models for each group with rank-dependent training weights. We verify the effectiveness of the proposed method for both explicit and implicit feedback datasets, using diverse classes of recommender models.
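    A toy version of the rank-grouped idea, assuming a simple weighted multiplicative scaling per group; the paper's calibration models and its weighting scheme may differ.

```python
import numpy as np

def fit_group_scale(preds, labels, weights):
    """Weighted multiplicative calibration factor for one rank group."""
    return float(np.average(labels, weights=weights) /
                 max(np.average(preds, weights=weights), 1e-12))

def calibrate_topn(ranked_preds, ranked_labels, group_size=5):
    """Group top-N items by rank and fit a distinct calibration model per
    group, with rank-dependent training weights (top ranks weigh more)."""
    scales = []
    for g in range(0, len(ranked_preds), group_size):
        p = np.asarray(ranked_preds[g:g + group_size], dtype=float)
        y = np.asarray(ranked_labels[g:g + group_size], dtype=float)
        w = 1.0 / (np.arange(g, g + len(p)) + 1)  # rank-dependent weights
        scales.append(fit_group_scale(p, y, w))
    return scales
```

    If the recommender systematically overpredicts at the top ranks, the first group's scale drops below 1, correcting exactly where top-N calibration matters.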

    Full text in ACM Digital Library

  • [RES] CALRec: Contrastive Alignment of Generative LLMs For Sequential Recommendation
    by Yaoyiran Li (University of Cambridge), Xiang Zhai (Google), Moustafa Alzantot (Google Research), Keyi Yu (Google), Ivan Vulić (University of Cambridge), Anna Korhonen (University of Cambridge) and Mohamed Hammad (Google)

    Traditional recommender systems such as matrix factorization methods have primarily focused on learning a shared dense embedding space to represent both items and user preferences. Subsequently, sequence models such as RNNs, GRUs, and, more recently, Transformers have emerged and excelled in the task of sequential recommendation. This task requires understanding the sequential structure present in users’ historical interactions to predict the next item they may like. Building upon the success of Large Language Models (LLMs) in a variety of tasks, researchers have recently explored using LLMs that are pretrained on vast corpora of text for sequential recommendation. To use LLMs for sequential recommendation, both the history of user interactions and the model’s prediction of the next item are expressed in text form. We propose CALRec, a two-stage LLM finetuning framework that finetunes a pretrained LLM in a two-tower fashion using a mixture of two contrastive losses and a language modeling loss: the LLM is first finetuned on a data mixture from multiple domains, followed by another round of target-domain finetuning. Our model significantly outperforms many state-of-the-art baselines (+37% in Recall@1 and +24% in NDCG@10) and our systematic ablation studies reveal that (i) both stages of finetuning are crucial, and, when combined, we achieve improved performance, and (ii) contrastive alignment is effective among the target domains explored in our experiments.

    Full text in ACM Digital Library

  • [RES] CAPRI-FAIR: Integration of Multi-sided Fairness in Contextual POI Recommendation Framework
    by Francis Zac Dela Cruz (University of New South Wales), Flora D. Salim (University of New South Wales), Yonchanok Khaokaew (University of New South Wales) and Jeffrey Chan (RMIT University)

    Point-of-interest (POI) recommendation considers spatio-temporal factors like distance, peak hours, and user check-ins. Given their influence on both consumer experience and POI business, it’s crucial to consider fairness from multiple perspectives. Unfortunately, these systems often provide less accurate recommendations to inactive users and less exposure to unpopular POIs. This paper develops a post-filter method that includes provider and consumer fairness in existing models, aiming to balance fairness metrics like item exposure with performance metrics such as precision and distance. Experiments show that a linear scoring model for provider fairness in re-scoring items offers the best balance between performance and long-tail exposure, sometimes without much precision loss. Addressing consumer fairness by recommending more popular POIs to inactive users increased precision in some models and datasets. However, combinations that reached the Pareto front of consumer and provider fairness resulted in the lowest precision values, highlighting that tradeoffs depend greatly on the model and dataset.
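    A linear provider-fairness re-scoring of the kind described can be sketched as relevance plus a weighted boost for under-exposed POIs. The λ weight and the exposure feature here are illustrative assumptions, not the paper's exact scoring model.

```python
def rescore(candidates, lam=0.3):
    """candidates: list of (poi, relevance, exposure) tuples, where exposure
    is the POI's past share of recommendations in [0, 1]. The linear
    post-filter boosts under-exposed (long-tail) providers."""
    return sorted(candidates,
                  key=lambda c: c[1] + lam * (1.0 - c[2]),
                  reverse=True)

# A slightly less relevant long-tail POI overtakes an over-exposed one.
ranked = rescore([("popular", 0.90, 0.95), ("tail", 0.85, 0.05)], lam=0.3)
```

    Sweeping λ traces out the precision-versus-exposure trade-off that the paper evaluates against the Pareto front.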

    Full text in ACM Digital Library

  • [RES] Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems
    by Yan-Martin Tamm (University of Tartu) and Anna Aljanaki (University of Tartu)

    Over the years, Music Information Retrieval (MIR) has proposed various models pretrained on large amounts of music data. Transfer learning showcases the proven effectiveness of pretrained backend models with a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network learning over these models. Our research addresses this gap and evaluates the applicability of six pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, and MusiCNN) in the context of MRS. We assess their performance using three recommendation models: K-nearest neighbours (KNN), shallow neural network, and BERT4Rec. Our findings suggest that pretrained audio representations exhibit significant performance variability between traditional MIR tasks and MRS, indicating that valuable aspects of musical information captured by backend models may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.

    Full text in ACM Digital Library

  • [RES] ConFit: Improving Resume-Job Matching using Data Augmentation and Contrastive Learning
    by Xiao Yu (Columbia University), Jinzhong Zhang (Intellipro Group Inc.) and Zhou Yu (Columbia University)

    A reliable resume-job matching system helps a company find suitable candidates from a pool of resumes, and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply only to a few jobs, interaction records in resume-job datasets are sparse. Unlike much prior work that relies on complex modeling techniques, we tackle this sparsity problem using data augmentation and a simple contrastive learning approach. ConFit first formulates resume-job datasets as a sparse bipartite graph, and creates an augmented dataset by paraphrasing specific sections in a resume or a job post. Then, ConFit finetunes pre-trained encoders with contrastive learning to further increase the number of training samples from B pairs per batch to O(B²) per batch. We evaluate ConFit on two real-world datasets and find it outperforms prior methods (including BM25 and OpenAI text-ada-002) by up to 19% and 31% absolute in nDCG@10 for ranking jobs and ranking resumes, respectively. We believe ConFit’s simple yet highly performant approach lays a strong foundation for future research in modeling person-job fit.
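    The blow-up from B to O(B²) pairs comes from treating every non-matching resume-job pair inside the batch as a negative. An InfoNCE-style sketch in NumPy follows; ConFit's actual loss mixture and encoders are more involved, so treat this only as the in-batch-negatives idea.

```python
import numpy as np

def info_nce(resume_emb, job_emb, tau=0.1):
    """In-batch contrastive loss: B matched (resume, job) rows yield a BxB
    similarity matrix, i.e. O(B^2) training pairs, with the matched pairs
    on the diagonal as positives and everything else as negatives."""
    r = resume_emb / np.linalg.norm(resume_emb, axis=1, keepdims=True)
    j = job_emb / np.linalg.norm(job_emb, axis=1, keepdims=True)
    logits = r @ j.T / tau                                  # B x B scores
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))              # diagonal = positives

rng = np.random.default_rng(0)
loss = info_nce(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
```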

    Full text in ACM Digital Library

  • [RES] CoST: Contrastive Quantization based Semantic Tokenization for Generative Recommendation
    by Jieming Zhu (Huawei Noah’s Ark Lab), Mengqun Jin (Tsinghua University), Qijiong Liu (The HK PolyU), Zexuan Qiu (The Chinese University of Hong Kong), Zhenhua Dong (Huawei Noah’s Ark Lab) and Xiu Li (Tsinghua University)

    Embedding-based retrieval serves as a dominant approach to candidate item matching for industrial recommender systems. With the success of generative AI, generative retrieval has recently emerged as a new retrieval paradigm for recommendation, which casts item retrieval as a generation problem. Its model consists of two stages: semantic tokenization and autoregressive generation. The first stage involves item tokenization that constructs discrete semantic tokens to index items, while the second stage autoregressively generates semantic tokens of candidate items. Therefore, semantic tokenization serves as a crucial preliminary step for training generative recommendation models. Existing research usually employs a vector quantizer with reconstruction loss (e.g., RQ-VAE) to obtain semantic tokens of items, but this method fails to capture the essential neighborhood relationships that are vital for effective item modeling in recommender systems. In this paper, we propose a contrastive quantization-based semantic tokenization approach, named CoST, which harnesses both item relationships and semantic information to learn semantic tokens. Our experimental results highlight the significant impact of semantic tokenization on generative recommendation performance, with CoST achieving up to a 43% improvement in Recall@5 and a 44% improvement in NDCG@5 on the MIND dataset over previous baselines.

    Full text in ACM Digital Library

  • [IND] Entity-Aware Collections Ranking: A Joint Scoring Approach
    by Sihao Chen (Shopee Pte. Ltd.), Sheng Li (Shopee Pte. Ltd.), Youhe Chen (Shopee Pte. Ltd.) and Dong Yang (Shopee Pte. Ltd.)

    Recommender systems in academia and industry typically predict Click-Through Rate (CTR) at the item or entity level. In practical scenarios, products can take on various forms and designs. We present a novel joint scoring framework that supports the listwise ranking of a collection composed of multiple entities. It learns the best combination of entities to be displayed to the user. We also introduce a novel dual attention mechanism that better captures the user’s interest in the collection. Our approach demonstrated superior performance through offline and online experiments and has been deployed to Shopee’s Shop Ads across all markets.

    Full text in ACM Digital Library

  • [RES] Evaluation and simplification of text difficulty using LLMs in the context of recommending texts in French to facilitate language learning
    by Henri Jamet (University of Lausanne), Maxime Manderlier (University of Mons (UMONS)), Yash Raj Shrestha (University of Lausanne) and Michalis Vlachos (University of Lausanne)

    Learning a new language can be challenging. To help learners, we built a recommendation system that suggests texts and videos based on the learner’s language skill level and topic interests. Our system analyzes content to determine its difficulty and topic, and, if needed, can simplify complex texts while maintaining semantics. Our work explores the holistic use of Large Language Models (LLMs) for the various sub-tasks involved in producing accurate recommendations: difficulty estimation, text simplification, topic estimation, and the graph-based recommender engine. We present a comprehensive evaluation comparing zero-shot and fine-tuned LLMs, demonstrating significant improvements in French content difficulty prediction (18–56%), topic prediction accuracy (27%), and recommendation relevance (up to 18% NDCG increase).

    Full text in ACM Digital Library

  • [LBR] Exploratory Analysis of Recommending Urban Parks for Health-Promoting Activities
    by Linus W. Dietz (King’s College London), Sanja Šćepanović (Nokia Bell Labs), Ke Zhou (Nokia Bell Labs) and Daniele Quercia (Nokia Bell Labs)

    Parks are essential spaces for promoting urban health, and recommender systems could assist individuals in discovering parks for leisure and health-promoting activities. This is particularly important in large cities like London, which has over 1,500 named parks, making it challenging to understand what each park offers. Due to the lack of datasets and the diverse health-promoting activities parks can support (e.g., physical, social, nature-appreciation), it is unclear which recommendation algorithms are best suited for this task. To explore the dynamics of recommending parks for specific activities, we created two datasets: one from a survey of over 250 London residents, and another by inferring visits from over 1 million geotagged Flickr images taken in London parks. Analyzing the geographic patterns of these visits revealed that recommending nearby parks is ineffective, suggesting that this recommendation task is distinct from Point of Interest recommendation. We then tested various recommendation models, identifying a significant popularity bias in the results. Additionally, we found that personalized models have advantages in recommending parks beyond the most popular ones. The data and findings from this study provide a foundation for future research on park recommendations.

    Full text in ACM Digital Library

  • [IND] Explore versus repeat: insights from an online supermarket
    by Mariagiorgia Agnese Tandoi (Picnic Technologies) and Daniela Solis Morales (Picnic Technologies)

    At online supermarket Picnic, we implemented both traditional collaborative filtering and a hybrid method to provide recipe recommendations at scale. This case study presents findings from the online evaluation of these algorithms, focusing on the repeat-explore trade-off. Our findings allow other online retailers to gain insights into the importance of thoughtful model design in navigating this important balance. We argue that even when exploiting known preferences proves highly beneficial in the short term, prioritizing exploratory content is essential for long-term customer satisfaction and sustained growth. Our research lays the groundwork for a compelling discussion on defining success in balancing the familiar and the novel in online grocery shopping.

    Full text in ACM Digital Library

  • [RES] Fairness Matters: A look at LLM-generated group recommendations
    by Antonela Tommasel (CONICET-UNCPBA, ISISTAN)

    Recommender systems play a crucial role in how users consume information, with group recommendation receiving considerable attention. Ensuring fairness in group recommender systems entails providing recommendations that are useful and relevant to all group members rather than solely reflecting the majority’s preferences, while also addressing fairness concerns related to sensitive attributes (e.g., gender). Recently, advances in Large Language Models (LLMs) have enabled the development of new kinds of recommender systems. However, LLMs can perpetuate social biases present in training data, posing risks of unfair outcomes and harmful impacts. We investigated the impact of LLMs on group recommendation fairness, establishing and instantiating a framework that encompasses group definition, sensitive attribute combinations, and evaluation methodology. Our findings revealed the interaction patterns between sensitive attributes and LLMs and how they affected recommendations. This study advances the understanding of fairness considerations in group recommendation systems, laying the groundwork for future research.

    Full text in ACM Digital Library

  • [RES] GLAMOR: Graph-based LAnguage MOdel embedding for citation Recommendation
    by Zafar Ali (Southeast University), Guilin Qi (Southeast University), Irfan Ullah (Shaheed Benazir Bhutto University), Adam A. Q. Mohammed (Southeast University), Pavlos Kefalas (Aristotle University of Thessaloniki) and Khan Muhammad (Sungkyunkwan University)

    Digital publishing’s exponential growth has created vast scholarly collections. Guiding researchers to relevant resources is crucial, and knowledge graphs (KGs) are key tools for unlocking hidden knowledge. However, current methods focus on external links between concepts, ignoring the rich information within individual papers. Challenges like insufficient multi-relational data, name ambiguity, and cold-start issues further limit existing KG-based methods, which fail to capture the intricate attributes of diverse entities. To solve these issues, we propose GLAMOR, a robust KG framework encompassing entities (e.g., authors, papers, fields of study, and concepts) along with their semantic interconnections. GLAMOR uses a novel random walk-based KG text generation method and then fine-tunes the language model using the generated text. Subsequently, the acquired context-preserving embeddings facilitate superior top-k predictions. Evaluation results on two public benchmark datasets demonstrate GLAMOR’s superiority over state-of-the-art methods, especially in solving the cold-start problem.

    Full text in ACM Digital Library

  • [IND] Improving Data Efficiency for Recommenders and LLMs
    by Noveen Sachdeva (Google DeepMind), Benjamin Coleman (Google DeepMind), Wang-Cheng Kang (Google DeepMind), Jianmo Ni (Google DeepMind), James Caverlee (Texas A&M University), Lichan Hong (Google DeepMind), Ed Chi (Google DeepMind) and Derek Zhiyuan Cheng (Google DeepMind)

    In recent years, massive transformer-based architectures have driven breakthrough performance in practical applications like autoregressive text-generation (LLMs) and click-prediction (recommenders). A common recipe for success is to train large models on massive web-scale datasets, e.g., modern recommenders are trained on billions of user-item click events, and LLMs are trained on trillions of tokens extracted from the public internet. We are close to hitting the computational and economical limits of scaling up the size of these models, and we expect the next frontier of gains to come from improving the: (i) data quality of the training dataset, and (ii) data efficiency of the extremely expensive training procedure. Inspired by this shift, we present a set of “data-centric” techniques for recommendation and language models that summarizes a dataset into a terse data summary, which is both (i) high-quality, i.e., trains better quality models, and (ii) improves the data-efficiency of the overall training procedure. We propose techniques from two disparate data frameworks: (i) data selection (a.k.a., coreset construction) methods that sample portions of the dataset using grounded heuristics, and (ii) data distillation techniques that generate synthetic examples which are optimized to retain the signals needed for training high-quality models. Overall, this work sheds light on the challenges and opportunities offered by data optimization in web-scale systems, a particularly relevant focus as the recommendation community grapples with the grand challenge of leveraging LLMs.
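    One grounded heuristic of the data-selection (coreset-construction) kind described above is loss-proportional sampling: keep a subset of the dataset, favouring examples the current model finds hard. This is a minimal sketch of the family of techniques, not the authors' method.

```python
import numpy as np

def select_coreset(losses, k, rng=None):
    """Sample k example indices without replacement, with probability
    proportional to per-example loss (hard examples are more informative)."""
    rng = rng or np.random.default_rng(0)
    p = np.asarray(losses, dtype=float)
    p = p / p.sum()
    return rng.choice(len(p), size=k, replace=False, p=p)

# Keep 2 of 5 examples; the high-loss ones are most likely to survive.
idx = select_coreset([0.1, 5.0, 0.1, 5.0, 0.1], k=2)
```

    Data distillation goes further by synthesizing examples rather than selecting them, but both aim at the same target: a terse summary that trains a model of comparable quality at a fraction of the cost.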

    Full text in ACM Digital Library

  • [LBR] Less is More: Towards Sustainability-Aware Persuasive Explanations in Recommender Systems
    by Thi Ngoc Trang Tran (Graz University of Technology), Seda Polat Erdeniz (Graz University of Technology), Alexander Felfernig (Graz University of Technology), Sebastian Lubos (Graz University of Technology), Merfat El Mansi (Graz University of Technology) and Viet-Man Le (Graz University of Technology)

    Recommender systems play an important role in supporting the achievement of the United Nations sustainable development goals (SDGs). In recommender systems, explanations can support different goals, such as increasing a user’s trust in a recommendation, persuading a user to purchase specific items, or increasing the understanding of the reasons behind a recommendation. In this paper, we discuss the concept of “sustainability-aware persuasive explanations” which we regard as a major concept to support the achievement of the mentioned SDGs. Such explanations are orthogonal to most existing explanation approaches since they focus on a “less is more” principle, which per se is not included in existing e-commerce platforms. Based on a user study in three item domains, we analyze the potential impacts of sustainability-aware persuasive explanations. The study results are promising regarding user acceptance and the potential impacts of such explanations.

    Full text in ACM Digital Library

  • [IND] Leveraging LLM generated labels to reduce bad matches in job recommendations
    by Yingchi Pei (Indeed.com), Yi Wei Pang (Indeed.com), Warren Cai (Indeed.com), Nilanjan Sengupta (Indeed.com) and Dheeraj Toshniwal (Indeed.com)

    Negative signals are increasingly employed to enhance recommendation quality. However, explicit negative feedback is often sparse and may disproportionately reflect the preferences of more vocal users. Commonly used implicit negative feedback, such as impressions without positive interactions, has the limitation of not accurately capturing users’ true negative preferences because users mainly pursue information they consider interesting. In this work, we present an approach that leverages fine-tuned Large Language Models (LLMs) to evaluate recommendation quality and generate negative signals at scale while maintaining cost efficiency. We demonstrate significant improvements in our recommendation systems by deploying a traditional classifier trained using LLM-generated labels.

    Full text in ACM Digital Library

  • [LBR] Leveraging Monte Carlo Tree Search for Group Recommendation
    by Antonela Tommasel (CONICET-UNCPBA, ISISTAN) and J. Andres Diaz-Pace (CONICET-UNCPBA, ISISTAN)

    Group recommenders aim to provide recommendations that satisfy the collective preferences of multiple users, a challenging task due to the diverse individual tastes and conflicting interests to be balanced. This is often accomplished by using aggregation techniques that select items on which the group can agree. Traditional aggregators struggle with these complexities, as items are chosen independently, leading to sub-optimal recommendations lacking diversity, novelty, or fairness. In this paper, we propose an aggregation technique that leverages Monte Carlo Tree Search (MCTS) to enhance group recommendations. MCTS is used to explore and evaluate candidate recommendation sequences to optimize overall group satisfaction. We also investigate the integration of MCTS with LLMs aiming at better understanding interactions between user preferences and recommendation sequences to inform the search. Experimental evaluations, although preliminary, showed that our proposal outperforms existing aggregation techniques in terms of relevance and beyond-accuracy aspects of recommendations. The LLM integration achieved positive results for recommendations’ relevance. Overall, this work highlights the potential of heuristic search techniques to tackle the complexities of group recommendations.

    Full text in ACM Digital Library

  • [IND] LyricLure: Mining Catchy Hooks in Song Lyrics to Enhance Music Discovery and Recommendation
    by Siddharth Sharma (Amazon Inc.), Akshay Shukla (Amazon Inc.), Ajinkya Walimbe (Amazon Inc), Tarun Sharma (Amazon Inc) and Joaquin Delgado (Amazon)

    Music Search encounters a significant challenge as users increasingly rely on catchy lines from lyrics to search for both new releases and other popular songs. Integrating lyrics into an existing lexical search index, or using a lyrics vector index, poses difficulties due to the length of lyrics text. While lexical scoring mechanisms like BM25 are inadequate and necessitate complex query planning and index schemas for long text, techniques based on text embedding similarity often retrieve noisy, near-similar-meaning lyrics, resulting in low precision. This paper introduces a proactive approach to extract catchy phrases from song lyrics, overcoming the limitations of conventional graph-based phrase extractors and deep learning models, which are primarily designed for extractive summarization or task-specific key phrase extraction from domain-specific corpora. Additionally, we employ a multi-step mechanism to mine search query logs for potential unresolved user queries containing catchy phrases from lyrics. This involves the creation of word and character k-gram indexes for lyric chunks, careful query- and lyrics-domain-centric normalization (and expansion), and a re-ranking layer incorporating lexical as well as semantic similarity. Together, these strategies helped us create a dedicated retrieval source for serving lyrics-intent queries with high recall.

    Full text in ACM Digital Library

  • [IND] Off-Policy Selection for Optimizing Ad Display Timing in Mobile Games (Samsung Instant Plays)
    by Katarzyna Siudek-Tkaczuk (Samsung R&D Institute Poland), Sławomir Kapka (Samsung R&D Institute Poland), Jędrzej Alchimowicz (Samsung R&D Institute Poland), Bartłomiej Swoboda (Samsung R&D Institute Poland) and Michał Romaniuk (Samsung R&D Institute Poland)

    Off-Policy Selection (OPS) aims to select the best policy from a set of policies trained using offline Reinforcement Learning. In this work, we describe our custom OPS method and its successful application in Samsung Instant Plays for optimizing ad delivery timings. We propose a custom OPS method because traditional Off-Policy Evaluation (OPE) methods often exhibit enormous variance, leading to unreliable results. We applied our OPS method to initialize policies for our custom pseudo-online training pipeline. The final policy resulted in a substantial 49% lift in the number of watched ads while maintaining a similar retention rate.

    Full text in ACM Digital Library
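
    As a generic illustration of variance-aware off-policy selection (not Samsung's specific method, which the paper does not detail here), one can compute per-episode importance-weighted returns for each candidate policy and prefer candidates whose estimates are both high and low-variance. All names below are hypothetical:

```python
import math

def is_estimate(episodes, target_probs):
    """Per-episode importance-weighted returns for one candidate policy.

    episodes: list of (actions, behavior_probs, reward) from the logged
    behavior policy; target_probs(a) is the candidate's probability of
    the logged action a.
    """
    values = []
    for actions, behavior_probs, reward in episodes:
        w = 1.0
        for a, bp in zip(actions, behavior_probs):
            w *= target_probs(a) / bp  # cumulative importance weight
        values.append(w * reward)
    return values

def select_policy(episodes, candidates, penalty=1.0):
    """Pick the candidate maximizing a variance-penalized OPE estimate."""
    best, best_score = None, -math.inf
    for name, target_probs in candidates.items():
        vals = is_estimate(episodes, target_probs)
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        score = mean - penalty * math.sqrt(var)
        if score > best_score:
            best, best_score = name, score
    return best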

  • RES On Interpretability of Linear Autoencoders
    by Martin Spišák (Recombee), Radek Bartyzal (GLAMI), Antonín Hoskovec (GLAMI; Czech Technical University in Prague) and Ladislav Peška (Charles University)

    We derive a novel graph-based interpretation of the linear autoencoder models EASEᴿ and SLIM and their approximate variants. Contrary to popular belief, we reveal that the weights of these models should not be interpreted as dichotomic item similarity but merely as its magnitude. Consequently, we propose a simple modification that considerably improves retrieval ability in sparse domains and yields interpretable inference with negative inputs, as demonstrated by both offline and online experiments. Experiment codes and extended results are available at https://osf.io/bjmuv/.

    Full text in ACM Digital Library
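
    For context, the closed-form solution of EASEᴿ (Steck, 2019), one of the linear autoencoders the paper interprets, fits in a few lines. This is the standard published form, not the paper's proposed modification:

```python
import numpy as np

def ease(X, lam=10.0):
    """Closed-form EASE^R item-item weight matrix (Steck, 2019).

    X: user-item interaction matrix (n_users x n_items), typically binary.
    Returns B with zero diagonal; scores are X @ B.
    """
    G = X.T @ X + lam * np.eye(X.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))      # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)   # enforce the zero-self-similarity constraint
    return B
```

    The zero-diagonal constraint is what distinguishes EASEᴿ from plain ridge regression on the Gram matrix, and it is precisely such weight matrices whose sign/magnitude interpretation the paper revisits.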

  • IND Pareto Front Approximation for Multi-Objective Session-Based Recommender Systems
    by Timo Wilm (OTTO (GmbH & Co KG)), Philipp Normann (OTTO (GmbH & Co KG)) and Felix Stepprath (OTTO (GmbH & Co KG))

    This work introduces MultiTRON, an approach that adapts Pareto front approximation techniques to multi-objective session-based recommender systems using a transformer neural network. Our approach optimizes trade-offs between key metrics such as click-through and conversion rates by training on sampled preference vectors. A significant advantage is that after training, a single model can access the entire Pareto front, allowing it to be tailored to meet the specific requirements of different stakeholders by adjusting an additional input vector that weights the objectives. We validate the model’s performance through extensive offline and online evaluation. For broader application and research, the source code is made available. The results confirm the model’s ability to manage multiple recommendation objectives effectively, offering a flexible tool for diverse business needs.

    Full text in ACM Digital Library
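
    The core idea of training against sampled preference vectors can be shown with a toy scalarized objective. This sketch substitutes two quadratic functions for the click-through and conversion losses and is not the MultiTRON implementation; the helper names are assumptions:

```python
import numpy as np

def scalarized_loss(x, prefs, objectives):
    """Preference-weighted sum of objective values at x."""
    return sum(w * f(x) for w, f in zip(prefs, objectives))

def optimize(prefs, objectives, lr=0.1, steps=200):
    """Gradient descent on the scalarized loss (central finite differences).

    Different preference vectors land on different Pareto-optimal points,
    which is the trade-off knob the abstract describes.
    """
    x = np.zeros(1)
    eps = 1e-5
    for _ in range(steps):
        g = (scalarized_loss(x + eps, prefs, objectives)
             - scalarized_loss(x - eps, prefs, objectives)) / (2 * eps)
        x -= lr * g
    return x
```

    With objectives minimized at 0 and 1, equal preferences land midway, while a 0.9/0.1 preference pulls the solution toward the first objective, mirroring how MultiTRON's input preference vector steers the single trained model along the front.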

  • RES Positive-Sum Impact of Multistakeholder Recommender Systems for Urban Tourism Promotion and User Utility
    by Pavel Merinov (Free University of Bozen-Bolzano) and Francesco Ricci (Free University of Bozen-Bolzano)

    When a multistakeholder recommender system (MRS) is designed to produce sustainable urban tourism promotion, two conflicting goals are of practical interest: (i) to cut down the number of visitors at popular sites and (ii) to satisfy tourists’ preferences, which are often biased towards popular sites. By modelling the tourists’ limited knowledge of the visited city — an important but often overlooked detail — we simulate interactions between tourists and an MRS that jointly optimises tourist utility and promotes less popular sites. Experiments based on data logs collected in three tourist cities reveal that such an MRS can lift tourist utility and at the same time reduce the number of visitors at popular sites, manifesting a so-called positive-sum impact. However, a delicate balance is crucial; under- or over-promotion of unpopular sites in the recommendation lists can be adverse to both the destination’s and the tourists’ utility.

    Full text in ACM Digital Library

  • DEMO RePlay: a Recommendation Framework for Experimentation and Production Use
    by Alexey Vasilev (Sber AI Lab), Anna Volodkevich (Sber AI Lab), Denis Kulandin (Sber AmazMe), Tatiana Bysheva (Sber AmazMe) and Anton Klenitskiy (Sber AI Lab)

    Using a single tool to build and compare recommender systems significantly reduces the time to market for new models. In addition, the comparison results look more consistent when such tools are used. This is why many different tools and libraries for researchers in the field of recommendations have recently appeared. Unfortunately, most of these frameworks are aimed primarily at researchers and require modification for use in production due to an inability to work on large datasets or an inappropriate architecture. In this demo, we present our open-source toolkit RePlay – a framework containing an end-to-end pipeline for building recommender systems, which is ready for production use. RePlay also allows you to use a suitable stack at each stage of the pipeline: Pandas, Polars, or Spark. This allows the library to scale computations and deploy to a cluster. Thus, RePlay allows data scientists to easily move from research mode to production mode using the same interfaces.

    Full text in ACM Digital Library

  • RES Revisiting LightGCN: Unexpected Inflexibility, Inconsistency, and A Remedy Towards Improved Recommendation
    by Geon Lee (KAIST), Kyungho Kim (KAIST) and Kijung Shin (KAIST)

    Graph Neural Networks (GNNs) have emerged as effective tools in recommender systems. Among various GNN models, LightGCN is distinguished by its simplicity and outstanding performance. Its efficiency has led to widespread adoption across different domains, including social, bundle, and multimedia recommendations. In this paper, we thoroughly examine the mechanisms of LightGCN, focusing on its strategies for scaling embeddings, aggregating neighbors, and pooling embeddings across layers. Our analysis reveals that, contrary to expectations based on its design, LightGCN suffers from inflexibility and inconsistency when applied to real-world data.

    We introduce LightGCN++, an enhanced version of LightGCN designed to address the identified limitations. LightGCN++ incorporates flexible scaling of embedding norms and neighbor weighting, along with a tailored approach for pooling layer-wise embeddings to resolve the identified inconsistencies. Despite the simplicity of the remedy, extensive experimental results demonstrate that LightGCN++ significantly outperforms LightGCN, achieving an improvement of up to 17.81% in terms of NDCG@20. Furthermore, state-of-the-art models utilizing LightGCN as a backbone for item, bundle, multimedia, and knowledge-graph-based recommendations exhibit improved performance when equipped with LightGCN++.

    Full text in ACM Digital Library
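
    For reference, LightGCN's propagation and layer pooling can be sketched as below. The non-uniform `layer_weights` argument hints at the kind of flexibility LightGCN++ argues for; the actual parameterization in the paper may differ:

```python
import numpy as np

def lightgcn_embeddings(A, E0, n_layers=3, layer_weights=None):
    """Propagate embeddings over the symmetrically normalized adjacency
    and pool the layer-wise embeddings.

    A: (user+item) x (user+item) adjacency matrix; E0: initial embeddings.
    Vanilla LightGCN pools with uniform weights 1/(n_layers+1).
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    layers = [E0]
    for _ in range(n_layers):
        layers.append(A_norm @ layers[-1])  # parameter-free propagation
    if layer_weights is None:
        layer_weights = np.full(n_layers + 1, 1.0 / (n_layers + 1))
    return sum(w * E for w, E in zip(layer_weights, layers))
```

    Note that the model has no feature transformations or nonlinearities; all of its behavior comes from the normalization, the neighbor aggregation, and the pooling weights, which is why those three knobs are the focus of the paper's analysis.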

  • RES Scaling Law of Large Sequential Recommendation Models
    by Gaowei Zhang (Renmin University of China), Yupeng Hou (University of California San Diego), Hongyu Lu (Tencent), Yu Chen (Tencent), Wayne Xin Zhao (Renmin University of China) and Ji-Rong Wen (Renmin University of China)

    Scaling of neural networks has recently shown great potential to improve model capacity in various fields. Specifically, model performance has a power-law relationship with model size or data size, which provides important guidance for the development of large-scale models. However, there is still limited understanding of the scaling effect of user behavior models in recommender systems, where unique data characteristics (e.g., data scarcity and sparsity) pose new challenges in recommendation tasks.

    In this work, we focus on investigating the scaling laws of large sequential recommendation models. Specifically, we consider a pure ID-based task formulation, where the interaction history of a user is formatted as a chronological sequence of item IDs. We do not incorporate any side information (e.g., item text), in order to examine the scaling law’s applicability purely from the perspective of user behavior. We successfully scale up the model size to 0.8B parameters, making it feasible to explore the scaling effect across a diverse range of model sizes. As the major findings, we empirically show that the scaling law still holds for these trained models, even in data-constrained scenarios. We then fit the scaling-law curve and successfully predict the test loss of the two largest tested model scales.

    Furthermore, we examine the performance advantage of scaling effect on five challenging recommendation tasks, considering the unique issues (e.g., cold start, robustness, long-term preference) in recommender systems. We find that scaling up the model size can greatly boost the performance on these challenging tasks, which again verifies the benefits of large recommendation models.

    Full text in ACM Digital Library
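
    Fitting the power-law relationship mentioned above reduces to linear regression in log-log space. A generic sketch of that step (not the paper's exact fitting procedure, whose functional form may include additional terms):

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit L(N) = a * N**(-b) by least squares on log L = log a - b log N."""
    logN, logL = np.log(sizes), np.log(losses)
    slope, intercept = np.polyfit(logN, logL, 1)
    return np.exp(intercept), -slope  # (a, b)

def predict_loss(a, b, n):
    """Extrapolate test loss to an unseen model size n."""
    return a * n ** (-b)
```

    Extrapolating the fitted line to larger N is exactly how one can "predict the test loss of the two largest tested model scales" before training them.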

  • RES Scene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction
    by Wenhao Li (Huazhong University of Science and Technology; Meituan), Jie Zhou (Beihang University), Chuan Luo (Beihang University), Chao Tang (Meituan), Kun Zhang (Meituan) and Shixiong Zhao (The University of Hong Kong)

    In the realm of modern mobile E-commerce, providing users with nearby commercial service recommendations through location-based online services has become increasingly vital. While machine learning approaches have shown promise in multi-scene recommendation, existing methodologies often struggle to address cold-start problems in unprecedented scenes: the increasing diversity of commercial choices, along with the short online lifespan of scenes, makes effective recommendation in online and dynamic scenes complex. In this work, we propose the Scene-wise Adaptive Network (SwAN), a novel approach that emphasizes high-performance cold-start online recommendations for new scenes. Our approach introduces several crucial capabilities, including scene similarity learning, user-specific scene transition cognition, scene-specific information construction for the new scene, and enhancement of the diverged logical information between scenes. We demonstrate SwAN’s potential to optimize dynamic multi-scene recommendation problems by effectively handling cold-start recommendations online for any newly arrived scene. More encouragingly, SwAN has been successfully deployed in Meituan’s online catering recommendation service, which serves millions of customers per day, achieving a 5.64% CTR index improvement relative to the baselines and a 5.19% increase in the daily order volume proportion.

    Full text in ACM Digital Library

  • RES Self-Attentive Sequential Recommendations with Hyperbolic Representations
    by Evgeny Frolov (AIRI), Tatyana Matveeva (HSE University), Leyla Mirvakhabova (Skolkovo Institute of Science and Technology) and Ivan Oseledets (AIRI)

    In recent years, self-attentive sequential learning models have surpassed conventional collaborative filtering techniques in next-item recommendation tasks. However, the Euclidean geometry utilized in these models may not be optimal for capturing the complex structure of behavioral data. Building on recent advances in the application of hyperbolic geometry to collaborative filtering tasks, we propose a novel approach that leverages hyperbolic geometry in the sequential learning setting. Our approach replaces the final output of the Euclidean models with a linear predictor in the non-linear hyperbolic space, which increases the representational capacity and improves recommendation quality.

    Full text in ACM Digital Library
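
    One common way to realize a hyperbolic predictor is to score items by geodesic distance in the Poincaré ball; a minimal sketch under that assumption (the paper's exact hyperbolic model and prediction head may differ):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points inside the unit Poincare ball."""
    sq = np.sum((u - v) ** 2, axis=-1)
    nu = 1.0 - np.sum(u ** 2, axis=-1)  # 1 - ||u||^2
    nv = 1.0 - np.sum(v ** 2, axis=-1)  # 1 - ||v||^2
    return np.arccosh(1.0 + 2.0 * sq / (nu * nv + eps))

def rank_items(user_emb, item_embs):
    """Return item indices sorted by increasing hyperbolic distance."""
    d = poincare_distance(user_emb[None, :], item_embs)
    return np.argsort(d)
```

    Distances in this space grow rapidly near the boundary, which lets hierarchical (tree-like) interaction structure embed with low distortion, the usual motivation for hyperbolic recommenders.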

  • RES Societal Sorting as a Systemic Risk of Recommenders
    by Luke Thorburn (King’s College London), Maria Polukarov (King’s College London) and Carmine Ventre (King’s College London)

    Political scientists distinguish between polarization (loosely, people moving further apart along a single dimension) and sorting (an increase in the probabilistic dependence between multiple dimensions of individual difference). Among other harms, sorting can increase the risk of conflict escalation by reinforcing us-and-them group identities and reducing the prevalence of cross-cutting affiliations. In this paper, we (i) review normative arguments for high or low sortedness, (ii) summarize the mechanisms by which sortedness can change, and (iii) show that under a simple model of social media recommender-driven preference change, personalized engagement-based ranking creates a systematic tendency towards sorting, while ranking by diverse engagement (sometimes called “bridging-based ranking”) mitigates this tendency. We conclude by considering the implications for those conducting systemic risk assessments of very large online platforms under the EU Digital Services Act.

    Full text in ACM Digital Library

  • DEMO Stalactite: toolbox for fast prototyping of vertical federated learning systems
    by Anastasiia Zakharova (ITMO University), Dmitriy Alexandrov (ITMO University), Maria Khodorchenko (ITMO University), Nikolay Butakov (ITMO University), Alexey Vasilev (Sber AI Lab), Maxim Savchenko (Sber AI Lab) and Alexander Grigorievskiy (Independent Researcher)

    Machine learning (ML) models trained on datasets owned by different organizations and physically located in remote databases offer benefits in many real-world use cases. State regulations or business requirements often prevent data transfer to a central location, making it difficult to utilize standard machine learning algorithms. Federated Learning (FL) is a technique that enables models to learn from distributed datasets without revealing the original data. Vertical Federated Learning (VFL) is a type of FL where data samples are divided by features across several data owners. For instance, in a recommendation task, a user can interact with various sets of items, and the logs of these interactions are stored by different organizations. In this demo paper, we present Stalactite – an open-source framework for VFL that provides the necessary functionality for building prototypes of VFL systems. It has several advantages over existing frameworks. In particular, it allows researchers to focus on the algorithmic side rather than engineering, and to easily deploy learning in a distributed environment. It implements several VFL algorithms and has a built-in homomorphic encryption layer. We demonstrate its use on a real-world recommendation dataset.

    Full text in ACM Digital Library

  • LBR TLRec: A Transfer Learning Framework to Enhance Large Language Models for Sequential Recommendation Tasks
    by Jiaye Lin (Tsinghua University), Shuang Peng (Zhejiang Lab), Zhong Zhang (Tencent AI Lab) and Peilin Zhao (Tencent AI Lab)

    Recently, Large Language Models (LLMs) have garnered significant attention in recommendation systems, improving recommendation performance through in-context learning or parameter-efficient fine-tuning. However, cross-domain generalization, i.e., model training in one scenario (source domain) but inference in another (target domain), is underexplored. In this paper, we present TLRec, a transfer learning framework aimed at enhancing LLMs for sequential recommendation tasks. TLRec specifically focuses on text inputs to mitigate the challenge of limited transferability across diverse domains, offering promising advantages over traditional recommendation models that heavily depend on unique identities (IDs) like user IDs and item IDs. Moreover, we leverage the source domain data to further enhance LLMs’ performance in the target domain. Initially, we employ powerful closed-source LLMs (e.g., GPT-4) and chain-of-thought techniques to construct instruction tuning data from the third-party scenario (source domain). Subsequently, we apply curriculum learning to fine-tune LLMs for effective knowledge injection and perform recommendations in the target domain. Experimental results demonstrate that TLRec achieves superior performance under the zero-shot and few-shot settings.

    Full text in ACM Digital Library

  • IND Toward 100TB Recommendation Models with Embedding Offloading
    by Intaik Park (Meta), Ehsan Ardestani (Meta), Damian Reeves (Meta), Sarunya Pumma (Meta), Henry Tsang (Meta), Levy Zhao (Meta), Jian He (Meta), Joshua Deng (Meta), Dennis Van der Staay (Meta), Yu Guo (Meta) and Paul Zhang (Meta)

    Training recommendation models becomes memory-bound with large embedding tables, and fast GPU memory is scarce. In this paper, we explore embedding caches and prefetch pipelines to effectively leverage large but slow host memory for embedding tables. We introduce Locality-Aware Sharding and iterative planning, which automatically size caches optimally and produce effective sharding plans. Embedding Offloading, a system that combines all of these components and techniques, is implemented on top of Meta’s open-source libraries, FBGEMM GPU and TorchRec, and it is used to improve the scalability and efficiency of industry-scale production models. Embedding Offloading achieved a 37x increase in model scale, reaching a 100TB model size, with only a 26% training-speed regression.

    Full text in ACM Digital Library
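
    The cache-plus-prefetch idea can be illustrated with a toy LRU cache over a slow host-memory table. This is a conceptual sketch only, not Meta's FBGEMM GPU or TorchRec implementation, and all class and method names are assumptions:

```python
from collections import OrderedDict

class EmbeddingCache:
    """Toy LRU cache: hot embedding rows in a fast tier, full table in a
    slow host-memory tier."""

    def __init__(self, host_table, capacity):
        self.host = host_table      # id -> embedding row (slow tier)
        self.capacity = capacity
        self.cache = OrderedDict()  # hot tier, ordered by recency
        self.hits = self.misses = 0

    def prefetch(self, ids):
        """Stage the next batch's rows before the training step needs them."""
        for i in ids:
            self.lookup(i)

    def lookup(self, i):
        if i in self.cache:
            self.hits += 1
            self.cache.move_to_end(i)       # mark as most recently used
        else:
            self.misses += 1
            self.cache[i] = self.host[i]    # fetch from slow tier
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[i]
```

    Because embedding-table accesses are highly skewed toward popular ids, even a small hot tier absorbs most lookups, which is what makes offloading the cold rows to host memory viable.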

  • LBR Understanding Fairness in Recommender Systems: A Healthcare Perspective
    by Veronica Kecki (University of Gothenburg) and Alan Said (University of Gothenburg)

    Fairness in AI-driven decision-making systems has become a critical concern, especially when these systems directly affect human lives. This paper explores the public’s comprehension of fairness in healthcare recommendations. We conducted a survey where participants selected from four fairness metrics – Demographic Parity, Equal Accuracy, Equalized Odds, and Positive Predictive Value – across different healthcare scenarios to assess their understanding of these concepts. Our findings reveal that fairness is a complex and often misunderstood concept, with a generally low level of public understanding regarding fairness metrics in recommender systems. This study highlights the need for enhanced information and education on algorithmic fairness to support informed decision-making in using these systems. Furthermore, the results suggest that a one-size-fits-all approach to fairness may be insufficient, pointing to the importance of context-sensitive designs in developing equitable AI systems.

    Full text in ACM Digital Library
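
    Of the four surveyed metrics, demographic parity is the simplest to state: the positive-prediction rate should match across groups. A hypothetical two-group helper (the function name and interface are illustrative, not from the paper):

```python
def demographic_parity_gap(y_pred, groups):
    """Absolute difference in positive-prediction rates between two groups.

    y_pred: 0/1 predictions; groups: group label per prediction.
    Assumes exactly two distinct group labels.
    """
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    a, b = rates.values()
    return abs(a - b)
```

    A gap of zero means both groups receive positive recommendations at the same rate; the other surveyed metrics (Equal Accuracy, Equalized Odds, Positive Predictive Value) condition on the true outcome instead, which is part of why lay participants find them hard to distinguish.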

  • LBR User knowledge prompt for sequential recommendation
    by Yuuki Tachioka (Denso IT Laboratory)

    Large language model (LLM) based recommendation systems are effective for sequential recommendation, because general knowledge of popular items is included in the LLM. To add domain knowledge of items, the conventional method uses a knowledge prompt obtained from item knowledge graphs and has achieved SOTA performance. However, personalized recommendation requires user knowledge, which the conventional method does not fully consider because user knowledge is not included in the item knowledge graphs. We therefore propose a user knowledge prompt, which converts a user knowledge graph into a prompt using a relationship template. The existing prompt-denoising framework is extended to prevent hallucination caused by undesirable interactions between knowledge-graph prompts. We propose user knowledge prompts for user traits and user preferences, and associate relevant items with them. Experiments on three types of datasets (movie, music, and book) show significant and consistent improvements from our proposed user knowledge prompt.

    Full text in ACM Digital Library
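
    Rendering knowledge-graph triples into prompt text via per-relation templates, as the abstract describes, might look like the following sketch; the function name, template strings, and fallback format are illustrative assumptions:

```python
def knowledge_prompt(triples, templates):
    """Render user knowledge-graph triples into natural-language prompt lines.

    triples: iterable of (head, relation, tail);
    templates: relation -> format string with {h}, {r}, {t} placeholders.
    """
    lines = []
    for head, relation, tail in triples:
        template = templates.get(relation, "{h} {r} {t}.")  # generic fallback
        lines.append(template.format(h=head, r=relation, t=tail))
    return "\n".join(lines)
```

    The resulting lines would then be prepended to the recommendation prompt, with the denoising step filtering lines that conflict with the item-knowledge prompts.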
