• Tutorial on Large Language Models for Recommendation
    by Wenyue Hua (Rutgers University), Lei Li (Hong Kong Baptist University), Shuyuan Xu (Rutgers University), Li Chen (Hong Kong Baptist University), Yongfeng Zhang (Rutgers University)


    Foundation Models such as Large Language Models (LLMs) have significantly advanced many research areas. In particular, LLMs offer notable advantages for recommender systems, making them valuable tools for personalized recommendation. For example, by formulating various recommendation tasks such as rating prediction, sequential recommendation, straightforward recommendation, and explanation generation as language instructions, LLMs make it possible to build universal recommendation engines that can handle different recommendation tasks. Additionally, LLMs have a remarkable capacity for understanding natural language, enabling them to comprehend user preferences, item descriptions, and contextual information and thereby generate more accurate and relevant recommendations, leading to improved user satisfaction and engagement. This tutorial introduces Foundation Models such as LLMs for recommendation. We will introduce how recommender systems advanced from shallow models to deep models and then to large models, how LLMs enable generative recommendation in contrast to traditional discriminative recommendation, and how to build LLM-based recommender systems. We will cover multiple perspectives of LLM-based recommendation, including data preparation, model design, model pre-training, fine-tuning and prompting, multi-modality and multi-task learning, as well as trustworthiness aspects of LLM-based recommender systems such as fairness and transparency.
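    To make the task-as-instruction idea concrete, the sketch below shows how rating prediction and sequential recommendation might be phrased as language instructions for an LLM. The template wording, task names, and IDs are hypothetical illustrations, not the prompts used by any particular system:

```python
# Hypothetical instruction templates illustrating how different
# recommendation tasks can be cast as natural-language prompts.
TEMPLATES = {
    "rating_prediction": (
        "User {user_id} rated these items: {history}. "
        "Predict the rating (1-5) the user would give to item {target}."
    ),
    "sequential_recommendation": (
        "User {user_id} interacted with {history} in order. "
        "Which item is the user most likely to interact with next?"
    ),
}

def build_prompt(task, **fields):
    """Fill a task template with user/item fields to form an LLM prompt."""
    return TEMPLATES[task].format(**fields)

prompt = build_prompt(
    "sequential_recommendation",
    user_id="u42",
    history="item_1, item_7, item_3",
)
print(prompt)
```

    Because every task shares the same text-in, text-out interface, a single instruction-tuned model can in principle serve all of them, which is what makes a universal recommendation engine possible.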


    The tutorial will introduce LLM-based recommendation from five main perspectives — dataset, model, evaluation, toolkit, and real-world systems. In particular:
    – Datasets: We introduce datasets that facilitate LLM-based recommendation models. This is particularly important for data-centric machine learning such as LLM-based recommender systems, since the pre-training of LLMs largely determines the ability and utility of LLM-based recommendation.
    – Models: In this part of the tutorial, we organize and introduce recent LLM-based recommendation models, their relationships, various pre-training, fine-tuning and prompting strategies of LLM-based recommendation models, and possible directions for future improvements.
    – Evaluation: We introduce evaluation methods for LLM-based recommendation models. Because of the multi-task, multi-modal, and cross-data nature of LLM-based recommendation models, evaluation focuses not only on recommendation accuracy but also on many other aspects such as text quality, fluency, and efficiency.
    – Toolkit: We introduce existing open-source models and platforms to facilitate LLM-based recommendation research, including both LLM backbones such as T5 and LLaMA, and LLM-based recommendation platforms such as OpenP5.
    – Real-world systems: Finally, we introduce existing industrial LLM systems that support recommendation functionality, their advantages, and the problems that remain to be improved. Examples include ChatGPT, Microsoft Bing and Google Bard.

  • On Challenges of Evaluating Recommender Systems in Offline Setting
    by Aixin Sun (Nanyang Technological University, Singapore)


    In the past 20 years, the area of Recommender Systems (RecSys) has gained significant attention from both academia and industry. There is no shortage of research papers on various RecSys models, or of online systems from industry players. However, in terms of model evaluation in offline settings, many researchers simply follow the commonly adopted experimental setup and have not zoomed into the unique characteristics of the RecSys problem. In this tutorial, I will briefly review the commonly adopted evaluations in RecSys, then discuss the challenges of evaluating recommender systems in an offline setting. The main emphasis is the consideration of the global timeline in evaluation, particularly when a dataset covers user-item interactions collected over a long time period.


    The tutorial concludes with a fresh look at RecSys evaluation and how to conduct more meaningful evaluations by considering the global timeline. Here are the topics in an itemized view:

    Part I

    Introduction (10 min)
    – Recommender system basics
    – Applications powered by RecSys

    Commonly used RecSys evaluation metrics (20 min)
    – Commonly used metrics in academic research
    – Metrics used for different applications in online settings, e.g., e-commerce, advertisement, video, music, and news recommendation.
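    As one concrete example of the accuracy metrics typically reported in offline academic evaluation, here is a minimal NDCG@k sketch with binary relevance. The item identifiers are hypothetical; this is an illustration of the standard metric, not code from the tutorial:

```python
import math

def ndcg_at_k(ranked_items, relevant, k):
    """NDCG@k for one user with binary relevance: the discounted
    cumulative gain of the ranked list, normalized by the DCG of an
    ideal ranking that places all relevant items first."""
    dcg = sum(
        1.0 / math.log2(i + 2)  # positions are 1-based, so log2(rank + 1)
        for i, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

# Hits at ranks 1 and 3 out of two relevant items.
score = ndcg_at_k(["a", "b", "c", "d"], relevant={"a", "c"}, k=3)
```

    In practice such per-user scores are averaged over all test users, and the result depends heavily on how the test interactions were selected, which is exactly where the partitioning issues discussed next come in.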

    Part II

    Challenges in computing the offline metrics (40 min)
    – How RecSys works in practice, with popularity-based recommendation as an example
    – Data partition schemes in RecSys experiments using offline datasets
    – Data leakage due to not maintaining global timeline
    – The impact on understanding the RecSys research problem
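    The leakage issue above can be sketched in a few lines: a per-user random split may place an interaction in the training set that happened after other users' test interactions, whereas a single global cutoff keeps all training data strictly earlier than all test data. The toy log and helper below are hypothetical illustrations:

```python
from datetime import datetime

# Toy interaction log: (user, item, timestamp). A random per-user split
# of this log could put u2's April interaction into training while u1's
# March interaction is in the test set -- training on the future.
interactions = [
    ("u1", "i1", datetime(2020, 1, 5)),
    ("u1", "i2", datetime(2020, 3, 1)),
    ("u2", "i3", datetime(2020, 2, 10)),
    ("u2", "i1", datetime(2020, 4, 20)),
]

def global_timeline_split(logs, cutoff):
    """Split by one global cutoff time: everything before the cutoff is
    training data, everything at or after it is test data, so no test
    interaction precedes any training interaction."""
    train = [r for r in logs if r[2] < cutoff]
    test = [r for r in logs if r[2] >= cutoff]
    return train, test

train, test = global_timeline_split(interactions, datetime(2020, 3, 1))
```

    The price of the global cutoff is that users active only before (or only after) the cutoff contribute to just one side of the split, which is part of what makes offline evaluation with a global timeline harder to set up than the common random or leave-one-out schemes.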

    Part III

    Criticism of RecSys from an evaluation perspective (10 min)
    – The counter-intuitive observations
    – The common pitfalls in evaluating RecSys

    More practical evaluations (10 min)
    – The meaning of fair comparison
    – The observation of global timeline



  • User Behavior Modeling with Deep Learning for Recommendation: Recent Advances
    by Weiwen Liu (Huawei Noah’s Ark Lab, China), Wei Guo (Huawei Noah’s Ark Lab, Singapore), Yong Liu (Huawei Noah’s Ark Lab, Singapore), Ruiming Tang (Huawei Noah’s Ark Lab, China), Hao Wang (University of Science and Technology of China, China)


    User Behavior Modeling (UBM) plays a critical role in user interest learning and has been extensively used in recommender systems. The exploration of key interactive patterns between users and items has yielded significant improvements and great commercial success across a variety of recommendation tasks. This tutorial aims to offer an in-depth exploration of this evolving research topic. We start by reviewing the research background of UBM, paving the way to a clearer understanding of its opportunities and challenges. Then, we present a systematic categorization of existing UBM research along four directions: Conventional UBM, Long-Sequence UBM, Multi-Type UBM, and UBM with Side Information. To provide an expansive understanding, we delve into each category, discussing representative models and highlighting their respective strengths and weaknesses. Furthermore, we elucidate the industrial applications of UBM methods, aiming to provide insights into the practical value of existing UBM solutions. Finally, we identify some open challenges and future prospects in UBM. This comprehensive tutorial serves to provide a solid foundation for anyone looking to understand and implement UBM in their research or business.


    This tutorial focuses on user behavior modeling in recommender systems and will run for 90 minutes. The outline is as follows:

    Introduction (10min)
    – Recommender system basics
    – Problem formulation of user behavior modeling
    – Taxonomy: Conventional UBM, Long-Sequence UBM, Multi-Type UBM, and UBM with Side Information

    Conventional UBM (5min)
    – Network structures: RNN, CNN, Attention

    Long-Sequence UBM (15min)
    – Memory-augmented methods
    – User behavior retrieval methods

    Multi-Type UBM (15min)
    – Behavior type definition
    – Multi-behavior fusion and prediction

    UBM with Side Information (15min)
    – Source of the side information
    – Side information utilization

    UBM with Deep Reinforcement Learning (10min)

    Industrial practices and performances of online deployment (10min)

    Summary and future prospects (10min)

  • Trustworthy Recommender Systems: Technical, Ethical, Legal, and Regulatory Perspectives
    by Markus Schedl (Johannes Kepler University Linz and Linz Institute of Technology, Austria), Vito Walter Anelli (Politecnico di Bari, Italy), Elisabeth Lex (Graz University of Technology, Austria)


    This tutorial provides an interdisciplinary overview of the topics of fairness, non-discrimination, transparency, privacy, and security in the context of recommender systems. These are important dimensions of trustworthy AI systems according to European policies, but they also extend to the global debate on regulating AI technology. Since we strongly believe that the aforementioned aspects require more than merely technical considerations, we also discuss these topics from ethical, legal, and regulatory points of view, intertwining the different perspectives. The main focus of the tutorial remains on presenting technical solutions that aim at addressing the mentioned aspects of trustworthiness. In addition, the tutorial equips the mostly technical audience of RecSys with the necessary understanding of the social and ethical implications of their research and development, and of recent ethical guidelines and regulatory frameworks.


    The tutorial is organized into five parts: an introduction including ethical guidelines for trustworthy AI and their adoption in regulatory approaches; three subsequent parts corresponding to the main themes addressed, i.e., fairness and non-discrimination; privacy and security; transparency and explainability; rounded off with a discussion of open challenges. Throughout the three main parts, we discuss three perspectives: the system-centric perspective, the human-centric perspective, and the legal perspective, covering technical aspects, human needs, and legislators’ points of view, respectively.

  • Recommenders in the Wild / Practical Evaluation Methods
    by Kim Falk (Binary Vikings), Morten Arngren (WundermanThompson)


    Building a recommender system, from the initial idea, implementation, and offline evaluation to running a system where users will receive quality recommendations, is a long process with many practical considerations. A recommender model that produces close to state-of-the-art metrics in an offline evaluation is only a small step in creating a recommender system and often not the most important. This gap between training a recommender model and having a recommender system in production is a topic often neglected and will be the focus of this tutorial.

    We will start by looking at the goals of recommender systems from the perspective of different use cases and see how those correspond with the traditional evaluation metrics. Using those metrics, beyond-accuracy metrics, and the data, we will look into how to develop the best candidates for online testing. During this process, we will also discuss good experimental practices.

    The second part of this tutorial will look at how to take those model/system candidates and test them in production using A/B testing, Bayesian A/B testing, and Bayesian bandits. We will also provide some considerations on the cost of applying one model compared to the other. For these practical steps, we will further provide simple, applicable code in notebooks as part of the tutorial.
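    As a minimal sketch of the bandit approach to online testing, the code below runs Thompson sampling over two recommender variants, keeping a Beta posterior over each variant's click-through rate and serving the variant whose sampled rate is highest. The variant names and click-through rates are made up for illustration:

```python
import random

class BetaArm:
    """One test variant with a Beta posterior over its click-through rate."""

    def __init__(self):
        self.successes = 1  # Beta(1, 1) uniform prior
        self.failures = 1

    def sample(self):
        # Draw a plausible CTR from the current posterior.
        return random.betavariate(self.successes, self.failures)

    def update(self, clicked):
        if clicked:
            self.successes += 1
        else:
            self.failures += 1

random.seed(0)
arms = {"model_a": BetaArm(), "model_b": BetaArm()}
true_ctr = {"model_a": 0.05, "model_b": 0.10}  # hidden ground truth

for _ in range(5000):
    # Thompson sampling: serve the arm with the highest sampled CTR.
    chosen = max(arms, key=lambda name: arms[name].sample())
    clicked = random.random() < true_ctr[chosen]
    arms[chosen].update(clicked)

# Impressions served per variant (subtract the two prior pseudo-counts).
pulls = {name: arm.successes + arm.failures - 2 for name, arm in arms.items()}
```

    Unlike a fixed-horizon A/B test that splits traffic evenly until the end, the bandit shifts traffic toward the better variant as evidence accumulates, which is one of the cost trade-offs between the approaches discussed in this part of the tutorial.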


    What is a recommender in the wild
    Developing the Recommender
    Business considerations when deploying a recommender
    Beyond accuracy – what metrics are interesting
    Personalisation and the Myth of Long Sessions
    A/B testing
    Bayesian A/B testing
    Bayesian bandits for A/B testing
    Handling recommenders in production


    To prepare for this tutorial, please follow the instructions listed on https://github.com/recs-in-the-wild/recsys23-tutorial

  • Customer Lifetime Value Prediction: Towards the Paradigm Shift of Recommender System Objectives
    by Chuhan Wu (Noah’s Ark Lab, Huawei), Qinglin Jia (Noah’s Ark Lab, Huawei), Zhenhua Dong (Noah’s Ark Lab, Huawei), Ruiming Tang (Noah’s Ark Lab, Huawei)


    The ultimate goal of recommender systems is satisfying users’ information needs in the long term. Despite the success of current recommendation techniques in targeting user interest, optimizing long-term user engagement and platform revenue is still challenging due to the restriction of optimization objectives to short-term signals such as clicks, ratings, and dwell time. Customer lifetime value (LTV) reflects the total monetary value of a customer to a business over the course of their relationship. Accurate LTV prediction can guide personalized service providers to optimize their marketing, sales, and service strategies to maximize customer retention, satisfaction, and profitability. However, the extreme sparsity, volatility, and randomness of consumption behaviors make LTV prediction rather intricate and challenging. In this tutorial, we give a detailed introduction to the key technologies and problems in LTV prediction. We present a systematic chronicle of LTV prediction techniques over the decades, including probabilistic models, traditional machine learning methods, and deep learning techniques. Based on this overview, we introduce several critical challenges in algorithm design, performance evaluation, and system deployment from an industrial perspective, from which we derive potential directions for future exploration. From this tutorial, the RecSys community can gain a better understanding of the unique characteristics and challenges of LTV prediction, and it may serve as a catalyst to shift the focus of recommender systems from short-term targets to long-term ones.
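    For orientation, the simplest LTV models long predate machine learning: under a constant per-period margin, a constant retention rate, and a discount rate, the discounted sum of future margins has a textbook closed form. The sketch below implements that simplification with made-up numbers; the probabilistic, traditional ML, and deep methods covered in the tutorial exist precisely because real consumption behavior violates these constant-rate assumptions:

```python
def simple_ltv(margin, retention, discount):
    """Closed-form LTV under constant retention and discount rates:
    sum over t >= 1 of margin * retention**t / (1 + discount)**t,
    a geometric series that converges to
    margin * retention / (1 + discount - retention)."""
    return margin * retention / (1 + discount - retention)

# Hypothetical customer: $10 margin per period, 80% retention, 10% discount.
ltv = simple_ltv(margin=10.0, retention=0.8, discount=0.1)
```

    The sparsity, volatility, and randomness of real consumption behaviors noted in the abstract are exactly what this constant-rate view ignores, which motivates the more flexible model families surveyed in Part 2.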


    1. Introduction (10 mins)
    (a) Definition of LTV and LTV Prediction (2 mins)
    (b) Application Scenario of LTV Prediction (3 mins)
    (c) Challenges of LTV Prediction (4 mins)
    (d) Tutorial Organization (1 min)

    2. Technique Evolution in LTV Prediction (40 mins)
    (a) Taxonomy of LTV Prediction Methods (5 mins)
    (b) Probabilistic Models (10 mins)
    (c) Traditional Machine Learning Methods (5 mins)
    (d) Deep Learning-based Methods (10 mins)
    (e) Our Industrial Practice (10 mins)

    3. Remaining Problems in LTV Prediction (20 mins)
    (a) Delayed Feedback and Cold Start (5 mins)
    (b) Model Bias and Unrobustness (5 mins)
    (c) Downstream Application of Predicted LTVs (5 mins)
    (d) Data Resources and Offline Evaluation (5 mins)

    4. Future Work, Conclusion and Discussion (20 mins)
    (a) Better Multi-task Optimization (2 mins)
    (b) Responsible Model Learning and Maintenance (2 mins)
    (c) Unifying Data and World Knowledge (3 mins)
    (d) Offline Evaluation and Simulation (3 mins)
    (e) Conclusion and Discussions (10 mins)


