Session 7: Interactive Recommendation 1

Date: Thursday September 21, 11:15 AM – 12:35 PM (GMT+8)
Room: Hall 406CX
Session Chair: Fedelucio Narducci
Parallel with: Session 8: Knowledge and Context

  • Goal-Oriented Multi-Modal Interactive Recommendation with Verbal and Non-Verbal Relevance Feedback
    by Yaxiong Wu (University of Glasgow), Craig Macdonald (University of Glasgow) and Iadh Ounis (University of Glasgow).

    Interactive recommendation enables users to provide verbal and non-verbal relevance feedback (such as natural-language critiques and likes/dislikes) when viewing a ranked list of recommendations (such as images of fashion products) to guide the recommender system towards their desired items (i.e. goals) across multiple interaction turns. The multi-modal interactive recommendation (MMIR) task has been successfully formulated with deep reinforcement learning (DRL) algorithms by simulating the interactions between an environment (i.e. a user) and an agent (i.e. a recommender system). However, it is typically challenging and unstable to optimise the agent to improve the recommendation quality associated with implicit learning of multi-modal representations in an end-to-end fashion in DRL. This is known as the coupling of policy optimisation and representation learning. To address this coupling issue, we propose a novel goal-oriented multi-modal interactive recommendation model (GOMMIR) that uses both verbal and non-verbal relevance feedback to effectively incorporate the users’ preferences over time. Specifically, our GOMMIR model employs a multi-task learning approach to explicitly learn the multi-modal representations using a multi-modal composition network when optimising the recommendation agent. Moreover, we formulate the MMIR task using goal-oriented reinforcement learning and enhance the optimisation objective by leveraging non-verbal relevance feedback for hard negative sampling and providing extra goal-oriented rewards to effectively optimise the recommendation agent. Following previous work, we train and evaluate our GOMMIR model by using user simulators that can generate natural-language feedback about the recommendations as a surrogate for real human users. Experiments conducted on four well-known fashion datasets demonstrate that our proposed GOMMIR model yields significant improvements in comparison to the existing state-of-the-art baseline models.

    Full text in ACM Digital Library
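One way to picture the hard-negative idea in the abstract above: items the user explicitly disliked (non-verbal feedback) can be scored against the current query embedding alongside random negatives in a hinge-style ranking loss. The sketch below is a hypothetical simplification (the function name, margin value, and dot-product scorer are illustrative assumptions), not the GOMMIR objective itself.

```python
import numpy as np

def ranking_loss(query, goal, disliked, random_negs, margin=0.2):
    """Hinge ranking loss in which explicitly disliked items serve as
    hard negatives alongside random negatives (illustrative sketch)."""
    def score(item):
        return float(query @ item)          # simple dot-product relevance
    pos = score(goal)
    negs = list(disliked) + list(random_negs)
    # each negative should trail the goal item by at least `margin`
    losses = [max(0.0, margin - pos + score(n)) for n in negs]
    return sum(losses) / len(losses)
```

A query aligned with the goal and anti-aligned with disliked items yields zero loss; a query that scores a disliked item above the goal is penalised.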

  • Alleviating the Long-Tail Problem in Conversational Recommender Systems
    by Zhipeng Zhao (Singapore Management University), Kun Zhou (School of Information, Renmin University of China), Xiaolei Wang (Gaoling School of Artificial Intelligence, Renmin University of China), Wayne Xin Zhao (Gaoling School of Artificial Intelligence, Renmin University of China), Fan Pan (Poisson Lab, Huawei), Zhao Cao (Poisson Lab, Huawei) and Ji-Rong Wen (Gaoling School of Artificial Intelligence, Renmin University of China).

    Conversational recommender systems (CRS) aim to provide recommendation services via natural language conversations. To develop an effective CRS, high-quality CRS datasets are crucial. However, existing CRS datasets suffer from the long-tail issue, i.e., a large proportion of items are rarely (or even never) mentioned in the conversations; these are called long-tail items. As a result, CRSs trained on these datasets tend to recommend frequent items, the diversity of the recommended items is largely reduced, and users are more likely to get bored.

    To address this issue, this paper presents LOT-CRS, a novel framework that focuses on simulating and utilizing a balanced CRS dataset (i.e., covering all the items evenly) for improving the LOng-Tail recommendation performance of CRSs. In our approach, we design two pre-training tasks to enhance the understanding of simulated conversations for long-tail items, and adopt retrieval-augmented fine-tuning with a label-smoothness strategy to further improve the recommendation of long-tail items. Extensive experiments on two public CRS datasets demonstrate the effectiveness and extensibility of our approach, especially on long-tail recommendation. All the experimental code will be released after the review period.

    Full text in ACM Digital Library
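One plausible reading of the label-smoothness strategy mentioned above: instead of a one-hot target on the ground-truth item, part of the probability mass is spread over related (e.g. retrieved) items, so long-tail items still receive gradient signal. The helper below is a generic label-smoothing sketch under that assumption; it is not the LOT-CRS implementation, and `eps` and the `related` pool are illustrative.

```python
import numpy as np

def smoothed_targets(true_item, n_items, eps=0.1, related=None):
    """One-hot target softened so that `eps` probability mass is spread
    over a pool of related items instead of a single head item."""
    t = np.zeros(n_items)
    t[true_item] = 1.0 - eps
    pool = related if related is not None else range(n_items)
    pool = [i for i in pool if i != true_item]
    t[pool] += eps / len(pool)
    return t

def cross_entropy(logits, targets):
    """Numerically stable cross-entropy against a soft target vector."""
    m = logits.max()
    logp = logits - (m + np.log(np.exp(logits - m).sum()))  # log-softmax
    return float(-(targets * logp).sum())
```

Training against such soft targets penalises a model for assigning near-zero probability to the smoothed (long-tail) items.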

  • Data-free Knowledge Distillation for Reusing Recommendation Models
    by Cheng Wang (Huazhong University of Science and Technology), Jiacheng Sun (Huawei Noah’s Ark Lab), Zhenhua Dong (Huawei Noah’s Ark Lab), Jieming Zhu (Huawei Noah’s Ark Lab), Zhenguo Li (Huawei Noah’s Ark Lab), Ruixuan Li (Huazhong University of Science and Technology) and Rui Zhang.

    A common practice to keep an offline Recommender System (RS) fresh is to train models that fit the user’s most recent behaviours while directly replacing the outdated historical model. However, substantial feature engineering and computing resources go into training these historical models, yet they are underutilized in downstream RS model training. In this paper, to turn these historical models into treasures, we introduce a model-inversion data synthesis framework, which can recover training data information from the historical model and use it for knowledge transfer. This framework synthesizes a new form of data from the historical model. Specifically, we ‘invert’ an off-the-shelf pretrained model to synthesize binary-class user-item pairs starting from random noise, without requiring any additional information from the training dataset. To synthesize new data from a pretrained model, we update the input from a random float initialization rather than one- or multi-hot vectors. An additional statistical regularization further improves the quality of the synthetic data inverted from deep models with batch normalization. The experimental results show that our framework generalizes across different types of models. We can efficiently train different types of classical Click-Through-Rate (CTR) prediction models from scratch with two orders of magnitude less inverted synthetic data. Moreover, our framework also works well in knowledge transfer scenarios such as continual updating and data-free knowledge distillation.

    Full text in ACM Digital Library
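The core inversion loop described above can be illustrated at toy scale: freeze a trained model, start from a random float input, and run gradient descent on the input so the frozen model assigns it a chosen label, with an L2 regularizer standing in for the statistical constraint. The numpy sketch below uses a logistic-regression "teacher"; all names and hyperparameters are illustrative assumptions, not the paper's framework.

```python
import numpy as np

def invert_logistic(w, b, target=1.0, steps=200, lr=0.5, reg=0.01, seed=0):
    """Synthesize an input that a frozen logistic teacher (w, b) classifies
    as `target`, starting from random float noise (toy model inversion)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=w.shape)                    # random float init
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x + b)))      # frozen teacher output
        # gradient of BCE(target, p) w.r.t. x, plus an L2 term that plays
        # the role of a statistical regularizer on the synthetic input
        grad = (p - target) * w + reg * x
        x -= lr * grad
    return x
```

After a few hundred steps the teacher assigns the synthetic point high confidence for the requested class, which is the property the synthesized pairs need before being reused for knowledge transfer.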

  • Online Matching: A Real-time Bandit System for Large-scale Recommendations
    by Xinyang Yi (Google), Shao-Chuan Wang (Google), Ruining He (Google), Hariharan Chandrasekaran (Google), Charles Wu (Google), Lukasz Heldt (Google), Lichan Hong (Google), Minmin Chen (Google) and Ed Chi (Google).

    The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While effective in capturing users’ past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution shift and to explore new items or user interests. Although online learning-based approaches (e.g., multi-armed bandits) have demonstrated promising theoretical results in tackling these challenges, their practical real-time implementation in large-scale recommender systems remains limited. First, serving massive online traffic while ensuring timely updates of bandit parameters poses a significant scalability challenge for online approaches. Additionally, exploring uncertainty in recommender systems can easily result in an unfavorable user experience, highlighting the need for strategies that effectively balance the trade-off between exploitation and exploration. In this paper, we introduce Online Matching: a scalable closed-loop bandit system that learns from users’ direct feedback on items in real time. We present a hybrid offline + online approach for constructing this system, accompanied by a comprehensive exposition of the end-to-end system architecture. We propose Diag-LinUCB, a novel extension of the LinUCB algorithm, to enable distributed updates of bandit parameters in a scalable and timely manner. We conduct live experiments on YouTube and show that Online Matching enhances fresh content discovery and item exploration on the platform.

    Full text in ACM Digital Library
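The name Diag-LinUCB suggests keeping only the diagonal of each arm's LinUCB design matrix, which turns the per-feedback update into O(d) vector additions that are cheap to shard and merge across servers. The class below sketches that diagonal approximation as one plausible reading; the actual algorithm and its distributed update protocol are described in the paper, and the class/parameter names here are illustrative.

```python
import numpy as np

class DiagLinUCB:
    """LinUCB with a diagonal covariance approximation: per-arm state is
    two d-vectors instead of a d x d matrix (illustrative sketch)."""

    def __init__(self, n_arms, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha
        # diagonal of the ridge-regularised design matrix, one row per arm
        self.A = np.full((n_arms, dim), ridge)
        self.b = np.zeros((n_arms, dim))

    def ucb(self, x):
        """Upper confidence bound for every arm given context x."""
        theta = self.b / self.A                 # closed-form ridge solution (diagonal case)
        mean = theta @ x
        width = self.alpha * np.sqrt((x * x / self.A).sum(axis=1))
        return mean + width

    def select(self, x):
        return int(np.argmax(self.ucb(x)))

    def update(self, arm, x, reward):
        self.A[arm] += x * x                    # diagonal of the rank-1 update
        self.b[arm] += reward * x
```

Because both `A` and `b` updates are plain additions, replicas can accumulate them locally and merge by summation, which is what makes timely distributed parameter updates tractable at serving scale.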
