Session: Sessions and Interaction

Date: Wednesday, September 21, 9:00 AM – 10:30 AM (PDT)

  • Streaming Session-Based Recommendation: When Graph Neural Networks meet the Neighborhood
    by Sarah Latifi (University of Klagenfurt, Austria), Dietmar Jannach (University of Klagenfurt, Austria)

    In a number of application areas of recommender systems, it is important to frequently update the underlying models, e.g., because of a continuous stream of new items that can be recommended or due to rapidly changing interest trends within a community. Moreover, when individual short-term user interests may also change from visit to visit, session-based recommendation techniques are required, leading to the problem of streaming session-based recommendation (SSR). Such problem settings have attracted increased interest in recent years, and different deep learning architectures have been proposed that support fast updates of the underlying prediction models when new data arrive.
    In a recent paper, a method based on Graph Neural Networks (GNN) was proposed as being superior to previous methods for the SSR problem. The baselines in the reported experiments included different machine learning models. However, several studies have shown that conceptually simpler methods, e.g., those based on nearest neighbors, can often be highly effective for session-based recommendation problems. In this work, we report a similar phenomenon for the streaming configuration. We first reproduce the results of the mentioned GNN method and then show that simpler methods are able to outperform this complex state-of-the-art neural method on two datasets. Overall, our work points to continued methodological issues in the academic community, e.g., in terms of the choice of baselines and reproducibility.

    Full text in ACM Digital Library
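
    The conceptually simpler methods mentioned in the abstract are nearest-neighbor approaches to session-based recommendation. As a rough, self-contained sketch of that idea (not the authors' code; the function name, the cosine similarity choice, and the toy data below are assumptions), a session-based k-nearest-neighbor recommender compares the current session to past sessions and scores candidate items by similarity-weighted voting; being non-parametric, such a model stays current in a streaming setting simply by adding new sessions to the pool.

      from collections import defaultdict
      import math

      def sknn_recommend(current_session, past_sessions, k=100, top_n=10):
          """Score candidate items by similarity-weighted votes from the k nearest past sessions."""
          current = set(current_session)
          # Cosine similarity between the current session and each past session, treated as item sets.
          sims = []
          for sid, items in past_sessions.items():
              overlap = len(current & items)
              if overlap:
                  sims.append((overlap / math.sqrt(len(current) * len(items)), sid))
          neighbors = sorted(sims, reverse=True)[:k]

          scores = defaultdict(float)
          for sim, sid in neighbors:
              for item in past_sessions[sid]:
                  if item not in current:  # do not re-recommend items already in the session
                      scores[item] += sim
          return sorted(scores, key=scores.get, reverse=True)[:top_n]

      # Toy usage: past_sessions maps a session id to the set of item ids seen in that session.
      past_sessions = {0: {"a", "b", "c"}, 1: {"b", "d"}, 2: {"c", "d", "e"}}
      print(sknn_recommend(["b", "c"], past_sessions, k=2, top_n=3))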

  • Self-Supervised Bot Play for Transcript-Free Conversational Recommendation with Rationales
    by Shuyang Li (UC San Diego, United States), Bodhisattwa Prasad Majumder (UC San Diego, United States), Julian McAuley (UC San Diego, United States)

    Conversational recommender systems offer a way for users to engage in multi-turn conversations to find items they enjoy. For users to trust an agent and give effective feedback, the recommender system must be able to explain its suggestions and rationales. We develop a two-part framework for training multi-turn conversational recommenders that provide recommendation rationales that users can effectively interact with to receive better recommendations. First, we train a recommender system to jointly suggest items and explain its reasoning via subjective rationales. We then fine-tune this model to incorporate iterative user feedback via self-supervised bot-play. Experiments on three real-world datasets demonstrate that our system can be applied to different recommendation models across diverse domains to achieve state-of-the-art performance in multi-turn recommendation. Human studies show that systems trained with our framework provide more useful, helpful, and knowledgeable suggestions in warm- and cold-start settings. Our framework allows us to use only product reviews during training, avoiding the need for expensive dialog transcript datasets that limit the applicability of previous conversational recommender agents.

    Full text in ACM Digital Library
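
    As a very rough, self-contained illustration of what a bot-play loop of this kind might look like (all names and toy data below are hypothetical placeholders, not the authors' models or API): a user bot holds a hidden target item, the recommender proposes an item together with a rationale (here, a review-derived aspect), and the user bot accepts or rejects the rationale depending on whether the target item's reviews support it; in the full framework the recommender would then be fine-tuned on such simulated dialogs.

      import random

      CATALOG = {  # item -> aspects mined from its reviews (toy data)
          "laptop_a": {"light", "long battery", "cheap"},
          "laptop_b": {"light", "fast", "large screen"},
          "laptop_c": {"fast", "cheap"},
      }

      def suggest(feedback, catalog):
          """Pick the item most consistent with the accepted/rejected aspects so far, plus a rationale."""
          def fit(aspects):
              return sum(1 if (aspect in aspects) == accepted else -1 for aspect, accepted in feedback)
          item = max(catalog, key=lambda i: fit(catalog[i]))
          unused = catalog[item] - {aspect for aspect, _ in feedback}
          rationale = next(iter(unused), next(iter(catalog[item])))
          return item, rationale

      def bot_play_episode(catalog, max_turns=4):
          """One simulated dialog: a user bot with a hidden target accepts or rejects rationales."""
          target = random.choice(list(catalog))
          feedback, history = [], []
          for _ in range(max_turns):
              item, rationale = suggest(feedback, catalog)
              accepted = rationale in catalog[target]  # feedback is derived from the target's reviews only
              history.append((item, rationale, accepted))
              if item == target:
                  break
              feedback.append((rationale, accepted))
          return target, history

      print(bot_play_episode(CATALOG))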

  • Off-Policy Actor Critic for Recommender Systems
    by Minmin Chen (Google, United States), Can Xu (Google Inc, United States), Vince Gatto (Google, United States), Devanshu Jain (Google, United States), Aviral Kumar (Google, United States), Ed Chi (Google, United States)

    Industrial recommendation platforms are increasingly concerned with how to make recommendations that cause users to enjoy their long-term experience on the platform. Reinforcement learning (RL) has emerged naturally as an appealing approach for its promise in 1) combating the feedback loop effect resulting from myopic system behaviors; and 2) sequential planning to optimize the long-term outcome. Scaling RL algorithms to production recommender systems serving billions of users and content items, however, has proven hard. The sample inefficiency and instability of online RL hinder its widespread adoption in production. Offline RL enables the use of off-policy data and batch learning; on the other hand, it faces major learning challenges due to distribution shift.

    A REINFORCE agent [3] was successfully tested for YouTube recommendation, showing significant improvement over a sophisticated supervised learning production system. Off-policy correction was employed to learn from logged data. To control variance in learning, the authors adopted a one-step approximation to the full trajectory correction. This, however, introduces bias into learning, producing sub-optimal policies with respect to the defined long-term outcome. Here we share the key designs in setting up an off-policy actor-critic agent for production recommender systems. It extends [3] with a critic network that estimates the value of any state-action pair under the learned target policy through temporal difference learning, addressing the aforementioned bias. We demonstrate in offline and live experiments that the new framework outperforms the baseline and improves long-term user experience.

    An interesting discovery from our investigation is that recommendation agents, which commonly employ a softmax policy parameterization, can end up being too pessimistic about out-of-distribution (OOD) actions. This phenomenon contrasts with findings in the general RL community and suggests new research directions in advancing RL for recommender systems.

    Full text in ACM Digital Library
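
    To make the two ingredients above concrete, here is a minimal, tabular illustration (not the paper's neural architecture; all names, hyperparameters, and toy data are assumptions): a critic Q(s, a) is learned by temporal-difference updates, and a softmax actor is updated with a one-step importance weight pi(a|s) / b(a|s) applied to a critic-based advantage instead of the raw REINFORCE return.

      import math
      from collections import defaultdict

      GAMMA = 0.9
      theta = defaultdict(float)  # actor: softmax logits, keyed by (state, action)
      q = defaultdict(float)      # critic: Q(state, action), learned by TD

      def pi(state, actions):
          """Softmax target policy over the candidate actions."""
          logits = [theta[(state, a)] for a in actions]
          m = max(logits)
          exps = [math.exp(l - m) for l in logits]
          z = sum(exps)
          return {a: e / z for a, e in zip(actions, exps)}

      def update(logged, actions, lr_actor=0.05, lr_critic=0.1):
          """One pass over logged tuples (state, action, behavior_prob, reward, next_state)."""
          for s, a, b_prob, r, s_next in logged:
              probs = pi(s, actions)
              # Critic: TD(0) update toward r + gamma * E_{a' ~ pi}[Q(s', a')].
              v_next = sum(pi(s_next, actions)[ap] * q[(s_next, ap)] for ap in actions)
              q[(s, a)] += lr_critic * (r + GAMMA * v_next - q[(s, a)])
              # Actor: one-step importance-weighted policy gradient, with the critic
              # supplying the advantage instead of the raw REINFORCE return.
              w = probs[a] / max(b_prob, 1e-6)  # pi(a|s) / behavior_policy(a|s)
              advantage = q[(s, a)] - sum(probs[ap] * q[(s, ap)] for ap in actions)
              for ap in actions:  # gradient of log softmax
                  grad = (1.0 if ap == a else 0.0) - probs[ap]
                  theta[(s, ap)] += lr_actor * w * advantage * grad

      # Toy logged data: (state, action, behavior policy prob, reward, next state).
      ACTIONS = ["video_x", "video_y"]
      LOG = [("s0", "video_x", 0.5, 1.0, "s1"), ("s0", "video_y", 0.5, 0.0, "s1")]
      for _ in range(50):
          update(LOG, ACTIONS)
      print(pi("s0", ACTIONS))  # should now favor video_x, the higher-reward action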
