- PA Fast Multi-Step Critiquing for VAE-based Recommender Systems
by Diego Antognini (Artificial Intelligence Laboratory, École Polytechnique Fédérale de Lausanne, Switzerland) and Boi Faltings (LIA, EPFL, Switzerland)
Recent studies have shown that providing personalized explanations alongside recommendations increases trust and perceived quality. Furthermore, it gives users an opportunity to refine the recommendations by critiquing parts of the explanations. On one hand, current recommender systems model the recommendation, explanation, and critiquing objectives jointly, but this creates an inherent trade-off between their respective performance. On the other hand, although recent latent linear critiquing approaches are built upon an existing recommender system, they suffer from computational inefficiency at inference due to the objective optimized at each conversation’s turn. We address these deficiencies with M&Ms-VAE, a novel variational autoencoder for recommendation and explanation that is based on multimodal modeling assumptions. We train the model under a weak supervision scheme to simulate both fully and partially observed variables. Then, we leverage the generalization ability of a trained M&Ms-VAE model to embed the user preference and the critique separately. Our work’s most important innovation is our critiquing module, which is built upon and trained in a self-supervised manner with a simple ranking objective. Experiments on four real-world datasets demonstrate that among state-of-the-art models, our system is the first to dominate or match the performance in terms of recommendation, explanation, and multi-step critiquing. Moreover, M&Ms-VAE processes the critiques up to 25.6x faster than the best baselines. Finally, we show that our model infers coherent joint and cross generation, even under weak supervision, thanks to our multimodal-based modeling and training scheme.
Full text in ACM Digital Library
- IN Learning a Voice-based Conversational Recommender using Offline Policy Optimization
by Francois Mairesse (Amazon, United States), Zhonghao Luo (Amazon, United States), and Tao Ye (Amazon, United States)
Voice-based conversational recommenders offer a natural way to improve recommendation quality by asking the user for missing information. This talk details how we use offline policy optimization to learn a dialog manager that determines what items to present and what clarifying questions to ask, in order to maximize the success of the conversation. Counter-factual learning allows us to compare various modeling techniques using only logged conversational data. Our approach is applied to Amazon Music’s first voice browsing experience (Alexa, help me find music), which interleaves disambiguation questions and music sample suggestions. Offline policy evaluation results show that an XGBoost reward regressor outperforms linear and neural policies on held out data. A first user-facing A/B test confirms our offline results, by increasing our task completion rate by 8% relative compared to our production rule-based conversational recommender, while reducing the number of turns to complete the task by 20%. A second A/B test shows that extending the set of candidate items to present and adding an embedding-based user-item affinity action feature improves task success rate further by 4% relative, while reducing the number of turns further by 13%. These results suggest that offline policy optimization from conversation logs is a viable way to foster conversational recommender research, while minimizing the number of user-facing experiments needed to determine the optimal dialog policy.
Full text in ACM Digital Library
- PA Large-scale Interactive Conversational Recommendation System using Actor-Critic Framework
by Ali Montazeralghaem (University of Massachusetts Amherst, United States), James Allan (University of Massachusetts Amherst, United States), and Philip S. Thomas (University of Massachusetts Amherst, United States)
We propose AC-CRS, a novel conversational recommendation system based on reinforcement learning that better models user interaction compared to prior work. Interactive recommender systems expect an initial request from a user and then iterate by asking questions or recommending potential matching items, continuing until some stopping criterion is achieved. Unlike most existing works that stop as soon as an item is recommended, we model the more realistic expectation that the interaction will continue if the item is not appropriate. Using this process, AC-CRS is able to support a more flexible conversation with users. Unlike existing models, AC-CRS is able to estimate a value for each question in the conversation to make sure that questions asked by the agent are relevant to the target item (i.e., user needs). We also model the possibility that the system could suggest more than one item in a given turn, allowing it to take advantage of screen space if it is present. AC-CRS also better accommodates the massive space of items that a real-world recommender system must handle. Experiments on real-world user purchasing data show the effectiveness of our model in terms of standard evaluation measures such as NDCG.
Full text in ACM Digital Library
- REP Generation-based vs. Retrieval-based Conversational Recommendation: A User-Centric Comparison
by Ahtsham Manzoor (University of Klagenfurt, Austria) and Dietmar Jannach (University of Klagenfurt, Austria)
In the past few years we observed a renewed interest in conversational recommender systems (CRS) that interact with users in natural language. Most recent research efforts use neural models trained on recorded recommendation dialogs between humans, supporting an end-to-end learning process. Given the user’s utterances in a dialog, these systems aim to generate appropriate responses in natural language based on the learned models. An alternative to such language generation approaches is to retrieve and possibly adapt suitable sentences from the recorded dialogs. Approaches of this latter type are explored only to a lesser extent in the current literature. In this work, we revisit the potential value of retrieval-based approaches to conversational recommendation. To that purpose, we compare two recent deep learning models for response generation with a retrieval-based method that determines a set of response candidates using a nearest-neighbor technique and heuristically reranks them. We adopt a user-centric evaluation approach, where study participants (N=60) rated the responses of the three compared systems. We could reproduce the claimed improvement of one of the deep learning methods over the other. However, the retrieval-based system outperformed both language generation based approaches in terms of the perceived quality of the system responses. Overall, our study suggests that retrieval-based approaches should be considered as an alternative or complement to modern language generation-based approaches.
Full text in ACM Digital Library
- PA The role of preference consistency, defaults and musical expertise in users’ exploration behavior in a genre exploration recommender
by Yu Liang (Jheronimus Academy of Data Science, Netherlands) and Martijn C. Willemsen (Eindhoven University of Technology and Jheronimus Academy of Data Science, Netherlands)
Recommender systems are efficient at predicting users’ current preferences, but how users’ preferences develop over time is still under-explored. In this work, we study the development of users’ musical preferences. Exploring musical preference consistency between short-term and long-term preferences in data from earlier studies, we find that users with higher musical expertise have more consistent preferences at their top-listened artists and tags than those with lower musical expertise. Users typically chose to explore genres that were close to their current preferences, and this effect was stronger for expert users. Based on these findings we conducted a user study on genre exploration to investigate (1) whether it is possible to nudge users to explore more distant genres, and (2) how users’ exploration behaviors within a genre are influenced by default recommendation settings that balance personalization with genre representativeness in different ways. Our results show that users were more likely to select the more distant genres if these were presented at the top of the list. However, users with high musical expertise were less likely to do so, consistent with our earlier findings. When given a representative or mixed (balanced) default for exploration within a genre, users selected less personalized recommendation settings and explored further away from their current preferences, than with a personalized default. However, this effect was moderated by users’ slider usage behaviors. Overall, our results suggest that (personalized) defaults can nudge users to explore new, more distant genres and songs. However, the effect is smaller for those with higher musical expertise levels.
Full text in ACM Digital Library
- PA Partially Observable Reinforcement Learning for Dialog-based Interactive Recommendation
by Yaxiong Wu (University of Glasgow, United Kingdom), Craig Macdonald (School of Computing Science, University of Glasgow, United Kingdom), and Iadh Ounis (University of Glasgow, United Kingdom)
In a dialog-based interactive recommendation task, users can express natural-language feedback when interacting with the recommender system. However, the users’ feedback, which takes the form of natural-language critiques about the recommendation at each iteration, only allows the recommender system to obtain a partial portrayal of the users’ preferences. Indeed, such partial observations of the users’ preferences from their natural-language feedback make it challenging to correctly track the users’ preferences over time, which can result in poor recommendation performance and less effective satisfaction of the users’ information needs when the number of iterations is limited. Reinforcement learning, in the form of a partially observable Markov decision process (POMDP), can simulate the interactions between a partially observable environment (i.e. a user) and an agent (i.e. a recommender system). To alleviate this partial observation issue, we propose a novel dialog-based recommendation model, the Estimator-Generator-Evaluator (EGE) model, with Q-learning for POMDP, to effectively incorporate the users’ preferences over time. Specifically, we leverage an Estimator to track and estimate users’ preferences, a Generator to match the estimated preferences with the candidate items to rank the next recommendations, and an Evaluator to judge the quality of the estimated preferences considering the users’ historical feedback. Following previous work, we train our EGE model by using a user simulator which itself is trained to describe the differences between the target users’ preferences and the recommended items in natural language. Extensive experiments conducted on two recommendation datasets – addressing images of fashion products (namely dresses and shoes) – demonstrate that our proposed EGE model yields significant improvements in comparison to the existing state-of-the-art baseline models.
Full text in ACM Digital Library