Paper Session 3: Deep Learning for Recommender Systems

Date: Tuesday, Sept 17, 2019, 11:00-12:30
Location: Auditorium
Chair: Giovanni Semeraro

  • [LP] Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches
    by Maurizio Ferrari Dacrema, Paolo Cremonesi, Dietmar Jannach

    Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today’s research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in recent years. Only 7 of them could be reproduced based on the provided code. For these methods, however, it turned out that 6 of them can often be outperformed with comparably simple heuristic methods based on nearest-neighbor techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area.
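
    The abstract does not spell out how the nearest-neighbor baselines were configured. As a minimal sketch of the kind of heuristic it refers to, assuming binary implicit feedback and cosine similarity between item columns (the function name `item_knn_scores` and the parameter `k` are hypothetical, not taken from the paper):

```python
import numpy as np

def item_knn_scores(interactions, k=2):
    """Score every item for every user with a cosine item-kNN heuristic."""
    X = np.asarray(interactions, dtype=float)        # users x items, 1 = interaction
    norms = np.linalg.norm(X, axis=0)                # one norm per item column
    sim = (X.T @ X) / (np.outer(norms, norms) + 1e-12)
    np.fill_diagonal(sim, 0.0)                       # an item is not its own neighbor
    # Keep only the k most similar neighbors in each row, zero out the rest.
    for i in range(sim.shape[0]):
        weakest = np.argsort(sim[i])[:-k]
        sim[i, weakest] = 0.0
    # An item's score is the summed similarity to the user's past items,
    # counted only when the past item is among that item's k neighbors.
    scores = X @ sim.T
    scores[X > 0] = -np.inf                          # never re-recommend seen items
    return scores
```

    Ranking each row of the returned matrix in descending order gives a top-n list per user.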

  • [LP] A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce
    by Abdul-Saboor Sheikh, Romain Guigourès, Evgenii Koriagin, Yuen King Ho, Reza Shirvany, Roland Vollgraf, Urs Bergmann

    Personalized size and fit recommendations bear crucial significance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size and fit relevant information from observed customer-article interactions. It further employs customer and article specific embedding variables to learn their properties. Together with learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrates an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation.

  • [LP] Relaxed Softmax for PU Learning
    by Ugo Tanielian, Flavian Vasile

    In recent years, the softmax model and its fast approximations have become the de facto loss functions for deep neural networks when dealing with multi-class prediction. This loss has been extended to language modeling and recommendation, two fields that fall into the framework of learning from Positive and Unlabeled data. In this paper, we stress the different drawbacks of the current family of softmax losses and sampling schemes when applied in a Positive and Unlabeled learning setup. We propose both a Relaxed Softmax loss (RS) and a new negative sampling scheme based on a Boltzmann formulation. We show that the new training objective is better suited for the tasks of density estimation, item similarity and next-event prediction by driving uplifts in performance on textual and recommendation datasets against classical softmax.
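
    The abstract does not give the sampling formula. As a generic sketch of what a Boltzmann-style negative sampler could look like, assuming negatives are drawn with probability proportional to exp(score / temperature) and the observed positive is excluded (both function names and the `temperature` parameter are hypothetical, and this is not claimed to be the paper's exact scheme):

```python
import numpy as np

def boltzmann_negative_probs(scores, positive_idx, temperature=1.0):
    """Sampling distribution over candidate negatives (needs >= 2 items)."""
    logits = np.asarray(scores, dtype=float) / temperature
    logits[positive_idx] = -np.inf        # the positive can never be a negative
    logits = logits - np.max(logits)      # shift for numerical stability
    weights = np.exp(logits)              # exp(-inf) == 0 for the positive
    return weights / weights.sum()

def sample_negatives(scores, positive_idx, n_samples, temperature=1.0, seed=0):
    """Draw negative item indices from the Boltzmann distribution."""
    rng = np.random.default_rng(seed)
    probs = boltzmann_negative_probs(scores, positive_idx, temperature)
    return rng.choice(len(probs), size=n_samples, p=probs)
```

    A lower temperature concentrates sampling on high-scoring items, while a high temperature approaches uniform sampling over the non-positive items.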

  • [LP] Style Conditioned Recommendations
    by Murium Iqbal, Kamelia Aryafar, Timothy Anderton

    We propose Style Conditioned Recommendations (SCR) and introduce style injection as a method to diversify recommendations. We use a Conditional Variational Autoencoder (CVAE) architecture, where both the encoder and decoder are conditioned on a user profile learned from item content data. This allows us to apply style transfer methodologies to the task of recommendations, which we refer to as injection. To enable style injection, user profiles are learned to be interpretable such that they express users’ propensities for specific predefined styles. These are learned via label-propagation from a dataset of item content, with limited labeled points. To perform injection, the condition on the encoder is learned while the condition on the decoder is selected per explicit feedback. Explicit feedback can be taken either from a user’s response to a style or interest quiz, or from item ratings. In the absence of explicit feedback, the condition at the encoder is applied to the decoder. We show a 12% improvement on NDCG@20 over the traditional VAE-based approach on the task of recommendations. We show an average 22% improvement on AUC across all classes for predicting user style profiles against our best performing baseline. After injecting styles we compare the user style profile to the style of the recommendations and show that injected styles have an average +133% increase in presence. Our results show that style injection is a powerful method to diversify recommendations while maintaining personal relevance. Our main contribution is an application of a semi-supervised approach that extends item labels to interpretable user profiles.
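
    For readers unfamiliar with the NDCG@20 metric quoted above, a minimal sketch of the standard definition follows, assuming binary relevance (an item is either in the user's held-out set or not). The function names are illustrative, and the paper's exact evaluation protocol is not stated in the abstract:

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k gains (rank 1 has discount 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_items, relevant_items, k=20):
    """NDCG@k with binary relevance; 0.0 when there are no relevant items."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # The ideal ranking places every relevant item at the top of the list.
    ideal = [1.0] * min(len(relevant_items), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

    A relevant item at rank 1 yields a score of 1.0, and the same item at rank 2 yields 1 / log2(3), so the metric rewards placing held-out items near the top of the list.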

  • [LP] Deep Language-based Critiquing for Recommender Systems
    by Ga Wu, Kai Luo, Scott Sanner, Harold Soh

    Critiquing is a method for conversational recommendation that adapts recommendations in response to user preference feedback regarding item attributes. Historical critiquing methods were largely based on constraint- and utility-based methods for modifying recommendations w.r.t. these critiqued attributes. In this paper, we revisit the critiquing approach through the lens of deep learning based recommendation methods and language-based interaction. Concretely, we propose an end-to-end deep learning framework with two variants that extend the Neural Collaborative Filtering architecture with explanation and critiquing components. These architectures not only predict personalized keyphrases for a user and item but also embed language-based feedback in the latent space that in turn modulates subsequent critiqued recommendations. We evaluate the proposed framework on two recommendation datasets containing user reviews. Empirical results show that our modified NCF approach not only provides a strong baseline recommender and high-quality personalized item keyphrase suggestions, but that it also properly suppresses items predicted to have a critiqued keyphrase. In summary, this paper provides a first step to unify deep recommendation and language-based feedback in what we hope to be a rich space for future research in deep critiquing for conversational recommendation.

  • [SPO] Predictability Limits in Session-based Next Item Recommendation
    by Priit Järv

    Session-based recommendations are based on the user’s recent actions, for example, the items they have viewed during the current browsing session or the sightseeing places they have just visited. Closely related is sequence-aware recommendation, where the choice of the next item should follow from the sequence of previous actions. We study seven benchmarks for session-based recommendation, covering retail, music and news domains, to investigate how accurately user behavior can be predicted from the session histories. We measure the entropy rate of the data and estimate the limit of predictability to be between 44% and 73% in the included datasets. We establish some algorithm-specific limits on prediction accuracy for Markov chains, association rules and k-nearest neighbors methods. With most of the analyzed methods, the algorithm design limits their performance with sparse training data. Session-based k-nearest neighbors methods are the least restricted in comparison and have room for improvement across all of the analyzed datasets.
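
    One common way to turn an entropy estimate into a predictability limit is Fano's inequality, which relates the entropy S (in bits) and the number of distinct items N to the maximum achievable accuracy p through S = H(p) + (1 - p) log2(N - 1). Whether the paper uses exactly this relation is an assumption here. The sketch below solves the equation for p by bisection, with hypothetical function names:

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def predictability_limit(entropy_bits, n_items, tol=1e-10):
    """Solve S = H(p) + (1 - p) * log2(N - 1) for p on [1/N, 1]."""
    if n_items < 2:
        raise ValueError("need at least two distinct items")
    if not 0.0 <= entropy_bits <= math.log2(n_items):
        raise ValueError("entropy must lie in [0, log2(n_items)]")

    def fano(p):
        return binary_entropy(p) + (1 - p) * math.log2(n_items - 1)

    # fano() falls from log2(N) at p = 1/N to 0 at p = 1, so bisection applies.
    lo, hi = 1.0 / n_items, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

    Zero entropy gives a limit of 1.0 (fully predictable), and the maximum entropy log2(N) gives 1/N (no better than uniform guessing).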
