Session 2: Theory and Practice

Date: Monday 16:00-17:30 CET
Chair: Wenqi Fan (The Hong Kong Polytechnic University)

  • PANegative Interactions for Improved Collaborative Filtering: Don’t go Deeper, go Higher
    by Harald Steck (Netflix, United States) and Dawen Liang (Netflix, Inc., United States)

    The recommendation-accuracy of collaborative filtering approaches is typically improved when taking into account higher-order interactions [5, 6, 9, 10, 11, 16, 18, 24, 25, 28, 31, 34, 36, 41, 42, 44]. While deep nonlinear models are theoretically able to learn higher-order interactions, their capabilities were, however, found to be quite limited in practice [5]. Moreover, the use of low-dimensional embeddings in deep networks may severely limit their expressiveness [8]. This motivated us in this paper to explore a simple extension of linear full-rank models that allow for higher-order interactions as additional explicit input-features. Interestingly, we observed that this model-class obtained by far the best ranking accuracies on the largest data set in our experiments, while it was still competitive with various state-of-the-art deep-learning models on the smaller data sets. Moreover, our approach can also be interpreted as a simple yet effective improvement of the (linear) HOSLIM [11] model: by simply removing the constraint that the learned higher-order interactions have to be non-negative, we observed that the accuracy-gains due to higher-order interactions more than doubled in our experiments. The reason for this large improvement was that large positive higher-order interactions (as used in HOSLIM [11]) are relatively infrequent compared to the number of large negative higher-order interactions in the three well-known data-sets used in our experiments. We further characterize the circumstances where the higher-order interactions provide the most significant improvements.

    Full text in ACM Digital Library

  • INExploration in Recommender Systems
    by Minmin Chen (Google, United States)

    In the era of increasing choices, recommender systems are becoming indispensable in helping users navigate the million or billion pieces of content on recommendation platforms. As the focus of these systems shifts from attracting short-term user attention toward optimizing long term user experience on these platforms, reinforcement learning (and bandits) have emerged as appealing techniques to power these systems [5, 9, 26, 27]. The exploration-exploitation tradeoff, being the foundation of bandits and RL research, has been extensively studied [1, 2, 4, 6, 8, 10, 11, 18, 20, 21, 22, 23]. An agent is incentivized to exploit to maximize its return, i.e., by repeating actions taken in the past that produced high rewards. On the other hand, the agent needs to explore previously unseen actions in order to discover potentially better ones. Exploration has been shown to be extremely useful in solving tasks of long horizons or sparse reward in many RL applications [2, 14, 15, 16, 19]. While effective exploration is believed to positively influence the user experience on the platform, the exact value of exploration in recommender systems has not been well established.
    In this talk, we examine the roles of exploration in recommender systems in three facets: 1) system exploration to reduce system uncertainty in regions with sparse feedback; 2) user exploration to introduce users to new interests/tastes; and 3) online exploration to take into account real-time user feedback. We showcase how each aspect of exploration contributes to the long term user experience through offline and live experiments on industrial recommendation platforms. We hope this talk can inspire more follow up work in understanding and improving exploration in recommender systems.

    Full text in ACM Digital Library

  • PABlack-Box Attacks on Sequential Recommenders via Data-Free Model Extraction
    by Zhenrui Yue (Technical University of Munich, Germany), Zhankui He (UC San Diego, United States), Huimin Zeng (Technical University of Munich, Germany), and Julian McAuley (UC San Diego, United States)

    We investigate whether model extraction can be used to ‘steal’ the weights of sequential recommender systems, and the potential threats posed to victims of such attacks. This type of risk has attracted attention in image and text classification, but to our knowledge not in recommender systems. We argue that sequential recommender systems are subject to unique vulnerabilities due to the specific autoregressive regimes used to train them. Unlike many existing recommender attackers, which assume the dataset used to train the victim model is exposed to attackers, we consider a data-free setting, where training data are not accessible. Under this setting, we propose an API-based model extraction method via limited-budget synthetic data generation and knowledge distillation. We investigate state-of-the-art models for sequential recommendation and show their vulnerability under model extraction and downstream attacks.
    We perform attacks in two stages. (1) Model extraction: given different types of synthetic data and their labels retrieved from a black-box recommender, we extract the black-box model to a white-box model via distillation. (2) Downstream attacks: we attack the black-box model with adversarial samples generated by the white-box recommender. Experiments show the effectiveness of our data-free model extraction and downstream attacks on sequential recommenders in both profile pollution and data poisoning settings.

    Full text in ACM Digital Library

  • PAMatrix Factorization for Collaborative Filtering Is Just Solving an Adjoint Latent Dirichlet Allocation Model After All
    by Florian Wilhelm (inovex GmbH, Germany)

    Matrix factorization-based methods are among the most popular methods for collaborative filtering tasks with implicit feedback. The most effective of these methods do not apply sign constraints, such as non-negativity, to their factors. Despite their simplicity, the latent factors for users and items lack interpretability, which is becoming an increasingly important requirement. In this work, we provide a theoretical link between unconstrained and the interpretable non-negative matrix factorization in terms of the personalized ranking induced by these methods. We also introduce a novel, latent Dirichlet allocation-inspired model for recommenders and extend our theoretical link to also allow the interpretation of an unconstrained matrix factorization as an adjoint formulation of our new model. Our experiments indicate that this novel approach represents the unknown processes of implicit user-item interactions in the real world much better than unconstrained matrix factorization while being interpretable.

    Full text in ACM Digital Library

  • PAPessimistic Reward Models for Off-Policy Learning in Recommendation
    by Olivier Jeunen (University of Antwerp, Belgium) and Bart Goethals (University of Antwerp, Belgium)

    Methods for bandit learning from user interactions often require a model of the reward a certain context-action pair will yield – for example, the probability of a click on a recommendation. This common machine learning task is highly non-trivial, as the data-generating process for contexts and actions is often skewed by the recommender system itself. Indeed, when the deployed recommendation policy at data collection time does not pick its actions uniformly-at-random, this leads to a selection bias that can impede effective reward modelling. This in turn makes off-policy learning – the typical setup in industry – particularly challenging.
    In this work, we propose and validate a general pessimistic reward modelling approach for off-policy learning in recommendation. Bayesian uncertainty estimates allow us to express scepticism about our own reward model, which can in turn be used to generate a conservative decision rule. We show how it alleviates a well-known decision making phenomenon known as the Optimiser’s Curse, and draw parallels with existing work on pessimistic policy learning. Leveraging the available closed-form expressions for both the posterior mean and variance when a ridge regressor models the reward, we show how to apply pessimism effectively and efficiently to an off-policy recommendation use-case. Empirical observations in a wide range of environments show that being conservative in decision-making leads to a significant and robust increase in recommendation performance. The merits of our approach are most outspoken in realistic settings with limited logging randomisation, limited training samples, and larger action spaces.

    Full text in ACM Digital Library

Platinum Supporters
Gold Supporters
Silver Supporters
Special Supporter