Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances

by Yuta Saito (Cornell University, USA) and Thorsten Joachims (Cornell University, USA)

Counterfactual estimators enable the use of existing log data to estimate how some new target recommendation policy would have performed if it had been used instead of the policy that logged the data. We say that these estimators work “off-policy”, since the policy that logged the data differs from the target policy. In this way, counterfactual estimators enable Off-policy Evaluation (OPE), akin to an unbiased offline A/B test, as well as the learning of new recommendation policies through Off-policy Learning (OPL). The goal of this tutorial is to summarize the foundations, implementations, and recent advances of OPE/OPL. Specifically, we will introduce the fundamentals of OPE/OPL and provide theoretical and empirical comparisons of conventional methods. Then, we will cover emerging practical challenges such as how to account for combinatorial actions, distributional shift, fairness of exposure, and two-sided market structures. We will then present Open Bandit Pipeline, an open-source Python package for OPE/OPL, and show how it can be used for both research and practical purposes. We will conclude the tutorial by presenting real-world case studies and future directions.
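To make the idea concrete, here is a minimal, self-contained sketch of the inverse propensity scoring (IPS) estimator, the most basic counterfactual estimator for OPE. All names (`logging_policy`, `target_policy`, `true_reward`) and the toy three-action setup are illustrative assumptions, not part of the tutorial material or of Open Bandit Pipeline.

```python
import random

random.seed(0)

ACTIONS = [0, 1, 2]

def logging_policy(x):
    # Propensities of the policy that collected the logs (illustrative).
    return [0.5, 0.3, 0.2]

def target_policy(x):
    # The new policy whose value we want to estimate offline (illustrative).
    return [0.1, 0.2, 0.7]

def true_reward(x, a):
    # Unknown to the estimator; used here only to simulate logs.
    return 1.0 if a == 2 else 0.0

# Simulate logged bandit feedback (context, action, reward, logging propensity).
log = []
for _ in range(10_000):
    x = None  # context omitted for simplicity
    probs = logging_policy(x)
    a = random.choices(ACTIONS, weights=probs)[0]
    log.append((x, a, true_reward(x, a), probs[a]))

def ips_estimate(logged_data, pi_target):
    """IPS: reweight each logged reward by pi_target(a|x) / pi_logging(a|x)."""
    total = 0.0
    for x, a, r, p_log in logged_data:
        w = pi_target(x)[a] / p_log  # importance weight
        total += w * r
    return total / len(logged_data)

v_hat = ips_estimate(log, target_policy)
# With enough samples, v_hat concentrates around the target policy's
# true value (here 0.7), even though the logs came from a different policy.
print(v_hat)
```

The reweighting makes the estimate unbiased as long as the logging policy assigns nonzero probability to every action the target policy can take; its variance, and estimators that trade a little bias for much lower variance, are among the topics the tutorial covers.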

The learning outcomes of this tutorial are to enable participants (applied researchers, practitioners, and students):

  • to know the fundamental concepts and conventional methods of OPE/OPL
  • to be familiar with recent advances that address practical challenges such as fairness of exposure
  • to understand how to implement OPE/OPL in their research and applications
  • to be aware of remaining challenges and opportunities in the area

This tutorial is aimed at an audience with intermediate experience in machine learning, information retrieval, or recommender systems who are interested in using OPE/OPL methods in their research and applications. Participants are expected to have basic knowledge of machine learning, probability theory, and statistics. The tutorial will provide practical examples based on Python code and Jupyter Notebooks.
