Paper Session P6: Unbiased Recommendation and Evaluation

Session A: 00:00–01:30, chaired by Olfa Nasraoui and Yongfeng Zhang. Attend in Whova
Session B: 11:00–12:30, chaired by Pablo Castells and Martha Larson. Attend in Whova

  • [LP] A Method to Anonymize Business Metrics to Publishing Implicit Feedback Datasets
    by Yoshifumi Seki (Gunosy Inc.), Takanori Maehara (RIKEN Center for Advanced Intelligence Project)

    “This paper shows a method for building and publishing datasets in commercial services. Datasets contribute to the development of research in machine learning and recommender systems. In particular, because recommender systems play a central role in many commercial services, publishing datasets from those services is in great demand in the recommender system community. However, the publication of datasets by commercial services may pose business risks to those companies, so a release must be approved by a business manager of the service. Because many business managers are not specialists in machine learning or recommender systems, the researchers are responsible for explaining the risks and benefits to them.
    We first summarize three challenges in building datasets from commercial services: (1) anonymizing the business metrics, (2) maintaining fairness, and (3) reducing the popularity bias. Then, we formulate the problem of building and publishing datasets as an optimization problem that seeks the sampling weight of each user, where the challenges are encoded as appropriate loss functions. We applied our method to build datasets from the raw data of our real-world mobile news delivery service. The raw data has more than 1,000,000 users with 100,000,000 interactions, and each dataset was built in less than 10 minutes. We discuss the properties of our method by checking the statistics of the datasets and the performance of typical recommender system algorithms.”
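    The paper formulates dataset release as an optimization over per-user sampling weights with the three challenges as loss terms. A toy sketch of that idea follows; the statistics, loss definitions, targets, and the random-search optimizer are all illustrative assumptions, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-user statistics (illustrative, not from the paper):
# clicks/impressions per user, and a popularity score of the items
# each user interacted with.
n_users = 1000
clicks = rng.poisson(5.0, n_users).astype(float)
impressions = clicks + rng.poisson(45.0, n_users)
popularity = rng.random(n_users)

def losses(w):
    w = np.clip(w, 0.0, 1.0)
    # (1) anonymize a business metric: push the sampled CTR toward a fixed
    #     public target so the true service CTR cannot be read off the data
    ctr = (w * clicks).sum() / (w * impressions).sum()
    l_metric = (ctr - 0.08) ** 2
    # (2) fairness: keep the expected sample size near a budget so no
    #     user group dominates (a crude stand-in for the paper's loss)
    l_fair = ((w.sum() / n_users) - 0.5) ** 2
    # (3) popularity bias: penalize weight on users who mostly
    #     interacted with popular items
    l_pop = (w * popularity).mean()
    return l_metric + l_fair + 0.1 * l_pop

# Projected random search over the weights (the paper solves a proper
# optimization problem; this is only a toy optimizer).
w = np.full(n_users, 0.5)
best = losses(w)
for _ in range(2000):
    cand = np.clip(w + rng.normal(0, 0.05, n_users), 0.0, 1.0)
    c = losses(cand)
    if c < best:
        w, best = cand, c

sampled = rng.random(n_users) < w   # Bernoulli sampling per user
print(f"kept {sampled.sum()} users, loss {best:.5f}")
```

    Each user is then included in the published dataset with their optimized probability, so the released data jointly respects all three constraints.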

    Full text in ACM Digital Library

  • [LP] Unbiased Learning for the Causal Effect of Recommendation
    by Masahiro Sato (Fuji Xerox), Sho Takemori (Fuji Xerox), Janmajay Singh (Fuji Xerox), Tomoko Ohkuma (Fuji Xerox)

    Increasing users’ positive interactions, such as purchases or clicks, is an important objective of recommender systems. Recommenders typically aim to select items that users will interact with. If the recommended items are purchased, an increase in sales is expected. However, the items could have been purchased even without recommendation. Thus, we want to recommend items that result in purchases caused by recommendation. This can be formulated as a ranking problem in terms of the causal effect. Despite its importance, this problem has not been well explored in related research. It is challenging because the ground truth of the causal effect is unobservable, and estimating the causal effect is prone to bias arising from currently deployed recommenders. This paper proposes an unbiased learning framework for the causal effect of recommendation. Based on the inverse propensity scoring technique, the proposed framework first constructs unbiased estimators for ranking metrics. Then, it conducts empirical risk minimization on the estimators with propensity capping, which reduces variance under finite training samples. Based on the framework, we develop an unbiased learning method for the causal effect extension of a ranking metric. We theoretically analyze the unbiasedness of the proposed method and empirically demonstrate that the proposed method outperforms other biased learning methods in various settings.
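    The core ingredients, an inverse-propensity-scored estimate of the causal effect with propensity capping, can be illustrated on a synthetic average-effect estimate (the paper targets ranking metrics; all names and numbers below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic logged data:
# p -- propensity: probability the deployed recommender shows the item
# z -- 1 if the item was actually recommended
# y -- 1 if the user purchased
p = rng.uniform(0.05, 0.95, n)
z = (rng.random(n) < p).astype(float)
base = 0.02 + 0.10 * p          # confounding: targeted users buy more anyway
tau_true = 0.10                 # true average uplift caused by recommendation
y = (rng.random(n) < base + tau_true * z).astype(float)

def ips_causal_effect(y, z, p, cap=0.01):
    """IPS estimate of the average causal effect of recommendation,
    with propensity capping to keep the variance under control."""
    pc = np.clip(p, cap, 1.0 - cap)
    return np.mean(y * z / pc - y * (1.0 - z) / (1.0 - pc))

naive = y[z == 1].mean() - y[z == 0].mean()   # inflated by the confounding
print(f"IPS:   {ips_causal_effect(y, z, p):.4f} (true effect {tau_true})")
print(f"naive: {naive:.4f}")
```

    The naive difference between recommended and non-recommended outcomes overestimates the effect because the deployed recommender targets users who would buy anyway; the IPS estimator corrects for this.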

    Full text in ACM Digital Library

  • [LP] Doubly Robust Estimator for Ranking Metrics with Post-Click Conversions
    by Yuta Saito (Tokyo Institute of Technology)

    Post-click conversion, a pre-defined action on a web service after a click, is an essential form of feedback, as it directly contributes to the final revenue and captures user preferences for items more accurately than the ambiguous click. However, naively using post-click conversions can lead to severe bias when learning or evaluating recommenders because of the selection bias between clicked and unclicked data. In this study, we address the offline evaluation problem of algorithmic recommendations with biased post-click conversions. A possible solution to this bias is the inverse propensity score estimator, as it can provide an unbiased evaluation even under selection bias. However, this estimator is known to suffer from variance and instability problems, which can be severe in the recommendation setting, as feedback is often highly sparse. To address these limitations of the previous unbiased estimator, we propose a doubly robust estimator for the ground-truth ranking performance of a given recommender. The proposed estimator is unbiased against the ground-truth ranking metric and improves the variance and estimation error tail bound of the existing unbiased estimator. Finally, to evaluate the empirical efficacy of the proposed estimator, we conduct evaluations using semi-synthetic data and two public real-world datasets. The results show that the proposed metric reveals a better model evaluation performance compared with existing baseline metrics, particularly in situations with severe selection bias.
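    The paper's estimator targets full ranking metrics, but the doubly robust idea itself can be sketched on the simpler task of estimating an average conversion rate when conversions are observed only for clicked pairs (the synthetic setup and all names below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# theta  -- click propensity per user-item pair
# o      -- 1 if the pair was clicked (conversion observed only then)
# r_true -- would-be conversion label for every pair
theta = rng.uniform(0.1, 0.9, n)
o = (rng.random(n) < theta).astype(float)
r_prob = 0.2 + 0.3 * theta          # conversion correlates with propensity
r_true = (rng.random(n) < r_prob).astype(float)
r_obs = r_true * o                  # missing for unclicked pairs

# An imperfect imputation model of the conversion probability:
r_hat = np.clip(r_prob + rng.normal(0, 0.05, n), 0, 1)

def naive(r_obs, o):
    return r_obs.sum() / o.sum()        # clicked pairs only -> selection bias

def ips(r_obs, o, theta):
    return np.mean(o * r_obs / theta)   # unbiased, but higher variance

def doubly_robust(r_obs, o, theta, r_hat):
    # imputed value for every pair, plus a propensity-weighted correction
    # on the pairs where the true conversion was actually observed
    return np.mean(r_hat + o * (r_obs - r_hat) / theta)

truth = r_true.mean()
print(f"truth {truth:.4f}  naive {naive(r_obs, o):.4f}  "
      f"ips {ips(r_obs, o, theta):.4f}  dr {doubly_robust(r_obs, o, theta, r_hat):.4f}")
```

    The estimator stays unbiased if either the propensities or the imputation model is correct, which is the "doubly robust" property the paper extends to ranking metrics.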

    Full text in ACM Digital Library

  • [IN] Counterfactual Learning for Recommender System
    by Zhenhua Dong (Huawei Noah’s Ark Lab), Hong Zhu (Huawei Noah’s Ark Lab), Pengxiang Cheng (Huawei Noah’s Ark Lab), Xinhua Feng (Huawei Noah’s Ark Lab), Guohao Cai (Huawei Noah’s Ark Lab), Xiuqiang He (Huawei Noah’s Ark Lab), Jun Xu (Gaoling School of Artificial Intelligence, Renmin University of China), Jirong Wen (Gaoling School of Artificial Intelligence, Renmin University of China)

    “Most commercial industrial recommender systems have built closed feedback loops. Though a closed feedback loop is helpful for item recommendation and model training, it may lead to the so-called bias problems, including position bias, selection bias, and popularity bias. Recommendation models trained with biased data may hurt the user experience by recommending homogeneous items. How to control the biases in the closed feedback loop has become one of the major challenges in modern recommender systems. This talk discusses counterfactual learning technologies for tackling the bias problem in recommendation.
    The talk consists of four parts.
    The first part briefly introduces counterfactual learning with two cases from the academic perspective [4, 5].
    The second part illustrates position bias and selection bias with two real examples. These examples inspire us to study, from the industry perspective, “How can counterfactual technology be used in recommender systems?”
    In the third part, we first encourage the audience to consider an important question: “What kind of data can train an unbiased model?” We then propose four counterfactual learning approaches and related studies, as shown in Figure 1.
    Figure 1: The four counterfactual learning approaches for recommender systems.
    Approach 1: Learn from counterfactual data. We need to learn a full-information model from partially observed data. The full-information model is an unbiased model trained on both observed and unobserved data (including counterfactual data); but how can the unobserved data be modeled? One common approach is the direct method [2]. In this talk, we introduce a novel counterfactual learning framework [8]: first, an imputation model is learned from a small amount of unbiased uniform data; then, the imputation model predicts labels for all counterfactual samples; finally, a counterfactual recommendation model is trained on both observed and counterfactual samples.
    Approach 2: Correct biased observed data. Inverse propensity scoring (IPS) is a widely studied method that is relatively easy to deploy in real products. The propensity score is defined by Rosenbaum and Rubin [7] as the conditional probability of receiving the treatment given pre-treatment covariates. However, the IPS method must satisfy two assumptions: (1) overlap and (2) unconfoundedness. Inspired by the sample reweighting work for robust deep learning [6], we propose a novel influence-function-based method to reweight training samples directly.
    Approach 3: Doubly robust method. Doubly robust methods [7] have two parts: an IPS part and a direct-method part. John Langford and colleagues prove that if either part is unbiased, the doubly robust method is unbiased. However, neither the propensity model nor the imputation model is easy to learn, so we present a novel propensity-free doubly robust method [8] for the click-through-rate (CTR) prediction task. To make learning over the full set of samples (both observed and unobserved) efficient, we propose a block coordinate descent and conjugate gradient method, which reduces the time complexity of optimization from O(m*n) to O(m+n).
    Approach 4: Joint learning from unbiased and biased data. In recommender systems, unbiased data is collected by recommending items uniformly at random; such data is scarce and its collection is expensive. In online A/B tests, a model trained on biased and unbiased data together outperforms a model trained on biased data alone. Causal embedding [1] is another method that learns from both biased and unbiased data to improve the accuracy of the prediction model. We also propose a general knowledge distillation framework for counterfactual recommendation via uniform data [3], which describes four ways to use unbiased data: label distillation, sample distillation, feature distillation, and model structure distillation.
    We also summarize the advantages and challenges of the above approaches.
    The last part emphasizes that counterfactual learning is a rich research area and discusses several important research topics, such as optimization for counterfactual learning, counterfactual meta learning, stable learning, fairness, unbiased learning to rank, and offline policy evaluation.”
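    As a concrete instance of Approach 2, a minimal sketch of correcting position bias with IPS weights; the examination model, the logging policy, and all numbers are assumptions for illustration, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed examination model: position k is examined with probability 1/(k+1).
relevance = np.array([0.2, 0.3, 0.5, 0.6, 0.8])   # item 4 is actually best
n_logs = 200_000

items = rng.integers(0, 5, n_logs)
pos = items                          # logging policy: item i always shown at position i
exam_prob = 1.0 / (pos + 1)
# A click requires both examination and relevance.
clicks = rng.random(n_logs) < exam_prob * relevance[items]

naive_ctr = np.array([clicks[items == i].mean() for i in range(5)])
ips_ctr = np.array([(clicks[items == i] / exam_prob[items == i]).mean()
                    for i in range(5)])

print("naive best item:", naive_ctr.argmax())   # misled by position bias
print("IPS best item:  ", ips_ctr.argmax())     # recovers the truly best item
```

    The naive per-item CTR conflates relevance with display position, while reweighting each click by the inverse examination probability recovers the true relevance ordering.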

    Full text in ACM Digital Library

