Paper Session P6: Unbiased Recommendation and Evaluation

Session A: 00:00–01:30, chaired by Olfa Nasraoui and Yongfeng Zhang. Attend in Whova
Session B: 11:00–12:30, chaired by Pablo Castells and Martha Larson. Attend in Whova

  • [LP] A Method to Anonymize Business Metrics to Publishing Implicit Feedback Datasets
    by Yoshifumi Seki (Gunosy Inc.), Takanori Maehara (RIKEN Center for Advanced Intelligence Project)

    “This paper shows a method for building and publishing datasets in commercial services. Datasets contribute to the development of research in machine learning and recommender systems. In particular, because recommender systems play a central role in many commercial services, publishing datasets from those services is in great demand in the recommender system community. However, the publication of datasets by commercial services may pose business risks to those companies, so a release must be approved by a business manager of the service. Because many business managers are not specialists in machine learning or recommender systems, the researchers are responsible for explaining the risks and benefits to them.
    We first summarize three challenges in building datasets from commercial services: (1) anonymizing the business metrics, (2) maintaining fairness, and (3) reducing the popularity bias. Then, we formulate the problem of building and publishing datasets as an optimization problem that seeks the sampling weight of each user, where the challenges are encoded as appropriate loss functions. We applied our method to build datasets from the raw data of our real-world mobile news delivery service. The raw data has more than 1,000,000 users with 100,000,000 interactions, and each dataset was built in less than 10 minutes. We discuss the properties of our method by checking the statistics of the datasets and the performance of typical recommender system algorithms.”
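    The paper formulates dataset release as an optimization over per-user sampling weights with the three challenges as loss terms. A toy sketch of that idea follows; the statistics, loss definitions, targets, and the random-search optimizer are all illustrative assumptions, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-user statistics (illustrative, not from the paper):
# clicks/impressions per user, and a popularity score of the items
# each user interacted with.
n_users = 1000
clicks = rng.poisson(5.0, n_users).astype(float)
impressions = clicks + rng.poisson(45.0, n_users)
popularity = rng.random(n_users)

def losses(w):
    w = np.clip(w, 0.0, 1.0)
    # (1) anonymize a business metric: push the sampled CTR toward a fixed
    #     public target so the true service CTR cannot be read off the data
    ctr = (w * clicks).sum() / (w * impressions).sum()
    l_metric = (ctr - 0.08) ** 2
    # (2) fairness: keep the expected sample size near a budget so no
    #     user group dominates (a crude stand-in for the paper's loss)
    l_fair = ((w.sum() / n_users) - 0.5) ** 2
    # (3) popularity bias: penalize weight on users who mostly
    #     interacted with popular items
    l_pop = (w * popularity).mean()
    return l_metric + l_fair + 0.1 * l_pop

# Projected random search over the weights (the paper solves a proper
# optimization problem; this is only a toy optimizer).
w = np.full(n_users, 0.5)
best = losses(w)
for _ in range(2000):
    cand = np.clip(w + rng.normal(0, 0.05, n_users), 0.0, 1.0)
    c = losses(cand)
    if c < best:
        w, best = cand, c

sampled = rng.random(n_users) < w   # Bernoulli sampling per user
print(f"kept {sampled.sum()} users, loss {best:.5f}")
```

    Each user is then included in the published dataset with their optimized probability, so the released data jointly respects all three constraints.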

    Full text in ACM Digital Library

  • [LP] Unbiased Learning for the Causal Effect of Recommendation
    by Masahiro Sato (Fuji Xerox), Sho Takemori (Fuji Xerox), Janmajay Singh (Fuji Xerox), Tomoko Ohkuma (Fuji Xerox)

    Increasing users’ positive interactions, such as purchases or clicks, is an important objective of recommender systems. Recommenders typically aim to select items that users will interact with. If the recommended items are purchased, an increase in sales is expected. However, the items could have been purchased even without recommendation. Thus, we want to recommend items that result in purchases caused by recommendation. This can be formulated as a ranking problem in terms of the causal effect. Despite its importance, this problem has not been well explored in related research. It is challenging because the ground truth of the causal effect is unobservable, and estimating the causal effect is prone to bias arising from currently deployed recommenders. This paper proposes an unbiased learning framework for the causal effect of recommendation. Based on the inverse propensity scoring technique, the proposed framework first constructs unbiased estimators for ranking metrics. Then, it conducts empirical risk minimization on the estimators with propensity capping, which reduces variance under finite training samples. Based on the framework, we develop an unbiased learning method for the causal effect extension of a ranking metric. We theoretically analyze the unbiasedness of the proposed method and empirically demonstrate that the proposed method outperforms other biased learning methods in various settings.
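    The core ingredients, an inverse-propensity-scored estimate of the causal effect with propensity capping, can be illustrated on a synthetic average-effect estimate (the paper targets ranking metrics; all names and numbers below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic logged data:
# p -- propensity: probability the deployed recommender shows the item
# z -- 1 if the item was actually recommended
# y -- 1 if the user purchased
p = rng.uniform(0.05, 0.95, n)
z = (rng.random(n) < p).astype(float)
base = 0.02 + 0.10 * p          # confounding: targeted users buy more anyway
tau_true = 0.10                 # true average uplift caused by recommendation
y = (rng.random(n) < base + tau_true * z).astype(float)

def ips_causal_effect(y, z, p, cap=0.01):
    """IPS estimate of the average causal effect of recommendation,
    with propensity capping to keep the variance under control."""
    pc = np.clip(p, cap, 1.0 - cap)
    return np.mean(y * z / pc - y * (1.0 - z) / (1.0 - pc))

naive = y[z == 1].mean() - y[z == 0].mean()   # inflated by the confounding
print(f"IPS:   {ips_causal_effect(y, z, p):.4f} (true effect {tau_true})")
print(f"naive: {naive:.4f}")
```

    The naive difference between recommended and non-recommended outcomes overestimates the effect because the deployed recommender targets users who would buy anyway; the IPS estimator corrects for this.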

    Full text in ACM Digital Library

  • [LP] Doubly Robust Estimator for Ranking Metrics with Post-Click Conversions
    by Yuta Saito (Tokyo Institute of Technology)

    Post-click conversion, a pre-defined action on a web service after a click, is an essential form of feedback, as it directly contributes to the final revenue and captures user preferences for items more accurately than the ambiguous click. However, naively using post-click conversions can lead to severe bias when learning or evaluating recommenders because of the selection bias between clicked and unclicked data. In this study, we address the offline evaluation problem of algorithmic recommendations with biased post-click conversions. A possible solution to this bias is the inverse propensity score estimator, as it can provide an unbiased evaluation even under selection bias. However, this estimator is known to suffer from variance and instability problems, which can be severe in the recommendation setting, as feedback is often highly sparse. To address these limitations of the previous unbiased estimator, we propose a doubly robust estimator for the ground-truth ranking performance of a given recommender. The proposed estimator is unbiased against the ground-truth ranking metric and improves the variance and estimation error tail bound of the existing unbiased estimator. Finally, to evaluate the empirical efficacy of the proposed estimator, we conduct evaluations using semi-synthetic data and two public real-world datasets. The results show that the proposed metric reveals a better model evaluation performance compared with existing baseline metrics, particularly in situations with severe selection bias.
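    The paper's estimator targets full ranking metrics, but the doubly robust idea itself can be sketched on the simpler task of estimating an average conversion rate when conversions are observed only for clicked pairs (the synthetic setup and all names below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# theta  -- click propensity per user-item pair
# o      -- 1 if the pair was clicked (conversion observed only then)
# r_true -- would-be conversion label for every pair
theta = rng.uniform(0.1, 0.9, n)
o = (rng.random(n) < theta).astype(float)
r_prob = 0.2 + 0.3 * theta          # conversion correlates with propensity
r_true = (rng.random(n) < r_prob).astype(float)
r_obs = r_true * o                  # missing for unclicked pairs

# An imperfect imputation model of the conversion probability:
r_hat = np.clip(r_prob + rng.normal(0, 0.05, n), 0, 1)

def naive(r_obs, o):
    return r_obs.sum() / o.sum()        # clicked pairs only -> selection bias

def ips(r_obs, o, theta):
    return np.mean(o * r_obs / theta)   # unbiased, but higher variance

def doubly_robust(r_obs, o, theta, r_hat):
    # imputed value for every pair, plus a propensity-weighted correction
    # on the pairs where the true conversion was actually observed
    return np.mean(r_hat + o * (r_obs - r_hat) / theta)

truth = r_true.mean()
print(f"truth {truth:.4f}  naive {naive(r_obs, o):.4f}  "
      f"ips {ips(r_obs, o, theta):.4f}  dr {doubly_robust(r_obs, o, theta, r_hat):.4f}")
```

    The estimator stays unbiased if either the propensities or the imputation model is correct, which is the "doubly robust" property the paper extends to ranking metrics.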

    Full text in ACM Digital Library

  • [IN] Counterfactual Learning for Recommender System
    by Zhenhua Dong (Huawei Noah’s Ark Lab), Hong Zhu (Huawei Noah’s Ark Lab), Pengxiang Cheng (Huawei Noah’s Ark Lab), Xinhua Feng (Huawei Noah’s Ark Lab), Guohao Cai (Huawei Noah’s Ark Lab), Xiuqiang He (Huawei Noah’s Ark Lab), Jun Xu (Gaoling School of Artificial Intelligence, Renmin University of China), Jirong Wen (Gaoling School of Artificial Intelligence, Renmin University of China)

    “Most commercial industrial recommender systems have built closed feedback loops. Though a closed feedback loop is helpful for item recommendation and model training, it may lead to the so-called bias problems, including position bias, selection bias, and popularity bias. Recommendation models trained with biased data may hurt the user experience by recommending homogeneous items. How to control the biases in the closed feedback loop has become one of the major challenges in modern recommender systems. This talk discusses counterfactual learning technologies for tackling the bias problem in recommendation.
    The talk consists of four parts.
    The first part briefly introduces counterfactual learning with two cases from the academic perspective [4, 5].
    The second part illustrates position bias and selection bias with two real examples. These examples inspire us to study, from the industry perspective, “How can counterfactual technology be used in recommender systems?”
    In the third part, we first encourage the audience to consider an important question: “What kind of data can train an unbiased model?” We then propose four counterfactual learning approaches and related studies, as shown in Figure 1.
    Figure 1: The four counterfactual learning approaches for recommender systems.
    Approach 1: Learn from counterfactual data. We need to learn a full-information model from partially observed data. The full-information model is an unbiased model trained on both observed and unobserved data (including counterfactual data); but how can the unobserved data be modeled? One common approach is the direct method [2]. In this talk, we introduce a novel counterfactual learning framework [8]: first, an imputation model is learned from a small amount of unbiased uniform data; then, the imputation model predicts labels for all counterfactual samples; finally, a counterfactual recommendation model is trained on both observed and counterfactual samples.
    Approach 2: Correct biased observed data. Inverse propensity scoring (IPS) is a widely studied method that is relatively easy to deploy in real products. The propensity score is defined by Rosenbaum and Rubin [7] as the conditional probability of receiving the treatment given pre-treatment covariates. However, the IPS method must satisfy two assumptions: (1) overlap and (2) unconfoundedness. Inspired by the sample reweighting work for robust deep learning [6], we propose a novel influence-function-based method to reweight training samples directly.
    Approach 3: Doubly robust method. Doubly robust methods [7] have two parts: an IPS part and a direct-method part. John Langford and colleagues prove that if either part is unbiased, the doubly robust method is unbiased. However, neither the propensity model nor the imputation model is easy to learn, so we present a novel propensity-free doubly robust method [8] for the click-through-rate (CTR) prediction task. To make learning over the full set of samples (both observed and unobserved) efficient, we propose a block coordinate descent and conjugate gradient method, which reduces the time complexity of optimization from O(m*n) to O(m+n).
    Approach 4: Joint learning from unbiased and biased data. In recommender systems, unbiased data is collected by recommending items uniformly at random; such data is scarce and its collection is expensive. In online A/B tests, a model trained on biased and unbiased data together outperforms a model trained on biased data alone. Causal embedding [1] is another method that learns from both biased and unbiased data to improve the accuracy of the prediction model. We also propose a general knowledge distillation framework for counterfactual recommendation via uniform data [3], which describes four ways to use unbiased data: label distillation, sample distillation, feature distillation, and model structure distillation.
    We also summarize the advantages and challenges of the above approaches.
    The last part emphasizes that counterfactual learning is a rich research area and discusses several important research topics, such as optimization for counterfactual learning, counterfactual meta learning, stable learning, fairness, unbiased learning to rank, and offline policy evaluation.”
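    As a concrete instance of Approach 2, a minimal sketch of correcting position bias with IPS weights; the examination model, the logging policy, and all numbers are assumptions for illustration, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed examination model: position k is examined with probability 1/(k+1).
relevance = np.array([0.2, 0.3, 0.5, 0.6, 0.8])   # item 4 is actually best
n_logs = 200_000

items = rng.integers(0, 5, n_logs)
pos = items                          # logging policy: item i always shown at position i
exam_prob = 1.0 / (pos + 1)
# A click requires both examination and relevance.
clicks = rng.random(n_logs) < exam_prob * relevance[items]

naive_ctr = np.array([clicks[items == i].mean() for i in range(5)])
ips_ctr = np.array([(clicks[items == i] / exam_prob[items == i]).mean()
                    for i in range(5)])

print("naive best item:", naive_ctr.argmax())   # misled by position bias
print("IPS best item:  ", ips_ctr.argmax())     # recovers the truly best item
```

    The naive per-item CTR conflates relevance with display position, while reweighting each click by the inverse examination probability recovers the true relevance ordering.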

    Full text in ACM Digital Library

