- IN Optimizing product recommendations for millions of merchants
by Kim Falk (Shopify, Canada), Chen Karako (Shopify, Canada)
At Shopify, we serve product recommendations to customers across millions of merchants’ online stores. Providing optimized recommendations to all of these independent merchants is a challenge: one model might improve our metrics in aggregate, but significantly degrade recommendations for some stores. To ensure we provide high-quality recommendations to all merchant segments, we develop several models that work best in different situations based on offline evaluation. Learning which strategy works best for a given segment also allows us to start new stores off with good recommendations, without necessarily relying on an individual store amassing large amounts of traffic. In production, the system starts out with the best strategy for a given merchant, and then adjusts to the current environment using multi-armed bandits. Collectively, this methodology allows us to optimize the types of recommendations served on each store.
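The per-merchant strategy selection described above can be sketched as an epsilon-greedy bandit that is warm-started from offline evaluation and then adapts online. This is a minimal illustration, not Shopify's implementation; the strategy names, `priors` warm-start, and reward handling are all assumptions:

```python
import random

class StrategyBandit:
    """Epsilon-greedy bandit choosing among recommendation strategies
    for one merchant. `priors` warm-starts value estimates from offline
    evaluation, so a new store begins on its segment's best strategy."""

    def __init__(self, strategies, epsilon=0.1, priors=None, seed=0):
        self.strategies = list(strategies)
        self.epsilon = epsilon
        self.counts = {s: 0 for s in self.strategies}
        self.values = {s: (priors or {}).get(s, 0.0) for s in self.strategies}
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.values[s])

    def update(self, strategy, reward):
        # Incremental mean update of the chosen strategy's value estimate.
        self.counts[strategy] += 1
        n = self.counts[strategy]
        self.values[strategy] += (reward - self.values[strategy]) / n
```

A store could be initialized with, e.g., `priors={"popularity": 0.05}` so the bandit starts from the segment's offline winner rather than from zero.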
Full text in ACM Digital Library
|
- IN An Incremental Learning framework for large-scale CTR prediction
by Petros Katsileros (Deeplab, Greece), Nikiforos Mandilaras (Deeplab, Greece), Dimitrios Mallis (Deeplab, Greece), Vassilis Pitsikalis (Deeplab, Greece), Stavros Theodorakis (DeepLab, Greece), Gil Chamiel (Taboola, Israel)
In this work we introduce an incremental learning framework for click-through rate (CTR) prediction and demonstrate its effectiveness for Taboola’s massive-scale recommendation service. Our approach enables rapid capture of emerging trends by warm-starting from previously deployed models and fine-tuning on “fresh” data only. Past knowledge is maintained via a teacher-student paradigm, where the teacher serves as a distillation reference, mitigating the catastrophic forgetting phenomenon. Our incremental learning framework enables significantly faster training and deployment cycles (~12x speedup). We demonstrate a consistent RPM lift over multiple traffic segments and a significant CTR increase on newly introduced items.
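The teacher-student retention idea can be illustrated with a toy per-example loss that blends cross-entropy on fresh labels with cross-entropy toward the frozen teacher's prediction. The blending weight `alpha` and the exact functional form are assumptions for illustration, not the paper's loss:

```python
import math

def distill_loss(y_true, p_student, p_teacher, alpha=0.5, eps=1e-7):
    """Per-example incremental-learning loss (sketch): a weighted sum of
    binary cross-entropy on the fresh label and cross-entropy toward the
    frozen teacher's probability, which anchors the student to past
    knowledge and mitigates catastrophic forgetting."""
    p = min(max(p_student, eps), 1 - eps)   # clamp for numerical safety
    t = min(max(p_teacher, eps), 1 - eps)
    # Supervised term on the fresh click label.
    bce = -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))
    # Distillation term: cross-entropy against the teacher's soft label.
    kd = -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return alpha * bce + (1 - alpha) * kd
```

During an incremental cycle, only the student would be fine-tuned on fresh traffic while the teacher (the previously deployed model) stays fixed.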
Full text in ACM Digital Library
|
- PA A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models
by Yingcan Wei (NVIDIA, China), Matthias Langer (NVIDIA, China), Fan Yu (NVIDIA, China), Minseok Lee (NVIDIA, Korea, Republic of), Jie Liu (NVIDIA, China), Ji Shi (NVIDIA, China), Zehuan Wang (NVIDIA, China)
Recommendation systems are of crucial importance for a variety of modern apps and web services, such as news feeds, social networks, e-commerce, search, etc. To achieve peak prediction accuracy, modern recommendation models combine deep learning with terabyte-scale embedding tables to obtain a fine-grained representation of the underlying data. Traditional inference serving architectures require deploying the whole model to standalone servers, which is infeasible at such massive scale.
In this paper, we provide insights into the intriguing and challenging inference domain of online recommendation systems. We propose the HugeCTR Hierarchical Parameter Server (HPS), an industry-leading distributed recommendation inference framework that combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. Among other things, our HPS features (1) a redundant hierarchical storage system, (2) a novel high-bandwidth cache to accelerate parallel embedding lookup on NVIDIA GPUs, (3) online training support, and (4) lightweight APIs for integration into existing large-scale recommendation workflows. To demonstrate the capabilities of HPS, we conducted extensive studies using both synthetically engineered and public datasets. We show that HPS can dramatically reduce end-to-end inference latency, achieving a 5-62x speedup (depending on the batch size) for popular recommendation models over CPU baseline implementations. Through multi-GPU concurrent deployment, HPS can greatly increase the inference QPS.
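A hierarchical lookup of this kind can be sketched with a small LRU cache in front of a larger backing table. This toy keeps only the caching logic; the GPU-resident cache, SSD tier, and parallel batched lookup that HPS actually provides are omitted, and the class and sizes are illustrative:

```python
from collections import OrderedDict

class HierarchicalEmbeddingStore:
    """Toy two-tier embedding lookup: a small LRU cache (stand-in for the
    GPU embedding cache) in front of a larger backing dict (stand-in for
    the CPU-memory / SSD tiers of an HPS-style hierarchy)."""

    def __init__(self, backing, cache_size=2):
        self.backing = backing          # full embedding table (slow tier)
        self.cache = OrderedDict()      # hot embeddings (fast tier)
        self.cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        vec = self.backing[key]          # fetch from the slower tier
        self.cache[key] = vec            # promote into the cache
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return vec
```

The performance story in the paper comes precisely from keeping hot embeddings in the fast tier so most lookups never touch the slower storage levels.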
Full text in ACM Digital Library
|
- IN Evaluation Framework for Cold-Start Techniques in Large-Scale Production Settings
by Moran Haham (Outbrain, Israel)
Mitigating cold-start situations is a fundamental problem in almost any recommender system. In real-life, large-scale production systems, the challenge of optimizing the cold-start strategy is even greater.
We present an end-to-end framework for evaluating and comparing different cold-start strategies. By applying this framework in Outbrain’s recommender system, we were able to reduce our cold-start costs by half, while supporting both offline and online settings. Our framework solves the pain of benchmarking numerous cold-start techniques using surrogate accuracy metrics on offline datasets – coupled with an extensive, cost-controlled online A/B test.
In my talk, I’ll start with a short introduction to the cold-start challenge in recommender systems. Next, I will explain the motivation for a framework for cold-start techniques. I will then describe – step by step – how we used the framework to cut the size of our exploration data by more than 50%.
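The offline stage of such a framework, benchmarking candidate cold-start strategies with a surrogate accuracy metric before any A/B test, might look like the following sketch. The hit-rate@k metric, function names, and data shapes are illustrative assumptions, not Outbrain's implementation:

```python
def hit_rate_at_k(strategy, cold_users, held_out, k=3):
    """Surrogate offline metric: fraction of cold users whose held-out
    first clicked item appears in the strategy's top-k recommendations."""
    hits = sum(1 for u in cold_users if held_out[u] in strategy(u)[:k])
    return hits / len(cold_users)

def evaluate_cold_start(strategies, cold_users, held_out, k=3):
    """Rank candidate cold-start strategies by the surrogate metric,
    best first; the winners would then graduate to a cost-controlled
    online A/B test."""
    scores = {name: hit_rate_at_k(fn, cold_users, held_out, k)
              for name, fn in strategies.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Screening strategies offline this way is what lets the online exploration budget be spent only on the few techniques worth testing live.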
Full text in ACM Digital Library
|
- IN Timely Personalization at Peloton: A System and Algorithm for Boosting Time-Relevant Content
by Shayak Banerjee (Peloton Interactive, Inc., United States), Vijay Pappu (Peloton Interactive, Inc., United States), Nilothpal Talukder (Peloton Interactive, Inc., United States), Shoya Yoshida (Peloton Interactive, Inc., United States), Arnab Bhadury (Peloton Interactive, Inc., United States), Allison Schloss (Peloton Interactive, Inc., United States), Jasmine Paulino (Peloton Interactive, Inc., United States)
Peloton has a subscription-based service offering access to a rich catalog of high-quality fitness classes. Not only is there wide diversity among these classes, but the inventory is constantly changing. This dynamic inventory introduces a new challenge for our recommender systems: surfacing timely content in addition to relevant content. We are often faced with a set of classes that are timely only during a narrow window, for example, holiday-themed classes. During this window, they need to reach a sizable audience to satisfy business goals, while also preserving user engagement. However, naïvely surfacing timely content has the potential to hurt user engagement goals. We have to factor in individual interests when choosing whom to show this timely content. Our recommender system, which is already aware of users’ interests, is best placed to balance relevance and timeliness.
In this talk, we will show how we have created a system and algorithms for artificially increasing the impressions of selected sets of classes, which we call boosting. We open up control over our recommender systems to our marketing and production partners, who can enter timed boosts for selected classes. We show how these boosts are then honored by both our batch and real-time recommendation engines, which selectively rank the boosted classes higher to give them more visibility. We will discuss a naïve boosting algorithm, which produced high lifts in impressions but at the cost of user engagement. We will then demonstrate a batch optimization approach that strikes a better balance between engagement and impressions. We will look at the results of several A/B tests on our user base, and end with a summary of the benefits of this system as well as emerging challenges.
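The naive time-windowed boosting discussed in the talk can be sketched as a score multiplier that is active only while the boost window is open. Item names, weights, and data shapes here are illustrative assumptions:

```python
from datetime import datetime

def apply_boosts(scores, boosts, now):
    """Naive boosting (sketch): multiply an item's relevance score by its
    boost weight while the boost's time window is active, then re-rank.
    `scores` maps item -> relevance; `boosts` maps item -> (start, end,
    weight). Returns items sorted best-first."""
    adjusted = {}
    for item, score in scores.items():
        weight = 1.0
        boost = boosts.get(item)
        if boost is not None:
            start, end, w = boost
            if start <= now <= end:   # boost applies only inside its window
                weight = w
        adjusted[item] = score * weight
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

Because the multiplier ignores individual interests, this is exactly the approach that lifts impressions at the cost of engagement; the batch optimization in the talk replaces the fixed weight with a per-user trade-off.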
Full text in ACM Digital Library
|
- PA EANA: Reducing Privacy Risk on Large-scale Recommendation Models
by Lin Ning (Google Research, United States), Steve Chien (Google Research, United States), Shuang Song (Google Research, United States), Mei Chen (Google, United States), Qiqi Xue (Google, United States), Devora Berlowitz (Google Research, United States)
Embedding-based deep neural networks (DNNs) are widely used in large-scale recommendation systems. Differentially-private stochastic gradient descent (DP-SGD) provides a way to enable personalized experiences while preserving user privacy by injecting noise into every model parameter during the training process. However, it is challenging to apply DP-SGD to large-scale embedding-based DNNs due to its effect on training speed. This happens because the noise added by DP-SGD causes normally sparse gradients to become dense, introducing a large communication overhead between workers and parameter servers in a typical distributed training framework. This paper proposes embedding-aware noise addition (EANA) to mitigate the communication overhead, making training a large-scale embedding-based DNN possible. We examine the privacy benefit of EANA both analytically and empirically using secret sharer techniques. We demonstrate that training with EANA can achieve reasonable model precision while providing good practical privacy protection as measured by the secret sharer tests. Experiments on a real-world, large-scale dataset and model show that EANA is much faster than standard DP-SGD, improving the training speed by 54X and unblocking the training of a large-scale embedding-based DNN with reduced privacy risk.
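The core idea, adding clipped noise only to the embedding rows actually touched by a minibatch so that updates stay sparse, can be sketched as follows. This is a simplified illustration of embedding-aware noise addition, not the paper's exact mechanism, and the hyperparameters are assumptions:

```python
import random

def eana_noisy_update(embeddings, sparse_grads, lr=0.1,
                      noise_std=0.01, clip=1.0, seed=0):
    """Embedding-aware noisy SGD step (sketch): clip and noise only the
    rows with non-zero gradients. Standard DP-SGD would add noise to
    every row, densifying the gradient and inflating the communication
    between workers and parameter servers; here untouched rows are never
    written, so the update remains sparse."""
    rng = random.Random(seed)
    for row, grad in sparse_grads.items():
        # Clip the per-row gradient to bound its L2 norm.
        norm = sum(g * g for g in grad) ** 0.5
        scale = min(1.0, clip / (norm + 1e-12))
        # Add Gaussian noise only to this accessed row, then apply SGD.
        noisy = [g * scale + rng.gauss(0.0, noise_std) for g in grad]
        embeddings[row] = [w - lr * g for w, g in zip(embeddings[row], noisy)]
    return embeddings
```

The sparsity is visible directly: rows absent from `sparse_grads` are bit-identical before and after the step, which is what keeps worker-to-parameter-server traffic proportional to the minibatch rather than to the full table.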
Full text in ACM Digital Library
|