Session: Large-Scale Recommendation

Date: Wednesday, September 21, 2:00 PM – 3:30 PM (PDT)

  • [IN] Optimizing product recommendations for millions of merchants
    by Kim Falk (Shopify, Canada), Chen Karako (Shopify, Canada)

    At Shopify, we serve product recommendations to customers across millions of merchants’ online stores. It is a challenge to provide optimized recommendations to all of these independent merchants; one model might improve our metrics in aggregate but significantly degrade recommendations for some stores. To ensure we provide high-quality recommendations to all merchant segments, we develop several models that work best in different situations based on offline evaluation. Learning which strategy works best for a given segment also allows us to start off new stores with good recommendations, without necessarily needing to rely on an individual store amassing large amounts of traffic. In production, the system starts out with the best strategy for a given merchant, and then adjusts to the current environment using multi-armed bandits. Collectively, this methodology allows us to optimize the types of recommendations served on each store.

    Full text in ACM Digital Library
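
The abstract above describes starting each store on the strategy that offline evaluation favors for its segment and then adapting with multi-armed bandits. A minimal sketch of that idea follows, assuming Beta-Bernoulli Thompson sampling and click feedback; the abstract does not say which bandit algorithm or reward signal is used, and the class name, strategy names, and `prior_strength` parameter are hypothetical.

```python
import random


class SegmentPriorBandit:
    """Thompson sampling over recommendation strategies (illustrative only).

    Assumption: rewards are binary (click / no click) and the strategy that
    won offline evaluation for the store's segment gets extra prior successes.
    """

    def __init__(self, strategies, best_offline, prior_strength=10.0, seed=None):
        if best_offline not in strategies:
            raise ValueError("best_offline must be one of the strategies")
        self._rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior over each strategy's click rate.
        self.alpha = {s: 1.0 for s in strategies}
        self.beta = {s: 1.0 for s in strategies}
        # Bias the prior toward the offline winner so a new store starts there.
        self.alpha[best_offline] += prior_strength

    def choose(self):
        """Sample a click rate per strategy and serve the highest sample."""
        samples = {
            s: self._rng.betavariate(self.alpha[s], self.beta[s])
            for s in self.alpha
        }
        return max(samples, key=samples.get)

    def update(self, strategy, clicked):
        """Record one observed outcome for the strategy that was served."""
        if clicked:
            self.alpha[strategy] += 1.0
        else:
            self.beta[strategy] += 1.0

    def posterior_mean(self, strategy):
        """Current estimate of the strategy's click rate."""
        return self.alpha[strategy] / (self.alpha[strategy] + self.beta[strategy])


if __name__ == "__main__":
    # Hypothetical strategies and click rates, for demonstration only.
    true_rates = {"similar_items": 0.03, "bestsellers": 0.06, "recently_viewed": 0.04}
    bandit = SegmentPriorBandit(list(true_rates), best_offline="similar_items", seed=7)
    env = random.Random(11)
    for _ in range(5000):
        arm = bandit.choose()
        bandit.update(arm, env.random() < true_rates[arm])
    for name in true_rates:
        print(name, round(bandit.posterior_mean(name), 3))
```

The prior only sets the starting point: as a store accumulates traffic, the observed outcomes outweigh the pseudo-counts, so a store that differs from its segment can still move to another strategy.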

  • [PA] A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models
    by Yingcan Wei (NVIDIA, China), Matthias Langer (NVIDIA, China), Fan Yu (NVIDIA, China), Minseok Lee (NVIDIA, Korea, Republic of), Jie Liu (NVIDIA, China), Ji Shi (NVIDIA, China), Zehuan Wang (NVIDIA, China)

    Recommendation systems are of crucial importance for a variety of modern apps and web services, such as news feeds, social networks, e-commerce, and search. To achieve peak prediction accuracy, modern recommendation models combine deep learning with terabyte-scale embedding tables to obtain a fine-grained representation of the underlying data. Traditional inference serving architectures require deploying the whole model to standalone servers, which is infeasible at such massive scale.
    In this paper, we provide insights into the intriguing and challenging inference domain of online recommendation systems. We propose the HugeCTR Hierarchical Parameter Server (HPS), an industry-leading distributed recommendation inference framework that combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. Among other things, our HPS features (1) a redundant hierarchical storage system, (2) a novel high-bandwidth cache to accelerate parallel embedding lookup on NVIDIA GPUs, (3) online training support, and (4) lightweight APIs for integration into existing large-scale recommendation workflows. To demonstrate the capabilities of HPS, we conducted extensive studies using both synthetically engineered and public datasets. We show that HPS can dramatically reduce the end-to-end inference latency, achieving a 5x to 62x speedup (depending on the batch size) for popular recommendation models over CPU baseline implementations. Through multi-GPU concurrent deployment, HPS can greatly increase the inference QPS.

    Full text in ACM Digital Library
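
The abstract above pairs a fast embedding cache with a hierarchical storage backend. The sketch below illustrates only the general lookup pattern (check the fastest tier first, fall through to slower tiers, promote on a miss). It is not the HugeCTR HPS API: the class names, the choice of two LRU tiers over a plain dictionary, and the capacities are all assumptions made for illustration, and everything runs on the CPU.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache standing in for one storage tier."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

    def __contains__(self, key):
        return key in self._data


class HierarchicalLookup:
    """Tiered embedding lookup (hypothetical, not the HPS implementation).

    l1 stands in for a small device-side cache, l2 for a larger host-memory
    tier, and `store` for the full embedding table held in slower storage.
    """

    def __init__(self, store, l1_capacity, l2_capacity):
        self.l1 = LRUCache(l1_capacity)
        self.l2 = LRUCache(l2_capacity)
        self.store = store
        self.hits = {"l1": 0, "l2": 0, "store": 0}

    def lookup(self, key):
        vec = self.l1.get(key)
        if vec is not None:
            self.hits["l1"] += 1
            return vec
        vec = self.l2.get(key)
        if vec is not None:
            self.hits["l2"] += 1
        else:
            vec = self.store[key]  # slowest tier always holds every key
            self.hits["store"] += 1
            self.l2.put(key, vec)
        self.l1.put(key, vec)  # promote so repeated keys are served fast
        return vec

    def lookup_batch(self, keys):
        return [self.lookup(k) for k in keys]


if __name__ == "__main__":
    table = {i: [float(i)] * 4 for i in range(1000)}
    hps_like = HierarchicalLookup(table, l1_capacity=8, l2_capacity=64)
    hps_like.lookup_batch([1, 2, 3, 1, 2, 3, 500])
    print(hps_like.hits)
```

In a real deployment the batch lookup would be executed in parallel on the GPU rather than one key at a time; the loop here only keeps the tier-by-tier control flow easy to follow.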

  • [IN] Evaluation Framework for Cold-Start Techniques in Large-Scale Production Settings
    by Moran Haham (Outbrain, Israel)

    Mitigating cold-start situations is a fundamental problem in almost any recommender system. In real-life, large-scale production systems, the challenge of optimizing the cold-start strategy is even greater.

    We present an end-to-end framework for evaluating and comparing different cold-start strategies. By applying this framework in Outbrain’s recommender system, we were able to reduce our cold-start costs by half, while supporting both offline and online settings. Our framework solves the pain of benchmarking numerous cold-start techniques using surrogate accuracy metrics on offline datasets – coupled with an extensive, cost-controlled online A/B test.

    In my talk, I’ll start with a short introduction to the cold-start challenge in recommender systems. Next, I will explain the motivation for a framework for cold-start techniques. I will then describe – step by step – how we used the framework to cut the size of our exploration data by more than 50%.

    Full text in ACM Digital Library
