Session 7: Scalable Performance

Date: Wednesday 14:45-15:30 CET
Chair: Neil Hurley (University College Dublin)

  • PALocal Factor Models for Large-Scale Inductive Recommendation
    by Longqi Yang (Microsoft, United States), Tobias Schnabel (Microsoft, United States), Paul N. Bennett (Microsoft, United States), and Susan Dumais (Microsoft, United States)

    In many domains, user preferences are similar locally within like-minded subgroups of users, but typically differ globally between those subgroups. Local recommendation models were shown to substantially improve top-K recommendation performance in such settings. However, existing local models do not scale to large-scale datasets with an increasing number of subgroups and do not support inductive recommendations for users not appearing in the training set. Key reasons for this are that subgroup detection and recommendation get implemented as separate steps in the model or that local models are explicitly instantiated for each subgroup. In this paper, we propose an End-to-end Local Factor Model (Elfm) which overcomes these limitations by combining both steps and incorporating local structures through an inductive bias. Our model can be optimized end-to-end and supports incremental inference, does not require a full separate model for each subgroup, and has overall small memory and computational costs for incorporating local structures. Empirical results show that our method substantially improves recommendation performance on large-scale datasets with millions of users and items with considerably smaller model size. Our user study also shows that our approach produces coherent item subgroups which could aid in the generation of explainable recommendations.

    Full text in ACM Digital Library

  • PAcDLRM: Look Ahead Caching for Scalable Training of Recommendation Models
    by Keshav Balasubramanian (SCIP University of Southern California, United States), Abdulla Alshabanah (SCIP University of Southern California, United States), Joshua D Choe (SCIP University of Southern California, United States), and Murali Annavaram (SCIP University of Southern California, United States)

    Deep learning recommendation models (DLRMs) are typically composed of two sets of parameters: large embedding tables to handle sparse categorical inputs, and neural networks such as multi-layer perceptrons (MLPs) to handle dense non-categorical inputs. Current DLRM training practices keep both these parameters in GPU memory. But as the size of the embedding tables grow, this practice of storing model parameters in GPU memory requires dozens or even hundreds of GPUs. This is an unsustainable trend with severe environmental consequences. Furthermore, such a design forces only a few conglomerates to be the gate keepers of model training. In this work, we propose cDLRM which democratizes recommendation model training by allowing a user to train on a single GPU regardless of the size of embedding tables by storing all embedding tables in CPU memory. A CPU based pre-processor analyzes training batches to prefetch embedding table slices accessed by those batches and caches them in GPU memory just-in-time. An associated caching protocol on the GPU enables efficiently updating the cached embedding table parameters. cDLRM decouples the embedding table size demands from the number of GPUs needed for compute. We first demonstrate that with cDLRM it is possible to train a large recommendation model using a single GPU regardless of model size. We then demonstrate that with its unique caching strategy, cDLRM enables pure data parallel training. We use two publicly available datasets to show that a cDLRM achieves identical model accuracy compared to a baseline trained completely on GPUs, while benefiting from large reduction in GPU demand.

    Full text in ACM Digital Library

  • PAReverse Maximum Inner Product Search: How to efficiently find users who would like to buy my item?
    by Daichi Amagata (Osaka University, Japan) and Takahiro Hara (Osaka University, Japan)

    The MIPS (maximum inner product search), which finds the item with the highest inner product with a given query user, is an essential problem in the recommendation field. It is usual that e-commerce companies face situations where they want to promote and sell new or discounted items. In these situations, we have to consider a question: who are interested in the items and how to find them? This paper answers this question by addressing a new problem called reverse maximum inner product search (reverse MIPS). Given a query vector and two sets of vectors (user vectors and item vectors), the problem of reverse MIPS finds a set of user vectors whose inner product with the query vector is the maximum among the query and item vectors. Although the importance of this problem is clear, its straightforward implementation incurs a computationally expensive cost.
    We therefore propose Simpfer, a simple, fast, and exact algorithm for reverse MIPS. In an offline phase, Simpfer builds a simple index that maintains a lower-bound of the maximum inner product. By exploiting this index, Simpfer judges whether the query vector can have the maximum inner product or not, for a given user vector, in a constant time. Besides, our index enables filtering user vectors, which cannot have the maximum inner product with the query vector, in a batch. We theoretically demonstrate that Simpfer outperforms baselines employing state-of-the-art MIPS techniques. Furthermore, our extensive experiments on real datasets show that Simpfer is about 500–8000 times faster than the baselines.

    Full text in ACM Digital Library

Platinum Supporters
Gold Supporters
Silver Supporters
Special Supporter