- Transformers4Rec: Bridging the Gap between NLP and Sequential / Session-Based Recommendation
by Gabriel de Souza Pereira Moreira (NVIDIA, Brazil), Sara Rabhi (NVIDIA, Canada), Jeong Min Lee (Facebook AI, United States), Ronay Ak (NVIDIA, United States), and Even Oldridge (NVIDIA, Canada)
Much of the recent progress in sequential and session-based recommendation has been driven by improvements in model architecture and pretraining techniques originating in the field of Natural Language Processing. Transformer architectures in particular have facilitated building higher-capacity models and provided data augmentation and training techniques which demonstrably improve the effectiveness of sequential recommendation. But with a thousandfold more research going on in NLP, the application of Transformers for recommendation understandably lags behind. To remedy this we introduce Transformers4Rec, an open-source library built upon HuggingFace’s Transformers library with a similar goal of opening up the advances of NLP-based Transformers to the recommender system community and making these advancements immediately accessible for the tasks of sequential and session-based recommendation. Like its core dependency, Transformers4Rec is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. In order to demonstrate the usefulness of the library and the applicability of Transformer architectures in next-click prediction for user sessions, where sequence lengths are much shorter than those commonly found in NLP, we have leveraged Transformers4Rec to win two recent session-based recommendation competitions. In addition, we present in this paper the first comprehensive empirical analysis comparing many Transformer architectures and training approaches for the task of session-based recommendation. We demonstrate that the best Transformer architectures have superior performance across two e-commerce datasets while performing similarly to the baselines on two news datasets.
We further evaluate in isolation the effectiveness of four training techniques for a single Transformer architecture, XLNet: causal language modeling, masked language modeling, permutation language modeling, and replacement token detection. We establish that training XLNet with replacement token detection performs well across all datasets. Finally, we explore techniques to include side information such as item and user context features in order to establish best practices, and show that the inclusion of side information uniformly improves recommendation performance. The Transformers4Rec library is available at https://github.com/NVIDIA-Merlin/Transformers4Rec/
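The training techniques compared above differ chiefly in how a session is turned into (input, label) pairs. The sketch below is not the Transformers4Rec API; it is a minimal stdlib-Python illustration of two of the objectives, assuming a hypothetical reserved mask id (`MASK_ID = 0`) and the HuggingFace convention of `-100` for positions the loss ignores.

```python
import random

MASK_ID = 0    # hypothetical reserved id for the [MASK] token
IGNORE = -100  # label value the loss ignores (HuggingFace convention)

def causal_lm_example(session):
    """Next-item prediction: predict items[t+1] from the prefix up to t."""
    return list(session[:-1]), list(session[1:])

def masked_lm_example(session, mask_prob=0.3, rng=None):
    """Randomly mask items; the model recovers them from both directions."""
    rng = rng or random.Random(42)
    inputs, labels = [], []
    for item in session:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(item)   # supervise only at masked positions
        else:
            inputs.append(item)
            labels.append(IGNORE)
    return inputs, labels
```

Permutation language modeling and replacement token detection follow the same pattern with different corruption schemes (factorization-order permutation, and a generator that substitutes plausible items, respectively).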
Full text in ACM Digital Library
- Sparse Feature Factorization for Recommender Systems with Knowledge Graphs
by Vito Walter Anelli (Polytechnic University of Bari, Italy), Tommaso Di Noia (Polytechnic University of Bari, Italy), Eugenio Di Sciascio (Politecnico di Bari, Italy), Antonio Ferrara (Politecnico di Bari, Italy), and Alberto Carlo Maria Mancino (Politecnico di Bari, Italy)
Deep Learning and factorization-based collaborative filtering recommendation models have undoubtedly dominated the scene of recommender systems in recent years. However, despite their outstanding performance, these methods require a training time proportional to the size of the embeddings, which grows further when side information is also considered in computing the recommendation list. In fact, with a large number of high-quality features, the resulting models are more complex and difficult to train. This paper addresses this problem by presenting KGFlex: a sparse factorization approach that grants an even greater degree of expressiveness. To achieve this result, KGFlex analyzes the historical data to understand the dimensions on which user decisions depend (e.g., movie direction, musical genre, nationality of a book's writer). KGFlex represents each item feature as an embedding and models user-item interactions as a factorized, entropy-driven combination of the item attributes relevant to the user. KGFlex eases the training process by letting each user update only the relevant features on which they base their decisions. In other words, the user-item prediction is mediated by the user’s personal view, which considers only relevant features. An extensive experimental evaluation shows the approach’s effectiveness in terms of recommendation accuracy, diversity, and induced bias. The public implementation of KGFlex is available at https://split.to/kgflex.
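The sparsity idea above can be sketched concretely: a user's score for an item sums only over the features the user's decisions depend on, weighted by an entropy-derived relevance. This is a minimal illustration, not the KGFlex implementation; it uses scalar per-feature values where the paper uses embeddings, and the feature names are invented for the example.

```python
def kgflex_score(user_relevant, item_features, feature_value, feature_weight):
    """Entropy-weighted sum over the item features this user cares about.

    user_relevant : set of feature ids the user's decisions depend on
    item_features : set of feature ids describing the item
    feature_value : per-feature latent value (scalar here; a vector in general)
    feature_weight: per-feature relevance weight (entropy-driven in KGFlex)
    """
    shared = user_relevant & item_features
    return sum(feature_weight[f] * feature_value[f] for f in shared)
```

Because the sum ranges only over `user_relevant & item_features`, a training step for one user touches a small slice of the parameters, which is the source of the claimed training efficiency.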
Full text in ACM Digital Library
- ProtoCF: Prototypical Collaborative Filtering for Few-shot Recommendation
by Aravind Sankar (University of Illinois at Urbana-Champaign, United States), Junting Wang (University of Illinois at Urbana-Champaign, United States), Adit Krishnan (University of Illinois at Urbana-Champaign, United States), and Hari Sundaram (University of Illinois at Urbana-Champaign, United States)
In recent times, deep learning methods have supplanted conventional collaborative filtering approaches as the backbone of modern recommender systems. However, their gains are skewed towards popular items, with a drastic performance drop for the vast collection of long-tail items with sparse interactions. Moreover, we empirically show that prior neural recommenders lack the resolution power to accurately rank relevant items within the long tail. In this paper, we formulate long-tail item recommendation as a few-shot learning problem: learning to recommend items with very few interactions. We propose a novel meta-learning framework, ProtoCF, that learns to compose robust prototype representations for few-shot items. ProtoCF utilizes episodic few-shot learning to extract meta-knowledge across a collection of diverse meta-training tasks designed to mimic item ranking within the tail. To further enhance discriminative power, we propose a novel architecture-agnostic technique based on knowledge distillation to extract, relate, and transfer knowledge from neural base recommenders. Our experimental results demonstrate that ProtoCF consistently outperforms state-of-the-art approaches on overall recommendation (by 5% Recall@50) while achieving significant gains (of 60-80% Recall@50) for tail items with fewer than 20 interactions.
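The prototype idea can be illustrated with the simplest possible composition rule. This is a hedged sketch, not ProtoCF's learned composition network: it averages the embeddings of the handful of users who interacted with a tail item (the "support set") and ranks by inner product.

```python
def prototype(support_embeddings):
    """Compose a tail item's prototype from its few interaction embeddings.

    Here: the elementwise mean. ProtoCF learns the composition instead.
    """
    n = len(support_embeddings)
    dim = len(support_embeddings[0])
    return [sum(v[d] for v in support_embeddings) / n for d in range(dim)]

def score(user_emb, proto):
    """Relevance of the item prototype to a user, as an inner product."""
    return sum(u * p for u, p in zip(user_emb, proto))
```

Episodic training then repeatedly samples such small support sets from head items so the composition rule generalizes to genuinely sparse tail items.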
Full text in ACM Digital Library
- Towards Source-Aligned Variational Models for Cross-Domain Recommendation
by Aghiles Salah (Rakuten Institute of Technology, France), Thanh Binh Tran (School of Computing and Information Systems Singapore Management University, Singapore), and Hady Lauw (School of Computing and Information Systems Singapore Management University, Singapore)
Data sparsity is a long-standing challenge in recommender systems. Among existing approaches to alleviate this problem, cross-domain recommendation consists of leveraging knowledge from a source domain or category (e.g., Movies) to improve item recommendation in a target domain (e.g., Books). In this work, we advocate a probabilistic approach to cross-domain recommendation and rely on variational autoencoders (VAEs) as our latent variable models. More precisely, we assume that we have access to a VAE trained on the source domain, which we seek to leverage to improve preference modeling in the target domain. To this end, we propose a model which jointly learns to fit the target observations and align its hidden space with the source latent space. Since we model the latent spaces by the variational posteriors, we operate at this level, and in particular, we investigate two approaches, namely rigid and soft alignment. In the former scenario, the variational model in the target domain is set equal to the source variational model; that is, we only learn a generative model in the target domain. In the soft-alignment scenario, the target VAE has its own variational model, which is encouraged to look like its source counterpart. We analyze the proposed objectives theoretically and conduct extensive experiments to illustrate the benefit of our contribution. Empirical results on six real-world datasets show that the proposed models outperform several comparable cross-domain recommendation models.
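Soft alignment is typically realized as a divergence penalty between the two variational posteriors. As a minimal sketch (assuming, as is standard for VAEs, diagonal-Gaussian posteriors; the exact penalty in the paper may differ), the closed-form per-dimension KL between target and source posteriors can be summed into an alignment term:

```python
import math

def kl_gauss(mu_t, sig_t, mu_s, sig_s):
    """Closed-form KL( N(mu_t, sig_t^2) || N(mu_s, sig_s^2) ), one dimension."""
    return (math.log(sig_s / sig_t)
            + (sig_t ** 2 + (mu_t - mu_s) ** 2) / (2 * sig_s ** 2)
            - 0.5)

def soft_alignment_penalty(target_post, source_post):
    """Sum per-dimension KL between the target and source variational posteriors.

    Each posterior is a list of (mean, std) pairs, one per latent dimension.
    """
    return sum(kl_gauss(mt, st, ms, ss)
               for (mt, st), (ms, ss) in zip(target_post, source_post))
```

Rigid alignment is the limiting case where the penalty is not needed because the target simply reuses the source encoder; soft alignment trades that constraint for a weighted penalty added to the target ELBO.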
Full text in ACM Digital Library
- Together is Better: Hybrid Recommendations Combining Graph Embeddings and Contextualized Word Representations
by Marco Polignano (University of Bari Aldo Moro, Italy), Cataldo Musto (University of Bari Aldo Moro, Italy), Marco de Gemmis (University of Bari Aldo Moro, Italy), Pasquale Lops (University of Bari Aldo Moro, Italy), and Giovanni Semeraro (University of Bari Aldo Moro, Italy)
In this paper, we present a hybrid recommendation framework based on the combination of graph embeddings and contextual word representations. Our approach is based on the intuition that each of the above-mentioned representations models heterogeneous (and equally important) information that is worth taking into account when generating a recommendation. Accordingly, we propose a strategy to combine the two feature sets, based on the following steps: first, we separately generate graph embeddings and contextual word representations by exploiting state-of-the-art techniques. Next, these embeddings are used to feed a deep architecture that learns a hybrid representation from the combination of the two groups of features. Finally, we exploit the resulting embedding to identify suitable recommendations. In the experimental session, we evaluate the effectiveness of our strategy on two datasets, and the results show that the use of a hybrid representation leads to an improvement in predictive accuracy. Moreover, our approach outperforms several competitive baselines, thus confirming the validity of this work.
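The fusion step can be sketched in its simplest form. This is an illustrative stand-in, not the paper's deep architecture: it concatenates the two views into one hybrid vector, where the paper instead learns the joint representation with a neural network.

```python
def fuse(graph_emb, word_emb):
    """Concatenate the two views into one hybrid vector.

    In the paper, a deep architecture learns the joint representation
    from this combined input rather than using raw concatenation.
    """
    return list(graph_emb) + list(word_emb)

def hybrid_score(user_vec, item_vec):
    """Relevance between hybrid user and item representations (dot product)."""
    return sum(u * i for u, i in zip(user_vec, item_vec))
```

The point of the hybrid view is that a graph embedding captures collaborative/structural signals while a contextualized word representation captures content semantics; neither alone covers both.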
Full text in ACM Digital Library
- Information Interactions in Outcome Prediction: Quantification and Interpretation using Stochastic Block Models
by Gaël Poux-Médard (ERIC Université de Lyon, France), Julien Velcin (ERIC Université de Lyon, France), and Sabine Loudcher (ERIC Université de Lyon, France)
In most real-world applications, it is seldom the case that a result appears independently from an environment. In social networks, users’ behavior results from the people they interact with, news in their feed, or trending topics. In natural language, the meaning of phrases emerges from the combination of words. In general medicine, a diagnosis is established on the basis of the interaction of symptoms. Here, we propose the Interacting Mixed Membership Stochastic Block Model (IMMSBM), which investigates the role of interactions between entities (hashtags, words, memes, etc.) and quantifies their importance within the aforementioned corpora. We find that in inference tasks, taking interactions into account leads to average relative changes, with respect to non-interacting models, of up to 150% in the probability of an outcome, and greatly improves prediction performance. Our findings suggest that neglecting interactions when modeling real-world phenomena might lead to incorrect conclusions being drawn.
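The core quantity in a mixed-membership block model of interacting entities can be written down compactly. The sketch below is a generic illustration (the paper's exact parameterization and inference procedure may differ): each entity has a membership distribution over latent blocks, and the outcome probability marginalizes over the block assignments of both interacting entities through a block-pair outcome tensor `B`.

```python
def outcome_prob(theta_i, theta_j, B, o):
    """P(outcome o | interacting entities i and j).

    theta_i, theta_j : mixed-membership distributions over latent blocks
    B[k][l][o]       : probability of outcome o when i is in block k
                       and j is in block l
    """
    return sum(theta_i[k] * theta_j[l] * B[k][l][o]
               for k in range(len(theta_i))
               for l in range(len(theta_j)))
```

Setting `theta_j` aside (or fixing `B[k][l] = B[k][l']` for all `l, l'`) recovers a non-interacting model, which is exactly the comparison the paper uses to quantify how much interactions change the predicted outcome probability.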
Full text in ACM Digital Library