RecSys 2022 - Session 2: Sequential Recommendation - RecSys

Session: Sequential Recommendation

Date: Monday September 19, 2:00 PM – 3:30 PM (PDT)

[PA] Aspect Re-distribution for Learning Better Item Embeddings in Sequential Recommendation
by Wei Cai (Zhejiang University, China), Weike Pan (Shenzhen University, China), Jingwen Mao (Computer Science, China), Zhechao Yu (Zhejiang University, China), Congfu Xu (Zhejiang University, China)

Sequential recommendation has attracted a lot of attention from both academia and industry. Since item embeddings directly affect the recommendation results, the way they are learned is very important. However, most existing sequential models may introduce bias when updating the item embeddings. For example, in a sequence where all items are endorsed by the same celebrity, the co-occurrence of two items only indicates their similarity in terms of endorser and is independent of other aspects such as category and color. Existing models often update the entire item as a whole, or update its different aspects without distinction, and therefore fail to capture how much each aspect contributes to the co-occurrence pattern. To overcome these limitations, we propose aspect re-distribution (ARD), which focuses on updating the aspects that are important for co-occurrence. Specifically, we represent an item using several aspect embeddings with the same initial importance. We then re-calculate the importance of each aspect according to the other items in the sequence. Finally, we aggregate these aspect embeddings into a single aspect-aware embedding according to their importance. The aspect-aware embedding can be provided as input to a successor sequential model, and updates of the aspect-aware embedding are passed back to the aspect embeddings according to their importance. Therefore, unlike existing models, our method pays more attention to updating the important aspects. In our experiments, we choose self-attention networks as the successor model. The experimental results on four real-world datasets indicate that our method achieves very promising performance compared with seven state-of-the-art models. For reproduction, we will release the data and code at https://anonymous.4open.science/r/anonymity-C082.
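The three ARD steps described above (equal initial importance, re-weighting against the other items in the sequence, importance-weighted aggregation) can be sketched in plain Python. This is a minimal, hypothetical illustration rather than the authors' method: the scoring rule (dot product between each aspect embedding and the mean embedding of the other items, followed by a softmax) is an assumption, the other items are simplified to single vectors, and the function names are invented for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aspect_aware_embedding(aspects: List[Vector], context: List[Vector]) -> Vector:
    """Hypothetical ARD-style aggregation for a single item.

    aspects: the item's K aspect embeddings.
    context: embeddings of the other items in the same sequence.
    """
    dim, k = len(aspects[0]), len(aspects)
    # Step 1: every aspect starts with the same importance 1/K
    # (this uniform weighting is kept when there is no context).
    importance = [1.0 / k] * k
    if context:
        # Step 2 (assumed scoring rule): score each aspect against the mean of
        # the other items and renormalise the scores with a softmax.
        mean_ctx = [sum(v[d] for v in context) / len(context) for d in range(dim)]
        importance = softmax([dot(a, mean_ctx) for a in aspects])
    # Step 3: importance-weighted sum of the aspect embeddings.
    return [sum(w * a[d] for w, a in zip(importance, aspects)) for d in range(dim)]


if __name__ == "__main__":
    item = [[1.0, 0.0], [0.0, 1.0]]    # e.g. an "endorser" aspect and a "category" aspect
    others = [[2.0, 0.0], [3.0, 0.0]]  # the rest of the sequence shares the endorser
    print(aspect_aware_embedding(item, others))  # weight shifts toward aspect 0
```

Because the output is a weighted sum, in an autodiff framework the gradient reaching each aspect embedding is scaled by that aspect's weight (plus a term flowing through the softmax), which is one way to read "updates are passed back according to their importance".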

Full text in ACM Digital Library

[PA] Defending Substitution-based Profile Pollution Attacks on Sequential Recommenders
by Zhenrui Yue (University of Illinois Urbana-Champaign, United States), Huimin Zeng (University of Illinois Urbana-Champaign, United States), Ziyi Kou (University of Illinois Urbana-Champaign, United States), Lanyu Shang (University of Illinois Urbana-Champaign, United States), Dong Wang (University of Illinois at Urbana-Champaign, United States)

While sequential recommender systems achieve significant improvements in capturing user dynamics, we argue that sequential recommenders are vulnerable to substitution-based profile pollution attacks. To demonstrate this hypothesis, we propose a substitution-based adversarial attack algorithm, which modifies the input sequence by selecting certain vulnerable elements and substituting them with adversarial items. In both the untargeted and targeted attack scenarios, we observe significant performance deterioration using the proposed profile pollution algorithm. Motivated by these observations, we design an adversarial defense method called Dirichlet neighborhood sampling. Specifically, we sample item embeddings from a convex hull constructed by multi-hop neighbors to replace the original items in input sequences. During sampling, a Dirichlet distribution is used to approximate the probability distribution in the neighborhood, so that the recommender learns to combat local perturbations. Additionally, we design an adversarial training method tailored to sequential recommender systems. In particular, we represent selected items with one-hot encodings and perform gradient ascent on the encodings to search for the worst-case linear combination of item embeddings during training. As such, the embedding function learns robust representations, and the trained recommender is resistant to test-time adversarial examples. Extensive experiments show the effectiveness of both our attack and defense methods, which consistently outperform baselines by a significant margin across model architectures and datasets.
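The core of Dirichlet neighborhood sampling, replacing an item embedding by a random convex combination of nearby embeddings, can be sketched with the standard library alone. This is a hypothetical sketch under stated assumptions: how the multi-hop neighbors are found is not shown, the item itself is assumed to be one vertex of the hull, and a symmetric concentration parameter is assumed. The Dirichlet sample is drawn by normalising independent Gamma variates.

```python
import random
from typing import List, Optional, Sequence

Vector = List[float]


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    # Normalising independent Gamma(alpha_i, 1) draws gives a Dirichlet(alpha) sample.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_neighborhood_sample(
    item: Vector,
    neighbors: List[Vector],
    concentration: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Hypothetical sketch: return a random point inside the convex hull of
    the item and its neighbours, with convex weights drawn from a symmetric
    Dirichlet distribution."""
    rng = rng or random.Random()
    points = [item] + neighbors  # assumption: the item itself is one hull vertex
    weights = sample_dirichlet([concentration] * len(points), rng)
    dim = len(item)
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    item = [0.0, 0.0]
    neighbors = [[1.0, 0.0], [0.0, 1.0]]  # e.g. 1-hop and 2-hop neighbour embeddings
    print(dirichlet_neighborhood_sample(item, neighbors, rng=rng))
```

With a symmetric Dirichlet, concentrations below 1 push samples toward individual vertices, while large concentrations pull them toward the centroid, so the parameter controls how far the training input can drift from any single item.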

Full text in ACM Digital Library

[PA] Context and Attribute-Aware Sequential Recommendation via Cross-Attention
by Ahmed Rashed (University of Hildesheim, Germany), Shereen Elsayed (University of Hildesheim, Germany), Lars Schmidt-Thieme (University of Hildesheim, Germany)

In sparse recommender settings, users' context and item attributes play a crucial role in deciding which items to recommend next. Despite that, recent works in sequential and time-aware recommendation usually either ignore both aspects or consider only one of them, limiting their predictive performance. In this paper, we address these limitations by proposing a context and attribute-aware recommender model (CARCA) that captures the dynamic nature of user profiles in terms of contextual features and item attributes via dedicated multi-head self-attention blocks that extract profile-level features and predict item scores. Also, unlike many current state-of-the-art sequential item recommendation approaches that use a simple dot product between the most recent item's latent features and the target items' embeddings for scoring, CARCA uses cross-attention between all profile items and the target items to predict their final scores. This cross-attention allows CARCA to harness the correlation between old and recent items in the user profile and their influence on deciding which item to recommend next. Experiments on four real-world recommender system datasets show that the proposed model significantly outperforms all state-of-the-art models in the task of item recommendation, achieving improvements of up to 53% in Normalized Discounted Cumulative Gain (NDCG) and Hit-Ratio. Results also show that CARCA outperforms several state-of-the-art dedicated image-based recommender systems by merely utilizing image attributes extracted from a pre-trained ResNet50 in a black-box fashion.
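The contrast between last-item dot-product scoring and cross-attention scoring can be illustrated with a toy example. This is a hypothetical, single-head sketch with no learned projections and no context or attribute features; it is not CARCA's architecture. The target embedding acts as the query, every profile item acts as both key and value, and the score is the dot product between the target and its attention-weighted profile summary.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def last_item_score(profile: List[Vector], target: Vector) -> float:
    # Common baseline: only the most recent profile item is compared with the target.
    return dot(profile[-1], target)


def cross_attention_score(profile: List[Vector], target: Vector) -> float:
    """Hypothetical single-head cross-attention score: the target acts as the
    query and every profile item as a key/value (no learned projections)."""
    scale = math.sqrt(len(target))
    logits = [dot(p, target) / scale for p in profile]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted summary of the whole profile, specific to this target.
    summary = [sum(w * p[i] for w, p in zip(weights, profile)) for i in range(len(target))]
    return dot(summary, target)


if __name__ == "__main__":
    profile = [[1.0, 0.0], [0.0, 1.0]]  # an older item, then the most recent one
    target = [1.0, 0.0]                 # resembles the older item only
    print(last_item_score(profile, target), cross_attention_score(profile, target))
```

In this toy case the last-item score is zero because the target resembles only the older item, whereas the cross-attention score is positive: each target builds its own summary of the whole profile, which is the kind of old-versus-recent interaction the abstract describes.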

Full text in ACM Digital Library

[PA] Effective and Efficient Training for Sequential Recommendation using Recency Sampling
by Aleksandr Petrov (University of Glasgow, United Kingdom), Craig Macdonald (University of Glasgow, United Kingdom)

Many modern sequential recommender systems use deep neural networks, which can effectively estimate the relevance of items but require a lot of time to train. Slow training increases expenses, hinders product development timescales and prevents the model from being regularly updated to adapt to changing user preferences. Training such sequential models involves appropriately sampling past user interactions to create a realistic training objective. The existing training objectives have limitations. For instance, next item prediction never uses the beginning of the sequence as a learning target, thereby potentially discarding valuable data. On the other hand, the item masking used by BERT4Rec is only weakly related to the goal of the sequential recommendation; therefore, it requires much more time to obtain an effective model. Hence, we propose a novel Recency-based Sampling of Sequences training objective that addresses both limitations. We apply our method to various recent and state-of-the-art model architectures, such as GRU4Rec, Caser, and SASRec. We show that the models enhanced with our
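The idea of recency-based target sampling can be sketched as weighted sampling over sequence positions. The abstract does not specify the weighting function or how inputs and targets are paired, so both are assumptions here: an exponential recency weight `alpha ** (n - 1 - i)` and a split in which the sampled positions become targets and the remaining items, in order, become the input. The function names and the `alpha` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def recency_weights(n: int, alpha: float = 0.8) -> List[float]:
    # Assumed exponential recency importance: the last position gets weight 1,
    # the one before it alpha, then alpha**2, and so on back to the start.
    return [alpha ** (n - 1 - i) for i in range(n)]


def recency_sample(
    seq: Sequence[int],
    num_targets: int,
    alpha: float = 0.8,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: pick target positions with probability proportional
    to their recency weight, and return (input items, target items)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    rng = rng or random.Random()
    n = len(seq)
    weights = recency_weights(n, alpha)
    chosen = set()
    # Weighted sampling without replacement: zero a position's weight once picked.
    while len(chosen) < min(num_targets, n):
        pos = rng.choices(range(n), weights=weights, k=1)[0]
        chosen.add(pos)
        weights[pos] = 0.0
    targets = [seq[i] for i in sorted(chosen)]
    inputs = [seq[i] for i in range(n) if i not in chosen]
    return inputs, targets


if __name__ == "__main__":
    rng = random.Random(42)
    print(recency_sample([10, 11, 12, 13, 14, 15], num_targets=2, rng=rng))
```

Under this weighting every position, including the start of the sequence, has a non-zero chance of becoming a target (unlike next-item prediction), while recent items are still favoured (unlike uniform masking).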

Full text in ACM Digital Library

[REP] A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation
by Aleksandr Petrov (University of Glasgow, United Kingdom), Craig Macdonald (University of Glasgow, United Kingdom)

BERT4Rec is an effective model for sequential recommendation based on the Transformer architecture. In the original publication, BERT4Rec claimed superiority over other available sequential recommendation approaches (e.g. SASRec), and it is now frequently used as a state-of-the-art baseline for sequential recommendation. However, not all later publications confirmed this result, and some proposed other models that were shown to outperform BERT4Rec in effectiveness. In this paper, we systematically review all publications that compare BERT4Rec with another popular Transformer-based model, namely SASRec, and show that BERT4Rec results are not consistent across these publications. To understand the reasons behind this inconsistency, we analyse the available implementations of BERT4Rec and show that we fail to reproduce the results of the original BERT4Rec publication when using its default configuration parameters. However, we are able to replicate the reported results with the original code if we train for much longer (up to 30x) than the default configuration. We also propose our own implementation of BERT4Rec based on the Hugging Face Transformers library, which is demonstrated to replicate the originally reported results on 3 out of 4 datasets while requiring up to 95% less training time to converge. Overall, from our systematic review and detailed experiments, we conclude that BERT4Rec does indeed exhibit state-of-the-art effectiveness for sequential recommendation, but only when trained for a sufficient amount of time. Additionally, we show that our implementation can further benefit from adapting other Transformer architectures available in the Hugging Face Transformers library, such as DeBERTa or ALBERT. For example, on the MovieLens-1M dataset, we demonstrate that both of these models can improve BERT4Rec performance by up to 9%. Moreover, we show that an ALBERT-based BERT4Rec model achieves better performance on that dataset than the state-of-the-art results reported in the most recent publications.

Full text in ACM Digital Library

[PA] Denoising Self-Attentive Sequential Recommendation
by Huiyuan Chen (Visa Research, United States), Yusan Lin (Visa Research, United States), Menghai Pan (Visa Research, United States), Lan Wang (Visa Research, United States), Chin-Chia Michael Yeh (Visa Inc, United States), Xiaoting Li (Visa Research, United States), Yan Zheng (Visa Research, United States), Fei Wang (Visa Research, United States), Hao Yang (Visa Research, United States)

Transformer-based sequential recommenders are very powerful at capturing both short-term and long-term sequential item dependencies. This is mainly attributed to their self-attention networks, which exploit pairwise item-item interactions within the sequence. However, real-world item sequences are often noisy, which is particularly true for implicit feedback. For example, a large portion of clicks do not align well with user preferences, and many products end up with negative reviews or being returned. As such, the current user action depends only on a subset of items, not on the entire sequence. Many existing Transformer-based models use full attention distributions, which inevitably assign some credit to irrelevant items. This may lead to sub-optimal performance if Transformers are not properly regularized.

Here we propose the Rec-denoiser model for better training of self-attentive recommender systems. In Rec-denoiser, we aim to adaptively prune noisy items that are unrelated to the next-item prediction. To achieve this, we simply attach a trainable binary mask to each self-attention layer to prune noisy attentions, resulting in sparse and clean attention distributions. This largely purifies item-item dependencies and provides better model interpretability. In addition, the self-attention network is typically not Lipschitz continuous and is vulnerable to small perturbations. Jacobian regularization is further applied to the Transformer blocks to improve their robustness to noisy sequences. Our Rec-denoiser is a general plugin that is compatible with many Transformers. Quantitative results on real-world datasets show that Rec-denoiser outperforms the state-of-the-art baselines.
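The effect of a binary mask on an attention distribution can be shown with a masked, row-wise softmax. This hypothetical sketch covers only the forward effect of a given 0/1 mask; how Rec-denoiser learns the mask and the Jacobian regularization term are not reproduced here, and the function name is invented for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_attention_weights(logits: Matrix, mask: List[List[int]]) -> Matrix:
    """Row-wise softmax over attention logits restricted to entries whose
    binary mask value is 1; pruned (mask 0) entries get exactly zero weight."""
    weights = []
    for row_logits, row_mask in zip(logits, mask):
        kept = [x for x, m in zip(row_logits, row_mask) if m]
        if not kept:
            raise ValueError("every query must keep at least one key")
        peak = max(kept)
        exps = [math.exp(x - peak) if m else 0.0 for x, m in zip(row_logits, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    logits = [[1.0, 2.0, 3.0]]  # one query attending over three items
    full = masked_attention_weights(logits, [[1, 1, 1]])
    pruned = masked_attention_weights(logits, [[1, 0, 1]])  # item 1 judged noisy
    print(full, pruned)
```

With full attention every item receives some credit; once an entry is masked, its weight is exactly zero and the remaining mass is renormalised over the kept items. Learning a discrete 0/1 mask by gradient descent requires some relaxation or gradient estimator, which this sketch leaves out.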

Full text in ACM Digital Library

RecSys 2022 (Seattle)
