Session: Sequential Recommendation

Date: Monday September 19, 2:00 PM – 3:30 PM (PDT)

  • Aspect Re-distribution for Learning Better Item Embeddings in Sequential Recommendation
    by Wei Cai (Zhejiang University, China), Weike Pan (Shenzhen University, China), Jingwen Mao (Computer Science, China), Zhechao Yu (Zhejiang University, China), Congfu Xu (Zhejiang University, China)

    Sequential recommendation has attracted a lot of attention from both academia and industry. Since item embeddings directly affect the recommendation results, their learning process is very important. However, most existing sequential models may introduce bias when updating the item embeddings. For example, in a sequence where all items are endorsed by the same celebrity, the co-occurrence of two items only indicates their similarity in terms of endorser, and is independent of other aspects such as category and color. The existing models often update the entire item as a whole or update different aspects of the item without distinction, which fails to capture the contributions of different aspects to the co-occurrence pattern. To overcome the above limitations, we propose aspect re-distribution (ARD) to focus on updating the aspects that are important for co-occurrence. Specifically, we represent an item using several aspect embeddings with the same initial importance. We then re-calculate the importance of each aspect according to the other items in the sequence. Finally, we aggregate these aspect embeddings into a single aspect-aware embedding according to their importance. The aspect-aware embedding can be provided as input to a successor sequential model. Updates of the aspect-aware embedding are passed back to the aspect embeddings based on their importance. Therefore, different from the existing models, our method pays more attention to updating the important aspects. In our experiments, we choose self-attention networks as the successor model. The experimental results on four real-world datasets indicate that our method achieves very promising performance in comparison with seven state-of-the-art models. For reproduction, we will release the data and code.

    Full text in ACM Digital Library
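    The importance re-calculation and aggregation steps described in the abstract can be sketched roughly as follows. The dot-product scoring and softmax weighting are assumptions for illustration only; the abstract does not specify ARD's exact importance function:

```python
import numpy as np

def aspect_aware_embedding(aspect_embs, context_emb):
    """Aggregate an item's aspect embeddings into one aspect-aware vector,
    weighting each aspect by its relevance to the rest of the sequence.

    aspect_embs: (num_aspects, dim) array, one row per aspect embedding.
    context_emb: (dim,) array summarising the other items in the sequence.
    Returns the aggregated embedding and the per-aspect importance weights.
    """
    # Re-calculate aspect importance from the context
    # (softmax over dot products; an illustrative choice, not the paper's exact rule).
    scores = aspect_embs @ context_emb
    scores = scores - scores.max()  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    # Weighted sum: in an autodiff framework, gradients on the result flow
    # back to each aspect embedding in proportion to its importance.
    return weights @ aspect_embs, weights
```

    In a trained model these weights would also route gradient updates, so aspects that explain the co-occurrence (e.g. the shared endorser) receive larger updates than unrelated aspects such as color.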

  • Context and Attribute-Aware Sequential Recommendation via Cross-Attention
    by Ahmed Rashed (University of Hildesheim, Germany), Shereen Elsayed (University of Hildesheim, Germany), Lars Schmidt-Thieme (University of Hildesheim, Germany)

    In sparse recommender settings, users’ context and item attributes play a crucial role in deciding which items to recommend next. Despite that, recent works in sequential and time-aware recommendations usually either ignore both aspects or only consider one of them, limiting their predictive performance. In this paper, we address these limitations by proposing a context and attribute-aware recommender model (CARCA) that can capture the dynamic nature of the user profiles in terms of contextual features and item attributes via dedicated multi-head self-attention blocks that extract profile-level features and predict item scores. Also, unlike many of the current state-of-the-art sequential item recommendation approaches that use a simple dot-product between the most recent item’s latent features and the target items’ embeddings for scoring, CARCA uses cross-attention between all profile items and the target items to predict their final scores. This cross-attention allows CARCA to harness the correlation between old and recent items in the user profile and their influence on deciding which item to recommend next. Experiments on four real-world recommender system datasets show that the proposed model significantly outperforms all state-of-the-art models in the task of item recommendation, achieving improvements of up to 53% in Normalized Discounted Cumulative Gain (NDCG) and Hit-Ratio. Results also show that CARCA outperformed several state-of-the-art dedicated image-based recommender systems by merely utilizing image attributes extracted from a pre-trained ResNet50 in a black-box fashion.

    Full text in ACM Digital Library
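    The contrast the abstract draws (dot-product against the most recent item vs. cross-attention over all profile items) can be sketched as below. This is a minimal single-head, unparameterised sketch, not CARCA's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_scores(profile, targets):
    """Score each target item against ALL profile items, not just the last one.

    profile: (L, d) latent features of the user's L profile items.
    targets: (T, d) latent features of T candidate items.
    Returns a (T,) vector of scores.
    """
    d = profile.shape[1]
    # Each target attends over the whole profile (scaled dot-product attention).
    attn = softmax(targets @ profile.T / np.sqrt(d), axis=-1)  # (T, L)
    context = attn @ profile                                   # (T, d)
    # Final score: interaction between each target and its attended context.
    return (context * targets).sum(axis=1)
```

    A plain dot-product baseline would instead compute `targets @ profile[-1]`, discarding everything but the most recent interaction; the attended context lets older items contribute when they correlate with a target.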

  • Effective and Efficient Training for Sequential Recommendation using Recency Sampling
    by Aleksandr Petrov (University of Glasgow, United Kingdom), Craig Macdonald (University of Glasgow, United Kingdom)

    Many modern sequential recommender systems use deep neural networks, which can effectively estimate the relevance of items but require a lot of time to train. Slow training increases expenses, hinders product development timescales and prevents the model from being regularly updated to adapt to changing user preferences. Training such sequential models involves appropriately sampling past user interactions to create a realistic training objective. The existing training objectives have limitations. For instance, next item prediction never uses the beginning of the sequence as a learning target, thereby potentially discarding valuable data. On the other hand, the item masking used by BERT4Rec is only weakly related to the goal of the sequential recommendation; therefore, it requires much more time to obtain an effective model. Hence, we propose a novel Recency-based Sampling of Sequences training objective that addresses both limitations. We apply our method to various recent and state-of-the-art model architectures – such as GRU4Rec, Caser, and SASRec. We show that the models enhanced with our

    Full text in ACM Digital Library
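    The core idea (sample training targets so that recent positions are preferred but early positions still occasionally serve as targets) can be sketched as follows. The exponential recency weighting and the `alpha` parameter are illustrative assumptions; the abstract does not give the paper's exact sampling function:

```python
import numpy as np

def recency_sample_targets(seq, num_targets, alpha=0.8, rng=None):
    """Pick training target items from one user's interaction sequence,
    favouring recent items while keeping early items reachable as targets.

    Position i (0 = oldest) is sampled with probability proportional to
    alpha ** (n - 1 - i), for 0 < alpha <= 1; alpha = 1 is uniform.
    """
    rng = rng or np.random.default_rng()
    n = len(seq)
    weights = alpha ** np.arange(n - 1, -1, -1.0)  # oldest gets alpha**(n-1)
    probs = weights / weights.sum()
    idx = rng.choice(n, size=num_targets, replace=False, p=probs)
    return [seq[i] for i in sorted(idx)]
```

    Unlike strict next-item prediction, which only ever targets the final position, every prefix item here has non-zero probability of being a target, so no part of the sequence is systematically discarded.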

  • A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation
    by Aleksandr Petrov (University of Glasgow, United Kingdom), Craig Macdonald (University of Glasgow, United Kingdom)

    BERT4Rec is an effective model for sequential recommendation based on the Transformer architecture. In the original publication, BERT4Rec claimed superiority over other available sequential recommendation approaches (e.g. SASRec), and it is now frequently being used as a state-of-the-art baseline for sequential recommendations. However, not all later publications confirmed this result and proposed other models that were shown to outperform BERT4Rec in effectiveness. In this paper we systematically review all publications that compare BERT4Rec with another popular Transformer-based model, namely SASRec, and show that BERT4Rec results are not consistent within these publications. To understand the reasons behind this inconsistency, we analyse the available implementations of BERT4Rec and show that we fail to reproduce results of the original BERT4Rec publication when using their default configuration parameters. However, we are able to replicate the reported results with the original code when training for a much longer amount of time (up to 30x) compared to the default configuration. We also propose our own implementation of BERT4Rec based on the Hugging Face Transformers library, which is demonstrated to replicate the originally reported results on 3 out of 4 datasets, while requiring up to 95% less training time to converge. Overall, from our systematic review and detailed experiments, we conclude that BERT4Rec does indeed exhibit state-of-the-art effectiveness for sequential recommendation, but only when trained for a sufficient amount of time. Additionally, we show that our implementation can further benefit from adapting other Transformer architectures that are available in the Hugging Face Transformers library, such as DeBERTa or ALBERT. For example, on the MovieLens-1M dataset, we demonstrate that both these models can improve BERT4Rec performance by up to 9%. Moreover, we show that an ALBERT-based BERT4Rec model achieves better performance on that dataset than state-of-the-art results reported in the most recent publications.

    Full text in ACM Digital Library
