Session 1: Beyond the Headlines and Harmonies: The Focus on Music and News on Recommendation Generation and Evaluation
Date: Tuesday September 23, 11:30–13:00 (GMT+2)
Session Chair: Christine Bauer
- [RES] A Language Model-Based Playlist Generation Recommender System
by Enzo Charolois-Pasqua, Eléa Vellard, Youssra Rebboud, Pasquale Lisena, Raphaël Troncy
The title of a playlist often reflects an intended mood or theme, allowing creators to easily locate their content and enabling other users to discover music that matches specific situations and needs. This work presents a novel approach to playlist generation that uses language models to leverage the thematic coherence between a playlist title and its tracks. Our method consists of creating semantic clusters from text embeddings and then fine-tuning a transformer model on these thematic clusters. Playlists are then generated from the cosine similarity scores between known and unknown titles, combined with a voting mechanism. Performance evaluation, combining quantitative and qualitative metrics, demonstrates that using the playlist title as a seed yields useful recommendations, even in a zero-shot scenario.
- [RES] Biases in LLM-Generated Musical Taste Profiles for Recommendation
by Bruno Sguerra, Elena V. Epure, Harin Lee, Manuel Moussallam
One particularly promising use case of Large Language Models (LLMs) for recommendation is the automatic generation of Natural Language (NL) user taste profiles from consumption data. These profiles offer interpretable and editable alternatives to opaque collaborative filtering representations, enabling greater transparency and user control. However, it remains unclear whether users consider these profiles to be an accurate representation of their taste, which is crucial for trust and usability. Moreover, because LLMs inherit societal and data-driven biases, profile quality may systematically vary across user and item characteristics. In this paper, we study this issue in the context of music streaming, where personalization is challenged by a large and culturally diverse catalog. We conduct a user study in which participants rate NL profiles generated from their own listening histories. We analyze whether identification with the profiles is biased by user attributes (e.g., mainstreamness, taste diversity) and item features (e.g., genre, country of origin). We also compare these patterns to those observed when using the profiles in a downstream recommendation task. Our findings highlight both the potential and limitations of scrutable, LLM-based profiling in personalized systems.
- [RES] D-RDW: Diversity-Driven Random Walks for News Recommender Systems
by Runze Li, Lucien Heitz, Oana Inel, Abraham Bernstein
This paper introduces Diversity-Driven Random Walks (D-RDW), a lightweight algorithm and re-ranking technique that generates diverse news recommendations. D-RDW is a societal recommender, which combines the diversification capabilities of the traditional random walk algorithms with customizable target distributions of news article properties. In doing so, our model provides a transparent approach for editors to incorporate norms and values into the recommendation process. D-RDW shows enhanced performance across key diversity metrics that consider the articles’ sentiment and political party mentions when compared to state-of-the-art neural models. Furthermore, D-RDW proves to be more computationally efficient than existing approaches.
- [RES] Feedback-Driven Gradual Discovery for Expanding Musical Preferences
by Alec Nonnemaker, Ralvi Isufaj, Zoltán Szlávik, Cynthia Liem
Many current recommender system techniques reinforce established tastes, leaving little room for venturing into unfamiliar music. A key challenge is our uncertainty about user preferences for previously unconsumed content, which makes it safer to build upon known preferences. To address this, we propose an incremental, feedback-driven method that gradually introduces users to new genres. By dynamically balancing recommendations between verified preferences and content with uncertain appeal, our approach maintains engagement while progressively expanding musical horizons. Adopting a Bayesian active learning approach, we update belief states iteratively as users provide feedback on new items. In a user study with data from a commercial music video platform, participants gradually discovered a previously unfamiliar music genre of their choosing. Comparing our method to both immediate genre introduction and passive small-step strategies without real-time adaptation, we observed significant improvements. Participants showed higher engagement with new music, stronger affinity for unfamiliar genres, and a greater sense of control, demonstrating the effectiveness of our iterative, feedback-informed strategy for broadening musical tastes. Supplementary code is available online.
- [RES] IP2: Entity-Guided Interest Probing for Personalized News Recommendation
by Youlin Wu, Yuanyuan Sun, Xiaokun Zhang, Haoxi Zhan, Bo Xu, Liang Yang, Hongfei Lin
News recommender systems aim to provide personalized news reading experiences for users based on their reading history. Behavioral science studies suggest that screen-based news reading involves three successive steps: scanning, title reading, and clicking. Following these steps, we find that intra-news entity interest dominates the scanning stage, while inter-news entity interest guides title reading and influences click decisions. Unfortunately, current methods overlook the unique utility of entities in news recommendation. To this end, we propose a novel method called IP2 to probe entity-guided reading interest at both the intra- and inter-news levels. At the intra-news level, a Transformer-based entity encoder is devised to aggregate the entities mentioned in the news title into one signature entity. Then, signature entity-title contrastive pre-training is adopted to initialize entities with proper meanings using the news story context, which also enables us to probe intra-news entity interest. At the inter-news level, a dual-tower user encoder captures inter-news reading interest from both the title-meaning and entity sides. In addition to highlighting the contribution of inter-news entity guidance, a cross-tower attention link is adopted to calibrate title reading interest using inter-news entity interest, further aligning the model with real-world behavior. Extensive experiments on two real-world datasets demonstrate that IP2 achieves state-of-the-art performance in news recommendation.
- [RES] LANCE: Exploration and Reflection for LLM-based Textual Attacks on News Recommender Systems
by Yuyue Zhao, Jin Huang, Shuchang Liu, Jiancan Wu, Xiang Wang, Maarten de Rijke
News recommender systems rely on rich textual information from news articles to generate user-specific recommendations. This reliance may expose these systems to potential vulnerabilities through textual attacks. To explore this vulnerability, we propose LANCE, a LArge language model-based News Content rEwriting framework, designed to influence news rankings and highlight the unintended promotion of manipulated news. LANCE consists of two key components: an explorer and a reflector. The explorer first generates rewritten news using diverse prompts, incorporating different writing styles, sentiments, and personas. We then collect these rewrites, evaluate their ranking impact within news recommender systems, and apply a filtering mechanism to retain effective rewrites. Next, the reflector fine-tunes an open-source LLM using the successful rewrites, enhancing its ability to generate more effective textual attacks. Experimental results demonstrate the effectiveness of LANCE in manipulating rankings within news recommender systems. Unlike attacks in other recommendation domains, negative and neutral rewrites consistently outperform positive ones, revealing a unique vulnerability specific to news recommendation. Once trained, LANCE successfully attacks unseen news recommender systems (i.e., those for which LANCE received no information during training), highlighting its generalization ability and exposing shared vulnerabilities across different systems. Our work underscores the urgent need for research on textual attacks and paves the way for future studies on defense strategies.
- [RES] Mitigating Latent User Biases in Pre-trained VAE Recommendation Models via On-demand Input Space Transformation
by David Penz, Gustavo Junior Escobedo Ticona, Markus Schedl
Recommender systems can unintentionally encode protected attributes (e.g., gender, country, or age) in their learned latent user representations. Current in-processing debiasing approaches, notably adversarial training, effectively reduce the information encoded about private user attributes. These approaches modify the model parameters during training; thus, to alternate between a biased and a debiased model, two separate models have to be trained. In contrast, we propose a novel method to debias recommendation models post-training, which allows switching between the biased and debiased model at inference time. Focusing on state-of-the-art variational autoencoder (VAE) architectures, our method aims to reduce bias at the input level (user–item interactions) by learning a transformation from the input space to a debiased subspace. As the output of this transformation lies in the same space as the original input vector, we can use the transformed (debiased) input vectors without fine-tuning the pre-trained model. We evaluate the effectiveness of our method on three datasets, MovieLens-1M, LFM2b-DemoBias, and EB-NeRD, from the movie, music, and news domains, respectively. Our experiments show that the proposed method achieves task performance (in terms of NDCG) and debiasing strength (in terms of the balanced accuracy of an attacker network) comparable to applying adversarial training during the initial training procedure, while adding the ability to alternate between the biased and debiased model at inference time.
- [REPR] Informfully Recommenders – Reproducibility Framework for Diversity-aware Intra-session Recommendations
by Lucien Heitz, Runze Li, Oana Inel, Abraham Bernstein
Norm-aware recommender systems have gained increased attention, especially for diversity optimization. The recommender systems community has well-established experimentation pipelines that support reproducible evaluations by facilitating model benchmarking and comparisons against state-of-the-art methods. However, to the best of our knowledge, there is currently no reproducibility framework that supports thorough norm-driven experimentation at the pre-processing, in-processing, post-processing, and evaluation stages of the recommender pipeline. To address this gap, we present Informfully Recommenders, a first step towards a normative reproducibility framework, built on Cornac, that focuses on diversity-aware design. Our extension provides an end-to-end solution for implementing and experimenting with normative and general-purpose diverse recommender systems that covers 1) dataset pre-processing, 2) diversity-optimized models, 3) dedicated intra-session item re-ranking, and 4) an extensive set of diversity metrics. We demonstrate the capabilities of our extension through an extensive offline experiment in the news domain.
- [REPR] Yambda-5B — A Large-Scale Multi-Modal Dataset for Ranking and Retrieval
by Alexander Ploshkin, Vladislav Tytskiy, Alexey Pismenny, Vladimir Baikalov, Evgeny Taychinov, Artem Permiakov, Daniil Burlakov, Eugene Krofto
We present Yambda-5B, a large-scale open dataset sourced from the Yandex.Music streaming platform. Yambda-5B contains 4.79 billion user-item interactions from 1 million users across 9.39 million tracks. The dataset includes two primary types of interactions: implicit feedback (listening events) and explicit feedback (likes, dislikes, unlikes and undislikes). In addition, we provide audio embeddings for most tracks, generated by a convolutional neural network trained on audio spectrograms. A key distinguishing feature of Yambda-5B is the inclusion of the is_organic flag, which separates organic user actions from recommendation-driven events. This distinction is critical for developing and evaluating machine learning algorithms, as Yandex.Music relies on recommender systems to personalize track selection for users. To support rigorous benchmarking, we introduce an evaluation protocol based on a Global Temporal Split, allowing recommendation algorithms to be assessed in conditions that closely mirror real-world use. We report benchmark results for standard baselines (ItemKNN, iALS) and advanced models (SANSA, SASRec) using a variety of evaluation metrics. By releasing Yambda-5B to the community, we aim to provide a readily accessible, industrial-scale resource to advance research, foster innovation, and promote reproducible results in recommender systems.
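The playlist-generation abstract (Charolois-Pasqua et al.) mentions generating playlists from cosine similarity between known and unknown titles plus a voting mechanism. The Python sketch below shows one plausible form of that final step only. It assumes title embeddings are already computed and omits the semantic clustering and transformer fine-tuning; the names `generate_playlist`, `known_playlists`, `k` and `n`, and the similarity-weighted voting rule, are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def generate_playlist(title_vec, known_playlists, k=2, n=3):
    """Rank tracks for an unseen title by similarity-weighted votes.

    known_playlists: list of (title_embedding, track_ids) pairs (hypothetical format).
    The k known titles most similar to `title_vec` each vote for their tracks,
    and every vote is weighted by that title's cosine similarity.
    """
    neighbours = sorted(
        ((cosine(title_vec, vec), tracks) for vec, tracks in known_playlists),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, tracks in neighbours:
        for track in tracks:
            votes[track] += sim
    # Highest vote total first; ties broken by track id for a deterministic order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:n]]


known = [
    ([1.0, 0.0], ["a", "b"]),
    ([0.9, 0.1], ["b", "c"]),
    ([0.0, 1.0], ["d"]),
]
print(generate_playlist([1.0, 0.05], known, k=2, n=2))  # ['b', 'a']
```

Track "b" wins because it appears in both nearest playlists, so it collects two similarity-weighted votes; weighting by similarity (rather than counting votes equally) lets a closer title outweigh a weaker match.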
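The gradual-discovery abstract (Nonnemaker et al.) describes updating belief states with Bayesian active learning as users give feedback on new-genre items. The abstract does not specify the model, so the sketch below assumes the simplest conjugate choice, a Beta-Bernoulli belief about whether the user likes items from the new genre, and a hypothetical policy `new_genre_slots` that gives the new genre more room in a slate as the posterior mean rises. Item selection (the "active" part) is not shown.

```python
from dataclasses import dataclass


@dataclass
class GenreBelief:
    """Beta posterior over P(user likes an item from the new genre).

    Assumption for illustration: Beta-Bernoulli model with a uniform Beta(1, 1) prior.
    """
    alpha: float = 1.0  # prior pseudo-count plus observed likes
    beta: float = 1.0   # prior pseudo-count plus observed dislikes

    def update(self, liked: bool) -> None:
        # Conjugate update: each like or dislike adds one to the matching count.
        if liked:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def new_genre_slots(belief: GenreBelief, slate_size: int = 10, max_share: float = 0.5) -> int:
    """Hypothetical policy: new-genre slots grow with the posterior mean, capped at max_share."""
    return int(slate_size * max_share * belief.mean)


belief = GenreBelief()
print(new_genre_slots(belief))  # 2 (prior mean 0.5)
for liked in [True, True, True, False]:
    belief.update(liked)
print(new_genre_slots(belief))  # 3 (posterior mean 4/6)
```

The cap `max_share` keeps familiar, verified preferences in every slate, which mirrors the abstract's idea of balancing known tastes against content with uncertain appeal.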
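The VAE debiasing abstract (Penz et al.) measures debiasing strength with the balanced accuracy of an attacker network that tries to recover a protected attribute. Balanced accuracy is the mean of per-class recall, so for a binary attribute a value near 0.5 means the attacker does no better than chance. The sketch below computes the metric only; the attacker network is not shown, and the labels are hypothetical.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: every class counts equally, regardless of its size."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# An attacker that always predicts the majority class of a binary attribute.
y_true = ["f", "f", "m", "m", "m", "m"]
y_pred = ["m"] * 6
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy for this trivial attacker would be 4/6, which overstates how much attribute information leaks when classes are imbalanced; balanced accuracy correctly reports chance level.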
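The Yambda-5B abstract introduces an evaluation protocol based on a Global Temporal Split. As a minimal sketch of the general idea, assuming interactions are `(user_id, item_id, timestamp)` tuples, a single timestamp cutoff is shared by all users, unlike per-user leave-last-out splits. The function name, record format and cutoff value are hypothetical, and the dataset's actual protocol may add details (such as validation windows or filtering) that the abstract does not describe.

```python
from typing import List, Tuple

# (user_id, item_id, timestamp): a hypothetical record format for illustration.
Interaction = Tuple[int, int, int]


def global_temporal_split(
    interactions: List[Interaction], cutoff: int
) -> Tuple[List[Interaction], List[Interaction]]:
    """Split every user's history at the same global timestamp.

    Events strictly before `cutoff` go to training and the rest to test,
    so no training event is later than any test event.
    """
    train = [x for x in interactions if x[2] < cutoff]
    test = [x for x in interactions if x[2] >= cutoff]
    return train, test


events = [(1, 10, 100), (2, 11, 200), (1, 12, 300), (3, 13, 400)]
train, test = global_temporal_split(events, cutoff=300)
print(len(train), len(test))  # 2 2
# User 3 appears only after the cutoff, i.e. a cold-start user at test time.
print({u for u, _, _ in test} - {u for u, _, _ in train})  # {3}
```

Because the cutoff is global, the model never sees interactions from the future of any test event, which is closer to deployment than per-user splits that can leak later global trends into training.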
RecSys 2025 (Prague)