Tuesday Poster & Coffee Break Session

Date: Tuesday 15:30 – 16:00 CET
Chair: To be announced

  • [DM] Multi-Step Critiquing User Interface for Recommender Systems
    by Diana Andreea Petrescu (EPFL, Switzerland), Diego Antognini (École Polytechnique Fédérale de Lausanne, Switzerland), and Boi Faltings (EPFL, Switzerland)

    Recommendations with personalized explanations have been shown to increase user trust and perceived quality and to help users make better decisions. Moreover, such explanations allow users to provide feedback by critiquing them. Several algorithms for recommender systems with multi-step critiquing have therefore been developed. However, providing a user-friendly interface based on personalized explanations and critiquing has not been addressed in the last decade. In this paper, we introduce four different web interfaces (available at https://lia.epfl.ch/critiquing/) that help users make decisions and find their ideal item. We have chosen hotel recommendation as a use case, even though our approach is trivially adaptable to other domains. Moreover, our system is model-agnostic (for both recommender systems and critiquing models), allowing great flexibility and further extensions. Above all, our interfaces are a useful tool for research on recommendation with critiquing: they make it possible to test such systems on a real use case and to highlight some limitations of these approaches in order to find solutions that overcome them.
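
    To illustrate what "model-agnostic" can mean for such an interface, here is a minimal sketch that places the recommender and the critiquing model behind two small interfaces, so the multi-step session loop never depends on either implementation. All names (`Recommender`, `Critiquer`, `KeyphrasePenaltyCritiquer`, `critique_session`) and the keyphrase-penalty update rule are hypothetical assumptions, not the authors' system.

```python
from typing import Dict, List, Protocol, Set

Scores = Dict[str, float]


class Recommender(Protocol):
    def score(self, user: str) -> Scores:
        """Return a relevance score per item id."""
        ...


class Critiquer(Protocol):
    def update(self, scores: Scores, critiqued: Set[str]) -> Scores:
        """Return rescored items given all keyphrases critiqued so far."""
        ...


class StaticRecommender:
    """Toy recommender with fixed scores (stand-in for any trained model)."""

    def __init__(self, scores: Scores) -> None:
        self._scores = scores

    def score(self, user: str) -> Scores:
        return dict(self._scores)


class KeyphrasePenaltyCritiquer:
    """Hypothetical rule: subtract a penalty for each critiqued keyphrase
    that appears in an item's explanation."""

    def __init__(self, keyphrases: Dict[str, Set[str]], penalty: float = 1.0) -> None:
        self._keyphrases = keyphrases
        self._penalty = penalty

    def update(self, scores: Scores, critiqued: Set[str]) -> Scores:
        return {
            item: s - self._penalty * len(self._keyphrases.get(item, set()) & critiqued)
            for item, s in scores.items()
        }


def top_k(scores: Scores, k: int) -> List[str]:
    return sorted(scores, key=scores.get, reverse=True)[:k]


def critique_session(rec: Recommender, crit: Critiquer, user: str,
                     critiques: List[str], k: int = 2) -> List[List[str]]:
    """Return the top-k list shown initially and after each critique."""
    critiqued: Set[str] = set()
    shown = [top_k(rec.score(user), k)]
    for phrase in critiques:
        critiqued.add(phrase)
        shown.append(top_k(crit.update(rec.score(user), critiqued), k))
    return shown
```

    Swapping in a different recommender or critiquing model only requires another class with the same method signature.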

    Full text in ACM Digital Library

  • [LBR] Optimizing the Selection of Recommendation Carousels with Quantum Computing
    by Maurizio Ferrari Dacrema (Politecnico di Milano, Italy), Nicolò Felicioni (Politecnico di Milano, Italy), and Paolo Cremonesi (Politecnico di Milano, Italy)

    It has long been known that quantum computing has the potential to revolutionize the way we solve problems that are difficult for classical computers. Only recently have small but functional quantum computers become available on the cloud, making it possible to test this potential. In this paper we propose to leverage their capabilities to address an important task for recommender system providers: the optimal selection of recommendation carousels. In many video-on-demand and music streaming services, the user is shown a homepage containing several recommendation lists, i.e., carousels, each built according to a certain criterion (e.g., artist, mood, action movies). Choosing which set of carousels to display is difficult because it must account for how the different recommendation lists interact, e.g., avoiding duplicate recommendations, and how they help the user explore the catalogue. We focus in particular on the adiabatic computing paradigm and use the D-Wave quantum annealer, which is designed to tackle NP-hard optimization problems, can be programmed with classical operations research tools, and is freely available on the cloud. We propose a formulation of the carousel selection problem for black-box recommenders that is simple and can be solved effectively on a quantum annealer. We discuss its effectiveness, limitations, and possible future directions of development.
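
    Quantum annealers minimize a QUBO (quadratic unconstrained binary optimization) energy over binary variables. As a hedged illustration, not the paper's formulation, the sketch below sets x_i = 1 if carousel i is shown, rewards relevance on the diagonal, penalizes pairs of carousels that share items, and enforces "exactly k carousels" with a penalty term P(Σx_i − k)², expanded using x_i² = x_i. The relevance values, `overlap_weight`, and `penalty` are assumptions, and brute-force enumeration stands in for the annealer.

```python
import itertools
from typing import Dict, List, Set, Tuple

Qubo = Dict[Tuple[int, int], float]  # upper-triangular (i, j) -> weight


def build_carousel_qubo(relevance: List[float], items: List[Set[str]], k: int,
                        overlap_weight: float = 1.0, penalty: float = 10.0) -> Qubo:
    """Energy = sum of Q[i, j] * x_i * x_j; lower energy is a better selection."""
    n = len(relevance)
    q: Qubo = {}
    for i in range(n):
        # Relevance reward plus the linear part of P * (sum(x) - k)^2.
        q[(i, i)] = -relevance[i] + penalty * (1 - 2 * k)
    for i, j in itertools.combinations(range(n), 2):
        duplicates = len(items[i] & items[j])
        # Duplicate-item penalty plus the quadratic part of the cardinality term.
        q[(i, j)] = overlap_weight * duplicates + 2 * penalty
    return q


def energy(q: Qubo, x: Tuple[int, ...]) -> float:
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force_minimum(q: Qubo, n: int) -> Tuple[int, ...]:
    """Classical stand-in for the annealer: enumerate all 2^n assignments."""
    return min(itertools.product((0, 1), repeat=n), key=lambda x: energy(q, x))
```

    With relevance `[3.0, 2.9, 1.0]`, item sets `[{"a","b","c"}, {"a","b","d"}, {"e","f"}]` and `k=2`, the two most relevant carousels overlap on two items, so the minimum-energy selection is `(1, 0, 1)`. The `(i, j) -> weight` dictionary is also the usual input format for QUBO samplers such as D-Wave's Ocean tools, though no vendor API is used here.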

    Full text in ACM Digital Library

  • [LBR] Auditing the Effect of Social Network Recommendations on Polarization in Geometrical Ideological Spaces
    by Pedro Ramaciotti Morales (Sciences Po, France) and Jean-Philippe Cointet (Sciences Po, France)

    The prevalence of algorithmic recommendations has raised public concern about undesired societal effects. A central threat is the risk of polarization, which is difficult to conceptualize and to measure, making it difficult to assess the role of recommender systems in this phenomenon. These difficulties have yielded two types of analyses: 1) purely topological approaches that study how recommenders isolate or connect types of nodes in a graph, and 2) spatial opinion approaches that study how recommenders change the distribution of users on a given opinion scale. The former prove inadequate in settings where users cannot be classified into categorical types (unlike, e.g., two-party systems with binary social divides), while the latter rely on synthetic data because opinions are unobservable. To overcome both difficulties, we present the first analysis of friend recommendations acting on real-world sub-graphs of the Twitter network in which users are embedded in multidimensional ideological spaces whose dimensions are indicators of attitudes towards issues in the public debate. We present a polarization metric adapted to these dual topological and spatial states of social networks, and use it to track the evolution of polarization on Twitter networks in which the graph evolves following well-known recommender systems and opinions co-evolve following a DeGroot opinion model. We show that different recommendation principles can sometimes drive or mitigate polarization appearing in real social networks.
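
    The co-evolution described above can be sketched as two alternating steps: a friend recommender rewires the graph, then a DeGroot update moves each user to the average position of themselves and the accounts they follow. This is a minimal sketch under stated assumptions: uniform DeGroot weights, a hypothetical "follow the closest account" recommender, and mean distance to the centroid as a stand-in polarization proxy (not the metric proposed in the paper).

```python
import math
from typing import Dict, List, Set

Vector = List[float]
Graph = Dict[int, Set[int]]  # user -> accounts they follow


def degroot_step(follows: Graph, pos: Dict[int, Vector]) -> Dict[int, Vector]:
    """One DeGroot update with uniform weights: each user moves to the mean
    position of themselves and the accounts they follow."""
    new_pos = {}
    for u, vec in pos.items():
        group = [u] + sorted(follows.get(u, set()))
        new_pos[u] = [sum(pos[v][d] for v in group) / len(group) for d in range(len(vec))]
    return new_pos


def dispersion(pos: Dict[int, Vector]) -> float:
    """Stand-in polarization proxy (not the paper's metric): mean Euclidean
    distance of users to the centroid of the ideological space."""
    users = list(pos)
    dim = len(pos[users[0]])
    centroid = [sum(pos[u][d] for u in users) / len(users) for d in range(dim)]
    return sum(math.dist(pos[u], centroid) for u in users) / len(users)


def recommend_closest(follows: Graph, pos: Dict[int, Vector]) -> Graph:
    """Hypothetical homophilous recommender: every user starts following the
    closest account (in ideological space) they do not follow yet."""
    updated = {u: set(f) for u, f in follows.items()}
    for u in pos:
        followed = updated.setdefault(u, set())
        candidates = [v for v in pos if v != u and v not in followed]
        if candidates:
            followed.add(min(candidates, key=lambda v: math.dist(pos[u], pos[v])))
    return updated
```

    Alternating `recommend_closest` and `degroot_step` in a loop and recording `dispersion` after each round gives a toy version of the kind of trajectory the paper tracks on real Twitter sub-graphs.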

    Full text in ACM Digital Library

  • [LBR] Estimating and Penalizing Preference Shift in Recommender Systems
    by Micah Carroll (UC Berkeley, United States), Dylan Hadfield-Menell (UC Berkeley, United States), Stuart Russell (UC Berkeley, United States), and Anca Dragan (UC Berkeley, United States)

    Recommender systems trained via long-horizon optimization (e.g., reinforcement learning) will have incentives to actively manipulate user preferences through the recommended content. While some work has argued for making systems myopic to avoid this issue, even such systems can induce systematic undesirable preference shifts. Thus, rather than artificially stifling the capabilities of the system, in this work we explore how to build capable systems that explicitly avoid undesirable shifts. We advocate for (1) estimating the preference shifts that would be induced by recommender system policies, and (2) explicitly characterizing what unwanted shifts are and assessing before deployment whether such policies will produce them, ideally even actively optimizing to avoid them. These steps involve two challenging ingredients: (1) requires the ability to anticipate how hypothetical policies would influence user preferences if deployed, while (2) requires metrics to assess whether such influences are manipulative or otherwise unwanted. We study how to do (1) from historical user interaction data by building a predictive user model that implicitly captures users' preference dynamics; to address (2), we introduce the notion of a “safe policy”, which defines a trust region within which behavior is believed to be safe. We show that recommender systems that optimize for staying in the trust region avoid manipulative behaviors (e.g., changing preferences in ways that make users more predictable) while still generating engagement.
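
    The idea of penalizing shifts outside a trust region around a safe policy can be made concrete with a toy model. In this sketch, preferences are a categorical distribution that drifts toward the recommended content mix at rate `alpha` (a hypothetical dynamics model, standing in for the learned user model), and a candidate policy is scored by its engagement minus a hinge penalty on the total-variation distance between the preferences it induces and those the safe policy induces. The dynamics, `eps`, and `lam` are assumptions, not the paper's formulation.

```python
from typing import List, Tuple

Dist = List[float]


def simulate(prefs: Dist, policy: Dist, steps: int, alpha: float = 0.2) -> Tuple[float, Dist]:
    """Hypothetical dynamics: engagement is the preference mass on the
    recommended mix; preferences then drift toward that mix."""
    engagement = 0.0
    p = list(prefs)
    for _ in range(steps):
        engagement += sum(pi * ri for pi, ri in zip(p, policy))
        p = [(1 - alpha) * pi + alpha * ri for pi, ri in zip(p, policy)]
    return engagement, p


def total_variation(p: Dist, q: Dist) -> float:
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def penalized_value(prefs: Dist, policy: Dist, safe_policy: Dist, steps: int,
                    eps: float = 0.1, lam: float = 10.0) -> float:
    """Engagement minus a penalty for leaving the trust region (shift > eps)."""
    engagement, final = simulate(prefs, policy, steps)
    _, safe_final = simulate(prefs, safe_policy, steps)
    shift = total_variation(final, safe_final)
    return engagement - lam * max(0.0, shift - eps)
```

    In this toy setting, a policy that always recommends one category earns more raw engagement than one matching the user's initial preferences, but because it pushes preferences far from the safe trajectory, its penalized value ends up lower.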

    Full text in ACM Digital Library

  • [LBR] An Analysis Of Entire Space Multi-Task Models For Post-Click Conversion Prediction
    by Conor O’Brien (Twitter, United Kingdom), Kin Sum Liu (Twitter, United States), James Neufeld (Twitter, United States), Rafael Barreto (Twitter, United States), and Jonathan J Hunt (Twitter, United Kingdom)

    Industrial recommender systems are frequently tasked with approximating probabilities for multiple, often closely related, user actions: for example, whether a user will click on an advertisement and whether they will then purchase the advertised product. The conceptual similarity between these tasks has promoted the use of multi-task learning, a class of algorithms that aim to bring positive inductive transfer from related tasks. Here, we empirically evaluate multi-task learning approaches with neural networks for an online advertising task. Specifically, we consider approximating the probability of post-click conversion events (installs), i.e., the conversion rate (CVR), for mobile app advertising on a large-scale advertising platform, using the related click events (CTR) as an auxiliary task. We use an ablation approach to systematically study recent approaches that incorporate both multi-task learning and “entire space modeling”, which train the CVR model on all logged examples rather than learning a conditional likelihood of conversion given a click. Based on these results, we show that several different approaches result in similar levels of positive transfer from the data-abundant CTR task to the CVR task, and we offer some insight into how the multi-task design choices address the two primary problems affecting the CVR task: data sparsity and data bias. Our findings add to the growing body of evidence suggesting that standard multi-task learning is a sensible approach to modelling related events in real-world large-scale applications, and suggest that the choice of a specific multi-task approach can be guided by ease of implementation in an existing system.
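
    The difference between conditional CVR training and "entire space modeling" is easiest to see in the loss. This sketch follows the well-known ESMM-style factorization pCTCVR = pCTR × pCVR, where both the click loss and the click-and-convert loss are computed over every logged impression; it is one representative instance, not necessarily any of the exact variants ablated in the paper. The per-impression tuple layout is an assumption.

```python
import math
from typing import List, Tuple

# (clicked, converted, p_ctr, p_cvr) per logged impression; converted implies clicked.
Example = Tuple[int, int, float, float]


def bce(label: int, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy with clipping to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def conditional_cvr_loss(batch: List[Example]) -> float:
    """Baseline: CVR loss on clicked impressions only, so the model never
    sees unclicked impressions (sample selection bias)."""
    clicked = [ex for ex in batch if ex[0] == 1]
    return sum(bce(conv, p_cvr) for _, conv, _, p_cvr in clicked) / max(len(clicked), 1)


def entire_space_loss(batch: List[Example]) -> float:
    """ESMM-style loss over all impressions: CTR loss plus a loss on the
    click-and-convert probability pCTCVR = pCTR * pCVR."""
    total = 0.0
    for clicked, converted, p_ctr, p_cvr in batch:
        total += bce(clicked, p_ctr) + bce(clicked * converted, p_ctr * p_cvr)
    return total / len(batch)
```

    Because `p_cvr` enters the entire-space loss only through the product with `p_ctr`, it receives gradient from unclicked impressions too, which is how this design addresses both data sparsity and selection bias.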

    Full text in ACM Digital Library

  • [LBR] A Constrained Optimization Approach for Calibrated Recommendations
    by Sinan Seymen (Northwestern University, United States), Himan Abdollahpouri (Northwestern University, United States), and Edward C. Malthouse (Northwestern University, United States)

    In recommender systems (RS), it is important to ensure that the various (past) areas of interest of a user are reflected in their corresponding proportions in the recommendation lists. In other words, when a user has watched, say, 60 romance movies and 40 comedy movies, it is reasonable to expect the personalized list of recommended movies to contain about 60% romance and 40% comedy movies as well. This property is known as calibration, and it has recently received much attention in the RS community. Greedy heuristic approaches have been proposed to calibrate recommendations, and although they provide great improvements, they can produce suboptimal solutions: because of the myopic nature of these algorithms, a better list can be missed. This paper addresses the calibration problem from a constrained optimization perspective and provides a model that combines both accuracy and calibration. Experimental results show that our approach outperforms the state-of-the-art heuristics for calibration in most cases, both on the accuracy of the recommendations and on the level of calibration the recommendation lists achieve. We give a small example to illustrate why the heuristic fails to find the optimal solution.
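
    The contrast between a greedy heuristic and an exact solution can be sketched with the common KL-divergence calibration objective: a trade-off between summed relevance and KL(p ‖ q̃), where p is the user's genre profile and q̃ the smoothed genre mix of the list. Exhaustive enumeration is used here as a swapped-in exact method that only works for tiny catalogues; the paper instead uses a constrained optimization model. The data layout and the `lam`/`alpha` defaults are assumptions.

```python
import itertools
import math
from typing import Dict, List, Sequence

Dist = Dict[str, float]


def kl_calibration(target: Dist, genre_mixes: Sequence[Dist], alpha: float = 0.01) -> float:
    """KL(p || q~): p is the user's genre profile, q the mean genre mix of the
    list, smoothed as q~ = (1 - alpha) q + alpha p to avoid log(0)."""
    q = {g: sum(mix.get(g, 0.0) for mix in genre_mixes) / len(genre_mixes) for g in target}
    return sum(p * math.log(p / ((1 - alpha) * q[g] + alpha * p))
               for g, p in target.items() if p > 0)


def objective(ids: Sequence[str], scores: Dict[str, float], genres: Dict[str, Dist],
              target: Dist, lam: float) -> float:
    """Trade off summed relevance against miscalibration (higher is better)."""
    relevance = sum(scores[i] for i in ids)
    return (1 - lam) * relevance - lam * kl_calibration(target, [genres[i] for i in ids])


def greedy(scores, genres, target, k: int, lam: float) -> List[str]:
    """Myopic heuristic: repeatedly add the item that most improves the objective."""
    chosen: List[str] = []
    for _ in range(k):
        remaining = [i for i in scores if i not in chosen]
        chosen.append(max(remaining,
                          key=lambda i: objective(chosen + [i], scores, genres, target, lam)))
    return chosen


def exhaustive(scores, genres, target, k: int, lam: float) -> List[str]:
    """Exact optimum by enumerating all k-subsets (tiny catalogues only)."""
    best = max(itertools.combinations(scores, k),
               key=lambda c: objective(c, scores, genres, target, lam))
    return list(best)
```

    Since the objective does not depend on item order, the exhaustive list is never worse than the greedy one; the paper's point is that on some inputs it is strictly better.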

    Full text in ACM Digital Library

  • [DS] Biases in Recommendation System
    by Saumya Bhadani (University of South Florida, United States)

    Recommendation systems shape what people consume and experience online, which makes it critical to assess their effect on society and whether they are affected by any potential source of bias. My research focuses on a specific source of bias, popularity, that is especially relevant in two online contexts: news consumption and cultural markets. Social media newsfeeds, which are designed to optimize user engagement, may inadvertently promote inaccurate and partisan news domains, since these are often popular among like-minded and polarized audiences. More generally, in any type of cultural market, popularity bias may lead to the suppression of niche quality products that do not attract enough attention. In the context of online newsfeeds, I am investigating whether the political diversity of the audience of a news website can be used as a signal of journalistic quality. In an analysis of a comprehensive dataset of news source reliability ratings and web browsing histories, I have shown that the diversity of the audience of a news website is a valuable signal to counter popularity bias and to promote journalistic quality. To further validate these results experimentally, I propose to interact directly with social media users through surveys. Here, I provide the details of a field study that I plan to undertake using an experimental social media platform. In the broader context of online cultural markets, political audience diversity may not be applicable, but the idea of diversity as a signal of higher quality might still be useful. For example, movies liked by an “age-diverse” population or books read by a racially diverse audience may be of higher quality than other items. However, in this research I have explored audience diversity only in the context of social media newsfeeds.
    For any online cultural market, I propose a method to quantify popularity bias through an empirical analysis using data from several existing markets. Accurately estimating the impact of popularity bias can advance our understanding of machine biases and lead to the development of more robust recommendation systems.
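
    As a hedged sketch of how audience political diversity might serve as a ranking signal, the code below measures it as the visit-weighted standard deviation of visitors' partisanship scores (assumed to lie in [-1, 1], so the result lies in [0, 1]) and blends it with normalized popularity. Both the diversity measure and the linear blend with its `weight` parameter are illustrative assumptions, not the author's exact method.

```python
import math
from typing import Dict, List, Tuple


def audience_diversity(visits: List[Tuple[float, int]]) -> float:
    """Visit-weighted standard deviation of visitors' partisanship scores.

    visits: (partisanship in [-1, 1], number of visits) per visitor."""
    total = sum(count for _, count in visits)
    mean = sum(score * count for score, count in visits) / total
    variance = sum(count * (score - mean) ** 2 for score, count in visits) / total
    return math.sqrt(variance)


def rerank(domains: Dict[str, Tuple[float, float]], weight: float) -> List[str]:
    """Rank domains by a blend of normalized popularity and audience diversity
    (both in [0, 1]); weight = 0 ranks purely by popularity."""
    def blended(name: str) -> float:
        popularity, diversity = domains[name]
        return (1.0 - weight) * popularity + weight * diversity
    return sorted(domains, key=blended, reverse=True)
```

    For example, a very popular domain with a one-sided audience, `(1.0, 0.1)`, ranks first under pure popularity but drops below a less popular domain with a diverse audience, `(0.6, 0.9)`, once `weight=0.5`.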

    Full text in ACM Digital Library

  • [DS] Learning Dynamic Insurance Recommendations from Users’ Click Sessions
    by Simone Borg Bruun (University of Copenhagen, Denmark)

    While personalised recommendations have been most successful in domains like retail, thanks to the large volume of user feedback on items, it is challenging to apply traditional recommender systems to the insurance domain, where such prior information is scarce. This work addresses the problem of sparse feedback by studying users’ click sessions as signals for learning insurance recommendations. Our preliminary results show limitations in representing click sessions with manually engineered features. The proposed framework uses an autoencoder to automatically learn representations of sessions, and then a neural network to model dependencies across sessions, which can be used to predict recommendations. It is thereby able to capture users’ dynamic needs for insurance products as they evolve over time.
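
    The first stage of the framework, learning session representations with an autoencoder, can be sketched in NumPy. Here each click session is encoded as a bag-of-pages count vector (a hypothetical input format with made-up page names), and a one-hidden-layer tanh autoencoder is trained by plain gradient descent on mean squared reconstruction error. The architecture and hyperparameters are assumptions; the cross-session neural network that follows in the paper is not sketched.

```python
from typing import List, Tuple

import numpy as np


def encode_sessions(sessions: List[List[str]], pages: List[str]) -> np.ndarray:
    """Bag-of-pages count vector per click session (hypothetical format)."""
    index = {p: i for i, p in enumerate(pages)}
    x = np.zeros((len(sessions), len(pages)))
    for row, session in enumerate(sessions):
        for page in session:
            x[row, index[page]] += 1.0
    return x


def train_autoencoder(x: np.ndarray, dim: int, epochs: int = 2000, lr: float = 0.1,
                      seed: int = 0) -> Tuple[np.ndarray, np.ndarray, List[float]]:
    """Train h = tanh(x W_enc), x_hat = h W_dec with gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w_enc = rng.normal(scale=0.1, size=(d, dim))
    w_dec = rng.normal(scale=0.1, size=(dim, d))
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w_enc)          # session embeddings
        err = h @ w_dec - x             # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size    # d(MSE)/d(reconstruction)
        g_dec = h.T @ g_out
        g_pre = (g_out @ w_dec.T) * (1.0 - h ** 2)  # back through tanh
        g_enc = x.T @ g_pre
        w_dec -= lr * g_dec
        w_enc -= lr * g_enc
    return w_enc, w_dec, losses


def embed(x: np.ndarray, w_enc: np.ndarray) -> np.ndarray:
    """Learned session representation, usable as input to a downstream model."""
    return np.tanh(x @ w_enc)
```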

    Full text in ACM Digital Library

  • [IN] Personalised Outfit Recommendations: Use Cases, Challenges and Opportunities
    by Nick Landia (Dressipi, United Kingdom)

    Recommender systems for fashion have gained in popularity in recent years. An exciting novel application for recommender systems is outfit personalisation. This work discusses the problem of personalised outfit recommendations and presents use cases, challenges and opportunities. This is still a nascent application area in many ways and there are opportunities for innovation in generating outfits, personalising them and displaying them in a coherent way to the user.
    Outfits are different from showing complementary items (e.g. printer paper when you buy ink, socks when you buy shoes). Complementary items are mostly about upselling the user to buy further items with their main purchase. Outfits can be that, but they are also about presenting the main item in different contexts. Showing the same dress in a work outfit, an evening outfit and a casual outfit showcases the use and value the user would get out of purchasing the dress (the main item). In retail terms, outfits help sell the main item and are not necessarily about upselling the other items in the outfit as well. Outfits are often classified as complementary item retrieval; however, some important aspects of outfits are closer to the concepts of contextual recommendations and full-page optimisation (which things to show next to each other).
    This extended abstract talks about the different use cases of outfits, gives an overview of the challenges in this domain, and presents scoped problem definitions that are the building blocks that need to be addressed in order to solve personalised outfits.

    Full text in ACM Digital Library

  • [IN] Offline Evaluation Standards for Recommender Systems
    by Chin Lin Wong (AIPS Seek, Australia), Diego De Oliveira (AIPS Seek, Australia), Farhad Zafari (AIPS Seek, Australia), Fernando Mourão (AIPS Seek, Australia), Rafael Colares (AIPS Seek, Australia), and Sabir Ribas (AIPS Seek, Australia)

    Offline evaluation has become a major step in developing Recommendation Systems in both academia and industry [4, 5]. While academia relies on offline evaluation due to the lack of proper environments for conducting online tests with real users, industry uses offline evaluation to filter the most promising solutions for further online testing, aiming to reduce costs and potential harm to customers. Despite the notable advances observed on this topic recently, consolidating a reliable, replicable, flexible and efficient offline evaluation process capable of satisfactorily predicting online test results remains an open challenge [2]. The community still lacks an integrated and up-to-date view of this topic that practitioners could use to inspect and refine their current offline evaluation stack.
    The main Recommendation Systems venues have plenty of studies with relevant findings, presenting new challenges, pitfalls and divergent guidelines for better offline evaluation procedures [3, 5]. However, inspecting all those studies and keeping an up-to-date perspective on where they agree is impractical, especially for industry, given the need for fast iterations and deliveries. Thus, professionals often struggle to obtain solid answers to practical and high-impact questions, such as: What are the main existing pitfalls we should be aware of when setting up an offline evaluation in a given domain? What is the desired evaluation framework for a given recommendation task? How reliable is a given offline evaluation stack, and how far is it from an ideal setting?
    In this work, we bring an updated snapshot of offline evaluation standards for Recommendation Systems. For this, we reviewed dozens of studies published in the main Recommendation Systems venues in the last five years, dealing with recurring questions related to offline evaluation design and compiling the main findings in the literature. Then, we contrasted this curated body of knowledge against practical issues we face internally at SEEK, aiming to identify the most valuable guidelines. As a result of this process, we propose an integrated evaluation framework for offline stacks, a reliability score to monitor signs of progress on our stack over time, and a list of best practices to bear in mind when starting a new evaluation. Hence, we have organised the work into three parts:
    Part I – Integrated Evaluation Framework. We present an offline evaluation framework that compiles the primary directives, pitfalls, and knowledge raised in the last five years by representative studies in the Recommendation Systems literature. This framework aims to compile the main steps, flaws and decisions to be aware of when designing offline tests, and to present the leading solutions suggested in the literature for each known issue. The proposed framework can be seen as an extension of Cañamares’ work [1], in which we expand the factors, steps and decisions related to the design of offline experiments for recommenders. Figure 1 depicts the main steps of the framework along with some of the main pitfalls recurrently related to each step. Note that this framework should not be regarded as a rigid and exhaustive set of steps and rules that all professionals must follow in every scenario. Rather, it is an organized collection of concerns raised in different situations, and the strength and potential impact of each concern should be carefully assessed through the lens of each evaluation scenario.

    Part II – Reliability Score. We also propose a Reliability Score to quantify how close a given offline evaluation setting is to the idealised framework instantiated for a given domain and task. This score is derived from a question-driven process that estimates the current state, effort, and impact of each known issue for each team or company. These questions represent an open-ended set of concerns related to distinct steps of the evaluation process that a reliable evaluation framework should address. The final score ranges from 0 to 1; the higher its value, the more reliable a given offline evaluation setting is, considering the specific needs and perspectives of a team or company. Further, this score allows teams of professionals to monitor progress in their offline evaluation settings over time, and it enables companies to compare the maturity of different teams w.r.t. offline assessments using a unified view. To illustrate the practical utility of the Reliability Score, we also present a few internal use cases that demonstrate how the proposed score helped us at SEEK to identify the main flaws in our offline settings and outline strategies for refining our current evaluation stack.
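
    The abstract does not give the score's formula, so the following is only a plausible sketch of a question-driven score in [0, 1]: each question records a self-assessed state (0 = issue not addressed, 1 = fully addressed), an impact weight, and an effort estimate; the score is the impact-weighted mean of states, and effort is used only to prioritize open issues. The `Question` fields, the aggregation, and the example questions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Question:
    """One self-assessment question about the offline evaluation stack."""
    text: str
    state: float   # 0.0 = issue not addressed, 1.0 = fully addressed
    impact: float  # relative importance of the issue for this team (> 0)
    effort: float  # estimated effort to address it (> 0)


def reliability_score(questions: List[Question]) -> float:
    """Impact-weighted mean of the states, so the result lies in [0, 1]."""
    total_impact = sum(q.impact for q in questions)
    return sum(q.impact * q.state for q in questions) / total_impact


def improvement_priorities(questions: List[Question]) -> List[Question]:
    """Open issues ordered by remaining impact per unit of effort."""
    open_issues = [q for q in questions if q.state < 1.0]
    return sorted(open_issues, key=lambda q: q.impact * (1.0 - q.state) / q.effort,
                  reverse=True)
```

    Recomputing `reliability_score` after each round of fixes gives the kind of over-time progress tracking described above.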
    Part III – Best Practices & Limitations. Finally, we compiled a list of best practices derived from academic works, experience reports from other companies, and our own experience at SEEK. We expect the proposed list to serve as a starting point for practitioners to qualitatively review their decisions when designing offline assessments, and we hope that these professionals will contribute to refining and growing it over time.

    Full text in ACM Digital Library

  • [LBR] Siamese Neural Networks for Content-based Cold-Start Music Recommendation.
    by Michael Pulis (University of Malta, Malta) and Josef Bajada (University of Malta, Malta)

    Music recommendation systems typically use collaborative filtering to determine which songs to recommend to their users. This mechanism matches a user with listeners that have similar tastes, and uses their listening history to find songs that the user will probably like. The fundamental issue with this approach is that artists already need to have a significant user following to get a fair chance of being recommended. This is known as the music cold-start problem. In this work, we investigate the possibility of making music recommendations based on audio content so that new artists still get a good chance of being recommended, even if they do not have a sufficient number of listeners yet.
    We propose the use of Siamese Neural Networks (SNNs) to determine the similarity between two audio clips. Each clip is first pre-processed into a Mel-spectrogram, which is then used as input to an SNN consisting of two identical Convolutional Neural Networks (CNNs). The outputs of the two CNNs are then compared to determine whether the two songs are similar. The networks were trained using audio from the Free Music Archive, with genre used as a heuristic to determine the similarity between song pairs.
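
    The key property of a Siamese network is weight sharing: the "two identical networks" are a single encoder applied to both inputs. In this NumPy sketch, a single tanh layer on the flattened spectrogram stands in for the CNN branch (a hypothetical simplification), outputs are compared by cosine similarity or Euclidean distance, and a standard contrastive loss uses "same genre" as the similarity label, as in the heuristic above. The layer, sizes, and loss are assumptions, not the authors' architecture. Real Mel-spectrograms could be computed with a library such as librosa.

```python
import numpy as np


class SiameseScorer:
    """Both branches share one weight matrix, i.e. one encoder applied twice.
    A tanh layer stands in for the CNN branch (hypothetical simplification)."""

    def __init__(self, n_mels: int, n_frames: int, dim: int = 16, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.01, size=(n_mels * n_frames, dim))

    def embed(self, spectrogram: np.ndarray) -> np.ndarray:
        return np.tanh(spectrogram.reshape(-1) @ self.w)

    def distance(self, a: np.ndarray, b: np.ndarray) -> float:
        return float(np.linalg.norm(self.embed(a) - self.embed(b)))

    def similarity(self, a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity of the two branch outputs."""
        ea, eb = self.embed(a), self.embed(b)
        denom = np.linalg.norm(ea) * np.linalg.norm(eb)
        return float(ea @ eb / denom) if denom > 0 else 0.0


def contrastive_loss(distance: float, same_genre: bool, margin: float = 1.0) -> float:
    """Pull same-genre pairs together; push other pairs at least `margin` apart."""
    if same_genre:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```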
    A query-by-multiple-example (QBME) music recommendation system was developed that makes use of the proposed content-based similarity metric to find songs that match the user’s tastes. This was packaged inside an online blind-test survey, which first prompts participants to select a set of preferred songs, and then recommends a number of songs which the subject is expected to listen to and rate on a Likert scale. The recommendations from the proposed algorithm were stochastically interleaved with songs selected randomly from the preferred genres of the user, as a baseline for comparison. The participants were not aware that the recommendations came from two different algorithms.
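
    Two mechanics in the study design lend themselves to a short sketch: query-by-multiple-example ranking, here assumed to score each candidate by its average similarity to all songs the participant selected, and stochastic interleaving of the model's list with a baseline list while recording each item's source for later analysis. The averaging rule, the 50/50 interleaving, and the function names are assumptions; any pairwise similarity (such as an SNN score) can be passed as `sim`.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def qbme_rank(queries: Sequence[T], candidates: Sequence[T],
              sim: Callable[[T, T], float], k: int) -> List[T]:
    """Rank candidates by mean similarity to all query examples; keep top k."""
    def score(c: T) -> float:
        return sum(sim(q, c) for q in queries) / len(queries)
    return sorted(candidates, key=score, reverse=True)[:k]


def interleave(model_recs: Sequence[T], baseline_recs: Sequence[T],
               rng: random.Random) -> List[Tuple[T, str]]:
    """Randomly merge two lists, preserving each list's internal order and
    recording the source of each item (hidden from participants)."""
    a, b = list(model_recs), list(baseline_recs)
    out: List[Tuple[T, str]] = []
    while a or b:
        take_model = bool(a) and (not b or rng.random() < 0.5)
        out.append((a.pop(0), "model") if take_model else (b.pop(0), "baseline"))
    return out
```

    With `sim = lambda x, y: -abs(x - y)`, queries `[1.0, 2.0]` and candidates `[1.5, 10.0, 0.0]`, `qbme_rank(..., k=2)` returns `[1.5, 0.0]`.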
    Our findings show that 60.7% of the 150 participants gave higher ratings to the recommendations made by the proposed SNN-based algorithm. Findings also show that 55% of the recommended songs had fewer than 1,500 listens, demonstrating that the proposed content-based approach can provide fairer exposure to all artists based on their music, independent of their fame and popularity.

    Full text in ACM Digital Library
