
Industry Session 1: Core Algorithms
Date: Wednesday, Oct 3, 2018, 14:00-15:30
Location: Parq D/E/F
Chairs: Alexandros Karatzoglou, Ben Frederickson
Variational Learning to Rank (VL2R)
by Keld Lundgaard (Salesforce)
We present Variational Learning to Rank (VL2R), a combination of variational inference and learning to rank. The combination provides a natural way to balance the algorithm's exploration and exploitation by shuffling product search/category listings according to the model's relevance uncertainty for each product. Simply put, we perturb (newer) products with higher uncertainty on their relevance more than (older) products with lower uncertainty.
Our formalism makes it possible to train an end-to-end model that optimizes for both ranking and shuffling, in contrast to known state-of-the-art systems where ranking and shuffling are treated as separate problems. VL2R provides an integrated way of doing propensity scoring during the offline learning phase, thus reducing selection bias. The system is simple, yet powerful and flexible. We have implemented it within the Salesforce Commerce Cloud, a platform that 500 million unique online shoppers interact with each month across 2,750 websites in 53+ countries as of FY18.
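The uncertainty-driven shuffling described above can be sketched in a few lines. This is a minimal illustration only: the Gaussian per-product posterior, the field names, and the catalog are assumptions for the sketch, not Salesforce's implementation.

```python
import random

def shuffled_ranking(products, rng=random):
    """Rank products by a relevance score drawn from each product's
    posterior: high-uncertainty (newer) products move around more
    between requests than low-uncertainty (older) ones."""
    sampled = [
        (rng.gauss(p["mean_relevance"], p["relevance_std"]), p["sku"])
        for p in products
    ]
    return [sku for _, sku in sorted(sampled, reverse=True)]

catalog = [
    {"sku": "old-item", "mean_relevance": 0.70, "relevance_std": 0.01},
    {"sku": "new-item", "mean_relevance": 0.65, "relevance_std": 0.30},
]
# Over many requests, "new-item" is sometimes ranked first even though
# its mean relevance is lower, producing exploratory impressions.
print(shuffled_ranking(catalog, random.Random(0)))
```

The same draw-then-sort pattern is what lets a single model serve both ranking (the means) and shuffling (the variances).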
In this talk, we will go into the details of our variational learning to rank system and share our early experiences with optimizing VL2R and running it in production. We hope that by sharing VL2R with the recommender systems community, we will foster more research in this direction, leading to systems that learn user preferences for changing catalogs faster.
About the Speaker
Keld Lundgaard is a senior data scientist at Salesforce Commerce Cloud Einstein. He has developed and implemented a number of recommendation systems that are currently served across Commerce Cloud websites. Prior to Salesforce, Keld was a postdoctoral fellow at Stanford University, where he developed machine learning models to improve the accuracy of surface science simulations used for screening new material compounds for batteries, fuel cells, and artificial photosynthesis. Keld holds a Ph.D. from the Technical University of Denmark.
Adapting Session Based Recommendation for Features Through Transfer Learning
by Even Oldridge (realtor.com)
This industry talk covers the deep learning architecture developed at Realtor.com to recommend real estate listings to our user base. Recommending homes differs from most other domains: listings are unique, and additional geographic and time constraints increase the sparsity of interactions and make recommending individual listings more challenging. In particular, time on market in a hot area can be limited to weeks or even days, and listing cold-start is critical to providing up-to-date market information. Thankfully, the structured feature data for listings is incredibly rich and provides a framework for mapping listings into a meaningful vector space. User first impressions are also incredibly important in this highly competitive field, so offline recommendations or models that don't adapt during the user's session are less desirable.
To solve this recommendation problem we have developed a model based on session-based recommendation. The architecture uses state-of-the-art techniques from natural language processing, including the AWD-LSTM language model developed by Salesforce. To address listing cold-start, a denoising autoencoder over structured data was adapted from the methodology described in the winning entry of the Porto Seguro Safe Driver Prediction Kaggle competition. This model is not used in the common way of generating fixed feature vectors. Instead, the entire head of the autoencoder model, from the feature inputs to the middle layer commonly used as the vector output, is first trained to encode listing features and then becomes the input to the AWD-LSTM architecture. This style of transfer learning is common in computer vision and has recently been used in NLP to achieve state-of-the-art results for text classification. By including the head, we are able to further optimize the listing encoder network and embeddings to take user interactions into account. As in traditional session-based recommendation, users are represented as the sequence of listings that they view; however, those listings are fed into the model as sequences of features.
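The wiring of the transfer-learning step can be sketched as follows: a pretrained encoder "head" maps raw listing features to an embedding, and the session model consumes those embeddings instead of listing IDs. The weights, sizes, and function names here are illustrative stand-ins, not Realtor.com's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pretrained denoising-autoencoder head: in the real
# system these weights come from training the autoencoder to reconstruct
# listing features; here they are random for illustration.
W_enc = rng.normal(size=(8, 4))          # 8 raw listing features -> 4-dim code

def encode_listing(features):
    return np.tanh(features @ W_enc)     # the autoencoder's middle layer

def session_to_inputs(session_features):
    """A user session is a sequence of listings; each step feeds the
    encoded feature vector (not a listing-ID embedding) into the
    sequence model, so brand-new cold-start listings still get a
    meaningful input."""
    return np.stack([encode_listing(f) for f in session_features])

session = [rng.normal(size=8) for _ in range(3)]   # 3 listings viewed
inputs = session_to_inputs(session)
print(inputs.shape)   # (3, 4): sequence length x embedding size
```

Because the encoder stays attached (rather than exporting frozen vectors), its weights can keep training end-to-end against user-interaction loss, which is the point the abstract makes about "including the head."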
The final system consists of several components. The first calculates and maintains the users' feature vectors and model hidden weights in near realtime, providing a representation of each user within the system. This representation is used by several downstream components, most notably the search-rerank and recommendation modules, which estimate users' interest in listings in two ways: by cosine similarity of user/listing vectors over the output of more traditional Elasticsearch queries, and through approximate-nearest-neighbor searches in the vector space for relevant listings, which form the input set for a pointwise scoring model trained on time-on-listing, as done by YouTube.
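The cosine-similarity rerank step admits a compact sketch. The function and listing names below are hypothetical; in production the candidate vectors would come from an Elasticsearch query and the user vector from the near-realtime session model.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rerank(user_vec, candidates):
    """candidates: mapping of listing id -> listing vector (e.g. the
    result set of a traditional search query); reorder the listings by
    similarity to the user's session vector."""
    return sorted(candidates,
                  key=lambda lid: cosine(user_vec, candidates[lid]),
                  reverse=True)

user = np.array([1.0, 0.0])
candidates = {"A": np.array([0.9, 0.1]),
              "B": np.array([0.1, 0.9])}
print(rerank(user, candidates))  # ['A', 'B']
```

The same vectors support the second retrieval path, where an approximate-nearest-neighbor index is queried with the user vector directly to generate candidates rather than rerank them.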
About the Speaker
Even Oldridge is a Principal Data Scientist at Realtor.com, where he leads the search-rerank and recommendation efforts for a user base of 60 million monthly active users searching for their perfect home. Prior to Realtor.com he worked for one of the largest online dating websites in the world, Plenty of Fish, where he developed many of the recommendation algorithms used to match users in realtime and led the data science team. Through his work at Plenty of Fish he holds two patents, with four more pending, in the field of matching and recommendation in online dating. He has a Ph.D. in Electrical Engineering from the University of British Columbia, with a focus on computational photography and human-computer interaction, and an M.A.Sc. from the same university in field-programmable gate array architecture design. He is also an active student of deep learning, studying under Jeremy Howard's Fast.ai course Cutting Edge Deep Learning for Coders from the University of San Francisco Data Institute.
Hulu Video Recommendation: From Relevance to Reasoning
by Xiaoran Xu (Hulu)
Online video streaming services such as Hulu host tens of millions of premium videos, which requires an effective recommender system to help viewers discover what they enjoy. In this talk, we will introduce Hulu's recent technical progress in recommender systems and dive deep into the topic of generating recommendation reasons from a knowledge graph.
We have two user scenarios: the store-shelf and autoplay. The first requires a list of videos to maximize the chance that a viewer would pick one of them to watch. The second requires a sequence of video recommendations such that the viewer would continuously watch within the current session.
In the model layer, we designed specific models to match each user scenario, balancing both exploitation and exploration. For example, we leverage a contextual-bandit model in the store-shelf scenario to adapt the ranking strategy to various types of user feedback. To optimize exploitation, we tested several granularity levels for parameter sharing among the arms. For more effective exploration, we incorporate Thompson sampling. For the autoplay scenario, we use a contextual recurrent neural network to predict the next video that the viewer is going to watch.
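A minimal Thompson-sampling bandit shows the exploration mechanism at work. This is the textbook Beta-Bernoulli version with made-up arm names, not Hulu's contextual model (which conditions the arms on context and shares parameters among them).

```python
import random

class ThompsonArm:
    def __init__(self):
        self.clicks, self.skips = 1, 1        # Beta(1, 1) uniform prior

    def sample(self, rng):
        # Draw a plausible click rate from the arm's posterior.
        return rng.betavariate(self.clicks, self.skips)

    def update(self, clicked):
        if clicked:
            self.clicks += 1
        else:
            self.skips += 1

def pick_video(arms, rng=random):
    """Choose the arm whose posterior draw is highest, so uncertain
    arms occasionally win and get explored."""
    return max(arms, key=lambda name: arms[name].sample(rng))

rng = random.Random(0)
arms = {"a": ThompsonArm(), "b": ThompsonArm()}
true_ctr = {"a": 0.8, "b": 0.2}               # simulated ground truth
picks = {"a": 0, "b": 0}
for _ in range(500):
    arm = pick_video(arms, rng)
    picks[arm] += 1
    arms[arm].update(rng.random() < true_ctr[arm])
print(picks)  # the higher-CTR arm dominates as its posterior sharpens
```

The granularity question mentioned above then becomes: does each (video, position) pair get its own click/skip counts, or are counts shared across groups of arms?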
In the feature and data layer, we train embeddings for content, users, and contextual information. For example, to train content embeddings, we collect factual tags from metadata, sentiment tags from reviews, and keywords from the captions, as well as objects and actions recognized using computer vision techniques.
Next, we will deep-dive into one important topic: generating recommendation reasons from a knowledge graph.
A fact is defined by a tuple of related entities and their relation, normally a pair of entities tagged with a relationship. In our problem setting, recommendation results are the targets and serve as inputs to the reasoning task: each consists of a pair of relevant entities, i.e., a source node and a destination node in a knowledge graph. The recommendation reasoning task is to learn a path, or a small directed acyclic subgraph, connecting the source node to the destination node.
Since the facts in a knowledge graph have different confidence values for different reasoned targets, we need to conduct probabilistic inference. The challenge is that we do not know a pre-defined set of logic rules to guide the search through the knowledge graph, which prevents us from directly applying probabilistic logic methods. Inspired by recent advances in deep learning and reinforcement learning, especially in graph neural networks, attention mechanisms, and deep generative models, we propose two ways to model the reasoning process: a differentiable reasoning approach and a stochastic reasoning approach.
Differentiable reasoning approaches are based on graph neural networks with attention flow and information flow. The attention dynamics is an iterative process of redistributing and aggregating attention over the knowledge graph, starting at the source node. The final attention aggregated at the destination node serves as the prediction used to compute the loss. Beyond prediction accuracy, we care more about how the learned attention dynamics draw their reasoning track through the knowledge graph.
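The attention-redistribution dynamics can be illustrated on a toy graph. Here the edge scores are fixed and uniform; in the talk's approach they are learned (and differentiable), and the attention arriving at the destination is what the loss is computed on. The graph and function below are an assumption-laden sketch, not the paper's model.

```python
import numpy as np

def attention_flow(adj, source, steps):
    """adj[i, j]: score of edge i -> j (0 if absent). At each step, the
    attention on every node is redistributed along its outgoing edges in
    proportion to the edge scores; the attention that accumulates at a
    destination node after `steps` hops is the prediction signal."""
    n = len(adj)
    att = np.zeros(n)
    att[source] = 1.0                         # all attention starts at source
    row_sums = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, row_sums,
                      out=np.zeros_like(adj), where=row_sums > 0)
    for _ in range(steps):
        att = att @ trans                     # one redistribution step
    return att

# Tiny 4-node graph: 0 -> {1, 2}, and both 1 and 2 -> 3.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0]], dtype=float)
att = attention_flow(adj, source=0, steps=2)
print(att)  # after 2 hops, all attention mass sits on node 3
```

Reading off *which* edges carried the attention (here, both 0→1→3 and 0→2→3 equally) is exactly the "reasoning track" the abstract says matters more than raw accuracy.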
Stochastic reasoning approaches frame the reasoning process as learning a probabilistic graphical model consisting of stochastic discrete operations, such as selecting a node or selecting an edge, to build a reason subgraph extracted from the knowledge graph. Such a model is known as a stochastic computation graph (SCG); to learn it, we propose a generalized back-propagation framework, Backprop-Q, to overcome the gradient-blocking issues that arise when applying standard back-propagation.
In summary, we give an overview of recommendation research at Hulu and deep-dive into our differentiable and stochastic reasoning approaches for generating recommendation reasons from a knowledge graph.
About the Speaker
Xiaoran Xu is a Researcher at Hulu. He is a member of the Recommendation Research team and affiliated with Hulu Innovation Lab. He focuses on differentiable reasoning and stochastic reasoning approaches that bring better interpretability for recommendation and other tasks. He has developed a generalized backpropagation framework, called Backprop-Q, which makes large stochastic systems trainable in an end-to-end fashion. He also proposed a new attention mechanism, called Reasoning with Attention Flow (RAF), to solve differentiable reasoning problems effectively.
Hybrid search: Incorporating Contextual Signals in Recommendations at Pinterest
by Jenny Liu (Pinterest)
Many modern recommender systems use collaborative filtering or historical engagement data to serve the best recommendations for each item. However, the context of each recommendation instance can be very different. Some users may be casually browsing, while others are searching with high intent. At Pinterest, we realized that building our system solely on aggregated historical data or pin-board collaborative filtering would not be able to capture these differences. Incorporating contextual signals helps us serve better recommendations for every instance.
Pinterest Related Pins is an item-to-item recommender system that accounts for 40 percent of engagement on Pinterest. On Pinterest, Related Pins appears as a feed of content relevant to the Pin a user has clicked on. Users arrive at Related Pins feeds from a variety of surfaces, such as their Home Feed, Search results, or Boards. As expected, these users often have different intents. Users coming from Search have already executed a specific text query and clicked on one of the Pins in the Search results. This context tells us that the user has high intent and is interested in something related to both the Search query as well as the clicked Pin. This context is very different from a user who is casually scrolling through their Home Feed and clicks on a Pin that happens to catch their eye. The Related Pins recommendations for each of these clicked Pins should therefore also differ accordingly.
Related Pins are generally relevant to the clicked Pin. The recommendations for a women’s dress shoe Pin will be other shoes of similar style, some of which may be paired with matching outfits. However, if the user searched specifically for “red ballet flats with sequins”, the Related Pins may not be specific enough to be useful. To address this, we developed a hybrid search that takes both the text search query and the clicked Pin’s image and metadata as inputs, and outputs a set of results tailored to both. We found that this improved user engagement for Related Pins from Search by 20% on top of the previous production recommendation system. Following this exciting launch, we are planning to further incorporate contextual signals by adding them as features in our model.
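One simple way to realize "tailored to both inputs" is to blend the query embedding with the clicked Pin's embedding before retrieval. The blending scheme, weights, and Pin names below are illustrative assumptions, not Pinterest's production method.

```python
import numpy as np

def hybrid_candidates(query_vec, pin_vec, index, weight=0.5, k=2):
    """index: {pin_id: embedding}. Blend the text-query embedding with
    the clicked Pin's embedding, then return the k Pins most similar to
    the blend, so results respect both the query and the clicked Pin."""
    blend = weight * query_vec + (1 - weight) * pin_vec

    def sim(v):
        return float(blend @ v /
                     (np.linalg.norm(blend) * np.linalg.norm(v)))

    ranked = sorted(index, key=lambda pid: sim(index[pid]), reverse=True)
    return ranked[:k]

query = np.array([1.0, 0.0])   # embedding of "red ballet flats with sequins"
pin = np.array([0.0, 1.0])     # embedding of the clicked shoe Pin
index = {"red_flats": np.array([0.7, 0.7]),   # matches both signals
         "any_shoe": np.array([0.0, 1.0]),    # matches only the Pin
         "red_dress": np.array([1.0, 0.0])}   # matches only the query
print(hybrid_candidates(query, pin, index))
```

The Pin matching both signals ranks first, while candidates matching only one signal fall behind, which is the behavior the hybrid search is after.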
About the Speaker
Jenny Liu is currently a software engineer on the Discovery team at Pinterest, where she leads candidate generation for the Related Pins feature, an item-to-item recommender system. Her contributions to Related Pins also include automating its model training pipeline, feature development, and content activation. Before that, she worked on acquiring new users on web and iOS as part of the Pinterest Growth team. Prior to Pinterest, Jenny graduated from Harvard, where she received a bachelor’s degree in computer science with a minor in statistics.
Learning Content and Usage Factors Simultaneously to Reduce Clickbaits
by Arnab Bhadury (Flipboard)
Recommending news and content is often harder than classic recommendation problems. At recommendation time, there are often fewer high-quality explicit usage signals such as upvotes, shares, and dislikes, because articles are relevant for only a short amount of time. Relying solely on implicit usage signals (views) in collaborative filtering for news articles often surfaces low-quality documents optimized for views and clicks. Traditionally, content-based filtering methods such as topic modeling and named-entity extraction are used to counter or mitigate these issues, but they yield poorer recommendations on their own, and hybrid ensembles of content-based and collaborative filtering are difficult to optimize.
This talk proposes learning factorized representations of documents from content and usage signals simultaneously. Using both signals at once encourages each to act as a regularizer for the other, which keeps recommendation quality high while reducing the number of clickbait articles. It also avoids the additional tuning step required by commonly used ensembles of content-based and collaborative filtering hybrid models.
This research explores learning these shared factorized representations between the two views using the traditional matrix factorization framework as well as probabilistic approaches based on topic modeling. This talk shares the lessons learned from using both approaches and shows the impact of using these learned representations on recommendation quality.
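In the matrix-factorization view, the shared-representation idea amounts to one set of document factors that must reconstruct both the usage matrix and the content matrix. The sketch below, with made-up sizes and random data, shows a joint objective of this shape; it is an illustration of the technique, not Flipboard's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_docs, n_words, k = 5, 6, 10, 3
R = rng.random((n_users, n_docs))      # usage view: user x document views
C = rng.random((n_docs, n_words))      # content view: document x term weights

U = rng.normal(0, 0.1, (n_users, k))   # user factors
V = rng.normal(0, 0.1, (n_docs, k))    # SHARED document factors
W = rng.normal(0, 0.1, (n_words, k))   # term factors
alpha, lr = 0.5, 0.01                  # content weight, learning rate

def joint_loss():
    usage = ((R - U @ V.T) ** 2).sum()      # collaborative-filtering term
    content = ((C - V @ W.T) ** 2).sum()    # content term, regularizing V
    return usage + alpha * content          # V must explain BOTH views

loss_before = joint_loss()
for _ in range(200):                        # plain gradient descent
    E_r = U @ V.T - R
    E_c = V @ W.T - C
    U -= lr * (E_r @ V)
    V -= lr * (E_r.T @ U + alpha * E_c @ W)
    W -= lr * (alpha * E_c.T @ V)
print(loss_before, joint_loss())            # loss drops as V fits both views
```

Because a document's factors cannot chase clicks alone (they must also reconstruct its content), purely click-optimized clickbait gets pulled back toward what the article actually says, which is the regularizing effect the talk describes.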
About the Speaker
Arnab Bhadury is a machine learning engineer and data scientist on the data products team at Flipboard, Vancouver. He is the main author of the current multilingual topic extraction pipeline at Flipboard and is currently working closely with the recommendations team to improve news recommendation in multiple languages and locales. He holds an M.Sc. from Tsinghua University, where he worked on Bayesian topic modeling and large-scale Bayesian inference. His current research interests include recommender systems, natural language processing, and Bayesian machine learning.
Dr. Aanchan Mohan is a machine learning scientist and software engineer at Synaptitude Brain Health. He is currently working on software and machine learning methods to encourage circadian regulation with the goal of improving an individual’s brain health. His current research interests include problems in natural language processing. Dr. Mohan has worked on Bayesian and deep learning methods applied to time-series signals across multiple domains. He holds a Ph.D. from McGill University, where he focused on transfer learning and parameter sharing in acoustic models for speech recognition. He supervises students and actively publishes in the area of speech processing. He is a named co-inventor on two issued patents in the area of speech processing, and one filed patent in the area of wearable devices.