
Industry Session 2: Novel Approaches
Date: Tuesday, Sept 17, 2019, 14:00-15:30
Location: Auditorium
Chair: Kim Falk
Groupon Finally Explains Why We Showed Those Offers
by Sasank Channapragada, Harshit Syal and Ibrahim Maali (Groupon)
Groupon has a large inventory of offers as varied as local taquerias, massages, concert tickets, and trips to Costa Rica. Our Search & Recommendations team continues to develop algorithmic recommendation systems, machine-learned query understanding models, and increasingly sophisticated personalization and sales conversion estimations. Across an inventory of millions of offers, including many highly localized and geographically specific ones unique to Groupon’s Local business, we strive to balance inventory exploration with matching our users to exactly the right item. Our recommendation models take a variety of factors into account so that we can make the most relevant suggestions to our customers in their neighborhood, or while they travel in one of our hundreds of domestic and international markets. Our system must index millions of items, including the many specific to a user’s location; score the deals based on estimated conversion; and finally make adjustments for personalization, exploration, and diversity before delivering our ranked list of inventory to the platform. Yet despite our efforts, many of our customers are unaware of how much consideration goes into their Groupon app and emails. In numerous customer interviews we found a huge perception gap that had to be addressed: customers described our central scrollable home feed as ‘cluttered’, ‘disorganized’, and ‘like a garage sale’. It was clear to us that the next great sophisticated recommendation feature would mean nothing if our customers couldn’t appreciate it.
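The ranking flow described above (filter the index to the user’s location, score by estimated conversion, then adjust for personalization) might be sketched as follows; the field names and the interest-boost heuristic are illustrative assumptions, not Groupon’s actual system:

```python
def rank_offers(candidates, user):
    """Sketch of a location-filtered, conversion-scored, personalization-boosted
    ranking. All field names are hypothetical."""
    # 1. Restrict the candidate index to the user's city.
    local = [o for o in candidates if o["city"] == user["city"]]

    # 2. Rank by (matches user's interests, estimated conversion), so a
    #    personalized match outranks a higher-converting but generic offer.
    def key(o):
        personalized = o["category"] in user["interests"]
        return (personalized, o["est_conversion"])

    return sorted(local, key=key, reverse=True)
```

A real system would add exploration and diversity re-ranking on top of this ordering; the sketch only shows the filter-score-boost skeleton.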
Home Page Personalization at Spotify
by Oguz Semerci (Spotify)
We aim to surface the best of Spotify for each user on the Home page by providing a personalized space where users can find recommendations of playlists, albums, artists, and podcasts tailored to their individual preferences. Hundreds of millions of users listen to music on Spotify each month, with more than 50 million daily active users on the Home page alone. The quality of the recommendations on Home depends on a multi-armed bandit framework that balances exploration and exploitation and allows us to adapt quickly to changes in user preferences. We employ counterfactual training and reasoning to evaluate new algorithms without having to always rely on A/B testing or randomized data collection experiments. In this talk, we explain the methods and technologies used in the end-to-end process of Home page personalization and demonstrate a case study where we show improved user satisfaction over a popularity-based baseline. In addition, we present some of the challenges we faced in implementing such machine learning solutions in a production environment at scale and the approaches used to address them. The first challenge stems from the fact that training and offline evaluation of machine learning methods from incomplete logged feedback data requires robust off-policy estimators that account for several forms of bias. The ability to quickly sanity check and gain confidence in the methods we use in the production system is a crucial foundation for developing and maintaining effective algorithms. We demonstrate how we used a single-feature model, optimized for impression-to-click rate, to validate, and improve where necessary, the methods we use for off-policy estimation and accounting for position bias. Lastly, the business metrics we optimize for do not always reflect the expectations of all users of the Home page at a granular level. Consider a niche, daily podcast producing independent, fact-based news every morning.
A small segment of Spotify customers might want to see that content on top of their Home page every morning. We present simple but informative metrics we developed to validate our model’s ability to account for such habitual behaviors of our customers.
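The abstract does not spell out the estimators, but a standard off-policy estimator for logged bandit feedback of this kind is inverse propensity scoring (IPS); a minimal sketch, with illustrative log-record fields:

```python
def ips_estimate(logs, target_policy):
    """Inverse propensity scoring: estimate the average reward a new policy
    would have earned, using only feedback logged under the old policy.

    Each log record holds the context, the action shown, the probability the
    logging policy assigned to that action, and the observed reward (e.g. click).
    target_policy(context, action) returns the new policy's probability of
    showing that action. Field names are hypothetical."""
    total = 0.0
    for rec in logs:
        # Importance weight: re-weight each logged outcome by how much more
        # (or less) likely the target policy is to take the logged action.
        w = target_policy(rec["context"], rec["action"]) / rec["logging_prob"]
        total += w * rec["reward"]
    return total / len(logs)
```

When the target policy equals the logging policy every weight is 1 and the estimate reduces to the empirical mean reward, which is a useful sanity check of the kind the talk describes. Production estimators would additionally correct for position bias, which this sketch omits.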
Recommendation in Home Improvement Industry, Challenges and Opportunities
by Khalifeh Aljadda (The Home Depot)
The retail industry has been disrupted by the e-commerce revolution more than any other industry; some giant retailers, such as Sears and Toys R Us, went out of business or filed for bankruptcy as a result. However, some verticals in retail remain robust and have not been disrupted, due to the lack of e-commerce solutions that convince customers to turn their backs on existing physical stores in favor of the online experience. Home improvement is the best example of such a vertical: e-commerce has not “yet” disrupted the domain or caused problems for the leading companies, which still rely heavily on physical stores. That said, home improvement retailers recognized the risk of not investing in a robust online business that supports their physical stores in a seamless experience, so most of the leading retailers in this hundred-billion-dollar industry started building in-house solutions for all the challenging problems, to give their shoppers a seamless experience when they shop online. Recommender systems play a crucial role in this industry, as for any other online retailer. It is therefore very important to invest in building a personalized, scalable, and reliable recommender system that proactively helps shoppers discover products that engage them and match their intent and interest while on the website, and then re-engages them, via email or social media, with products and content aligned with their interests after they leave the website. As a Sr. Manager of the Core Recommendations team at The Home Depot, the largest home improvement retailer in the world, I deal with the challenges of building such a recommender system using cutting-edge technologies in AI, machine learning, and data science. In this talk I would like to discuss and highlight the following challenges in recommendations for home improvement:
- Project-based recommendations: One of the unique aspects of home improvement retail is project-based shopping. Most visitors to home improvement retailers are classified as ‘Do It Yourself’ (DIY) customers: non-professionals who are interested in building or fixing something in their home themselves. These customers usually prefer to go to a physical store, where they can talk to a store associate about their project and get help finding the tools and materials they need. It is very challenging to build a similar experience online, so I will talk about what we have done at Home Depot to build project-based recommendations using multi-modal learning to achieve that goal.
- Item Related Groups (IRG): One of the most important recommendations on home improvement portals is Item Related Groups (IRG), which include accessories (a water filter is an accessory for a fridge), collections (a faucet has a matching shower head, towel bar, and towel ring in the same style), and parts (the handle of a drawer). The challenges in recommending these different IRGs range from visual compatibility to functional understanding.
I will discuss how we are leveraging computer vision, deep learning, NLP, NLU, and domain knowledge to tackle these problems and generate high-quality IRG recommendations. I will also cover other challenges that recommender systems face in the home improvement industry, such as the velocity of changing interests and intents and the sparsity of interactions between customers and products.
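As a concrete illustration of the IRG structure described above (an illustrative sketch, not Home Depot’s actual schema), each recommendation can be viewed as a typed edge between an anchor item and a related item:

```python
from dataclasses import dataclass

@dataclass
class IRGLink:
    """One Item Related Group edge: a related item attached to an anchor
    item, typed by the kind of relation. SKUs here are made up."""
    anchor_sku: str
    related_sku: str
    relation: str  # one of: "accessory", "collection", "part"

links = [
    IRGLink("FRIDGE-01", "FILTER-07", "accessory"),    # water filter for a fridge
    IRGLink("FAUCET-22", "TOWELBAR-09", "collection"), # same-style bathroom set
    IRGLink("DRAWER-15", "HANDLE-03", "part"),         # replacement drawer handle
]

def related_for(sku, relation, links):
    """Look up related items of a given IRG type for an anchor item."""
    return [l.related_sku for l in links if l.anchor_sku == sku and l.relation == relation]
```

The talk’s point is that populating these edges is the hard part: accessories and parts need functional understanding, while collections need visual style compatibility.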
Recommendation Systems Compliant with Legal and Editorial Policies: The BBC+ App Journey
by Maria Panteli (BBC)
The BBC produces thousands of pieces of content every day and numerous BBC products deliver this content to millions of users. For many years the content has been manually curated (this is evident in the selection of stories on the front page of the BBC News website and app for example). To support content creation and curation, a set of editorial guidelines have been developed to build quality and trust in the BBC. As personalisation becomes more important for audience engagement, we have been exploring how algorithmically-driven recommendations could be integrated in our products. In this talk we describe how we developed recommendation systems for the BBC+ app that comply with legal and editorial policies and promote the values of the organisation. We also discuss the challenges we face moving forward, extending the use of recommendation systems for a public service media organisation like the BBC.
The BBC+ app is the first product to host in-house recommendations in a fully algorithmically-driven application. The app surfaces short video clips and is targeted at younger audiences. The first challenge we dealt with was content metadata. Content metadata are created for different purposes and managed by different teams across the organisation, making it difficult to have reliable and consistent information. Metadata enrichment strategies have been applied to identify content that is considered editorially sensitive, such as political content, current legal cases, archived news, commercial content, and content unsuitable for an under-16 audience. Metadata enrichment is also applied to identify content where due care has not been taken, such as poor titles or spelling and grammar mistakes. The first versions of the recommendation algorithms exclude all editorially risky content from the recommendations, the most serious risk being contempt of court. In other cases we exclude content that could undermine our quality and trustworthiness. The General Data Protection Regulation (GDPR) that recently came into effect had strong implications for the design of our system architecture, the choice of recommendation models, and the implementation of specific product features. For example, the user should be able to delete their data or switch off personalisation at any time. Our system architecture should allow us to trace and delete all data from that user and switch to non-personalised content. The recommendations should also be explainable, and this sometimes led us to choose a simpler model so that we could more easily explain why a user was recommended a particular type of content. Specific product features were also added to enhance transparency and explainability. For example, the user can view their history of watched items, delete any item, and get an explanation of why a piece of content was recommended to them.
At the BBC we aim not only to entertain our audiences but also to inform and educate. These BBC values are also reflected in our evaluation strategies and metrics. While we aim to increase audience engagement, we are also responsible for providing recent and diverse content that meets the needs of all our audiences. Accuracy metrics such as Hit Rate and Normalized Discounted Cumulative Gain (NDCG) can give a good estimate of the predictive performance of the model. However, recency and diversity metrics sometimes carry more weight in our products, especially in applications delivering news content. What is more, qualitative evaluation is very important before releasing any new model into production. We work closely with editorial teams who provide feedback on the quality of the recommendations and flag content not adhering to the BBC’s values or the legal and editorial policies. The development of the BBC+ app has been a great journey. We learned a lot about our content metadata, the implications of GDPR for our system, and our evaluation strategies. We created a minimum viable product that is compliant with legal and editorial policies. However, a lot remains to be done to ensure the recommendations meet the quality standards of the BBC. While excluding editorially sensitive content has limited the risk of contempt of court, algorithmic fairness and impartiality still need to be addressed. We encourage the community to look more into these topics and help us chart the way forward towards applications with responsible machine learning.
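For reference, the NDCG metric mentioned above can be computed as follows; this is the standard formulation, not BBC-specific code:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: relevance at rank i is discounted by
    log2(i + 2), so hits near the top of the list count more."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """Normalize DCG by the DCG of the ideal (best possible) ordering,
    giving a score in [0, 1]."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; pushing a relevant item down the ranking lowers the score, which is what makes NDCG a rank-sensitive accuracy metric.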
Incorporating Intent Propensities in Personalized Next Best Action Recommendation
by Kexin Xie and Yuxi Zhang (Salesforce.com), presented by Jonathan Budd
Next best action (NBA) is widely considered a best practice in modern personalized marketing. It takes users’ unique characteristics into consideration and recommends next actions that help users progress towards business goals as quickly and smoothly as possible. Many NBA engines are built with rules handcrafted by marketers based on experience or gut feeling, which is not effective. In this proposal, we show our machine-learning-based approach for such a real-time recommendation engine, detail our design choices, and discuss evaluation techniques. In practice, there are several key challenges to consider: (a) the engine needs to deal with historical feedback that is typically incomplete and skewed towards a small set of actions; (b) actions are typically dynamic and can be added or removed at any time due to seasonal changes or shifts in business strategy; (c) the optimization objective is typically complex, usually consisting of reaching a set of target events or moving users to more preferred stages. The engine needs to account for all of these aspects. Standard classification or regression models are not suitable, because only bandit feedback is available and the sampling bias present in historical data cannot be handled properly.
A conventional multi-armed bandit model can address some of these challenges, but it lacks the ability to model multiple objectives. We present a propensity-variant hybrid contextual multi-armed bandit model (PV-MAB) that can address all three challenges. PV-MAB consists of two components: an intent propensity model (I-Prop) and a hybrid contextual MAB (H-Bandit). H-Bandit can be considered a multi-policy contextual MAB, where we model different aspects of user engagement separately and tailor the policies to each unique characteristic. I-Prop leverages user intent signals to target different users toward the specific goals that are most relevant to them. It acts as a policy selector, informing H-Bandit of the best strategy for different users at different points in the journey. I-Prop is trained separately with features extracted from user profile affinities and past behaviors. To illustrate this design, we focus our discussion on how to incorporate two common, distinct objectives in H-Bandit. The first is to target and drive users to reach a small set of high-value goals (e.g. make a purchase, become a super fan); we call this the goal-oriented policy. The second is to promote progression into more advanced stages of a consumer journey (e.g. from login to completed profile); we call this the stage-advancement policy. In the goal-oriented policy, we reward reaching the goals accordingly, and use a classification predictor as the kernel function to predict the probabilities of achieving those goals. In the stage-advancement policy, we use the progression of stages as the reward. Customers can move forward in their journey, skip a few stages, or go back to previous stages to do more research or re-evaluation. The reward strategy is designed so that larger positive stage progressions earn higher rewards, while zero or negative stage progressions earn none. Both policies incorporate Thompson Sampling with a Gaussian kernel for better exploration.
One big difference between our hybrid model and a regular contextual bandit model is that, besides context information, we also mix user profile affinities into the model. These affinities tell us the user’s intent and interests, and what their typical journey path looks like.
With these special features, our model is able to recommend different actions for users who show different interests (e.g. football ticket purchase vs. jersey purchase). Similarly, for fast shoppers who usually skip a few stages, our model recommends actions that quickly trigger goal achievement, while for research-oriented users, the model offers actions that move them gradually towards the next stages. This hybrid strategy gives us a better understanding of user intent and behavior, so as to make more personalized recommendations. We designed a time-sensitive rolling evaluation mechanism for offline evaluation of the system with various hyperparameters that simulate behaviors in practice. Despite the lack of online evaluation, our strategy allows researchers and prospects to gain confidence through bounded expected performance. Evaluated on real-world data, we observed a reward gain of about 120%, with an overall confidence of around 0.95. A big portion of the improvement is contributed by the goal-oriented policy, which demonstrates the discovery functionality of the intent propensity model.
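The stage-advancement reward combined with Gaussian Thompson Sampling described above might be sketched as follows; this is a simplified, non-contextual version (the actual H-Bandit is contextual and multi-policy), and all names are illustrative:

```python
import random

class GaussianThompsonBandit:
    """Thompson Sampling with a Gaussian posterior per action.
    Rewards follow the stage-advancement idea: larger positive stage
    progressions earn larger rewards; zero or negative ones earn none."""

    def __init__(self, actions):
        self.stats = {a: {"n": 0, "mean": 0.0} for a in actions}

    def select(self):
        # Sample a plausible mean reward for each arm from its posterior
        # and pick the best; uncertainty (sigma) shrinks as data accrues.
        def sample(s):
            sigma = 1.0 / (s["n"] + 1)
            return random.gauss(s["mean"], sigma)
        return max(self.stats, key=lambda a: sample(self.stats[a]))

    def update(self, action, stage_delta):
        # Stage-advancement reward: clip zero/negative progression to 0.
        reward = max(stage_delta, 0)
        s = self.stats[action]
        s["n"] += 1
        s["mean"] += (reward - s["mean"]) / s["n"]
```

A goal-oriented policy would replace `stage_delta` with the reward for reaching a high-value goal, with a classification predictor supplying the goal probabilities.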
Driving Content Recommendations by Building a Knowledge Base Using Weak Supervision and Transfer Learning
by Sanghamitra Deb (Chegg)
With 2.2 million subscribers and two hundred million content views, Chegg is a centralized hub where students come to get help with writing, science, math, and other educational needs. In order to impact a student’s learning capabilities we present personalized content to students. Student needs are unique based on their learning style, studying environment, and many other factors. Most students will engage with a subset of the products and content available at Chegg. In order to recommend personalized content to students we have developed a generalized machine learning pipeline that is able to handle training data generation and model building for a wide range of problems. We generate a knowledge base with a hierarchy of concepts and associate student-generated content, such as chatroom data, equations, chemical formulae, reviews, etc., with concepts in the knowledge base. Collecting training data to generate different parts of the knowledge base is a key bottleneck in developing NLP models. Employing subject matter experts to provide annotations is prohibitively expensive. Instead, we use weak supervision and active learning techniques, with tools such as Snorkel, an open-source project from Stanford, to make training data generation dramatically easier. With these methods, training data is generated using broad-stroke filters and high-precision rules. The rules are modeled probabilistically to incorporate dependencies. Features are generated using transfer learning from language models for classification tasks. We explored several language models, and the best performance came from sentence embeddings with skip-thought vectors predicting the previous and the next sentence. The generated structured information is then used to improve product features and enhance recommendations made to students. In this presentation I will talk about efficient methods of tagging content with categories that come from a knowledge base.
Using this information, we provide relevant content recommendations to students who come to Chegg for online tutoring, studying flashcards, and practicing problems.
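A toy illustration of the weak supervision approach described above: labeling functions encode high-precision rules that vote on a label or abstain, and a combiner resolves the votes. (Snorkel itself learns a probabilistic label model over the labeling functions rather than this simple majority vote; all names and rules here are illustrative.)

```python
# Label constants: labeling functions return a class label or ABSTAIN.
ABSTAIN, MATH, CHEMISTRY = -1, 0, 1

def lf_has_equation(text):
    # High-precision rule: an equals sign suggests math content.
    return MATH if "=" in text else ABSTAIN

def lf_chemical_formula(text):
    # Broad-stroke filter: a few known formulae suggest chemistry content.
    return CHEMISTRY if any(tok in text for tok in ("H2O", "NaCl", "CO2")) else ABSTAIN

def weak_label(text, lfs):
    """Combine labeling-function votes by majority, ignoring abstentions."""
    votes = [lf(text) for lf in lfs]
    votes = [v for v in votes if v != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN
```

Labels produced this way become (noisy) training data for the knowledge-base taggers, with no human annotation of individual examples.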