Industry Track


Distributed, Real-Time Bayesian Learning in Online Services

by Ralf Herbrich (Facebook)

Abstract: The last ten years have seen tremendous growth in Internet-based online services such as search, advertising, gaming and social networking. Today, it is important both to analyze large collections of user interaction data as a first step in building predictive models for these services and to learn these models in real time.

One of the biggest challenges in this setting is scale: the sheer volume of data necessitates not only parallel processing but also distributed models. With over 900 million active users at Facebook, any set of user-specific features in a linear or non-linear model yields a model too large to be stored on a single system.

In this talk, I will give a hands-on introduction to one of the most versatile tools for handling large collections of data with distributed probabilistic models: the sum-product algorithm for approximate message passing in factor graphs. I will discuss the application of this algorithm to the specific case of generalized linear models and outline the challenges of both approximate and distributed message passing, including an in-depth discussion of expectation propagation and MapReduce.
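To make "approximate message passing for generalized linear models" concrete, here is a minimal sketch of a single moment-matching step (assumed density filtering, i.e. one pass of expectation propagation) for a probit model with binary features, in the style of the published TrueSkill/AdPredictor family of updates. It is not the speaker's implementation: the function names, the dict-based storage, the noise parameter `beta` and the prior variance are assumptions for illustration. Each weight keeps an independent Gaussian belief, and one labelled example updates only the beliefs of its active features.

```python
import math


def _std_normal_pdf(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _std_normal_cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def probit_adf_update(means, variances, active, y, beta=1.0, prior_var=1.0):
    """Moment-matched (ADF) update of independent Gaussian weight beliefs
    for a probit GLM with binary features (hypothetical helper).

    means, variances: dicts mapping feature key -> Gaussian mean / variance.
    active: keys of the features that are on (x_i = 1) for this example.
    y: observed label, +1 (e.g. click) or -1 (no click).
    beta: assumed noise standard deviation of the probit link.
    """
    for f in active:
        # Unseen features start at the assumed prior N(0, prior_var).
        means.setdefault(f, 0.0)
        variances.setdefault(f, prior_var)
    # Predictive variance of the score: link noise plus the weight variances.
    total_var = beta ** 2 + sum(variances[f] for f in active)
    s = math.sqrt(total_var)
    t = y * sum(means[f] for f in active) / s
    # Truncated-Gaussian corrections (no underflow guard for extreme t here).
    v = _std_normal_pdf(t) / _std_normal_cdf(t)  # shifts the means
    w = v * (v + t)                               # shrinks the variances, 0 < w < 1
    for f in active:
        means[f] += y * variances[f] / s * v
        variances[f] *= 1.0 - variances[f] / total_var * w
```

For example, `probit_adf_update(means, variances, ["user:42", "item:7"], y=+1)` moves both beliefs towards a positive weight and reduces their variance. Because the belief is factorized per feature, the mean/variance tables could in principle be partitioned across machines by feature key, which is one way to reach the distributed models the abstract motivates.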

In the second half of the talk, I will discuss industry applications of these models to the problem of gamer ranking in online gaming services such as Xbox Live and collaborative filtering for content recommendation at Facebook.

Bio: Ralf Herbrich is an Engineering Manager at Facebook, where he works on large-scale, distributed ranking systems & services for information distribution.

Before joining Facebook, he headed the Bing Personalization team, which focused on prototyping and enabling personalized experiences across Microsoft's Online Services Division. Prior to his work on Bing, Ralf was Director of Microsoft's Future Social Experiences (FUSE) Labs UK, working on new social experiences powered by computational intelligence technologies on large online data collections. Ralf joined Microsoft Research in 2000 as a postdoctoral researcher and Research Fellow of Darwin College, Cambridge. During his time at Microsoft Research, Ralf worked in the areas of machine learning, information retrieval, game theory, artificial intelligence and social network analysis. Prior to joining Microsoft, Ralf worked at the Technical University Berlin as a teaching assistant, where he obtained both a diploma degree in Computer Science and a Ph.D. degree in Statistics.

Ralf's research interests include Bayesian inference and decision making, computer games, kernel methods and statistical learning theory. He has co-authored over 50 journal and conference papers in these areas. Ralf is one of the inventors of the Drivatars™ system used in the Forza Motorsport series as well as the TrueSkill™ ranking and matchmaking system in Xbox 360 Live. He also co-invented the click-prediction technology used in Bing's online advertising system.

Recommendation Challenges in Web Media Settings

by Ronny Lempel (Yahoo! Research)

Bio: Ronny Lempel joined Yahoo! Research in October 2007 as the director of Yahoo! Israel Research Ltd., where he oversees R&D activities at the cutting edge of Web search. Prior to joining Yahoo! Research, Ronny spent 4.5 years at IBM's Haifa Research Lab with the Information Retrieval Group, where his duties included research and development in the area of enterprise search systems. Prior to joining IBM, Ronny received his BSc, MSc and PhD from the Faculty of Computer Science at Technion, Israel Institute of Technology in 1997, 1999 and 2003 respectively. Both his MSc and PhD focused on search engine technology. During his PhD studies, Ronny spent two summer internships at the AltaVista search engine.

Abstract: The talk calls out several research challenges in the art of recommendation technology as applied in Web media sites. One particular characteristic of such recommendation settings is the relatively low cost of falsely recommending an irrelevant item, which means that recommendation schemes can be less conservative and more exploratory. This also creates opportunities for better handling of item cold-start. Other technical difficulties include analyzing offline data that is heavily biased by the site's appearance and, in a related vein, defining the correct metrics by which to measure a recommendation module once its appearance has been designed. Also called out are tradeoffs between personalization and contextualization, as well as novel schemes that aim at recommending sets and sequences of items.
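As an illustration of what "less conservative and more exploratory" can mean in practice, the sketch below uses Thompson sampling over a Beta-Bernoulli click model, one standard exploration technique; the abstract does not say which scheme the talk proposes. The class name, the Beta(1, 1) prior and the click-only feedback model are assumptions. A newly added item starts at the uniform prior, so it is regularly sampled high enough to be shown, which is one way exploration helps with item cold-start.

```python
import random


class BetaBernoulliRecommender:
    """Hypothetical Thompson-sampling recommender with a Beta(1, 1) prior
    on each item's click-through rate."""

    def __init__(self, items, seed=None):
        self.rng = random.Random(seed)
        # Posterior parameters per item: [alpha (clicks + 1), beta (skips + 1)].
        self.stats = {item: [1.0, 1.0] for item in items}

    def add_item(self, item):
        # A cold-start item begins at the uniform prior and gets explored.
        self.stats.setdefault(item, [1.0, 1.0])

    def choose(self):
        # Sample one plausible click rate per item; show the item with the best draw.
        draws = {item: self.rng.betavariate(a, b) for item, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, item, clicked):
        # Conjugate Beta update from a single click / no-click observation.
        if clicked:
            self.stats[item][0] += 1.0
        else:
            self.stats[item][1] += 1.0
```

Because a wrong recommendation is cheap in this setting, sampling from the posterior (rather than always showing the item with the highest estimated rate) trades a few low-value impressions for information about uncertain items.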

Recommendations and Discovery at StumbleUpon

by Sumanth Kolar (StumbleUpon)

Abstract: It's human nature to be curious, to learn new things, to want to find out more. Discovery is an innate human need, and with the rise of the Web, the urge to learn more has increased by leaps and bounds. According to David Hornik, investor at August Capital, "The massive scale of the Web not only creates huge challenges for search, it also cripples discovery. Gone are the good old days in which fortuity would lead to the unearthing of interesting new websites." Indeed, we live in the age of "infovores" and there is definitely a need for a service that provides serendipity.

StumbleUpon started in 2002 with the mission to bring discovery back to the Web, anticipating today's online personalization trend. Now, StumbleUpon is the largest personalized content discovery engine on the Web, delivering more than 1 billion personalized recommendations per month. Many sites attempt to deliver personalized recommendations for one vertical, such as movies, music or books. But StumbleUpon uniquely tackles the greater challenge of delivering personalized recommendations for all kinds of media in both broad topics (e.g., photography, news) and focused niche topics (e.g., Evolutionary Algorithms, Game of Thrones). StumbleUpon's engineers conduct complex data elicitation, develop dynamic and temporal user models, ensure novelty, balance the exploration and exploitation of user preferences, and much more. Also, analyzing different kinds of media presents our team with highly unstructured data, so separating the signal from the noise is a constant focus.

Providing serendipitous discovery that can inform, entertain and enlighten our users is of utmost importance to StumbleUpon. This talk will focus on how StumbleUpon uses machine learning techniques such as collaborative filtering, active learning, decision trees, Bayesian models and more to solve complex problems involving classification, user behavior analysis, modeling, anti-spam and recommendations. An average StumbleUpon user spends over 7 hours per month using the product, which amounts to hundreds of varied recommendations and ample feedback. The talk will also provide insights into some of StumbleUpon's rich data and how we can use scale to accomplish what would otherwise not be possible. We will look at innovative ways in which StumbleUpon determines the right metrics to evaluate recommender systems, a very complex problem. We will also discuss our research on StumbleUpon's mobile activity, which is growing 800% year over year and is the fastest-growing part of our business, and what makes mobile recommendations unique and important.

Bio: As Engineering Director at StumbleUpon, Sumanth Kolar leads the applied research team, overseeing recommendations, anti-spam, content analysis, user modeling, data sciences and infrastructure. Sumanth tackles interesting and challenging research problems as StumbleUpon delivers more than 1 billion personalized recommendations a month to its more than 25 million users. Prior to joining the company in 2009, Sumanth engineered bidding and computer vision systems at Yahoo! and Adobe Research. Sumanth holds a master's degree in computer science from the University of California, Santa Cruz.

Recommender Systems & The Social Web

by Anmol Bhasin (LinkedIn)

Abstract: The pervasiveness of social networks has magnified the utility of recommender systems, and all three classical dimensions (users, items, and modes of interaction such as clicks or purchases) have exploded in scale: more users, more heterogeneous items, and more diverse interactions.

In this talk we present the challenges and opportunities of applying machine learning, data mining, and statistical modeling techniques, from simple to sophisticated, to recommender problems in social networks. Using real-world example applications deployed on LinkedIn, we build from foundational literature on content-based recommendations, collaborative filtering, and behavioral targeting techniques to arrive at the formalism of Social Filtering. We then cover critical aspects of developing a web-scale social recommender system, including infrastructure, feature engineering and model fitting. We describe some of the most fascinating challenges faced when operating recommender systems in the real world, including scalability, offline vs. online tradeoffs, A/B testing, and multi-objective optimization. Finally, we conclude with some new and unique paradigms of virtual profiling, social referral and intent-interest modeling, in the context of the LinkedIn recommender system.

Bio: Anmol Bhasin is a Senior Engineering Manager at LinkedIn, where he leads a team working on recommender systems, computational advertising and personalization. His team's contributions include LinkedIn's various personalized recommendation products (e.g., "Jobs You Might Be Interested In"), social news ("LinkedIn Today"), and systems for ad targeting and click-through rate prediction. His team also built the content processing pipeline and online experimentation framework used for LinkedIn's suite of data products.

Prior to LinkedIn, Anmol worked at the business search engine Business.com, where he developed the crawler, indexing systems, and retrieval algorithms. Anmol has also authored mobile gaming applications, including the award-winning Tecmo Bowl. Anmol received a master's degree in Computer Science from the State University of New York at Buffalo, where he focused on text mining and applied machine learning for cross-document learning.

Towards Personality-Based Personalization

by Thore Graepel (Microsoft Research)

Abstract: With the ever-growing amount of data available about users through digital records of their online behaviour, it becomes possible to incorporate more and more specific knowledge about users into recommendation systems and personalized services. In this talk, I describe Matchbox, a Bayesian recommendation engine that combines collaborative and feature-based aspects, allowing the system to make use of user- and item-specific information that helps it generalize across users and items and mitigate the cold-start problem. I describe Matchbox in terms of a probabilistic graphical model and show how inference is performed using approximate message passing algorithms. I then report how a system based on Matchbox has been deployed to 40M users on Microsoft's Xbox Live Marketplace to deliver recommendations on movies and games.
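The way a model can combine collaborative and feature-based aspects can be illustrated with a bilinear scoring rule of the kind Matchbox is built on: user features and item features are each mapped linearly into a shared low-dimensional trait space, and the score is the inner product of the two trait vectors. The sketch below covers only that deterministic scoring step with fixed, hand-set embeddings; Matchbox itself maintains Gaussian beliefs over these quantities and learns them by approximate message passing, and it also includes bias terms omitted here. All function and feature names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def trait_vector(features: Dict[str, float], embeddings: Dict[str, Vector], k: int) -> Vector:
    """Map a sparse feature set to a k-dimensional trait vector as the
    value-weighted sum of per-feature embeddings (hypothetical helper)."""
    traits = [0.0] * k
    for name, value in features.items():
        emb = embeddings.get(name)
        if emb is None:
            continue  # a feature with no learned embedding contributes nothing
        for j in range(k):
            traits[j] += value * emb[j]
    return traits


def bilinear_score(user_features: Dict[str, float], item_features: Dict[str, float],
                   user_emb: Dict[str, Vector], item_emb: Dict[str, Vector], k: int) -> float:
    """Inner product of user traits and item traits."""
    u = trait_vector(user_features, user_emb, k)
    v = trait_vector(item_features, item_emb, k)
    return sum(a * b for a, b in zip(u, v))
```

If a user's only feature is a one-hot user ID, the score reduces to plain matrix factorization (the collaborative part). Adding metadata such as an age bracket or a genre lets a brand-new user or item, whose ID has no embedding yet, still receive a meaningful score from its shared features, which is how feature information mitigates cold-start.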

Looking forward, I describe research in which we build predictors of users' personality profiles from traces of their online behaviour, trained on personality data obtained through psychometric questionnaires. I conclude by discussing the potential of using psychometric predictions for recommendation and personalization, and by pointing out the implications for users' privacy. This is joint work with Michal Kosinski, David Stillwell, Pushmeet Kohli, Yoram Bachrach, Ralf Herbrich, David Stern, Nir Nice, and Ulrich Paquet.

Bio: Thore Graepel is a Principal Researcher at Microsoft Research Cambridge (MSRC), UK, and heads the Online Services and Advertising (OSA) group. Thore's research interests are in machine learning and probabilistic modelling applied to a wide range of tasks including web search, online advertising, recommender systems, games, and social networks. He has a strong academic track record with over 60 peer-reviewed publications in these areas. Thore was also involved in the development of several machine learning algorithms that are now used by millions of users in Microsoft's online services, including Bing's AdPredictor, the TrueSkill ranking and matchmaking algorithm in Microsoft's online gaming service Xbox Live, and the recommendation system for games and videos on the Xbox Live marketplace.

Thore studied physics in Hamburg, London, and Berlin and obtained his PhD in computer science from the Technical University of Berlin. He held post-doctoral positions at the ETH Zurich and Royal Holloway College, London, before joining Microsoft Research in 2003. He is a Senior Member of Wolfson College, Cambridge, and a Senior Member of IEEE. Thore is on the editorial board of the Journal of Machine Learning Research (JMLR) and Springer's Machine Learning Journal (MLJ), and he is a founding editor of Chapman & Hall's Machine Learning & Pattern Recognition book series. In his spare research time he thinks about the problem of creating an algorithm that can beat professional players at the ancient Chinese board game of Go.

I've got 10 million songs in my pocket. Now what?

by Paul Lamere (The Echo Nest)

Abstract: The proverbial 'celestial jukebox' has become a reality. With today's online music services, a music fan is never more than a few clicks away from being able to listen to nearly any song that has ever been recorded. Recommender systems can play a key role in this new music ecosystem, helping listeners explore, discover, organize and share music. However, in many ways music recommendation is very different from recommendation in other well-studied domains such as books and movies. In this talk we explore how recommender systems can be used in the music space, and the particular challenges that the music domain presents to the designers of recommender systems.

Bio: Paul Lamere is the Director of Developer Community at The Echo Nest, a research-focused music intelligence startup that provides music information services to developers and partners through a data mining and machine listening platform. Paul is especially interested in hybrid music recommenders and using visualizations to aid music discovery.

We are leaving the age of information and entering the age of recommendation.

Chris Anderson in The Long Tail