RecSys 2020 - Session P9: Real-World Applications III

Paper Session P9: Real-World Applications III

Session A: 00:00 – 01:30, chaired by Yongfeng Zhang and Joe Konstan (attend in Whova)
Session B: 11:00 – 12:30, chaired by Dietmar Jannach and Pablo Castells (attend in Whova)

[LP] Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication
by Xu He (Nanyang Technological University), Bo An (Nanyang Technological University), Yanghua Li (Alibaba Group), Haikai Chen (Alibaba Group), Rundong Wang (Nanyang Technological University), Xinrun Wang (Nanyang Technological University), Runsheng Yu (Nanyang Technological University), Xin Li (Alibaba Group), Zhirong Wang (Alibaba Group)

With the rise of online e-commerce platforms, more and more customers prefer to shop online. To sell more products, online platforms introduce various modules that recommend items with different properties, such as huge discounts. A web page often consists of several independent modules. The ranking policies of these modules are decided by different teams and optimized individually without cooperation, which can result in competition between modules. Thus, the global policy of the whole page could be sub-optimal. In this paper, we propose a novel multi-agent cooperative reinforcement learning approach under the restriction that different modules cannot communicate. Our contributions are three-fold. First, inspired by a solution concept in game theory named correlated equilibrium, we design a signal network that promotes cooperation among all modules by generating signals (vectors) for different modules. Second, we propose an entropy-regularized version of the signal network to coordinate the agents’ exploration of the optimal global policy. Third, experiments based on real-world e-commerce data demonstrate that our algorithm outperforms the baselines.

Full text in ACM Digital Library
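The abstract above borrows the idea of a correlated equilibrium. In that setting, a mediator draws a joint recommendation and privately tells each agent only its own part, which lets agents coordinate without talking to each other. The sketch below illustrates that solution concept only; it is not the authors' signal network. It checks the correlated-equilibrium incentive constraints for a two-agent game. The action labels ("C" = diversify, "D" = push the same hot deal) and the payoff numbers are hypothetical toy values.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

# payoffs[(a1, a2)] = (utility of agent 1, utility of agent 2)
Payoffs = Dict[Tuple[str, str], Tuple[float, float]]
# dist[(a1, a2)] = probability that the mediator recommends this joint action
Distribution = Dict[Tuple[str, str], float]


def is_correlated_equilibrium(payoffs: Payoffs, dist: Distribution,
                              actions: Sequence[str], tol: float = 1e-12) -> bool:
    """Check the correlated-equilibrium incentive constraints for a 2-agent game.

    For each agent i, each recommended action a and each deviation d, the
    expected gain from obeying the recommendation (summed over the other
    agent's action, weighted by the joint probability) must be non-negative.
    """
    for i in (0, 1):
        for a, d in product(actions, repeat=2):
            if a == d:
                continue
            gain = 0.0
            for other in actions:
                joint = (a, other) if i == 0 else (other, a)
                deviated = (d, other) if i == 0 else (other, d)
                p = dist.get(joint, 0.0)
                gain += p * (payoffs[joint][i] - payoffs[deviated][i])
            if gain < -tol:
                return False
    return True


# Hypothetical toy game between two page modules (a "chicken"-style game).
toy = {("C", "C"): (6, 6), ("C", "D"): (2, 7), ("D", "C"): (7, 2), ("D", "D"): (0, 0)}
# Signal distribution: never tell both modules to push the same deal.
signal = {("C", "C"): 1 / 3, ("C", "D"): 1 / 3, ("D", "C"): 1 / 3}
print(is_correlated_equilibrium(toy, signal, ["C", "D"]))  # True
```

Under this signal, neither module gains by ignoring its private recommendation. By contrast, always recommending ("C", "C") fails the check, because a module told "C" would rather deviate to "D".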

[LP] Contextual User Browsing Bandits for Large-Scale Online Mobile Recommendation
by Xu He (Nanyang Technological University), Bo An (Nanyang Technological University), Yanghua Li (Alibaba Group), Haikai Chen (Alibaba Group), Qingyu Guo (Nanyang Technological University), Xin Li (Alibaba Group), Zhirong Wang (Alibaba Group)

Online recommendation services recommend multiple commodities to users. Nowadays, a considerable proportion of users visit e-commerce platforms on mobile devices. Due to the limited screen size of mobile devices, the positions of items have a significant influence on clicks: 1) The same commodity receives more clicks at higher positions. 2) The ‘pseudo-exposure’ issue: only a few recommended items are shown at first glance, and users need to slide the screen to browse the others. Therefore, some lower-ranked recommended items are not viewed by users, and it is not appropriate to treat such items as negative samples. While many works model online recommendation as a contextual bandit problem, they rarely take the influence of positions into account, so their estimates of the reward function may be biased. In this paper, we aim to address these two issues to improve the performance of online mobile recommendation. Our contributions are four-fold. First, since we are concerned with the reward of a set of recommended items, we model online recommendation as a contextual combinatorial bandit problem and define the reward of a recommended set. Second, we propose a novel contextual combinatorial bandit method called UBM-LinUCB that addresses the two position-related issues by adopting the User Browsing Model (UBM), a click model for web search. Third, we provide a formal regret analysis and prove that our algorithm achieves sublinear regret independent of the number of items. Finally, we evaluate our algorithm on two real-world datasets using a novel unbiased estimator. We also conducted an online experiment on Taobao, one of the most popular e-commerce platforms in the world. Results on two CTR metrics show that our algorithm outperforms other contextual bandit algorithms.

Full text in ACM Digital Library
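UBM-LinUCB makes the click model position-aware. The following is a much simplified, hypothetical sketch of that idea, not the authors' algorithm. It assumes fixed, known, non-increasing examination weights w[k] per position, so that E[click at position k] = w[k] * theta · x. This drops UBM's dependence on the position of the previous click. The sketch learns theta by ridge regression on position-scaled features, keeps the inverse design matrix current with the Sherman-Morrison identity, and ranks items by an upper confidence bound. The class name and parameters are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


class PositionWeightedLinUCB:
    """Simplified position-aware LinUCB (hypothetical; not the paper's UBM-LinUCB).

    Assumes the click probability of an item shown at position k is
    w[k] * theta . x, with fixed, known examination weights w (non-increasing).
    """

    def __init__(self, dim: int, position_weights: Sequence[float],
                 alpha: float = 1.0, reg: float = 1.0) -> None:
        self.dim = dim
        self.w = list(position_weights)
        self.alpha = alpha
        # Inverse of the ridge design matrix A = reg * I, updated incrementally.
        self.A_inv = [[1.0 / reg if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _mat_vec(self, x: Sequence[float]) -> Vector:
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.A_inv]

    def theta(self) -> Vector:
        # Ridge estimate: theta = A^{-1} b.
        return self._mat_vec(self.b)

    def ucb(self, x: Sequence[float]) -> float:
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        v = self._mat_vec(x)
        width = math.sqrt(max(sum(xi * vi for xi, vi in zip(x, v)), 0.0))
        return mean + self.alpha * width

    def select(self, items: Sequence[Sequence[float]]) -> List[int]:
        # With non-increasing weights, placing the highest UCB score at the top
        # maximizes sum_k w[k] * (score of the item at position k).
        order = sorted(range(len(items)), key=lambda i: self.ucb(items[i]), reverse=True)
        return order[:len(self.w)]

    def update(self, items: Sequence[Sequence[float]], ranking: Sequence[int],
               clicks: Sequence[int]) -> None:
        for pos, (idx, clicked) in enumerate(zip(ranking, clicks)):
            # Effective feature: item features scaled by the position's weight.
            u = [self.w[pos] * xi for xi in items[idx]]
            v = self._mat_vec(u)
            denom = 1.0 + sum(ui * vi for ui, vi in zip(u, v))
            # Sherman-Morrison: (A + u u^T)^{-1} = A^{-1} - v v^T / (1 + u^T v), v = A^{-1} u
            for i in range(self.dim):
                for j in range(self.dim):
                    self.A_inv[i][j] -= v[i] * v[j] / denom
            for i in range(self.dim):
                self.b[i] += float(clicked) * u[i]


bandit = PositionWeightedLinUCB(dim=2, position_weights=[1.0, 0.5])
items = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for _ in range(20):
    bandit.update(items, ranking=[0, 1], clicks=[1, 0])
print(bandit.select(items))  # [0, 2]
```

Scaling features by the position weight loosely mirrors the pseudo-exposure point. A non-click at a low-weight position barely moves the estimate, so a lower-ranked item is not treated as a full negative sample.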

[LP] Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation
by Mawulolo K. Ameko (University of Virginia), Miranda L. Beltzer (University of Virginia), Lihua Cai (University of Virginia), Mehdi Boukhechba (University of Virginia), Bethany A. Teachman (University of Virginia), Laura E. Barnes (University of Virginia)

Delivering treatment recommendations via pervasive electronic devices such as mobile phones has the potential to be a viable and scalable treatment medium for long-term health behavior management. But active experimentation with treatment options can be time-consuming, expensive and, in some cases, unethical. There is growing interest in methodological approaches that allow an experimenter to learn and evaluate the usefulness of a new treatment strategy before deployment. We present the first development of a treatment recommender system for emotion regulation, using real-world historical mobile digital data from n = 114 high socially anxious participants to test the usefulness of new emotion regulation strategies. We explore a number of offline contextual bandit estimators for learning and propose a general framework for learning algorithms. Our experiments show that the proposed doubly robust offline learning algorithms performed significantly better than baseline approaches, suggesting that this type of recommender algorithm could improve emotion regulation. Given that emotion regulation is impaired across many mental illnesses and that such a recommender algorithm could easily be scaled up, this approach holds potential to increase access to treatment for many people. We also share insights that allow us to apply contextual bandit models to this complex real-world data, including which contextual features appear most important for predicting emotion regulation strategy effectiveness.

Full text in ACM Digital Library
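The abstract reports that doubly robust offline learning outperformed the baselines. For reference, the sketch below is the textbook doubly robust estimator for the value of a deterministic target policy from logged bandit data. It is not necessarily the exact variant the authors use. The toy log, the integer context and strategy ids, and both policy and reward-model functions are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# One logged decision: (context id, strategy taken, observed reward, logging propensity).
LoggedEvent = Tuple[int, int, float, float]


def doubly_robust_value(logs: Sequence[LoggedEvent],
                        target_policy: Callable[[int], int],
                        reward_model: Callable[[int, int], float]) -> float:
    """Doubly robust off-policy estimate of a deterministic policy's value.

    Direct-method term: the reward model's prediction for the target action.
    Correction term: the importance-weighted residual, added only when the
    logged action matches the target action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        target_action = target_policy(context)
        estimate = reward_model(context, target_action)
        if action == target_action:
            estimate += (reward - reward_model(context, action)) / propensity
        total += estimate
    return total / len(logs)


def match_context(context: int) -> int:
    # Hypothetical target policy: choose the strategy whose id equals the context id.
    return context


def constant_model(context: int, action: int) -> float:
    # Deliberately crude reward model; the correction term compensates for it.
    return 0.5


# Hypothetical toy log: two contexts, two strategies, uniform logging (p = 0.5).
logs = [(0, 0, 1.0, 0.5), (0, 1, 0.0, 0.5), (1, 1, 1.0, 0.5), (1, 0, 0.0, 0.5)]
print(doubly_robust_value(logs, match_context, constant_model))  # 1.0
```

The crude model alone would estimate 0.5, while the importance-weighted correction recovers the target policy's true value of 1.0 in this toy log. That robustness to a misspecified reward model is the usual motivation for the doubly robust form.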

[LP] Exploring Clustering of Bandits for Online Recommendation System
by Liu Yang (Hong Kong University of Science and Technology), Bo Liu (AI Group, WeBank), Leyu Lin (WeiXin Group, Tencent), Feng Xia (WeiXin Group, Tencent), Kai Chen (Hong Kong University of Science and Technology), Qiang Yang (AI Group, WeBank)

A cluster-of-bandit policy applies contextual bandits in a collaborative-filtering manner to support personalized services in online recommender systems (RecSys). When observations are scarce, such a policy can perform better because knowledge is shared across users. Its goal is to maximize cumulative feedback from users, e.g., clicks. However, two kinds of uncertainty stand in the way of this goal. First, cluster-of-bandit algorithms make recommendations based on uncertain estimates of user interests. Second, they transfer knowledge across user clusters that are themselves uncertain and noisy. Existing algorithms consider only the first, leaving the second unaddressed. To address both challenges together, in this paper we propose the ClexB policy for online RecSys. On the one hand, ClexB estimates user clusters more accurately and with less uncertainty via explorable-clustering. On the other hand, ClexB exploits and explores user interests by sharing information within and among user clusters. In summary, ClexB explores knowledge transfer and thereby improves inferences about user interests. We also provide a regret analysis and extensive experiments on both synthetic and real-world datasets, which further support the superiority of ClexB.

Full text in ACM Digital Library
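The second uncertainty the abstract describes, noisy user clusters, is easy to see in a classic clustering-of-bandits recipe. The sketch below follows the edge-deletion idea of the earlier CLUB algorithm, not ClexB. Two users stay connected while the distance between their estimated preference vectors is within the sum of their confidence widths, and clusters are the connected components. As a result, users with few observations get lumped together. The width formula and the `gamma` scale are assumptions made for illustration. The clusters are also recomputed from a snapshot rather than by CLUB's incremental edge deletion.

```python
import math
from typing import Dict, List, Sequence


def confidence_width(n_obs: int) -> float:
    # Assumed width that shrinks roughly like sqrt(log(n) / n) with more observations.
    return math.sqrt((1.0 + math.log(1.0 + n_obs)) / (1.0 + n_obs))


def cluster_users(estimates: Sequence[Sequence[float]], counts: Sequence[int],
                  gamma: float = 1.0) -> List[List[int]]:
    """Group users whose estimated preference vectors are statistically indistinguishable."""
    parent = list(range(len(estimates)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(estimates)):
        for j in range(i + 1, len(estimates)):
            gap = math.dist(estimates[i], estimates[j])
            if gap <= gamma * (confidence_width(counts[i]) + confidence_width(counts[j])):
                parent[find(i)] = find(j)  # edge kept: merge the two components

    clusters: Dict[int, List[int]] = {}
    for i in range(len(estimates)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


estimates = [[1.0, 0.0], [1.05, 0.0], [0.0, 1.0]]
print(cluster_users(estimates, counts=[1000, 1000, 1000]))  # [[0, 1], [2]]
print(cluster_users(estimates, counts=[0, 0, 0]))           # [[0, 1, 2]]
```

The same estimates yield different clusters depending only on how much data each user has. This is the cluster-level uncertainty that ClexB is designed to explore rather than ignore.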

[IN] Building a Reciprocal Recommendation System at Scale From Scratch: Learnings from One of Japan’s Prominent Dating Applications
by R. Ramanathan (SBX Technologies Corporation), Nicolas K Shinada (SBX Technologies Corporation), Sucheendra K. Palaniappan (SBX Technologies Corporation)

Online dating platforms have changed the paradigm of how people seek potential relationships. In this context, reciprocal recommendation systems consider the mutual ‘match’ potential between users, i.e., users who are likely to interact and potentially ‘like’ each other. We present our experiences of how we devised algorithms to overcome data-specific nuances, and built and deployed the system from scratch in a relatively short time span for one of the prominent dating applications in Japan.

Full text in ACM Digital Library
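Reciprocal recommendation ranks candidates by how likely the interest is to be mutual. The abstract does not specify the system's scoring function. A common choice in the reciprocal-recommendation literature, assumed here purely for illustration, is the harmonic mean of the two one-directional preference estimates. This score stays low unless both directions are high. The `preference` lookup, the helper names, and the user names below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# preference[(a, b)]: estimated probability that user a would 'like' user b.
Preference = Dict[Tuple[str, str], float]


def reciprocal_score(p_ab: float, p_ba: float) -> float:
    """Harmonic mean of both directions; zero if either side shows no interest."""
    if p_ab <= 0.0 or p_ba <= 0.0:
        return 0.0
    return 2.0 * p_ab * p_ba / (p_ab + p_ba)


def rank_matches(user: str, candidates: Sequence[str], preference: Preference) -> List[str]:
    scored = [
        (reciprocal_score(preference.get((user, c), 0.0), preference.get((c, user), 0.0)), c)
        for c in candidates
    ]
    # Highest mutual score first; ties broken by candidate id for determinism.
    return [c for _, c in sorted(scored, key=lambda t: (-t[0], t[1]))]


preference = {
    ("alice", "bob"): 0.9, ("bob", "alice"): 0.1,      # one-sided interest
    ("alice", "carol"): 0.5, ("carol", "alice"): 0.5,  # moderate, mutual interest
}
print(rank_matches("alice", ["bob", "carol"], preference))  # ['carol', 'bob']
```

A one-directional ranker would put bob first for alice. The reciprocal score prefers carol because the interest is mutual, which is the property that distinguishes reciprocal systems from ordinary recommenders.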

RecSys 2020 (Online)