Session 15: Off-policy Learning
Date: Thursday, October 17, 12:30 PM – 1:15 PM (GMT+2)
Room: Petruzzelli Theater
Session Chair: Alan Said
- RES 🕓15 Multi-Objective Recommendation via Multivariate Policy Learning
by Olivier Jeunen (ShareChat), Jatin Mandav (ShareChat), Ivan Potapov (ShareChat), Nakul Agarwal (ShareChat), Sourabh Vaid (ShareChat), Wenzhe Shi (ShareChat) and Aleksei Ustimenko (ShareChat)
- RES 🕓15 Optimal Baseline Corrections for Off-Policy Contextual Bandits
by Shashank Gupta (University of Amsterdam), Olivier Jeunen (ShareChat), Harrie Oosterhuis (Radboud University) and Maarten de Rijke (University of Amsterdam)
- RES 🕓15 Effective Off-Policy Evaluation and Learning in Contextual Combinatorial Bandits
by Tatsuhiro Shimizu (Independent Researcher), Koichi Tanaka (Keio University), Ren Kishimoto (Tokyo Institute of Technology), Haruka Kiyohara (Cornell University), Masahiro Nomura (CyberAgent, Inc.) and Yuta Saito (Cornell University)