Session 12: Real-World Concerns

Date: Thursday 18:00 – 19:30 CET
Chair: Alan Said (University of Gothenburg)

  • Tops, Bottoms, and Shoes: Building Capsule Wardrobes via Cross-Attention Tensor Network
    by Huiyuan Chen (Visa Research, United States), Yusan Lin (Visa Research, United States), Fei Wang (Visa Research, United States), and Hao Yang (Visa Research, United States)

    Fashion is more than Paris runways. Fashion is about how people express their interests, identity, mood, and cultural influences. Given an inventory of candidate garments from different categories, which combinations would most improve a wearer's fashionability? This question presents an intriguing visual recommendation challenge: automatically creating capsule wardrobes. Capsule wardrobe generation is a complex combinatorial problem that requires understanding how multiple visual items interact. The generative process often requires fashion experts to manually tease out the combinations, making it hard to scale.
    We introduce TensorNet, an approach that captures the key ingredients of visual compatibility among tops, bottoms, and shoes. TensorNet aims to provide actionable advice for full-body clothing outfits that mix and match well. TensorNet consists of two core modules: a Cross-Attention Message Passing module and a Wide&Deep Tensor Interaction module. As such, TensorNet is able to characterize local region-based patterns as well as the global compatibility of the entire outfit. Our experimental results on real-world datasets indicate that the proposed method is capable of learning visual compatibility and outperforms all the baselines. TensorNet opens up opportunities for fashion designers to narrow down the search space for multi-garment combinations.

    Full text in ACM Digital Library
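    The cross-attention idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation; the function names, region-embedding shapes, and the final pooling step are all assumptions:

```python
import numpy as np

def cross_attention(q, k, v):
    # Scaled dot-product attention: regions of one garment attend to
    # regions of another, producing compatibility-aware features.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def outfit_score(top, bottom, shoe):
    # Each garment is a (regions x features) matrix of region embeddings.
    # Pool cross-attended features over every garment pair, then reduce
    # to one compatibility score (a placeholder for a learned scorer).
    feats = []
    for a, b in [(top, bottom), (bottom, shoe), (top, shoe)]:
        feats.append(cross_attention(a, b, b).mean(axis=0))
        feats.append(cross_attention(b, a, a).mean(axis=0))
    return float(np.concatenate(feats).mean())
```

    In the paper's design, the final reduction would be the learned Wide&Deep Tensor Interaction module rather than the simple mean used here.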

  • Semi-Supervised Visual Representation Learning for Fashion Compatibility
    by Ambareesh Revanur (Robotics Institute, Carnegie Mellon University, United States), Vijay Kumar (Walmart Global Tech, India), and Deepthi Sharma (Walmart, India)

    We consider the problem of complementary fashion prediction. Existing approaches focus on learning an embedding space where fashion items from different categories that are visually compatible are closer to each other. However, creating such labeled outfits is labor-intensive, and it is infeasible to generate all possible outfit combinations, especially for large fashion catalogs. In this work, we propose a semi-supervised learning approach in which we leverage a large unlabeled fashion corpus to create pseudo-positive and pseudo-negative outfits on the fly during training. For each labeled outfit in a training batch, we obtain a pseudo-outfit by matching each item in the labeled outfit with unlabeled items. Additionally, we introduce consistency regularization to ensure that representations of the original images and their transformations are consistent, implicitly incorporating colour and other important attributes through self-supervision. We conduct extensive experiments on Polyvore, Polyvore-D, and our newly created large-scale Fashion Outfits datasets, and show that our approach performs on par with fully supervised methods using only a fraction of the labeled examples.

    Full text in ACM Digital Library
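    The pseudo-outfit construction and consistency regularization described above can be sketched as follows. A minimal illustration, assuming cosine-similarity matching and a squared-error consistency term; the function names and matching criterion are assumptions, not taken from the paper:

```python
import numpy as np

def build_pseudo_outfit(labeled_outfit, unlabeled_pool):
    # Replace each item embedding in a labeled outfit with its most
    # similar unlabeled item (cosine similarity), forming a
    # pseudo-positive outfit on the fly.
    pool = unlabeled_pool / np.linalg.norm(unlabeled_pool, axis=1, keepdims=True)
    pseudo = []
    for item in labeled_outfit:
        q = item / np.linalg.norm(item)
        pseudo.append(unlabeled_pool[int(np.argmax(pool @ q))])
    return np.stack(pseudo)

def consistency_loss(emb_original, emb_transformed):
    # Penalize disagreement between embeddings of an image and its
    # augmented transformation (consistency regularization).
    return float(np.mean((emb_original - emb_transformed) ** 2))
```

    In practice the embeddings would come from the trained visual encoder, and pseudo-negatives would be drawn analogously from low-similarity pool items.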

  • Large-Scale Modeling of Mobile User Click Behaviors Using Deep Learning
    by Xin Zhou (Google Research, United States) and Yang Li (Google Research, United States)

    Modeling tap or click sequences of users on a mobile device can improve our understanding of interaction behavior and offers opportunities for UI optimization by recommending the next element the user might want to click. We analyzed a large-scale dataset of over 20 million clicks from more than 4,000 mobile users who opted in. We then designed a deep learning model that predicts the next element the user will click given the user's click history, the structural information of the UI screen, and the current context, such as the time of day. We thoroughly investigated the deep model by comparing it with a set of baseline methods on the dataset. The experiments show that our model achieves 48% and 71% accuracy (top-1 and top-3) for predicting next clicks on a held-out dataset of test users, outperforming all the baseline methods by a large margin. We discuss a few scenarios for integrating the model into mobile interaction and how users can potentially benefit from it.

    Full text in ACM Digital Library
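    The top-1/top-3 accuracy metric reported in the abstract can be computed as sketched below. The function name and input layout are illustrative assumptions, not the authors' code:

```python
import numpy as np

def topk_accuracy(scores, targets, k):
    # scores:  (n_examples, n_ui_elements) predicted click scores
    # targets: (n_examples,) index of the element actually clicked next
    # Returns the fraction of examples whose true next click is among
    # the k highest-scoring candidate elements.
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = (topk == targets[:, None]).any(axis=1)
    return float(hits.mean())
```

    For example, if the model ranks the true element first for one test click and outside the top 1 but inside the top 3 for another, top-1 accuracy is 0.5 and top-3 accuracy is 1.0.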
