Accepted Contributions

List of all papers accepted for RecSys 2014 (in alphabetical order)

  • A Framework for Matrix Factorization based on General Distributions

    by Josef Bauer and Alexandros Nanopoulos

    In this paper we extend the current state-of-the-art matrix factorization method for recommendations to general probability distributions. As shown in previous work, the standard method, called “Probabilistic Matrix Factorization”, is based on a normal distribution assumption. While there exists work in which this method is extended to other distributions, these extensions are restrictive, and we experimentally show on a real data set that it is worthwhile to consider more general distributions which have not been used in the literature. Our contribution lies in providing a flexible and easy-to-use framework for matrix factorization with almost no limitation on the form of the distribution used. Our approach is based on maximum likelihood estimation, and a key ingredient of our proposed method is automatic differentiation. This allows the corresponding optimization algorithm to be derived automatically, without the need to derive it manually for each distributional assumption, while remaining computationally efficient. Thus, with our method it is very easy to use a wide range of even complicated distributions for any data set.
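    A minimal sketch of the idea described above, assuming a plain SGD loop over observed ratings and a pluggable per-rating log-likelihood; a central finite difference stands in for the automatic differentiation used in the paper, and all names, shapes and the Gaussian example are illustrative assumptions rather than the authors' implementation.

      import numpy as np

      def fit_mf(ratings, n_users, n_items, log_lik, k=8, lr=0.01, epochs=20, eps=1e-5):
          """SGD on the negative log-likelihood of a pluggable rating distribution."""
          rng = np.random.default_rng(0)
          U = 0.1 * rng.standard_normal((n_users, k))
          V = 0.1 * rng.standard_normal((n_items, k))
          for _ in range(epochs):
              for u, i, r in ratings:
                  pred = U[u] @ V[i]
                  # Central finite difference w.r.t. the prediction, standing in
                  # for automatic differentiation of the chosen log-likelihood.
                  g = -(log_lik(r, pred + eps) - log_lik(r, pred - eps)) / (2 * eps)
                  U[u], V[i] = U[u] - lr * g * V[i], V[i] - lr * g * U[u]
          return U, V

      # Example distributional assumption: Gaussian noise with unit variance.
      gaussian_ll = lambda r, pred: -0.5 * (r - pred) ** 2
      ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 5.0)]
      U, V = fit_mf(ratings, n_users=2, n_items=2, log_lik=gaussian_ll)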

  • A Parameter-free Algorithm for an Optimized Tag Recommendation List Size

    by Modou Gueye, Talel Abdessalem and Hubert Naacke

    Tag recommendation is a major aspect of collaborative tagging systems. It aims to recommend suitable tags to a user for tagging an item. One of its main challenges is the effectiveness of its recommendations. Existing works focus on techniques for retrieving the most relevant tags to suggest, with a fixed number of tags set beforehand for each recommended list. In this paper, we follow another direction in order to improve the efficiency of the recommendations. We propose a parameter-free algorithm for determining the optimal size of the recommended list. To this end, we introduce relevance measures to find the most relevant sublist of a given list of recommended tags. More precisely, we improve the quality of our recommendations by discarding unsuitable tags and thus adjusting the list size. Our approach appears to be new, since we are not aware of any other work addressing this problem. Our solution is an add-on that can be implemented on top of many kinds of tag recommenders. Experiments on five datasets, using four categories of tag recommenders, demonstrate the efficiency of our technique. For instance, the algorithm we propose outperforms the results of task 2 of the ECML PKDD Discovery Challenge 2009: using the same tag recommender as the contest winners, we reach an F1 measure of 0.366 while they obtained 0.356. Thus, our solution yields significant improvements on the lists obtained from the tag recommenders.
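    As a rough, hypothetical illustration of trimming a recommended tag list (not the relevance measures defined in the paper), the sketch below cuts a descending-score list at its largest relative score drop, so the cut-off needs no tuned parameter; names and data are made up.

      def trim_tag_list(scored_tags):
          """Cut a tag list, sorted by descending score, at its largest relative score drop."""
          if len(scored_tags) < 2:
              return [t for t, _ in scored_tags]
          scores = [s for _, s in scored_tags]
          drops = [(scores[i] - scores[i + 1]) / max(scores[i], 1e-12)
                   for i in range(len(scores) - 1)]
          cut = drops.index(max(drops)) + 1   # keep everything before the biggest drop
          return [t for t, _ in scored_tags[:cut]]

      print(trim_tag_list([("python", 0.9), ("code", 0.7), ("misc", 0.2)]))   # ['python', 'code']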

  • A Robust Model for Paper-Reviewer Assignment

    by Xiang Liu, Torsten Suel and Nasir Memon

    Automatic expert assignment is a common problem encountered in both industry and academia. For example, for conference program chairs and journal editors, in order to collect “good” judgments for a paper, it is necessary for them to assign the paper to the most appropriate reviewers. Choosing appropriate reviewers of course includes a number of considerations such as expertise and authority, but also diversity and avoiding conflicts. In this paper, we explore the expert retrieval problem and implement an automatic paper-reviewer recommendation system that considers aspects of expertise, authority, and diversity. In particular, a graph is first constructed on the possible reviewers and the query paper, incorporating expertise and authority information. Then a Random Walk with Restart (RWR) model is employed on the graph with a sparsity constraint, incorporating diversity information. Extensive experiments on two reviewer recommendation benchmark datasets show that the proposed method obtains performance gains over the state-of-the-art reviewer recommendation systems in terms of expertise, authority, diversity, and, most importantly, relevance as judged by human experts.
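    The core of the graph step described above is a standard Random Walk with Restart; a minimal power-iteration sketch is given below, leaving out the sparsity constraint and the expertise/authority edge weighting that the paper adds (the toy graph and parameter values are assumptions).

      import numpy as np

      def rwr(adj, restart, alpha=0.15, iters=100):
          """Random Walk with Restart scores computed by power iteration."""
          col_sums = adj.sum(axis=0, keepdims=True)
          P = adj / np.where(col_sums == 0, 1, col_sums)   # column-stochastic transition matrix
          p = restart.copy()
          for _ in range(iters):
              p = (1 - alpha) * P @ p + alpha * restart    # walk with probability 1 - alpha, restart otherwise
          return p

      # Node 0 is the query paper; nodes 1-3 are candidate reviewers linked by expertise edges.
      adj = np.array([[0, 1, 1, 0],
                      [1, 0, 1, 0],
                      [1, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
      print(rwr(adj, restart=np.array([1.0, 0, 0, 0])))    # relevance of each node to the query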

  • Attacking Item-Based Recommender Systems with Power Items

    by Carlos Seminario and David Wilson

    Recommender Systems (RS) are vulnerable to attack by malicious users who intend to bias the recommendations for their own benefit. Research in this area has developed attack models, detection methods, and mitigation schemes to understand and protect against such attacks. For Collaborative Filtering RSs, model-based approaches such as item-based and matrix-factorization were found to be more robust to many types of attack. Advice in designing for system robustness has thus been to employ model-based approaches. Our recent work with the Power User Attack (PUA), however, determined that attackers disguised as influential users can successfully attack (from the attacker’s viewpoint) SVD-based as well as user-based recommenders, although item-based systems remained robust to the PUA. In this paper we investigate a new, complementary attack model, the Power Item Attack (PIA), that uses influential items to successfully attack RSs. We show that the PIA is able to impact not only user-based and SVD-based recommenders but also the heretofore highly robust item-based approach, using a novel multi-target attack vector.

  • Automating Readers’ Advisory to Make Book Recommendations for K-12 Readers

    by Maria Pera and Yiu-Kai Ng

    The academic performance of students is affected by their reading ability, which explains why reading is one of the most important aspects of school curriculums. Promoting good reading habits among K-12 students is essential, given the enormous influence of reading on students’ development as learners and members of society. In doing so, it is indispensable to provide readers with engaging and motivating reading selections. Unfortunately, existing book recommenders have failed to offer adequate choices for K-12 readers, since they either ignore the reading abilities of their users or cannot acquire the much-needed information to make recommendations due to privacy issues. To address these problems, we have developed Rabbit, a book recommender that emulates the readers’ advisory service offered at school/public libraries. Rabbit considers the readability levels of its readers and determines the facets, i.e., appeal factors, of books that evoke subconscious, emotional reactions on a reader. The design of Rabbit is unique, since it adopts a multi-dimensional approach to capture the reading abilities, preferences, and interests of its readers, which goes beyond the traditional book content/topical analysis. Conducted empirical studies have shown that Rabbit outperforms a number of (readability-based) book recommenders.

  • Bayesian Binomial Mixture Model for Collaborative Prediction with Non-Random Missing Data

    by Yong-Deok Kim and Seungjin Choi

    In real-world datasets, the presence of a certain amount of missing data is inevitable. If the data is not missing at random (MAR), the missing-data mechanism cannot be ignored and has to be modelled precisely in order to obtain correct results. However, although the MAR assumption is likely to be violated in rating datasets collected from recommendation systems, most prior research on collaborative prediction ignores the missing-data mechanism. Exceptions are recent works on multinomial mixture models with CPT-v and Logit-vd, which employ conditional Bernoulli selection models for the response variables. In this paper, we present a Bayesian binomial mixture model for collaborative prediction with non-random missing data. We consider three factors behind the observation of a rating: the user, the item, and the rating value. Each factor is modelled by a Bernoulli random variable, and the observation of a rating is determined by the OR operation over the three binary variables. Because one of the three factors can override the others under the OR operation, the model naturally captures the 80-20 rule that commonly arises in recommendation systems. We develop efficient variational inference algorithms with closed-form update rules for all variational parameters, where the computational complexity depends on the number of observations, not on the size of the rating data matrix. Finally, we present experimental results showing that 1) the binomial mixture model is more suitable than the multinomial mixture model for modelling discrete, finite, and ordered rating values; 2) our model finds meaningful solutions instead of boundary solutions when hyper-parameters are estimated by empirical Bayes; and 3) our model can capture different rating trends across domains (e.g., songs and movies).
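    A tiny generative sketch of the selection model described above, in which a rating is observed if a user factor, an item factor, or a rating-value factor fires; the probabilities are arbitrary illustrations, not estimates from the paper.

      import numpy as np

      rng = np.random.default_rng(0)

      def observed(p_user, p_item, p_value):
          """A rating is observed if any of the three Bernoulli factors fires (OR)."""
          return (rng.random() < p_user) or (rng.random() < p_item) or (rng.random() < p_value)

      # A blockbuster item dominates the OR, mimicking the 80-20 rule mentioned above.
      rate = np.mean([observed(p_user=0.05, p_item=0.6, p_value=0.1) for _ in range(10_000)])
      print("observation rate:", round(rate, 2))   # close to 1 - 0.95 * 0.4 * 0.9 = 0.658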

  • Beyond Clicks: Dwell Time for Personalization

    by Xing Yi, Liangjie Hong, Erheng Zhong, Nathan Liu and Suju Rajan

    Many internet companies, such as Yahoo, Facebook, Google and Twitter, rely on content recommendation systems to deliver the most relevant content items to individual users through personalization. Delivering such personalized user experiences is believed to increase the long-term engagement of users. While there has been a lot of progress in designing effective personalized recommender systems, by exploiting user interests and historical interaction data through implicit (item click) or explicit (item rating) feedback, directly optimizing for users’ satisfaction with the system remains challenging. In this paper, we explore the idea of using item-level dwell time as a proxy to quantify how likely a content item is relevant to a particular user. We describe a novel method to compute accurate dwell time based on client-side and server-side logging and demonstrate how to normalize dwell time across different devices and contexts. In addition, we demonstrate how to incorporate dwell time into state-of-the-art learning-to-rank techniques and collaborative filtering models and obtain competitive performance in both offline and online settings. This paper is the first work to go beyond “clicks” in optimizing large-scale content recommendation, and outlines a number of interesting future research directions.
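    A minimal sketch of the normalization idea, assuming dwell times have already been reconstructed from the client/server logs; it z-scores each dwell time within its device context so values are comparable across devices (field names and data are assumptions, and the paper's normalization may differ).

      from collections import defaultdict
      import statistics

      # (device, dwell_seconds) pairs derived from joined client/server logs (illustrative data).
      events = [("desktop", 45), ("desktop", 80), ("mobile", 20), ("mobile", 35), ("mobile", 25)]

      by_device = defaultdict(list)
      for device, dwell in events:
          by_device[device].append(dwell)

      def normalized_dwell(device, dwell):
          """Z-score a dwell time within its device context."""
          xs = by_device[device]
          mu, sd = statistics.mean(xs), statistics.pstdev(xs) or 1.0
          return (dwell - mu) / sd

      print(normalized_dwell("mobile", 35))   # above-average engagement for a mobile session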

  • Cold-start News Recommendation with Domain-dependent Browse Graph

    by Michele Trevisiol, Luca Maria Aiello, Rossano Schifanella and Alejandro Jaimes

    Online social networks and mash-up services create opportunities to connect different web services that would otherwise be isolated. In the case of news specifically, users are heavily exposed to news articles while performing other activities, such as social networking or web searching. Browsing behaviour aimed at the consumption of news, especially in relation to visits coming from other domains, has been largely overlooked in previous work. To address this, we build a BrowseGraph out of the collective browsing traces extracted from a large viewlog of Yahoo News (0.5B entries) and define the ReferrerGraph as its subgraph induced by the sessions with the same referrer domain. The structural and temporal properties of the graph show that browsing behavior in news is highly dependent on the referrer URL of the session, in terms of the type of content consumed and the time of consumption. We build on this observation and propose a news recommender that addresses the cold-start problem: given a user landing on a page of the site for the first time, we aim to predict the page she will visit next. We compare 24 flavors of recommenders belonging to the families of content-based, popularity-based, and browsing-based models. We show that the browsing-based recommender that takes into account the referrer URL is the best performing, achieving a prediction accuracy of 61% in conditions of heavy data sparsity.
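    A toy version of the referrer-aware idea: next-page transition counts are kept separately per referrer domain, and a cold-start visitor is shown the pages most often visited next by earlier sessions with the same referrer (sessions, domains and page names below are made up).

      from collections import defaultdict, Counter

      # Browsing sessions as (referrer_domain, [page1, page2, ...]).
      sessions = [
          ("facebook.com", ["sports/a", "sports/b"]),
          ("facebook.com", ["sports/a", "sports/c"]),
          ("google.com",   ["politics/a", "politics/b"]),
      ]

      # ReferrerGraph-style transition counts, conditioned on the referrer domain.
      transitions = defaultdict(Counter)
      for referrer, pages in sessions:
          for src, dst in zip(pages, pages[1:]):
              transitions[(referrer, src)][dst] += 1

      def recommend_next(referrer, current_page, n=3):
          """Rank the pages most often visited next from this page by sessions with this referrer."""
          return [p for p, _ in transitions[(referrer, current_page)].most_common(n)]

      print(recommend_next("facebook.com", "sports/a"))   # ['sports/b', 'sports/c']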

  • Comparative Recommender System Evaluation: Benchmarking Recommendation Frameworks

    by Alan Said and Alejandro Bellogin

    Recommender systems research is often based on comparisons of predictive accuracy: the better the evaluation scores, the better the recommender. However, it is difficult to compare results from different recommender systems due to the many options in design and implementation of an evaluation strategy. Additionally, algorithm implementations can diverge from the standard formulation due to manual tuning and modifications that work better in some situations. In this work we compare common recommendation algorithms as implemented in three popular recommendation frameworks. To provide a fair comparison, we have complete control of the evaluation dimensions being benchmarked: dataset, data splitting, evaluation strategies, and metrics. We also include results using the internal evaluation mechanisms of these frameworks. Our analysis points to large differences in recommendation accuracy across frameworks and strategies, i.e. the same baselines may perform orders of magnitude better or worse across frameworks. Our results show the necessity of clear guidelines when reporting evaluation of recommender systems to ensure reproducibility and comparison of results.

  • Context Adaptation in Interactive Recommender Systems

    by Negar Hariri, Bamshad Mobasher and Robin Burke

    Contextual factors can greatly influence the utility of recommendations for users. In many recommendation and personalization applications, particularly in domains where user context changes dynamically, it is difficult to represent and model contextual factors directly, but it is often possible to observe their impact on user preferences during the course of users’ interactions with the system. In this paper, we introduce an interactive recommender system that can detect and adapt to changes in context based on the user’s ongoing behavior. The system then dynamically tailors its recommendations to match the user’s most recent preferences. We formulate this problem as a multi-armed bandit problem and use the Thompson sampling heuristic to learn a model for the user. Following the Thompson sampling approach, the user model is updated after each interaction as the system observes the corresponding rewards for the recommendations provided during that interaction. To generate contextual recommendations, the user’s preference model is monitored for changes at each step of interaction and updated incrementally. We introduce a mechanism for detecting significant changes in the user’s preferences and describe how it can be used to improve the performance of the recommender system.
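    For readers unfamiliar with the Thompson sampling heuristic mentioned above, here is its simplest Beta-Bernoulli form in a simulated recommendation loop; the paper's user model and change-detection mechanism are not sketched, and the click probabilities are invented for the simulation.

      import numpy as np

      rng = np.random.default_rng(0)
      n_items = 3
      alpha = np.ones(n_items)       # Beta posterior: 1 + observed successes per item
      beta = np.ones(n_items)        # Beta posterior: 1 + observed failures per item
      true_ctr = [0.1, 0.3, 0.6]     # hidden preferences, used only to simulate feedback

      for _ in range(2000):
          samples = rng.beta(alpha, beta)            # one draw per item from its posterior
          item = int(np.argmax(samples))             # recommend the item with the best draw
          reward = float(rng.random() < true_ctr[item])
          alpha[item] += reward                      # update the posterior with the observed reward
          beta[item] += 1.0 - reward

      print(np.argmax(alpha / (alpha + beta)))       # converges to the preferred item (index 2)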

  • Coverage, Redundancy and Size-Awareness in Genre Diversity for Recommender Systems

    by Saul Vargas, Linas Baltrunas, Alexandros Karatzoglou and Pablo Castells

    There is increasing awareness in the Recommender Systems field that diversity is a key property that enhances the usefulness of recommendations. Genre information can serve as a means to measure and enhance the diversity of recommendations and is readily available in domains such as movies, music or books. In this work we propose a new Binomial framework for defining genre diversity in recommender systems that takes into account three key properties: genre coverage, genre redundancy and recommendation list size-awareness. We show that methods previously proposed for measuring and enhancing recommendation diversity, including those adapted from search result diversification, fail to adequately address these three properties. We also propose an efficient greedy optimization technique to optimize Binomial diversity. Experiments with the Netflix dataset show the properties of our framework and a comparison with state-of-the-art methods.

  • Ensemble Contextual Bandits for Personalized Recommendation

    by Liang Tang, Yexi Jiang, Lei Li and Tao Li

    The cold-start problem has attracted extensive attention among various online services that provide personalized recommendation. Many online vendors employ contextual bandit strategies to tackle the so-called exploration/exploitation dilemma rooted in the cold-start problem. However, due to high-dimensional user/item features and the underlying characteristics of bandit policies, it is often difficult for service providers to obtain and deploy an appropriate algorithm that achieves acceptable and robust economic profit. In this paper, we explore ensemble strategies of multiple contextual bandit algorithms to obtain robust predictions of the click-through rate (CTR) of web objects. Specifically, the ensemble is acquired by aggregating different pulling policies of bandit algorithms, rather than forcing the agreement of prediction results or learning a unified predictive model. To this end, we employ a meta-bandit paradigm that places a hyper bandit over the base bandits, to explicitly explore/exploit the relative importance of the base bandits based on user feedback. Extensive empirical experiments on two real-world data sets (news recommendation and online advertising) demonstrate the effectiveness of our proposed approach in terms of CTR.
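    A toy sketch of the meta-bandit idea: an epsilon-greedy hyper level decides which base bandit acts at each step and credits it with the observed reward. The base policy below is a trivial placeholder, and the class and parameter names are assumptions, not the paper's algorithm.

      import random

      class RandomBase:
          """Placeholder base bandit so the sketch runs; real base policies would be LinUCB, Thompson sampling, etc."""
          def __init__(self, n_arms): self.n = n_arms
          def select(self, context): return random.randrange(self.n)
          def update(self, context, arm, reward): pass

      class HyperBandit:
          """Epsilon-greedy hyper bandit choosing among base bandit policies."""
          def __init__(self, base_bandits, epsilon=0.1):
              self.base, self.eps = base_bandits, epsilon
              self.reward_sum = [0.0] * len(base_bandits)
              self.pulls = [1e-9] * len(base_bandits)

          def select(self, context):
              if random.random() < self.eps:
                  b = random.randrange(len(self.base))          # explore a base policy
              else:
                  b = max(range(len(self.base)),
                          key=lambda i: self.reward_sum[i] / self.pulls[i])
              return b, self.base[b].select(context)            # delegate the actual recommendation

          def update(self, b, context, arm, reward):
              self.pulls[b] += 1
              self.reward_sum[b] += reward
              self.base[b].update(context, arm, reward)         # the chosen base bandit also learns

      hyper = HyperBandit([RandomBase(5), RandomBase(5)])
      b, arm = hyper.select(context=None)
      hyper.update(b, context=None, arm=arm, reward=1.0)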

  • Evaluating Recommender Behavior For New Users

    by Daniel Kluver and Joseph Konstan

    The new user experience is one of the important problems in recommender systems. Past work on recommending for new users has focused on the process of gathering information from the user. Our work focuses on how different algorithms behave for new users. We describe a methodology that we use to compare representatives of three common families of algorithms along eleven different metrics. We find that for the first few ratings a baseline algorithm performs better than three common collaborative filtering algorithms. Once we have a few ratings, we find that Funk’s SVD algorithm has the best overall performance. We also find that ItemItem, a very commonly deployed algorithm, performs very poorly for new users. Our results can inform the design of interfaces and algorithms for new users.

  • Exploiting Sentiment Homophily for Link Prediction

    by Guangchao Yuan, Pradeep Murukannaiah, Zhe Zhang and Munindar Singh

    Link prediction has been extensively studied and adopted in recommendation systems on social media. With the increasing popularity of sentiment analysis on social networks, understanding the relationship between users’ sentiments and link prediction is important. In this paper, we study how to exploit sentiment homophily for link prediction. We gathered a one-month political campaign dataset from Twitter. We define a set of sentiment-based features that quantify the likelihood of two users becoming friends based on their sentiments toward topics. Our evaluation in a supervised learning framework demonstrates the benefits of sentiment-based features in link prediction; among them, Adamic-Adar and Euclidean distance based measures are the best predictors. We further propose a factor graph model that incorporates the sentiment-based cognitive balance theory. Our evaluation shows how our model helps link prediction on different kinds of graphs, compared to traditional machine learning techniques. Our work offers new insights for real-world link recommendation systems.
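    The Adamic-Adar measure cited above is the classical topological score below (shared neighbours weighted by their inverse log degree); the paper's contribution is the sentiment-based features combined with it, which this sketch does not include. The toy graph is made up.

      import math

      # Toy friendship graph as adjacency sets.
      friends = {
          "a": {"b", "c", "d"},
          "b": {"a", "c"},
          "c": {"a", "b", "d"},
          "d": {"a", "c"},
      }

      def adamic_adar(u, v):
          """Adamic-Adar score for a candidate link (u, v)."""
          return sum(1.0 / math.log(len(friends[z]))
                     for z in friends[u] & friends[v]
                     if len(friends[z]) > 1)

      print(adamic_adar("b", "d"))   # higher scores suggest the link is more likely to form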

  • Exploiting Temporal Influence in Online Recommendation

    by Robert Palovics, Andras Benczur, Tamas Kiss, Levente Kocsis and Erzsebet Frigo

    In this paper we give methods for time-aware music recommendation in a social media service with the potential of exploiting immediate temporal influences between users. We consider events in which a user listens to an artist for the first time, shortly after a friend listened to the same artist. We train a blend of matrix factorization methods that model the relation between the influencer, the influenced user and the artist, with both the individual factor decompositions and their weights learned by variants of stochastic gradient descent (SGD). Special care is taken since events of influence form a subset of the positive implicit feedback data, and hence we have to cope with two different definitions of the positive and negative implicit training data. In addition, in the time-aware setting we have to use online learning and evaluation methods. While SGD can easily be trained online, evaluation by traditional measures is cumbersome since the top recommendations may differ at different times. Our experiments are carried out over the two-year scrobble history of 70,000 Last.fm users and show a 4% increase in recommendation quality by predicting temporal influences.

  • Explore-Exploit in Top-N Recommender Systems via Gaussian Processes

    by Hastagiri Prakash Vanchinathan, Isidor Nikolic, Fabio De Bona and Andreas Krause

    We address the challenge of ranking recommendation lists based on click feedback by efficiently encoding similarities among users and among items. The key challenges are threefold: (1) the combinatorial number of lists; (2) sparse feedback; and (3) context-dependent recommendations. We propose the CGPRANK algorithm, which exploits prior information specified in terms of a Gaussian process kernel function, allowing feedback to be shared in three ways: between positions in a list, between items, and between contexts. Under our model, we provide strong performance guarantees and empirically evaluate our algorithm on data from two large-scale recommendation tasks: Yahoo! news article recommendation, and Google books. In our experiments, our CGPRANK approach significantly outperforms state-of-the-art multi-armed bandit and learning-to-rank methods, with an 18% increase in clicks.

  • Factored MDPs for Detecting Topics of User Sessions

    by Maryam Tavakol and Ulf Brefeld

    Recommender systems aim to capture interests of users to provide tailored recommendations. User interests are however often unique and depend on many unobservable factors including a user’s mood and the local weather. We take a contextual session-based approach and propose a sequential framework using factored Markov decision processes (fMDPs) to detect the user’s goal (the topic) of a session. We show that an independence assumption on the attributes of items leads to a set of independent models that can be optimised efficiently. Our approach results in interpretable topics that can be effectively turned into recommendations. Empirical results on a real world click log from a large e-commerce company exhibit highly accurate topic prediction rates of about 90%. Translating our approach into a topic-driven recommender system outperforms collaborative filtering methods by one order of magnitude.

  • GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Completion via Graph Partitioning

    by Fabio Petroni and Leonardo Querzoni

    Matrix completion latent factor models are known to be an effective method for building recommender systems. Currently, stochastic gradient descent (SGD) is considered one of the best latent factor-based algorithms for matrix completion. In this paper we discuss GASGD, a distributed asynchronous variant of SGD for large-scale matrix completion that (i) leverages data partitioning schemes based on graph partitioning techniques, (ii) exploits specific characteristics of the input data and (iii) introduces an explicit parameter to tune the synchronization frequency among the computing nodes. We empirically show how, thanks to these features, GASGD achieves a fast convergence rate while incurring a smaller communication cost than current asynchronous distributed SGD implementations.

  • Gradient Boosting Factorization Machines

    by Chen Cheng, Fen Xia, Tong Zhang, Irwin King and Michael Lyu

    Recommendation techniques have been well developed in the past decades. Most of them build models based only on the user-item rating matrix. However, in the real world there is plenty of auxiliary information available to recommendation systems, which can be used as additional features to improve recommendation performance. We refer to recommendation with auxiliary information as context-aware recommendation. Factorization Machines (FM) are one of the most successful context-aware recommendation models. FM models pairwise interactions between all features, such that each feature’s latent vector is shared across all the factorized interaction parameters it is involved in. In practice, there are tens of context features and not all pairwise feature interactions are useful. Thus, one important challenge for context-aware recommendation is how to effectively select “good” interaction features. In this paper, we focus on this problem and propose a greedy interaction feature selection algorithm based on gradient boosting. We then propose a novel Gradient Boosting Factorization Machine (GBFM) model that incorporates the feature selection algorithm and Factorization Machines into a unified framework. Experimental results on both synthetic and real datasets demonstrate the efficiency and effectiveness of our algorithm compared to other state-of-the-art methods.
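    For context, the sketch below is the standard Factorization Machine scoring function that GBFM builds on (bias, linear terms, and factorized pairwise interactions computed with the usual O(kn) identity); it is not the GBFM model or its feature selection step, and the shapes and data are illustrative.

      import numpy as np

      def fm_predict(x, w0, w, V):
          """FM score: w0 + w.x + sum_{i<j} <v_i, v_j> x_i x_j.

          x: (n,) features; w0: bias; w: (n,) linear weights; V: (n, k) latent factors.
          The pairwise term uses 0.5 * sum_f [(sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2].
          """
          s = V.T @ x
          pairwise = 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))
          return w0 + w @ x + pairwise

      rng = np.random.default_rng(0)
      x = np.array([1.0, 0.0, 1.0, 1.0])     # e.g. user, item and context indicator features
      print(fm_predict(x, 0.1, rng.normal(size=4), rng.normal(size=(4, 3))))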

  • Improving Sales Diversity by Recommending Users to Items

    by Saul Vargas and Pablo Castells

    Sales diversity is considered a key feature of Recommender Systems from a business perspective. Sales diversity is also linked with the long-tail novelty of recommendations, a quality dimension from the user perspective. We explore the inversion of the recommendation task as a means to enhance sales diversity, and indirectly novelty, by selecting which users an item should be recommended to instead of the other way around. We address the inverted task with two approaches: a) inverting the rating matrix, and b) defining a probabilistic reformulation which isolates the popularity component of arbitrary recommendation algorithms. We find that the first approach gives rise to interesting reformulations of nearest-neighbor algorithms, which essentially introduce a new neighbor selection policy. Both approaches ultimately result in substantial sales diversity enhancements and improved trade-offs with recommendation precision and novelty. Two experiments on movie and music recommendation datasets show the effectiveness of the resulting approach, even when compared to approaches that directly optimize the target metrics, as proposed in prior work.

  • Improving The Discriminative Power Of Inferred Content Information Using Segmented Virtual Profile

    by Haishan Liu, Anuj Goyal, Trevor Walker and Anmol Bhasin

    We present a novel component of a hybrid recommender system at LinkedIn, in which item features are augmented with a virtual profile based on observed user-item interactions. A virtual profile is a representation of an item in the user feature space, built from the user features that are over-represented among users who interacted with the item. It is a way to think about Collaborative Filtering with content features. The core principle is that if a feature occurs with high probability among the users who interacted with an item (henceforth termed relevant users) versus those who did not (henceforth termed non-relevant users), then that feature is a good candidate for inclusion in the virtual profile of the item in question. However, this scheme suffers from a data imbalance problem, given that the observed relevant users are usually an extremely small minority compared to the whole user base. Feature selection in this skewed setting is prone to noise from the overwhelming number of non-relevant examples that belong to the majority class. To alleviate the problem, we propose a method that selects the most relevant non-relevant examples from the majority class by segmenting users along certain intelligently selected feature dimensions. The resulting virtual profile is called the segmented virtual profile. Empirical evaluation on a real-world, large-scale recommender system at LinkedIn shows that simple segmentation strategies yield significantly better performance.

  • Item Cold-Start Recommendations: Learning Local Collective Embeddings

    by Martin Saveski and Amin Mantrach

    Recommender systems suggest to users items that they might like (e.g., news articles, songs, movies) and, in doing so, they help users deal with information overload and enjoy a personalized experience. One of the main problems of these systems is the item cold-start, i.e., when a new item is introduced in the system and no past information is available, no effective recommendations can be produced. The item cold-start is a very common problem in practice: modern online platforms have hundreds of new items published every day. To address this problem, we propose to learn Local Collective Embeddings — a matrix factorization that exploits items’ properties and past user preferences while enforcing the manifold structure exhibited by the collective embeddings. We present a learning algorithm based on multiplicative update rules that are efficient and easy to implement. Experiments on two item cold-start use cases, news recommendation and email recipient recommendation, demonstrate the effectiveness of this approach and show that it significantly outperforms six state-of-the-art methods for item cold-start.

  • LinkedIn Skills: Large-Scale Topic Extraction and Inference

    by Mathieu Bastian, Matthew Hayes, William Vaughan, Sam Shah, Peter Skomoroch, Sal Uryasev, Hyungjin Kim and Christopher Lloyd

    “Skills and Expertise” is a data-driven feature on LinkedIn, the world’s largest professional online social network, which allows members to tag themselves with topics representing their areas of expertise. In this work, we present our experiences developing this large-scale topic extraction pipeline, which includes constructing a folksonomy of skills and expertise and implementing an inference and recommender system for skills. We also discuss a consequent set of applications, such as Endorsements, which allows members to tag themselves with topics representing their areas of expertise and for their connections to provide social proof, via an “endorse” action, of that member’s competence in that topic.

  • Offline and Online Evaluation of News Recommender Systems at swissinfo.ch

    by Florent Garcin, Boi Faltings, Olivier Donatsch, Ayar Alazzawi, Christophe Bruttin and Amr Huber

    We report on the live evaluation of various news recommender systems conducted on the website swissinfo.ch. We demonstrate that there is a major difference between offline and online accuracy evaluations. In an offline setting, recommending the most popular stories is the best strategy, while in a live environment this strategy is the poorest. In the online setting, context-tree recommender systems, which profile the users in real time, improve the click-through rate by up to 35%. The visit length also increases by a factor of 2.5. Our experience holds important lessons for the evaluation of recommender systems with offline data as well as for the use of the click-through rate as a performance indicator.

  • On Over-Specialization and Concentration Biases of Recommendations: Probabilistic Neighborhood Selection in Collaborative Filtering Systems

    by Panagiotis Adamopoulos and Alexander Tuzhilin

    Focusing on the problems of over-specialization and concentration bias, this paper presents a novel probabilistic method for recommending items in the neighborhood-based collaborative filtering framework. For the probabilistic neighborhood selection phase, we use an efficient method for weighted sampling of k neighbors that takes into consideration the similarity levels between the target user (or item) and the candidate neighbors. We conduct an empirical study showing that the proposed method increases the diversity, dispersion, and mobility of recommendations by selecting diverse sets of neighbors. We also demonstrate that the proposed method outperforms popular methods in terms of item prediction accuracy, utility-based ranking, and other measures, across various experimental settings. This performance improvement is in accordance with ensemble learning theory and the phenomenon of “hubness” in recommender systems.
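    A minimal sketch of the probabilistic neighborhood selection step, assuming non-negative similarities: k neighbours are sampled without replacement with probability proportional to their similarity to the target user, instead of always taking the top-k (numpy's sampler is used here; the paper's efficient weighted-sampling method may differ).

      import numpy as np

      def sample_neighbors(similarities, k, seed=0):
          """Sample k neighbour indices, weighted by similarity, without replacement."""
          rng = np.random.default_rng(seed)
          sims = np.clip(np.asarray(similarities, dtype=float), 0, None)
          return rng.choice(len(sims), size=k, replace=False, p=sims / sims.sum())

      sims_to_target = [0.9, 0.1, 0.4, 0.05, 0.7]
      print(sample_neighbors(sims_to_target, k=3))   # a more diverse neighbourhood, still biased towards similar users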

  • Question Recommendation for Collaborative Question Answering Systems with RankSLDA

    by Jose San Pedro and Alexandros Karatzoglou

    Collaborative question answering (CQA) communities rely on user participation for their success. This paper presents a supervised Bayesian approach to model expertise in on-line CQA communities with application to question recommendation, aimed at reducing waiting times for responses and avoiding question starvation. We propose a novel algorithm called RankSLDA which extends the supervised Latent Dirichlet Allocation (sLDA) model by considering a learning-to-rank paradigm. This allows us to exploit the inherent collaborative effects that are present in CQA communities where users tend to answer questions in their topics of expertise. Users can thus be modeled on the basis of the topics in which they demonstrate expertise. In the supervised stage of the method we model the pairwise order of expertise of users on a given question. We compare RankSLDA against several alternative methods on data from the Cross Validate community, part of the Stack Exchange CQA network. RankSLDA outperforms all alternative methods by a significant margin.

  • Question Recommendation with Constraints for Massive Open Online Courses

    by Diyi Yang, David Adamson and Carolyn Rose

    Massive Open Online Courses (MOOCs) have experienced a recent boom in interest. Although the number of students that have registered for MOOCs is remarkably high, the fraction of those who actively participate in course discussion forums is startlingly low. By recommending relevant forum discussions and questions to students, their engagement and participation may increase, to the benefit of both the student and the course community. This problem has not been thoroughly explored by existing recommender systems. In contrast to traditional product recommendation, question recommendation in discussion forums should consider constraints on both students and questions. These considerations include (1) Load Balancing – students should not be over-burdened with too many requests; and (2) Expertise Matching – matching students’ abilities to the difficulty of unanswered questions, which in turn positions students to contribute meaningfully to the forum. In this work, we propose a novel constrained question recommendation problem to address the above considerations, with the intent to improve the learning experience for course participants. We first design a context-aware matrix factorization model to predict students’ preferences over questions, then build a max cost flow model to address the constraints. Experimental results on three MOOC datasets demonstrate that our method significantly outperforms baseline methods in optimizing overall forum welfare, and in predicting which questions students might be interested in.
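    To illustrate the flow-based assignment step under the two constraints above, here is a hypothetical sketch using networkx min-cost flow: student capacities enforce load balancing and edge costs encode expertise mismatch (minimizing mismatch plays the role of maximizing match quality). The graph, capacities and costs are invented, and the paper's flow model is richer than this.

      import networkx as nx

      G = nx.DiGraph()
      for student, capacity in [("s1", 2), ("s2", 2)]:            # load balancing: at most 2 questions each
          G.add_edge("source", student, capacity=capacity, weight=0)
      mismatch = {("s1", "q1"): 1, ("s1", "q2"): 4, ("s1", "q3"): 2,
                  ("s2", "q1"): 3, ("s2", "q2"): 1, ("s2", "q3"): 2}
      for (student, question), cost in mismatch.items():          # expertise matching: lower cost = better fit
          G.add_edge(student, question, capacity=1, weight=cost)
      for question in ["q1", "q2", "q3"]:
          G.add_edge(question, "sink", capacity=1, weight=0)

      flow = nx.max_flow_min_cost(G, "source", "sink")
      print([(s, q) for (s, q) in mismatch if flow[s].get(q, 0) > 0])   # e.g. [('s1', 'q1'), ('s1', 'q3'), ('s2', 'q2')]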

  • Ratings Meet Reviews, a Combined Approach to Recommend

    by Guang Ling, Michael Lyu and Irwin King

    Most existing recommender systems focus on modeling the ratings while ignoring the abundant information embedded in the review text. In this paper, we propose a unified model that combines content-based filtering with collaborative filtering, harnessing the information of both ratings and reviews. We apply topic modeling techniques to the review text and align the topics with rating dimensions to improve prediction accuracy. With the information embedded in the review text, we can alleviate the cold-start problem. Furthermore, our model is able to learn latent topics that are interpretable. With these interpretable topics, we can explore the prior knowledge on items or users and recommend completely “cold” items. An empirical study on 27 classes of real-life datasets shows that our proposed model leads to significant improvements compared with strong baseline methods, especially for extremely sparse datasets where rating-only methods cannot make accurate predictions.

  • Recommending User Generated Item Lists

    by Yidan Liu, Min Xie and Laks V.S. Lakshmanan

    Existing recommender systems mostly focus on recommending individual items which users may be interested in. User-generated item lists, on the other hand, have become a popular feature in many applications. E.g., Goodreads provides users with an interface for creating and sharing interesting book lists. These user-generated item lists complement the main functionality of the corresponding application and intuitively become an alternative way for users to browse and discover interesting items to be consumed. Unfortunately, existing recommender systems are not designed for recommending user-generated item lists. In this work, we study properties of these user-generated item lists and propose a Bayesian ranking model, called LRM, for recommending them. The proposed LRM model takes into consideration users’ previous interactions with both item lists and individual items. Furthermore, we propose in LRM a novel way of weighting items within item lists based on both the position of items and personalized list consumption patterns. Through extensive experiments on a real item list dataset from Goodreads, we demonstrate the effectiveness of our proposed LRM model.

  • Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization

    by Smriti Bhagat, Udi Weinsberg, Stratis Ioannidis and Nina Taft

    Recommender systems leverage user demographic information, such as age, gender, etc., to personalize recommendations and better place their targeted ads. Oftentimes, users do not volunteer this information due to privacy concerns, or due to a lack of initiative in filling out their online profiles. We illustrate a new threat in which a recommender learns private attributes of users who do not voluntarily disclose them. We design both passive and active attacks that solicit ratings for strategically selected items, and could thus be used by a recommender system to pursue this hidden agenda. Our methods are based on a novel usage of Bayesian matrix factorization in an active learning setting. Evaluations on multiple datasets illustrate that such attacks are indeed feasible and use significantly fewer rated items than static inference methods. Importantly, they succeed without sacrificing the quality of recommendations to users.

  • Social Influence Bias in Recommender Systems: A Methodology for Learning, Analyzing, and Mitigating Bias in Ratings

    by Sanjay Krishnan, Jay Patel, Michael Franklin, and Ken Goldberg

    To facilitate browsing and selection, almost all recommender systems display an aggregate statistic (the average/mean or median rating value) for each item. This value has the potential to influence a participant’s individual rating for an item due to what is known in the survey and psychology literature as Social Influence Bias: the tendency for individuals to conform to what they perceive as the norm in a community. As a result, ratings can be closer to the average and less diverse than they would be otherwise. We propose a methodology to 1) learn, 2) analyze, and 3) mitigate the effect of social influence bias in recommender systems. In the Learning phase, a baseline dataset is established with an initial set of participants by allowing them to rate items twice: before seeing the median rating, and again after seeing it. In the Analysis phase, a new non-parametric significance test based on the Wilcoxon statistic can quantify the extent of social influence bias in this data. If this bias is significant, we propose a Mitigation phase where mathematical models are constructed from this data using polynomial regression and the Bayesian Information Criterion (BIC) and then inverted to produce a filter that can reduce the effect of social influence bias. As a case study, we apply this methodology to the California Report Card (CRC), a new recommender system that encourages political engagement. After the Learning phase collected 9390 ratings, the non-parametric test in the Analysis phase rejected the null hypothesis, identifying significant social influence bias: ratings after display of the median were on average 19.3% closer to the median value. In the Mitigation phase, the learned polynomial models were able to predict changed ratings with a normalized RMSE of 12.8% and reduce bias by 76.3%. Results suggest that social influence bias can be significant in recommender systems and that this bias can be substantially reduced with machine learning.
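    The Analysis phase relies on a Wilcoxon-based significance test on pre/post ratings; a minimal before/after example with SciPy's stock test is shown below (the ratings are invented, and the paper's non-parametric test is built on the Wilcoxon statistic rather than being this exact call).

      from scipy.stats import wilcoxon

      # Ratings given before and after the median was displayed (illustrative values).
      before = [3.0, 4.5, 2.0, 5.0, 1.5, 4.0, 2.5]
      after  = [3.5, 4.0, 2.5, 4.5, 2.0, 3.8, 3.0]

      stat, p_value = wilcoxon(before, after)
      print(stat, p_value)   # a small p-value would indicate significant social influence bias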

  • Speeding Up the Xbox Recommender System Using a Euclidean Transformation for Inner-Product Spaces

    by Yoram Bachrach, Yehuda Finkelstein, Ran Gilad-Bachrach, Liran Katzir, Noam Koenigstein, Nir Nice and Ulrich Paquet

    A prominent approach in collaborative filtering based recommender systems is using dimensionality reduction (matrix factorization) techniques to map users and items into low-dimensional vectors. In such systems, a higher inner product between a user vector and an item vector indicates that the item better suits the user’s preference. Traditionally, retrieving the most suitable items was done by scoring and sorting all items. Real-world online recommender systems must adhere to strict response-time constraints. Therefore, when the number of items is too large, scoring all items becomes infeasible. We propose a novel order-preserving transformation, mapping the maximum inner product search problem to a Euclidean space nearest neighbor search problem. Utilizing this transformation, we study the efficiency of several (approximate) nearest neighbor data structures. Our final solution is based on a novel use of the PCA-Tree data structure in which results are augmented using paths one Hamming distance away from the query (neighborhood boosting). The end result is a system which allows approximate matches (items with relatively high inner product, but not necessarily the highest one). We evaluate our techniques on two large-scale recommendation datasets, Xbox Movies and Yahoo Music, and show that this technique allows trading off a slight degradation in recommendation quality for a significant improvement in retrieval time.
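    One standard order-preserving reduction of this kind can be illustrated in a few lines: items are padded with an extra coordinate sqrt(M^2 - ||x||^2) (M being the largest item norm) and queries with a zero, after which the Euclidean nearest neighbour of a query is exactly its maximum-inner-product item. The sketch below checks this on random data; the PCA-Tree index and neighborhood boosting from the paper are not shown.

      import numpy as np

      def augment_items(items):
          """Append sqrt(M^2 - ||x||^2) so inner-product order becomes Euclidean-distance order."""
          norms = np.linalg.norm(items, axis=1)
          extra = np.sqrt(norms.max() ** 2 - norms ** 2)
          return np.column_stack([extra, items])

      def augment_query(q):
          return np.concatenate([[0.0], q])      # queries get a zero in the extra coordinate

      rng = np.random.default_rng(0)
      items, q = rng.normal(size=(1000, 16)), rng.normal(size=16)

      best_ip = int(np.argmax(items @ q))                                    # exact maximum inner product
      dists = np.linalg.norm(augment_items(items) - augment_query(q), axis=1)
      assert best_ip == int(np.argmin(dists))                                # the top result is preserved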

  • Towards a Dynamic Top-N Recommendation Framework

    by Xin Liu

    Real-world large-scale recommender systems are always dynamic: new users and items continuously enter the system, and the status of old ones (e.g., users’ preferences and items’ popularity) evolves over time. In order to handle such dynamics, we propose a recommendation framework consisting of an online component and an offline component, where newly arrived items are processed by the online component so that users can get suggestions for fresh information, while the influence of longstanding items is captured by the offline component. Based on individual users’ past rating behavior, recommendations from the two components are combined to provide top-N recommendations. We formulate recommendation as a ranking problem where learning to rank is applied to extend a latent factor model and optimize recommendation rankings by minimizing a pairwise loss function. Furthermore, to more accurately model interactions between users and items, Latent Dirichlet Allocation is incorporated to fuse rating information and textual information. Experiments on real data demonstrate that our approach outperforms the state-of-the-art models by at least 61.21% and 50.27% in terms of mean average precision (MAP) and normalized discounted cumulative gain (NDCG), respectively.

  • Unifying Nearest Neighbors Collaborative Filtering

    by Koen Verstrepen and Bart Goethals

    We study collaborative filtering for applications in which there exists for every user a set of items about which the user has given binary, positive-only feedback (one-class collaborative filtering). Take for example an on-line store that knows all past purchases of every customer. An important class of algorithms for one-class collaborative filtering are the nearest neighbors algorithms, typically divided into user-based and item-based algorithms. We introduce a reformulation that unifies user- and item-based nearest neighbors algorithms and use this reformulation to propose a novel algorithm that incorporates the best of both worlds and outperforms state-of-the-art algorithms. Additionally, we propose a method for naturally explaining the recommendations made by our algorithm and show that this method is also applicable to existing user-based nearest neighbors methods.
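    Not the unified algorithm itself, but the two classical scoring rules it unifies, written so their shared bilinear structure around the binary feedback matrix R is visible (toy data, cosine similarity assumed):

      import numpy as np

      R = np.array([[1, 1, 0, 0],          # binary, positive-only feedback (users x items)
                    [1, 0, 1, 0],
                    [0, 1, 1, 1]], dtype=float)

      def cosine(M):
          """Row-wise cosine similarity matrix."""
          norms = np.linalg.norm(M, axis=1, keepdims=True)
          N = M / np.where(norms == 0, 1, norms)
          return N @ N.T

      user_sim, item_sim = cosine(R), cosine(R.T)
      user_based = user_sim @ R        # score(u, i): similarities of users who consumed item i
      item_based = R @ item_sim        # score(u, i): similarities of items user u already consumed
      print(user_based[0], item_based[0])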

  • User Perception of Differences in Movie Recommendation Algorithms

    by Michael Ekstrand, F. Maxwell Harper, Martijn Willemsen and Joseph Konstan

    Recent developments in user evaluation of recommender systems have brought forth powerful new tools for understanding what makes recommendations effective and useful. We apply these methods to understand how users evaluate recommendation lists for the purpose of selecting an algorithm for finding movies. This paper reports on an experiment in which we asked users to compare lists produced by three common collaborative filtering algorithms on the dimensions of novelty, diversity, accuracy, satisfaction, and degree of personalization, and to select a recommender that they would like to use in the future. We find that satisfaction is negatively dependent on novelty and positively dependent on diversity in this setting, and that satisfaction predicts the user’s final selection. We also compare users’ subjective perceptions of recommendation properties with objective measures of those same characteristics. To our knowledge, this is the first study that applies modern survey design and analysis techniques to a within-subjects, direct comparison study of recommender algorithms.

More details can be found in the Program
