The challenge of building a good recommendation system is deeply connected to missing data---unknown features and labels to suggest the most "valuable" items to the user. The mysterious properties of the power law distributions that generally arises out of recommender (and social systems in general) create skewed and long-tailed consumption patterns that are often still puzzling to many of us. Missing data and skewed distributions create not just accuracy and recall problems, but also capacity allocation problems, which are at the roots of recent debate on inclusiveness and responsibility. So how do we move forward in the face of these immense conceptual and practical issues?
In our work, we have been asking ourselves ways to deriving insights from first principles and drawing inspiration from fields like statistical physics. Surprised, one might ask---what does the field of physics has to do with missing data in ranking and recommendations? As we all know, in the field of information systems, concepts like information entropy and probability have a rich intellectual history. This history is deeply connected to the greatest discoveries of science in the 19th century---statistical mechanics, thermodynamics, and specific concepts like thermal equilibrium.
In this talk, I will take us on a journey connecting Boltzmann distribution and partition functions from statistical mechanics with importance weighting for learning better softmax functions, and then further to reinforcement learning, where we can plan better explorations using off-policy correction with policy gradient approaches. As I shall show, these techniques enable us to reason about missing data features, labels, and time dynamic patterns from our data.
The fashion domain is a magnet for computer vision. New vision problems are emerging in step with the fashion industry's rapid evolution towards an online, social, and personalized business. Style models, trend forecasting, interactive search, and recommendation all require visual understanding with rich detail and subtlety. As a result, research in this area is poised to have great influence on how people shop, how the fashion industry analyzes its enterprise, and how we model the cultural trends revealed by what people wear.
In this talk, I will present our work over the last few years developing computer vision methods for fashion. To begin, we explore how to discover styles from Web photos, learning how people assemble their outfits and the latent themes they share. Leveraging such styles, we show how to infer compatibility of new garments, optimize personalized mix-and-match capsule wardrobes, suggest minimal edits to make an outfit more fashionable, and recommend clothing that flatters diverse human body shapes.
Next, turning to the world stage, we investigate fashion forecasting and influence. Given photos of fashion products, we learn to forecast what looks and styles will be popular in the future. We further boost those forecasts by modeling the spatio-temporal style influences between 44 major world cities. Throughout, by learning models from unlabeled Web photos, our approaches sidestep subjective manual annotations in favor of direct observations of what people choose to wear.
Veridical data science extracts reliable and reproducible information from data, with an enriched technical language to communicate and evaluate empirical evidence in the context of human decisions and domain knowledge. Building and expanding on principles of statistics, machine learning, and the sciences, we propose the predictability, computability, and stability (PCS) framework forveridical data science. Our framework is comprised of both a workflow and documentation and aims to provide responsible, reliable, reproducible, and transparent results across the entire data science life cycle. Moreover, we propose the PDR desiderata for interpretable machine learning as part of veridical data science (with PDR standing for predictive accuracy, predictive accuracy and relevancy to a human audience and a particular domain problem).
The PCS framework will be illustrated through the development of the DeepTune framework for characterizing V4 neurons. DeepTune builds predictive models using DNNs and ridge regression and applies the stability principle to obtain stable interpretations of 18 predictive models. Finally, a general DNN interpretaion method based on contexual decomposition (CD) will be discussed with applications to sentiment analysis and cosmological parameter estimation.
The word "deep learning" is generally regarded as a synonym of "deep neural networks (DNNs)". In this talk, we will discuss on essentials in deep learning and claim that deep learning is not necessarily to be realized by neural networks and differentiable modules. We will then present an exploration to non-NN style deep learning, where the building blocks are non-differentiable modules and the training process does not rely on backpropagation or gradient-based adjustment. We will also talk about some recent advances and challenges in this direction of research.
Product search forms an indispensable component of any e-commerce service, and helps customers find products of their interest from a large catalog on these websites. When products that are irrelevant to the search query are surfaced, it leads to a poor customer experience, thus reducing user trust and increasing the likelihood of churn. While identifying and removing such results from product search is crucial, doing so is a burdensome task that requires large amounts of human annotated data to train accurate models. This problem is exacerbated when products are cross-listed across countries that speak multiple languages, and customers specify queries in multiple languages and from different cultural contexts. In this work, we propose a novel multi-lingual multi-task learning framework, to jointly train product search models on multiple languages, with limited amount of training data from each language. By aligning the query and product representations from different languages into a language-independent vector space of queries and products, respectively, the proposed model improves the performance over baseline search models in any given language. We evaluate the performance of our model on real data collected from a leading e-commerce service. Our experimental evaluation demonstrates up to 23% relative improvement in the classification F1-score compared to the state-of-the-art baseline models.
In this paper, we study the graph-based semi-supervised learning for classifying nodes in attributed networks, where the nodes and edges possess content information. Recent approaches like graph convolution networks and attention mechanisms have been proposed to ensemble the first-order neighbors and incorporate the relevant neighbors. However, it is costly (especially in memory) to consider all neighbors without a prior differentiation. We propose to explore the neighborhood in a reinforcement learning setting and find a walk path well-tuned for classifying the unlabelled target nodes. We let an agent (of node classification task) walk over the graph and decide where to move to maximize classification accuracy. We define the graph walk as a partially observable Markov decision process (POMDP). The proposed method is flexible for working in both transductive and inductive setting. Extensive experiments on four datasets demonstrate that our proposed method outperforms several state-of-the-art methods. Several case studies also illustrate the meaningful movement trajectory made by the agent.
Attributed network embedding is the task to learn a lower dimensional vector representation of the nodes of an attributed network, which can be used further for downstream network mining tasks. Nodes in a network exhibit community structure and most of the network embedding algorithms work well when the nodes, along with their attributes, adhere to the community structure of the network. But real life networks come with community outlier nodes, which deviate significantly in terms of their link structure or attribute similarities from the other nodes of the community they belong to. These outlier nodes, if not processed carefully, can even affect the embeddings of the other nodes in the network. Thus, a node embedding framework for dealing with both the link structure and attributes in the presence of outliers in an unsupervised setting is practically important. In this work, we propose a deep unsupervised autoencoders based solution which minimizes the effect of outlier nodes while generating the network embedding. We use both stochastic gradient descent and closed form updates for faster optimization of the network parameters. We further explore the role of adversarial learning for this task, and propose a second unsupervised deep model which learns by discriminating the structure and the attribute based embeddings of the network and minimizes the effect of outliers in a coupled way. Our experiments show the merit of these deep models to detect outliers and also the superiority of the generated network embeddings for different downstream mining tasks. To the best of our knowledge, these are the first unsupervised non linear approaches that reduce the effect of the outlier nodes while generating Network Embedding.
Recommendation is one of the critical applications that helps users find information relevant to their interests. However, a malicious attacker can infer users' private information via recommendations. Prior work obfuscates user-item data before sharing it with recommendation system. This approach does not explicitly address the quality of recommendation while performing data obfuscation. Moreover, it cannot protect users against private-attribute inference attacks based on recommendations. This work is the first attempt to build a Recommendation with Attribute Protection (RAP) model which simultaneously recommends relevant items and counters private-attribute inference attacks. The key idea of our approach is to formulate this problem as an adversarial learning problem with two main components: the private attribute inference attacker, and the Bayesian personalized recommender. The attacker seeks to infer users' private-attribute information according to their items list and recommendations. The recommender aims to extract users' interests while employing the attacker to regularize the recommendation process. Experiments show that the proposed model both preserves the quality of recommendation service and protects users against private-attribute inference attacks.
Network embedding, that aims to learn low-dimensional vector representation of nodes such that the network structure is preserved, has gained significant research attention in recent years. However, most state-of-the-art network embedding methods are computationally expensive and hence unsuitable for representing nodes in billion-scale networks. In this paper, we present LouvainNE, a hierarchical clustering approach to network embedding. Precisely, we employ Louvain, an extremely fast and accurate community detection method, to build a hierarchy of successively smaller subgraphs. We obtain representations of individual nodes in the original graph at different levels of the hierarchy, then we aggregate these representations to learn the final embedding vectors. Our theoretical analysis shows that our proposed algorithm has quasi-linear run-time and memory complexity. Our extensive experimental evaluation, carried out on multiple real-world networks of different scales, demonstrates both (i) the scalability of our proposed approach that can handle graphs containing tens of billions of edges, as well as (ii) its effectiveness in performing downstream network mining tasks such as network reconstruction and node classification.
\beginabstract We analyze comparative questions, i.e., questions asking to compare different items, that were submitted to Yandex in 2012. Responses to such questions might be quite different from the simple "ten blue links'' and could, for example, aggregate pros and cons of the different options as direct answers. However, changing the result presentation is an intricate decision such that the classification of comparative questions forms a highly precision-oriented task.
From a year-long Yandex log, we annotate a random sample of 50,000~questions; 2.8%~of which are comparative. For these annotated questions, we develop a precision-oriented classifier by combining carefully hand-crafted lexico-syntactic rules with feature-based and neural approaches---achieving a recall of~0.6 at a perfect precision of~1.0. After running the classifier on the full year log (on average, there is at least one comparative question per second), we analyze 6,250~comparative questions using more fine-grained subclasses (e.g., should the answer be a "simple'' fact or rather a more verbose argument) for which individual classifiers are trained. An important insight is that more than 65%~of the comparative questions demand argumentation and opinions, i.e., reliable direct answers to comparative questions require more than the facts from a search engine's knowledge graph.
In addition, we present a qualitative analysis of the underlying comparative information needs (separated into 14~categories likeconsumer electronics orhealth ), their seasonal dynamics, and possible answers from community question answering platforms. \endabstract
Learning to Rank, a central problem in information retrieval, is a class of machine learning algorithms that formulate ranking as an optimization task. The objective is to learn a function that produces an ordering of a set of documents in such a way that the utility of the entire ordered list is maximized. Learning-to-rank methods do so by learning a function that computes a score for each document in the set. A ranked list is then compiled by sorting documents according to their scores. While such a deterministic mapping of scores to permutations makes sense during inference where stability of ranked lists is required, we argue that its greedy nature during training leads to less robust models. This is particularly problematic when the loss function under optimization---in agreement with ranking metrics---largely penalizes incorrect rankings and does not take into account the distribution of raw scores. In this work, we present a stochastic framework where, instead of a deterministic derivation of permutations from raw scores, permutations are sampled from a distribution defined by raw scores. Our proposed sampling method is differentiable and works well with gradient descent optimizers. We analytically study our proposed method and demonstrate when and why it leads to model robustness. We also show empirically, through experiments on publicly available learning-to-rank datasets, that the application of our proposed method to a class of ranking loss functions leads to significant model quality improvements.
Predicting the popularity of online content on social platforms is an important task for both researchers and practitioners. Previous methods mainly leverage demographics, temporal and structural patterns of early adopters for popularity prediction. However, most existing methods are less effective to precisely capture the cascading effect in information diffusion, in which early adopters try to activate potential users along the underlying network. In this paper, we consider the problem of network-aware popularity prediction, leveraging both early adopters and social networks for popularity prediction. We propose to capture the cascading effect explicitly, modeling the activation state of a target user given the activation state and influence of his/her neighbors. To achieve this goal, we propose a novel method, namely CoupledGNN, which uses two coupled graph neural networks to capture the interplay between node activation states and the spread of influence. By stacking graph neural network layers, our proposed method naturally captures the cascading effect along the network in a successive manner. Experiments conducted on both synthetic and real-world Sina Weibo datasets demonstrate that our method significantly outperforms the state-of-the-art methods for popularity prediction.
One emerging benefit of voice assistants is to facilitate product search experience, allowing users to express orally which products they seek, and taking actions on retrieved results such as adding them to their cart or sending the product details to their mobile phone for further examination. Looking at users' behavior in product search, supported by a digital voice assistant, we have observed an interesting phenomenon where users purchase or engage with search results that are objectively judged irrelevant to their queries.
In this work, we analyze and characterize this phenomenon. We provide several hypotheses as to the reasons behind it, including users' personalized preferences, the product's popularity, the product's indirect relation with the query, the user's tolerance level, the query intent, and the product price. We address each hypothesis by conducting thorough data analyses and offer some insights with respect to users' purchase and engagement behavior with seemingly irrelevant results. We conclude with a discussion on how this analysis can be used to improve voice product search services.
To better exploit the search logs, various click models have been proposed to extract implicit relevance feedback from user clicks. Most traditional click models are based on probability graphical models (PGMs) with manually designed dependencies. Recently, some researchers also adopt neural-based methods to improve the accuracy of click prediction. However, most of the existing click models only model user behavior in query level. As the previous iterations within the session may have an impact on the current search round, we can leverage these behavior signals to better model user behaviors. In this paper, we propose a novel neural- based Context-Aware Click Model (CACM) for Web search. CACM consists of a context-aware relevance estimator and an examination predictor. The relevance estimator utilizes session context infor- mation, i.e., the query sequence and clickthrough data, as well as the pre-trained embeddings learned from a session-flow graph to estimate the context-aware relevance of each search result. The examination predictor estimates the examination probability of each result. We further investigate several combination functions to integrate the context-aware relevance and examination probabil- ity into click prediction. Experiment results on a public Web search dataset show that CACM outperforms existing click models in both relevance estimation and click prediction tasks.
Retrieval effectiveness in information retrieval systems is heavily dependent on how various parameters are tuned. One option to find these parameters is to run multiple online experiments and using a parameter sweep approach in order to optimize the search system. There are multiple downsides of this approach, mainly that it may lead to a poor experience for users. Another option is to do offline evaluation, which can act as a safeguard against potential quality issues. Offline evaluation requires a validation set of data that can be benchmarked against different parameter settings. However, for search over personal corpora, e.g. email and file search, it is impractical and often impossible to get a complete representative validation set, due to the inability to save raw queries and document information. In this work, we show how to do offline parameter tuning with only a partial validation set. In addition, we demonstrate how to do parameter tuning in the cases when we have complete knowledge of the internal implementation of the search system (white-box tuning), as well as the case where we have only partial knowledge (grey-box tuning). This has allowed us to do offline parameter tuning in a privacy-sensitive manner.
Mobility prediction, which is to predict where a user will arrive based on the user's historical mobility records, has attracted much attention. We argue that it is more useful to know not only where but also when a user will arrive next in many scenarios such as targeted advertising and taxi service. In this paper, we propose a novel context-aware deep model called DeepJMT for jointly performing mobility prediction (to know where) and time prediction (to know when). The DeepJMT model consists of (1) a hierarchical recurrent neural network (RNN) based sequential dependency encoder, which is more capable of capturing a user's mobility regularities and temporal patterns compared to vanilla RNN based models; (2) a spatial context extractor and a periodicity context extractor to extract location semantics and the user's periodicity, respectively; and (3) a co-attention based social & temporal context extractor which could extract the mobility and temporal evidence from social relationships. Experiments conducted on three real-world datasets show that DeepJMT outperforms the state-of-the-art mobility prediction and time prediction methods.
While social networks have increased the diversity of ideas and information available to users, they are also blamed for increasing the polarization of user opinions. Eli Pariser's "filter bubble" hypothesis  explains this counterintuitive phenomenon by linking user polarization to algorithmic filtering: to increase user engagement, social media companies connect users with ideas they are already likely to agree with, thus creating echo chambers of users with very similar beliefs.
In this paper, we introduce a mathematical framework to assess the impact of this popular, yet unverified, hypothesis. We augment the classical Friedkin-Johnsen opinion dynamics model to include algorithmic filtering by introducing a network administrator --- an external actor that models social media companies by dynamically adjusting the strength of edges in a social network graph. When the network administrator is incentivized to reduce disagreement among interacting users, we experimentally demonstrate on networks from Reddit and Twitter that even small changes by the administrator to social network graphs can increase user polarization. We support our experiments with theoretical results by showing that social networks generated from the stochastic block model are provably sensitive to algorithmic filtering. Finally, we propose a simple modification to the incentives of the network administrator that limits the filter bubble effect without significantly affecting user engagement.
Fiction and fantasy are archetypes of long-tail domains that lack comprehensive methods for automated language processing and knowledge extraction. We present ENTYFI, the first methodology for typing entities in fictional texts coming from books, fan communities or amateur writers. ENTYFI builds on 205 automatically induced high-quality type systems for popular fictional domains, and exploits the overlap and reuse of these fictional domains for fine-grained typing in previously unseen texts. ENTYFI comprises five steps: type system induction, domain relatedness ranking, mention detection, mention typing, and type consolidation. The recall-oriented typing module combines a supervised neural model, unsupervised Hearst-style and dependency patterns, and knowledge base lookups. The precision-oriented consolidation stage utilizes co-occurrence statistics in order to remove noise and to identify the most relevant types. Extensive experiments on newly seen fictional texts demonstrate the quality of ENTYFI.
Academic homepages are an important source for learning researchers' profiles. Recognising person names and publications in academic homepages are two fundamental tasks for understanding the identities of the homepages and collaboration networks of the researchers. Existing studies have tackled person name recognition and publication recognition separately. We observe that these two tasks are correlated since person names and publications often co-occur. Further, there are strong position patterns for the occurrence of person names and publications. With these observations, we propose a novel deep learning model consisting of two main modules, an alternatingly updated memory module which exploits the knowledge and correlation from both tasks, and a position-aware memory module which captures the patterns of where in a homepage names and publications appear. Empirical results show that our proposed model outperforms the state-of-the-art publication recognition model by 3.64% in F1 score and outperforms the state-of-the-art person name recognition model by 2.06% in F1 score. Ablation studies and visualisation confirm the effectiveness of the proposed modules.
Whenever a social media user decides to share a story, she is typically pleased to receive likes, comments, shares, or, more generally, feedback from her followers. As a result, she may feel compelled to use the feedback she receives to (re-)estimate her followers' preferences and decide which stories to share next to receive more (positive) feedback. Under which conditions can she succeed? In this work, we first investigate this problem from a theoretical perspective and then provide a set of practical algorithms to identify and characterize such behavior in social media.
More specifically, we address the above problem from the perspective of sequential decision making and utility maximization. For a wide family of utility functions, we first show that, to succeed,a user needs to actively trade off exploitation---sharing stories which lead to more (positive) feedback---and exploration---sharing stories to learn about her followers' preferences. However, exploration is not necessary if a user utilizes the feedback her followers provide to other users in addition to the feedback she receives. Then, we develop a utility estimation framework for observation data, which relies on statistical hypothesis testing to determine whether a user utilizes the feedback she receives from each of her followers to decide what to post next.Experiments on synthetic data illustrate our theoretical findings and show that our estimation framework is able to accurately recover users' underlying utility functions. Experiments on several real datasets gathered from Twitter and Reddit reveal that up to 82% (43%) of the Twitter (Reddit) users in our datasets do use the feedback they receive to decide what to post next.
Event detection (ED), a sub-task of event extraction, involves identifying triggers and categorizing event mentions. Existing methods primarily rely upon supervised learning and require large-scale labeled event datasets which are unfortunately not readily available in many real-life applications. In this paper, we consider and reformulate the ED task with limited labeled data as a Few-Shot Learning problem. We propose a Dynamic-Memory-Based Prototypical Network (DMB-PN), which exploits Dynamic Memory Network (DMN) to not only learn better prototypes for event types, but also produce more robust sentence encodings for event mentions. Differing from vanilla prototypical networks simply computing event prototypes by averaging, which only consume event mentions once, our model is more robust and is capable of distilling contextual information from event mentions for multiple times due to the multi-hop mechanism of DMNs. The experiments show that DMB-PN not only deals with sample scarcity better than a series of baseline models but also performs more robustly when the variety of event types is relatively large and the instance quantity is extremely small.
A growing trend recently is to harness the structure of today's big data, where much of the data can be represented as graphs. Simultaneously, graph convolutional networks (GCNs) have been proposed and since seen rapid development. More recently, due to the scalability issues that arise when attempting to utilize these powerful models on real-world data, methodologies have sought the use of sampling techniques. More specifically, minibatches of nodes are formed and then sets of nodes are sampled to aggregate from in one or more layers. Among these methods, the two prominent ways are based on sampling nodes from either a local or global perspective. In this work, we first observe the similarities in the two sampling strategies to that of epidemic and diffusion network models. Then we harness this understanding to fuse together the benefits of sampling from both a local and global perspective while alleviating some of the inherent issues found in both through the use of a low-dimensional approximation for the path-based Katz similarity measure. Our proposed framework, Epidemic Graph Convolutional Network (EGCN), is thus able to achieve improved performance over sampling from just one of the two perspectives alone. Empirical experiments are performed on several public benchmark datasets to verify the effectiveness over existing methodologies for the node classification task and we furthermore present some empirical parameter analysis of EGCN.
Recent studies have demonstrated that machine learning approaches like deep learning methods are easily fooled by adversarial attacks. Recently, a highly-influential study examined the impact of adversarial attacks on graph data and demonstrated that graph embedding techniques are also vulnerable to adversarial attacks. Fake users on social media and fake product reviews are examples of perturbations in graph data that are realistic counterparts of the adversarial models proposed. Graphs are widely used in a variety of domains and it is highly important to develop graph analysis techniques that are robust to adversarial attacks. One of the recent studies on generating adversarial attacks for graph data is Nettack. The Nettack model has shown to be very successful in deceiving the Graph Convolutional Network (GCN) model. Nettack is also transferable to other node classification approaches e.g. node embeddings. In this paper, we explore the properties of Nettack perturbations, in search for effective defenses against them. Our first finding is that Nettack demonstrates a very specific behavior in the spectrum of the graph: only high-rank (low-valued) singular components of the graph are affected. Following that insight, we show that a low-rank approximation of the graph, that uses only the top singular components for its reconstruction, can greatly reduce the effects of Nettack and boost the performance of GCN when facing adversarial attacks. Indicatively, on the CiteSeer dataset, our proposed defense mechanism is able to reduce the success rate of Nettack from 98% to 36%. Furthermore, we show that tensor-based node embeddings, which by default project the graph into a low-rank subspace, are robust against Nettack perturbations. Lastly, we propose LowBlow, a low-rank adversarial attack which is able to affect the classification performance of both GCN and tensor-based node embeddings and we show that the low-rank attack is noticeable and making it unnoticeable results in a high-rank attack.
Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy preserving text perturbation using the notion of d_χ-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to vector representation of words in a high dimension space as defined by word embedding models. We present a privacy proof that satisfies d_χ-privacy where the privacy parameter $\varepsilon$ provides guarantees with respect to a distance metric defined by the word embedding space. We demonstrate how $\varepsilon$ can be selected by analyzing plausible deniability statistics backed up by large scale analysis on GloVe and fastText embeddings. We conduct privacy audit experiments against $2$ baseline models and utility experiments on 3 datasets to demonstrate the tradeoff between privacy and utility for varying values of varepsilon on different task types. Our results demonstrate practical utility (< 2% utility loss for training binary classifiers) while providing better privacy guarantees than baseline models.
Expert finding is a task designed to enable recommendation of the right person who can provide high-quality answers to a requester's question. Most previous works try to involve a content-based recommendation, which only superficially comprehends the relevance between a requester's question and the expertise of candidate experts by exploring the content or topic similarity between the requester's question and the candidate experts' historical answers. However, if a candidate expert has never answered a question similar to the requester's question, then existing methods have difficulty making a correct recommendation. Therefore, exploring the implicit relevance between a requester's question and a candidate expert's historical records by perception and reasoning should be taken into consideration. In this study, we propose a novel \textslrecurrent memory reasoning network (RMRN) to perform this task. This method focuses on different parts of a question, and accordingly retrieves information from the histories of the candidate expert.Since only a small percentage of historical records are relevant to any requester's question, we introduce a Gumbel-Softmax-based mechanism to select relevant historical records from candidate experts' answering histories. To evaluate the proposed method, we constructed two large-scale datasets drawn from Stack Overflow and Yahoo! Answer. Experimental results on the constructed datasets demonstrate that the proposed method could achieve better performance than existing state-of-the-art methods.
Interpretable explanations for recommender systems and other machine learning models are crucial to gain user trust. Prior works that have focused on paths connecting users and items in a heterogeneous network have several limitations, such as discovering relationships rather than true explanations, or disregarding other users' privacy. In this work, we take a fresh perspective, and present PRINCE: a provider-side mechanism to produce tangible explanations for end-users, where an explanation is defined to be a set of minimal actions performed by the user that, if removed, changes the recommendation to a different item. Given a recommendation, PRINCE uses a polynomial-time optimal algorithm for finding this minimal set of a user's actions from an exponential search space, based on random walks over dynamic graphs. Experiments on two real-world datasets show that PRINCE provides more compact explanations than intuitive baselines, and insights from a crowdsourced user-study demonstrate the viability of such action-based explanations. We thus posit that PRINCE produces scrutable, actionable, and concise explanations, owing to its use of counterfactual evidence, a user's own actions, and minimal sets, respectively.
User representation learning is vital to capture diverse user preferences, while it is also challenging as user intents are latent and scattered among complex and different modalities of user-generated data, thus, not directly measurable. Inspired by the concept of user schema in social psychology, we take a new perspective to perform user representation learning by constructing a shared latent space to capture the dependency among different modalities of user-generated data. Both users and topics are embedded to the same space to encode users' social connections and text content, to facilitate joint modeling of different modalities, via a probabilistic generative framework. We evaluated the proposed solution on large collections of Yelp reviews and StackOverflow discussion posts, with their associated network structures. The proposed model outperformed several state-of-the-art topic modeling based user models with better predictive power in unseen documents, and state-of-the-art network embedding based user models with improved link prediction quality in unseen nodes. The learnt user representations are also proved to be useful in content recommendation, e.g., expert finding in StackOverflow.
Social advertising exploits the interconnectivity of users in social networks to spread advertisement and generate user engagements. A lot of research has focused on how to select the best subset of users in a social network to maximize the number of engagements or the generated revenue of the advertisement. However, there is a lack of studies that consider the advertiser's value-per-engagement, i.e., how much an advertiser is maximally willing to pay for each engagement. Prior work on social advertising is based on the classical framework of influence maximization. In this paper, we propose a model where advertisers compete in an auction mechanism for the influential users within a social network. The auction mechanism can dynamically determine payments for advertisers based on their reported values. The main problem is to find auctions which incentivize advertisers to truthfully reveal their values, and also respect each advertiser's budget constraint. To tackle this problem, we propose a new truthful auction mechanism called TSA. Compared with existing approaches on real and synthetic datasets, TSA performs significantly better in terms of generated revenue.
Hierarchical user profiling that aims to model users' real-time interests in different granularity is an essential issue for personalized recommendations in E-commerce. On one hand, items (i.e. products) are usually organized hierarchically in categories, and correspondingly users' interests are naturally hierarchical on different granularity of items and categories. On the other hand, multiple granularity oriented recommendations become very popular in E-commerce sites, which require hierarchical user profiling in different granularity as well. In this paper, we propose HUP, a Hierarchical User Profiling framework to solve the hierarchical user profiling problem in E-commerce recommender systems. In HUP, we provide a Pyramid Recurrent Neural Networks, equipped with Behavior-LSTM to formulate users' hierarchical real-time interests at multiple scales. Furthermore, instead of simply utilizing users' item-level behaviors (e.g., ratings or clicks) in conventional methods, HUP harvests the sequential information of users' temporal finely-granular interactions (micro-behaviors, e.g., clicks on components of items like pictures or comments, browses with navigation of the search engines or recommendations) for modeling. Extensive experiments on two real-world E-commerce datasets demonstrate the significant performance gains of the HUP against state-of-the-art methods for the hierarchical user profiling and recommendation problems. We release the codes and datasets at https://github.com/guyulongcs/WSDM2020_HUP.
The convenient access to observational data enables us to learn causal effects without randomized experiments. This research direction draws increasing attention in research areas such as economics, healthcare, and education. For example, we can study how a medicine (the treatment) causally affects the health condition (the outcome) of a patient using existing electronic health records. To validate causal effects learned from observational data, we have to control confounding bias -- the influence of variables which causally influence both the treatment and the outcome. Existing work along this line overwhelmingly relies on the unconfoundedness assumption that there do not exist unobserved confounders. However, this assumption is untestable and can even be untenable. In fact, an important fact ignored by the majority of previous work is that observational data can come with network information that can be utilized to infer hidden confounders. For example, in an observational study of the individual-level treatment effect of a medicine, instead of randomized experiments, the medicine is often assigned to each individual based on a series of factors. Some of the factors (e.g., socioeconomic status) can be challenging to measure and therefore become hidden confounders. Fortunately, the socioeconomic status of an individual can be reflected by whom she is connected in social networks. With this fact in mind, we aim to exploit the network information to recognize patterns of hidden confounders which would further allow us to learn valid individual causal effects from observational data. In this work, we propose a novel causal inference framework, the network deconfounder, which learns representations to unravel patterns of hidden confounders from the network information. Empirically, we perform extensive experiments to validate the effectiveness of the network deconfounder on various datasets.
Crowdsourcing is a popular technique to collect large amounts of human-generated labels, such as relevance judgments used to create information retrieval (IR) evaluation collections. Previous research has shown how collecting high quality labels from a crowdsourcing platform can be challenging. Existing quality assurance techniques focus on answer aggregation or on the use of gold questions where ground-truth data allows to check for the quality of the responses.
In this paper, we present qualitative and quantitative results, revealing how different crowd workers adopt different work strategies to complete relevance judgment tasks efficiently and their consequent impact on quality. We delve into the techniques and tools that highly experienced crowd workers use to be more efficient in completing crowdsourcing micro-tasks. To this end, we use both qualitative results from worker interviews and surveys, as well as the results of a data-driven study of behavioral log data (i.e., clicks, keystrokes and keyboard shortcuts) collected from crowd workers performing relevance judgment tasks. Our results highlight the presence of frequently used shortcut patterns that can speed-up task completion, thus increasing the hourly wage of efficient workers. We observe how crowd work experiences result in different types of working strategies, productivity levels, quality and diversity of the crowdsourced judgments.
User-generated item lists are popular on many platforms. Examples include video-based playlists on YouTube, image-based lists (or "boards") on Pinterest, book-based lists on Goodreads, and answer-based lists on question-answer forums like Zhihu. As users create these lists, a common challenge is in identifying what items to curate next. Some lists are organized around particular genres or topics, while others are seemingly incoherent, reflecting individual preferences for what items belong together. Furthermore, this heterogeneity in item consistency may vary from platform to platform, and from sub-community to sub-community. Hence, this paper proposes a generalizable approach for user-generated item list continuation. Complementary to methods that exploit specific content patterns (e.g., as in song-based playlists that rely on audio features), the proposed approach models the consistency of item lists based on human curation patterns, and so can be deployed across a wide range of varying item types (e.g., videos, images, books). A key contribution is in intelligently combining two preference models via a novel consistency-aware gating network -- a general user preference model that captures a user's overall interests, and a current preference priority model that captures a user's current (as of the most recent item) interests. In this way, the proposed consistency-aware recommender can dynamically adapt as user preferences evolve. Evaluation over four datasets (of songs, books, and answers) confirms these observations and demonstrates the effectiveness of the proposed model versus state-of-the-art alternatives. Further, all code and data are available at https://github.com/heyunh2015/ListContinuation_WSDM2020.
Word embeddings, trained through models like skip-gram, have shown to be prone to capturing the biases from the training corpus, e.g. gender bias. Such biases are unwanted as they spill in downstream tasks, thus, leading to discriminatory behavior.
In this work, we address the problem of prior sentiment associated with names in word embeddings where for a given name representation (e.g. "Smith"), a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram based word embedding approach that, for a given oracle sentiment classification model, will debias the name representations, such that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis show that our approach is able to maintain a high quality of embeddings and at the same time mitigate sentiment bias in name embeddings.
Clique counting is a fundamental task in network analysis, and even the simplest setting of $3$-cliques (triangles) has been the center of much recent research. Getting the count of k-cliques for larger k is algorithmically challenging, due to the exponential blowup in the search space of large cliques. But a number of recent applications (especially for community detection or clustering) use larger clique counts. Moreover, one often desires local counts, the number of k-cliques per vertex/edge. Our main contribution is Pivoter, an algorithm that exactly counts the number of k-cliques, for all values of k. It is surprisingly effective in practice, and is able to get clique counts of graphs that were beyond the reach of previous work. For example, Pivoter gets all clique counts in a social network with a 100M edges within two hours on a commodity machine. Previous parallel algorithms do not terminate in days. Pivoter can also feasibly get local per-vertex and per-edge k-clique counts (for all k) for many public data sets with tens of millions of edges. To the best of our knowledge, this is the first algorithm that achieves such results. The main insight is the construction of a Succinct Clique Tree (SCT) that stores a compressed unique representation of all cliques in an input graph. It is built using a technique called pivoting, a classic approach by Bron-Kerbosch to reduce the recursion tree of backtracking algorithms for maximal cliques. Remarkably, the SCT can be built without actually enumerating all cliques, and provides a succinct data structure from which exact clique statistics (k-clique counts, local counts) can be read off efficiently.
"How to learn image embeddings that capture fine-grained semantics based on the instance of an image?" "Is it possible for such embeddings to further understand image semantics closer to humans' perception?" In this paper, we present, Graph-Regularized Image Semantic Embedding (Graph-RISE), a web-scale neural graph learning framework deployed at Google, which allows us to train image embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semantic labels. The proposed Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including kNN search and triplet ranking: the accuracy is improved by approximately 2X on the ImageNet dataset and by more than 5X on the iNaturalist dataset. Qualitatively, image retrieval from one billion images based on the proposed Graph-RISE effectively captures semantics and, compared to the state-of-the-art, differentiates nuances at levels that are closer to human-perception.
Epidemic models and self-exciting processes are two types of models used to describe diffusion phenomena online and offline. These models were originally developed in different scientific communities, and their commonalities are under-explored. This work establishes, for the first time, a general connection between the two model classes via three new mathematical components. The first is a generalized version of stochastic Susceptible-Infected-Recovered (SIR) model with arbitrary recovery time distributions; the second is the relationship between the (latent and arbitrary) recovery time distribution, recovery hazard function, and the infection kernel of self-exciting processes; the third includes methods for simulating, fitting, evaluating and predicting the generalized process. On three large Twitter diffusion datasets, we conduct goodness-of-fit tests and holdout log-likelihood evaluation of self-exciting processes with three infection kernels --- exponential, power-law and Tsallis Q-exponential. We show that the modeling performance of the infection kernels varies with respect to the temporal structures of diffusions, and also with respect to user behavior, such as the likelihood of being bots. We further improve the prediction of popularity by combining two models that are identified as complementary by the goodness-of-fit tests.
Pattern counting in graphs is a fundamental primitive for many network analysis tasks, and there are several methods for scaling subgraph counting to large graphs. Many real-world networks have a notion of strength of connection between nodes, which is often modeled by a weighted graph, but existing scalable algorithms for pattern mining are designed for unweighted graphs. Here, we develop deterministic and random sampling algorithms that enable the fast discovery of the 3-cliques (triangles) of largest weight, as measured by the generalized mean of the triangle's edge weights. For example, one of our proposed algorithms can find the top-1000 weighted triangles of a weighted graph with billions of edges in thirty seconds on a commodity server, which is orders of magnitude faster than existing "fast" enumeration schemes. Our methods open the door towards scalable pattern mining in weighted graphs.
Recommender systems are embracing conversational technologies to obtain user preferences dynamically, and to overcome inherent limitations of their static models. A successful Conversational Recommender System (CRS) requires proper handling of interactions between conversation and recommendation. We argue that three fundamental problems need to be solved: 1) what questions to ask regarding item attributes, 2) when to recommend items, and 3) how to adapt to the users' online feedback. To the best of our knowledge, there lacks a unified framework that addresses these problems. In this work, we fill this missing interaction framework gap by proposing a new CRS framework named Estimation"Action" Reflection, or EAR, which consists of three stages to better converse with users. (1) Estimation, which builds predictive models to estimate user preference on both items and item attributes; (2) Action, which learns a dialogue policy to determine whether to ask attributes or recommend items, based on Estimation stage and conversation history; and (3) Reflection, which updates the recommender model when a user rejects the recommendations made by the Action stage. We present two conversation scenarios on binary and enumerated questions, and conduct extensive experiments on two datasets from Yelp and LastFM, for each scenario, respectively. Our experiments demonstrate significant improvements over the state-of-the-art method CRM , corresponding to fewer conversation turns and a higher level of recommendation hits.
Click-through rate (CTR) prediction is a critical task in online advertising and marketing. For this problem, existing approaches, with shallow or deep architectures, have three major drawbacks. First, they typically lack persuasive rationales to explain the outcomes of the models. Unexplainable predictions and recommendations may be difficult to validate and thus unreliable and untrustworthy. In many applications, inappropriate suggestions may even bring severe consequences. Second, existing approaches have poor efficiency in analyzing high-order feature interactions. Third, the polysemy of feature interactions in different semantic subspaces is largely ignored. In this paper, we propose InterHAt that employs a Transformer with multi-head self-attention for feature learning. On top of that, hierarchical attention layers are utilized for predicting CTR while simultaneously providing interpretable insights of the prediction results. InterHAt captures high-order feature interactions by an efficient attentional aggregation strategy with low computational complexity. Extensive experiments on four public real datasets and one synthetic dataset demonstrate the effectiveness and efficiency of InterHAt.
Sequential recommender systems seek to exploit the order of users' interactions, in order to predict their next action based on the context of what they have done recently. Traditionally, Markov Chains(MCs), and more recently Recurrent Neural Networks (RNNs) and Self Attention (SA) have proliferated due to their ability to capture the dynamics of sequential patterns. However a simplifying assumption made by most of these models is to regard interaction histories as ordered sequences, without regard for the time intervals between each interaction (i.e., they model the time-order but not the actual timestamp). In this paper, we seek to explicitly model the timestamps of interactions within a sequential modeling framework to explore the influence of different time intervals on next item prediction. We propose TiSASRec (Time Interval aware Self-attention based sequential recommendation), which models both the absolute positions of items as well as the time intervals between them in a sequence. Extensive empirical studies show the features of TiSASRec under different settings and compare the performance of self-attention with different positional encodings. Furthermore, experimental results show that our method outperforms various state-of-the-art sequential models on both sparse and dense datasets and different evaluation metrics.
Cross domain recommender systems have been increasingly valuable for helping consumers identify the most satisfying items from different categories. However, previously proposed cross-domain models did not take into account bidirectional latent relations between users and items. In addition, they do not explicitly model information of user and item features, while utilizing only user ratings information for recommendations. To address these concerns, in this paper we propose a novel approach to cross-domain recommendations based on the mechanism of dual learning that transfers information between two related domains in an iterative manner until the learning process stabilizes. We develop a novel latent orthogonal mapping to extract user preferences over multiple domains while preserving relations between users across different latent spaces. Combining with autoencoder approach to extract the latent essence of feature information, we propose Deep Dual Transfer Cross Domain Recommendation (DDTCDR) model to provide recommendations in respective domains. We test the proposed method on a large dataset containing three domains of movies, book and music items and demonstrate that it consistently and significantly outperforms several state-of-the-art baselines and also classical transfer learning approaches.
Automatic speaker recognition (ASR) is a stepping-stone technology towards semantic multimedia understanding and benefits versatile downstream applications. In recent years, neural network-based ASR methods have demonstrated remarkable power to achieve excellent recognition performance with sufficient training data. However, it is impractical to collect sufficient training data for every user, especially for fresh users. Therefore, a large portion of users usually has a very limited number of training instances. As a consequence, the lack of training data prevents ASR systems from accurately learning users acoustic biometrics, jeopardizes the downstream applications, and eventually impairs user experience.
In this work, we propose an adversarial few-shot learning-based speaker identification framework (AFEASI) to develop robust speaker identification models with only a limited number of training instances. We first employ metric learning-based few-shot learning to learn speaker acoustic representations, where the limited instances are comprehensively utilized to improve the identification performance. In addition, adversarial learning is applied to further enhance the generalization and robustness for speaker identification with adversarial examples. Experiments conducted on a publicly available large-scale dataset demonstrate that \model significantly outperforms eleven baseline methods. An in-depth analysis further indicates both effectiveness and robustness of the proposed method.
Recommendation systems tend to suffer severely from the sparse training data. A large portion of users and items usually have a very limited number of training instances. The data sparsity issue prevents us from accurately understanding users' preferences and items' characteristics and jeopardize the recommendation performance eventually. In addition, models, trained with sparse data, lack abundant training supports and tend to be vulnerable to adversarial perturbations, which implies possibly large errors in generalization.
In this work, we investigate the recommendation task in the context of prospective customer recommendation in location based social networks. To comprehensively utilize the training data, we explicitly learn to compare users' historical check-in businesses utilizing self-attention mechanisms. To enhance the robustness of a recommender system and improve its generalization performance, we perform adversarial training. Adversarial perturbations are dynamically constructed during training and models are trained to be tolerant of such nuisance perturbations. In a nutshell, we introduce a Self-Attentive prospective Customer RecommendAtion framework, SACRA, which learns to recommend by making comparisons among users' historical check-ins with adversarial training. To evaluate the proposed model, we conduct a series of experiments to extensively compare with 12 existing methods using two real-world datasets. The results demonstrate that SACRA significantly outperforms all baselines.
Accelerating Deep Convolutional Neural Networks (CNNs) has recently received ever-increasing research focus. Among various approaches proposed in the literature, filter pruning has been regarded as a promising solution, which is due to its advantage in significant speedup and memory reduction of both network model and intermediate feature maps. Previous works utilized "smaller-norm-less-important" criterion to prune filters with smaller ࡁp-norm values by pruning and retraining alternately. This trends to narrow the model capacity for the following reasons: (1) Violent pruning. Previous works adopt a violent strategy in which all filters are simultaneously pruned, which leaving the room to retain model accuracy limited. (2) Filter degradation. Previous works simply set the pruned filter to 0 and retrained it alterately, which easily led to the loss of learning ability of filters. To solve this problem, we propose a novel filter pruning method, namely Incremental Filter Pruning via Random Walk (IFPRW). IFPRW solves the problem of violent pruning by incremental method and Filter degradation by means of random walk. When applied to two image classification benchmarks, the usefulness and strength of IFPRW is validated. Notably, on CIFAR-10, IFPRW reduces more than 46% FLOPs on ResNet-110 with even 0.28% relative accuracy improvement. Moreover, on ILSVRC-2012, IFPRW reduces more than 54% FLOPs on ResNet-101 with only 0.7% top-5 accurcacy drop. which proving that IFPRW outperforms the state-of-the-art filter pruning methods.
Network embedding has been intensively studied in the literature and widely used in various applications, such as link prediction and node classification. While previous work focus on the design of new algorithms or are tailored for various problem settings, the discussion of initialization strategies in the learning process is often missed. In this work, we address this important issue of initialization for network embedding that could dramatically improve the performance of the algorithms on both effectiveness and efficiency. Specifically, we first exploit the graph partition technique that divides the graph into several disjoint subsets, and then construct an abstract graph based on the partitions. We obtain the initialization of the embedding for each node in the graph by computing the network embedding on the abstract graph, which is much smaller than the input graph, and then propagating the embedding among the nodes in the input graph. With extensive experiments on various datasets, we demonstrate that our initialization technique significantly improves the performance of the state-of-the-art algorithms on the evaluations of link prediction and node classification by up to 7.76% and 8.74% respectively. Besides, we show that the technique of initialization reduces the running time of the state-of-the-arts by at least 20%.
Influence maximization in social networks is the problem of finding a set of seed nodes in the network that maximizes the spread of influence under certain information prorogation model, which has become an important topic in social network analysis. In this paper, we show that conventional influence maximization algorithms cause uneven spread of influence among different attribute groups in social networks, which could lead to severer bias in public opinion dissemination and viral marketing. We formulate the balanced influence maximization problem to address the trade-off between influence maximization and attribute balance, and propose a sampling based solution to solve the problem efficiently. To avoid full network exploration, we first propose an attribute-based (AB) sampling method to sample attributed social networks with respect to preserving network structural properties and attribute proportion among user groups. Then we propose an attributed-based reverse influence sampling (AB-RIS) algorithm to select seed nodes from the sampled graph. The proposed AB-RIS algorithm runs efficiently with guaranteed accuracy, and achieves the trade-off between influence maximization and attribute balance. Extensive experiments based on four real-world social network datasets show that AB-RIS significantly outperforms the state-of-the-art approaches in balanced influence maximization.
The research of reinforcement learning (RL) based recommendation method has become a hot topic in recommendation community, due to the recent advance in interactive recommender systems. The existing RL recommendation approaches can be summarized into a unified framework with three components, namely embedding component (EC), state representation component (SRC) and policy component (PC). We find that EC cannot be nicely trained with the other two components simultaneously. Previous studies bypass the obstacle through a pre-training and fixing strategy, which makes their approaches unlike a real end-to-end fashion. More importantly, such pre-trained and fixed EC suffers from two inherent drawbacks: (1) Pre-trained and fixed embeddings are unable to model evolving preference of users and item correlations in the dynamic environment; (2) Pre-training is inconvenient in the industrial applications. To address the problem, in this paper, we propose an End-to-end Deep Reinforcement learning based Recommendation framework (EDRR). In this framework, a supervised learning signal is carefully designed for smoothing the update gradients to EC, and three incorporating ways are introduced and compared. To the best of our knowledge, we are the first to address the training compatibility between the three components in RL based recommendations. Extensive experiments are conducted on three real-world datasets, and the results demonstrate the proposed EDRR effectively achieves the end-to-end training purpose for both policy-based and value-based RL models, and delivers better performance than state-of-the-art methods.
Multi-graph clustering aims to improve clustering accuracy by leveraging information from different domains, which has been shown to be extremely effective for achieving better clustering results than single graph based clustering algorithms. Despite the previous success, existing multi-graph clustering methods mostly use shallow models, which are incapable to capture the highly non-linear structures and the complex cluster associations in multi-graph, thus result in sub-optimal results. Inspired by the powerful representation learning capability of neural networks, in this paper, we propose an end-to-end deep learning model to simultaneously infer cluster assignments and cluster associations in multi-graph. Specifically, we use autoencoding networks to learn node embeddings. Meanwhile, we propose a minimum-entropy based clustering strategy to cluster nodes in the embedding space for each graph. We introduce two regularizers to leverage both within-graph and cross-graph dependencies. An attentive mechanism is further developed to learn cross-graph cluster associations. Through extensive experiments on a variety of datasets, we observe that our method outperforms state-of-the-art baselines by a large margin.
Coreference resolution aims at recognizing different forms in a document which refer to the same entity in the real world. Although many models have been proposed and achieved success, there still exist some challenges. Recent models that use recurrent neural networks to obtain mention representations ignore dependencies between spans and their proceeding distant spans, which will lead to predicted clusters that are locally consistent but globally inconsistent. In addition, these models are trained only by maximizing the marginal likelihood of gold antecedent spans from coreference clusters, which will make some gold mentions undetectable and cause unsatisfactory coreference results. To address these challenges, we propose a neural coreference resolution model. It employs mutual attention to take into account the dependencies between spans and their proceeding spans directly (use attention mechanism to capture global information between spans and their proceeding spans). And our model is trained by jointly optimizing mention clustering and imbalanced mention detection, which enables it to detect more gold mentions in a document to make more accurate coreference decisions. Experimental results on the CoNLL-2012 English dataset show that our model can detect the most gold mentions and achieve the state-of-the-art coreference performance compared with baselines.
Automatically labeling multiple styles for every song is a comprehensive application in all kinds of music websites. Recently, some researches explore review-driven multi-label music style classification and exploit style correlations for this task. However, their methods focus on mining the statistical relations between different music styles and only consider shallow style relations. Moreover, these statistical relations suffer from the underfitting problem because some music styles have little training data. To tackle these problems, we propose a novel knowledge relations integrated framework (KRF) to capture the complete style correlations, which jointly exploits the inherent relations between music styles according to external knowledge and their statistical relations. Based on the two types of relations, we use graph convolutional network to learn the deep correlations between styles automatically. Experimental results show that our framework significantly outperforms the state-of-the-art methods. Further studies demonstrate that our framework can effectively alleviate the underfitting problem and learn meaningful style correlations.
Entity alignment to find equivalent entities in cross-lingual Knowledge Graphs (KGs) plays a vital role in automatically integrating multiple KGs. Existing translation-based entity alignment methods jointly model the cross-lingual knowledge and monolingual knowledge into one unified optimization problem. On the other hand, the Graph Neural Network (GNN) based methods either ignore the node differentiations, or represent relation through entity or triple instances. They all fail to model the meta semantics embedded in relation nor complex relations such as n-to-n and multi-graphs. To tackle these challenges, we propose a novel Meta Relation Aware Entity Alignment (MRAEA) to directly model cross-lingual entity embeddings by attending over the node's incoming and outgoing neighbors and its connected relations' meta semantics. In addition, we also propose a simple and effective bi-directional iterative strategy to add new aligned seeds during training. Our experiments on all three benchmark entity alignment datasets show that our approach consistently outperforms the state-of-the-art methods, exceeding by 15%-58% on Hit@1. Through an extensive ablation study, we validate that the proposed meta relation aware representations, relation aware self-attention and bi-directional iterative strategy of new seed selection all make contributions to significant performance improvement. The code is available at https://github.com/MaoXinn/MRAEA.
In personal email search, user queries often impose different requirements on different aspects of the retrieved emails. For example, the query "my recent flight to the US" requires emails to be ranked based on both textual contents and recency of the email documents, while other queries such as "medical history" do not impose any constraints on the recency of the email. Recent deep learning-to-rank models for personal email search often directly concatenate dense numerical features (e.g., document age) with embedded sparse features (e.g., n-gram embeddings). In this paper, we first show with a set of experiments on synthetic datasets that direct concatenation of dense and sparse features does not lead to the optimal search performance of deep neural ranking models. To effectively incorporate both sparse and dense email features into personal email search ranking, we propose a novel neural model, SepAttn. SepAttn first builds two separate neural models to learn from sparse and dense features respectively, and then applies an attention mechanism at the prediction level to derive the final prediction from these two models. We conduct a comprehensive set of experiments on a large-scale email search dataset, and demonstrate that our SepAttn model consistently improves the search quality over the baseline models.
Predicting human mobility is an important trajectory mining task for various applications, ranging from smart city planning to personalized recommendation system. While most of previous works adopt GPS tracking data to model human mobility, the recent fast-growing geo-tagged social media (GTSM) data brings new opportunities to this task. However, predicting human mobility on GTSM data is not trivial because of three challenges: 1) extreme data sparsity; 2) high order sequential patterns of human mobility and 3) evolving preference of users for tagging.
In this paper, we propose ACN, an attentive convolutional network model for predicting human mobility from sparse and complex GTSM data. In ACN, we firstly design a multi-dimension embedding layer which jointly embeds key features (i.e., spatial, temporal and user features) that govern human mobility. Then, we regard the embedded trajectory as an "image" and learn short-term sequential patterns as local features of the image using convolution filters. Instead of directly using convention filters, we design hybrid dilated and separable convolution filters to effectively capture high order sequential patterns from lengthy trajectory. In addition, we propose an attention mechanism which learns the user long-term preference to augment convolutional network for mobility prediction. We conduct extensive experiments on three publicly available GTSM datasets to evaluate the effectiveness of our model. The results demonstrate that ACN consistently outperforms existing state-of-art mobility prediction approaches on a variety of common evaluation metrics.
Subgraph counting is a fundamental task in network analysis. Typically, algorithmic work is on total counting, where we wish to count the total frequency of a (small) pattern subgraph in a large input data set. But many applications require local counts (also called vertex orbit counts) wherein, for every vertex v of the input graph, one needs the count of the pattern subgraph involving v. This provides a rich set of vertex features that can be used in machine learning tasks, especially classification and clustering. But getting local counts is extremely challenging. Even the easier problem of getting total counts has received much research attention. Local counts require algorithms that get much finer grained information, and the sheer output size makes it difficult to design scalable algorithms.
We present EVOKE, a scalable algorithm that can determine vertex orbits counts for all 5-vertex pattern subgraphs. In other words, EVOKE exactly determines, for every vertex v of the input graph and every 5-vertex subgraph H, the number of copies of H that v participates in. EVOKE can process graphs with tens of millions of edges, within an hour on a commodity machine. EVOKE is typically hundreds of times faster than previous state of the art algorithms, and gets results on datasets beyond the reach of previous methods.
Theoretically, we generalize a recent "graph cutting" framework to get vertex orbit counts. This framework generate a collection of polynomial equations relating vertex orbit counts of larger subgraphs to those of smaller subgraphs. EVOKE carefully exploits the structure among these equations to rapidly count. We prove and empirically validate that EVOKE only has a small constant factor overhead over the best (total) 5-vertex subgraph counter.
This paper introduces a new learning paradigm called eXtreme Regression (XR) whose objective is to accurately predict the numerical degrees of relevance of an extremely large number of labels to a data point. XR can provide elegant solutions to many large-scale ranking and recommendation applications including Dynamic Search Advertising (DSA). XR can learn more accurate models than the recently popular extreme classifiers which incorrectly assume strictly binary-valued label relevances. Traditional regression metrics which sum the errors over all the labels are unsuitable for XR problems since they could give extremely loose bounds for the label ranking quality. Also, the existing regression algorithms won't efficiently scale to millions of labels. This paper addresses these limitations through: (1) new evaluation metrics for XR which sum only the k largest regression errors; (2) a new algorithm called XReg which decomposes XR task into a hierarchy of much smaller regression problems thus leading to highly efficient training and prediction. This paper also introduces a (3) new labelwise prediction algorithm in XReg useful for DSA and other recommendation tasks.
Experiments on benchmark datasets demonstrated that XReg can outperform the state-of-the-art extreme classifiers as well as large-scale regressors and rankers by up to 50% reduction in the new XR error metric, and up to 2% and 2.4% improvements in terms of the propensity-scored precision metric used in extreme classification and the click-through rate metric used in DSA respectively. Deployment of XReg on DSA in Bing resulted in a relative gain of 58% in revenue and 27% in query coverage. XReg's source code can be downloaded from http://manikvarma.org/code/Xreg/download.html.
Sequential recommendation task aims to predict user preference over items in the future given user historical behaviors. The order of user behaviors implies that there are resourceful sequential patterns embedded in the behavior history which reveal the underlying dynamics of user interests. Various sequential recommendation methods are proposed to model the dynamic user behaviors. However, most of the models only consider the user's own behaviors and dynamics, while ignoring the collaborative relations among users and items, i.e., similar tastes of users or analogous properties of items. Without modeling collaborative relations, those methods suffer from the lack of recommendation diversity and thus may have worse performance. Worse still, most existing methods only consider the user-side sequence and ignore the temporal dynamics on the item side. To tackle the problems of the current sequential recommendation models, we propose Sequential Collaborative Recommender (SCoRe) which effectively mines high-order collaborative information using cross-neighbor relation modeling and, additionally utilizes both user-side and item-side historical sequences to better capture user and item dynamics. Experiments on three real-world yet large-scale datasets demonstrate the superiority of the proposed model over strong baselines.
Knowledge Graph Question Answering aims to automatically answer natural language questions via well-structured relation information between entities stored in knowledge graphs. When faced with a multi-relation question, existing embedding-based approaches take the whole topic-entity-centric subgraph into account, resulting in high time complexity. Meanwhile, due to the high cost for data annotations, it is impractical to exactly show how to answer a complex question step by step, and only the final answer is labeled, as weak supervision. To address these challenges, this paper proposes a neural method based on reinforcement learning, namely Stepwise Reasoning Network, which formulates multi-relation question answering as a sequential decision problem. The proposed model performs effective path search over the knowledge graph to obtain the answer, and leverages beam search to reduce the number of candidates significantly. Meanwhile, based on the attention mechanism and neural networks, the policy network can enhance the unique impact of different parts of a given question over triple selection. Moreover, to alleviate the delayed and sparse reward problem caused by weak supervision, we propose a potential-based reward shaping strategy, which can accelerate the convergence of the training algorithm and help the model perform better. Extensive experiments conducted over three benchmark datasets well demonstrate the effectiveness of the proposed model, which outperforms the state-of-the-art approaches.
The success of many graph-based machine learning tasks highly depends on an appropriate representation learned from the graph data. Most work has focused on learning node embeddings that preserve proximity as opposed to structural role-based embeddings that preserve the structural similarity among nodes. These methods fail to capture higher-order structural dependencies and connectivity patterns that are crucial for structural role-based applications such as visitor stitching from web logs. In this work, we formulate higher-order network representation learning and describe a general framework called HONE for learning such structural node embeddings from networks via the subgraph patterns (network motifs, graphlet orbits/positions) in a nodes neighborhood. A general diffusion mechanism is introduced in HONE along with a space-efficient approach that avoids explicit construction of the k-step motif-based matrices using a k-step linear operator. Furthermore, HONE is shown to be fast and efficient with a worst-case time complexity that is nearly-linear in the number of edges. The experiments demonstrate the effectiveness of HONE for a number of important tasks including link prediction and visitor stitching from large web log data.
Individuals' personal information collections (their emails, files, appointments, web searches, contacts, etc) offer a wealth of insights into the organization and structure of their everyday lives. In this paper we address the task of learning representations of personal information items to capture individuals' ongoing activities, such as projects and tasks: Such representations can be used in activity-centric applications like personal assistants, email clients, and productivity tools to help people better manage their data and time. We propose a graph-based approach that leverages the inherent interconnected structure of personal information collections, and derive efficient, exact techniques to incrementally update representations as new data arrive. We demonstrate the strengths of our graph-based representations against competitive baselines in a novel intrinsic rating task and an extrinsic recommendation task.
Recommender systems widely use implicit feedback such as click data because of its general availability. Although the presence of clicks signals the users' preference to some extent, the lack of such clicks does not necessarily indicate a negative response from the users, as it is possible that the users were not exposed to the items (positive-unlabeled problem). This leads to a difficulty in predicting the users' preferences from implicit feedback. Previous studies addressed the positive-unlabeled problem by uniformly upweighting the loss for the positive feedback data or estimating the confidence of each data having relevance information via the EM-algorithm. However, these methods failed to address the missing-not-at-random problem in which popular or frequently recommended items are more likely to be clicked than other items even if a user does not have a considerable interest in them. To overcome these limitations, we first define an ideal loss function to be optimized to realize recommendations that maximize the relevance and propose an unbiased estimator for the ideal loss. Subsequently, we analyze the variance of the proposed unbiased estimator and further propose a clipped estimator that includes the unbiased estimator as a special case. We demonstrate that the clipped estimator is expected to improve the performance of the recommender system, by considering the bias-variance trade-off. We conduct semi-synthetic and real-world experiments and demonstrate that the proposed method largely outperforms the baselines. In particular, the proposed method works better for less popular items that are less frequently observed in the training data. The findings indicate that the proposed method can better achieve the objective of recommending items with the highest relevance.
Recent years have witnessed tremendous interest in understanding and predicting information spread on social media platforms such as Twitter, Facebook, etc. Existing diffusion prediction methods primarily exploit the sequential order of influenced users by projecting diffusion cascades onto their local social neighborhoods. However, this fails to capture global social structures that do not explicitly manifest in any of the cascades, resulting in poor performance for inactive users with limited historical activities.
In this paper, we present a novel variational autoencoder framework (Inf-VAE) to jointly embed homophily and influence through proximity-preserving social and position-encoded temporal latent variables. To model social homophily, Inf-VAE utilizes powerful graph neural network architectures to learn social variables that selectively exploit the social connections of users. Given a sequence of seed user activations, Inf-VAE uses a novel expressive co-attentive fusion network that jointly attends over their social and temporal variables to predict the set of all influenced users. Our experimental results on multiple real-world social network datasets, including Digg, Weibo, and Stack-Exchanges demonstrate significant gains (22% MAP@10) for Inf-VAE over state-of-the-art diffusion prediction models; we achieve massive gains for users with sparse activities, and users who lack direct social neighbors in seed sets.
Learning node representations in graphs is important for many applications such as link prediction, node classification, and community detection. Existing graph representation learning methods primarily target static graphs while many real-world graphs evolve over time. Complex time-varying graph structures make it challenging to learn informative node representations over time.
We present Dynamic Self-Attention Network (DySAT), a novel neural architecture that learns node representations to capture dynamic graph structural evolution. Specifically, DySAT computes node representations through joint self-attention along the two dimensions of structural neighborhood and temporal dynamics. Compared with state-of-the-art recurrent methods modeling graph evolution, dynamic self-attention is efficient, while achieving consistently superior performance. We conduct link prediction experiments on two graph types: communication networks and bipartite rating networks. Experimental results demonstrate significant performance gains for DySAT over several state-of-the-art graph embedding baselines, in both single and multi-step link prediction tasks. Furthermore, our ablation study validates the effectiveness of jointly modeling structural and temporal self-attention.
Recent research has shown the advantages of using autoencoders based on deep neural networks for collaborative filtering. In particular, the recently proposed Mult-VAE model, which used the multinomial likelihood variational autoencoders, has shown excellent results for top-N recommendations. In this work, we propose the Recommender VAE (RecVAE) model that originates from our research on regularization techniques for variational autoencoders. RecVAE introduces several novel ideas to improve Mult-VAE, including a novel composite prior distribution for the latent codes, a new approach to setting the beta hyperparameter for the beta-VAE framework, and a new approach to training based on alternating updates. In experimental evaluation, we show that RecVAE significantly outperforms previously proposed autoencoder-based models, including Mult-VAE and RaCT, across classical collaborative filtering datasets, and present a detailed ablation study to assess our new developments. Code and models are available at https://github.com/ilya-shenbin/RecVAE.
Internet companies use crowdsourcing to collect large amounts of data needed for creating products based on machine learning techniques. A significant source of such labels for OCR data sets is (re)CAPTCHA, which distinguishes humans from automated bots by asking them to recognize text and, at the same time, receives new labeled data in this way. An important component of such approach to data collection is the reduction of noisy labels produced by bots and non-qualified users.
In this paper, we address the problem of labeling text images via CAPTCHA, where user identification is generally impossible. We propose a new algorithm to aggregate multiple guesses collected through CAPTCHA. We employ incremental relabeling to minimize the number of guesses needed for obtaining the recognized text of a good accuracy. The aggregation model and the stopping rule for our incremental relabeling are based on novel machine learning techniques and use meta features of CAPTCHA tasks and accumulated guesses. Our experiments show that our approach can provide a large amount of accurately recognized texts using a minimal number of user guesses. Finally, we report the great improvements of an optical character recognition model after implementing our approach in Yandex.
Verizon Media native advertising (also known as Yahoo Gemini native) serves billions of ad impressions daily, reaching several hundreds of millions USD in revenue yearly. Although we strive to provide the best experience for our users, there will always be some users that dislike our ads in certain cases. To address these situations Gemini native platform provides an ad close mechanism that enables users to close ads that they dislike and also to provide a reasoning for their action. Surprisingly, users do care about their ad experience and their engagement with the ad close mechanism is quite significant. While the ad close rate (ACR) is lower than the click through rate (CTR), they are of the same order of magnitude, especially on Yahoo mail properties. Since ad close events indicate bad user experience caused mostly by poor ad quality, we would like to exploit the ad close signals to improve user experience and reduce the number of ad close events while maintaining a predefined total revenue loss.
In this work we present our ad close mitigation (ACM) solution that penalizes ads with high closing likelihood, in our auctions. In particular, we use the ad close signal and other available features to predict the probability of an ad close event, and calculate the expected loss due to such event for using the true expected revenue in the auction. We show that this approach fundamentally changes the generalized second price (GSP) auction and provides incentive for advertisers to improve their ads' quality. Our solution was tested in both offline and large scale online settings, serving real Gemini native traffic. Results of the online experiment show that we are able to reduce the number of ad close events by more than 20%, while decreasing the revenue in less than 0.4%. In addition, we present a large scale analysis of the ad close signal that supports various design decisions and sheds light on ways the ad close mechanism affects different crowds.
The Sparse Linear Method (SLIM) is a well-established approach for top-N recommendations. This article proposes several improvements that are enabled by the Alternating Directions Method of Multipliers (ADMM), a well-known optimization method with many application areas. First, we show that optimizing the original SLIM-objective by ADMM results in an approach where the training time is independent of the number of users in the training data, and hence trivially scales to large numbers of users. Second, the flexibility of ADMM allows us to switch on and off the various constraints and regularization terms in the original SLIM-objective, in order to empirically assess their contributions to ranking accuracy on given data. Third, we also propose two extensions to the original SLIM training-objective in order to improve recommendation accuracy further without increasing the computational cost. In our experiments on three well-known data-sets, we first compare to the original SLIM-implementation and find that not only ADMM reduces training time considerably, but also achieves an improvement in recommendation accuracy due to better optimization. We then compare to various state-of-the-art approaches and observe up to 25% improvement in recommendation accuracy in our experiments. Finally, we evaluate the importance of sparsity and the non-negativity constraint in the original SLIM-objective with sub-sampling experiments that simulate scenarios of cold-starting and large catalog sizes compared to relatively small user base, which often occur in practice.
Reading comprehension (RC) aims to locate a text span from a context passage to answer the given question. Despite the effectiveness of modern neural RC models, most existing work relies on maximum likelihood estimation (MLE) and ignores the structure of the output space. That is during training, one treats all the text spans do not match the ground truth as equally poor, leading to overconfident predictions on ground truth labels and reduced generalization ability in test. One way to bridge the gap between training and test is to take into account the task reward of alternative outputs using the reinforcement learning (RL) algorithms, which is often deficient in optimization as compared with MLE. In this paper, we propose a new learning criterion for the RC task which combines the merits of both MLE and RL-based methods. Specifically, we show that we are able to derive the distribution of the outputs, i.e., label distribution, using their corresponding task rewards based on the decomposition property of the RC problem. We then optimize the RC model by directly learning towards the auxiliary label distribution, instead of the ground truth label, using the MLE framework. In this way, we can make use of the structure of the output space for better generalization (as RL) via efficient optimization (as MLE). We name our approach as Label Distribution augmented MLE (LD-MLE), which is a general learning criterion that could be adopted by almost all the existing RC models. Experiments on three representative benchmark datasets demonstrate that RC models learned with the LD-MLE criterion can achieve consistently improved results over those based on the traditional MLE and RL-based criteria.
We incorporate the realistic scenario of spontaneous user adoption into influence propagation (also refer to as self-activation) and propose the self-activation independent cascade (SAIC) model: nodes may be self activated besides being selected as seeds, and influence propagates from both selected seeds and self activated nodes. Self activation occurs in many real world situations; for example, people naturally share product recommendations with their friends, even without marketing intervention. Under the SAIC model, we study three influence maximization problems: (a) boosted influence maximization (BIM) aims to maximize the total influence spread from both self-activated nodes and k selected seeds; (b) preemptive influence maximization (PIM) aims to find k nodes that, if self-activated, can reach the most number of nodes before other self-activated nodes; and (c) boosted preemptive influence maximization (BPIM) aims to select k seed that are guaranteed to be activated and can reach the most number of nodes before other self-activated nodes. We propose scalable algorithms for all three problems and prove that they achieve $1-1/e-\varepsilon$ approximation for BIM and BPIM and $1-\varepsilon$ for PIM, for any $\varepsilon > 0$. Through extensive tests on real-world graphs, we demonstrate that our algorithms outperform the baseline algorithms significantly for the PIM problem in solution quality, and also outperform the baselines for BIM and BPIM when self-activation behaviors are nonuniform across nodes.
Recommending new items in real-world e-commerce portals is a challenging problem as the cold start phenomenon, i.e., lacks of user-item interactions. To address this problem, we propose a novel recommendation model, i.e., adversarial neural network with multiple generators, to generate users from multiple perspectives of items' attributes. Namely, the generated users are represented by attribute-level features. As both users and items are attribute-level representations, we can implicitly obtain user-item attribute-level interaction information. In light of this, the new item can be recommended to users based on attribute-level similarity. Extensive experimental results on two item cold-start scenarios, movie and goods recommendation, verify the effectiveness of our proposed model as compared to state-of-the-art baselines.
Recently, plenty of neural network based recommendation models have demonstrated their strength in modeling complicated relationships between heterogeneous objects (i.e., users and items). However, the applications of these fine trained recommendation models are limited to the off-line manner or the re-ranking procedure (on a pre-filtered small subset of items), due to their time-consuming computations. Fast item ranking under learned neural network based ranking measures is largely still an open question.
In this paper, we formulate ranking under neural network based measures as a generic ranking task, Optimal Binary Function Search (OBFS), which does not make strong assumptions for the ranking measures. We first analyze limitations of existing fast ranking methods (e.g., ANN search) and explain why they are not applicable for OBFS. Further, we propose a flexible graph-based solution for it, Binary Function Search on Graph (BFSG). It can achieve approximate optimal efficiently, with accessible conditions. Experiments demonstrate effectiveness and efficiency of the proposed method, in fast item ranking under typical neural network based measures.
Graph neural networks (GNNs) are widely used in many applications. However, their robustness against adversarial attacks is criticized. Prior studies show that using unnoticeable modifications on graph topology or nodal features can significantly reduce the performances of GNNs. It is very challenging to design robust graph neural networks against poisoning attack and several efforts have been taken. Existing work aims at reducing the negative impact from adversarial edges only with the poisoned graph, which is sub-optimal since they fail to discriminate adversarial edges from normal ones. On the other hand, clean graphs from similar domains as the target poisoned graph are usually available in the real world. By perturbing these clean graphs, we create supervised knowledge to train the ability to detect adversarial edges so that the robustness of GNNs is elevated. However, such potential for clean graphs is neglected by existing work. To this end, we investigate a novel problem of improving the robustness of GNNs against poisoning attacks by exploring clean graphs. Specifically, we propose PA-GNN, which relies on a penalized aggregation mechanism that directly restrict the negative impact of adversarial edges by assigning them lower attention coefficients. To optimize PA-GNN for a poisoned graph, we design a meta-optimization algorithm that trains PA-GNN to penalize perturbations using clean graphs and their adversarial counterparts, and transfers such ability to improve the robustness of PA-GNN on the poisoned graph. Experimental results on four real-world datasets demonstrate the robustness of PA-GNN against poisoning attacks on graphs.
This paper investigates the notion of learning user and item representations in non-Euclidean space. Specifically, we study the connection between metric learning in hyperbolic space and collaborative filtering by exploring Mobius gyrovector spaces where the formalism of the spaces could be utilized to generalize the most common Euclidean vector operations. Overall, this work aims to bridge the gap between Euclidean and hyperbolic geometry in recommender systems through metric learning approach. We propose HyperML (Hyperbolic Metric Learning), a conceptually simple but highly effective model for boosting the performance. Via a series of extensive experiments, we show that our proposed HyperML not only outperforms their Euclidean counterparts, but also achieves state-of-the-art performance on multiple benchmark datasets, demonstrating the effectiveness of personalized recommendation in hyperbolic geometry.
Modern collaborative filtering algorithms seek to provide personalized product recommendations by uncovering patterns in consumer-product interactions. However, these interactions can be biased by how the product is marketed, for example due to the selection of a particular human model in a product image. These correlations may result in the underrepresentation of particular niche markets in the interaction data; for example, a female user who would potentially like motorcycle products may be less likely to interact with them if they are promoted using stereotypically 'male' images.
In this paper, we first investigate this correlation between users' interaction feedback and products' marketing images on two real-world e-commerce datasets. We further examine the response of several standard collaborative filtering algorithms to the distribution of consumer-product market segments in the input interaction data, revealing that marketing strategy can be a source of bias for modern recommender systems. In order to protect recommendation performance on underrepresented market segments, we develop a framework to address this potential marketing bias. Quantitative results demonstrate that the proposed approach significantly improves the recommendation fairness across different market segments, with a negligible loss (or better) recommendation accuracy.
We propose a personalized user recommendation framework for content curation platforms that models preferences for both users and the items they engage with simultaneously. In this way, user preferences for specific item types (e.g., fantasy novels) can be balanced with user specialties (e.g., reviewing novels with strong female protagonists). In particular, the proposed model has three unique characteristics: (i) it simultaneously learns both user-item and user-user preferences through a multi-aspect autoencoder model; (ii) it fuses the latent representations of user preferences on users and items to construct shared factors through an adversarial framework; and (iii) it incorporates an attention layer to produce weighted aggregations of different latent representations, leading to improved personalized recommendation of users and items. Through experiments against state-of-the-art models, we find the proposed framework leads to a 18.43% (Goodreads) and 6.14% (Spotify) improvement in top-k user recommendation.
Recommendation systems typically rely on the interactions between a crowd of ordinary users and items, ignoring the fact that many real-world communities are notably influenced by a small group of key opinion leaders, whose feedback on items wields outsize influence. With important positions in the community (e.g. have a large number of followers), their elite opinions are able to diffuse to the community and further impact what items we buy, what media we consume, and how we interact with online platforms. Hence, this paper investigates how to develop a novel recommendation system by explicitly capturing the influence from key opinion leaders to the whole community. Centering around opinion elicitation and diffusion, we propose an end-to-end Graph-based neural model - GoRec. Specifically, to preserve the multi-relations between key opinion leaders and items, GoRec elicits the opinions from key opinion leaders with a translation-based embedding method. Moreover, GoRec adopts the idea of Graph Neural Networks to model the elite opinion diffusion process for improved recommendation. Through experiments on Goodreads and Epinions, the proposed model outperforms state-of-the-art approaches by 10.75% and 9.28% on average in Top-K item recommendation.
Currently, most sequence-based recommendation models aim to predict a user's next actions (e.g. next purchase) based on their past actions. These models either capture users' intrinsic preference (e.g. a comedy lover, or a fan of fantasy) from their long-term behavior patterns or infer their current needs by emphasizing recent actions. However, in e-commerce, intrinsic user behavior may be shifted by occasions such as birthdays, anniversaries, or gifting celebrations (Valentine's Day or Mother's Day), leading to purchases that deviate from long-term preferences and are not related to recent actions. In this work, we propose a novel next-item recommendation system which models a user's default, intrinsic preference, as well as two different kinds of occasion-based signals that may cause users to deviate from their normal behavior. More specifically, this model is novel in that it: (1) captures a personal occasion signal using an attention layer that models reoccurring occasions specific to that user (e.g. a birthday); (2) captures a global occasion signal using an attention layer that models seasonal or reoccurring occasions for many users (e.g. Christmas); (3) balances the user's intrinsic preferences with the personal and global occasion signals for different users at different timestamps with a gating layer. We explore two real-world e-commerce datasets (Amazon and Etsy) and show that the proposed model outperforms state-of-the-art models by 7.62% and 6.06% in predicting users' next purchase.
User satisfaction is an important factor when evaluating search systems, and hence a good metric should give rise to scores that have a strong positive correlation with user satisfaction ratings. A metric should also correspond to a plausible user model, and hence provide a tangible manifestation of how users interact with search rankings. Recent work has focused on metrics whose user models accurately portray the behavior of search engine users. Here we investigate whether those same metrics then also correlate with user satisfaction. We carry out experiments using various classes of metrics, and confirm through the lens of the C/W/L framework that the metrics with user models that reflect typical behavior also tend to be the metrics that correlate well with user satisfaction ratings.
A knowledge-based question-answering (KB-QA) system is one that answers natural-language questions by accessing information stored in a knowledge base (KB). Existing KB-QA systems generally register an accuracy of 70-80% for simple questions and less for more complex ones. We observe that certain questions are intrinsically difficult to answer correctly with existing systems. We propose the PERQ framework to address this issue. Given a question q, we perform three steps to boost answer accuracy: (1) (Prediction) We predict if q can be answered correctly by a KB-QA system S. (2) (Explanation) If S is predicted to fail q, we analyze them to determine the most likely reasons of the failure. (3) (Rectification) We use the prediction and explanation results to rectify the answer. We put forward tools to achieve the three steps and analyze their effectiveness. Our experiments show that the PERQ framework can significantly improve KB-QA systems' accuracies over simple questions.
In this paper, we propose a new product knowledge graph (PKG) embedding approach for learning the intrinsic product relations as product knowledge for e-commerce. We define the key entities and summarize the pivotal product relations that are critical for general e-commerce applications including marketing, advertisement, search ranking and recommendation. We first provide a comprehensive comparison between PKG and ordinary knowledge graph (KG) and then illustrate why KG embedding methods are not suitable for PKG learning. We construct a self-attention-enhanced distributed representation learning model for learning PKG embeddings from raw customer activity data in an end-to-end fashion. We design an effective multi-task learning schema to fully leverage the multi-modal e-commerce data. The ¶oincare embedding is also employed to handle complex entity structures. We use a real-world dataset from \textslgrocery.walmart.com to evaluate the performances on knowledge completion, search ranking and recommendation. The proposed approach compares favourably to baselines in knowledge completion and downstream tasks.
Learning product representations that reflect complementary relationship plays a central role in e-commerce recommender system. In the absence of the product relationships graph, which existing methods rely on, there is a need to detect the complementary relationships directly from noisy and sparse customer purchase activities. Furthermore, unlike simple relationships such as similarity, complementariness is asymmetric and non-transitive. Standard usage of representation learning emphasizes on only one set of embedding, which is problematic for modelling such properties of complementariness.
We propose using knowledge-aware learning with dual product embedding to solve the above challenges. We encode contextual knowledge into product representation by multi-task learning, to alleviate the sparsity issue. By explicitly modelling with user bias terms, we separate the noise of customer-specific preferences from the complementariness. Furthermore, we adopt the dual embedding framework to capture the intrinsic properties of complementariness and provide geometric interpretation motivated by the classic separating hyperplane theory. Finally, we propose a Bayesian network structure that unifies all the components, which also concludes several popular models as special cases.
The proposed method compares favourably to state-of-art methods, in downstream classification and recommendation tasks. We also develop an implementation that scales efficiently to a dataset with millions of items and customers.
Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) have demonstrated excellent results in question answering areas. However, due to the sheer amount of model parameters, the inference speed of these models is very slow. How to apply these complex models to real business scenarios becomes a challenging but practical problem. Previous model compression methods usually suffer from information loss during the model compression procedure, leading to inferior models compared with the original one. To tackle this challenge, we propose a Two-stage Multi-teacher Knowledge Distillation (TMKD for short) method for web Question Answering system. We first develop a general Q&A distillation task for student model pre-training, and further fine-tune this pre-trained student model with multi-teacher knowledge distillation on downstream tasks (like Web Q&A task, MNLI, SNLI, RTE tasks from GLUE), which effectively reduces the overfitting bias in individual teacher models, and transfers more general knowledge to the student model. The experiment results show that our method can significantly outperform the baseline methods and even achieve comparable results with the original teacher models, along with substantial speedup of model inference.
While node semantics have been extensively explored in social networks, little research attention has been paid to pro le edge semantics, i.e., social relations. Ideal edge semantics should not only show that two users are connected, but also why they know each other and what they share in common. However, relations in social networks are often hard to pro le, due to noisy multi-modal signals and limited user-generated ground-truth labels.
In this work, we aim to develop a uni ed and principled frame- work that can pro le user relations as edge semantics in social networks by integrating multi-modal signals in the presence of noisy and incomplete data. Our framework is also exible towards limited or missing supervision. Speci cally, we assume a latent distribution of multiple relations underlying each user link, and learn them with multi-modal graph edge variational autoencoders. We encode the network data with a graph convolutional network, and decode arbitrary signals with multiple reconstruction networks. Extensive experiments and case studies on two public DBLP author networks and two internal LinkedIn member networks demonstrate the superior e ectiveness and e ciency of our proposed model.
How can we run graphical inference on large graphs efficiently and accurately? Many real-world networks are modeled as graphical models, and graphical inference is fundamental to understand the properties of those networks. In this work, we propose a novel approach for fast and accurate inference, which first samples a small subgraph and then runs inference over the subgraph instead of the given graph. This is done by the bounded treewidth (BTW) sampling, our novel algorithm that generates a subgraph with guaranteed bounded treewidth while retaining as many edges as possible. We first analyze the properties of BTW theoretically. Then, we evaluate our approach on node classification and compare it with the baseline which is to run loopy belief propagation (LBP) on the original graph. Our approach can be coupled with various inference algorithms: it shows higher accuracy up to 13.7% with the junction tree algorithm, and allows faster inference up to 23.8 times with LBP. We further compare BTW with previous graph sampling algorithms and show that it gives the best accuracy.
Existing learning to rank models for information retrieval are trained based on explicit or implicit query-document relevance information. In this paper, we study the task of learning a retrieval model based on user-item interactions. Our model has potential applications to the systems with rich user-item interaction data, such as browsing and recommendation, in which having an accurate search engine is desired. This includes media streaming services and e-commerce websites among others. Inspired by the neural approaches to collaborative filtering and the language modeling approaches to information retrieval, our model is jointly optimized to predict user-item interactions and reconstruct the item textual descriptions. In more details, our model learns user and item representations such that they can accurately predict future user-item interactions, while generating an effective unigram language model for each item. Our experiments on four diverse datasets in the context of movie and product search and recommendation demonstrate that our model substantially outperforms competitive retrieval baselines, in addition to providing comparable performance to state-of-the-art hybrid recommendation models.
For random walks on a graph, the mean hitting time $H_j$ from a vertex i chosen from the stationary distribution to the target vertex j can be used as a measure of importance for vertex j, while the Kemeny constant K is the mean hitting time from a vertex i to a vertex j selected randomly according to the stationary distribution. Both quantities have found a large variety of applications in different areas. However, their high computational complexity limits their applications, especially for large networks with millions of vertices. In this paper, we first establish a connection between the two quantities, representing K in terms of $H_j$ for all vertices. We then express both quantities in terms of quadratic forms of the pseudoinverse for graph Laplacian, based on which we develop an efficient algorithm that provides an approximation of $H_j$ for all vertices and K in nearly linear time with respect to the edge number, with high probability. Extensive experiment results on real-life and model networks validate both the efficiency and accuracy of the proposed algorithm.
Recently, the embedding-based recommendation models (e.g., matrix factorization and deep models) have been prevalent in both academia and industry due to their effectiveness and flexibility. However, they also have such intrinsic limitations as lacking explainability and suffering from data sparsity. In this paper, we propose an end-to-end joint learning framework to get around these limitations without introducing any extra overhead by distilling structured knowledge from a differentiable path-based recommendation model. Through extensive experiments, we show that our proposed framework can achieve state-of-the-art recommendation performance and meanwhile provide interpretable recommendation reasons.
Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record pairs to be matched. However, most of the traditional blocking methods are learning-free and key-based, and their successes are largely built on laborious human effort in cleaning data and designing blocking keys.
In this paper, we propose AutoBlock, a novel hands-off blocking framework for entity matching, based on similarity-preserving representation learning and nearest neighbor search. Our contributions include: (a) Automation: AutoBlock frees users from laborious data cleaning and blocking key tuning. (b) Scalability: AutoBlock has a sub-quadratic total time complexity and can be easily deployed for millions of records. (c) Effectiveness: AutoBlock outperforms a wide range of competitive baselines on multiple large-scale, real-world datasets, especially when datasets are dirty and/or unstructured.
Question routing (QR) aims at recommending newly posted questions to the potential answerers who are most likely to answer the questions. The existing approaches that learn users' expertise from their past question-answering activities usually suffer from challenges in two aspects: 1) multi-faceted expertise and 2) temporal dynamics in the answering behavior. This paper proposes a novel temporal context-aware model in multiple granularities of temporal dynamics that concurrently address the above challenges. Specifically, the temporal context-aware attention characterizes the answerer's multi-faceted expertise in terms of the questions' semantic and temporal information simultaneously. Moreover, the design of the multi-shift and multi-resolution module enables our model to handle temporal impact on different time granularities. Extensive experiments on six datasets from different domains demonstrate that the proposed model significantly outperforms competitive baseline models.
The importance of the distribution of ratings on recommender systems (RS) is well-recognized. And yet, recommendation approaches based on latent factor models and recently introduced neural variants (e.g., NCF) optimize for the head of these distributions, potentially leading to large estimation errors for tail ratings. These errors in tail ratings that are far from the mean predicted rating fall out of a uni-modal assumption underlying these popular models, as we show in this paper. We propose to improve the estimation of tail ratings by extending traditional single latent representations (e.g., an item is represented by a single latent vector) with new multi-latent representations for better modeling these tail ratings. We show how to incorporate these multi-latent representations in an end-to-end neural prediction model that is designed to better reflect the underlying ratings distributions of items. Through experiments over six datasets, we find the proposed model leads to a significant improvement in RMSE versus a suite of benchmark methods. We also find that the predictions for the most polarized items are improved by more than 15%.
Examination is one of the most important user interactions in Web search. A number of works studied examination behavior in Web search and helped researchers better understand how users allocate their attention on search engine result pages (SERPs). Compared to desktop search, mobile search has a number of differences such as fewer results on the screen. These differences bring in mobile-specific factors affecting users' examination behavior. However, there still lacks research on users' attention allocation mechanism via viewports in mobile search. Therefore, we design a lab-based study to collect user's rich interaction behavior in mobile search. Based on the collected data, we first analyze how users examine SERPs and allocate their attention to heterogeneous results. Then we investigate the effect of mobile-specific factors and other common factors on users allocating attention. Finally, we apply the findings of user attention allocation from the user study into click model construction efforts, which significantly improves the state-of-the-art click model. Our work brings insights into a better understanding of users' interaction patterns in mobile search and may benefit other mobile search-related research.
Relevance search is to find top-ranked entities in a knowledge graph (KG) that are relevant to a query entity. Relevance is ambiguous, particularly over a schema-rich KG like DBpedia which supports a wide range of different semantics of relevance based on numerous types of relations and attributes. As users may lack the expertise to formalize the desired semantics, supervised methods have emerged to learn the hidden user-defined relevance from user-provided examples. Along this line, in this paper we propose a novel generative model over KGs for relevance search, named GREASE. The model applies to meta-path based relevance where a meta-path characterizes a particular type of semantics of relating the query entity to answer entities. It is also extended to support properties that constrain answer entities. Extensive experiments on two large-scale KGs demonstrate that GREASE has advanced the state of the art in effectiveness, expressiveness, and efficiency.
The goal of personalized search is to tailor the document ranking list to meet user's individual needs. Previous studies showed users usually look for the information that has been searched before. This is called re-finding behavior which is widely explored in existing personalized search approaches. However, most existing methods for identifying re-finding behavior focus on simple lexical similarities between queries. In this paper, we propose to construct memory networks (MN) to support the identification of more complex re-finding behavior. Specifically, incorporating semantic information, we devise two external memories to make an expansion of re-finding based on the query and the document respectively. We further design an intent memory to recognize session-based re-finding behavior. Endowed with these memory networks, we can build a fine-grained user model dynamically based on the current query and documents, and use the model to re-rank the results. Experimental results show the significant improvement of our model compared with traditional methods.
In this paper, we propose new listwise learning-to-rank models that mitigate the shortcomings of existing ones. Existing listwise learning-to-rank models are generally derived from the classical Plackett-Luce model, which has three major limitations. (1) Its permutation probabilities overlook ties, i.e., a situation when more than one document has the same rating with respect to a query. This can lead to imprecise permutation probabilities and inefficient training because of selecting documents one by one. (2) It does not favor documents having high relevance. (3) It has a loose assumption that sampling documents at different steps is independent. To overcome the first two limitations, we model ranking as selecting documents from a candidate set based on unique rating levels in decreasing order. The number of steps in training is determined by the number of unique rating levels. More specifically, in each step, we apply multiple multi-class classification tasks to a document candidate set and choose all documents that have the highest rating from the document set. This is in contrast to taking one document step by step in the classical Plackett-Luce model. Afterward, we remove all of the selected documents from the document set and repeat until the remaining documents all have the lowest rating. We propose a new loss function and associated four models for the entire sequence of weighted classification tasks by assigning high weights to the selected documents with high ratings for optimizing Normalized Discounted Cumulative Gain (NDCG). To overcome the final limitation, we further propose a novel and efficient way of refining prediction scores by combining an adapted Vanilla Recurrent Neural Network (RNN) model with pooling given selected documents at previous steps. We encode all of the documents already selected by an RNN model. In a single step, we rank all of the documents with the same ratings using the last cell of the RNN multiple times. We have implemented our models using three settings: neural networks, neural networks with gradient boosting, and regression trees with gradient boosting. We have conducted experiments on four public datasets. The experiments demonstrate that the models notably outperform state-of-the-art learning-to-rank models.
The next-item recommendation has attracted great research interests with both static and dynamic users' preferences considered. Existing approaches typically utilize user-item binary relations, and assume a flat preference distribution over items for each user. However, this assumption neglects the hierarchical discrimination between user intentions and user preferences, causing the methods have limited capacity to depict intention-specific preference. In fact, a consumer's purchasing behavior involves a natural sequential process, i.e., he/she first has an intention to buy one type of items, followed by choosing a specific item according to his/her preference under this intention. To this end, we propose a novel key-array memory network (KA-MemNN), which takes both user intentions and preferences into account for next-item recommendation. Specifically, the user behavioral intention tendency is determined through key addressing. Further, each array outputs an intention-specific preference representation of a user. Then, the degree of user's behavioral intention tendency and intention-specific preference representation are combined to form a hierarchical representation of a user. This representation is further utilized to replace the static profile of users in traditional matrix factorization for the purposes of reasoning. The experimental results on real-world data demonstrate the advantages of our approach over state-of-the-art methods.
Applying reinforcement learning (RL) in recommender systems is attractive but costly due to the constraint of the interaction with real customers, where performing online policy learning through interacting with real customers usually harms customer experiences. A practical alternative is to build a recommender agent offline from logged data, whereas directly using logged data offline leads to the problem of selection bias between logging policy and the recommendation policy. The existing direct offline learning algorithms, such as Monte Carlo methods and temporal difference methods are either computationally expensive or unstable on convergence. To address these issues, we propose Pseudo Dyna-Q (PDQ). In PDQ, instead of interacting with real customers, we resort to a customer simulator, referred to as the World Model, which is designed to simulate the environment and handle the selection bias of logged data. During policy improvement, the World Model is constantly updated and optimized adaptively, according to the current recommendation policy. This way, the proposed PDQ not only avoids the instability of convergence and high computation cost of existing approaches but also provides unlimited interactions without involving real customers. Moreover, a proved upper bound of empirical error of reward function guarantees that the learned offline policy has lower bias and variance. Extensive experiments demonstrated the advantages of PDQ on two real-world datasets against state-of-the-arts methods.
Enabling the analysis of behavioral disorders over time in social networks, can help in suicide prevention, (school) bullying detection and extremist/criminal activity prediction. In this paper, we present a novel data analytics pipeline to enable the analysis of patterns of behavioral disorders on social networks. We present a Social Behavior Graph (sbGraph) model, to enable the analysis of factors that are driving behavior disorders over time. We use the golden standards in personality, behavior and attitude to build a domain specific Knowledge Base (KB). We use this domain knowledge to design cognitive services to automatically contextualize the raw social data and to prepare them for behavioral analytics. Then we introduce a pattern-based word embedding technique, namely personality2vec, on each feature extracted to build the sbGraph. The goal is to use mathematical embedding from a space with a dimension per feature to a continuous vector space which can be mapped to classes of behavioral disorders (such as cyber-bullying and radicalization) in the domain specific KB. We implement an interactive dashboard to enable social network analysts to analyze and understand the patterns of behavioral disorders over time. We focus on a motivating scenario in Australian government's office of the e-Safety commissioner, where the goal is to empowering all citizens to have safer, more positive experiences online.
Lack of comprehensive information on frequently asked questions (FAQ) web pages forces users to pose their questions on community question answering forums or contact businesses over slow media like emails or phone calls. This in turn often results into sub-optimal user experience and opportunity loss for businesses. While previous work focuses on FAQ mining and answering queries from FAQ pages, there is no work on verifying completeness or augmenting FAQ pages. We present a system, called FAQAugmenter, which given an FAQ web page, (1) harnesses signals from query logs and the web corpus to identify missing topics, and (2) suggests ranked list of questions for FAQ web page augmentation. Our experiments with FAQ pages from five enterprises each across three categories (banks, hospitals and airports) show that FAQAugmenter suggests high quality relevant questions. FAQAugmenter will contribute significantly not just in improving quality of FAQ web pages but also in turn improving quality of downstream applications like Microsoft QnA Maker.
Quantities are more than numeric values. They represent measures for entities, expressed in numbers with associated units. Search queries often include quantities, such as athletes who ran 200m under 20 seconds or companies with quarterly revenue above $2 Billion. Processing such queries requires understanding the quantities, where capturing the surrounding context is an essential part of it. Although modern search engines or QA systems handle entity-centric queries well, they consider numbers and units as simple keywords, and therefore fail to understand the condition (less than, above, etc.), the unit of interest (seconds, dollar, etc.), and the context of the quantity (200m race, quarterly revenue, etc.) As a result, they cannot generate the correct candidate answers. In this work, we demonstrate a prototype QA system, called Qsearch, that can handle advanced queries with quantity constraints using the common cues present in both query and the text sources.
Network visualization has played a critical role in graph analysis, as it not only presents a big picture of a network but also helps reveal the structural information of a network. The most popular visual representation of networks is the node-link diagram. However, visualizing a large network with the node-link diagram can be challenging due to the difficulty in obtaining an optimal graph layout. To address this challenge, a recent advancement in network representation: network shape, allows one to compactly represent a network and its subgraphs with the distribution of their embeddings. Inspired by this research, we have designed a web platform WebShapes that enables researchers and practitioners to visualize their network data as customized 3D shapes (http://b.link/webshapes). Furthermore, we provide a case study on real-world networks to explore the sensitivity of network shapes to different graph sampling, embedding, and fitting methods, and we show examples of understanding networks through their network shapes.
Scholarly search systems greatly aid the deep understanding of scholarly data and facilitate the research activities of scholars for scientific studies. Though a number of such systems have been developed, most of them either support rankings of limited search of entities or provide only basic ranking metrics. These existing systems also mainly adopt RDBMSs as their storage such that the linked feature of scholarly data is not fully exploited. In this study, we design and develop a novel scholarly search system Athena. (1) It supports four types of scholarly entity searches: articles, authors, venues and affiliations, and is equipped with five ranking metrics, including three traditional metrics and two comprehensive importance ranking metrics. (2) It also provides profiling of scholarly entities. (3) It further utilizes a graph storage to directly leverage the linked feature for speeding up the processing of complex queries. We demonstrate the advantages of Athena at scholarly search, profiling, graph storage and ranking quality.
With the growing popularity of neural approaches for ad-hoc ranking, there is a need for tools that can effectively reproduce prior results and ease continued research by supporting current state-of-the-art approaches. Although several excellent neural ranking tools exist, none offer an easy end-to-end ad-hoc neural raking pipeline. A complete pipeline is particularly important for ad-hoc ranking because there are numerous parameter settings that have a considerable effect on the ultimate performance yet often are under-reported in current work (e.g., initial ranking settings, re-ranking threshold, training sampling strategy, etc.). In this work, I present a complete ad-hoc neural ranking pipeline which addresses these shortcomings: OpenNIR. The pipeline is easy to use (a single command will download required data, train, and evaluate a model), yet highly configurable, allowing for continued work in areas that are understudied. Aside from the core pipeline, the software also includes several bells and whistles that make use of components of the pipeline, such as performance benchmarking and tuning of unsupervised ranker parameters for fair comparisons against traditional baselines. The pipeline and these capabilities are demonstrated. The code is available, and contributions are welcome.
Human perception is known to be predominantly visual. As modern web infrastructure promoted the storage of media, the web-data paradigm shifted from text-only documents to those containing text and images. A multitude of blog posts, news articles, and social media posts exist on the Internet today as examples of multimodal stories. The manual alignment of images and text in a story is time-consuming and labor intensive. We present a web application for automatically selecting relevant images from an album and placing them in suitable contexts within a body of text. The application solves a global optimization problem that maximizes the coherence of text paragraphs and image descriptors, and allows for exploring the underlying image descriptors and similarity metrics. Experiments show that our method can align images with texts with high semantic fit, and to user satisfaction.
In this paper, we present SPread, an automated financial metric extraction and spreading tool from earnings reports. The tool is created in a document-agnostic fashion, and uses an interpolation of tagging methods to capture arbitrarily complicated expressions. SPread can handle single-line items as well as metrics broken down into sub-items. A validation layer further improves the performance of upstream modules and enables the tool to reach an F1 performance of more than 87% for metrics expressed in tabular format, and 76% for metrics in free-form text. The results are displayed to end-users in an interactive web interface, which allows them to locate, compare, validate, adjust, and export the values.
Large scale knowledge graph (KG) has attracted wide attentions in both academia and industry recently. However, due to the complexity of SPARQL syntax and massive volume of real KG, it remains difficult for ordinary users to access KG. In this demo, we present VISION-KG, a topic-centric visualization system to help users navigate KG easily via entity summarization and entity clustering. Given a query entity v0, VISION-KG summarizes the induced subgraph of v0's neighbor nodes via our proposed facts ranking method that measures importance, relatedness and diversity. Moreover, to achieve conciseness, we split the summarized graph into several topic-centric summarized subgraph according to semantic and structural similarities among entities. We will demonstrate how VISION-KG provides a user-friendly visualization interface for navigating KG.
We present Capreolus, a toolkit designed to facilitate end-to-end it ad hoc retrieval experiments with neural networks by providing implementations of prominent neural ranking models within a common framework. Our toolkit adopts a standard reranking architecture via tight integration with the Anserini toolkit for candidate document generation using standard bag-of-words approaches. Using Capreolus, we are able to reproduce Yang et al.'s recent SIGIR 2019 finding that, in a reranking scenario on the test collection from the TREC 2004 Robust Track, many neural retrieval models do not significantly outperform a strong query expansion baseline. Furthermore, we find that this holds true for five additional models implemented in Capreolus. We describe the architecture and design of our toolkit, which includes a Web interface to facilitate comparisons between rankings returned by different models.
This tutorial addresses the fundamentals and advances in deep Bayesian mining and learning for natural language with ubiquitous applications ranging from speech recognition to document summarization, text classification, text segmentation, information extraction, image caption generation, sentence generation, dialogue control, sentiment classification, recommendation system, question answering and machine translation, to name a few. Traditionally, "deep learning" is taken to be a learning process where the inference or optimization is based on the real-valued deterministic model. The "semantic structure" in words, sentences, entities, actions and documents drawn from a large vocabulary may not be well expressed or correctly optimized in mathematical logic or computer programs. The "distribution function" in discrete or continuous latent variable model for natural language may not be properly decomposed or estimated. This tutorial addresses the fundamentals of statistical models and neural networks, and focus on a series of advanced Bayesian models and deep models including hierarchical Dirichlet process, Chinese restaurant process, hierarchical Pitman-Yor process, Indian buffet process, recurrent neural network (RNN), long short-term memory, sequence-to-sequence model, variational auto-encoder (VAE), generative adversarial network (GAN), attention mechanism, memory-augmented neural network, skip neural network, temporal difference VAE, stochastic neural network, stochastic temporal convolutional network, predictive state neural network, and policy neural network. Enhancing the prior/posterior representation is addressed. We present how these models are connected and why they work for a variety of applications on symbolic and complex patterns in natural language. The variational inference and sampling method are formulated to tackle the optimization for complicated models. The word and sentence embeddings, clustering and co-clustering are merged with linguistic and semantic constraints. A series of case studies, tasks and applications are presented to tackle different issues in deep Bayesian mining, searching, learning and understanding. At last, we will point out a number of directions and outlooks for future studies. This tutorial serves the objectives to introduce novices to major topics within deep Bayesian learning, motivate and explain a topic of emerging importance for data mining and natural language understanding, and present a novel synthesis combining distinct lines of machine learning work.
Recommender systems (RS) are an integral part of many online services aiming to provide an enhanced user-oriented experience. Machine learning (ML) models are nowadays broadly adopted in modern state-of-the-art approaches to recommendation, which are typically trained to maximize a user-centred utility (e.g., user satisfaction) or a business-oriented one (e.g., profitability or sales increase). They work under the main assumption that users' historical feedback can serve as proper ground-truth for model training and evaluation. However, driven by the success in the ML community, recent advances show that state-of-the-art recommendation approaches such as matrix factorization (MF) models or the ones based on deep neural networks can be vulnerable to adversarial perturbations applied on the input data. These adversarial samples can impede the ability for training high-quality MF models and can put the driven success of these approaches at high risk.
As a result, there is a new paradigm of secure training for RS that takes into account the presence of adversarial samples into the recommendation process. We present adversarial machine learning in Recommender Systems (AML-RecSys), which concerns the study of effective ML techniques in RS to fight against an adversarial component. AML-RecSys has been proposed in two main fashions within the RS literature: (i) adversarial regularization, which attempts to combat against adversarial perturbation added to input data or model parameters of a RS and, (ii) generative adversarial network (GAN)-based models, which adopt a generative process to train powerful ML models. We discuss a theoretical framework to unify the two above models, which is performed via a minimax game between an adversarial component and a discriminator. Furthermore, we explore various examples illustrating the successful application of AML to solve various RS tasks. Finally, we present a global taxonomy/overview of the academic literature based on several identified dimensions, namely (i) research goals and challenges, (ii) application domains and (iii) technical overview.
In this tutorial, we present a portion of unique industry experience in efficient data labelling via crowdsourcing shared by both leading researchers and engineers from Yandex. We will make an introduction to data labelling via public crowdsourcing marketplaces and will present key components of efficient label collection. This will be followed by a practice session, where participants will choose one of the real label collection tasks, experiment with selecting settings for the labelling process, and launch their label collection project on Yandex.Toloka, one of the largest crowdsourcing marketplaces. The projects will be run on real crowds within the tutorial session. Finally, participants will receive a feedback about their projects and practical advice to make them more efficient. We expect that our tutorial will address an audience with a wide range of background and interests. We do not require specific prerequisite knowledge or skills. We invite beginners, advanced specialists, and researchers to learn how to efficiently collect labelled data.
A/B Testing is the gold standard to estimate the causal relationship between a change in a product and its impact on key outcome measures. It is widely used in the industry to test changes ranging from simple copy change or UI change to more complex changes like using machine learning models to personalize user experience. The key aspect of A/B testing is evaluation of experiment results. Designing the right set of metrics - correct outcome measures, data quality indicators, guardrails that prevent harm to business, and a comprehensive set of supporting metrics to understand the "why" behind the key movements is the #1 challenge practitioners face when trying to scale their experimentation program [11, 14]. On the technical side, improving sensitivity of experiment metrics is a hard problem and an active research area, with large practical implications as more and more small and medium size businesses are trying to adopt A/B testing and suffer from insufficient power. In this tutorial we will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on both lessons learned and practical guidelines as well as open research questions. A version of this tutorial was also present at KDD 2019 . It was attended by around 150 participants.
Intelligible machine learning and knowledge discovery are important for modeling individual and social behavior, user activity, link prediction, community detection, crowd-generated data, and others. The role of the interpretable method in web search and mining activities is also very significant to enhance clustering, classification, data summarization, knowledge acquisition, opinion and sentiment mining, web traffic analysis, and web recommender systems. Deep learning success in accuracy of prediction and its failure in explanation of the produced models without special interpretation efforts motivated the surge of efforts to make Machine Learning (ML) models more intelligible and understandable. The prominence of visual methods in getting appealing explanations of ML models motivated the growth of deep visualization, and visual knowledge discovery. This tutorial covers the state-of-the-art research, development, and applications in the area of Intelligible Knowledge Discovery, and Machine Learning boosted by Visual Means.
In the era of big data, it is easy for us collect a huge number of image and text data. However, we frequently face the real-world problems with only small (labeled) data in some domains, such as healthcare and urban computing. The challenge is how to make machine learn algorithms still work well with small data? To solve this challenge, in this tutorial, we will cover the state-of-the-art machine learning techniques to handle small data issue. In particular, we focus on the following three aspects: (1) Providing a comprehensive review of recent advances in exploring the power of knowledge transfer, especially focusing on meta-learning; (2) introducing the cutting-edge techniques of incorporating human/expert knowledge into machine learning models; and (3) identifying the open challenges to data augmentation techniques, such as generative adversarial networks. We believe this is an emerging and potentially high-impact topic in computational data science, which will attract both researchers and practitioners from academia and industry.
How do we surface the large amount of information present in HTML documents on the Web, from news articles to scientific papers to Rotten Tomatoes pages to tables of sports scores? Such information can enable a variety of applications including knowledge base construction, question answering, recommendation, and more. In this tutorial, we present approaches for Information Extraction (IE) from Web data that can be differentiated along two key dimensions: 1) the diversity in data modality that is leveraged, e.g. text, visual, XML/HTML, and 2) the thrust to develop scalable approaches with zero to limited human supervision. We cover the key ideas and intuition behind existing approaches to emphasize their applicability and potential in various settings.
Recommendation methods construct predictive models to estimate the likelihood of a user-item interaction. Previous models largely follow a general supervised learning paradigm - treating each interaction as a separate data instance and building a supervised learning model upon the information isolated island. Such paradigm, however, overlook relations among data instances, hence easily resulting in suboptimal performance especially for sparse scenarios. Moreover, due to the black-box nature, most models hardly exhibit the reasons behind a prediction, making the recommendation process opaque to understand.
In this tutorial, we revisit the recommendation problem from the perspective of graph learning and reasoning. Common data sources for recommendation can be organized into graphs, such as bipartite user-item interaction graphs, social networks, item knowledge graphs (heterogeneous graphs), among others. Such a graph-based organization connects the isolated data instances and exhibits relationships among instances as high-order connectivities, thereby encoding meaningful patterns for collaborative filtering, content-based filtering, social influence modeling, and knowledgeaware reasoning. Inspired by this, prior studies have incorporated graph analysis (e.g., random walk) and graph learning (e.g., network embedding) into recommender models and achieved great success. Together with the recent success of graph neural networks (GNNs), graph-based models have exhibited the potential to be the technologies for next-generation recommender systems. This tutorial provides a review on graph-based learning methods for recommendation, with special focus on recent developments of GNNs. By introducing this emerging and promising topic in this tutorial, we expect the audience to get deep understanding and accurate insight on the spaces, stimulate more ideas and discussions, and promote developments of technologies.
Anomaly detection has been widely studied and used in diverse applications. Building an effective anomaly detection system requires the researchers/developers to learn the complex structure from noisy data, identify the dynamic anomaly patterns and detect anomalies while lacking sufficient labels. Recent advancement in deep learning techniques has made it possible to largely improve anomaly detection performance compared to the classical approaches. This tutorial will help the audience gain a comprehensive understanding of deep learning-based anomaly detection techniques in various application domains. First, it introduces what is the anomaly detection problem, the approaches taken before the deep model era and the challenges it faced. Then it surveys the state-of-the-art deep learning models extensively and discusses the techniques used to overcome the limitations from traditional algorithms. Second to last, it studies deep model anomaly detection techniques in real world examples from LinkedIn production systems. The tutorial concludes with a discussion of future trends.
Conversational systems have improved dramatically recently, and are receiving increasing attention in academic literature. These systems are also becoming adapted in E-Commerce due to increased integration of E-Commerce search and recommendation source with virtual assistants such as Alexa, Siri, and Google assistant. However, significant research challenges remain spanning areas of dialogue systems, spoken natural language processing, human-computer interaction, and search and recommender systems, which all are exacerbated with demanding requirements of E-Commerce.
The purpose of this workshop is to bring together researchers and practitioners in the areas of conversational systems, human-computer interaction, information retrieval, and recommender systems. Bringing diverse research areas together into a single workshop would accelerate progress on adapting conversation systems to the E-Commerce domain, to set a research agenda, to examine how to build and share data sets, and to establish common evaluation metrics and benchmarks to drive research progress.
Capturing and effectively utilising user states and goals is becoming a timely challenge for successfully leveraging intelligent and usercentric systems in differentweb search and data mining applications. Examples of such systems are conversational agents, intelligent assistants, educational and contextual information retrieval systems, recommender/match-making systems and advertising systems, all of which rely on identifying the user state in order to provide the most relevant information and assist users in achieving their goals. There has been, however, limited work towards building such state-aware intelligent learning mechanisms. Hence, devising information systems that can keep track of the user's state has been listed as one of the grand challenges to be tackled in the next few years . It is thus timely to organize a workshop that re-visits the problem of designing and evaluating state-aware and user-centric systems, ensuring that the community (spanning academic and industrial backgrounds) works together to tackle these challenges.
We present HSDM, a full-day workshop on Health Search and Data Mining co-located with WSDM 2020's Health Day. This event builds on recent biomedical workshops in the NLP and ML communities but puts a clear emphasis on search and data mining (and their intersection) that is lacking in other venues. The program will include two keynote addresses by key opinion leaders in the clinical, search, and data mining domains. The technical program consists of 6 original research presentations. Finally, we will close with a panel discussion with keynote speakers, PC members, and the audience.
This workshop aims to help consolidate the growing interest in biomedical applications of data-driven methods that becomes apparent all over the search and data mining spectrum, in WSDM's spirit of collaboration between industry and academia.
Privacy-preserving data analysis has become essential in Machine Learning (ML), where access to vast amounts of data can provide large gains the in accuracies of tuned models. A large proportion of user-contributed data comes from natural language e.g., text transcriptions from voice assistants. It is therefore important for curated natural language datasets to preserve the privacy of the users whose data is collected and for the models trained on sensitive data to only retain non-identifying (i.e., generalizable) information. The workshop aims to bring together researchers and practitioners from academia and industry to discuss the challenges and approaches to designing, building, verifying, and testing privacy-preserving systems in the context of Natural Language Processing (NLP).
The first Workshop on Integrity in Social Networks and Media is held in conjunction with the 13th ACM Conference on Web Search and Data Mining (WSDM) in Houston, Texas, USA. The goal of the workshop is to bring together researchers and practitioners to discuss content and interaction integrity challenges in social networks and social media platforms.
Natural language processing is becoming more and more important in recommender systems. This half day workshop explores challenges and potential research directions in Recommender Systems (RSs) combining Natural Language Processing (NLP). The focus will be on stimulating discussions around how to combine natural language processing technologies with recommendation. We welcome theoretical, experimental, and methodological studies that leverage NLP technologies to advance recommender systems, as well as emphasize the applicability in practical applications. The workshop aims to bring together a diverse set of researchers and practitioners interested in investigating the interaction between NLP and RSs to develop more intelligent RSs.
Online platforms such as LinkedIn or specialized platforms such as Glassdoor are widely used by job seekers before applying for the job. These web platforms have rating and reviews about employer and jobs. Hence a job seeker do online search for the employer, before applying for the job. They try to find if the employer and job is good for them or not, what are the pros and cons of working there etc. Therefore, these reviews and ratings have an impact on job seekers decision as it portrays the pros and cons of working in a particular firm. Hence, the main objective of this study is main objective of this study is to find how the job seekers search for online employer reviews and the impact of these reviews on employer attractiveness and job pursuit intention. The other objective is to find the most crucial job factors that are given priority by the employee. For this, the study is proposed to be conducted in two stages, first, collecting data from the website Glassdoor, having 600000 companies' reviews. In the second stage, conducting an experimental study to examine the influence of job attributes (high vs. low) and employer rating (high vs. low) on job choice and employer attractiveness.
User intent understanding is a crucial step in designing both conversational agents and search engines. Detecting or inferring user intent is challenging, since the user utterances or queries can be short, ambiguous, and contextually dependent. To address these research challenges, my thesis work focuses on: 1) Utterance topic and intent classification for conversational agents 2) Query intent mining and classification for Web search engines, focusing on the e-commerce domain. To address the first topic, I proposed novel models to incorporate entity information and conversation-context clues to predict both topic and intent of the user's utterances. For the second research topic, I plan to extend the existing state of the art methods in Web search intent prediction to the e-commerce domain, via: 1) Developing a joint learning model to predict search queries' intents and the product categories associated with them, 2) Discovering new hidden users' intents. All the models will be evaluated on the real queries available from a major e-commerce site search engine. The results from these studies can be leveraged to improve performance of various tasks such as natural language understanding, query scoping, query suggestion, and ranking, resulting in an enriched user experience.
Twitter is currently a popular microblogging platform for spread of information by users in the form of tweet messages. Such tweets are shared with followers of the seed user who may reshare it with their own set of followers. Long chain of such retweets form cascades. For effective diffusion of information through such Twitter cascades, we identify two different objectives based on using temporal sequence of retweets. Firstly, we aim to infer the structure of influence trees of Twitter cascades, denoting the who-influenced-whom relationship among retweeting users in the cascade, that can play a significant role in identifying critical paths in the network for information dissemination. The constructed trees closely resemble ground truth influence trees of empirical cascades with high retweet count. Secondly, we propose a fast and efficient algorithm for detection of influential users by identifying anchor nodes from temporal retweet sequence. Identification of such a diverse set of influential users enable a faster diffusion of tweets to a large and diverse population, when targeted as seeds thereby maximizing the influence spread, facilitating several applications including viral marketing, disease control and news dissemination.
It is essential to fully understand user intents for the optimization of downstream tasks such as document ranking and query suggestion in web search. As users tend to submit ambiguous queries, numer- ous studies utilize contextual information such as query sequence and user clicks for the auxiliary of user intent modeling. Most of these work adopted Recurrent Neural Network (RNN) based frame- works to encode sequential information within a session, which is hard to realize parallel computation. To this end, we plan to adopt attention-based units to generate context-aware representations for elements in sessions. As intra-session contexts are deficient for handling the data sparsity and cold-start problems in session search, we would also attempt to integrate cross-session dependen- cies by constructing session graphs on the whole corpus to enrich the representation of queries and documents.
As we rapidly continue into the information age, the rate at which data is produced has created an unprecedented demand for novel methods to effectively/efficiently extract insightful patterns. Then, once paired with domain knowledge, we can seek to understand the past, make predictions about the future, and ultimately take actionable steps towards improving our society. Thus, due to the fact that much of today's big data can be represented as graphs, emphasis is being taken to harness the natural structure of data through network analysis. Furthermore, many real-world networks can be better represented as signed networks, e.g., in an online social network such as Facebook, friendships can be represented as positive links while negative links can represent blocked users. Hence, due to signed networks being ubiquitous, in this work we seek to provide a fundamental background into the domain, a hierarchical categorization of existing work highlighting both seminal and state of the art, provide a curated collection of signed network datasets, and discuss important future directions.
One crucial aspect that yet remains fairly unknown while can inform us about the behavior of deep neural networks is their decision boundaries. Trust can be improved once we understand how and why deep models carve out a particular form of decision boundary and thus make particular decisions. Robustness against adversarial examples is directly related to the decision boundary as adversarial examples are basically 'missed out' by the decision boundary between two classes. Investigating the decision boundary of deep neural networks, nevertheless, faces tremendous challenges. First, how we can generate instances near the decision boundary that are similar to real samples? Second, how we can leverage near decision boundary instances to characterize the behaviour of deep neural networks? Motivated to solve these challenges, we focus on investigating the decision boundary of deep neural network classifiers. In particular, we propose a novel approach to generate instances near decision boundary of pre-trained DNNs and then leverage these instances to characterize the behaviour of deep models.
Existing retrieval models have gained much success, however, we have to admit that the models work in a rather different manner than how humans make relevance judgments. To bridge the gap between practical user behavior and retrieval model, it is essential to construct cognitive-oriented retrieval model. The cognitive process in IR includes two important aspects: searching and reading. Searching is the user behaviors interacted with retrieval system such as query formulation and click while reading is the information seeking behavior in a specific landing page or document. We plan to better understand user cognitive behaviors in these two aspects by conducting a lab-based user study. More importantly, the heuristics drawn from cognitive behaviors are then incorporated into retrieval models.
Unexpectedness constitutes an important factor for recommender system to improve user satisfaction and avoid filter bubble issues. In this proposal, we propose to provide unexpected recommendations using the hybrid utility function as a mixture of estimated ratings, unexpectedness, relevance and annoyance. We plan to conduct extensive experiments to validate the superiority of the proposed method.
Studying behavior of systems through networks is important because it allows to understand them and make decisions based on this knowledge. Community detection is one of the tools used in this sense, for detection of groups in graphs. This can be done not only considering connections between nodes, but also including their attributes. Also, objects can be part of different groups with varying degrees, so overlapping fuzzy assignment is relevant in this context. Furthermore, most networks change overtime, so including this aspect also enhance the benefits of using community detection. Hence, in this doctoral thesis we propose to design models for overlapping community detection for static and dynamic networks with node attributes. Firstly, an approach based on a nonnegative matrix factorization generative model that automatically detects the number of communities in the network, is designed. Secondly, tensor factorization is used in order to overcome some of the challenges faced in the first model.
Complex systems in different disciplines are usually modeled as heterogeneous networks. Different from homogeneous networks or attributed networks, heterogeneous networks are associated with complexity in heterogeneous structure or heterogeneous content or both. The abundant information in heterogeneous networks provide opportunities yet pose challenges for researchers and practitioners to develop customized machine learning solutions for solving different problems in complex systems. We are motivated to do significant work for learning from heterogeneous networks. In this paper, we first introduce the motivation and background of this research. Later, we present our current work which include a series of proposed methods and applications. These methods will be introduced in the perspectives of personalization in web-based systems and heterogeneous network embedding. In the end, we raise several research directions as future agenda.