Modern neural language models (LMs) have grabbed much attention in recent years, due in part to their massive sizes and the resources (time, money, data) required to derive them, and in part to their unprecedented performance on language understanding and generation tasks. It is clear that massive LMs are a required component of any future natural language system, are critical to improving existing applications such as search, and enable new applications previously beyond the reach of technology.
Modern LMs also make incredibly silly mistakes that no five-year-old would ever make. One take on this is that all such limitations will fade away as model sizes, training data, and training time increase, as they surely will. An alternative take is that this is wishful thinking, and that the models require thoughtful guidance to approach human-level linguistic performance. This talk discusses the latter perspective.
Web search has transformed how we access all kinds of information and has become a core part of the fabric of everyday life. It is used to find information, buy things, plan travel, understand medical conditions, monitor events, etc. Search in other domains has not received nearly the same attention, so our experiences in web search shape our thinking about search more generally, even when the scenarios are quite different. This is especially true for email search. Although email was initially designed to facilitate asynchronous communication, it has also become a large repository of personal information. The volume of email continues to grow in both consumer and enterprise settings, and search plays a key role in getting back to needed information. Email search is, however, very different from Web search on many dimensions -- the content being sought is personal and private, metadata such as who sent a message or when it was sent is plentiful and important, search intentions are different, people know a lot about what they are looking for, etc. Given these differences, new approaches are required. In this talk I will summarize research we have done using large-scale behavioral logs and complementary qualitative methods to characterize what people are looking for, what they know about what they are looking for, and how this interacts with email management practices. I will then describe several opportunities to help people articulate their information needs and to design interfaces and interaction techniques to support this. Finally, I will conclude by pointing to new frontiers in email management and search.
The recent availability of diverse health data resources on large cohorts of human individuals presents many challenges and opportunities. I will present our work aimed at developing machine learning algorithms for predicting future onset of disease and identifying causal drivers of disease based on nationwide electronic health record data as well as data from high-throughput omics profiling technologies such as genetics, microbiome, and metabolomics. Our models provide novel insights into potential drivers of obesity, diabetes, and heart disease, and identify hundreds of novel markers at the microbiome, metabolite, and immune-system level. Overall, our predictive models can be translated into personalized disease prevention and treatment plans, and into the development of new therapeutic modalities based on metabolites and the microbiome.
Most work to date on mitigating the COVID-19 pandemic is focused urgently on biomedicine and epidemiology. Yet, pandemic-related policy decisions cannot be made on health information alone. Decisions need to consider the broader impacts on people and their needs. Quantifying human needs across the population is challenging as it requires high geo-temporal granularity, high coverage across the population, and appropriate adjustment for seasonal and other external effects. Here, we propose a computational methodology, building on Maslow's hierarchy of needs, that can capture a holistic view of relative changes in needs following the pandemic through a difference-in-differences approach that corrects for seasonality and volume variations. We apply this approach to characterize changes in human needs across physiological, socioeconomic, and psychological realms in the US, based on more than 35 billion search interactions spanning over 36,000 ZIP codes over a period of 14 months. The analyses reveal that the expression of basic human needs has increased exponentially while higher-level aspirations declined during the pandemic in comparison to the pre-pandemic period. In exploring the timing and variations in statewide policies, we find that the durations of shelter-in-place mandates have influenced social and emotional needs significantly. We demonstrate that potential barriers to addressing critical needs, such as support for unemployment and domestic violence, can be identified through web search interactions. Our approach and results suggest that population-scale monitoring of shifts in human needs can inform policies and recovery efforts for current and anticipated needs.
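To make the seasonality adjustment concrete, here is a minimal sketch of a difference-in-differences estimate in Python, under the assumption that the same calendar window in the prior year serves as the seasonal control; the column names and toy numbers are illustrative, not the paper's actual schema or data.

```python
import pandas as pd

def did_relative_change(df: pd.DataFrame) -> float:
    """Difference-in-differences estimate of the change in (log) search volume
    for one need category, using the matching window in the prior year as the
    seasonal control.

    Assumed (illustrative) columns:
      - 'log_volume': log of query volume for the need category
      - 'year':       'pandemic' (e.g., 2020) or 'control' (e.g., 2019)
      - 'period':     'pre' or 'post' the pandemic onset date (or the matching
                      dates in the control year)
    """
    m = df.groupby(["year", "period"])["log_volume"].mean()
    treated_diff = m[("pandemic", "post")] - m[("pandemic", "pre")]
    control_diff = m[("control", "post")] - m[("control", "pre")]
    return treated_diff - control_diff  # seasonality-corrected relative change

# Toy usage with made-up values
toy = pd.DataFrame({
    "log_volume": [10.0, 10.2, 10.1, 10.9],
    "year":   ["control", "control", "pandemic", "pandemic"],
    "period": ["pre", "post", "pre", "post"],
})
print(did_relative_change(toy))  # (10.9 - 10.1) - (10.2 - 10.0) = 0.6
```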
Political campaigns are increasingly turning to targeted advertising platforms to inform and mobilize potential voters. The appeal of these platforms stems from their promise to empower advertisers to select (or "target") the users who see their messages with great precision, including through inferences about those users' interests and political affiliations. However, prior work has shown that the targeting may not work as intended, as platforms' ad delivery algorithms play a crucial role in selecting which subgroups of the targeted users see the ads. In particular, the platforms can selectively deliver ads to subgroups within the target audiences selected by advertisers in ways that can lead to demographic skews along race and gender lines, and do so without the advertiser's knowledge. In this work we demonstrate that the ad delivery algorithms used by Facebook, the most advanced targeted advertising platform, shape political ad delivery in ways that may not be beneficial to political campaigns or to societal discourse. In particular, the ad delivery algorithms lead to political messages on Facebook being shown predominantly to people who Facebook thinks already agree with the ad campaign's message, even if the political advertiser targets an ideologically diverse audience. Furthermore, an advertiser determined to reach ideologically non-aligned users is non-transparently charged a high premium compared to their more aligned competitor, a difference from traditional broadcast media. Our results demonstrate that Facebook exercises control over who sees which political messages, beyond the control of those who pay for them or those who are exposed to them. Taken together, our findings suggest that political discourse's increased reliance on profit-optimized, non-transparent algorithmic systems comes at the cost of the diversity of political views that voters are exposed to. Thus, the work raises important questions about fairness and accountability desiderata for ad delivery algorithms applied to political ads.
The rising ubiquity of social media presents a platform for individuals to express suicidal ideation, as an alternative to traditional, formal clinical settings. While neural methods for assessing suicide risk on social media have shown promise, a crippling limitation of existing solutions is that they ignore the inherent ordinal nature of fine-grained suicide risk levels. To this end, we reformulate suicide risk assessment as an ordinal regression problem over the Columbia-Suicide Severity Rating Scale. We propose SISMO, a hierarchical attention model optimized to factor in the graded nature of increasing suicide risk levels through a soft probability distribution, since not all wrong risk levels are equally wrong. We establish the face value of SISMO for preliminary suicide risk assessment on real-world Reddit data annotated by clinical experts. We conclude by discussing the empirical, practical, and ethical considerations pertaining to SISMO in the larger picture of a human-in-the-loop framework.
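As an illustration of the graded-label idea (not SISMO's exact formulation), one common way to soften ordinal targets is to place a distance-decayed distribution over the risk levels, so that predicting an adjacent level is penalized less than predicting a distant one; the number of levels, distance metric, and temperature below are our own assumptions.

```python
import numpy as np

def soft_ordinal_target(true_level: int, num_levels: int = 5,
                        temperature: float = 1.0) -> np.ndarray:
    """Soft probability distribution over ordered risk levels.

    Probability mass decays with the absolute distance from the true level,
    so 'nearby' mistakes are penalized less than distant ones. The exact
    parameterization here is illustrative only.
    """
    levels = np.arange(num_levels)
    logits = -np.abs(levels - true_level) / temperature
    probs = np.exp(logits)
    return probs / probs.sum()

def soft_ordinal_loss(pred_probs: np.ndarray, true_level: int) -> float:
    """Cross-entropy against the softened ordinal target."""
    target = soft_ordinal_target(true_level, num_levels=len(pred_probs))
    return float(-(target * np.log(pred_probs + 1e-12)).sum())

print(soft_ordinal_target(2))                    # peaked at level 2, decaying outward
print(soft_ordinal_loss(np.full(5, 0.2), 2))     # loss for a uniform prediction
```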
Scalability and accuracy are well-recognized challenges in deep extreme multi-label learning, where the objective is to train architectures for automatically annotating a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework that addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm, which could be 2-12% more accurate and 5-30x faster to train than leading deep extreme classifiers on publicly available short text datasets. Astec could also efficiently train on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on the Bing search engine for a number of short text applications ranging from matching user queries to advertiser bid phrases to showing personalized ads, where it yielded significant gains in click-through rates, coverage, revenue and other online metrics over state-of-the-art techniques currently in production. DeepXML's code is available at https://github.com/Extreme-classification/deepxml.
We present a neural semi-supervised learning model termed Self-Pretraining. Our model is inspired by the classic self-training algorithm. However, as opposed to self-training, Self-Pretraining is threshold-free, it can potentially update its belief about previously labeled documents, and it can cope with the semantic drift problem. Self-Pretraining is iterative and consists of two classifiers. In each iteration, one classifier draws a random set of unlabeled documents and labels them. This set is used to initialize the second classifier, which is then further trained on the set of labeled documents. The algorithm proceeds to the next iteration and the classifiers' roles are reversed. To improve the flow of information across iterations and to cope with the semantic drift problem, Self-Pretraining employs an iterative distillation process, transfers hypotheses across iterations, utilizes a two-stage training model, uses an efficient learning rate schedule, and employs a pseudo-label transformation heuristic. We have evaluated our model on three publicly available social media datasets. Our experiments show that Self-Pretraining outperforms existing state-of-the-art semi-supervised classifiers across multiple settings. Our code is available at https://github.com/p-karisani/self-pretraining.
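The iterative structure can be sketched roughly as follows. This is a bare-bones approximation that uses linear classifiers with `partial_fit` to stand in for the "initialize, then further train" step, and it omits the distillation, two-stage training, learning-rate schedule, and pseudo-label transformation components described above; the toy documents are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Toy corpora; in practice these would be social media documents.
labeled_docs   = ["great product", "awful service", "love it", "terrible experience"]
labels         = np.array([1, 0, 1, 0])
unlabeled_docs = ["really love this", "service was awful", "pretty great", "truly terrible"]

vec = TfidfVectorizer().fit(labeled_docs + unlabeled_docs)
X_lab, X_unl = vec.transform(labeled_docs), vec.transform(unlabeled_docs)
classes = np.unique(labels)

# Bootstrap: the first "teacher" is trained on labeled data only.
teacher = SGDClassifier(random_state=0).fit(X_lab, labels)

for iteration in range(4):
    # 1) The current teacher labels a random subset of unlabeled documents.
    idx = rng.choice(X_unl.shape[0], size=3, replace=False)
    pseudo_labels = teacher.predict(X_unl[idx])

    # 2) A fresh "student" is initialized from the pseudo-labeled set ...
    student = SGDClassifier(random_state=0)
    student.partial_fit(X_unl[idx], pseudo_labels, classes=classes)

    # 3) ... and then further trained on the genuinely labeled documents.
    student.partial_fit(X_lab, labels)

    # 4) Roles are reversed for the next iteration.
    teacher = student

print(teacher.predict(X_unl))
```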
Extreme multi-label classification (XML) involves tagging a data point with its most relevant subset of labels from an extremely large label set, with several applications such as product-to-product recommendation with millions of products. Although leading XML algorithms scale to millions of labels, they largely ignore label metadata such as textual descriptions of the labels. On the other hand, classical techniques that can utilize label metadata via representation learning using deep networks struggle in extreme settings. This paper develops the DECAF algorithm that addresses these challenges by learning models enriched by label metadata that jointly learn model parameters and feature representations using deep networks and offer accurate classification at the scale of millions of labels. DECAF makes specific contributions to model architecture design, initialization, and training, enabling it to offer up to 2-6% more accurate prediction than leading extreme classifiers on publicly available benchmark product-to-product recommendation datasets, such as LF-AmazonTitles-1.3M. At the same time, DECAF was found to be up to 22x faster at inference than leading deep extreme classifiers, which makes it suitable for real-time applications that require predictions within a few milliseconds. The code for DECAF is available at the following URL: https://github.com/Extreme-classification/DECAF
Product query classification is a basic component of query understanding, which aims to classify user queries into multiple categories under a predefined product category taxonomy for an E-commerce search engine. It is a challenging task due to the tremendous number of product categories. Moreover, a slight modification to a query can change its corresponding categories entirely, e.g., appending "button" to the query "shirt". The problem is more severe for tail queries, which lack sufficient supervision signals from customers. Motivated by this phenomenon, this paper proposes to model the contrasting/similar relationships between such similar queries. Our framework is composed of a base model and an across-context attention module. The across-context attention module derives and extracts external information from these variant queries by predicting their categories. We conduct both offline and online experiments on a real-world E-commerce search engine. Experimental results demonstrate the effectiveness of our across-context attention module.
This paper explores and offers guidance on a specific and relevant problem in task design for crowdsourcing: how to formulate a complex question used to classify a set of items. In micro-task markets, classification is still among the most popular tasks. We situate our work in the context of information retrieval and multi-predicate classification, i.e., classifying a set of items based on a set of conditions. Our experiments cover a wide range of tasks and domains, and also consider crowd workers alone and in tandem with machine learning classifiers. We provide empirical evidence on how the resulting classification performance is affected by different predicate formulation strategies, emphasizing the importance of predicate formulation as a task design dimension in crowdsourcing.
Despite the impressive prediction performance of deep neural networks (DNNs) in various domains, it is now well known that a set of DNN models trained with the same model specification and the exact same training data can produce very different prediction results. Practitioners have relied on the state-of-the-art ensemble method to estimate prediction uncertainty. However, ensembles are expensive to train and serve for web-scale traffic systems.
In this paper, we seek to advance the understanding of prediction variation as estimated by the ensemble method. Through empirical experiments on MovieLens and Criteo, two widely used benchmark datasets in recommender systems, we observe that prediction variation stems from various randomness sources, including training data shuffling and random initialization. As we add more randomness sources to the ensemble members, we see higher prediction variation among these members and a more accurate mean prediction. Moreover, we propose to infer prediction variation from neuron activation strength and demonstrate its strong predictive power. Our approach provides a simple way to estimate prediction variation and opens up new opportunities for future work in many interesting areas (e.g., model-based reinforcement learning) without relying on serving expensive ensemble models.
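For readers unfamiliar with the ensemble-based quantity being analyzed, the baseline estimate is simply the spread of predictions across independently trained ensemble members; the sketch below (plain NumPy, with made-up scores) shows the per-example variation that a cheap single-model estimator, such as one based on activation strength, would aim to approximate without serving an ensemble.

```python
import numpy as np

# predictions[m, i] = model m's predicted score for example i, where the M
# members differ only in their randomness sources (data shuffling, random
# initialization, etc.). The numbers here are invented for illustration.
predictions = np.array([
    [0.81, 0.10, 0.55, 0.33],
    [0.78, 0.14, 0.61, 0.29],
    [0.84, 0.09, 0.47, 0.41],
    [0.80, 0.12, 0.58, 0.30],
])

mean_prediction = predictions.mean(axis=0)       # ensemble mean per example
prediction_variation = predictions.std(axis=0)   # per-example spread across members

print(mean_prediction)
print(prediction_variation)  # target that a single-model estimator would regress
```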
This paper connects equal opportunity to popularity bias in implicit recommenders to introduce the problem of popularity-opportunity bias: conditioned on a user liking both of two items, the more popular item is more likely to be recommended (or ranked higher) to the user than the less popular one. This type of bias is harmful, exerting negative effects on the engagement of both users and item providers. Thus, we conduct a three-part study: (i) through a comprehensive empirical study, we identify the existence of popularity-opportunity bias in fundamental matrix factorization models on four datasets; (ii) coupled with this empirical study, our theoretical study shows that matrix factorization models inherently produce the bias; and (iii) we demonstrate the potential of alleviating this bias with both in-processing and post-processing algorithms. Extensive experiments on four datasets show the effective debiasing performance of the proposed methods compared with baselines designed for conventional popularity bias.
Due to the advances in deep learning, visually-aware recommender systems (RS) have recently attracted increased research interest. Such systems combine collaborative signals with images, usually represented as feature vectors outputted by pre-trained image models. Since item catalogs can be huge, recommendation service providers often rely on images that are supplied by the item providers. In this work, we show that relying on such external sources can make an RS vulnerable to attacks, where the goal of the attacker is to unfairly promote certain pushed items. Specifically, we demonstrate how a new visual attack model can effectively influence the item scores and rankings in a black-box approach, i.e., without knowing the parameters of the model. The main underlying idea is to systematically create small human-imperceptible perturbations of the pushed item image and to devise appropriate gradient approximation methods to incrementally raise the pushed item's score. Experimental evaluations on two datasets show that the novel attack model is effective even when the contribution of the visual features to the overall performance of the recommender system is modest.
In a collaborative-filtering recommendation scenario, biases in the data will likely propagate in the learned recommendations. In this paper we focus on the so-called mainstream bias: the tendency of a recommender system to provide better recommendations to users who have a mainstream taste, as opposed to non-mainstream users. We propose NAECF, a conceptually simple but effective idea to address this bias. The idea consists of adding an autoencoder (AE) layer when learning user and item representations with text-based Convolutional Neural Networks. The AEs, one for the users and one for the items, serve as adversaries to the process of minimizing the rating prediction error when learning how to recommend. They enforce that the specific unique properties of all users and items are sufficiently well incorporated and preserved in the learned representations. These representations, extracted as the bottlenecks of the corresponding AEs, are expected to be less biased towards mainstream users, and to provide more balanced recommendation utility across all users. Our experimental results confirm these expectations, significantly improving the recommendations for non-mainstream users while maintaining the recommendation quality for mainstream users. Our results emphasize the importance of deploying extensive content-based features, such as online reviews, in order to better represent users and items to maximize the de-biasing effect.
Users of recommendation systems usually focus on one topic at a time. When finishing reading an item, users may want to access more relevant items related to the one they just read, as extended reading. However, it is hard for conventional recommendation systems to provide this continuous extended reading function over relevant items, since the main recommendation results should remain diversified. In this paper, we propose a new task named recommendation suggestion, which aims to (1) predict whether users want extended reading, and (2) provide appropriate relevant items as suggestions. These recommended relevant items are arranged in a relevant box and instantly inserted below the clicked item in the main feed. The challenge of recommendation suggestion on relevant items is that it should further consider semantic relevance and information gain besides CTR-related factors. Moreover, the real-time relevant box insertion may also harm the overall performance when users do not want extended reading. To address these issues, we propose a novel Real-time relevant recommendation suggestion (R3S) framework, which consists of an Item recommender and a Box trigger. We extract features from multiple aspects, including feature interaction, semantic similarity and information gain, as different experts, and propose a new Multi-critic multi-gate mixture-of-experts (M3oE) strategy to jointly consider the different experts with multi-head critics. In experiments, we conduct both offline and online evaluations on a real-world recommendation system with detailed ablation tests. The significant improvements in item/box related metrics verify the effectiveness of R3S. Moreover, we have deployed R3S on WeChat Top Stories, where it affects millions of users. The source code is available at https://github.com/modriczhang/R3S.
Reinforcement Learning (RL) techniques have been sought after as the next-generation tools to further advance the field of recommendation research. Different from classic applications of RL, recommender agents, especially those deployed on commercial recommendation platforms, have to operate in extremely large state and action spaces, serving a dynamic user base in the order of billions, and a long-tail item corpus in the order of millions or billions. The (positive) user feedback available to train such agents is extremely scarce in retrospect. Improving the sample efficiency of RL algorithms is thus of paramount importance when developing RL agents for recommender systems. In this work, we present a general framework to augment the training of model-free RL agents with auxiliary tasks for improved sample efficiency. More specifically, we opt to add additional tasks that predict users' immediate responses (positive or negative) toward recommendations, i.e., user response modeling, to enhance the learning of the state and action representations for the recommender agents. We also introduce a tool based on gradient correlation analysis to guide the model design. We showcase the efficacy of our method in offline experiments, learning and evaluating agent policies over hundreds of millions of user trajectories. We also conduct live experiments on an industrial recommendation platform serving billions of users and tens of millions of items to verify its benefit.
Personalized recommender systems rely on knowledge of user preferences to produce recommendations. While those preferences are often obtained from past user interactions with the recommendation catalog, in some situations such observations are insufficient or unavailable. The most widely studied case is with new users, although other similar situations arise where explicit preference elicitation is valuable. At the same time, a seemingly disparate challenge is that there is a well-known popularity bias in many algorithmic approaches to recommender systems. The most common way of addressing this challenge is diversification, which tends to be applied to the output of a recommender algorithm, prior to items being presented to users. We tie these two problems together, showing a tight relationship. Our results show that popularity bias in preference elicitation contributes to popularity bias in recommendation. In particular, most elicitation methods directly optimize only for the relevance of recommendations that would result from collected preferences. This focus on recommendation accuracy biases the preferences collected. We demonstrate how diversification can instead be applied directly at elicitation time. Our model diversifies the preferences elicited using Multi-Armed Bandits, a classical exploration-exploitation framework from reinforcement learning. This leads to a broader understanding of users' preferences, and improved diversity and serendipity of recommendations, without necessitating post-hoc debiasing corrections.
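As a concrete (and deliberately simplified) illustration of elicitation-time exploration, the sketch below runs a UCB1 bandit whose arms are item groups, e.g., popularity buckets, so that the questions shown to a new user are spread across groups rather than concentrated on the most popular one. The grouping, reward model, and parameters are our own assumptions, not the paper's elicitation model.

```python
import math
import random

random.seed(0)

# Arms = item groups to elicit preferences from (e.g., popularity buckets).
groups = ["head", "torso", "tail", "niche"]
# Hypothetical probability that this user gives positive feedback per group.
true_like_prob = {"head": 0.6, "torso": 0.5, "tail": 0.45, "niche": 0.3}

counts = {g: 0 for g in groups}
rewards = {g: 0.0 for g in groups}

def ucb_score(g: str, t: int) -> float:
    if counts[g] == 0:
        return float("inf")  # ask about every group at least once
    mean = rewards[g] / counts[g]
    return mean + math.sqrt(2.0 * math.log(t) / counts[g])

for t in range(1, 41):  # 40 elicitation questions
    g = max(groups, key=lambda arm: ucb_score(arm, t))
    feedback = 1.0 if random.random() < true_like_prob[g] else 0.0
    counts[g] += 1
    rewards[g] += feedback

print(counts)  # questions end up spread across groups, not only the 'head'
```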
The topology of the hyperlink graph among pages expressing different opinions may influence the exposure of readers to diverse content. Structural bias may trap a reader in a 'polarized' bubble with no access to other opinions. We model readers' behavior as random walks. A node is in a 'polarized' bubble if the expected length of a random walk from it to a page of different opinion is large. The structural bias of a graph is the sum of the radii of highly-polarized bubbles. We study the problem of decreasing the structural bias through edge insertions. 'Healing' all nodes with high polarized bubble radius is hard to approximate within a logarithmic factor, so we focus on finding the best k edges to insert to maximally reduce the structural bias. We present RePBubLik, an algorithm that leverages a variant of the random walk closeness centrality to select the edges to insert. RePBubLik obtains, under mild conditions, a constant-factor approximation. It reduces the structural bias faster than existing edge-recommendation methods, including some designed to reduce the polarization of a graph.
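One simple way to make the "polarized bubble radius" notion concrete is to estimate, by Monte Carlo simulation, the expected number of steps a random walk starting at a node takes before first reaching a page of the opposite opinion; the sketch below does this on a toy two-community graph with NetworkX. The estimator and graph are illustrative, not the RePBubLik algorithm or its datasets.

```python
import random
import networkx as nx

random.seed(0)

# Toy graph: two densely connected opinion communities joined by a single bridge.
G = nx.barbell_graph(6, 0)                      # nodes 0-5 and 6-11
opinion = {v: ("blue" if v < 6 else "red") for v in G.nodes()}

def bubble_radius(G, start, opinion, walks=2000, max_steps=500):
    """Monte Carlo estimate of the expected hitting time from `start`
    to any node holding a different opinion."""
    total = 0
    for _ in range(walks):
        node, steps = start, 0
        while opinion[node] == opinion[start] and steps < max_steps:
            node = random.choice(list(G.neighbors(node)))
            steps += 1
        total += steps
    return total / walks

print(bubble_radius(G, start=0, opinion=opinion))   # deep inside a bubble: large
print(bubble_radius(G, start=5, opinion=opinion))   # adjacent to the bridge: small
```

Inserting an edge from a deep "blue" node to any "red" node shrinks that node's estimated radius, which is exactly the effect the edge-insertion problem above tries to maximize per added edge.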
Graph Neural Networks (GNNs) have achieved tremendous success in various real-world applications due to their strong ability in graph representation learning. GNNs explore the graph structure and node features by aggregating and transforming information within node neighborhoods. However, through theoretical and empirical analysis, we reveal that the aggregation process of GNNs tends to destroy node similarity in the original feature space. Since there are many scenarios where node similarity plays a crucial role, this motivates our proposed framework SimP-GCN, which can effectively and efficiently preserve node similarity while exploiting graph structure. Specifically, to balance information from graph structure and node features, we propose a feature-similarity-preserving aggregation which adaptively integrates graph structure and node features. Furthermore, we employ self-supervised learning to explicitly capture the complex feature similarity and dissimilarity relations between nodes. We validate the effectiveness of SimP-GCN on seven benchmark datasets, including three assortative and four disassortative graphs. The results demonstrate that SimP-GCN outperforms representative baselines. Further probing reveals various advantages of the proposed framework. The implementation of SimP-GCN is available at https://github.com/ChandlerBang/SimP-GCN.
Graph convolutional networks (GCNs), which aim to obtain node embeddings by integrating high-order neighborhood information through stacked graph convolution layers, have demonstrated great power in many network analysis tasks such as node classification and link prediction. However, a fundamental weakness of GCNs -- their topological limitations, including over-smoothing and local homophily of topology -- limits their ability to represent networks. Existing studies addressing these topological limitations typically focus only on the convolution of features over the network topology, which inevitably relies heavily on network structure. Moreover, most networks are text-rich, so it is important to integrate not only document-level information but also the local text information, which is particularly significant yet often ignored by existing methods. To overcome these limitations, we propose BiTe-GCN, a novel GCN architecture built on bidirectional convolution of topology and features on text-rich networks. Specifically, we first transform the original text-rich network into an augmented bi-typed heterogeneous network, capturing both the global document-level information and the local text-sequence information from the texts. We then introduce discriminative convolution mechanisms, which perform convolution on this augmented bi-typed network, realizing the convolutions of topology and features together in the same system and learning the contributions of these two parts (i.e., the network part and the text part) automatically for the given learning objectives. Extensive experiments on text-rich networks demonstrate that our new architecture outperforms the state of the art by a significant margin. Moreover, this architecture can also be applied to several e-commerce search scenarios such as JD search, and experiments on a JD dataset show the superiority of the proposed architecture over related methods.
One fundamental problem in causal inference is to learn the individual treatment effects (ITE) -- assessing the causal effect of a certain treatment (e.g., prescription of a medicine) on an important outcome (e.g., cure of a disease) for each data instance -- but the effectiveness of most existing methods is often limited by the existence of hidden confounders. Recent studies have shown that auxiliary relational information among data can be utilized to mitigate the confounding bias. However, these works assume that the observational data and the relations among them are static, while in reality both will continuously evolve over time; we refer to such data as time-evolving networked observational data.
In this paper, we make an initial investigation of ITE estimation on such data. The problem remains difficult due to the following challenges: (1) modeling the evolution patterns of time-evolving networked observational data; (2) controlling for the hidden confounders with current data and historical information; (3) alleviating the discrepancy between the control group and the treated group. To tackle these challenges, we propose a novel ITE estimation framework, the Dynamic Networked Observational Data Deconfounder (DNDC), which aims to learn representations of hidden confounders over time by leveraging both current networked observational data and historical information. Additionally, a novel adversarial-learning-based representation balancing method is incorporated toward unbiased ITE estimation. Extensive experiments validate the superiority of our framework when measured against state-of-the-art baselines. The implementation is available at https://github.com/jma712/DNDC.
The goal of influence maximization is to select a set of seed users that will optimally diffuse information through a network. In this paper, we study how applying traditional influence maximization algorithms affects the balance between different audience categories (e.g., gender breakdown) who will eventually be exposed to a message. More specifically, we investigate how structural homophily (i.e., the tendency to connect to similar others) and influence diffusion homophily (i.e., the tendency to be influenced by similar others) affect the balance among the activated nodes. We find that even under mild levels of homophily, the balance among the exposed nodes is significantly worse than the balance among the overall population, resulting in a significant disadvantage for one group. To address this challenge, we propose an algorithm that jointly maximizes the influence and balance among nodes while still preserving the attractive theoretical guarantees of the traditional influence maximization algorithms. We run a series of experiments on multiple synthetic and four real-world datasets to demonstrate the effectiveness of the proposed algorithm in improving the balance between different categories of exposed nodes.
Heterogeneous information networks consist of multiple types of nodes and edges, giving them a strong ability to represent the rich semantics underpinning network structures. Recently, the dynamics of networks have been studied in many tasks such as social media analysis and recommender systems. However, existing methods mainly focus on static networks or dynamic homogeneous networks, and are incapable of, or inefficient at, modeling dynamic heterogeneous information networks. In this paper, we propose a method named Dynamic Heterogeneous Information Network Embedding (DyHINE), which can update embeddings as the network evolves. The method contains two key designs: (1) a dynamic time-series embedding module which employs a hierarchical attention mechanism to aggregate neighbor features and temporal random walks to capture dynamic interactions; (2) an online real-time updating module which efficiently updates the computed embeddings via a dynamic operator. Experiments on three real-world datasets demonstrate the effectiveness of our model compared with state-of-the-art methods on the task of temporal link prediction.
A/B tests have been widely adopted across industries as the gold standard that guides decision making. However, the long-term true-north metrics we ultimately want to drive through A/B tests may take a long time to mature. In these situations, a surrogate metric which predicts the long-term metric is often used instead to conclude whether the treatment is effective. However, because the surrogate rarely predicts the true-north metric perfectly, a regular A/B test based on surrogate metrics tends to have a high false positive rate, and the treatment variant deemed favorable by the test may not be the winning one. In this paper, we discuss how to adjust the A/B testing comparison to ensure experiment results are trustworthy. We also provide practical guidelines on the choice of good surrogate metrics. To provide a concrete example of how to leverage surrogate metrics for fast decision making, we present a case study on developing and evaluating the predicted confirmed hire surrogate metric in the LinkedIn job marketplace.
In many industry settings, online controlled experimentation (A/B testing) has been broadly adopted as the gold standard for measuring product or feature impact. Most research has focused on user engagement metrics, specifically measuring treatment effects at the mean (average treatment effects, ATE), and only a few studies have focused on performance metrics (e.g., latency), where treatment effects are measured at quantiles. Measuring quantile treatment effects (QTE) is challenging due to myriad difficulties such as dependency introduced by clustered samples, scalability issues, and density bandwidth choices. In addition, previous literature has mainly focused on QTE at a few pre-defined locations, such as P50 or P90, which does not always convey the full picture. In this paper, we propose a novel scalable non-parametric solution, which can provide a continuous range of QTE with point-wise confidence intervals while circumventing density estimation altogether. Numerical results show high consistency with traditional methods that utilize asymptotic normality. An end-to-end pipeline has been implemented at Snap Inc., providing daily insights on key performance metrics at a distributional level.
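For intuition, the quantile treatment effect at a given quantile is simply the difference between the treatment and control quantiles; the minimal sketch below computes QTEs over a continuous range of quantiles with naive bootstrap confidence intervals on i.i.d. synthetic latency data. It ignores the clustered-sample and scalability issues that the proposed method is designed to handle, and the data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-request latencies (ms); the treatment adds a heavier right tail.
control   = rng.lognormal(mean=4.0, sigma=0.5, size=20_000)
treatment = rng.lognormal(mean=4.0, sigma=0.6, size=20_000)

quantiles = np.linspace(0.05, 0.99, 95)

def qte(treat, ctrl, q):
    return np.quantile(treat, q) - np.quantile(ctrl, q)

point_estimates = qte(treatment, control, quantiles)

# Naive i.i.d. bootstrap for point-wise confidence intervals.
B = 200
boot = np.empty((B, quantiles.size))
for b in range(B):
    t = rng.choice(treatment, size=treatment.size, replace=True)
    c = rng.choice(control, size=control.size, replace=True)
    boot[b] = qte(t, c, quantiles)
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)

i90 = int(np.argmin(np.abs(quantiles - 0.90)))   # index closest to P90
print(f"QTE@P90 = {point_estimates[i90]:.1f} ms "
      f"(95% CI [{lower[i90]:.1f}, {upper[i90]:.1f}])")
```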
It is often critical for prediction models to be robust to distributional shifts between training and testing data. From a causal perspective, the challenge is to distinguish stable causal relationships from unstable spurious correlations across shifts. We describe a causal transfer random forest (CTRF) that combines existing training data with a small amount of data from a randomized experiment to train a model that is robust to feature shifts and therefore transfers to a new target distribution. Theoretically, we justify the robustness of the approach against feature shifts using knowledge from causal learning. Empirically, we evaluate the CTRF in both synthetic data experiments and real-world experiments on the Bing Ads platform, including a click prediction task and an end-to-end counterfactual optimization system. The proposed CTRF produces robust predictions and outperforms most of the compared baseline methods in the presence of feature shifts.
E-commerce businesses are revolutionizing our shopping experience by providing convenient and straightforward services. One of the most fundamental problems is how to balance demand and supply in market segments to build an efficient platform. While conventional machine learning models have achieved great success on data-sufficient segments, they may fail in a large portion of segments in E-commerce platforms for which there are not sufficient records to learn well-trained models. In this paper, we tackle this problem in the context of market segment demand prediction. The goal is to facilitate the learning process in the target segments by leveraging knowledge learned from data-sufficient source segments. Specifically, we propose a novel algorithm, RMLDP, which incorporates a multi-pattern fusion network (MPFN) into a meta-learning paradigm. The multi-pattern fusion network considers both local and seasonal temporal patterns for segment demand prediction. In the meta-learning paradigm, transferable knowledge is regarded as the model parameter initialization of MPFN, which is learned from diverse source segments. Furthermore, we capture segment relations by combining data-driven segment representations with segment knowledge graph representations, and tailor the segment-specific relations to customize the transferable model parameter initialization. Thus, even with limited data, a target segment can quickly find the most relevant transferred knowledge and adapt to the optimal parameters. We conduct extensive experiments on two large-scale industrial datasets. The results show that RMLDP outperforms a set of state-of-the-art baselines. Moreover, RMLDP has been deployed in Taobao, a real-world E-commerce platform. The online A/B testing results further demonstrate the practicality of RMLDP.
Consumer lending services are expanding rapidly on E-Commerce platforms due to their capability to enhance buyers' purchasing power, improve average order value, and increase platform revenue. Credit risk forecasting and credit limit setting are two fundamental problems in E-Commerce/online consumer lending services. Currently, the majority of institutions rely on a two-step approach: first, build a rating model to evaluate credit risk, and then design heuristic strategies to set credit limits. This requires a large amount of prior knowledge and lacks theoretical justification. In this paper, we propose an end-to-end multi-view and multi-task learning based approach named MvMoE (Multi-view-aware Mixture-of-Experts network) to solve these two problems simultaneously. First, a multi-view network with a hierarchical attention mechanism is constructed to distill users' heterogeneous financial information into shared hidden representations. Then, we jointly train the two tasks with a view-aware multi-gate mixture-of-experts network and a subsequent progressive network to improve their performance. On a real-world dataset containing 5.44 million users, we investigate the effectiveness of MvMoE. Experimental results show that the proposed model improves AP by over 5.60% on credit risk forecasting and MAE by over 9.52% on credit limit setting compared with conventional methods. Meanwhile, MvMoE has good interpretability, which better underpins the imperative demands in the financial industry.
Algorithmic recommendations shape music consumption at scale, and understanding the impact of various algorithmic models on how content is consumed is a central question for music streaming platforms. The ability to shift consumption towards less popular content and towards content different from a user's typical historical tastes not only affords the platform ways of handling issues such as filter bubbles and popularity bias, but also contributes to maintaining the healthy and sustainable consumption patterns necessary for overall platform success.
In this work, we view diversity as an enabler for shifting consumption and consider two notions of music diversity, based on taste similarity and popularity, and investigate how four different recommendation approaches optimized for user satisfaction fare on diversity metrics. To investigate how ranker complexity influences diversity, we use two well-known rankers and propose two new models of increased complexity: a feedback-aware neural ranker and a reinforcement learning (RL) based ranker. We demonstrate that our models lead to gains in satisfaction, but at the cost of diversity. This trade-off between model complexity and diversity necessitates explicitly encoding diversity in the modeling process, for which we consider four types of approaches: interleaving based, submodularity based, interpolation based, and RL reward modeling based. We find that our reward-modeling-based RL approach achieves the best trade-off between optimizing the satisfaction metric and surfacing diverse content, thereby enabling consumption shifting at scale. Our findings have implications for the design and deployment of practical approaches for music diversification, which we discuss at length.
Spectral clustering is widely used in modern data analysis. Spectral clustering methods speed up the computation and retain useful information by reducing dimensionality. Recently, graph signal filtering (GSF) has been introduced to further speed up the dimensionality reduction process by avoiding explicit eigenvector computation.
In this work, we first prove that a non-ideal filter affects not only the accuracy of GSF but also its computation speed. We then propose a modified Kernel Polynomial Method (KPM) that helps GSF set the filter properly and effectively. We combine the main steps of KPM and GSF and propose a novel spectral clustering method: Chebyshev Accelerated Spectral Clustering (CASC). CASC takes advantage of the excellent properties of Chebyshev polynomials: compared with other spectral clustering methods using GSF, CASC spends negligible time on estimating the eigenvalues and achieves the same accuracy with less computation. Experiments on artificial and real-world data show that CASC is accurate and fast.
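To illustrate the core mechanism, approximating a low-pass spectral filter with Chebyshev polynomials so that no eigenvectors need to be computed, the sketch below filters random signals on a small toy graph and clusters the filtered rows with k-means. The filter design, polynomial order, cutoff, and clustering details are simplified assumptions for illustration, not the CASC algorithm itself.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)

# Toy data -> k-NN graph -> normalized Laplacian (eigenvalues lie in [0, 2]).
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
A = kneighbors_graph(X, n_neighbors=10, include_self=False)
A = 0.5 * (A + A.T)                      # symmetrize
L = laplacian(A, normed=True)

k, order, lam_max = 3, 40, 2.0

# Fit Chebyshev coefficients of an ideal low-pass filter h(lam) = 1[lam <= cut]
# on the rescaled domain x in [-1, 1], where lam = (x + 1) * lam_max / 2.
cut = 0.3
xs = np.linspace(-1.0, 1.0, 400)
target = ((xs + 1.0) * lam_max / 2.0 <= cut).astype(float)
coeffs = C.chebfit(xs, target, deg=order)

# Apply sum_j coeffs[j] * T_j(L_tilde) to random signals via the recurrence,
# with L_tilde = 2 L / lam_max - I, never forming eigenvectors explicitly.
signals = rng.standard_normal((A.shape[0], 2 * k))

def scaled_op(V):
    return np.asarray((2.0 / lam_max) * (L @ V) - V)

T_prev, T_curr = signals, scaled_op(signals)
filtered = coeffs[0] * T_prev + coeffs[1] * T_curr
for j in range(2, order + 1):
    T_next = 2.0 * scaled_op(T_curr) - T_prev
    filtered = filtered + coeffs[j] * T_next
    T_prev, T_curr = T_curr, T_next

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(filtered)
print(np.bincount(labels))   # cluster sizes recovered from the filtered signals
```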
Product embeddings have been heavily investigated in the past few years, serving as the cornerstone for a broad range of machine learning applications in e-commerce. Despite the empirical success of product embeddings, little is known about how and why they work from a theoretical standpoint. Analogous results from natural language processing (NLP) often rely on domain-specific properties that do not transfer to the e-commerce setting, and the downstream tasks often focus on different aspects of the embeddings. We take an e-commerce-oriented view of product embeddings and develop a complete theoretical account from both the representation learning and the learning theory perspectives. We prove that product embeddings trained by the widely adopted skip-gram negative sampling algorithm and its variants are sufficient dimension reductions with respect to a critical product relatedness measure. The generalization performance in downstream machine learning tasks is controlled by the alignment between the embeddings and the product relatedness measure. Following these theoretical discoveries, we conduct exploratory experiments that support our theoretical insights about product embeddings.
The cold-start problem is a fundamental challenge for recommendation tasks. Although recent advances in Graph Neural Networks (GNNs) incorporate high-order collaborative signals to alleviate the problem, the embeddings of cold-start users and items are not explicitly optimized, and cold-start neighbors are not handled during graph convolution in GNNs. This paper proposes to pre-train a GNN model before applying it for recommendation. Unlike the goal of recommendation, the pre-training GNN simulates cold-start scenarios from users/items with sufficient interactions and takes embedding reconstruction as the pretext task, such that it can directly improve embedding quality and can be easily adapted to new cold-start users/items. To further reduce the impact of cold-start neighbors, we incorporate a self-attention-based meta aggregator to enhance the aggregation ability of each graph convolution step, and an adaptive neighbor sampler to select effective neighbors according to the feedback from the pre-training GNN model. Experiments on three public recommendation datasets show the superiority of our pre-training GNN model over the original GNN models on user/item embedding inference and the recommendation task.
There are many scenarios where the short- and long-term causal effects of an intervention differ. For example, low-quality ads may increase short-term ad clicks but decrease long-term revenue via reduced clicks. This work therefore studies the problem of long-term effect estimation, where the outcome of primary interest, or primary outcome, takes months or even years to accumulate. The observational study of long-term effects presents unique challenges. First, confounding bias causes large estimation error and variance, which can further accumulate in the prediction of primary outcomes. Second, short-term outcomes are often directly used as a proxy of the primary outcome, i.e., the surrogate. Nevertheless, this method entails the strong surrogacy assumption, which is often impractical. To tackle these challenges, we propose to build connections between long-term causal inference and sequential models in machine learning. This enables us to learn surrogate representations that account for temporal unconfoundedness and circumvent the stringent surrogacy assumption by conditioning on the inferred time-varying confounders. Experimental results show that the proposed framework outperforms the state of the art.
Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks, including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as the piece of text representative of the "ideal" document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with the higher likelihood is deemed more representative of the document. We then pre-train the Transformer model to predict the pairwise preference between the two word sets, jointly with the Masked Language Model (MLM) objective. By further fine-tuning on a variety of representative downstream ad-hoc retrieval tasks, PROP achieves significant improvements over baselines without pre-training or with other pre-training methods. We also show that PROP achieves strong performance in both zero-resource and low-resource IR settings.
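The pretext-task construction can be illustrated with a toy unigram document language model: sample two word sets from the document model, score each by its likelihood, and label the higher-likelihood set as the preferred one. The Dirichlet smoothing, set sizes, and toy corpus below are our own simplifications, not the paper's exact sampling strategy.

```python
import math
import random
from collections import Counter

random.seed(0)

corpus = {
    "doc1": "deep retrieval models rank documents for ad hoc search queries".split(),
    "doc2": "pandas eat bamboo in mountain forests and sleep most of the day".split(),
}
collection = [w for doc in corpus.values() for w in doc]
coll_counts, coll_len = Counter(collection), len(collection)
vocab = sorted(coll_counts)

def doc_lm_prob(w, doc, mu=10.0):
    """Dirichlet-smoothed unigram document language model (a standard IR choice)."""
    counts, dlen = Counter(corpus[doc]), len(corpus[doc])
    return (counts[w] + mu * coll_counts[w] / coll_len) / (dlen + mu)

def log_likelihood(words, doc):
    return sum(math.log(doc_lm_prob(w, doc)) for w in words)

def rop_instance(doc, set_size=3):
    """Sample two word sets from the document LM and label which is preferred."""
    weights = [doc_lm_prob(w, doc) for w in vocab]
    set_a = random.choices(vocab, weights=weights, k=set_size)
    set_b = random.choices(vocab, weights=weights, k=set_size)
    label = int(log_likelihood(set_a, doc) > log_likelihood(set_b, doc))
    return set_a, set_b, label  # label = 1 means set_a is the more representative set

print(rop_instance("doc1"))
```

Instances like these provide the pairwise-preference supervision that, in PROP, is learned jointly with the MLM objective before fine-tuning on retrieval.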
Preference data is a form of dyadic data, with measurements associated with pairs of elements arising from two discrete sets of objects: users and items, together with their interactions, e.g., ratings. We are interested in learning representations for both sets of objects, i.e., users and items, to predict unknown pairwise interactions. Motivated by the recent successes of deep latent variable models, we propose the Bilateral Variational Autoencoder (BiVAE), which arises from a combination of a generative model of dyadic data with two inference models, user- and item-based, parameterized by neural networks. Interestingly, our model can take the form of a Bayesian variational autoencoder on either the user or the item side. As opposed to the vanilla VAE model, BiVAE is "bilateral", in that users and items are treated similarly, making it more apt for two-way or dyadic data. While theoretically sound, we formally show that, similarly to VAE, our model may suffer from an over-regularized latent space. This issue, known as posterior collapse in the VAE literature, may arise from assuming an over-simplified prior (isotropic Gaussian) over the latent space. Hence, we further propose to mitigate this issue by introducing a constrained adaptive prior (CAP) for learning user- and item-dependent prior distributions. Empirical results on several real-world datasets show that the proposed model outperforms conventional VAE and other comparative collaborative filtering models in terms of item recommendation. Moreover, the proposed CAP further boosts the performance of BiVAE. An implementation of BiVAE is available in the Cornac recommender library.
Large generative language models such as GPT-2 are well known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning. The prevalence of such machine-generated text on the web, however, is still not well understood: if we run GPT-2 detectors across the web, what will we find? Our work is twofold: first, we demonstrate via human evaluation that classifiers trained to discriminate between human- and machine-generated text emerge as unsupervised predictors of "page quality", able to detect low-quality content without any training. This enables fast bootstrapping of quality indicators in a low-resource setting. Second, curious to understand the prevalence and nature of low-quality pages in the wild, we conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.
Product reviews play a key role in e-commerce platforms. Studies show that many users read product reviews before purchase and trust them as much as personal recommendations. However, in many cases the number of reviews per product is large and finding useful information becomes a challenging task. A few websites have recently added an option to post tips -- short, concise, practical, and self-contained pieces of advice about products. These tips are complementary to the reviews and usually add a new, non-trivial insight about the product, beyond its title, attributes, and description. Yet most, if not all, major e-commerce platforms lack the notion of a tip as a first-class citizen, and customers typically express their advice through other means, such as reviews. In this work, we propose an extractive method for tip generation from product reviews. We focus on five popular e-commerce domains whose reviews tend to contain useful non-trivial tips that are beneficial for potential customers. We formally define the task of tip extraction in e-commerce by providing the list of tip types, tip timing (before and/or after the purchase), and the connection to the surrounding context sentences. To extract the tips, we propose a supervised approach and provide a labeled dataset of over 14,000 product reviews, annotated by human editors using a dedicated tool. To demonstrate the potential of our approach, we compare different tip generation methods and evaluate them both manually and on the labeled set. Our approach demonstrates especially high performance for popular products in the Baby, Home Improvement and Sports & Outdoors domains, with precision of over 95% for the top 3 tips per product.
Transparency of sponsorship disclosure in advertising posts has become a significant issue in influencer marketing. Although influencers are urged to comply with the regulations governing sponsorship disclosure, a considerable number of influencers fail to disclose sponsorship properly in paid advertisements. In this paper, we propose a learning-to-rank based model, the Sponsored Post Detector (SPoD), to detect undisclosed sponsorship of social media posts by learning various aspects of the posts, such as the text, images, and the social relationships among influencers and brands. More precisely, we exploit image objects and contextualized information to obtain representations of the posts, and utilize Graph Convolutional Networks (GCNs) on a network consisting of influencers, brands, and posts with embedded social media attributes. We further optimize the model by conducting manifold regularization based on temporal information and the brands mentioned in posts. Extensive studies and experiments are conducted on sampled real-world Instagram datasets containing 1,601,074 posts, which mention 26,910 brands, published over 6 years by 38,113 influencers. Our experimental results demonstrate that SPoD significantly outperforms existing baseline methods in discovering sponsored posts on social media.
We present Quotebank, an open corpus of 178 million quotations attributed to the speakers who uttered them, extracted from 162 million English news articles published between 2008 and 2020. In order to produce this Web-scale corpus while benefiting from the performance of modern neural models, we introduce Quobert, a minimally supervised framework for extracting and attributing quotations from massive corpora. Quobert avoids the need for manually labeled input and instead exploits the redundancy of the corpus by bootstrapping from a single seed pattern to extract training data for fine-tuning a BERT-based model. Quobert is language- and corpus-agnostic and correctly attributes 86.9% of quotations in our experiments. Quotebank and Quobert are publicly available at https://doi.org/10.5281/zenodo.4277311.
In e-commerce, opinion tags refer to a ranked list of tags, provided by the e-commerce platform, that reflect characteristics of the reviews of an item. To help consumers quickly grasp a large number of reviews about an item, opinion tags are increasingly being applied by e-commerce platforms. Current mechanisms for generating opinion tags rely on either manual labelling or heuristic methods, which are time-consuming and ineffective. In this paper, we propose the abstractive opinion tagging task, where systems have to automatically generate a ranked list of opinion tags that are based on, but need not occur in, a given set of user-generated reviews. The abstractive opinion tagging task comes with three main challenges: the noisy nature of reviews; the formal nature of opinion tags vs. the colloquial language usage in reviews; and the need to distinguish between different items with very similar aspects. To address these challenges, we propose an abstractive opinion tagging framework, named AOT-Net, to generate a ranked list of opinion tags given a large number of reviews. First, a sentence-level salience estimation component estimates each review's salience score. Next, a review clustering and ranking component ranks reviews in two steps: first, reviews are grouped into clusters and ranked by cluster size; then, reviews within each cluster are ranked by their distance to the cluster center. Finally, given the ranked reviews, a rank-aware opinion tagging component incorporates an alignment feature and alignment loss to generate a ranked list of opinion tags. To facilitate the study of this task, we create and release a large-scale dataset, called eComTag, crawled from real-world e-commerce websites. Extensive experiments conducted on the eComTag dataset verify the effectiveness of the proposed AOT-Net in terms of various evaluation metrics.
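The review clustering and ranking step can be approximated with off-the-shelf tools: embed reviews (TF-IDF below stands in for whatever encoder is actually used), cluster them, order clusters by size, and order reviews within each cluster by distance to its center. Everything in this sketch, including the toy reviews, is an illustrative simplification rather than AOT-Net's actual component.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "battery lasts two full days",
    "amazing battery life",
    "battery is great, charges fast",
    "screen scratches too easily",
    "the screen picks up scratches",
    "shipping was slow",
]

X = TfidfVectorizer().fit_transform(reviews).toarray()
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

ranked = []
# 1) Rank clusters by size (largest first).
for c in sorted(range(3), key=lambda c: -(km.labels_ == c).sum()):
    idx = np.where(km.labels_ == c)[0]
    # 2) Within a cluster, rank reviews by distance to the cluster center.
    dists = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
    ranked.extend(reviews[i] for i in idx[np.argsort(dists)])

print(ranked)  # input order for a rank-aware tagging component
```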
Trends are those keywords, phrases, or names that are mentioned most often on social media or in the news in a particular timeframe. They are an effective way for human news readers to both discover and stay focused on the most relevant information of the day. In this work, we consider trends that correspond to an entity in a knowledge graph and introduce the new and as-yet unexplored task of identifying other entities that may help explain why an entity is trending. We refer to these retrieved entities as contextual entities. Some of them are more important than others in the context of the trending entity, and we thus determine a ranking of entities according to how useful they are in contextualizing the trend. We propose two solutions for ranking contextual entities. The first one is fully unsupervised and based on Personalized PageRank, calculated over a trending-entity-specific graph of other entities where the edges encode a notion of directional similarity based on embedded background knowledge. Our second method is based on learning to rank and combines the intuitions behind the unsupervised model with signals derived from hand-crafted features in a supervised setting. We compare our models on this novel task using a new, purpose-built test collection created with crowdsourcing. Our methods improve over the strongest baseline in terms of Precision at 1 by 7% (unsupervised) and 13% (supervised). We find that the salience of a contextual entity and how coherent it is with respect to the news story are strong indicators of relevance in both unsupervised and supervised settings.
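The unsupervised ranker can be illustrated directly with NetworkX's Personalized PageRank: build a directed, weighted graph of candidate entities around the trending entity, concentrate the restart distribution on the trending entity, and rank the candidates by their scores. The toy entities and edge weights below are invented for illustration and merely stand in for the embedding-based directional similarities described above.

```python
import networkx as nx

# Toy trending-entity-specific graph; weights stand in for directional similarity.
G = nx.DiGraph()
trending = "Hurricane_X"
G.add_weighted_edges_from([
    (trending, "National_Hurricane_Center", 0.9),
    (trending, "Florida", 0.8),
    (trending, "Storm_surge", 0.7),
    ("Florida", "Storm_surge", 0.4),
    ("National_Hurricane_Center", "Florida", 0.3),
    ("Storm_surge", "Coastal_flooding", 0.6),
])

# Personalized PageRank with all restart mass on the trending entity.
scores = nx.pagerank(G, alpha=0.85, personalization={trending: 1.0}, weight="weight")

contextual = sorted(
    ((e, s) for e, s in scores.items() if e != trending),
    key=lambda pair: -pair[1],
)
for entity, score in contextual:
    print(f"{entity}: {score:.3f}")   # candidate ranking for the trending entity
```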
Conversational question answering (QA) requires the ability to correctly interpret a question in the context of previous conversation turns. We address the conversational QA task by decomposing it into question rewriting and question answering subtasks. The question rewriting (QR) subtask is specifically designed to reformulate ambiguous questions, which depend on the conversational context, into unambiguous questions that can be correctly interpreted outside of the conversational context. We introduce a conversational QA architecture that sets the new state of the art on the TREC CAsT 2019 passage retrieval dataset. Moreover, we show that the same QR model improves QA performance on the QuAC dataset with respect to answer span extraction, which is the next step in QA after passage retrieval. Our evaluation results indicate that the QR model we proposed achieves near human-level performance on both datasets and the gap in performance on the end-to-end conversational QA task is attributed mostly to the errors in QA.
This paper concerns user preference estimation in multi-round conversational recommender systems (CRS), which interact with users by asking questions about attributes and recommending items multiple times in one conversation. Multi-round CRS such as EAR have been proposed, in which the user's online feedback at both the attribute level and the item level can be utilized to estimate user preference and make recommendations. Though preliminary success has been shown, existing user preference models in CRS usually use the online feedback information as independent features or training instances, overlooking the relation between attribute-level and item-level feedback signals. This relation can be used to more precisely identify the reasons (e.g., certain attributes) that trigger the rejection of an item, leading to more fine-grained utilization of the feedback information. To address this issue, this paper proposes a novel preference estimation model tailored for multi-round CRS, called Feedback-guided Preference Adaptation Network (FPAN). In FPAN, two gating modules are designed to respectively adapt the original user embedding and the item-level feedback, both according to the online attribute-level feedback. The gating modules utilize the fine-grained attribute-level feedback to revise the user embedding and the coarse-grained item-level feedback, achieving more accurate user preference estimation by considering the relation between the two types of feedback. Experimental results on two benchmarks show that FPAN outperforms state-of-the-art user preference models in CRS, and that multi-round CRS can also be enhanced by using FPAN as the recommender component.
The ubiquity of implicit feedback makes it the default choice for building online recommender systems. While the large volume of implicit feedback alleviates the data sparsity issue, the downside is that it is not as clean in reflecting the actual satisfaction of users. For example, in E-commerce, a large portion of clicks do not translate to purchases, and many purchases end up with negative reviews. As such, it is of critical importance to account for the inevitable noise in implicit feedback when training recommenders. However, little work on recommendation has taken the noisy nature of implicit feedback into consideration.
In this work, we explore the central theme of denoising implicit feedback for recommender training. We find serious negative impacts of noisy implicit feedback, i.e., fitting the noisy data hinders the recommender from learning the actual user preference. Our target is to identify and prune the noisy interactions, so as to improve the efficacy of recommender training. By observing the process of normal recommender training, we find that noisy feedback typically has large loss values in the early stages. Inspired by this observation, we propose a new training strategy named Adaptive Denoising Training (ADT), which adaptively prunes noisy interactions during training. Specifically, we devise two paradigms for adaptive loss formulation: Truncated Loss that discards the large-loss samples with a dynamic threshold in each iteration; and Reweighted Loss that adaptively lowers the weights of large-loss samples. We instantiate the two paradigms on the widely used binary cross-entropy loss and test the proposed ADT strategies on three representative recommenders. Extensive experiments on three benchmarks demonstrate that ADT significantly improves the quality of recommendation over normal training.
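A minimal PyTorch sketch of the Truncated Loss paradigm described above, assuming a batch of raw scores and 0/1 implicit-feedback labels; the drop-rate schedule and its hyperparameters are placeholders rather than the paper's exact settings, and the paper targets noisy positive (observed) interactions in particular:

```python
# Truncated Loss: drop the largest-loss samples in each batch, with a dynamic
# threshold implemented here as a drop rate that grows over training iterations.
import torch
import torch.nn.functional as F

def truncated_bce(scores, labels, iteration, max_drop_rate=0.2, warmup_iters=10000):
    """scores: raw model outputs; labels: 0/1 implicit-feedback labels (float)."""
    losses = F.binary_cross_entropy_with_logits(scores, labels, reduction="none")
    # Dynamic threshold: discard an increasing fraction of the largest-loss samples,
    # since noisy interactions tend to have large losses early in training.
    drop_rate = min(max_drop_rate, max_drop_rate * iteration / warmup_iters)
    num_keep = max(1, int(losses.numel() * (1.0 - drop_rate)))
    kept, _ = torch.topk(losses, num_keep, largest=False)  # keep the small-loss samples
    return kept.mean()
```

The Reweighted Loss paradigm would instead keep every sample but down-weight the large-loss ones rather than discarding them.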
Next destination recommendation is an important task in the transportation domain of taxi and ride-hailing services, where users are recommended with personalized destinations given their current origin location. However, recent recommendation works do not satisfy this origin-awareness property, and only consider learning from historical destination locations, without origin information. Thus, the resulting approaches are unable to learn and predict origin-aware recommendations based on the user's current location, leading to sub-optimal performance and poor real-world practicality. Hence, in this work, we study the origin-aware next destination recommendation task. We propose the Spatial-Temporal Origin-Destination Personalized Preference Attention (STOD-PPA) encoder-decoder model to learn origin-origin (OO), destination-destination (DD), and origin-destination (OD) relationships by first encoding both origin and destination sequences with spatial and temporal factors in local and global views, then decoding them through personalized preference attention to predict the next destination. Experimental results on seven real-world user trajectory taxi datasets show that our model significantly outperforms baseline and state-of-the-art methods.
A significant number of event-related queries are issued in Web search. In this paper, we seek to improve retrieval performance by leveraging events and specifically target the classic task of query expansion. We propose a method to expand an event-related query by first detecting the events related to it. Then, we derive the candidates for expansion as terms semantically related to both the query and the events. To identify the candidates, we utilize a novel mechanism to simultaneously embed words and events in the same vector space. We show that our proposed method of leveraging events improves query expansion performance significantly compared with state-of-the-art methods on various newswire TREC datasets.
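An illustrative sketch of the candidate-selection step, assuming word and event embeddings already live in the same vector space (the joint embedding mechanism itself is not shown); the additive scoring rule and all names are assumptions:

```python
# Score vocabulary terms by their similarity to both the query and the detected
# events, and keep the top-k as expansion candidates.
import numpy as np

def expansion_candidates(query_terms, event_ids, word_vecs, event_vecs, vocab, k=10):
    def unit(v):
        return v / (np.linalg.norm(v) + 1e-12)

    q = unit(np.mean([word_vecs[t] for t in query_terms], axis=0))  # query centroid
    e = unit(np.mean([event_vecs[i] for i in event_ids], axis=0))   # event centroid
    scores = {}
    for term in vocab:
        if term in query_terms:
            continue
        v = unit(word_vecs[term])
        scores[term] = float(v @ q) + float(v @ e)  # related to the query AND the events
    return sorted(scores, key=scores.get, reverse=True)[:k]
```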
In many applications of session-based recommendation, social networks are usually available. Since users' interests are influenced by their friends, recommender systems can leverage social networks to better understand their users' preferences and thus provide more accurate recommendations. However, existing methods for session-based social recommendation are not efficient. To predict the next item of a user's ongoing session, the methods need to process many additional sessions of the user's friends to capture social influences, while non-social-aware methods (i.e., those without using social networks) only need to process one single session. To solve the efficiency issue, we propose an efficient framework for session-based social recommendation. In the framework, first, a heterogeneous graph neural network is used to learn user and item representations that integrate the knowledge from social networks. Then, to generate predictions, only the user and item representations relevant to the current session are passed to a non-social-aware model. During inference, since the user and item representations can be precomputed, the overall model runs as fast as the original non-social-aware model, while it can achieve better performance by leveraging the knowledge from social networks. Apart from being efficient, our framework has two additional advantages. First, the framework is flexible because it is compatible with any existing non-social-aware models and can easily incorporate more knowledge other than social networks. Second, our framework can capture cross-session item transitions while existing methods can only capture intra-session item transitions. Extensive experiments conducted on three public datasets demonstrate the effectiveness and efficiency of the proposed framework. Our code is available at https://github.com/twchen/SEFrame.
For many kinds of interventions, such as a new advertisement, marketing intervention, or feature recommendation, it is important to target a specific subset of people for maximizing its benefits at minimum cost or potential harm. However, a key challenge is that no data is available about the effect of such a prospective intervention since it has not been deployed yet. In this work, we propose a split-treatment analysis that ranks the individuals most likely to be positively affected by a prospective intervention using past observational data. Unlike standard causal inference methods, the split-treatment method does not need any observations of the target treatments themselves. Instead it relies on observations of a proxy treatment that is caused by the target treatment. Under reasonable assumptions, we show that the ranking of heterogeneous causal effect based on the proxy treatment is the same as the ranking based on the target treatment's effect. In the absence of any interventional data for cross-validation, Split-Treatment uses sensitivity analyses for unobserved confounding to eliminate unreliable models. We apply Split-Treatment to simulated data and a large-scale, real-world targeting task and validate our discovered rankings via a randomized experiment for the latter.
A desirable property of learning systems is to be both effective and interpretable. Towards this goal, recent models have been proposed that first generate an extractive explanation from the input text and then make a prediction based only on that explanation, so-called explain-then-predict models. These models primarily consider the task input as a supervision signal in learning an extractive explanation and do not effectively integrate rationale data as an additional inductive bias to improve task performance. We propose a novel yet simple approach, ExPred, that uses multi-task learning in the explanation generation phase to effectively trade off explanation and prediction losses. Next, we use another prediction network on just the extracted explanations to optimize the task performance. We conduct an extensive evaluation of our approach on three diverse language datasets -- sentiment classification, fact-checking, and question answering -- and find that we substantially outperform existing approaches.
Recommendation datasets are prone to selection biases due to self-selection behavior of users and item selection process of systems. This makes explicitly combating selection biases an essential problem in training recommender systems. Most previous studies assume no unbiased data available for training. We relax this assumption and assume that a small subset of training data is unbiased. Then, we propose a novel objective that utilizes the unbiased data to adaptively assign propensity weights to biased training ratings. This objective, combined with unbiased performance estimators, alleviates the effects of selection biases on the training of recommender systems. To optimize the objective, we propose an efficient algorithm that minimizes the variance of propensity estimates for better generalized recommender systems. Extensive experiments on two real-world datasets confirm the advantages of our approach in significantly reducing both the error of rating prediction and the variance of propensity estimation.
How can we build recommender systems to take into account fairness? Real-world recommender systems are often composed of multiple models, built by multiple teams. However, most research on fairness focuses on improving fairness in a single model. Further, recent research on classification fairness has shown that combining multiple "fair" classifiers can still result in an "unfair" classification system. This presents a significant challenge: how do we understand and improve fairness in recommender systems composed of multiple components?
In this paper, we study the compositionality of recommender fairness. We consider two recently proposed fairness ranking metrics: equality of exposure and pairwise ranking accuracy. While we show that fairness in recommendation is not guaranteed to compose, we provide theory for a set of conditions under which fairness of individual models does compose. We then present an analytical framework for understanding whether a real system's signals can achieve compositional fairness, and for identifying which component's improvement would have the greatest impact on the fairness of the overall system. In addition to the theoretical results, we find on multiple datasets---including a large-scale real-world recommender system---that the overall system's end-to-end fairness is largely achievable by improving fairness in individual components.
As Recommender Systems (RS) influence more and more people in their daily lives, fairness in recommendation is becoming increasingly important. Most prior approaches to fairness-aware recommendation have been situated in a static or one-shot setting, where the protected groups of items are fixed, and the model provides a one-time fairness solution based on fairness-constrained optimization. This fails to consider the dynamic nature of recommender systems, where attributes such as item popularity may change over time due to the recommendation policy and user engagement. For example, products that were once popular may lose their popularity, and vice versa. As a result, a system that aims to maintain long-term fairness of item exposure across popularity groups must accommodate such changes in a timely fashion.
In this work, we explore the novel problem of long-term fairness in recommendation and address it through dynamic fairness learning. We focus on the fairness of exposure of items in different groups, where the division of the groups is based on item popularity, which dynamically changes over time during the recommendation process. We tackle this problem by proposing a fairness-constrained reinforcement learning algorithm for recommendation, which models the recommendation problem as a Constrained Markov Decision Process (CMDP), so that the model can dynamically adjust its recommendation policy to ensure that the fairness requirement is always satisfied as the environment changes. Experiments on several real-world datasets verify our framework's superiority in terms of recommendation performance, short-term fairness, and long-term fairness.
We consider the problem of utility maximization in online ranking applications while also satisfying a pre-defined fairness constraint. We consider batches of items which arrive over time, already ranked using an existing ranking model. We propose online post-processing for re-ranking these batches to enforce adherence to the pre-defined fairness constraint, while maximizing a specific notion of utility. To achieve this goal, we propose two deterministic re-ranking policies. In addition, we learn a re-ranking policy based on a novel variation of learning to search. Extensive experiments on real world and synthetic datasets demonstrate the effectiveness of our proposed policies both in terms of adherence to the fairness constraint and utility maximization. Furthermore, our analysis shows that the performance of the proposed policies depends on the original data distribution w.r.t the fairness constraint and the notion of utility.
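For concreteness, a simple greedy re-ranking sketch in the spirit of this setting, not the paper's exact policies: it enforces a minimum fraction of protected-group items in every prefix of the output ranking while otherwise following the existing scores.

```python
# Greedy deterministic re-ranking of an already-scored batch under a
# prefix-level exposure constraint for a protected group.
def fair_rerank(items, min_frac=0.3):
    """items: list of (item_id, score, is_protected) tuples."""
    protected = sorted([x for x in items if x[2]], key=lambda x: -x[1])
    others = sorted([x for x in items if not x[2]], key=lambda x: -x[1])
    ranked, n_prot = [], 0
    while protected or others:
        # Would the next position violate the constraint without a protected item?
        need_protected = protected and n_prot < min_frac * (len(ranked) + 1)
        if need_protected or not others:
            choice = protected.pop(0); n_prot += 1
        elif not protected or others[0][1] >= protected[0][1]:
            choice = others.pop(0)           # constraint satisfied: follow the scores
        else:
            choice = protected.pop(0); n_prot += 1
        ranked.append(choice)
    return ranked
```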
Optimizing ranking systems based on user interactions is a well-studied problem. State-of-the-art methods for optimizing ranking systems based on user interactions are divided into online approaches - that learn by directly interacting with users - and counterfactual approaches - that learn from historical interactions. Existing online methods rely on online interventions and are therefore ill-suited to counterfactual application. Conversely, counterfactual methods cannot directly benefit from online interventions.
We propose a novel intervention-aware estimator for both counterfactual and online Learning to Rank (LTR). With the introduction of the intervention-aware estimator, we aim to bridge the online/counterfactual LTR division as it is shown to be highly effective in both online and counterfactual scenarios. The estimator corrects for the effect of position bias, trust bias, and item-selection bias by using corrections based on the behavior of the logging policy and on online interventions: changes to the logging policy made during the gathering of click data. Our experimental results, conducted in a semi-synthetic experimental setup, show that, unlike existing counterfactual LTR methods, the intervention-aware estimator can greatly benefit from online interventions.
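For context, a sketch of the plain inverse-propensity-scored (IPS) click loss that corrects for position bias only; the intervention-aware estimator extends this basic idea with corrections for trust bias, item-selection bias, and the logging-policy interventions, so the code below is the standard baseline it generalizes, not the estimator itself. The propensities are assumed to be known examination probabilities per rank.

```python
# Naive IPS-corrected loss for counterfactual LTR: clicked documents are treated
# as relevant and up-weighted by the inverse of their examination probability.
import torch
import torch.nn.functional as F

def ips_click_loss(scores, clicks, exam_probs):
    """scores: model scores for displayed documents; clicks: 0/1 click labels (float);
    exam_probs: estimated probability that each displayed rank was examined."""
    weights = clicks / exam_probs.clamp(min=1e-3)  # only clicked docs contribute
    per_doc = F.binary_cross_entropy_with_logits(
        scores, torch.ones_like(scores), reduction="none"
    )
    return (weights * per_doc).sum() / clicks.sum().clamp(min=1.0)
```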
In machine learning and statistics, bias and variance of supervised learning models are well studied concepts. In this work, we try to better understand how these concepts apply in the context of ranking of items (e.g., for web-search or recommender systems). We define notions of bias and variance directly on pairwise ordering of items. We show that ranking disagreements between true orderings and a ranking function can be decomposed into bias and variance components akin to the similar decomposition for the squared loss and other losses that have been previously studied. The popular ranking metric, the area under the curve (AUC), is shown to admit a similar decomposition. We demonstrate the utility of the framework in understanding differences between models. Further, directly looking at the decomposition of the ranking loss rather than the loss used for model fitting can reveal surprising insights. Specifically, we show examples where model training achieves low variance in the traditional sense, yet results in high variance (and high error) on the ranking task.
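A rough illustration of how such a decomposition can be estimated empirically, assuming several models trained on resampled data; taking the majority pairwise ordering as the "main" prediction follows the usual treatment of 0/1-style losses and is not the paper's exact formulation:

```python
# Estimate pairwise-ranking bias and variance from an ensemble of models:
# bias = disagreement of the majority ordering with the true ordering,
# variance = average disagreement of individual models with the majority.
import itertools
import numpy as np

def pairwise_bias_variance(model_scores, true_scores):
    """model_scores: array (n_models, n_items); true_scores: array (n_items,)."""
    bias_terms, var_terms = [], []
    n_items = model_scores.shape[1]
    for i, j in itertools.combinations(range(n_items), 2):
        if true_scores[i] == true_scores[j]:
            continue  # ignore ties in the true ordering
        true_order = np.sign(true_scores[i] - true_scores[j])
        votes = np.sign(model_scores[:, i] - model_scores[:, j])
        main = 1.0 if votes.mean() >= 0 else -1.0         # majority ("main") ordering
        bias_terms.append(float(main != true_order))
        var_terms.append(float(np.mean(votes != main)))
    return np.mean(bias_terms), np.mean(var_terms)
```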
Recent advances in unbiased learning to rank (LTR) count on Inverse Propensity Scoring (IPS) to eliminate bias in implicit feedback. Though theoretically sound in correcting the bias introduced by treating clicked documents as relevant, IPS ignores the bias caused by (implicitly) treating non-clicked ones as irrelevant. In this work, we first rigorously prove that such use of click data leads to unnecessary pairwise comparisons between relevant documents, which prevent unbiased ranker optimization.
Based on the proof, we derive a simple yet well-justified new weighting scheme, called Propensity Ratio Scoring (PRS), which provides treatments on both clicks and non-clicks. Besides correcting the bias in clicks, PRS avoids relevant-relevant document comparisons in LTR training and enjoys lower variability. Our extensive empirical evaluations confirm that PRS ensures a more effective use of click data and improved performance, both on synthetic data from a set of LTR benchmarks and on real-world large-scale data from GMail search.
In feeds recommendation, users can constantly browse items generated by never-ending feeds on mobile phones. The implicit feedback from users is an important resource for learning to rank; however, ranking functions built from such observed data are known to be biased. The presentation of the items influences the user's judgements and therefore introduces biases. Most previous work in the unbiased learning to rank literature focuses on position bias (i.e., an item ranked higher has more chance of being examined and interacted with). By analyzing user behaviors in product feeds recommendation, in this paper we identify and introduce context bias, the probability that a user's interaction with an item is biased by its surroundings, into unbiased learning to rank. We propose an Unbiased Learning to Rank with Combinational Propensity (ULTR-CP) framework to remove the inherent biases jointly caused by multiple factors. Under this framework, a context-aware position bias model is instantiated to estimate the unified bias considering both position and context biases. In addition to evaluating propensity score estimation approaches by ranking metrics, we also discuss the evaluation of the propensities directly by checking their balancing properties. Extensive experiments performed on a real e-commerce data set collected from JD.com verify the effect of context bias and illustrate the superiority of ULTR-CP over state-of-the-art methods.
Interpretability of ranking models is a crucial yet relatively under-examined research area. Recent progress on this area largely focuses on generating post-hoc explanations for existing black-box ranking models. Though promising, such post-hoc methods cannot provide sufficiently accurate explanations in general, which makes them infeasible in many high-stakes scenarios, especially the ones with legal or policy constraints. Thus, building an intrinsically interpretable ranking model with transparent, self-explainable structure becomes necessary, but this remains less explored in the learning-to-rank setting.
In this paper, we lay the groundwork for intrinsically interpretable learning-to-rank by introducing generalized additive models (GAMs) into ranking tasks. Generalized additive models (GAMs) are intrinsically interpretable machine learning models and have been extensively studied on regression and classification tasks. We study how to extend GAMs into ranking models which can handle both item-level and list-level features and propose a novel formulation of ranking GAMs. To instantiate ranking GAMs, we employ neural networks instead of traditional splines or regression trees. We also show that our neural ranking GAMs can be distilled into a set of simple and compact piece-wise linear functions that are much more efficient to evaluate with little accuracy loss. We conduct experiments on three data sets and show that our proposed neural ranking GAMs can outperform other traditional GAM baselines while maintaining similar interpretability.
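A minimal sketch of the additive structure of a neural ranking GAM, assuming item-level features only (the paper also handles list-level context features); the resulting scores can be plugged into any standard ranking loss:

```python
# Each feature gets its own small subnetwork; the item score is the sum of the
# per-feature outputs, which keeps each feature's contribution inspectable.
import torch
import torch.nn as nn

class NeuralRankingGAM(nn.Module):
    def __init__(self, num_features: int, hidden: int = 16):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_features)
        )

    def forward(self, x):  # x: (batch, num_features)
        contributions = [net(x[:, i : i + 1]) for i, net in enumerate(self.feature_nets)]
        return torch.stack(contributions, dim=-1).sum(dim=-1)  # (batch, 1) additive score
```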
Cloud-based file storage platforms such as Google Drive are widely used as a means for storing, editing and sharing personal and organizational documents. In this paper, we improve search ranking quality for cloud storage platforms by utilizing user activity logs. Different from search logs, activity logs capture general document usage activity beyond search, such as opening, editing and sharing documents. We propose to automatically learn text embeddings that are effective for search ranking from activity logs. We develop a novel co-access signal, i.e., whether two documents were accessed by a user around the same time, to train deep semantic matching models that are useful for improving the search ranking quality. We confirm that activity-trained semantic matching models can improve ranking by conducting extensive offline experimentation using Google Drive search and activity logs. To the best of our knowledge, this is the first work to examine the benefits of leveraging document usage activity at large scale for cloud storage search; as such it can shed light on using such activity in scenarios where direct collection of search-specific interactions (e.g., query and click logs) may be expensive or infeasible.
Knowledge tracing (KT) aims to model students' knowledge level based on their historical performance, which plays an important role in computer-assisted education and adaptive learning. Recent studies try to take temporal effects of past interactions into consideration, such as the forgetting behavior. However, existing work mainly relies on time-related features or a global decay function to model the time-sensitive effects. Fine-grained temporal dynamics of different cross-skill impacts have not been well studied (named as temporal cross-effects). For example, cross-effects on some difficult skills may drop quickly, and the effects caused by distinct previous interactions may also have different temporal evolutions, which cannot be captured in a global way. In this work, we investigate fine-grained temporal cross-effects between different skills in KT. We first validate the existence of temporal cross-effects in real-world datasets through empirical studies. Then, a novel model, HawkesKT, is proposed to explicitly model the temporal cross-effects inspired by the point process, where each previous interaction will have different time-sensitive impacts on the mastery of the target skill. HawkesKT adopts two components to model temporal cross-effects: 1) mutual excitation represents the degree of cross-effects and 2) kernel function controls the adaptive temporal evolution. To the best of our knowledge, we are the first to introduce Hawkes process to model temporal cross-effects in KT. Extensive experiments on three benchmark datasets show that HawkesKT is superior to state-of-the-art KT methods. Remarkably, our method also exhibits excellent interpretability and shows significant advantages in training efficiency, which makes it more applicable in real-world large-scale educational settings.
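A toy illustration of the two components above, assuming an exponential kernel; the mutual-excitation and decay parameters, and the way the aggregate effect would feed a correctness prediction, are simplifications of the full model:

```python
# Aggregate the time-decayed cross-skill effects of past interactions on the
# mastery of a target skill (a Hawkes-process-style intensity term).
import numpy as np

def cross_effect(history, target_skill, t_now, alpha, beta):
    """history: list of (skill_id, correct, timestamp);
    alpha[s, k]: excitation of skill k's mastery by an interaction on skill s;
    beta[s, k]: temporal decay rate of that effect."""
    effect = 0.0
    for skill, correct, t in history:
        sign = 1.0 if correct else -1.0
        decay = np.exp(-beta[skill, target_skill] * (t_now - t))  # kernel function
        effect += sign * alpha[skill, target_skill] * decay       # mutual excitation
    return effect  # e.g., passed through a sigmoid to predict correctness
```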
In recent years, a plethora of fact checking and fact verification techniques have been developed to detect the veracity or factuality of online information text for various applications. However, limited efforts have been undertaken to understand the interpretability of such veracity detection, i.e. explaining why a particular piece of text is factually correct or incorrect. In this work, we seek to bridge this gap by proposing a technique, FACE-KEG, to automatically perform explainable fact checking. Given an input fact or claim, our proposed model constructs a relevant knowledge graph for it from a large-scale structured knowledge base. This graph is encoded via a novel graph transforming encoder. Our model also simultaneously retrieves and encodes relevant textual context about the input text from the knowledge base. FACE-KEG then jointly exploits both the concept-relationship structure of the knowledge graph as well as semantic contextual cues in order to (i) detect the veracity of an input fact, and (ii) generate a human-comprehensible natural language explanation justifying the fact's veracity. We conduct extensive experiments on three large-scale datasets, and demonstrate the effectiveness of FACE-KEG while performing fact checking. Automatic and human evaluations further show that FACE-KEG significantly outperforms competitive baselines in learning concise, coherent and informative explanations for the input facts.
Representation learning for temporal knowledge graphs has attracted increasing attention in recent years. In this paper, we study the problem of learning dynamic embeddings for temporal knowledge graphs. We address this problem by proposing a Dynamic Bayesian Knowledge Graphs Embedding model (DBKGE), which is able to dynamically track the semantic representations of entities over time in a joint metric space and make predictions for the future. Unlike other temporal knowledge graph embedding methods, DBKGE is a novel probabilistic representation learning method that aims at inferring dynamic embeddings of entities in a streaming scenario. To obtain high-quality embeddings and model their uncertainty, our DBKGE embeds entities with means and variances of Gaussian distributions. Based on amortized inference, an online inference algorithm is proposed to jointly learn the latent representations of entities and smooth their changes across time. Experiments on Yago and Wiki datasets demonstrate that our proposed algorithm outperforms the state-of-the-art static and temporal knowledge graph embedding models.
Food recommendation has become an important means to help guide users to adopt healthy dietary habits. Previous works on food recommendation either i) fail to consider users' explicit requirements, ii) ignore crucial health factors (e.g., allergies and nutrition needs), or iii) do not utilize the rich food knowledge for recommending healthy recipes. To address these limitations, we propose a novel problem formulation for food recommendation, modeling this task as constrained question answering over a large-scale food knowledge base/graph (KBQA). Besides the requirements from the user query, personalized requirements from the user's dietary preferences and health guidelines are handled in a unified way as additional constraints to the QA system. To validate this idea, we create a QA style dataset for personalized food recommendation based on a large-scale food knowledge graph and health guidelines. Furthermore, we propose a KBQA-based personalized food recommendation framework which is equipped with novel techniques for handling negations and numerical comparisons in the queries. Experimental results on the benchmark show that our approach significantly outperforms non-personalized counterparts (average 59.7% absolute improvement across various evaluation metrics), and is able to recommend more relevant and healthier recipes.
Multi-hop Knowledge Base Question Answering (KBQA) aims to find the answer entities that are multiple hops away in the Knowledge Base (KB) from the entities in the question. A major challenge is the lack of supervision signals at intermediate steps. Therefore, multi-hop KBQA algorithms can only receive the feedback from the final answer, which makes the learning unstable or ineffective. To address this challenge, we propose a novel teacher-student approach for the multi-hop KBQA task. In our approach, the student network aims to find the correct answer to the query, while the teacher network tries to learn intermediate supervision signals for improving the reasoning capacity of the student network. The major novelty lies in the design of the teacher network, where we utilize both forward and backward reasoning to enhance the learning of intermediate entity distributions. By considering bidirectional reasoning, the teacher network can produce more reliable intermediate supervision signals, which can alleviate the issue of spurious reasoning. Extensive experiments on three benchmark datasets have demonstrated the effectiveness of our approach on the KBQA task.
Community Question Answering (CQA) sites such as Yahoo! Answers and Baidu Knows have emerged as rich knowledge resources for information seekers. However, answers posted to CQA sites often vary greatly in quality. User votes from the community may partially reflect the overall quality of an answer, but they are often missing. Hence, automatic selection of "good" answers becomes a practical research problem that will help us manage the quality of accumulated knowledge. Without loss of generality, a good answer should deliver not only relevant but also trustworthy information that can help resolve the information needs of the posted question, but the latter aspect has received less investigation in the past. In this paper, we propose a novel matching-verification framework for automatic answer selection. The matching component assesses the relevance of a candidate answer to a given question, as in conventional QA methods. The major enhancement is the verification component, which aims to leverage the wisdom of crowds, e.g., some big information repository, for trustworthiness measurement. Given a question, we take the top retrieved results from the information repository as supporting evidence to distill a consensus representation. A major challenge is that there is no guarantee that one can always obtain a reliable consensus from the wisdom of crowds for a question, due to the noisy nature and the limitations of existing search technology. Therefore, we decompose the trustworthiness measurement into two parts, i.e., a verification score which measures the consistency between a candidate answer and the consensus representation, and a confidence score which measures the reliability of the consensus itself. Empirical studies on three real-world CQA data collections, i.e., YahooQA, QuoraQA and AmazonQA, show that our approach significantly outperforms state-of-the-art methods on the answer selection task.
In recent years, marked temporal point processes (MTPPs) have emerged as a powerful modeling machinery to characterize asynchronous events in a wide variety of applications. MTPPs have demonstrated significant potential in predicting event-timings, especially for events arriving in near future. However, due to current design choices, MTPPs often show poor predictive performance at forecasting event arrivals in distant future. To ameliorate this limitation, in this paper, we design DualTPP which is specifically well-suited to long horizon event forecasting. DualTPP has two components. The first component is an intensity free MTPP model, which captures microscopic event dynamics by modeling the time of future events. The second component takes a different dual perspective of modeling aggregated counts of events in a given time-window, thus encapsulating macroscopic event dynamics. Then we develop a novel inference framework jointly over the two models by solving a sequence of constrained quadratic optimization problems. Experiments with a diverse set of real datasets show that DualTPP outperforms existing MTPP methods on long horizon forecasting by substantial margins, achieving almost an order of magnitude reduction in Wasserstein distance between actual events and forecasts. The code and the datasets can be found at the following URL: https://github.com/pratham16cse/DualTPP
The accurate and interpretable prediction of future events in time-series data often requires the capturing of representative patterns (or referred to as states) underpinning the observed data. To this end, most existing studies focus on the representation and recognition of states, but ignore the changing transitional relations among them. In this paper, we present evolutionary state graph, a dynamic graph structure designed to systematically represent the evolving relations (edges) among states (nodes) along time. We conduct analysis on the dynamic graphs constructed from the time-series data and show that changes on the graph structures (e.g., edges connecting certain state nodes) can inform the occurrences of events (i.e., time-series fluctuation). Inspired by this, we propose a novel graph neural network model, Evolutionary State Graph Network (EvoNet), to encode the evolutionary state graph for accurate and interpretable time-series event prediction. Specifically, EvoNet models both the node-level (state-to-state) and graph-level (segment-to-segment) propagation, and captures the node-graph (state-to-segment) interactions over time. Experimental results based on five real-world datasets show that our approach not only achieves clear improvements compared with 11 baselines, but also provides more insights towards explaining the results of event predictions.
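An illustrative construction of such a graph sequence, using k-means over fixed-length segments as a stand-in for whatever state-recognition model is used; segment length, number of states, and window size are placeholders:

```python
# Build one weighted state-transition graph per window of consecutive segments;
# changes in these adjacency matrices can then signal time-series events.
import numpy as np
from sklearn.cluster import KMeans

def evolutionary_state_graphs(series, seg_len=24, n_states=8, window=10):
    segments = np.array([series[i:i + seg_len]
                         for i in range(0, len(series) - seg_len + 1, seg_len)])
    states = KMeans(n_clusters=n_states, n_init=10).fit_predict(segments)
    graphs = []
    for start in range(0, len(states) - window):
        adj = np.zeros((n_states, n_states))
        for a, b in zip(states[start:start + window - 1], states[start + 1:start + window]):
            adj[a, b] += 1.0  # edge weight = transition count within the window
        graphs.append(adj)
    return graphs
```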
Edge streams are commonly used to capture interactions in dynamic networks, such as email, social, or computer networks. The problem of detecting anomalies or rare events in edge streams has a wide range of applications. However, it presents many challenges due to lack of labels, a highly dynamic nature of interactions, and the entanglement of temporal and structural changes in the network. Current methods are limited in their ability to address the above challenges and to efficiently process a large number of interactions. Here, we propose F-FADE, a new approach for detection of anomalies in edge streams, which uses a novel frequency-factorization technique to efficiently model the time-evolving distributions of frequencies of interactions between node-pairs. The anomalies are then determined based on the likelihood of the observed frequency of each incoming interaction. F-FADE is able to handle in an online streaming setting a broad variety of anomalies with temporal and structural changes, while requiring only constant memory. Our experiments on one synthetic and six real-world dynamic networks show that F-FADE achieves state of the art performance and may detect anomalies that previous methods are unable to find.
Recent methods in sequential recommendation focus on learning an overall embedding vector from a user's behavior sequence for the next-item recommendation. However, from empirical analysis, we discovered that a user's behavior sequence often contains multiple conceptually distinct items, while a unified embedding vector is primarily affected by one's most recent frequent actions. Thus, it may fail to infer the next preferred item if conceptually similar items are not dominant in recent interactions. To this end, an alternative solution is to represent each user with multiple embedding vectors encoding different aspects of the user's intentions. Nevertheless, recent work on multi-interest embedding usually considers a small number of concepts discovered via clustering, which may not be comparable to the large pool of item categories in real systems. It is a non-trivial task to effectively model a large number of diverse conceptual prototypes, as items are often not conceptually well clustered in fine granularity. Besides, an individual usually interacts with only a sparse set of concepts. In light of this, we propose a novel Sparse Interest NEtwork (SINE) for sequential recommendation. Our sparse-interest module can adaptively infer a sparse set of concepts for each user from the large concept pool and output multiple embeddings accordingly. Given multiple interest embeddings, we develop an interest aggregation module to actively predict the user's current intention and then use it to explicitly model multiple interests for next-item prediction. Empirical results on several public benchmark datasets and one large-scale industrial dataset demonstrate that SINE can achieve substantial improvement over state-of-the-art methods.
Many real-world applications, e.g., healthcare, present multi-variate time series prediction problems. In such settings, in addition to the predictive accuracy of the models, model transparency and explainability are paramount. We consider the problem of building explainable classifiers from multi-variate time series data. A key criterion to understand such predictive models involves elucidating and quantifying the contribution of time varying input variables to the classification. Hence, we introduce a novel, modular, convolution-based feature extraction and attention mechanism that simultaneously identifies the variables as well as time intervals which determine the classifier output. We present results of extensive experiments with several benchmark data sets that show that the proposed method outperforms the state-of-the-art baseline methods on multi-variate time series classification task. The results of our case studies demonstrate that the variables and time intervals identified by the proposed method make sense relative to available domain knowledge.
Air pollution is an environmental issue of increasing concern that impacts human health. Accurate air quality prediction is crucial for protecting people from serious air pollution. Most prior works focus on capturing the temporal trend of air quality for each monitoring station. Recent deep learning based methods also model spatial dependencies among neighboring stations. However, we observe that besides geospatially adjacent stations, stations which share similar functionalities or consistent temporal patterns can also have strong dependencies. In this paper, we propose an Attentive Temporal Graph Convolutional Network (ATGCN) to model diverse inter-station relationships for air quality prediction of citywide stations. Specifically, we first encode three types of relationships among stations including spatial adjacency, functional similarity, and temporal pattern similarity into graphs. Then we design parallel encoding modules, which respectively incorporate attentive graph convolution operations into the Gated Recurrent Units (GRUs) to iteratively aggregate features from related stations with different graphs. Furthermore, augmented with an attention-based fusion unit, decoding modules with a similar structure to the encoding modules are designed to generate multi-step predictions for all stations. The experiments on two real-world datasets demonstrate the superior performance of our model over state-of-the-art methods.
Bipartite graph embedding has recently attracted much attention due to the fact that bipartite graphs are widely used in various application domains. Most previous methods, which adopt random walk-based or reconstruction-based objectives, are typically effective to learn local graph structures. However, the global properties of bipartite graph, including community structures of homogeneous nodes and long-range dependencies of heterogeneous nodes, are not well preserved. In this paper, we propose a bipartite graph embedding called BiGI to capture such global properties by introducing a novel local-global infomax objective. Specifically, BiGI first generates a global representation which is composed of two prototype representations. BiGI then encodes sampled edges as local representations via the proposed subgraph-level attention mechanism. Through maximizing the mutual information between local and global representations, BiGI enables nodes in bipartite graph to be globally relevant. Our model is evaluated on various benchmark datasets for the tasks of top-K recommendation and link prediction. Extensive experiments demonstrate that BiGI achieves consistent and significant improvements over state-of-the-art baselines. Detailed analyses verify the high effectiveness of modeling the global properties of bipartite graph.
Graph centrality measures use the structure of a network to quantify central or "important" nodes, with applications in web search, social media analysis, and graphical data mining generally. Traditional centrality measures such as the well known PageRank interpret a directed edge as a vote in favor of the importance of the linked node. We study the case where nodes may belong to diverse communities or interests and investigate centrality measures that can identify nodes that are simultaneously important to many such diverse communities. We propose a family of diverse centrality measures formed as fixed point solutions to a generalized nonlinear eigenvalue problem. Our measure can be efficiently computed on large graphs by iterated best response and we study its normative properties on both random graph models and real-world data. We find that we are consistently and efficiently able to identify the most important diverse nodes of a graph, that is, those that are simultaneously central to multiple communities.
Accelerating Deep Convolutional Neural Networks (CNNs) has recently received ever-increasing research focus. Among the various approaches proposed in the literature, filter pruning has been regarded as a promising solution, due to its significant speedup and memory reduction of both the network model and the intermediate feature maps. Previous works utilized a "smaller-norm-less-important" criterion to prune filters with smaller lp-norm values, by pruning and retraining alternately. However, they ignore the effects of feedback: most current approaches that prune filters only consider the statistics of the filters (e.g., pruning filters with small lp-norm values), without considering the performance of the pruned model as an important feedback signal in the next iteration of filter pruning. To address this lack of feedback, we propose a novel filter pruning method, namely Filter Pruning via Probabilistic Model-based Optimization (FPPMO). FPPMO solves the non-feedback problem by pruning filters in a probabilistic manner. We introduce a pruning probability for each filter, and pruning is guided by sampling from the pruning probability distribution. An optimization method is proposed to update the pruning probability based on the performance of the pruned model during the pruning process. When applied to two image classification benchmarks, the effectiveness of our FPPMO is validated. Notably, on CIFAR-10, FPPMO reduces more than 57% of FLOPs on ResNet-110 with even a 0.08% relative accuracy improvement. Moreover, on ILSVRC-2012, FPPMO reduces more than 50% of FLOPs on ResNet-101 without any top-5 accuracy drop, showing that FPPMO outperforms state-of-the-art filter pruning methods.
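A heavily simplified illustration of the feedback loop, not FPPMO's actual optimization method: pruning probabilities are nudged toward sampled masks that preserve accuracy, with `evaluate` standing in for an assumed user-supplied evaluation of the pruned model.

```python
# Sample pruning masks from per-filter probabilities, evaluate the pruned model,
# and update the probabilities using the observed performance as feedback.
import numpy as np

def update_pruning_probs(probs, model, evaluate, n_samples=8, lr=0.1):
    """probs[i]: current probability of pruning filter i (numpy array)."""
    masks, rewards = [], []
    for _ in range(n_samples):
        mask = (np.random.rand(len(probs)) < probs).astype(float)  # 1 = pruned
        masks.append(mask)
        rewards.append(evaluate(model, mask))  # e.g., validation accuracy when pruned
    rewards = np.array(rewards)
    baseline = rewards.mean()
    for mask, r in zip(masks, rewards):
        # Filters whose pruning co-occurred with above-average accuracy become
        # more likely to be pruned, and vice versa.
        probs += lr * (r - baseline) * (mask - probs)
    return np.clip(probs, 0.01, 0.99)
```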
Knowledge tracing is a fundamental task in intelligent education for tracking the knowledge states of students on necessary concepts. In recent years, Deep Knowledge Tracing (DKT) utilizes recurrent neural networks to model student learning sequences. This approach has achieved significant success and has been widely used in many educational applications. However, in practical scenarios, it tends to suffer from the following critical problems due to data isolation: 1) Data scarcity. Educational data, which is usually distributed across different silos (e.g., schools), is difficult to gather. 2) Different data quality. Students in different silos have different learning schedules, which results in unbalanced learning records, meaning that it is necessary to evaluate the learning data quality independently for different silos. 3) Data incomparability. It is difficult to compare the knowledge states of students with different learning processes from different silos. Inspired by federated learning, in this paper, we propose a novel Federated Deep Knowledge Tracing (FDKT) framework to collectively train high-quality DKT models for multiple silos. In this framework, each client takes charge of training a distributed DKT model and evaluating data quality by leveraging its own local data, while a central server is responsible for aggregating models and updating the parameters for all the clients. In particular, on the client side, we evaluate data quality by incorporating different educational measurement theories, and we construct two quality-oriented implementations based on FDKT, i.e., FDKT-CTT and FDKT-IRT, where the data quality evaluation follows Classical Test Theory and Item Response Theory, respectively. Moreover, on the server side, we adopt hierarchical model interpolation to incorporate local effects for model personalization. Extensive experiments on real-world datasets demonstrate the effectiveness and superiority of the FDKT framework.
A lot of research has focused on the efficiency of search engine query processing, and in particular on disjunctive top-k queries that return the highest scoring k results that contain at least one of the query terms. Disjunctive top-k queries over simple ranking functions are commonly used to retrieve an initial set of candidate results that are then reranked by more complex, often machine-learned rankers. Many optimized top-k algorithms have been proposed, including MaxScore, WAND, BMW, and JASS. While the fastest methods achieve impressive results on top-10 and top-100 queries, they tend to become much slower for the larger k commonly used for candidate generation. In this paper, we focus on disjunctive top-k queries for larger k. We propose new algorithms that achieve much faster query processing for values of k up to thousands or tens of thousands. Our algorithms build on top of the live-block filtering approach of Dimopoulos et al., and exploit the SIMD capabilities of modern CPUs. We also perform a detailed experimental comparison of our methods with the fastest known approaches, and release a full implementation of our methods and of the underlying live-block mechanism, which will allow others to design and experiment with additional methods under the live-block approach.
Graph neural networks (GNNs) have shown great power in modeling graph structured data. However, similar to other machine learning models, GNNs may make predictions biased on protected sensitive attributes, e.g., skin color and gender, because machine learning algorithms, including GNNs, are trained to reflect the distribution of the training data, which often contains historical bias towards sensitive attributes. In addition, the discrimination in GNNs can be magnified by graph structures and the message-passing mechanism. As a result, the applications of GNNs in sensitive domains such as crime rate prediction would be largely limited. Though extensive studies of fair classification have been conducted on i.i.d. data, methods to address the problem of discrimination on non-i.i.d. data are rather limited. Furthermore, the practical scenario of sparse annotations of sensitive attributes is rarely considered in existing works. Therefore, we study the novel and important problem of learning fair GNNs with limited sensitive attribute information. FairGNN is proposed to eliminate the bias of GNNs whilst maintaining high node classification accuracy by leveraging graph structures and limited sensitive information. Our theoretical analysis shows that FairGNN can ensure the fairness of GNNs under mild conditions given limited nodes with known sensitive attributes. Extensive experiments on real-world datasets also demonstrate the effectiveness of FairGNN in debiasing and keeping high accuracy.
Finding dense regions of graphs is fundamental in graph mining. We focus on the computation of dense hierarchies and regions with graph nuclei---a generalization of k-cores and trusses. Static computation of nuclei, namely through variants of 'peeling', is easy to understand and implement. However, many practically important graphs undergo continuous change. Dynamic algorithms, which maintain nucleus computations on dynamic graph streams, are nuanced and require significant effort to port between nuclei, e.g., from k-cores to trusses.
We propose a unifying framework to maintain nuclei in dynamic graph streams. First, we show that no dynamic algorithm can asymptotically beat re-computation, highlighting the need to experimentally understand variability. Next, we prove equivalence between k-cores on a special hypergraph and nuclei. Our algorithm splits the problem into maintaining the special hypergraph and maintaining k-cores on it. We implement our algorithm and experimentally demonstrate improvements of up to 108x over re-computation. We show that algorithmic improvements on k-cores apply to trusses and outperform truss-specific implementations.
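For reference, the static 'peeling' computation of k-core numbers that such dynamic frameworks aim to maintain incrementally (shown here on an ordinary graph via networkx, not on the special hypergraph used in the reduction):

```python
# Compute core numbers by repeatedly peeling away low-degree vertices.
import networkx as nx

def core_numbers_by_peeling(G: nx.Graph):
    G = G.copy()
    G.remove_edges_from(nx.selfloop_edges(G))
    core, k = {}, 0
    while G.number_of_nodes() > 0:
        # Remove all nodes of degree <= k (repeatedly, since removals lower degrees);
        # each removed node gets core number k. Survivors have core number > k.
        while True:
            low = [v for v, d in G.degree() if d <= k]
            if not low:
                break
            for v in low:
                core[v] = k
                G.remove_node(v)
        k += 1
    return core  # matches nx.core_number on the original simple graph
```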
Despite achieving strong performance on the semi-supervised node classification task, graph neural networks (GNNs) are vulnerable to adversarial attacks, similar to other deep learning models. Existing research focuses on developing either robust GNN models or attack detection methods against adversarial attacks on graphs. However, little research attention has been paid to the potential and practice of immunization against adversarial attacks on graphs. In this paper, we propose and formulate the graph adversarial immunization problem, i.e., vaccinating an affordable fraction of node pairs, connected or unconnected, to improve the certifiable robustness of the graph against any admissible adversarial attack. We further propose an effective algorithm, called AdvImmune, which optimizes with meta-gradient in a discrete way to circumvent the computationally expensive combinatorial optimization when solving the adversarial immunization problem. Experiments are conducted on two citation networks and one social network. Experimental results demonstrate that the proposed AdvImmune method remarkably improves the ratio of robust nodes by 12%, 42%, and 65%, with an affordable immune budget of only 5% of edges.
Sponsored search auction is a crucial component of modern search engines. It requires a set of candidate bidwords that advertisers can place bids on. Existing methods generate bidwords from search queries or advertisement content. However, they suffer from the data noise in (query, bidword) and (advertisement, bidword) pairs. In this paper, we propose a triangular bidword generation model (TRIDENT), which takes the high-quality data of paired (query, advertisement) as a supervision signal to indirectly guide the bidword generation process. Our proposed model is simple yet effective: by using bidword as the bridge between search query and advertisement, the generation of search query, advertisement and bidword can be jointly learned in the triangular training framework. This alleviates the problem that the training data of bidword may be noisy. Experimental results, including automatic and human evaluations, show that our proposed TRIDENT can generate relevant and diverse bidwords for both search queries and advertisements. Our evaluation on online real data validates the effectiveness of the TRIDENT's generated bidwords for product search.
Modeling user interests is crucial in real-world recommender systems. In this paper, we present a new user interest representation model for personalized recommendation. Specifically, the key novelty behind our model is that it explicitly models user interests as a hypercuboid instead of a point in the space. In our approach, the recommendation score is learned by calculating a compositional distance between the user hypercuboid and the item. This helps to alleviate the potential geometric inflexibility of existing collaborative filtering approaches, enabling a greater extent of modeling capability. Furthermore, we present two variants of hypercuboids to enhance the capability in capturing the diversities of user interests. A neural architecture is also proposed to facilitate user hypercuboid learning by capturing the activity sequences (e.g., buy and rate) of users. We demonstrate the effectiveness of our proposed model via extensive experiments on both public and commercial datasets. Empirical results show that our approach achieves very promising results, outperforming existing state-of-the-art.
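A sketch of the geometric idea, assuming an axis-aligned hypercuboid parameterized by a center and per-dimension half-widths; the paper's compositional distance combines an outside and an inside term, of which a simple version is shown here:

```python
# Score an item by its (negative) compositional distance to the user hypercuboid.
import numpy as np

def hypercuboid_score(item_vec, center, half_widths, inside_weight=0.5):
    offset = np.abs(item_vec - center)
    outside = np.maximum(offset - half_widths, 0.0)  # zero if the item lies inside the box
    inside = np.minimum(offset, half_widths)         # distance from the center, capped at the faces
    distance = np.linalg.norm(outside) + inside_weight * np.linalg.norm(inside)
    return -distance  # higher score = item better matches the user's interest region
```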
Recently, graph neural networks have been widely used for network embedding because of their prominent performance in pairwise relationship learning. In the real world, a more natural and common situation is the coexistence of pairwise relationships and complex non-pairwise relationships, which is, however, rarely studied. In light of this, we propose a graph neural network-based representation learning framework for heterogeneous hypergraphs, an extension of conventional graphs, which can well characterize multiple non-pairwise relations. Our framework first projects the heterogeneous hypergraph into a series of snapshots, and then we adopt the wavelet basis to perform localized hypergraph convolution. Since the wavelet basis is usually much sparser than the Fourier basis, we develop an efficient polynomial approximation to the basis to replace the time-consuming Laplacian decomposition. Extensive evaluations have been conducted and the experimental results show the superiority of our method. In addition to the standard tasks of network embedding evaluation such as node classification, we also apply our method to the task of spammer detection, and the superior performance of our framework shows that relationships beyond pairwise ones are also advantageous in spammer detection. To make our experiments repeatable, source codes and related datasets are available at https://xiangguosun.mystrikingly.com
This work presents a generalized local factor model, namely Local Collaborative Autoencoders (LOCA). To our knowledge, it is the first generalized framework under the local low-rank assumption that builds on the neural recommendation models. We explore a large number of local models by adopting a generalized framework with different weight schemes for training and aggregating them. Besides, we develop a novel method of discovering a sub-community to maximize the coverage of local models. Our experimental results demonstrate that LOCA is highly scalable, achieving state-of-the-art results by outperforming existing AE-based and local latent factor models on several large-scale public benchmarks.
Given an undirected graph, the Densest-k-Subgraph problem (DkS) seeks to find a subset of k vertices such that the sum of the edge weights in the corresponding subgraph is maximized. The problem is known to be NP-hard, and is also very difficult to approximate in the worst case. In this paper, we present a new convex relaxation for the problem. Our key idea is to reformulate DkS as minimizing a submodular function subject to a cardinality constraint. Exploiting the fact that submodular functions possess a convex, continuous extension (known as the Lovasz extension), we propose to minimize the Lovasz extension over the convex hull of the cardinality constraints. Although the Lovasz extension of a submodular function does not admit an analytical form in general, for DkS we show that it does. We leverage this result to develop a highly scalable algorithm based on the Alternating Direction Method of Multipliers (ADMM) for solving the relaxed problem. Coupled with a pair of fortuitously simple rounding schemes, we demonstrate that our approach outperforms existing baselines on real-world graphs and can yield high-quality sub-optimal solutions which are typically, a posteriori, no worse than 65-80% of the optimal density.
In signed networks, each edge is labeled as either positive or negative. The edge sign captures the polarity of a relationship. Balance of signed networks is a well-studied property in graph theory. In a balanced (sub)graph, the vertices can be partitioned into two subsets with negative edges present only across the partitions. Balanced portions of a graph have been shown to increase coherence among their members and lead to better performance. While existing works have focused primarily on finding the largest balanced subgraph inside a graph, we study the network design problem of maximizing balance of a target community (subgraph). In particular, given a budget b and a community of interest within the signed network, we aim to make the community as close to being balanced as possible by deleting up to b edges. Besides establishing NP-hardness, we also show that the problem is non-monotone and non-submodular. To overcome these computational challenges, we propose heuristics based on the spectral relation of balance with the Laplacian spectrum of the network. Since the spectral approach lacks approximation guarantees, we further design a greedy algorithm, and its randomized version, with provable bounds on the approximation quality. The bounds are derived by exploiting pseudo-submodularity of the balance maximization function. Empirical evaluation on eight real-world signed networks establishes that the proposed algorithms are effective, efficient, and scalable to graphs with millions of edges.
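A minimal sketch of the spectral connection mentioned above: the smallest eigenvalue of the signed Laplacian of a connected signed graph is zero exactly when the graph is balanced, so it can serve as a "frustration" score, and a simple greedy heuristic deletes, within a budget, the edge whose removal reduces this score the most. This brute-force version is only for illustration; the paper's heuristics and approximation-guaranteed greedy algorithm are more sophisticated.

```python
import numpy as np

def signed_laplacian(adj):
    """Signed Laplacian L = D_abs - A, where D_abs = diag(sum_j |A_ij|)."""
    return np.diag(np.abs(adj).sum(axis=1)) - adj

def frustration(adj):
    """Smallest eigenvalue of the signed Laplacian: zero iff the (connected) graph is balanced."""
    return np.linalg.eigvalsh(signed_laplacian(adj)).min()

def greedy_balance(adj, budget):
    """Greedily delete up to `budget` edges to reduce spectral frustration (illustrative heuristic)."""
    adj = adj.copy()
    for _ in range(budget):
        best, best_val = None, frustration(adj)
        for i, j in zip(*np.triu_indices_from(adj, k=1)):
            if adj[i, j] == 0:
                continue
            trial = adj.copy()
            trial[i, j] = trial[j, i] = 0          # tentatively delete edge (i, j)
            val = frustration(trial)
            if val < best_val:
                best, best_val = (int(i), int(j)), val
        if best is None:                            # no single deletion helps any more
            break
        i, j = best
        adj[i, j] = adj[j, i] = 0
    return adj
```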
Influence diffusion estimation is a crucial problem in social network analysis. Most prior works mainly focus on predicting the total influence spread, i.e., the expected number of influenced nodes given an initial set of active nodes (aka. seeds). However, accurate estimation of susceptibility, i.e., the probability of being influenced for each individual, is more appealing and valuable in real-world applications. Previous methods generally adopt Monte Carlo simulation or heuristic rules to estimate the influence, resulting in high computational cost or unsatisfactory estimation error when these methods are used to estimate susceptibility. In this work, we propose to leverage graph neural networks (GNNs) for predicting susceptibility. As GNNs aggregate multi-hop neighbor information and can generate over-smoothed representations, the prediction quality for susceptibility is unsatisfactory. To address the shortcomings of GNNs for susceptibility estimation, we propose a novel DeepIS model with a two-step approach: (1) a coarse-grained step where we estimate each node's susceptibility coarsely; (2) a fine-grained step where we aggregate neighbors' coarse-grained susceptibility estimates to compute the fine-grained estimate for each node. The two modules are trained in an end-to-end manner. We conduct extensive experiments and show that on average DeepIS achieves five times smaller estimation error than state-of-the-art GNN approaches and is two orders of magnitude faster than Monte Carlo simulation.
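The two-step idea can be illustrated with a tiny numerical sketch, assuming an independent-cascade-style influence matrix: a coarse step propagates activation probabilities from the seeds for a few hops, and a fine step refines each node's estimate by mixing in its in-neighbors' coarse estimates. The propagation rule and mixing weight below are assumptions for illustration, not DeepIS's learned modules.

```python
import numpy as np

def coarse_susceptibility(prob_adj, seeds, hops=3):
    """Coarse step: propagate activation probabilities from the seed set for a few hops.

    prob_adj : (n, n) matrix with prob_adj[u, v] = influence probability p(u -> v)
    seeds    : list of seed node indices
    """
    n = prob_adj.shape[0]
    p = np.zeros(n)
    p[seeds] = 1.0
    for _ in range(hops):
        # probability of NOT being activated by any in-neighbor, assuming independence
        not_activated = np.prod(1.0 - prob_adj * p[:, None], axis=0)
        p = np.maximum(p, 1.0 - not_activated)
    return p

def fine_susceptibility(prob_adj, coarse, weight=0.5):
    """Fine step: blend each node's coarse estimate with the average of its in-neighbors'."""
    mask = (prob_adj > 0).astype(float)
    deg = np.maximum(mask.sum(axis=0), 1.0)        # in-degree per node
    neighbor_avg = mask.T @ coarse / deg
    return (1.0 - weight) * coarse + weight * neighbor_avg

prob_adj = np.array([[0.0, 0.4, 0.0], [0.0, 0.0, 0.5], [0.2, 0.0, 0.0]])
coarse = coarse_susceptibility(prob_adj, seeds=[0])
print(coarse, fine_susceptibility(prob_adj, coarse))
```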
Categorizing documents into a given label hierarchy is intuitively appealing due to the ubiquity of hierarchical topic structures in massive text corpora. Although related studies have achieved satisfying performance in fully supervised hierarchical document classification, they usually require massive human-annotated training data and only utilize text information. However, in many domains, (1) annotations are quite expensive, so very few training samples can be acquired; (2) documents are accompanied by metadata information. Hence, this paper studies how to integrate the label hierarchy, metadata, and text signals for document categorization under weak supervision. We develop HiMeCat, an embedding-based generative framework for our task. Specifically, we propose a novel joint representation learning module that allows simultaneous modeling of category dependencies, metadata information and textual semantics, and we introduce a data augmentation module that hierarchically synthesizes training documents to complement the original, small-scale training set. Our experiments demonstrate a consistent improvement of HiMeCat over competitive baselines and validate the contribution of our representation learning and data augmentation modules.
Graph Neural Networks (GNNs) have been shown to be powerful tools for graph analytics. The key idea is to recursively propagate and aggregate information along the edges of the given graph. Despite their success, however, the existing GNNs are usually sensitive to the quality of the input graph. Real-world graphs are often noisy and contain task-irrelevant edges, which may lead to suboptimal generalization performance in the learned GNN models. In this paper, we propose PTDNet, a parameterized topological denoising network, to improve the robustness and generalization performance of GNNs by learning to drop task-irrelevant edges. PTDNet prunes task-irrelevant edges by penalizing the number of edges in the sparsified graph with parameterized networks. To take into consideration the topology of the entire graph, the nuclear norm regularization is applied to impose the low-rank constraint on the resulting sparsified graph for better generalization. PTDNet can be used as a key component in GNN models to improve their performance on various tasks, such as node classification and link prediction. Experimental studies on both synthetic and benchmark datasets show that PTDNet can improve the performance of GNNs significantly and the performance gain becomes larger for noisier datasets.
Citing comprehensive and correct related work is crucial in academic writing. It can not only support the author's claims but also help readers trace other related research papers. Nowadays, with the rapid increase in the volume of scientific literature, it has become increasingly challenging to search for high-quality citations while writing a manuscript. In this paper, we present an automatic writing assistant model, AutoCite, which not only infers potentially related work but also automatically generates the citation context. Specifically, AutoCite involves a novel multi-modal encoder and a multi-task decoder architecture. Based on the multi-modal inputs, the encoder in AutoCite learns paper representations from both citation network structure and textual contexts. The multi-task decoder in AutoCite couples and jointly learns citation prediction and context generation in a unified manner. To effectively join the encoder and decoder, we introduce a novel representation fusion component, i.e., gated neural fusion, which takes the multi-modal representation inputs from the encoder and adaptively creates outputs for the downstream multi-task decoder. Extensive experiments on five real-world citation network datasets validate the effectiveness of our model.
Attributed network embedding aims to learn low dimensional node representations by combining both the network's topological structure and node attributes. Most of the existing methods either propagate the attributes over the network structure or learn the node representations by an encoder-decoder framework. However, propagation based methods tend to prefer network structure to node attributes, whereas encoder-decoder methods tend to ignore the longer connections beyond the immediate neighbors. In order to address these limitations while enjoying the best of the two worlds, we design cross fusion layers for unsupervised attributed network embedding. Specifically, we first construct two separate views to handle network structure and node attributes, and then design cross fusion layers to allow flexible information exchange and integration between the two views. The key design goals of the cross fusion layers are three-fold: 1) allowing critical information to be propagated along the network structure, 2) encoding the heterogeneity in the local neighborhood of each node during propagation, and 3) incorporating an additional node attribute channel so that the attribute information will not be overshadowed by the structure view. Extensive experiments on three datasets and three downstream tasks demonstrate the effectiveness of the proposed method.
Predicting crowd flows is crucial for urban planning, traffic management and public safety. However, predicting crowd flows is not trivial because of three challenges: 1) highly heterogeneous mobility data collected by various services; 2) complex spatio-temporal correlations of crowd flows, including multi-scale spatial correlations along with non-linear temporal correlations; 3) diversity in long-term temporal patterns. To tackle these challenges, we propose an end-to-end architecture, called pyramid dilated spatial-temporal network (PDSTN), to effectively learn spatial-temporal representations of crowd flows with a novel attention mechanism. Specifically, PDSTN employs the ConvLSTM structure to identify complex features that capture spatial-temporal correlations simultaneously, and then stacks multiple ConvLSTM units for deeper feature extraction. To further improve the spatial learning ability, a pyramid dilated residual network is introduced by adopting several dilated residual ConvLSTM networks to extract multi-scale spatial information. In addition, a novel attention mechanism, which considers both long-term periodicity and the shift in periodicity, is designed to study diverse temporal patterns. Extensive experiments were conducted on three highly heterogeneous real-world mobility datasets to illustrate the effectiveness of PDSTN beyond the state-of-the-art methods. Moreover, PDSTN provides intuitive interpretation of its predictions.
We generalize triadic closure, along with previous generalizations of triadic closure, under an intuitive umbrella generalization: the Subgraph-to-Subgraph Transition (SST). We present algorithms and code to model graph evolution in terms of collections of these SSTs. We then use the SST framework to create link prediction models for both static and temporal, directed and undirected graphs which produce highly interpretable results. Quantitatively, our models match out-of-the-box performance of state of the art graph neural network models, thereby validating the correctness and meaningfulness of our interpretable results.
Anomaly detection in time series is a research area of increasing importance. In order to safeguard the availability and stability of services, large companies need to monitor various time-series data to detect anomalies in real time for troubleshooting, thereby reducing potential economic losses. However, in many practical applications, time-series anomaly detection is still an intractable problem due to the huge amount of data, complex data patterns, and limited computational resources. SPOT is an efficient streaming algorithm for anomaly detection, but it is only sensitive to extreme values in the whole data distribution. In this paper, we propose FluxEV, a fast and effective unsupervised anomaly detection framework. By converting the non-extreme anomalies to extreme values, our framework addresses the limitation of SPOT and achieves a huge improvement in the detection accuracy. Moreover, Method of Moments is adopted to speed up the parameter estimation in the automatic thresholding. Extensive experiments show that FluxEV greatly outperforms the state-of-the-art baselines on two large public datasets while ensuring high efficiency.
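To make the Method-of-Moments speedup concrete, the sketch below shows a generic peaks-over-threshold recipe in the SPOT style: keep excesses above a high initial quantile, fit a Generalized Pareto Distribution to them with closed-form moment estimators instead of iterative maximum likelihood, and derive an anomaly threshold for a target risk level. This is a textbook POT sketch under assumed parameters, not the full FluxEV pipeline (which first converts non-extreme anomalies into extreme values).

```python
import numpy as np

def gpd_mom(excesses):
    """Method-of-moments estimates of the GPD shape (xi) and scale (sigma)."""
    m, v = excesses.mean(), excesses.var()
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (m * m / v + 1.0)
    return xi, sigma

def pot_threshold(data, init_quantile=0.98, risk=1e-4):
    """Peaks-over-threshold anomaly threshold with a MoM-fitted GPD (generic sketch)."""
    t = np.quantile(data, init_quantile)        # initial high threshold
    excesses = data[data > t] - t
    xi, sigma = gpd_mom(excesses)
    n, n_t = len(data), len(excesses)
    if abs(xi) < 1e-8:                          # xi -> 0 limit: exponential tail
        return t + sigma * np.log(n_t / (risk * n))
    return t + sigma / xi * ((risk * n / n_t) ** (-xi) - 1.0)

rng = np.random.default_rng(0)
print(pot_threshold(rng.normal(size=100_000)))  # threshold exceeded with probability ~1e-4
```

Because the moment estimators need only the sample mean and variance of the excesses, they can be maintained incrementally in a stream, which is where the efficiency gain over likelihood-based fitting comes from.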
Node classification is an important research topic in graph learning. Graph neural networks (GNNs) have achieved state-of-the-art performance on node classification. However, existing GNNs assume that node samples for different classes are balanced, while in many real-world scenarios some classes have far fewer instances than others. Directly training a GNN classifier in this case would under-represent samples from those minority classes and result in sub-optimal performance. Therefore, it is very important to develop GNNs for imbalanced node classification. However, the work on this is rather limited. Hence, we seek to extend previous imbalanced learning techniques for i.i.d. data to the imbalanced node classification task to facilitate GNN classifiers. In particular, we choose to adopt synthetic minority over-sampling algorithms, as they are found to be the most effective and stable. This task is non-trivial, as previous synthetic minority over-sampling algorithms fail to provide relation information for newly synthesized samples, which is vital for learning on graphs. Moreover, node attributes are high-dimensional; directly over-sampling in the original input domain could generate out-of-domain samples, which may impair the accuracy of the classifier. We propose a novel framework in which an embedding space is constructed to encode the similarity among the nodes. New samples are synthesized in this space to ensure genuineness. In addition, an edge generator is trained simultaneously to model the relation information and provide it for those new samples. This framework is general and can be easily extended into different variations. The proposed framework is evaluated on three different datasets, and it outperforms all baselines by a large margin.
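A minimal sketch of the over-sampling step, under the assumption that node embeddings have already been learned: new minority samples are generated by SMOTE-style interpolation between a minority node and one of its nearest minority-class neighbors in the embedding space, and a toy dot-product edge generator connects the synthetic nodes back to the graph. The real framework trains the embeddings, the edge generator, and the classifier jointly; the functions and thresholds below are illustrative.

```python
import numpy as np

def smote_in_embedding_space(emb, minority_idx, n_new, k=5, rng=None):
    """Synthesize new minority-class embeddings by SMOTE-style interpolation."""
    rng = rng or np.random.default_rng(0)
    minority = emb[minority_idx]
    new_samples = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # k nearest minority-class neighbors of the chosen anchor (excluding itself)
        dists = np.linalg.norm(minority - minority[i], axis=1)
        neighbors = np.argsort(dists)[1:k + 1]
        j = rng.choice(neighbors)
        lam = rng.random()
        new_samples.append(minority[i] + lam * (minority[j] - minority[i]))
    return np.stack(new_samples)

def predict_edges(new_emb, all_emb, threshold=0.5):
    """Toy edge generator: connect synthetic nodes to existing nodes by embedding similarity."""
    scores = 1.0 / (1.0 + np.exp(-new_emb @ all_emb.T))   # sigmoid of dot products
    return scores > threshold
```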
Lack of training data in low-resource languages presents huge challenges to sequence labeling tasks such as named entity recognition (NER) and machine reading comprehension (MRC). One major obstacle is the errors on the boundary of predicted answers. To tackle this problem, we propose CalibreNet, which predicts answers in two steps. In the first step, any existing sequence labeling method can be adopted as a base model to generate an initial answer. In the second step, CalibreNet refines the boundary of the initial answer. To tackle the challenge of lack of training data in low-resource languages, we dedicatedly develop a novel unsupervised phrase boundary recovery pre-training task to enhance the multilingual boundary detection capability of CalibreNet. Experiments on two cross-lingual benchmark datasets show that the proposed approach achieves SOTA results on zero-shot cross-lingual NER and MRC tasks.
Predicting pairwise relationships between nodes in graphs is a fundamental task in data mining with many real-world applications, such as link prediction on social networks, relation prediction on knowledge graphs, etc. A dominating methodology is to first use advanced graph representation methods to learn generic node representations and then build a pairwise prediction classifier with the target nodes' vectors concatenated as input. However, such methods suffer from low interpretability, as it is difficult to explain why certain relationships are predicted only based on their prediction scores. In this paper, we propose to model the pairwise interactions between neighboring nodes (i.e., contexts) of target pairs. The new formulation enables us to build more appropriate representations for node pairs and gain better model interpretability (by highlighting meaningful interactions). To this end, we introduce a unified framework with two general perspectives, node-centric and pair-centric, about how to model context pair interactions. We also propose a novel pair-centric context interaction model and a new pre-trained embedding, which represents the pair semantics and shows many attractive properties. We test our models on two common pairwise prediction tasks: link prediction task and relation prediction task, and compare them with graph feature-based, embedding-based, and Graph Neural Network (GNN)-based baselines. Our experimental results show the superior performance of the pre-trained pair embeddings and that the pair-centric interaction model outperforms all baselines by a large margin.
We consider the problem of learning efficient and inductive graph convolutional networks for text classification with a large number of examples and features. Existing state-of-the-art graph embedding based methods such as predictive text embedding (PTE) and TextGCN have shortcomings in terms of predictive performance, scalability and inductive capability. To address these limitations, we propose a heterogeneous graph convolutional network (HeteGCN) modeling approach that unites the best aspects of PTE and TextGCN. The main idea is to learn feature embeddings and derive document embeddings using a HeteGCN architecture with different graphs used across layers. We simplify TextGCN by dissecting it into several HeteGCN models, which (a) helps to study the usefulness of individual models and (b) offers flexibility in fusing learned embeddings from different models. In effect, the number of model parameters is reduced significantly, enabling faster training and improving performance in small labeled training set scenarios. Our detailed experimental studies demonstrate the efficacy of the proposed approach.
This paper proposes a scalable multilevel framework for the spectral embedding of large undirected graphs. The proposed method first computes much smaller yet sparse graphs while preserving the key spectral (structural) properties of the original graph, by exploiting a nearly-linear time spectral graph coarsening approach. Then, the resultant spectrally-coarsened graphs are leveraged for the development of much faster algorithms for multilevel spectral graph embedding (clustering) as well as visualization of large data sets. We conducted extensive experiments using a variety of large graphs and datasets and obtained very promising results. For instance, we are able to coarsen the "coPapersCiteseer" graph with 0.43 million nodes and 16 million edges into a much smaller graph with only 13K (32X fewer) nodes and 17K (950X fewer) edges in about 16 seconds; the spectrally-coarsened graphs allow us to achieve up to 1,100X speedup for multilevel spectral graph embedding (clustering) and up to 60X speedup for t-SNE visualization of large data sets.
We present an online Item Intelligent Publishing System, named I2PS, used in a large-scale Consumer-to-Consumer (C2C) transaction platform and designed for individual sellers to publish their items automatically from uploaded images. I2PS is deployed in the Xianyu mobile App, the largest second-hand item shopping platform in China. The proposed system can not only guide sellers on how to photograph the items they are publishing in more detail, based on a category recognition module, but also intelligently tell the seller exactly what the product is and which attributes it has, based on various recognition methods. The seller does not need to input extra product information, so the item publishing process in Xianyu can be completed without difficulty. In this paper, we introduce several techniques we used to develop I2PS for product understanding, including category recognition, Standard Product Unit (SPU) recognition, multi-label attribute recognition, and their corresponding pre-processing technologies. Our system, deployed in Xianyu, helps tens of millions of individual sellers publish their items, improving the publishing success rate by more than 15% and reducing publishing duration by more than 20%. The demo video is available at https://youtu.be/3NRx2hECIHc.
This paper demonstrates FinSense, a system that improves the working efficiency of financial information processing. Given the draft of a financial news story, FinSense extracts explicitly mentioned stocks and further infers implicit ones, providing insightful information for decision making. We propose a novel graph convolutional network model that performs implicit financial instrument inference on in-domain data. In addition, FinSense generates candidate headlines for the draft, reducing a significant amount of time in journalism production. The proposed system also assists investors in sorting out the information in financial news articles.
In recent years, several works have noted that Twitter data are valuable in diverse fields and enable many applications. Nevertheless, the API provided by Twitter severely restricts access to the public data generated by users. These restrictions greatly slow down the contributions of researchers and limit their scope. In this paper we introduce TwiScraper, a collaborative project to enhance Twitter data collection by scraping methods. We present a module allowing user-centered data collection: Twi-FFN.
In this technical demonstration, we showcase the world's first personality-driven marketing content generation platform, called SoMin.ai. The platform combines a deep multi-view personality profiling framework with style generative adversarial networks, facilitating the automatic creation of content that appeals to different human personality types. The platform can be used to enhance the social networking user experience as well as for content marketing routines. Guided by the MBTI personality type automatically derived from a user's social network content, SoMin.ai generates new social media content based on the preferences of other users with a similar personality type, aiming to enhance the user experience on social networking venues and to diversify the efforts of marketers when crafting new content for digital marketing campaigns. Real-time user feedback provided via the platform's GUI fine-tunes the content generation model, and the evaluation results demonstrate the promising performance of the proposed multi-view personality profiling framework when applied in the content generation scenario. By leveraging content generation at a large scale, marketers will be able to execute more effective digital marketing campaigns at a lower cost.
Recommendation systems help to predict user demand and improve the quality of services offered. While the performance of a recommendation system depends on the quality and quantity of feedback from users, the two major approaches to feedback sacrifice quality for quantity or vice versa; implicit feedback is more abundant but less reliable, while explicit feedback is more credible but harder to collect. Although a hybrid approach has the potential to combine the strengths of both kinds of feedback, the existing approaches using explicit feedback are not suitable for such a combination. In this study, we design a novel form of feedback, called Colorful Feedback, that is suitable for the hybrid approach, and use it to improve the performance of a recommendation system. The system enables us to collect more varied and less biased feedback from users. It improves performance without requiring major changes to the inference model. It also provides a unique and rich source of information about the model itself. We demonstrate an application of Colorful Feedback, showing how it can improve an existing recommendation model.
With the outbreak of COVID-19, it is urgent and necessary to design a system that can access information from COVID-19 related documents. Current methods fail to do so since the knowledge about COVID-19, an emerging disease, keeps changing and growing. In this study, we design a dynamic document-based question answering system, namely Web Understanding and Learning with AI (WULAI-QA). WULAI-QA employs feature engineering and online learning to adapt to the non-stationary environment and maintains good and steady performance. We evaluate WULAI-QA on a public question answering competition (https://www.datafountain.cn/competitions/424) and rank first. We demonstrate that WULAI-QA can learn from user feedback and is easy to use. We believe that WULAI-QA can help people understand COVID-19 and play an important role in fighting the pandemic.
E-commerce platforms often use a predefined structured hierarchy of product categories. Apart from helping buyers sort between different product types, listing categorization is also critical for multiple downstream tasks, including the platform's main listing search. Traditionally, when creating a new listing, sellers need to assign the product they sell to a single category. However, the high diversity of product types on the platform, along with the fine granularity of the hierarchy's lowest level, results in tens of thousands of different possible categories that sellers need to pick from. This, in turn, creates a unique classification challenge, especially for sellers with a large number of listings. Moreover, the expected cost of a category classification error is high, as it can impact the likelihood that the listing will be discovered by relevant buyers, and eventually sold.
To help with the challenge of category recognition, we present CatReComm, an interactive real-time system that provides category recommendations in different e-commerce scenarios. We present results from using the system for two main sub-tasks, listing and search-query category recognition, and demonstrate an end-to-end scenario of the former. The system uses a convolutional sequence-to-sequence approach and, to the best of our knowledge, is the first to use this approach for category recognition. We define a new metric for evaluating this model which captures the hierarchical characteristics of the data and supports displaying multiple classification results. Finally, our experimental results show the effectiveness and efficiency of the system on real-world data.
Modern search engines retrieve results mainly based on keyword matching techniques, and thus fail to answer analytical queries like "apps with more than 1 billion monthly active users" or "population growth of the US from 2015 to 2019", which require numerical reasoning or aggregation of results from multiple web pages. Such analytical queries are very common in data analysis, and the expected results are structured tables or charts. In most cases, these structured results are not available or accessible; the underlying facts are scattered across various text sources. In this work, we build AnaSearch, a search system that supports analytical queries and returns structured results that can be visualized in the form of tables or charts. We collect and build structured quantitative data from unstructured text on the web automatically. With AnaSearch, data analysts can easily derive insights for decision making with keyword or natural language queries. Specifically, we build AnaSearch on COVID-19 news data, which makes it easy to compare with manually collected structured data.
Federated Learning (FL) allows different entities to collaboratively build machine learning models without sharing or gathering their data. In FL, there is typically a global server and a set of clients (stakeholders) that build shared machine learning models. In contrast to distributed machine learning, the controller of the training process (here, the global server) never sees the data of the stakeholders participating in FL. Each stakeholder owns its own data and does not share it. During training, only the model updates (e.g., gradients) are shared. To the best of our knowledge, no practical, publicly available federated learning framework for stakeholders exists. We have built a framework that enables FL for a small number of stakeholders. In the paper, we describe the framework architecture, communication protocol, and algorithms. Our framework is open-sourced, is easy for stakeholders to set up, and ensures that no private information is leaked during the training process.
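As a minimal sketch of the server-side aggregation step described above, the snippet below shows a FedAvg-style weighted average of per-client model parameters: the server only ever receives parameter updates, never raw data. This is one standard instantiation of the aggregation step, assumed here for illustration rather than taken from the framework's actual protocol.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate per-client model parameters into a global model (FedAvg-style).

    client_weights : list of dicts {layer_name: ndarray}, the local model parameters
    client_sizes   : list of ints, number of local training samples per client
    """
    total = float(sum(client_sizes))
    global_weights = {}
    for name in client_weights[0]:
        global_weights[name] = sum(
            (size / total) * w[name] for w, size in zip(client_weights, client_sizes)
        )
    return global_weights

# two hypothetical stakeholders sharing a one-layer model
clients = [{"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
           {"w": np.array([3.0, 4.0]), "b": np.array([1.0])}]
print(federated_average(clients, client_sizes=[100, 300]))  # weighted toward the larger client
```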
In this paper, we demonstrate our Global Personalized Recommender (GPR) system for restaurants. GPR does not use any explicit reviews, ratings, or domain-specific metadata but rather leverages over 3 billion anonymized payment transactions to learn user and restaurant behavior patterns. The design and development of GPR have been challenging, primarily due to the scale and skew of the data. Our system supports over 450M cardholders from over 200 countries and 2.5M restaurants in over 35K cities worldwide. Additionally, GPR, being a global recommender system, needs to account for regional variations in people's food choices and habits. We address these challenges by combining three different recommendation algorithms instead of relying on a single model in the backend. The individual recommendation models are scalable and adapt to varying data skew challenges to ensure high-quality personalized recommendations for any user anywhere in the world.
The impact of online social media on societal events and institutions is profound, and with the rapid increases in user uptake, we are just starting to understand its ramifications. Social scientists and practitioners who model online discourse as a proxy for real-world behavior often curate large social media datasets. A lack of available tooling aimed at non-data science experts frequently leaves this data (and the insights it holds) underutilized. Here, we propose birdspotter, a tool to analyze and label Twitter users, and birdspotter.ml, an exploratory visualizer for the computed metrics. birdspotter provides an end-to-end analysis pipeline, from the processing of pre-collected Twitter data to general-purpose labeling of users and estimating their social influence, within a few lines of code. The package features tutorials and detailed documentation. We also illustrate how to train birdspotter into a fully-fledged bot detector that achieves better-than-state-of-the-art performance without making Twitter API calls, and we showcase its usage in an exploratory analysis of a topical COVID-19 dataset.
Click-through rate (CTR) prediction is a crucial task in recommender systems and online advertising. The embedding-based neural networks have been proposed to learn both explicit feature interactions through a shallow component and deep feature interactions by a deep neural network (DNN) component. These sophisticated models, however, slow down the prediction inference by at least hundreds of times. To address the issue of significantly increased serving latency and high memory usage for real-time serving in production, this paper presents DeepLight: a framework to accelerate the CTR predictions in three aspects: 1) accelerate the model inference via explicitly searching informative feature interactions in the shallow component; 2) prune redundant parameters at the inter-layer level in the DNN component; 3) prune the dense embedding vectors to make them sparse in the embedding matrix. By combining the above efforts, the proposed approach accelerates the model inference by 46X on Criteo dataset and 27X on Avazu dataset without any loss on the prediction accuracy. This paves the way for successfully deploying complicated embedding-based neural networks in real-world serving systems.
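The pruning steps can be illustrated with a simple global magnitude-pruning sketch: zero out the smallest-magnitude entries of a weight or embedding matrix until a target sparsity is reached. This is one common way to realize such structural pruning, shown here under assumed settings; the actual DeepLight framework combines it with interaction search and retraining, which are omitted.

```python
import numpy as np

def magnitude_prune(matrix, sparsity):
    """Zero out the smallest-magnitude entries so that roughly `sparsity` of them are zero."""
    flat = np.abs(matrix).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return matrix.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    pruned = matrix.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
embedding = rng.normal(size=(1000, 16))            # toy embedding table
sparse_embedding = magnitude_prune(embedding, sparsity=0.9)
print((sparse_embedding == 0).mean())              # approximately 0.9
```

With a sparse embedding table and sparse DNN layers, inference touches far fewer parameters, which is where the reported serving speedups come from.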
Solving cold-start problems is indispensable to provide meaningful recommendation results for new users and items. Under sparsely observed data, unobserved user-item pairs are also a vital source for distilling users' latent information needs. Most present works leverage unobserved samples for extracting negative signals. However, such an optimisation strategy can lead to results biased toward already popular items by frequently handling new items as negative instances. In this study, we tackle the cold-start problems for new users/items by appropriately leveraging unobserved samples. We propose a knowledge graph (KG)-aware recommender based on graph neural networks, which augments labelled samples through pseudo-labelling. Our approach aggressively employs unobserved samples as positive instances and brings new items into the spotlight. To avoid exhaustive label assignments to all possible pairs of users and items, we exploit a KG for selecting probably positive items for each user. We also utilise an improved negative sampling strategy and thereby suppress the exacerbation of popularity biases. Through experiments, we demonstrate that our approach achieves improvements over the state-of-the-art KG-aware recommenders in a variety of scenarios; in particular, our methodology successfully improves recommendation performance for cold-start users/items.
Modern machine learning applications should be able to address the intrinsic challenges arising over inference on massive real-world datasets, including scalability and robustness to outliers. Despite the multiple benefits of Bayesian methods (such as uncertainty-aware predictions, incorporation of expert knowledge, and hierarchical modeling), the quality of classic Bayesian inference depends critically on whether observations conform with the assumed data generating model, which is impossible to guarantee in practice. In this work, we propose a variational inference method that, in a principled way, can simultaneously scale to large datasets, and robustify the inferred posterior with respect to the existence of outliers in the observed data. Reformulating Bayes' theorem via the β-divergence, we posit a robustified generalized Bayesian posterior as the target of inference. Moreover, relying on the recent formulations of Riemannian coresets for scalable Bayesian inference, we propose a sparse variational approximation of the robustified posterior and an efficient stochastic black-box algorithm to construct it. Overall our method allows releasing cleansed data summaries that can be applied broadly in scenarios involving structured and unstructured data contamination. We illustrate the applicability of our approach in diverse simulated and real datasets, and various statistical models, including Gaussian mean inference, logistic and neural linear regression, demonstrating its superiority to existing Bayesian summarization methods in the presence of outliers.
Effectively measuring, understanding, and improving mobile app performance is of paramount importance for mobile app developers. Across the mobile Internet landscape, companies run online controlled experiments (A/B tests) with thousands of performance metrics in order to understand how app performance causally impacts user retention and to guard against service or app regressions that degrade user experiences. To capture certain characteristics particular to performance metrics, such as enormous observation volume and high skewness in distribution, an industry-standard practice is to construct a performance metric as a quantile over all performance events in control or treatment buckets in A/B tests. In our experience with thousands of A/B tests provided by Snap, we have discovered some pitfalls in this industry-standard way of calculating performance metrics that can lead to unexplained movements in performance metrics and unexpected misalignment with user engagement metrics. In this paper, we discuss two major pitfalls in this industry-standard practice of measuring performance for mobile apps. One arises from strong heterogeneity in both mobile devices and user engagement, and the other arises from self-selection bias caused by post-treatment user engagement changes. To remedy these two pitfalls, we introduce several scalable methods including user-level performance metric calculation and imputation and matching for missing metric values. We have extensively evaluated these methods on both simulation data and real A/B tests, and have deployed them into Snap's in-house experimentation platform.
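The first pitfall can be illustrated with a small simulation under made-up latency distributions: when per-user event volumes are highly heterogeneous, a quantile computed over all events is dominated by heavy users, while a user-level metric (a quantile per user, then averaged) weights each user equally. The populations and numbers below are hypothetical and only demonstrate why the two definitions can diverge.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical population: a few heavy users on fast devices, many light users on slow devices
heavy = [rng.lognormal(mean=4.0, sigma=0.3, size=5000) for _ in range(20)]    # many fast events
light = [rng.lognormal(mean=5.0, sigma=0.3, size=20) for _ in range(2000)]    # few slow events
users = heavy + light

all_events = np.concatenate(users)
event_level_p90 = np.quantile(all_events, 0.9)                  # dominated by heavy users' events
user_level_p90 = np.mean([np.quantile(u, 0.9) for u in users])  # each user counts once

print(f"event-level P90: {event_level_p90:.1f} ms")
print(f"user-level  P90: {user_level_p90:.1f} ms")
```

The two estimates differ substantially even though the underlying data are identical, which is why shifts in the user mix between control and treatment can move an event-level quantile metric without any real performance change.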
Representation learning is the keystone for collaborative filtering. The learned representations should reflect both explicit factors that are revealed by extrinsic attributes such as movies' genres and books' authors, and implicit factors that are implicated in the collaborative signal. Existing methods fail to decompose these two types of factors, making it difficult to infer the deep motivations behind user behaviors, and thus suffer from sub-optimal solutions. In this paper, we propose Decomposed Collaborative Filtering (DCF) to address the above problems. For the explicit representation learning, we devise a user-specific relation aggregator to aggregate the most important attributes. For the implicit part, we propose Decomposed Graph Convolutional Network (DGCN), which decomposes users and items into multiple factor-level representations, then utilizes factor-level attention and attentive relation aggregation to model implicit factors behind collaborative signals at a fine-grained level. Moreover, to reflect more diverse implicit factors, we augment the model with disagreement regularization. We conduct experiments on three publicly accessible datasets and the results demonstrate the significant improvement of our method over several state-of-the-art baselines. Further studies verify the efficacy and interpretability benefits brought by the fine-grained implicit relation modeling. Our code is available at https://github.com/cmaxhao/DCF.
To aid users in choice-making, explainable recommendation models seek to provide not only accurate recommendations but also accompanying explanations that help to make sense of those recommendations. Most of the previous approaches rely on evaluative explanations, assessing the quality of an individual item along some aspects of interest to the user. In this work, we are interested in comparative explanations, the less studied problem of assessing a recommended item in comparison to another reference item.
In particular, we propose to anchor reference items on the previously adopted items in a user's history. Not only do we aim at providing comparative explanations involving such items, but we also formulate comparative constraints involving aspect-level comparisons between the target item and the reference items. The framework allows us to incorporate these constraints and integrate them with recommendation objectives involving both subjective and objective aspect-level quality assumptions. Experiments on public datasets of several product categories showcase the efficacy of our methodology, compared to baselines, in attaining better recommendation accuracy and more intuitive explanations.
Online Travel Platforms are virtual two-sided marketplaces where guests search for accommodations and accommodation providers list their properties, such as hotels and vacation rentals. The large majority of hotels are rated by official institutions with a number of stars indicating the quality of service they provide. It is a simple and effective mechanism that helps match supply with demand by enabling guests to find options meeting their criteria and accommodation suppliers to market their product to the right segment, directly impacting the number of transactions on the platform. Unfortunately, no similar rating system exists for the large majority of vacation rentals, making it difficult for guests to search and compare options and hard for vacation rental suppliers to market their product effectively. In this work we describe a machine-learned quality rating system for vacation rentals. The problem is challenging, mainly due to explainability requirements and the lack of ground truth. We present techniques to address these challenges and empirical evidence of their efficacy. Our system was successfully deployed and validated through Online Controlled Experiments performed in Booking.com, a large Online Travel Platform, running for more than one year and impacting more than a million accommodations and millions of guests.
In the Click-Through Rate (CTR) prediction scenario, users' sequential behaviors are widely utilized to capture user interest in the recent literature. However, despite being extensively studied, these sequential methods still suffer from three limitations. First, existing methods mostly utilize attention over the behaviors of users, which is not always suitable for CTR prediction, because users often click on new products that are irrelevant to any historical behaviors. Second, in real scenarios, there are numerous users who were active long ago but have become relatively inactive recently, making it hard to precisely capture a user's current preferences from early behaviors. Third, multiple representations of users' historical behaviors in different feature subspaces are largely ignored. To remedy these issues, we propose a Multi-Interactive Attention Network (MIAN) to comprehensively extract the latent relationships among all kinds of fine-grained features (e.g., gender, age and occupation in user profiles). Specifically, MIAN contains a Multi-Interactive Layer (MIL) that integrates three local interaction modules to capture multiple representations of user preference through sequential behaviors and simultaneously utilize the fine-grained user-specific as well as context information. In addition, we design a Global Interaction Module (GIM) to learn the high-order interactions and balance the different impacts of multiple features. Finally, offline experimental results on three datasets, together with an online A/B test in a large-scale recommendation system, demonstrate the effectiveness of our proposed approach.
In e-commerce advertising, the ad platform usually relies on auction mechanisms to optimize different performance metrics, such as user experience, advertiser utility, and platform revenue. However, most of the state-of-the-art auction mechanisms only focus on optimizing a single performance metric, e.g., either social welfare or revenue, and are not suitable for e-commerce advertising with various, dynamic, difficult-to-estimate, and even conflicting performance metrics. In this paper, we propose a new mechanism called Deep GSP auction, which leverages deep learning to design new rank score functions within the celebrated GSP auction framework. These new rank score functions are implemented via deep neural network models under the constraints of monotone allocation and smooth transition. The requirement of monotone allocation ensures that the Deep GSP auction has nice game-theoretic properties, while the requirement of smooth transition guarantees that advertiser utilities do not fluctuate too much when the auction mechanism switches among candidate mechanisms to achieve different optimization objectives. We deployed the proposed mechanisms in a leading e-commerce ad platform and conducted comprehensive experimental evaluations with both offline simulations and online A/B tests. The results demonstrated the effectiveness of the Deep GSP auction compared to the state-of-the-art auction mechanisms.
Recommender models trained on historical observational data alone can be brittle when domain experts subject them to counterfactual evaluation. In many domains, experts can articulate common, high-level mappings or rules between categories of inputs (user's history) and categories of outputs (preferred recommendations). One challenge is to determine how to train recommender models to adhere to these rules. In this work, we introduce the goal of domain-specific concordance: the expectation that a recommender model follow a set of expert-defined categorical rules. We propose a regularization-based approach that optimizes for robustness on rule-based input perturbations. To test the effectiveness of this method, we apply it in a medication recommender model over diagnosis-medicine categories, and in movie and music recommender models, on rules over categories based on movie tags and song genres. We demonstrate that we can increase the category-based robustness distance by up to 126% without degrading accuracy, but rather increasing it by up to 12% compared to baseline models in the popular MIMIC-III, MovieLens-20M and Last.fm Million Song datasets.
Query autocompletion is an essential search engine feature that predicts and suggests completions for a user's incomplete prefix input, critically enhancing the user experience. While a generic lookup-based system can provide completions with great efficiency, it is unable to address prefixes not seen in the past. On the other hand, a generative system can complete unseen queries with superior accuracy but requires substantial computational overhead at runtime, making it costly for a large-scale system. Here, we present an efficient, fully-generative query autocompletion framework. Our framework employs an n-gram language model at the subword level and exploits the n-gram model's inherent data structure to precompute completions prior to runtime. Evaluation results on a public dataset show that our framework is not only as effective as previous systems with neural language models, but also reduces computational overhead at runtime, speeding it up by more than two orders of magnitude. The goal of this work is to showcase a generative query completion system that is an attractive choice for large-scale deployments.
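To make the precomputation idea concrete, the sketch below builds a prefix-to-completions map offline from a set of scored candidate queries, so that serving reduces to a single dictionary lookup. It is a simplified stand-in: the actual framework derives and scores candidates from the subword n-gram model's own data structure rather than from a fixed candidate list.

```python
from collections import defaultdict

def build_completion_index(scored_queries, max_prefix_len=20, top_k=5):
    """Precompute the top-k completions for every character prefix of every candidate query."""
    index = defaultdict(list)
    for query, score in scored_queries:
        for i in range(1, min(len(query), max_prefix_len) + 1):
            index[query[:i]].append((score, query))
    # keep only the k highest-scoring completions per prefix
    return {prefix: [q for _, q in sorted(cands, reverse=True)[:top_k]]
            for prefix, cands in index.items()}

def complete(index, prefix, top_k=5):
    """Runtime completion is a single dictionary lookup."""
    return index.get(prefix, [])[:top_k]

candidates = [("weather today", 0.9), ("weather tomorrow", 0.7), ("web search", 0.4)]
index = build_completion_index(candidates)
print(complete(index, "we"))   # ['weather today', 'weather tomorrow', 'web search']
print(complete(index, "web"))  # ['web search']
```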
Textual explanations have been shown to improve user satisfaction with machine-made recommendations. However, current mainstream solutions loosely connect the learning of explanation with the learning of recommendation: for example, they are often separately modeled as rating prediction and content generation tasks. In this work, we propose to strengthen their connection by enforcing the idea of sentiment alignment between a recommendation and its corresponding explanation. At training time, the two learning tasks are joined by a latent sentiment vector, which is encoded by the recommendation module and used to make word choices for explanation generation. At both training and inference time, the explanation module is required to generate explanation text that matches the sentiment predicted by the recommendation module. Extensive experiments demonstrate our solution outperforms a rich set of baselines on both recommendation and explanation tasks, especially on the improved quality of its generated explanations. More importantly, our user studies confirm our generated explanations help users better recognize the differences between recommended items and understand why an item is recommended.
Sharing recommendation is becoming ubiquitous at almost every e-commerce website, where a user is recommended a list of users when they want to share something with others. With the tremendous growth of online shopping users, sharing recommendation confronts several distinct difficulties: 1) how to establish a unified recommender model for large numbers of sharing scenarios; 2) how to handle long-tail and even cold-start scenarios with limited training data; 3) how to incorporate social influence in order to make more accurate recommendations.
To tackle the above challenges, we first build multiple expert networks to integrate different scenarios. During model training, each specific scenario learns to differentiate the importance of each expert network automatically based on corresponding context information. With respect to the long-tail issue, we propose to maintain a complete scenario tree such that each scenario can utilize context knowledge from the root node to the leaf node to select the expert networks. At the same time, making use of the tree-based full-path message contributes to alleviating the training data sparsity problem. Moreover, we construct a large-scale heterogeneous user-to-user graph derived from various social behaviors at e-commerce websites. A novel scenario-aware multi-view graph attention network is then leveraged to augment user representations socially. In addition, an auxiliary inconsistency loss is applied to balance the load of the expert networks; together with the main click-through rate (CTR) prediction loss, the whole framework is trained in an end-to-end fashion. Both offline experiments and online A/B test results demonstrate the superiority of the proposed approach over a range of state-of-the-art models.
Given a collection of items to display, such as news, videos, or products, how can we optimize their presentation order to maximize user engagements, such as click-through rate, viewing time, and the number of purchases? The problem becomes more complicated when the items are displayed in a grid-based, 2-dimensional presentation on a widescreen. For example, many E-Commerce websites such as Amazon and Etsy display their products in a grid-like format, and so do streaming services like YouTube and Netflix. Unlike 1-dimensional space, where products can be naturally ranked in a vertical order, the presentation in 2-dimensional space poses a novel challenge about how to find the best presentation order - should we put the best listing in the top-left corner, or in the central position of the second row? Many traditional methods can be applied to solve the problem, such as conducting an attention heatmap web test, or a randomization experiment by shuffling the positions of listings. However, both tests are costly to perform and may degrade the quality of users' search experience. By contrast, we focus on utilizing existing search log data to reveal the propensity of positions, which is readily available and ubiquitously abundant.
In a nutshell, the study presents how we find an optimal way of presentation in a grid-based environment - more relevant content should be placed in a more noticeable position. The position noticeability is further quantified to help ranking models better understand the signal of relevance manifested in user feedback. Our investigation paves the way for an end-to-end item presentation framework that learns the optimal layout for optimizing user engagements. Experimental results based on real-world data show the superiority of the proposed approach over state-of-the-art methods.
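One standard way to use log-derived position propensities, shown below purely as an illustration, is inverse-propensity-weighted relevance estimation under a position-based click model: clicks observed in less noticeable grid positions are up-weighted by the inverse of that position's examination probability. The click model, the propensity values, and the estimator are assumptions for this sketch, not necessarily the paper's exact estimator.

```python
import numpy as np

def ips_relevance(clicks, positions, propensity):
    """Estimate an item's relevance from logged clicks, correcting for grid-position bias.

    clicks     : (n,) 0/1 click labels from the search log
    positions  : (n,) index of the grid position shown for each impression
    propensity : (p,) assumed examination probability of each grid position
    """
    weights = 1.0 / propensity[positions]          # inverse propensity of the shown position
    return float(np.sum(clicks * weights) / len(clicks))

# hypothetical 2x3 grid: the top-left slot is examined most, the bottom-right least
propensity = np.array([0.9, 0.7, 0.5, 0.6, 0.4, 0.3])
positions = np.array([0, 2, 5, 1, 4])
clicks = np.array([1, 0, 1, 1, 0])
print(ips_relevance(clicks, positions, propensity))
```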
Recent advances in path-based explainable recommendation systems have attracted increasing attention thanks to the rich information provided by knowledge graphs. Most existing explainable recommendation methods only utilize static knowledge graphs and ignore dynamic user-item evolution, leading to less convincing and less accurate explanations. Although some works realize that modelling a user's temporal sequential behaviour could boost the performance and explainability of recommender systems, most of them either focus only on modelling a user's sequential interactions within a path, or model them independently and separately from the recommendation mechanism. In this paper, we propose a novel Temporal Meta-path Guided Explainable Recommendation (TMER), which utilizes well-designed item-item path modelling between consecutive items with attention mechanisms to sequentially model dynamic user-item evolution on a dynamic knowledge graph for explainable recommendation. Compared with existing works that use heavy recurrent neural networks to model temporal information, we propose simple but effective neural networks to capture users' historical item features and path-based context to characterise the next purchased item. Extensive evaluations of TMER on three real-world benchmark datasets show state-of-the-art performance compared against recent strong baselines.
Studying information propagation in social media is an important task with plenty of applications for business and science. Generating realistic synthetic information cascades can help the research community develop new methods and applications, and test sociological hypotheses and different what-if scenarios by simply changing a few parameters. We demonstrate womg, a synthetic data generator which combines topic modeling and a topic-aware propagation model to create realistic information-rich cascades, whose shape depends on many factors, including the topic of the item and its virality, the homophily of the social network, the interests of its users and their social influence.
To address the increasingly significant issue of fake news, we develop a news reading platform in which we propose an implicit approach to reduce people's belief in fake news. Specifically, we leverage reinforcement learning to learn an intervention module on top of a recommender system (RS), such that the module is activated to replace the RS and recommend verification-oriented news once users interact with fake news. To examine the effect of the proposed method, we conduct a comprehensive evaluation with 89 human subjects and measure the effective rate of change in belief. Moreover, 84% of participants indicate the proposed platform can help them defeat fake news. The demo video is available on YouTube: https://youtu.be/wKI6nuXu_SM.
We present Community Connect, a custom social media platform for conducting controlled experiments of human behavior. The key distinguishing factor of Community Connect is the ability to control the visibility of user posts based on the groups they belong to, allowing careful and controlled investigation into how information propagates through a social network. We release this platform as a resource to the broader community, to facilitate research on data collected through controlled experiments on social networks.
Incorporating users' personal facts enhances the quality of many downstream services. Automated extraction of such personal knowledge has recently received considerable attention. However, often the operation of extraction models is not exposed to the user, making predictions inexplicable. In this work we present a web demonstration platform showcasing a recent personal knowledge extraction model, CHARM, which provides information on how the prediction was made and which data was decisive for it. Our demonstration explores two potential sources of input data: conversational transcripts and social media submissions.
Patients with progressive neurological disorders such as Parkinson's disease, Huntington's disease, and Amyotrophic Lateral Sclerosis (ALS) suffer both chronic and episodic difficulties with locomotion. Real-time assessment and visualization of sensor data can be valuable to physicians monitoring the progression of these conditions. We present a system that utilizes the attention based bi-directional recurrent neural network (RNN) presented in [2] to evaluate foot pressure sensor data streamed directly from a pair of sensors attached to a patient. The demonstration also supports indirect streaming from recorded sessions, such as those stored in a FHIR [1] enabled electronic medical records repository, for post-hoc evaluation and comparison of a patient's gait over time. The system evaluates and visualizes the streamed gait in a real time web interface to provide a personalized normality rating that highlights the strengths and weaknesses of a patient's gait.
The collective attention on online items such as web pages, search terms, and videos reflects trends that are of social, cultural, and economic interest. Moreover, attention trends of different items exhibit mutual influence via mechanisms such as hyperlinks or recommendations. Many visualisation tools exist for time series, network evolution, or network influence; however, few systems connect all three. In this work, we present AttentionFlow, a new system to visualise networks of time series and the dynamic influence they have on one another. Centred around an ego node, our system simultaneously presents the time series on each node using two visual encodings: a tree ring for an overview and a line chart for details. AttentionFlow supports interactions such as overlaying time series of influence, and filtering neighbours by time or flux. We demonstrate AttentionFlow using two real-world datasets, VevoMusic and WikiTraffic. We show that attention spikes in songs can be explained by external events such as major awards, or changes in the network such as the release of a new song. Separate case studies also demonstrate how an artist's influence changes over their career, and that correlated Wikipedia traffic is driven by cultural interests. More broadly, AttentionFlow can be generalised to visualise networks of time series on physical infrastructures such as road networks, or natural phenomena such as weather and geological measurements.
Machine learning predictors have been increasingly applied in production settings, including in one of the world's largest hiring platforms, Hired, to provide a better candidate and recruiter experience. The ability to provide actionable feedback is desirable for candidates to improve their chances of achieving success in the marketplace. Until recently, however, methods aimed at providing actionable feedback have been limited in terms of realism and latency. In this work, we demonstrate how, by applying a newly introduced method based on Generative Adversarial Networks (GANs), we are able to overcome these limitations and provide actionable feedback in real-time to candidates in production settings. Our experimental results highlight the significant benefits of utilizing a GAN-based approach on our dataset relative to two other state-of-the-art approaches (including over 1000x latency gains). We also illustrate the potential impact of this approach in detail on two real candidate profile examples.
Leakage of personal information in conversations raises serious privacy concerns. Malicious people or bots could pry into the sensitive personal information of vulnerable people, such as juveniles, through conversations with them or with their digital personal assistants. To address the problem, we present a privacy-leakage warning system that monitors conversations on social media and intercepts outgoing text messages from a user or a digital assistant if they pose potential privacy-leakage risks. Such messages are redirected to authorized users for approval before they are sent out. We demonstrate how our system is deployed and used on a social media conversation platform such as Facebook Messenger.
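The abstract does not describe the detection model itself, so the toy sketch below only illustrates the intercept-then-approve flow, with a hypothetical regex-based detector standing in for the real classifier.

```python
import re

# Hypothetical PII patterns; the deployed system's detector is not specified
# in the abstract and would be far more sophisticated.
PATTERNS = {
    "phone":   re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "email":   re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "address": re.compile(r"\b\d+\s+\w+\s+(street|st|avenue|ave|road|rd)\b", re.I),
}

def intercept(message: str):
    """Return (allow, risks): hold the message for approval if risks are found."""
    risks = [kind for kind, pat in PATTERNS.items() if pat.search(message)]
    return (len(risks) == 0, risks)

allow, risks = intercept("Sure, call me at 415-555-0134 after school")
# allow == False -> redirect the message to an authorized user for approval
```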
Modeling online discourse dynamics is a core activity in understanding the spread of information, both offline and online, and emergent online behavior. There is currently a disconnect between the practitioners of online social media analysis (usually social, political and communication scientists) and the accessibility of tools capable of examining users' online discussions. Here we present evently, a tool for modeling online reshare cascades, and particularly retweet cascades, using self-exciting processes. It provides a comprehensive set of functionalities for processing raw data from Twitter's public APIs, modeling the temporal dynamics of processed retweet cascades, and characterizing online users with a wide range of diffusion measures. The tool is designed for researchers with a wide range of computing expertise, and it includes tutorials and detailed documentation. We illustrate the usage of evently with an end-to-end analysis of online user behavior on a topical dataset relating to COVID-19. We show that, by characterizing users solely based on how their content spreads online, we can distinguish influential users from online bots.
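For context, self-exciting (Hawkes) processes model each reshare as temporarily raising the rate of future reshares. A standard formulation with an exponential kernel is shown below; evently's exact kernel and event marks (e.g., follower counts) may differ, so treat this as illustrative.

```latex
% Conditional intensity: a baseline rate \mu plus a decaying contribution
% from every earlier event t_i, with branching factor \alpha and decay \beta.
\lambda(t) \;=\; \mu \;+\; \sum_{t_i < t} \alpha \, \beta \, e^{-\beta\,(t - t_i)}
```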
Web archiving is the process of gathering data from the Web, storing it, and ensuring the data is preserved in an archive for future exploration. Despite the increasing number of web archives, the absence of meaningful exploration methods remains a major hurdle to turning them into a useful information source. By creating profiles that describe metadata about the archived documents, it is possible to offer an environment that goes beyond simple keyword-based search. Through the expressive power of the SPARQL language, exposed via a user-friendly web-based search interface, users can run sophisticated queries for documents that meet their information needs.
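To give a flavor of the kind of query such profiles enable, the sketch below uses the SPARQLWrapper client against a hypothetical endpoint with a Dublin Core-style profile schema; both the endpoint URL and the properties are assumptions, not the demonstration's actual vocabulary.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical profile endpoint and schema, for illustration only.
endpoint = SPARQLWrapper("http://example.org/archive-profiles/sparql")
endpoint.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?doc ?title ?date WHERE {
        ?doc dcterms:title   ?title ;
             dcterms:date    ?date ;
             dcterms:subject ?subject .
        FILTER (CONTAINS(LCASE(STR(?subject)), "climate")
                && ?date >= "2015-01-01"^^<http://www.w3.org/2001/XMLSchema#date>)
    }
    ORDER BY DESC(?date) LIMIT 20
""")
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["date"]["value"], row["title"]["value"])
```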
The need for improved segmentation, targeting, and personalization fuels the practice of data sharing among companies. Concurrently, data sharing faces the headwind of new laws emphasizing users' privacy. Under the premise that data is shared from a provider to a recipient, we propose a practicable approach for generating representational data for sharing that achieves value-addition for the recipient's tasks while preserving the privacy of users. Prior art shows that mechanisms to improve value-addition inevitably weaken privacy in the generated data. In a first-of-its-kind contribution, our system offers tunable controls to adjust the extent of privacy desired by the provider and the extent of value-addition expected by the recipient. Our experiments on a public dataset show that, under common organizational data-sharing practice, data generation for value-addition is achievable while preserving privacy. Our demonstration starkly shows the trade-off between privacy protection and value-addition through user-controlled knobs, and offers a prototype platform for data sharing that is mindful of this trade-off.
Automatic hate speech detection has become a crucial task due to the increase of hate on the Internet and its negative consequences. In this PhD we therefore propose the design and implementation of methods for the automatic processing of hate messages, focusing on hate messages on Twitter. The hypothesis on which the research is based is that the prediction of hate speech from textual content can be improved by combining it with features such as the activity and communities of users, as well as the images shared with the tweets. In this way, we intend to develop strategies for the automatic detection of hate with multimodal and multilingual (English and Spanish) approaches. Furthermore, our research includes the study of counter-narratives as an alternative to mitigate the effects of hate speech. To address the problem, we employ deep learning techniques, with particular attention to graph-based representation approaches.
Recommendation Systems (RS) are designed to assist users in decision making by recommending the most appropriate information or products for them. Nonetheless, many RS suffer from limitations such as data sparsity and cold-start. Side information (SI) can be integrated into a recommender system to tackle these limitations. In my Ph.D. research, I seek to build on and extend the use of SI for RS. Specifically, I propose new types and representations of SI and develop new methods to integrate SI into RS to boost their performance. This paper presents the conceptual foundation and motivation of my Ph.D. research.
A diverse variety of demographic data can be analyzed with modern data mining methods to achieve better results. On the one hand, our main task is to compare different methods for next-event prediction and gender prediction; on the other hand, we pay special attention to interpretable patterns describing demographic behavior in the studied problems. We consider both interpretable methods, such as decision trees and their ensembles, and semi- or non-interpretable methods, such as SVMs with customized kernels tailored to demographers' needs and neural networks. The best accuracy results were obtained with two-channel Convolutional Neural Networks.
Multimodal machine learning deals with building models that can process information from multiple modalities (i.e., ways of doing or experiencing something). Experiments involving humans are used to guarantee drug safety in the complex task of drug development. Drug-related data is readily available and comes in various modalities. The proposed study aims to develop novel methods for multimodal machine learning that can be used to process the diverse multimodal data used in drug development and other challenging tasks that could benefit from the use of multimodal data. We present a series of drug-related tasks which are used to both evaluate the models proposed in this ongoing study and discover new drug knowledge. This research will make far-reaching contributions to the field of machine learning, as well as practical contributions in the medical domain.
Fairness is a critical system-level objective in recommender systems that has been the subject of extensive recent research. It is especially important in multi-sided recommendation platforms, where it may be important to optimize utilities not just for the end user but also for other entities, such as item sellers or producers, who desire a fair representation of their items. Existing solutions either ignore the multi-sided nature of fairness in recommendations or do not properly address its various aspects. In this thesis, we first aim to investigate the impact of unfair recommendations on the system and how they can negatively affect its major entities. We then seek to propose a general graph-based solution that works as a post-processing approach, applied after recommendation generation, to tackle the unfairness of recommendations. We plan to perform extensive experiments to evaluate the effectiveness of the proposed approach.
Graphs are ubiquitous data structures in various fields, such as social media, transportation, linguistics and chemistry. To solve downstream graph-related tasks, it is of great significance to learn effective representations for graphs. My research strives to help meet this demand; given the huge success of deep learning methods, especially graph neural networks, on graph-related problems, my emphasis has primarily been on improving their power for graph representation learning. More specifically, my research spans the following three main areas: (1) robustness of graph neural networks, where we study their performance under random noise and carefully crafted attacks; (2) self-supervised learning in graph neural networks, where we aim to alleviate their need for costly annotated data by constructing self-supervision signals that help them fully exploit unlabeled data; and (3) applications of graph neural networks, where my work applies graph neural networks in various applications such as traffic flow prediction. This research statement, 'Graph Mining with Graph Neural Networks', focuses on my research endeavors related to these three topics.
User intention is an important factor for recommender systems to consider. Unlike the inherent user preferences addressed in traditional recommendation algorithms, which are generally static and consistent, user intention changes dynamically across contexts. Recent studies (represented by sequential recommendation) have begun to focus on predicting what users want beyond what users like, which better captures dynamic user intention and has attracted a surge of interest. However, user intention modeling is non-trivial because it is influenced by various factors, such as repeat consumption behavior, item relations, temporal dynamics, etc. To better capture dynamic user intention in sequential recommendation, we plan to investigate these influential factors and construct corresponding models to improve performance. We also want to develop an adaptive way to model the temporal evolution of the effects caused by different factors. Based on these investigations, we further plan to integrate the factors to deal with extremely long history sequences, where long-term user preference and short-term user demand must be carefully balanced.
Personalization is one of the key applications of machine learning, with widespread usage across e-commerce, entertainment, production, healthcare and many other industries. While various machine learning techniques present novel state-of-the-art advances and super-human performance year over year, personalization and recommender-system applications are often late adopters of novel solutions due to problem hardness and implementation complexity. This tutorial presents recent advances across the personalization industry and demonstrates their practical application in real case studies from world-leading online platforms. Key trends such as deep learning, causality and active exploration with bandits are depicted with real examples and discussed alongside their business considerations and implementation challenges. Rising topics such as explainability, fairness, natural interfaces and content generation are covered, touching on aspects of both technology and user experience. Our tutorial relies on recent advances in the field and on work conducted at Booking.com, where we implement personalization models on one of the world's leading online travel platforms.
In this tutorial we aim to present a comprehensive survey of advances in deep learning techniques specifically designed for anomaly detection (deep anomaly detection for short). Deep learning has had tremendous success in transforming many data mining and machine learning tasks, but popular deep learning techniques are often inapplicable to anomaly detection due to some unique characteristics of anomalies, e.g., rarity, heterogeneity, boundless nature, and the prohibitively high cost of collecting large-scale anomaly data. Through this tutorial, the audience will gain a systematic overview of this area; learn the key intuitions, objective functions, underlying assumptions, and advantages and disadvantages of different categories of state-of-the-art deep anomaly detection methods; and recognize the broad real-world applicability of these methods in diverse domains. We also discuss which challenges the current deep anomaly detection methods can address and envision this area from multiple perspectives. Anyone interested in deep learning, anomaly/outlier/novelty detection, out-of-distribution detection, representation learning with limited labeled data, or self-supervised representation learning will find this tutorial helpful. Researchers and practitioners in finance, cybersecurity, and healthcare will also find it useful in practice.
Peer review is the backbone of scientific research. Yet peer review is called "biased," "broken," and "unscientific" in many scientific disciplines. This problem is further compounded by the near-exponentially growing number of submissions to various computer science conferences. Due to the prevalence of the "Matthew effect" of the rich getting richer in academia, any source of unfairness in the peer review system, such as those discussed in this tutorial, can considerably affect the entire career trajectory of (young) researchers.
This tutorial will discuss a number of systemic challenges in peer review, such as biases, subjectivity, miscalibration, dishonest behavior, and noise. For each issue, the tutorial will first present insightful experiments to understand it, and then present computational techniques designed to address it. Many open problems will be highlighted that we envisage will be exciting to the WSDM audience and will lead to significant impact if solved.
Recent years have witnessed the emergence of conversational systems, including both physical devices and mobile-based applications. Both the research community and industry believe that conversational systems will have a major impact on human-computer interaction, and the IR/DM/RecSys communities specifically have begun to explore Conversational Recommendation Systems. Conversational recommendation aims at finding or recommending the most relevant information (e.g., web pages, answers, movies, products) for users based on text or spoken dialogs, through which users can communicate with the system more efficiently using natural language conversations. Due to users' constant need to look for information to support both work and daily life, conversational recommendation systems will be one of the key technologies towards an intelligent web. The tutorial focuses on the foundations and algorithms for conversational recommendation, as well as their applications in real-world systems such as search engines, e-commerce and social networks. The tutorial aims at introducing and communicating conversational recommendation methods to the community, as well as gathering researchers and practitioners interested in this research direction for discussions, idea exchange, and research promotion.
The Probability Ranking Principle (PRP), which assumes that each document has a unique and independent probability of satisfying a particular information need, is one of the fundamental principles of ranking. Traditionally, heuristic ranking features and well-known learning-to-rank approaches have been designed following PRP. Recently, neural IR models, which adopt deep learning to enhance ranking performance, have also obeyed PRP. Though it has been widely used for nearly five decades, in-depth analysis shows that PRP is not an optimal principle for ranking, due to its assumption that each document's relevance is independent of the remaining candidates. Counterexamples include pseudo relevance feedback, interactive information retrieval, search result diversification, etc. To address the problem, researchers have recently proposed modeling the dependencies among documents when designing ranking models. A number of such ranking models have been proposed and have achieved state-of-the-art ranking performance. This tutorial aims to give a comprehensive survey of these recently developed ranking models that go beyond PRP. The tutorial categorizes these models based on their intrinsic assumptions: that the documents are independent, sequentially dependent, or globally dependent. In this way, we expect researchers focusing on ranking in search and recommendation to gain a new perspective on the design of ranking models, stimulating new ideas for developing novel ranking models. The material of this tutorial can be found at https://github.com/pl8787/wsdm2021-beyond-prp-tutorial.
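To make the contrast concrete, an illustrative formulation (notation chosen here, not taken from the tutorial) is:

```latex
% Independent (PRP) ranking: sort candidates by their standalone
% probability of relevance to the query q.
s_{\mathrm{PRP}}(d_i) = P(r_i = 1 \mid q, d_i)

% Dependent ranking (sequential case): the score at position k also
% conditions on the documents already selected, which is what enables
% effects such as diversification and relevance feedback.
s_{\mathrm{dep}}(d_i) = P\bigl(r_i = 1 \mid q, d_i, d_{\pi(1)}, \dots, d_{\pi(k-1)}\bigr)
```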
Learning from graph and relational data plays a major role in many applications, including social network analysis, marketing, e-commerce, information retrieval, knowledge modeling, medical and biological sciences, engineering, and others. Recently, Graph Neural Networks (GNNs) have emerged as a promising new learning framework capable of bringing the power of deep representation learning to graph and relational data. This ever-growing body of research has shown that GNNs achieve state-of-the-art performance for problems such as link prediction, fraud detection, target-ligand binding activity prediction, knowledge-graph completion, and product recommendation. In practice, many real-world graphs are very large, and scalable solutions are needed to train GNNs on them efficiently.
The objective of this tutorial is twofold. First, it will provide an overview of the theory behind GNNs, discuss the types of problems that GNNs are well suited for, and introduce some of the most widely used GNN model architectures and the problems/applications they are designed to solve. Second, it will introduce the Deep Graph Library (DGL), a scalable GNN framework that simplifies the development of efficient GNN-based training and inference programs at large scale. To make things concrete, the tutorial will cover state-of-the-art training methods for scaling GNNs to large graphs and provide hands-on sessions showing how to use DGL to perform scalable training in different settings (multi-GPU training and distributed training). The hands-on part will start with basic graph applications (e.g., node classification and link prediction) to set up the context, then move on to training GNNs on large graphs, demonstrating how to apply the techniques in DGL to train GNNs for real-world applications.
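A minimal single-machine example of the kind of DGL program the hands-on sessions build on is sketched below, following the library's public node-classification tutorial; the dataset and hyper-parameters are illustrative, and the large-graph sessions layer neighbour sampling and multi-GPU/distributed training on top of the same building blocks.

```python
import dgl
import torch
import torch.nn.functional as F
from dgl.nn import GraphConv

# Load a small citation graph; large-graph training replaces this full-graph
# loop with sampled mini-batches.
dataset = dgl.data.CoraGraphDataset()
g = dgl.add_self_loop(dataset[0])

class GCN(torch.nn.Module):
    def __init__(self, in_feats, hidden, n_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden)
        self.conv2 = GraphConv(hidden, n_classes)
    def forward(self, g, x):
        return self.conv2(g, F.relu(self.conv1(g, x)))

model = GCN(g.ndata['feat'].shape[1], 16, dataset.num_classes)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
mask = g.ndata['train_mask']
for epoch in range(50):
    logits = model(g, g.ndata['feat'])
    loss = F.cross_entropy(logits[mask], g.ndata['label'][mask])
    opt.zero_grad(); loss.backward(); opt.step()
```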
Commonsense knowledge is a foundational cornerstone of artificial intelligence applications. Whereas information extraction and knowledge base construction for instance-oriented assertions, such as Brad Pitt's birth date or Angelina Jolie's movie awards, have received much attention, commonsense knowledge about general concepts (politicians, bicycles, printers) and activities (eating pizza, fixing printers) has only been tackled recently. In this tutorial we present state-of-the-art methodologies for the compilation and consolidation of such commonsense knowledge (CSK). We cover text-extraction-based, multi-modal and Transformer-based techniques, with special focus on the issues of web search and ranking, as relevant to the WSDM community.
The goal of this tutorial is to provide the WSDM community with recent advances in the assessment and mitigation of data and algorithmic bias in recommender systems. We first introduce conceptual foundations, presenting the state of the art and describing real-world examples of how bias can impact recommendation algorithms from several perspectives (e.g., ethical and system objectives). The tutorial continues with a systematic showcase of algorithmic countermeasures to uncover, assess, and reduce bias along the recommendation design process. A practical part then provides attendees with implementations of pre-, in-, and post-processing bias mitigation algorithms, leveraging open-source tools and public datasets; in this part, tutorial participants are engaged in designing bias countermeasures and in articulating their impacts on stakeholders. We conclude the tutorial by analyzing emerging open issues and future directions in this rapidly evolving research area (Website: https://biasinrecsys.github.io/wsdm2021).
We present Neural Structured Learning (NSL) in TensorFlow [1], a new learning paradigm to train neural networks by leveraging structured signals in addition to feature inputs. Structure can be explicit as represented by a graph, or implicit, either induced by adversarial perturbation or inferred using techniques like embedding learning. NSL is open-sourced as part of the TensorFlow [2] ecosystem and is widely used in Google across many products and services. In this tutorial, we provide an overview of the NSL framework including various libraries, tools, and APIs as well as demonstrate the practical use of NSL in different applications. The NSL website is hosted at www.tensorflow.org/neural_structured_learning, which includes details about the theoretical foundations of the technology, extensive API documentation, and hands-on tutorials.
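As a flavor of the framework, the sketch below follows the pattern of NSL's public examples for the adversarial-regularization mode (one of the implicit-structure cases mentioned above); the dataset, layer sizes, and hyper-parameters are illustrative choices rather than ones prescribed by the tutorial.

```python
import tensorflow as tf
import neural_structured_learning as nsl

# Illustrative data and base model (MNIST chosen here only for brevity).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

base_model = tf.keras.Sequential([
    tf.keras.Input((28, 28), name='feature'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Wrap the base model so each batch also trains on adversarially perturbed
# versions of the 'feature' input (implicit structure via perturbation).
adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(
    base_model, label_keys=['label'], adv_config=adv_config)
adv_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
adv_model.fit({'feature': x_train, 'label': y_train}, batch_size=32, epochs=2)
```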
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This tutorial, based on a forthcoming book, provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. We provide a synthesis of existing work as a single point of entry for both researchers and practitioners. Our coverage is grouped into two categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that perform ranking directly. Two themes pervade our treatment: techniques for handling long documents and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of their application are well understood. Nevertheless, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, we also attempt to prognosticate the future.
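As a small illustration of the reranking family covered in the tutorial, the sketch below scores (query, passage) pairs with a cross-encoder and re-sorts the candidates; the Hugging Face checkpoint name is an assumption made for illustration, not one prescribed by the tutorial or the book.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint: a publicly available MS MARCO cross-encoder.
name = 'cross-encoder/ms-marco-MiniLM-L-6-v2'
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

query = 'symptoms of dehydration'
candidates = [
    'Dehydration occurs when you use or lose more fluid than you take in.',
    'The 2019 season ended with a dramatic final at the stadium.',
]
# Encode each (query, passage) pair jointly and score with a single forward pass.
enc = tok([query] * len(candidates), candidates,
          padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    scores = model(**enc).logits.squeeze(-1)   # one relevance score per pair
reranked = [c for _, c in sorted(zip(scores.tolist(), candidates), reverse=True)]
```

In a multi-stage architecture, this reranking step is applied only to a shortlist retrieved by a cheaper first-stage ranker, which is the effectiveness/efficiency trade-off the tutorial discusses.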
Over the years, the Web has become a premier source of information in almost every area we can think of. For tourism, the Web has become the primary source of information for travelers. When planning trips, people search for information about destinations, accommodations, attractions, means of transportation; in short, everything related to their future trip. Once done searching, they reserve almost everything online. The blessing of easily accessible information comes with the curse of information overload, and this is where Web search techniques and recommendation systems come into play. This is especially true recently, with the appearance of COVID-19 and the uncertainty and transformative power it brings to travelling. WebTour 2021 brings together researchers and practitioners working on developing and improving tools and techniques that help users better find relevant information matching their needs.
The second Workshop on Integrity in Social Networks and Media is held in conjunction with the 14th ACM Conference on Web Search and Data Mining (WSDM) in Jerusalem, Israel. The goal of the workshop is to bring together researchers and practitioners to discuss content and interaction integrity challenges in social networks and social media platforms.
Recent years have witnessed the success of machine learning, and especially deep learning, in many research areas such as Vision and Language Processing, Information Retrieval and Recommender Systems, Social Networks and Conversational Agents. Though various learning approaches have demonstrated satisfactory performance in perceptual tasks such as associative learning and matching, by extracting useful similarity patterns from data, considerable research is still needed to advance the ability of reasoning towards cognitive intelligence in the coming years. This includes, but is not limited to, neural logical reasoning, neural-symbolic reasoning, causal reasoning, knowledge reasoning and commonsense reasoning. The workshop focuses on research into machine reasoning techniques and their application in various intelligent tasks. It will gather researchers as well as practitioners in the field for discussions, idea exchange, and research promotion. It will also bring insightful debates about the recent progress in machine intelligence to a broader community, including but not limited to CV, IR, NLP, ML, DM, AI and beyond.
The workshop on Supporting and Understanding of (multi-party) conversational Dialogues (SUD) seeks to encourage researchers to investigate automated methods for analyzing and understanding conversations, and to explore methodologies for proactively assisting the communicating parties during conversations, ranging from summarizing meeting minutes to automatically keeping track of action items. The workshop will have (1) a regular research paper track and (2) a more focused data challenge track, inviting papers on a specific task of contextualizing entities of interest from conversational dialogues.
By offering courses and resources, learning platforms on the Web have been attracting large numbers of participants, and the interactions with these systems have generated a vast amount of learning-related data. The collection, processing, and analysis of these data have driven significant growth in learning analytics and have opened up new opportunities for supporting and assessing educational experiences. To provide timely guidance to all the stakeholders involved in the educational process, online platforms need to understand students' behavior and enable models that support data-driven decisions in the learning domain, with the aim of maximizing learning outcomes. In this workshop, we focus on collecting new contributions in this emerging area and on providing a common ground for researchers and practitioners (Website: https://mirkomarras.github.io/l2d-wsdm2021).