WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining

Full Citation in the ACM Digital Library

SESSION: Keynote Talks

Ethical Challenges in AI

In the first part we address four current specific challenges through examples: (1) discrimination (e.g., facial recognition, justice, sharing economy, language models); (2) stupid models (e.g., lack of semantic and context understanding); (3) physiognomy (e.g., facial bio-metrics based predictions); and (4) indiscriminate use of computing resources (e.g., large language models). These examples do have a personal bias but set the context for the second part where we address four generic challenges: (1) too many principles, (2) cultural differences; (3) regulation and (4) our cognitive biases. We finish discussing what we can do to address these challenges in the near future.

Knowledge is Power: Symbolic Knowledge Distillation, Commonsense Morality, & Multimodal Script Knowledge

Scale appears to be the winning recipe in today's AI leaderboards. And yet, extreme-scale neural models are still brittle to make errors that are often nonsensical and even counterintuitive. In this talk, I will argue for the importance of knowledge, especially commonsense knowledge, and demonstrate how smaller models developed in academia can still have an edge over larger industry-scale models, if powered with knowledge.

First, I will introduce "symbolic knowledge distillation", a new framework to distill larger neural language models into smaller commonsense models, which leads to a machine-authored KB that wins, for the first time, over a human-authored KB in all criteria: scale, accuracy, and diversity.

Next, I will present an experimental conceptual framework toward computational social norms and commonsense morality, so that neural language models can learn to reason that "helping a friend" is generally a good thing to do, but "helping a friend spread fake news" is not.

Finally, I will discuss an approach to multimodal script knowledge demonstrating the power of complex raw data, which leads to new SOTA performances on a dozen leaderboards that require grounded, temporal, and causal commonsense reasoning.

Graph Neural Network Research at AWS AI

In the course of just a few years, Graph Neural Networks (GNNs) have emerged as the prominent supervised learning approach that brings the power of deep representation learning to graph and relational data. An ever-growing body of research has shown that GNNs achieve state-of-the-art performance for problems such as link prediction, fraud detection, target-ligand binding activity prediction, knowledge-graph completion, and product recommendations. As a result, GNNs are quickly moving from the realm of academic research involving small graphs to powering commercial applications and very large graphs. This talk will provide an overview of some of the research that AWS AI has been doing to facilitate this transition, which includes developing the Deep Graph Library (DGL)-an open source framework for writing and training GNN-based models, improving the computational efficiency and scaling of GNN model training for extremely large graphs, developing novel GNN-based solutions for different applications, and making it easy for developers to train and use GNN models by integrating graph-based ML techniques in graph databases.

SESSION: Research Papers

k-Clustering with Fair Outliers

Clustering problems and clustering algorithms are often overly sensitive to the presence of outliers: even a handful of points can greatly affect the structure of the optimal solution and its cost. This is why many algorithms for robust clustering problems have been formulated in recent years. These algorithms discard some points as outliers, excluding them from the clustering. However, outlier selection can be unfair: some categories of input points may be disproportionately affected by the outlier removal algorithm.

We study the problem of k-clustering with fair outlier removal and provide the first approximation algorithm for well-known clustering formulations, such as k-means and k-median. We analyze this algorithm and prove that it has strong theoretical guarantees. We complement this result with an empirical evaluation showing that, while standard methods for outlier removal have a disproportionate impact across categories of input points, our algorithm equalizes the impact while retaining strong experimental performances on multiple real--world datasets. We also show how the fairness of outlier removal can influence the performance of a downstream learning task. Finally, we provide a coreset construction, which makes our algorithm scalable to very large datasets.

Wikipedia Reader Navigation: When Synthetic Data Is Enough

Every day millions of people read Wikipedia. When navigating the vast space of available topics using hyperlinks, readers describe trajectories on the article network. Understanding these navigation patterns is crucial to better serve readers' needs and address structural biases and knowledge gaps. However, systematic studies of navigation on Wikipedia are hindered by a lack of publicly available data due to the commitment to protect readers' privacy by not storing or sharing potentially sensitive data. In this paper, we ask: How well can Wikipedia readers' navigation be approximated by using publicly available resources, most notably the Wikipedia clickstream data? We systematically quantify the differences between real navigation sequences and synthetic sequences generated from the clickstream data, in 6 analyses across 8 Wikipedia language versions. Overall, we find that the differences between real and synthetic sequences are statistically significant, but with small effect sizes, often well below 10%. This constitutes quantitative evidence for the utility of the Wikipedia clickstream data as a public resource: clickstream data can closely capture reader navigation on Wikipedia and provides a sufficient approximation for most practical downstream applications relying on reader data. More broadly, this study provides an example for how clickstream-like data can generally enable research on user navigation on online platforms while protecting users' privacy.

Harvesting More Answer Spans from Paragraph beyond Annotation

AutomaticA nswer spanE xtraction (AE) focuses on identifying key information from paragraphs that can be asked. It has been used to facilitate downstream question generation tasks or data augmentation for question answering. Current work of AE heavily relies on the annotated answer spans fromM achineR eadingC omprehension (MRC) datasets. However, these methods suffer from the partial annotation problem due to the annotation protocols of MRC tasks. To tackle this problem, we propose \mymethod, a S tructured Co ntext graph network with P ositive -unlabeled learning. \mymethod first represents the paragraph by constructing a graph with both syntactic and semantic edges, then adopts a unified pointer network for answer span identification. \mymethod narrows the discrenpency between AE and MRC by formulating AE as aP ositive-\textitu nlabeled (PU) learning problem, thus recovering more answer spans from paragraphs. To evaluate newly extracted spans without annotation, we also present an automatic metric from the perspective of question answering and text summarization, which correlates well with human judgments. Comprehensive experiments on both AE and downstream tasks demonstrate the effectiveness of our proposed framework. Our code is available at \url

Sampling Multiple Nodes in Large Networks: Beyond Random Walks

Sampling random nodes is a fundamental algorithmic primitive in the analysis of massive networks, with many modern graph mining algorithms critically relying on it. We consider the task of generating a large collection of random nodes in the network assuming limited query access (where querying a node reveals its set of neighbors). In current approaches, based on long random walks, the number of queries per sample scales linearly with the mixing time of the network, which can be prohibitive for large real-world networks. We propose a new method for sampling multiple nodes that bypasses the dependence in the mixing time by explicitly searching for less accessible components in the network. We test our approach on a variety of real-world and synthetic networks with up to tens of millions of nodes, demonstrating a query complexity improvement of up to x20 compared to the state of the art.

'It's on the tip of my tongue': A new Dataset for Known-Item Retrieval

The tip of the tongue known-item retrieval (TOT-KIR) task involves the 'one-off' retrieval of an item for which a user cannot recall a precise identifier. The emergence of several online communities where users pose known-item queries to other users indicates the inability of existing search systems to answer such queries. Research in this domain is hampered by the lack of large, open or realistic datasets. Prior datasets relied on either annotation by crowd workers, which can be expensive and time-consuming, or generating synthetic queries, which can be unrealistic. Additionally, small datasets make the application of modern (neural) retrieval methods unviable, since they require a large number of data-points. In this paper, we collect the largest dataset yet with 15K query-item pairs in two domains, namely, Movies and Books, from an online community using heuristics, rendering expensive annotation unnecessary while ensuring that queries are realistic. We show that our data collection method is accurate by conducting a data study. We further demonstrate that methods like BM25 fall short of answering such queries, corroborating prior research. The size of the dataset makes neural methods feasible, which we show outperforms lexical baselines, indicating that neural/dense retrieval is superior for the TOT-KIR task.

CAN: Feature Co-Action Network for Click-Through Rate Prediction

Feature interaction has been recognized as an important problem in machine learning, which is also very essential for click-through rate (CTR) prediction tasks. In recent years, Deep Neural Networks (DNNs) can automatically learn implicit nonlinear interactions from original sparse features, and therefore have been widely used in industrial CTR prediction tasks. However, the implicit feature interactions learned in DNNs cannot fully retain the complete representation capacity of the original and empirical feature interactions (e.g., cartesian product) without loss. For example, a simple attempt to learn the combination of feature A and feature B < A, B > as the explicit cartesian product representation of new features can outperform previous implicit feature interaction models including factorization machine (FM)-based models and their variations. This indicates there is still a big gap between explicit and implicit feature interaction models. However, to learn all the explicit feature interaction (cartesian product) representations requires a very large sample size along with N times of original parameter space (where N is quite large in most industrial applications). In this paper, we propose a Co-Action Network (CAN) to approximate the explicit pairwise feature interactions without introducing too many additional parameters. More specifically, giving feature A and its associated feature B, their feature interaction is modeled by learning two sets of parameters: 1) the embedding of feature A, and 2) a Multi-Layer Perceptron (MLP) to represent feature B. The approximated feature interaction can be obtained by passing the embedding of feature A through the MLP network of feature B. We refer to such pairwise feature interaction as feature co-action, and such a Co-Action Network unit can provide a very powerful capacity to fitting complex feature interactions. In addition, FM can be viewed as a special case of the CAN unit when the MLP is a single layer with only one output. Experimental results on public and industrial datasets show that CAN outperforms state-of-the-art CTR models and the cartesian product method. Moreover, CAN has been deployed in the display advertisement system in Alibaba, obtaining 12% improvement on CTR and 8% on Revenue Per Mille (RPM), which is a great improvement to the business. The code for experiments in this paper is open-sourced\footnote

Towards Understanding and Answering Comparative Questions

In this paper, we analyze comparative questions and answers. At least 3%~of the questions submitted to search engines are comparative; ranging from simple facts like "Did Messi or Ronaldo score more goals in 2021?'' to life-changing and probably highly subjective questions like "Is it better to move abroad or stay?''. Ideally, answers to subjective comparative questions would reflect diverse opinions so that the asker can come to a well-informed decision. To better understand the information needs behind comparative questions, we develop approaches to extract the mentioned comparison objects and aspects. As a first step to answer comparative questions, we develop an approach that detects the stances of potential result nuggets (i.e., text passages containing the comparison objects). Our approaches are trained and evaluated on a set of 31,000~English questions from existing datasets that we label as comparative or not. In the 3,500~comparative questions, we label the comparison objects, aspects, and predicates. For 950~questions, we collect answers from online forums and label the stance towards the comparison objects. In the experiments, our approaches recall~71% of the comparative questions with a perfect precision of~1.0, recall~92% of subjective comparative questions with a precision of~0.98, and identify the comparison objects and aspects with an F1 of~0.93 and~0.80, respectively. The stance detector fine-tuned on pairs of objects and answers achieves an accuracy of~0.63.

Graph Collaborative Reasoning

Graphs can represent relational information among entities and graph structures are widely used in many intelligent tasks such as search, recommendation, and question answering. However, most of the graph-structured data in practice suffer from incompleteness, and thus link prediction becomes an important research problem. Though many models are proposed for link prediction, the following two problems are still less explored: (1) Most methods model each link independently without making use of the rich information from relevant links, and (2) existing models are mostly designed based on associative learning and do not take reasoning into consideration. With these concerns, in this paper, we propose Graph Collaborative Reasoning (GCR), which can use the neighbor link information for relational reasoning on graphs from logical reasoning perspectives. We provide a simple approach to translate a graph structure into logical expressions so that the link prediction task can be converted into a neural logic reasoning problem. We apply logical constrained neural modules to build the network architecture according to the logical expression and use backpropagation to efficiently learn the model parameters, which bridges differentiable learning and symbolic reasoning in a unified architecture. To show the effectiveness of our work, we conduct experiments on graph-related tasks such as link prediction and recommendation based on commonly used benchmark datasets, and our graph collaborative reasoning approach achieves state-of-the-art performance.

A Personalized Cross-Platform Post Style Transfer Method Based on Transformer and Bi-Attention Mechanism

To meet different social purposes, users usually share content related to the same topic or event to multiple social media platforms (cross-platform content sharing). As the differences of social norms and audiences among these social ecosystems, there are differences in the use of words and expressions in different platforms, resulting in different language styles among different platforms. In reality, it is usually difficult for users to grasp the consistency between the language style of posts to be published and that of a platform as the problem of context collapse. To address this problem, firstly, we conduct an study to investigate users' content sharing practices across two Chinese popular social media platforms (Douban and Weibo). The results indicate that: 1) there are significant linguistic differences between different platforms; 2) users' content sharing practices are personalized, and the style of their newly shared content is correlated with their historical posts. Secondly, based on the above findings, we propose a personalized cross-platform post style transfer model. The model can automatically transfer users' posts from one platform's language style to the target platform's language style, while preserving the content and reflecting users' personalized characteristics as much as possible. Experiments on the datasets collected from Douban and Weibo show that our model generally outperforms other comparison models on both style transfer and personalization metrics.

Modeling Scale-free Graphs with Hyperbolic Geometry for Knowledge-aware Recommendation

Aiming to alleviate data sparsity and cold-start problems of tradi- tional recommender systems, incorporating knowledge graphs (KGs) to supplement auxiliary information has recently gained considerable attention. Via unifying the KG with user-item interactions into a tripartite graph, recent works explore the graph topologies to learn the low-dimensional representations of users and items with rich semantics. These real-world tripartite graphs are usually scale-free, however, the intrinsic hierarchical graph structures of which are underemphasized in existing works, consequently, leading to suboptimal recommendation performance. To address this issue and provide more accurate recommendation, we propose a knowledge-aware recommendation method with Lorentz model of the hyperbolic geometry, namely Lorentzian Knowledge-enhanced Graph convolutional networks for Recommendation (LKGR). LKGR facilitates better modeling of scale-free tripartite graphs after the data unification. Specifically, we employ different information propagation strategies in the hyperbolic space to explicitly encode heterogeneous information from historical interactions and KGs. Additionally, our proposed knowledge-aware attention mechanism enables the model to automatically measure the information contribution, producing the coherent information aggregation in the hyperbolic space. Extensive experiments on three real-world benchmarks demonstrate that LKGR outperforms state-of-the-art methods by 3.6-15.3% of Recall@20 on Top-K recommendation.

Estimating Causal Effects of Multi-Aspect Online Reviews with Multi-Modal Proxies

Online reviews enable consumers to engage with companies and provide important feedback. Due to the complexity of the high-dimensional text, these reviews are often simplified as a single numerical score, e.g., ratings or sentiment scores. This work empirically examines the causal effects of user-generated online reviews on a granular level: we consider multiple aspects, e.g., the Food and Service of a restaurant. Understanding consumers' opinions toward different aspects can help evaluate business performance in detail and strategize business operations effectively. Specifically, we aim to answer interventional questions such as What will the restaurant popularity be if the quality w.r.t. its aspect Service is increased by 10%? The defining challenge of causal inference with observational data is the presence of "confounder'', which might not be observed or measured, e.g., consumers' preference to food type, rendering the estimated effects biased and high-variance. To address this challenge, we have recourse to the multi-modal proxies such as the consumer profile information and interactions between consumers and businesses. We show how to effectively leverage the rich information to identify and estimate causal effects of multiple aspects embedded in online reviews. Empirical evaluations on synthetic and real-world data corroborate the efficacy and shed light on the actionable insight of the proposed approach.

Causal Mediation Analysis with Hidden Confounders

An important problem in causal inference is to break down the total effect of a treatment on an outcome into different causal pathways and to quantify the causal effect in each pathway. For instance, in causal fairness, the total effect of being a male employee (i.e., treatment) constitutes its direct effect on annual income (i.e., outcome) and the indirect effect via the employee's occupation (i.e., mediator). Causal mediation analysis (CMA) is a formal statistical framework commonly used to reveal such underlying causal mechanisms. One major challenge of CMA in observational studies is handling confounders, variables that cause spurious causal relationships among treatment, mediator, and outcome. Conventional methods assume sequential ignorability that implies all confounders can be measured, which is often unverifiable in practice. This work aims to circumvent the stringent sequential ignorability assumptions and consider hidden confounders. Drawing upon proxy strategies and recent advances in deep learning, we propose to simultaneously uncover the latent variables that characterize hidden confounders and estimate the causal effects. Empirical evaluations using both synthetic and semi-synthetic datasets validate the effectiveness of the proposed method. We further show the potentials of our approach for causal fairness analysis.

AdaptKT: A Domain Adaptable Method for Knowledge Tracing

Knowledge tracing is a crucial and fundamental task in online education systems, which can predict students' knowledge state for personalized learning. Unfortunately, existing methods are domain-specific, whereas there are many domains (e.g., subjects, schools) in the real education scene and some domains suffer from the problem of lacking sufficient data. Therefore, how to exploit the knowledge in other domains, to improve the model's performance for target domain remains pretty much open. We term this problem as Domain Adaptation for Knowledge Tracing (DAKT), which aims to transfer knowledge from the source domain to the target one for knowledge tracing. In this paper, we propose a novel adaptable method, namely Adaptable Knowledge Tracing (AdaptKT), which contains three phases to explore this problem. Specifically, phase I is instance selection. Given the question texts of two domains, we train an auto-encoder to select and embed similar instances from both domains. Phase II is distribution discrepancy minimizing. After obtaining the selected instances and their linguistic representations, we train a knowledge tracing model and adopt the Maximum Mean Discrepancy (MMD) to minimize the discrepancy between the distributions of the domain-specific knowledge states. Phase III is fine-tuning of the output layer. We replace the output layer of the model that trained in phase II by a new one to make the knowledge tracing model's output dimension matches the number of knowledge concepts in the target domain. The new output layer is trained while other parameters that before it are frozen. We conduct extensive experiments on two large-scale real-world datasets, where the experimental results clearly demonstrate the effectiveness of AdaptKT for solving DAKT problem. We will public the code on the Github after the acceptance of the paper.

An Adaptive Unified Allocation Framework for Guaranteed Display Advertising

Guaranteed Display (GD) is widely used in e-commerce marketing for advertisers to acquire an agreed-upon number of impressions with target audiences. With the main objective to maximize the contract delivery rate under contract constraints, user interest (such as click-through rate and conversion rate) is also essential to improve the long-time return on investment for advertisers and the e-commerce platform. In this paper, we design an adaptive unified allocation framework (AUAF) by not only considering supply of audience impressions in request-level but also avoiding over-allocation of audience impressions. Specifically, our allocation model simultaneously optimizes the contract delivery and the match between advertisements and user interests with explicit constraint to prevent unnecessary allocation. Facing the challenge of serving billion-scale requests per day, a parameter-server based parallel optimization algorithm is also developed, enabling the proposed allocation model to be efficiently optimized and incrementally updated in minutes. Thus, the offline optimization results and the online decisions can be synchronized for real-time serving. In other words, our approach can achieve adaptive pacing that is consistent with the optimal allocation solution. Our extensive experimental results demonstrate that the proposed AUAF framework can improve both contract delivery rate and average click-through rate (CTR), which we use to measure the user interest in this paper. The improvements on CTR are statistically significant in comparison with existing methods. Moreover, since March 2020, AUAF has been deployed in the guaranteed display advertising system of Alibaba, bringing more than 10% increase on CTR without loss of contract delivery rate, which has resulted in significant value creation for the business.

The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?

There has been sustained interest from both academia and industry throughout the years due to the importance and practicability of recommendation systems. However, several recent papers have pointed out critical issues with the evaluation process in recommender systems. Likewise, this paper takes an in-depth look at a fundamental but often neglected aspect of the evaluation procedure, i.e. the datasets themselves. To do so, we adopt a systematic and comprehensive approach to understand the datasets used for implicit feedback based top-K recommendation. We start by examining recent papers from top-tier conferences to find out how different datasets have been utilised thus far. Next, we look at the characteristics of these datasets to understand their similarities and differences. Finally, we conduct an empirical study to determine whether the choice of datasets used for evaluation can influence the observations and/or conclusions obtained. Our findings suggest that greater attention needs to be paid to the selection process of datasets used for evaluating recommender systems in order to improve the robustness of the obtained results.

S-Walk: Accurate and Scalable Session-based Recommendation with Random Walks

Session-based recommendation (SR) predicts the next items from a sequence of previous items consumed by an anonymous user. Most existing SR models focus only on modeling intra-session characteristics but pay less attention to inter-session relationships of items, which has the potential to improve accuracy. Another critical aspect of recommender systems is computational efficiency and scalability, considering practical feasibility in commercial applications. To account for both accuracy and scalability, we propose a novel session-based recommendation with a random walk, namely S-Walk. Precisely, S-Walk effectively captures intra- and inter-session correlations by handling high-order relationships among items using random walks with restart (RWR). By adopting linear models with closed-form solutions for transition and teleportation matrices that constitute RWR, S-Walk is highly efficient and scalable. Extensive experiments demonstrate that S-Walk achieves comparable or state-of-the-art performance in various metrics on four benchmark datasets. Moreover, the model learned by S-Walk can be highly compressed without sacrificing accuracy, conducting two or more orders of magnitude faster inference than existing DNN-based models, making it suitable for large-scale commercial systems.

ANTHEM: Attentive Hyperbolic Entity Model for Product Search

Product search is a fundamentally challenging problem due to the large-size of product catalogues and the complexity of extracting semantic information from products. In addition to this, the black-box nature of most search systems also hamper a smooth customer experience. Current approaches in this area utilize lexical and semantic product information to match user queries against products. However, these models lack (i) a hierarchical query representation, (ii) a mechanism to detect and capture inter-entity relationships within a query, and (iii) a query composition method specific to e-commerce domain. To address these challenges, in this paper, we propose an AtteNTive Hyperbolic Entity Model (ANTHEM), a novel attention-based product search framework that models query entities as two-vector hyperboloids, learns inter-entity intersections and utilizes attention to unionize individual entities and inter-entity intersections to predict product matches from the search space. ANTHEM utilizes the first and second vector of hyperboloids to determine the query's semantic position and to tune its surrounding search volume, respectively. The attention networks capture the significance of intra-entity and inter-entity intersections to the final query space. Additionally, we provide a mechanism to comprehend ANTHEM and understand the significance of query entities towards the final resultant products. We evaluate the performance of our model on real data collected from popular e-commerce sites. Our experimental study on the offline data demonstrates compelling evidence of ANTHEM's superior performance over state-of-the-art product search methods with an improvement of more than 10% on various metrics. We also demonstrate the quality of ANTHEM's query encoder using a query matching task.

Beyond NED: Fast and Effective Search Space Reduction for Complex Question Answering over Knowledge Bases

Answering complex questions over knowledge bases (KB-QA) faces huge input data with billions of facts, involving millions of entities and thousands of predicates. For efficiency, QA systems first reduce the answer search space by identifying a set of facts that is likely to contain all answers and relevant cues. The most common technique for doing this is to apply named entity disambiguation (NED) on the question, and retrieve KB facts for the disambiguated entities. This work presents CLOCQ, an efficient method that prunes irrelevant parts of the search space using KB-aware signals. CLOCQ uses a top-k query processor over score-ordered lists of KB items that combine signals about lexical matching, relevance to the question, coherence among candidate items, and connectivity in the KB graph. Experiments with two recent QA benchmarks for complex questions demonstrate the superiority of CLOCQ over state-of-the-art baselines with respect to answer presence, size of the search space, and runtimes.

Towards Robust Graph Neural Networks for Noisy Graphs with Sparse Labels

Graph Neural Networks (GNNs) have shown their great ability in modeling graph structured data. However, real-world graphs usually contain structure noises and have limited labeled nodes. The performance of GNNs would drop significantly when trained on such graphs, which hinders the adoption of GNNs on many applications. Thus, it is important to develop noise-resistant GNNs with limited labeled nodes. However, the work on this is rather limited. Therefore, we study a novel problem of developing robust GNNs on noisy graphs with limited labeled nodes. Our analysis shows that both the noisy edges and limited labeled nodes could harm the message-passing mechanism of GNNs. To mitigate these issues, we propose a novel framework which adopts the noisy edges as supervision to learn a denoised and dense graph, which can down-weight or eliminate noisy edges and facilitate message passing of GNNs to alleviate the issue of limited labeled nodes. The generated edges are further used to regularize the predictions of unlabeled nodes with label smoothness to better train GNNs. Experimental results on real-world datasets demonstrate the robustness of the proposed framework on noisy graphs with limited labeled nodes.

Predicting Human Mobility via Graph Convolutional Dual-attentive Networks

Human mobility prediction is of great importance for various applications such as smart transportation and personalized recommender systems. Although many traditional pattern-based methods and deep models ($e.g.,$ recurrent neural networks) based methods have been developed for this task, they essentially do not well cope with the sparsity and inaccuracy of trajectory data and the complicated high-order nature of the sequential dependency, which are typical challenges in mobility prediction. To solve the problems, this paper proposes a novel framework named G raph C onvolutional D ual-a ttentive N etworks (GCDAN), which consists of two modules: spatio-temporal embedding and trajectory encoder-decoder. The first module employs a bidirectional diffusion graph convolution to preserve the spatial dependency in the location embedding. The second module employs a dual-attentive mechanism based on a Sequence to Sequence architecture to effectively extract the long-range sequential dependency within a trajectory and the correlation between different trajectories for predictions. Extensive experiments on three real-world datasets show that GCDAN achieves significant performance gain compared with state-of-the-art baselines.

Deep-QPP: A Pairwise Interaction-based Deep Learning Model for Supervised Query Performance Prediction

Motivated by the recent success of end-to-end deep neural models for ranking tasks, we present here a supervised end-to-end neural approach for query performance prediction (QPP). In contrast to unsupervised approaches that rely on various statistics of document score distributions, our approach is entirely data-driven. Further, in contrast to weakly supervised approaches, our method also does not rely on the outputs from different QPP estimators. In particular, our model leverages information from the semantic interactions between the terms of a query and those in the top-documents retrieved with it. The architecture of the model comprises multiple layers of 2D convolution filters followed by a feed-forward layer of parameters. Experiments on standard test collections demonstrate that our proposed supervised approach outperforms other state-of-the-art supervised and unsupervised approaches.

Improving Personalized Search with Dual-Feedback Network

Personalized search improves the quality of search results by modeling historical user behavior. In recent years, many methods based on deep learning have greatly improved the performance of personalized search. However, most of the existing methods only focus on modeling positive user behavior signals, which leads to incomplete user interest modeling. At the same time, the user's search behavior hides much explicit or implicit feedback information. For example, clicking and staying for a certain period represents implicit positive feedback, and skipping behavior represents implicit negative feedback. Intuitively, this information can be utilized to construct a more complete and accurate user profile. In this paper, we propose a dual-feedback modeling framework, which integrates multi-granular user feedback information to model the user's current search intention. Specifically, we propose a feedback extraction network to refine the dual-feedback representation in multiple stages. For enhancing the user's real-time search quality, we design an additional dual-feedback feature gating module to capture the user's real-time feedback in the current session. We conducted a large number of experiments on two real-world datasets, and the experimental results show that our method can effectively improve the performance of personalized search.

Combinatorial Bandits under Strategic Manipulations

Strategic behavior against sequential learning methods, such as "click framing'' in real recommendation systems, have been widely observed. Motivated by such behavior we study the problem of combinatorial multi-armed bandits (CMAB) under strategic manipulations of rewards, where each arm can modify the emitted reward signals for its own interest. This characterization of the adversarial behavior is a relaxation of previously well-studied settings such as adversarial attacks and adversarial corruption. We propose a strategic variant of the combinatorial UCB algorithm, which has a regret of at most O(mlog T + m B_max ) under strategic manipulations, where T is the time horizon, m is the number of arms, and B_max is the maximum budget of an arm. We provide lower bounds on the budget for arms to incur certain regret of the bandit algorithm. Extensive experiments on online worker selection for crowdsourcing systems, online influence maximization and online recommendations with both synthetic and real datasets corroborate our theoretical findings on robustness and regret bounds, in a variety of regimes of manipulation budgets.

Understanding and Improvement of Adversarial Training for Network Embedding from an Optimization Perspective

Network Embedding aims to learn a function mapping the nodes to Euclidean space contribute to multiple learning analysis tasks on networks. However, both the noisy information behind the real-world networks and the overfitting problem negatively impact the quality of embedding vectors. To tackle these problems, researchers utilize Adversarial Perturbations on Parameters (APP) and achieve state-of-the-art performance. Unlike the mainstream methods introducing perturbations on the network structure or the data feature, Adversarial Training for Network Embedding (AdvTNE) adopts APP to directly perturb the model parameters, thus providing a new chance to understand the mechanism behind it. In this paper, we explain APP theoretically from an optimization perspective. Considering the Power-law property of networks and the optimization objective, we analyze the reason for its remarkable results on network embedding. Based on the above analysis and the Sigmoid saturation region problem, we propose a new Sine-base activation to enhance the performance of AdvTNE. We conduct extensive experiments on four real networks to validate the effectiveness of our method in node classification and link prediction. The results demonstrate that our method is competitive with state-of-the-art methods.

Semi-supervised Stance Detection of Tweets Via Distant Network Supervision

Detecting and labeling stance in social media text is strongly motivated by hate speech detection, poll prediction, engagement forecasting, and concerted propaganda detection. Today's best neural stance detectors need large volumes of training data, which is difficult to curate given the fast-changing landscape of social media text and issues on which users opine. Homophily properties over the social network provide strong signal of coarse-grained user-level stance. But semi-supervised approaches for tweet-level stance detection fail to properly leverage homophily. In light of this, We present SANDS, a new semi-supervised stance detector. SANDS starts from very few labeled tweets. It builds multiple deep feature views of tweets. It also uses a distant supervision signal from the social network to provide a surrogate loss signal to the component learners. We prepare two new tweet datasets comprising over 236,000 politically tinted tweets from two demographics (US and India) posted by over 87,000 users, their follower-followee graph, and over 8,000 tweets annotated by linguists. SANDS achieves a macro-F1 score of 0.55 (0.49) on US (India)-based datasets, outperforming 17 baselines (including variants of SANDS) substantially, particularly for minority stance labels and noisy text. Numerous ablation experiments on SANDS disentangle the dynamics of textual and network-propagated stance signals.

External Evaluation of Ranking Models under Extreme Position-Bias

Implicit feedback from users behavior is a natural and scalable source for training and evaluating ranking models in human-interactive systems. However, inherent biases such as the position bias are key obstacles to its effective usage. This is further accentuated in cases of extreme bias, where behavioral feedback can be collected exclusively on the top ranked result. In fact, in such cases, state-of-art debiasing methods cannot be applied. A prominent use case of extreme position bias is the voice shopping medium, where only a small amount of information can be presented to the user during a single interaction, resulting in user behavioral signals that are almost exclusively limited to the top offer. There is no way to know how the user would have reacted to a different offer than the top one he was actually exposed to. Thus, any new ranker we wish to evaluate with respect to a behavioral metric, requires online experimentation. We propose a novel approach, based on anexternal estimator model, for accurately predicting offline the performance of a new ranker. The accuracy of our solution is proven theoretically, as well as demonstrated by a line of experiments. In these experiments, we focus on the use case of purchase prediction, and show that our estimator can accurately predict offline the purchase rate of different rankers over a segment of voice shopping traffic. Our prediction is validated online, as being compared to the actual performance obtained by each ranker when being exposed to users.

Modeling Users' Contextualized Page-wise Feedback for Click-Through Rate Prediction in E-commerce Search

Modeling user's historical feedback is essential for Click-Through Rate Prediction in personalized search and recommendation. Existing methods usually only model users' positive feedback information such as click sequences which neglects the context information of the feedback. In this paper, we propose a new perspective for context-aware users' behavior modeling by including the whole page-wisely exposed products and the corresponding feedback as contextualized page-wise feedback sequence. The intra-page context information and inter-page interest evolution can be captured to learn more specific user preference. We design a novel neural ranking model RACP(Recurrent Attention over Contextualized Page sequence), which utilizes page-context aware attention to model the intra-page context. A recurrent attention process is used to model the cross-page interest convergence evolution as denoising the interest in the previous pages. Experiments on public and real-world industrial datasets verify our model's effectiveness.

Variational User Modeling with Slow and Fast Features

Recommender systems play a key role in helping users find their favorite music to play among an often extremely large catalog of items on online streaming services. To correctly identify users' interests, recommendation algorithms rely on past user behavior and feedback to aim at learning users' preferences through the logged interactions. User modeling is a fundamental part of this large-scale system as it enables the model to learn an optimal representation for each user. For instance, in music recommendation, the focus of this paper, users' interests at any time is shaped by their general preferences for music as well as their recent or momentary interests in a particular type of music. In this paper, we present a novel approach for learning user representation based on general and slow-changing user interests as well as fast-moving current preferences. We propose a variational autoencoder-based model that takes fast and slow-moving features and learns an optimal user representation. Our model, which we call FS-VAE, consists of sequential and non-sequential encoders to capture patterns in user-item interactions and learn users' representations. We evaluate FS-VAE on a real-world music streaming dataset. Our experimental results show a clear improvement in learning optimal representations compared to state-of-the-art baselines on the next item recommendation task. We also demonstrate how each of the model components, slow input feature, and fast ones play a role in achieving the best results in next item prediction and learning users' representations.

How Do You Test a Test?: A Multifaceted Examination of Significance Tests

We examine three statistical significance tests -- a recently proposed ANOVA model and two baseline tests -- using a suite of measures to determine which is better suited for offline evaluation. We apply our analysis to both the runs of a whole TREC track and also to the runs submitted by six participant groups. The former reveals test behavior in the heterogeneous settings of a large-scale offline evaluation initiative; the latter, almost overlooked in past work (to the best of our knowledge), reveals what happens in the much more restricted case of variants of a single system, i.e. the typical context in which companies and research groups operate. We find the ANOVA test strikingly consistent in large-scale settings, but worryingly inconsistent in some participant experiments. Of greater concern, the participant only experiments show one of our baseline tests (a test widely used in research) can produce a substantial number of inconsistent results. We discuss the implications of this inconsistency for possible publication bias.

Efficient Graph Convolution for Joint Node Representation Learning and Clustering

Attributed graphs are used to model a wide variety of real-world networks. Recent graph convolutional network-based representation learning methods have set state-of-the-art results on the clustering of attributed graphs. However, these approaches deal with clustering as a downstream task while better performances can be attained by incorporating the clustering objective into the representation learning process. In this paper, we propose, in a unified framework, an objective function taking into account both tasks simultaneously. Based on a variant of the simple graph convolutional network, our model does clustering by minimizing the difference between the convolved node representations and their reconstructed cluster representatives. We showcase the efficiency of the derived algorithm against state-of-the-art methods both in terms of clustering performance and computational cost on thede facto benchmark graph clustering datasets. We further demonstrate the usefulness of the proposed approach for graph visualization through generating embeddings that exhibit a clustering structure.

Leveraging Multi-view Inter-passage Interactions for Neural Document Ranking

The configuration of 512 window size prevents transformers from being directly applicable to document ranking that requires larger context. Hence, recent works propose to estimate document relevance with fine-grained passage-level relevance signals. A limitation of such models, however, is that scoring each passage independently falls short in modeling inter-passage interactions and leads to unsatisfactory results. In this paper, we propose a Multiview inter-passage Interaction based Ranking model (MIR), to combine intra-passage interactions and inter-passage interactions in a complementary manner. The former captures local semantic relations inside each passage, whereas the latter draws global dependencies between different passages. Moreover, we represent inter-passage relationships via multi-view attention patterns, allowing information propagation at token, sentence, and passage-level. The representations at different levels of granularity, being aware of global context, are then aggregated into a document-level representation for ranking. Experimental results on two benchmarks show that modeling inter-passage interactions brings substantial improvements over existing passage-level methods.

HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling

Community Question Answering (CQA) is a well-defined task that can be used in many scenarios, such as E-Commerce and online user community for special interests. In these communities, users can post articles, give comment, raise a question and answer it. These data form the heterogeneous information sources where each information source have their own special structure and context (comments attached to an article or related question with answers). Most of the CQA methods only incorporate articles or Wikipedia to extract knowledge and answer the user's question. However, various types of information sources in the community are not fully explored by these CQA methods and these multiple information sources (MIS) can provide more related knowledge to user's questions. Thus, we propose a question-aware heterogeneous graph transformer to incorporate the MIS in the user community to automatically generate the answer. To evaluate our proposed method, we conduct the experiments on two datasets: $\textMSM ^\textplus $ the modified version of benchmark dataset MS-MARCO and the AntQA dataset which is the first large-scale CQA dataset with four types of MIS. Extensive experiments on two datasets show that our model outperforms all the baselines in terms of all the metrics.

Toward Pareto Efficient Fairness-Utility Trade-off in Recommendation through Reinforcement Learning

The issue of fairness in recommendation is becoming increasingly essential as Recommender Systems (RS) touch and influence more and more people in their daily lives. In fairness-aware recommendation, most of the existing algorithmic approaches mainly aim at solving a constrained optimization problem by imposing a constraint on the level of fairness while optimizing the main recommendation objective, e.g., click through rate (CTR). While this alleviates the impact of unfair recommendations, the expected return of an approach may significantly compromise the recommendation accuracy due to the inherent trade-off between fairness and utility. This motivates us to deal with these conflicting objectives and explore the optimal trade-off between them in recommendation. One conspicuous approach is to seek aPareto efficient/optimal solution to guarantee optimal compromises between utility and fairness. Moreover, considering the needs of real-world e-commerce platforms, it would be more desirable if we can generalize the wholePareto Frontier, so that the decision-makers can specify any preference of one objective over another based on their current business needs. Therefore, in this work, we propose a fairness-aware recommendation framework usingmulti-objective reinforcement learning (MORL), called MoFIR (pronounced "more fair ''), which is able to learn a single parametric representation for optimal recommendation policies over the space of all possible preferences. Specially, we modify traditional Deep Deterministic Policy Gradient (DDPG) by introducingconditioned network (CN) into it, which conditions the networks directly on these preferences and outputs Q-value-vectors. Experiments on several real-world recommendation datasets verify the superiority of our framework on both fairness metrics and recommendation measures when compared with all other baselines. We also extract the approximate Pareto Frontier on real-world datasets generated by MoFIR and compare to state-of-the-art fairness methods.

Differentially Private Ensemble Classifiers for Data Streams

Learning from continuous data streams via classification/regression is prevalent in many domains. Adapting to evolving data characteristics (concept drift) while protecting data owners' private information is an open challenge. We present a differentially private ensemble solution to this problem with two distinguishing features: it allows anunbounded number of ensemble updates to deal with the potentially never-ending data streams under a fixed privacy budget, and it ismodel agnostic, in that it treats any pre-trained differentially private classification/regression model as a black-box. Our method outperforms competitors on real-world and simulated datasets for varying settings of privacy, concept drift, and data distribution.

Multi-Scale Variational Graph AutoEncoder for Link Prediction

Link prediction has become a significant research problem in deep learning, and the graph-based autoencoder model is one of the most important methods to solve it. The existing graph-based autoencoder models only learn a single set of distributions, which cannot accurately represent the mixed distribution in real graph data. Meanwhile, existing learning models have been greatly restricted when the graph data has insufficient attribute information and inaccurate topology information. In this paper, we propose a novel graph embedding framework, termed multi-scale variational graph autoencoder (MSVGAE), which learns multiple sets of low-dimensional vectors of different dimensions through the graph encoder to represent the mixed probability distribution of the original graph data, and performs multiple sampling in each dimension. Furthermore, a self-supervised learning strategy (i.e., graph feature reconstruction auxiliary learning) is introduced to fully use the graph attribute information to help the graph structure learning. Experiment studies on real-world graphs demonstrate that the proposed model achieves state-of-the-art performance compared with other baseline methods in link prediction tasks. Besides, the robustness analysis shows that the proposed MSVGAE method has obvious advantages in the processes of graph data with insufficient attribute information and inaccurate topology information.

Learning Multi-granularity Consecutive User Intent Unit for Session-based Recommendation

Session-based recommendation aims to predict a user's next action based on previous actions in the current session. The major challenge is to capture authentic and complete user preferences in the entire session. Recent work utilizes graph structure to represent the entire session and adopts Graph Neural Network (GNN) to encode session information. This modeling choice has been proved to be effective and achieved remarkable results. However, most of the existing studies only consider each item within the session independently and do not capture session semantics from a high-level perspective. Such limitation often leads to severe information loss and increases the difficulty of capturing long-range dependencies within a session. Intuitively, compared with individual items, a session snippet, i.e., a group of locally consecutive items, is able to provide supplemental user intents which are hardly captured by existing methods. In this work, we propose to learn multi-granularity consecutive user intent unit to improve the recommendation performance. Specifically, we creatively propose Multi-granularity Intent Heterogeneous Session Graph (MIHSG) which captures the interactions between different granularity intent units and relieves the burden of long-dependency. Moreover, we propose the Intent Fusion Ranking (IFR) module to compose the recommendation results from various granularity user intents. Compared with current methods that only leverage intents from individual items, IFR benefits from different granularity user intents to generate more accurate and comprehensive session representation, thus eventually boosting recommendation performance. We conduct extensive experiments on five session-based recommendation datasets and the results demonstrate the effectiveness of our method. Compared to current state-of-the-art methods, we achieve as large as 10.21% gain on HR@20 and 15.53% gain on MRR@20.

Outside In: Market-aware Heterogeneous Graph Neural Network for Employee Turnover Prediction

As an emerging initiative of proactive human resource management, employee turnover prediction is critically important for employers to retain talents and avoid the loss of intellectual capital. While considerable research efforts have been made in this direction, most of them only focus on modeling the within-company career trajectories of employees where the influence of external job market has been largely neglected. To this end, in this paper, we propose an enhanced framework of employee turnover prediction by jointly modeling the turnover clues from both internal and external views. Specifically, from the external-market view, we construct a heterogeneous graph which connects the employees with external job markets through shared skills. In this way, we can capture the potential popularity of employees in external markets specific to skills. Meanwhile, from the internal-company view, we design a graph convolutional network with hierarchical attention mechanism to capture the influence of organizational structure (e.g., superiors, subordinates, and peers) and colleagues with similar skills. Furthermore, both modules are modeled with Bidirectional LSTM and survival analysis to learn effective and dynamic representations of employee turnover prediction. Finally, we conduct extensive experiments on a large-scale real-world talent dataset with state-of-the-art methods, which clearly demonstrate the effectiveness of our approach as well as some interesting findings that could help us understand the employee turnover patterns, such as different impacts of external systems and collaborators from different groups.

Dy-HIEN: Dynamic Evolution based Deep Hierarchical Intention Network for Membership Prediction

\beginabstract Many video websites offer packages composed of paid videos. Users who purchase a package become members of the website, and thus can enjoy the membership service, such as watching the paid videos. It is practically important to predict which users will become members so that the website can recommend them the suitable packages for purchasing. Existing works generally predict the purchase behavior of users through capturing their interests in items. However, such works cannot be directly applied to the studied problem due to the following challenges. First, some important features of videos and packages change over time, such as the number of clicks and the update of the videos. Existing methods are not capable to capture such dynamic features. Second, a user's purchasing intention is very hard to capture. A user watching a video does not necessarily mean that he/she would like to purchase the corresponding package. In this paper, we propose a Dy namic E volution based Deep H ierarchical I ntention N etwork (Dy-HIEN for short) for membership prediction, which contains two modules. In the first module, we design a dynamic embedding learning method, applying multi-relational heterogeneous information network and attention mechanism to effectively represent the embedding of videos and packages. In the second module, a hierarchical method is proposed to extract the purchase intention of users. First, the video play history is divided into sessions based on the clicks on packages, and then time-order encoder and kernel functions are applied to mine the intention pattern associated with the package clicked in each session. Extensive experiments on real-world datasets are conducted to demonstrate the advantages of the proposed model on a variety of evaluation metrics. \endabstract

Keyword Assisted Embedded Topic Model

By illuminating latent structures in a corpus of text, topic models are an essential tool for categorizing, summarizing, and exploring large collections of documents. Probabilistic topic models, such as latent Dirichlet allocation (LDA), describe how words in documents are generated via a set of latent distributions called topics. Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics. As LDA and its extensions are unsupervised models, they aren't defined to make efficient use of a user's prior knowledge of the domain. To this end, we propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors over the vocabulary. Using both quantitative metrics and human responses on a topic intrusion task, we demonstrate that KeyETM produces better topics than other guided, generative models in the literature\footnoteCode for this work can be found at \url .

It Is Different When Items Are Older: Debiasing Recommendations When Selection Bias and User Preferences Are Dynamic

User interactions with recommender systems (RSs) are affected by user selection bias, e.g., users are more likely to rate popular items (popularity bias) or items that they expect to enjoy beforehand (positivity bias). Methods exist for mitigating the effects of selection bias in user ratings on the evaluation and optimization of RSs. However, these methods treat selection bias as static, despite the fact that the popularity of an item may change drastically over time and the fact that user preferences may also change over time.

We focus on the age of an item and its effect on selection bias and user preferences. Our experimental analysis reveals that the rating behavior of users on the MovieLens dataset is better captured by methods that consider effects from the age of item on bias and preferences. We theoretically show that in a dynamic scenario in which both the selection bias and user preferences are dynamic, existing debiasing methods are no longer unbiased. To address this limitation, we introduce DebiAsing in the dyNamiC scEnaRio (DANCER), a novel debiasing method that extends the inverse propensity scoring debiasing method to account for dynamic selection bias and user preferences. Our experimental results indicate that DANCER improves rating prediction performance compared to debiasing methods that incorrectly assume that selection bias is static in a dynamic scenario. To the best of our knowledge, DANCER is the first debiasing method that accounts for dynamic selection bias and user preferences in RSs.

POLE: Polarized Embedding for Signed Networks

From the 2016 U.S. presidential election to the 2021 Capitol riots to the spread of misinformation related to COVID-19, many have blamed social media for today's deeply divided society. Recent advances in machine learning for signed networks hold the promise to guide small interventions with the goal of reducing polarization in social media. However, existing models are especially ineffective in predicting conflicts (or negative links) among users. This is due to a strong correlation between link signs and the network structure, where negative links between polarized communities are too sparse to be predicted even by state-of-the-art approaches. To address this problem, we first design a partition-agnostic polarization measure for signed graphs based on the signed random-walk and show that many real-world graphs are highly polarized. Then, we propose POLE (POLarized Embedding for signed networks), a signed embedding method for polarized graphs that captures both topological and signed similarities jointly via signed autocovariance. Through extensive experiments, we show that POLE significantly outperforms state-of-the-art methods in signed link prediction, particularly for negative links with gains of up to one order of magnitude.

Triangle Graph Interest Network for Click-through Rate Prediction

Click-through rate prediction is a critical task in online advertising. Currently, many existing methods attempt to extract user potential interests from historical click behavior sequences. However, it is difficult to handle sparse user behaviors or broaden interest exploration. Recently, some researchers incorporate the item-item co-occurrence graph as an auxiliary. Due to the elusiveness of user interests, those works still fail to determine the real motivation of user click behaviors. Besides, those works are more biased towards popular or similar commodities. They lack an effective mechanism to break the diversity restrictions. In this paper, we point out two special properties of triangles in the item-item graphs for recommendation systems: Intra-triangle homophily and Inter-triangle heterophiy. Based on this, we propose a novel and effective framework named Triangle Graph Interest Network (TGIN). For each clicked item in user behavior sequences, we introduce the triangles in its neighborhood of the item-item graphs as a supplement. TGIN regards these triangles as the basic units of user interests, which provide the clues to capture the real motivation for a user clicking an item. We characterize every click behavior by aggregating the information of several interest units to alleviate the elusive motivation problem. The attention mechanism determines users' preference for different interest units. By selecting diverse and relative triangles, \short brings in novel and serendipitous items to expand exploration opportunities of user interests. Then, we aggregate the multi-level interests of historical behavior sequences to improve CTR prediction. Extensive experiments on both of public and industrial datasets clearly verify the effectiveness of our framework.

On Generalizing Static Node Embedding to Dynamic Settings

Temporal graph embedding has been widely studied thanks to its superiority in tasks such as prediction and recommendation. Despite the advances in algorithms and novel frameworks such as deep learning, there has been relatively little work on systematically studying the properties of temporal network models and their cornerstones, the graph time-series representations that are used in these approaches. This paper aims to fill this gap by introducing a general framework that extends an arbitrary existing static embedding approach to handle dynamic tasks, and conducting a systematic study of seven base static embedding methods and six temporal network models. Our framework generalizes static node embeddings derived from the time-series representation of stream data to the dynamic setting by modeling the temporal dependencies with classic models such as the reachability graph. While previous works on dynamic modeling and embedding have focused on representing a stream of timestamped edges using a time-series of graphs based on a specific time-scale (\eg, 1 month), we introduce the notion of an ε-graph time-series that uses a fixed number of edges for each graph, and show its superiority in practical settings over the standard solution. From the 42 methods that our framework subsumes, we find that leveraging the new ε-graph time-series representation and capturing temporal dependencies with the proposed reachability or summary graph tend to perform well. Furthermore, the new dynamic embedding methods based on our framework perform comparably and on average better than the state-of-the-art embedding methods designed specifically for temporal graphs in link prediction tasks.

KGNN: Harnessing Kernel-based Networks for Semi-supervised Graph Classification

This paper studies semi-supervised graph classification, which is an important problem with various applications in social network analysis and bioinformatics. This problem is typically solved by using graph neural networks (GNNs), which yet rely on a large number of labeled graphs for training and are unable to leverage unlabeled graphs. We address the limitations by proposing the Kernel-based Graph Neural Network (KGNN). A KGNN consists of a GNN-based network as well as a kernel-based network parameterized by a memory network. The GNN-based network performs classification through learning graph representations to implicitly capture the similarity between query graphs and labeled graphs, while the kernel-based network uses graph kernels to explicitly compare each query graph with all the labeled graphs stored in a memory for prediction. The two networks are motivated from complementary perspectives, and thus combing them allows KGNN to use labeled graphs more effectively. We jointly train the two networks by maximizing their agreement on unlabeled graphs via posterior regularization, so that the unlabeled graphs serve as a bridge to let both networks mutually enhance each other. Experiments on a range of well-known benchmark datasets demonstrate that KGNN achieves impressive performance over competitive baselines.

Leveraging World Events to Predict E-Commerce Consumer Demand under Anomaly

Consumer demand forecasting is of high importance for many e-commerce applications, including supply chain optimization, advertisement placement, and delivery speed optimization. However, reliable time series sales forecasting for e-commerce is difficult, especially during periods with many anomalies, as can often happen during pandemics, abnormal weather, or sports events. Although many time series algorithms have been applied to the task, prediction during anomalies still remains a challenge. In this work, we hypothesize that leveraging external knowledge found in world events can help overcome the challenge of prediction under anomalies. We mine a large repository of 40 years of world events and their textual representations. Further, we present a novel methodology based on transformers to construct an embedding of a day based on the relations of the day's events. Those embeddings are then used to forecast future consumer behavior. We empirically evaluate the methods over a large e-commerce products sales dataset, extracted from eBay, one of the world's largest online marketplaces. We show over numerous categories that our method outperforms state-of-the-art baselines during anomalies.

GAGE: Geometry Preserving Attributed Graph Embeddings

Node embedding is the task of extracting concise and informative representations of certain entities that are connected in a network. Various real-world networks include information about both node connectivity and certain node attributes, in the form of features or time-series data. Modern representation learning techniques employ both the connectivity and attribute information of the nodes to produce embeddings in an unsupervised manner. In this context, deriving embeddings that preserve the geometry of the network and the attribute vectors would be highly desirable, as they would reflect both the topological neighborhood structure and proximity in feature space. While this is fairly straightforward to maintain when only observing the connectivity or attribute information of the network, preserving the geometry of both types of information is challenging. A novel tensor factorization approach for node embedding in attributed networks is proposed in this paper, that preserves the distances of both the connections and the attributes. Furthermore, an effective and lightweight algorithm is developed to tackle the learning task and judicious experiments with multiple state-of-the-art baselines suggest that the proposed algorithm offers significant performance improvements in downstream tasks.

Query Interpretations from Entity-Linked Segmentations

Web search queries can be ambiguous: is "source of the nile'' meant to find information on the actual river or on a board game of that name? We tackle this problem by deriving entity-based query interpretations: given some query, the task is to derive all reasonable ways of linking suitable parts of the query to semantically compatible entities in a background knowledge base. Our suggested approach focuses on effectiveness but also on efficiency since web search response times should not exceed some hundreds of milliseconds. In our approach, we use query segmentation as a pre-processing step that finds promising segment-based "interpretation skeletons''. The individual segments from these skeletons are then linked to entities from a knowledge base and the reasonable combinations are ranked in a final step. An experimental comparison on a combined corpus of all existing query entity linking datasets shows our approach to have a better interpretation accuracy at a better run time than the previously most effective methods.

An Unsupervised Detection Framework for Chinese Jargons in the Darknet

With the continuous development of the darknet technology, the scale of darknet and have increased rapidly in recent years, leading to rampant crime in these anonymous trading markets. Monitoring these markets can effectively combat the criminal forces that hide behind them. One of the difficulties in understanding the darknet is that criminals usually use jargons to disguise transactions and thus avoid surveillance. These jargons usually distort the original meaning of innocent-looking words in obscure ways, posing significant challenges for crime tracking. Current research on Chinese jargon detection mainly adopts the method of keyword filtering, however, such methods have little effect on the complex and ever-changing structure of darknet jargons. We propose a Chinese jargon detection framework based on unsupervised learning. The main idea is to compare similarity with high-dimensional word embedding features from different corpus to find jargons. Firstly, we collect data from six Chinese Tor websites to build a dark corpus dataset. Afterwards, we build a word-based pre-training model called DC-BERT, which can generate high-quality contextual word embeddings. Finally, we construct a cross-corpus jargon detection framework based on similarity analysis, which can effectively detect Chinese jargons in the darknet. The experimental results show that the proposed framework is both innovative and practical, reaching a detection accuracy of 91.5%.

ConsistSum: Unsupervised Opinion Summarization with the Consistency of Aspect, Sentiment and Semantic

Unsupervised opinion summarization techniques are designed to condense the review data and summarize informative and salient opinions in the absence of golden references. Existing dominant methods generally follow a two-stage framework: first creating the synthetic "review-summary" paired datasets and then feeding them into the generative summary model for supervised training. However, these methods mainly focus on semantic similarity in synthetic dataset creation, ignoring the consistency of aspects and sentiments in synthetic pairs. Such inconsistency also brings a gap to the training and inference of the summarization model.

To alleviate this problem, we propose ConsistSum, an unsupervised opinion summarization method devoting to capture the consistency of aspects and sentiment between reviews and summaries. Specifically, ConsistSum first extracts the preliminary "review-summary" pairs from the raw corpus by evaluating the distance of aspect distribution and sentiment distribution. Then, we refine the preliminary summary with the constrained Metropolis-Hastings sampling to produce highly consistent synthetic datasets. In the summarization phase, we adopt the generative model T5 as the summarization model. T5 is fine-tuned for the opinion summarization task by incorporating the loss of predicting aspect and opinion distribution. Experimental results on two benchmark datasets, $i.e.$, Yelp and Amazon, demonstrate the superior performance of ConsistSum over the state-of-the-art baselines.

Cluster-Aware Heterogeneous Information Network Embedding

Heterogeneous Information Network (HIN) embedding refers to the low-dimensional projections of the HIN nodes that preserve the HIN structure and semantics. HIN embedding has emerged as a promising research field for network analysis as it enables downstream tasks such as clustering and node classification. In this work, we propose VaCA-HINE for joint learning of cluster embeddings as well as cluster-aware HIN embedding. We assume that the connected nodes are highly likely to fall in the same cluster, and adopt a variational approach to preserve the information in the pairwise relations in a cluster-aware manner. In addition, we deploy contrastive modules to simultaneously utilize the information in multiple meta-paths, thereby alleviating the meta-path selection problem - a challenge faced by many of the famous HIN embedding approaches. The HIN embedding, thus learned, not only improves the clustering performance but also preserves pairwise proximity as well as the high-order HIN structure. We show the effectiveness of our approach by comparing it with many competitive baselines on three real-world datasets on clustering and downstream node classification.

Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is critical. Off-policy evaluation (OPE) for ranking policies is thus gaining a growing interest because it enables performance estimation of new ranking policies using only logged data. Although OPE in contextual bandits has been studied extensively, its naive application to the ranking setting faces a critical variance issue due to the huge item space. To tackle this problem, previous studies introduce some assumptions on user behavior to make the combinatorial item space tractable. However, an unrealistic assumption may, in turn, cause serious bias. Therefore, appropriately controlling the bias-variance tradeoff by imposing a reasonable assumption is the key for success in OPE of ranking policies. To achieve a well-balanced bias-variance tradeoff, we propose the Cascade Doubly Robust estimator building on the cascade assumption, which assumes that a user interacts with items sequentially from the top position in a ranking. We show that the proposed estimator is unbiased in more cases compared to existing estimators that make stronger assumptions on user behavior. Furthermore, compared to a previous estimator based on the same cascade assumption, the proposed estimator reduces the variance by leveraging a control variate. Comprehensive experiments on both synthetic and real-world e-commerce data demonstrate that our estimator leads to more accurate OPE than existing estimators in a variety of settings.

Introducing the Expohedron for Efficient Pareto-optimal Fairness-Utility Amortizations in Repeated Rankings

We consider the problem of computing a sequence of rankings that maximizes consumer-side utility while minimizing producer-side individual unfairness of exposure. While prior work has addressed this problem using linear or quadratic programs on bistochastic matrices, such approaches, relying on Birkhoff-von Neumann (BvN) decompositions, are too slow to be implemented at large scale. In this paper we introduce a geometrical object, a polytope that we call expohedron, whose points represent all achievable exposures of items for a Position Based Model (PBM). We exhibit some of its properties and lay out a Carathéodory decomposition algorithm with complexity $O(n^2łog(n))$ able to express any point inside the expohedron as a convex sum of at most n vertices, where n is the number of items to rank. Such a decomposition makes it possible to express any feasible target exposure as a distribution over at most n rankings. Furthermore we show that we can use this polytope to recover the whole Pareto frontier of the multi-objective fairness-utility optimization problem, using a simple geometrical procedure with complexity $O(n^2łog(n))$. Our approach compares favorably to linear or quadratic programming baselines in terms of algorithmic complexity and empirical runtime and is applicable to any merit that is a non-decreasing function of item relevance. Furthermore our solution can be expressed as a distribution over only $\ndoc$ permutations, instead of the $(n-1)^2 + 1$ achieved with BvN decompositions. We perform experiments on synthetic and real-world datasets, confirming our theoretical results.

Multi-Resolution Attention for Personalized Item Search

Personalized item search has become an essential tool for online platforms---where users interact with a large corpus of items (e.g., click, purchase, like) via a search query---to provide their users with a more satisfactory search experience. The record (or history) of users' past interactions serves as a valuable asset to achieve personalization. While user history data can span over a long period of time, only a part of the history is relevant to a user's current search intent. Moreover, since historical interactions take place at aperiodic points in time, modeling their relevance to the current search query entangles complex temporal dependencies. We propose multi-resolution attention to address these challenges for personalized item search. Our approach captures higher-order temporal relations between user queries and their history across several temporal subspaces (i.e., resolutions), each corresponding to distinct temporal ranges with adaptive time boundaries that are also learned directly from data. We achieve this by coupling the conventional multi-head attention module with a differentiable soft-thresholding mechanism, which essentially operates as a masking function in the temporal domain. Comparisons with strong baselines on an open-source benchmark dataset confirm the efficacy of our approach.

Linear, or Non-Linear, That is the Question!

There were fierce debates on whether the non-linear embedding propagation of GCNs is appropriate to GCN-based recommender systems. It was recently found that the linear embedding propagation shows better accuracy than the non-linear embedding propagation. Since this phenomenon was discovered especially in recommender systems, it is required that we carefully analyze the linearity and non-linearity issue. In this work, therefore, we revisit the issues of i) which of the linear or non-linear propagation is better and ii) which factors of users/items decide the linearity/non-linearity of the embedding propagation. We propose a novel Hybrid method of linear and non-linear collaborative filtering method (HMLET, pronounced as Hamlet). In our design, there exist both linear and non-linear propagation steps, when processing each user or item node, and our gating module chooses one of them, which results in a hybrid model of the linear and non-linear GCN-based collaborative filtering (CF). The proposed model yields the best accuracy in three public benchmark datasets. Moreover, we classify users/items into the following three classes depending on our gating modules' selections: Full-Non-Linearity (FNL), Partial-Non-Linearity (PNL), and Full-Linearity (FL). We found that there exist strong correlations between nodes' centrality and their class membership, i.e., important user/item nodes exhibit more preferences towards the non-linearity during the propagation steps. To our knowledge, we are the first who design a hybrid method and report the correlation between the graph centrality and the linearity/non-linearity of nodes. All HMLET codes and datasets are available at:

Efficient Two-stage Label Noise Reduction for Retrieval-based Tasks

The existence of noisy labels in datasets has always been an essential dilemma in deep learning studies. Previous works detected noisy labels by analyzing the predicted probability distribution generated by the model trained on the same data and calculating the probabilities of each label to be regarded as noise. However, the predicted probability distribution from the whole dataset may introduce overfitting, and the overfitting on noisy labels may induce the probability distribution of clean and noisy items to be not conditional independent, making identification more challenging. Additionally, label noise reduction on image datasets has received much attention, while label noise reduction on text datasets has not. This paper proposes a noisy label reduction method for text datasets, which could be applied at retrieval-based tasks by getting a conditional independent probability distribution to identify noisy labels accurately. The method first generates a candidate set containing noisy labels, predicts the category probabilities by the model trained on the rest cleaner data, and then identifies noisy items by analyzing a confidence matrix. Moreover, we introduce a warm-up module and a sharpened cross-entropy loss function for efficiently training in the first stage. Empirical results on different rates of uniform and random label noise in five text datasets demonstrate that our method can improve the label noise reduction accuracy and end-to-end classification accuracy. Further, we find that the iteration of the label noise reduction method is efficient to high-rate label noise datasets, and our method will not hurt clean datasets too much.

Differential Query Semantic Analysis: Discovery of Explicit Interpretable Knowledge from E-Com Search Logs

We present a novel strategy for analyzing E-Com search logs called Differential Query Semantic Analysis (DQSA) to discover explicit interpretable knowledge from search logs in the form of a semantic lexicon that makes context-specific mapping from a query segment (word or phrase) to the preferred attribute values of a product. Evaluation on a set of size-related query segments and attribute values shows that DQSA can effectively discover meaningful mappings of size-related query segments to their preferred specific attributes and attributes values in the context of a product type. DQSA has many uses including improvement of E-Com search accuracy by bridging the vocabulary gap, comparative analysis of search intent, and alleviation of the problem of tail queries and products.

Collaborative Curating for Discovery and Expansion of Visual Clusters

In many visually-oriented applications, users can select and group images that they find interesting into coherent clusters. For instance, we encounter these in the form of hashtags on Instagram, galleries on Flickr, or boards on Pinterest. The selection and coherence of such user-curated visual clusters arise from a user's preference for a certain type of content as well as her own perception of which images are similar and thus belong to a cluster. We seek to model such curation behaviors towards supporting users in their future activities such as expanding existing clusters or discovering new clusters altogether. This paper proposes a framework, namely COLLABORATIVE CURATING that jointly models the interrelated modalities of preference expression and similarity perception. Extensive experiments on real-world datasets from a visual curating platform show that the proposed framework significantly outperforms baselines focusing on either clustering behaviors or preferences alone.

A Cooperative Neural Information Retrieval Pipeline with Knowledge Enhanced Automatic Query Reformulation

This paper presents a neural information retrieval pipeline that integrates cooperative learning of query reformulation and neural retrieval models. Our pipeline first exploits an automatic query reformulator to reformulate the user-issued query and then submits the reformulated query to the neural retrieval model. We simultaneously optimize the quality of reformulated queries and ranking performance with an alternate training strategy where query reformulator and neural retrieval model learn from the feedback of each other. Besides, we incorporate knowledge information into automatic query reformulation. The reformulated queries are further improved and contribute to a better ranking performance of the following neural retrieval model. We study two representative neural retrieval models KNRM and BERT in our pipeline. Experiments on two datasets show that our pipeline consistently improves the retrieval performance of the original neural retrieval models while only increases negligible time on automatic query reformulation.

Unsupervised Cross-Domain Adaptation for Response Selection Using Self-Supervised and Adversarial Training

Recently, many neural context-response matching models have been developed for retrieval-based dialogue systems. Although existing models achieve impressive performance through learning on a large amount of in-domain parallel dialogue data, they usually perform worse in another new domain. How to transfer a response retrieval model trained in high-resource domains to other low-resource domains is a crucial problem for scalable dialogue systems. To this end, we investigate the unsupervised cross-domain adaptation for response selection when the target domain has no parallel dialogue data. Specifically, we propose a two-stage method to adapt a response selection model to a new domain using self-supervised and adversarial training based on pre-trained language models (PLMs). To efficiently incorporate domain awareness and target-domain knowledge to PLMs, we first design a self-supervised post-training procedure, including domain discrimination (DD) task, target-domain masked language model (MLM) task and target-domain next sentence prediction (NSP) task. Based on this, we further conduct the adversarial fine-tuning to empower the model to match the proper response with extracted domain-shared features as much as possible. Experimental results show that our proposed method achieves consistent and significant improvements on several cross-domain response selection datasets.

RecGURU: Adversarial Learning of Generalized User Representations for Cross-Domain Recommendation

Cross-domain recommendation can help alleviate the data sparsity issue in traditional sequential recommender systems. In this paper, we propose the RecGURU algorithm framework to generate a Generalized User Representation (GUR) incorporating user information across domains in sequential recommendation, even when there is minimum or no common users in the two domains. We propose a self-attentive autoencoder to derive latent user representations, and a domain discriminator, which aims to predict the origin domain of a generated latent representation. We propose a novel adversarial learning method to train the two modules to unify user embeddings generated from different domains into a single global GUR for each user. The learned GUR captures the overall preferences and characteristics of a user and thus can be used to augment the behavior data and improve recommendations in any single domain in which the user is involved. Extensive experiments have been conducted on two public cross-domain recommendation datasets as well as a large dataset collected from real-world applications. The results demonstrate that RecGURU boosts performance and outperforms various state-of-the-art sequential recommendation and cross-domain recommendation methods. The collected data will be released to facilitate future research.

Graph Embedding with Hierarchical Attentive Membership

This paper studies a remarkable property of graphs which is the latent hierarchical grouping of nodes, where each node manifests its membership to a specific group based on the context composed by its neighboring nodes. When modeling the neighborhood structure for graph representation learning, most prior works ignore such latent groups and nodes' membership to different groups, not to mention the hierarchy. Thus, they fall short of delivering a comprehensive understanding of the nodes under different contexts in a graph. In this paper, we propose a novel hierarchical attentive membership model for graph embedding, where the latent memberships for each node are dynamically discovered based on its neighboring context. Both group-level and individual-level attentions are performed when aggregating neighboring states to generate node embeddings. We introduce structural constraints to explicitly regularize the inferred memberships of each node, such that a well-defined hierarchical grouping structure is captured. The proposed model outperformed a set of state-of-the-art graph embedding solutions on node classification and link prediction tasks in a variety of graphs including citation networks and social networks. Qualitative evaluations visualize the learned node embeddings along with the inferred memberships, which proved the concept of membership hierarchy and enables explainable embedding learning in graphs.

Surrogate Representation Learning with Isometric Mapping for Gray-box Graph Adversarial Attacks

Gray-box graph attacks aim to disrupt the victim model's performance by using inconspicuous attacks with limited knowledge of the victim model. The details of the victim model and the labels of the test nodes are invisible to the attacker. The attacker constructs an imaginary surrogate model trained under supervision to obtain the gradient on the node attributes or graph structure. However, there is a lack of discussion on the training of surrogate models and the reliability of provided gradient information. The general node classification models lose the topology of the nodes on the graph, which is, in fact, an exploitable prior for the attacker. This paper investigates the effect of surrogate representation learning on the transferability of gray-box graph adversarial attacks. We propose Surrogate Representation Learning with Isometric Mapping (SRLIM) to reserve the topology in the surrogate embedding. By isometric mapping, our proposed SRLIM can constrain the topological structure of nodes from the input layer to the embedding space, that is, to maintain the similarity of nodes in the propagation process. Experiments prove the effectiveness of our approach through the improvement in the performance of the adversarial attacks generated by the gradient-based attacker in untargeted poisoning gray-box scenarios.

Improving Knowledge Tracing with Collaborative Information

Knowledge tracing, which estimates students' knowledge states by predicting the probability that they correctly answer questions, is an essential task for online learning platforms. It has gained much attention in the decades due to its importance to downstream tasks like learning material arrangement, etc. The previous deep learning-based methods trace students' knowledge states with the explicitly intra-student information, i.e., they only consider the historical information of individuals to make predictions. However, they neglect the inter-student information, which contains the response correctness of other students who have similar question-answering experiences, may offer some valuable clues. Based on this consideration, we propose a method called Collaborative Knowledge Tracing (CoKT) in this paper, which sufficiently exploits the inter-student information in knowledge tracing. It retrieves the sequences of peer students who have similar question-answering experiences to obtain the inter-student information, and integrates the inter-student information with the intra-student information to trace students' knowledge states and predict their correctness in answering questions. We validate the effectiveness of our method on four real-world datasets and compare it with 11 baselines. The experimental results reveal that CoKT achieves the best performance.

An Ensemble Model for Combating Label Noise

The labels crawled from web services (e.g. querying images from search engines and collecting tags from social media images) are often prone to noise, and the presence of such label noise degrades the classification performance of the resulting deep neural network (DNN) models. In this paper, we propose an ensemble model consisting of two networks to prevent the model from memorizing noisy labels. Within our model, we have one network generate an anchoring label from its prediction on a weakly-augmented image. Meanwhile, we force its peer network, taking the strongly-augmented version of the same image as input, to generate prediction close to the anchoring label for knowledge distillation. By observing the loss distribution, we use a mixture model to dynamically estimate the clean probability of each training sample and generate a confidence clean set. Then we train both networks simultaneously by the clean set to minimize our loss function which contains unsupervised matching loss (i.e., measure the consistency of the two networks) and supervised classification loss (i.e. measure the classification performance). We theoretically analyze the gradient of our loss function to show that it implicitly prevents memorization of the wrong labels. Experiments on two simulated benchmarks and one real-world dataset demonstrate that our approach achieves substantial improvements over the state-of-the-art methods.

Non-stationary Continuum-armed Bandits for Online Hyperparameter Optimization

For years, machine learning has become the dominant approach to a variety of information retrieval tasks. The performance of machine learning algorithms heavily depends on their hyperparameters. It is hence critical to identity the optimal hyperparameter configuration when applying machine learning algorithms. Most of existing hyperparameter optimization methods assume a static relationship between hyperparameter configuration and algorithmic performance and are thus not suitable for many information retrieval applications with non-stationary environments such as e-commerce recommendation and online advertising. To address this limitation, we study online hyperparameter optimization, where the hyperparameter configuration is optimized on the fly. We formulate online hyperparameter optimization as a non-stationary continuum-armed bandits problem in which each arm corresponds to a hyperparameter configuration and the algorithmic performance is viewed as reward. For this problem, we develop principled methods with strong theoretical guarantees in terms of dynamic regret. The key idea is to adaptively discretize the continuous arm set and estimate the mean reward of each arm via weighted averaging. As a case application, we show how our methods can be applied to optimize the hyperparameter of vector-based candidate generation algorithm and empirically demonstrate the effectiveness and efficiency of our methods on public advertising dataset and online A/B testing. Furthermore, to the best of our knowledge, our methods are the first to achieve sub-linear dynamic regret bounds for continuum-armed bandits, which may be of independent interest.

The Multi-vehicle Ride-Sharing Problem

Ride-sharing is one of the most popular models of economical and eco-friendly transportation in modern smart cities, especially when riding hybrid and electric vehicles. Usually multiple passengers with similar itineraries are grouped together, which significantly reduces travel cost (or time), road congestion, and traffic emissions. In this paper, we study the ride-sharing problem where each vehicle is shared by exactly $łambda$ riders for any fixed $łambda>0$, and the goal is to minimize the total travel distance. The min-cost ride-sharing problem is intractable even in the case of exactly two riders sharing a vehicle \citeBeiZ18-carsharing, and hence we can only hope for an approximate solution. We propose a novel two-phase algorithm: a hierarchical grouping phase that partitions requests into disjoint groups of fixed size, followed by an assignment of request groups to individual vehicles and planning a feasible route for each vehicle. This is the first non-trivial approximation algorithm for the ride-sharing problem with vehicle capacity larger than two. We verify the efficacy of our algorithm on both synthetic and realworld datasets. Our experimental results show that, the ride-sharing scheme produced by our algorithm not only has small total travel distance compared to state-of-the-art baselines, but also enjoys a small makespan and total latency, which crucially relate to each single rider's traveling time. This suggests that our algorithm also enhances rider experience while being energy-efficient.

Ada-GNN: Adapting to Local Patterns for Improving Graph Neural Networks

Graph Neural Networks (GNNs) have demonstrated strong power in mining various graph-structure data. Since real-world graphs are usually on a large scale, training scalable GNNs has become one of the research trends in recent years. Existing methods only produce one single model to serve all nodes. However, different nodes may exhibit various properties thus require diverse models, especially when the graph is large. Forcing all nodes to share a unified model will decrease the model's expressiveness. What is worse, some small groups' patterns are prone to be ignored by the model due to their minority, making these nodes unpredictable and even some raising potential unfairness problems. In this paper, we propose a model-agnostic framework Ada-GNN that provides personalized GNN models for specific sets of nodes. Intuitively, it is desirable that every node has its own model. But considering the efficiency and scalability of the framework, we generate specific GNN models at the subgroup-level rather than individual node-level. To be specific, Ada-GNN first splits the original graph into several non-overlapped subgroups and tags each node with its subgroup label. After that, a meta adapter is proposed to adapt a base GNN model to each subgroup rapidly. To better facilitate the global-to-local knowledge adaption, we design a feature enhancement module that captures the distinctions among different subgroups to improve the Ada-GNN's performance. Ada-GNN is model-agnostic and can be equipped to almost all existing scalable GNN based methods such as GraphSAGE, ClusterGCN, SIGN, and SAGN. We conduct extensive experiments with six popular scalable GNN as base methods on two large-scale datasets, and the results consistently demonstrate the generality and superiority of Ada-GNN.

RLMob: Deep Reinforcement Learning for Successive Mobility Prediction

Human mobility prediction is an important task in the field of spatiotemporal sequential data mining and urban computing. Despite the extensive work on mining human mobility behavior, little attention was paid to the problem of successive mobility prediction. The state-of-the-art methods of human mobility prediction are mainly based on supervised learning. To achieve higher predictability and adapt well to the successive mobility prediction, there are four key challenges: 1) disability to the circumstance that the optimizing target is discrete-continuous hybrid and non-differentiable. In our work, we assume that the user's demands are always multi-targeted and can be modeled as a discrete-continuous hybrid function; 2) difficulty to alter the recommendation strategy flexibly according to the changes in user needs in real scenarios; 3) error propagation and exposure bias issues when predicting multiple points in successive mobility prediction; 4) cannot interactively explore user's potential interest that does not appear in the history. While previous methods met these difficulties, reinforcement learning (RL) is an intuitive answer for this task to settle these issues. We innovatively introduce RL to the successive prediction task. In this paper, we formulate this problem as a Markov Decision Process. We further propose a framework - RLMob to solve our problem. A simulated environment is carefully designed. An actor-critic framework with an instance of Proximal Policy Optimization (PPO) is applied to adapt to our scene with a large state space. Experiments show that on the task, the performance of our approach is consistently superior to that of the compared approaches.

ComGA: Community-Aware Attributed Graph Anomaly Detection

Graph anomaly detection, here, aims to find rare patterns that are significantly different from other nodes. Attributed graphs containing complex structure and attribute information are ubiquitous in our life scenarios such as bank account transaction graph and paper citation graph. Anomalous nodes on attributed graphs show great difference from others in the perspectives of structure and attributes, and give rise to various types of graph anomalies. In this paper, we investigate three types of graph anomalies: local, global, and structure anomalies. And, graph neural networks (GNNs) based anomaly detection methods attract considerable research interests due to the power of modeling attributed graphs. However, the convolution operation of GNNs aggregates neighbors information to represent nodes, which makes node representations more similar and cannot effectively distinguish between normal and anomalous nodes, thus result in sub-optimal results. To improve the performance of anomaly detection, we propose a novel community-aware attributed graph anomaly detection framework (ComGA). We design a tailored deep graph convolutional network (tGCN) to anomaly detection on attributed graphs. Extensive experiments on eight real-life graph datasets demonstrate the effectiveness of ComGA.

VAE++: Variational AutoEncoder for Heterogeneous One-Class Collaborative Filtering

Neural network-based models for collaborative filtering have received widespread attention, among which variational autoencoder (VAE) has shown unique advantages in the task of item recommendation. However, most existing VAE-based models only focus on one type of user feedback, leading to their performance bottlenecks. To overcome this limitation, we propose a novel VAE-based recommendation model called VAE++, which can effectively utilize heterogeneous feedback to boost recommendation performance. Specifically, it combines three different types of signals, i.e., purchase feedback, examination feedback and their mixed feedback, via two well-designed modules, i.e., a target representation enhancement module and a target representation refinement module. The former exploits the mixed feedback to improve the learning of purchase representations, while the latter leverages the examination feedback to further refine them. In particular, purchase and examination preferences are jointly decoded in one decoder to ensure the high quality of the reconstructed samples. Extensive experiments on three public datasets show that our VAE++ achieves the best results compared with several state-of-the-art methods.

Adversarial Attack on Graph Neural Networks as An Influence Maximization Problem

Graph neural networks (GNNs) have attracted increasing interests. With broad deployments of GNNs in real-world applications, there is an urgent need for understanding the robustness of GNNs under adversarial attacks, especially in realistic setups. In this work, we study the problem of attacking GNNs in a restricted and realistic setup, by perturbing the features of a small set of nodes, with no access to model parameters and model predictions. Our formal analysis draws a connection between this type of attacks and an influence maximization problem on the graph. This connection not only enhances our understanding on the problem of adversarial attack on GNNs, but also allows us to propose a group of effective and practical attack strategies. Our experiments verify that the proposed attack strategies significantly degrade the performance of three popular GNN models and outperform baseline adversarial attack strategies.

Diversified Subgraph Query Generation with Group Fairness

This paper investigates the problem of subgraph query generation with output that satisfies both diversity and fairness constraints. Given a set of groups with associated cardinality requirements, it is to compute subgraph queries with diversified output that meanwhile covers the groups with the desired cardinality. Such need is evident in web and social search with fairness constraints. We formalize subgraph query generation as a bi-criteria optimization problem on the diversity and fairness properties of queries, and verify its hardness and approximability. We show that the problem is in Σp2 , and remains NP-complete even for single-node queries. Despite the hardness, (1) we show that approximations exist whenever a corresponding subset selection process provides good solutions, and provide feasible algorithms with performance guarantees for two practical query generation scenarios. We also present a fast heuristic algorithm for the general problem, which early terminates without enumerating queries. We experimentally verify that our algorithms can efficiently generate queries with desired diversity and coverage properties for targeted groups.

Learning Fair Node Representations with Graph Counterfactual Fairness

Fair machine learning aims to mitigate the biases of model predictions against certain subpopulations regarding sensitive attributes such as race and gender. Among the many existing fairness notions, counterfactual fairness measures the model fairness from a causal perspective by comparing the predictions of each individual from the original data and the counterfactuals. In counterfactuals, the sensitive attribute values of this individual had been modified. Recently, a few works extend counterfactual fairness to graph data, but most of them neglect the following facts that can lead to biases: 1) the sensitive attributes of each node's neighbors may causally affect the prediction w.r.t. this node; 2) the sensitive attributes may causally affect other features and the graph structure. To tackle these issues, in this paper, we propose a novel fairness notion - graph counterfactual fairness, which considers the biases led by the above facts. To learn node representations towards graph counterfactual fairness, we propose a novel framework based on counterfactual data augmentation. In this framework, we generate counterfactuals corresponding to perturbations on each node's and their neighbors' sensitive attributes. Then we enforce fairness by minimizing the discrepancy between the representations learned from the original graph and the counterfactuals for each node. Experiments on both synthetic and real-world graphs show that our framework outperforms the state-of-the-art baselines in graph counterfactual fairness, and also achieves comparable prediction performance.

Deep Graph-level Anomaly Detection by Glocal Knowledge Distillation

Graph-level anomaly detection (GAD) describes the problem of detecting graphs that are abnormal in their structure and/or the features of their nodes, as compared to other graphs. One of the challenges in GAD is to devise graph representations that enable the detection of both locally- and globally-anomalous graphs, i.e., graphs that are abnormal in their fine-grained (node-level) or holistic (graph-level) properties, respectively. To tackle this challenge we introduce a novel deep anomaly detection approach for GAD that learns rich global and local normal pattern information by joint random distillation of graph and node representations. The random distillation is achieved by training one GNN to predict another GNN with randomly initialized network weights. Extensive experiments on 16 real-world graph datasets from diverse domains show that our model significantly outperforms seven state-of-the-art models. Code and datasets are available at

Fast Learning of MNL Model from General Partial Rankings with Application to Network Formation Modeling

Multinomial Logit (MNL) is one of the most popular discrete choice models and has been widely used to model ranking data. However, there is a long-standing technical challenge of learning MNL from many real-world ranking data: exact calculation of the MNL likelihood of partial rankings is generally intractable. In this work, we develop a scalable method for approximating the MNL likelihood of general partial rankings in polynomial time complexity. We also extend the proposed method to learn mixture of MNL. We demonstrate that the proposed methods are particularly helpful for applications to choice-based network formation modeling, where the formation of new edges in a network is viewed as individuals making choices of their friends over a candidate set. The problem of learning mixture of MNL models from partial rankings naturally arises in such applications. And the proposed methods can be used to learn MNL models from network data without the strong assumption that temporal orders of all the edge formation are available. We conduct experiments on both synthetic and real-world network data to demonstrate that the proposed methods achieve more accurate parameter estimation and better fitness of data compared to conventional methods.

Pretraining Multi-modal Representations for Chinese NER Task with Cross-Modality Attention

Named Entity Recognition (NER) aims to identify the pre-defined entities from the unstructured text. Compared with English NER, Chinese NER faces more challenges: the ambiguity problem in entity boundary recognition due to unavailable explicit delimiters between Chinese characters, and the out-of-vocabulary (OOV) problem caused by rare Chinese characters. However, two important features specific to the Chinese language are ignored by previous studies: glyphs and phonetics, which contain rich semantic information of Chinese. To overcome these issues by exploiting the linguistic potential of Chinese as a logographic language, we present MPM-CNER (short for Multi-modal Pretraining Model for Chinese NER), a model for learning multi-modal representations of Chinese semantics, glyphs, and phonetics, via four pretraining tasks: Radical Consistency Identification (RCI), Glyph Image Classification (GIC), Phonetic Consistency Identification (PCI), and Phonetic Classification Modeling (PCM). Meanwhile, a novel cross-modality attention mechanism is proposed to fuse these multimodal features for further improvement. The experimental results show that our method outperforms the state-of-the-art baseline methods on four benchmark datasets, and the ablation study also verifies the effectiveness of the pre-trained multi-modal representations.

Speaker and Time-aware Joint Contextual Learning for Dialogue-act Classification in Counselling Conversations

The onset of the COVID-19 pandemic has brought the mental health of people under risk. Social counselling has gained remarkable significance in this environment. Unlike general goal-oriented dialogues, a conversation between a patient and a therapist is considerably implicit, though the objective of the conversation is quite apparent. In such a case, understanding the intent of the patient is imperative in providing effective counselling in therapy sessions, and the same applies to a dialogue system as well. In this work, we take forward a small but an important step in the development of an automated dialogue system for mental-health counselling. We develop a novel dataset, named HOPE, to provide a platform for the dialogue-act classification in counselling conversations. We identify the requirement of such conversation and propose twelve domain-specific dialogue-act (DAC) labels. We collect ~ 12.9K utterances from publicly-available counselling session videos on YouTube, extract their transcripts, clean, and annotate them with DAC labels. Further, we propose SPARTA, a transformer-based architecture with a novel speaker- and time-aware contextual learning for the dialogue-act classification. Our evaluation shows convincing performance over several baselines, achieving state-of-the-art on HOPE. We also supplement our experiments with extensive empirical and qualitative analyses of SPARTA.

Learning Relevant Questions for Conversational Product Search using Deep Reinforcement Learning

We propose RelQuest, a conversational product search model based on reinforcement learning to generate questions from product descriptions in each round of the conversation, directly maximizing any desired metrics (i.e., the ultimate goal of the conversation), objectives, or even an arbitrary user satisfaction signal. By enabling systems to ask questions about user needs, conversational product search has gained increasing attention in recent years. Asking the right questions through conversations helps the system collect valuable feedback to create better user experiences and ultimately increase sales. In contrast, existing conversational product search methods are based on an assumption that there is a set of effectively pre-defined candidate questions for each product to be asked. Moreover, they make strong assumptions to estimate the value of questions in each round of the conversation. Estimating the true value of questions in each round of the conversation is not trivial since it is unknown. Experiments on real-world user purchasing data show the effectiveness of RelQuest at generating questions that maximize standard evaluation measures such as NDCG.

MTLTS: A Multi-Task Framework To Obtain Trustworthy Summaries From Crisis-Related Microblogs

Occurrences of catastrophes such as natural or man-made disasters trigger the spread of rumours over social media at a rapid pace. Presenting a trustworthy and summarized account of the unfolding event in near real-time to the consumers of such potentially unreliable information thus becomes an important task. In this work, we propose MTLTS, the first end-to-end solution for the task that jointly determines the credibility and summary-worthiness of tweets. Our credibility verifier is designed to recursively learn the structural properties of a Twitter conversation cascade, along with the stances of replies towards the source tweet. We then take a hierarchical multi-task learning approach, where the verifier is trained at a lower layer, and the summarizer is trained at a deeper layer where it utilizes the verifier predictions to determine the salience of a tweet. Different from existing disaster-specific summarizers, we model tweet summarization as a supervised task. Such an approach can automatically learn summary-worthy features, and can therefore generalize well across domains. When trained on the PHEME dataset [29], not only do we outperform the strongest baselines for the auxiliary task of verification/rumour detection, we also achieve 21 - 35% gains in the verified ratio of summary tweets, and 16 - 20% gains in ROUGE1-F1 scores over the existing state-of-the-art solutions for the primary task of trustworthy summarization.

Reconfiguration Problems on Submodular Functions

\emphReconfiguration problems require finding a step-by-step transformation between a pair of feasible solutions for a particular problem. The primary concern in Theoretical Computer Science has been revealing their computational complexity for classical problems.

This paper presents an initial study on reconfiguration problems derived from a submodular function, which has more of a flavor of Data Mining. Our submodular reconfiguration problems request to find a solution sequence connecting two input solutions such that each solution has an objective value above a threshold in a submodular function $f: 2^[n] \to \mathbbR _+$ and is obtained from the previous one by applying a simple transformation rule. We formulate three reconfiguration problems: Monotone Submodular Reconfiguration (MSReco), which applies to influence maximization, and two versions of Unconstrained Submodular Reconfiguration (USReco), which apply to determinantal point processes. Our contributions are summarized as follows: \beginitemize \item We prove that MSReco and USReco are both PSPACE-complete. \item We design a $\frac1 2 $-approximation algorithm for MSReco and a $\frac1 n $-approximation algorithm for (one version of) USReco. \item We devise inapproximability results that approximating the optimum value of MSReco within a $(1-\frac1+ε n^2 )$-factor is PSPACE-hard, and we cannot find a $(\frac5 6 +ε)$-approximation for USReco. \item We conduct numerical study on the reconfiguration version of influence maximization and determinantal point processes using real-world social network and movie rating data. \enditemize

Heterogeneous Global Graph Neural Networks for Personalized Session-based Recommendation

Predicting the next interaction of a short-term interaction session is a challenging task in session-based recommendation. Almost all existing works rely on item transition patterns, and neglect user historical sessions while modeling user preference, which often leads to non-personalized recommendation. And existing personalized session-based recommenders are limited to sessions of the current user, and ignore the useful item-transition patterns from other user's historical sessions. To address these issues, we propose a novel Heterogeneous Global Graph Neural Networks (HG-GNN) to exploit the item transitions over all sessions in a subtle manner for better inferring user preference from the current and historical sessions. To effectively exploit the item transitions over all sessions from users, our global graph contains item transitions of sessions, user-item interactions and global co-occurrence items. Moreover, to capture user preference from sessions comprehensively, we propose a graph augmented preference encoder to learn the session representation. Specifically, we design a novel heterogeneous graph neural network (HGNN) on heterogeneous global graph to learn long-term user preference and item representations with rich semantics. Based on the HGNN, we propose the Personalized Session Encoder to combine the general user preference and temporal interest of the current session to generate the personalized session representation for recommendation. Extensive experimental results on three real-world datasets show that our model outperforms other state-of-the-art methods.

Reinforcement Learning over Sentiment-Augmented Knowledge Graphs towards Accurate and Explainable Recommendation

Explainable recommendation has gained great attention in recent years. A lot of work in this research line has chosen to use the knowledge graphs (KG) where relations between entities can serve as explanations. However, existing studies have not considered sentiment on relations in KG, although there can be various types of sentiment on relations worth considering (e.g., a user's satisfaction on an item). In this paper, we propose a novel recommendation framework based on KG integrated with sentiment analysis for more accurate recommendation as well as more convincing explanations. To this end, we first construct a Sentiment-Aware Knowledge Graph (namely, SAKG) by analyzing reviews and ratings on items given by users. Then, we perform item recommendation and reasoning over SAKG through our proposed Sentiment-Aware Policy Learning (namely, SAPL) based on a reinforcement learning strategy. To enhance the explainability for end-users, we further developed an interactive user interface presenting textual explanations as well as a collection of reviews related with the discovered sentiment. Experimental results on three real-world datasets verified clear improvements on both the accuracy of recommendation and the quality of explanations.

EvoKG: Jointly Modeling Event Time and Network Structure for Reasoning over Temporal Knowledge Graphs

How can we perform knowledge reasoning over temporal knowledge graphs (TKGs)? TKGs represent facts about entities and their relations, where each fact is associated with a timestamp. Reasoning over TKGs, i.e., inferring new facts from time-evolving KGs, is crucial for many applications to provide intelligent services. However, despite the prevalence of real-world data that can be represented as TKGs, most methods focus on reasoning over static knowledge graphs, or cannot predict future events. In this paper, we present a problem formulation that unifies the two major problems that need to be addressed for an effective reasoning over TKGs, namely, modeling the event time and the evolving network structure. Our proposed method EvoKG jointly models both tasks in an effective framework, which captures the ever-changing structural and temporal dynamics in TKGs via recurrent event modeling, and models the interactions between entities based on the temporal neighborhood aggregation framework. Further, EvoKG achieves an accurate modeling of event time, using flexible and efficient mechanisms based on neural density estimation. Experiments show that EvoKG outperforms existing methods in terms of effectiveness (up to 77% and 116% more accurate time and link prediction) and efficiency.

Scope-aware Re-ranking with Gated Attention in Feed

Modern recommendation systems introduce the re-ranking stage to optimize the entire list directly. This paper focuses on the design of re-ranking framework in feed to optimally model the mutual influence between items and further promote user engagement. On mobile devices, users browse the feed almost in a top-down manner and rarely compare items back and forth. Besides, users often compare item with its adjacency based on their partial observations. Given the distinct user behavior patterns, the modeling of mutual influence between items should be carefully designed. Existing re-ranking models encode the mutual influence between items with sequential encoding methods. However, previous works may be dissatisfactory due to the ignorance of connections between items on different scopes. In this paper, we first discuss Unidirectivity and Locality on the impacts and consequences, then report corresponding solutions in industrial applications. We propose a novel framework based on the empirical evidence from user analysis. To address the above problems, we design a \underlineS cope-aware \underlineR e-ranking with \underlineG ated \underlineA ttention model (SRGA ) to emulate the user behavior patterns from two aspects: 1) we emphasize the influence along the user's common browsing direction; 2) we strength the impacts of pivotal adjacent items within the user visual window. Specifically, we design a global scope attention to encode inter-item patterns unidirectionally from top to bottom. Besides, we devise a local scope attention sliding over the recommendation list to underline interactions among neighboring items. Furthermore, we design a learned gate mechanism to aggregating the information dynamically from local and global scope attention. Extensive offline experiments and online A/B testing demonstrate the benefits of our novel framework. The proposed SRGA model achieves the best performance in offline metrics compared with the state-of-the-art re-ranking methods. Further, empirical results on live traffic validate that our recommender system, equipped with SRGA in the re-ranking stage, improves significantly in user engagement.

Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation

Recent advancements of sequential deep learning models such as Transformer and BERT have significantly facilitated the sequential recommendation. However, according to our study, the distribution of item embeddings generated by these models tends to degenerate into an anisotropic shape, which may result in high semantic similarities among embeddings. In this paper, both empirical and theoretical investigations of this representation degeneration problem are first provided, based on which a novel recommender model DuoRec is proposed to improve the item embeddings distribution. Specifically, in light of the uniformity property of contrastive learning, a contrastive regularization is designed for DuoRec to reshape the distribution of sequence representations. Given the convention that the recommendation task is performed by measuring the similarity between sequence representations and item embeddings in the same space via dot product, the regularization can be implicitly applied to the item embedding distribution. Existing contrastive learning methods mainly rely on data level augmentation for user-item interaction sequences through item cropping, masking, or reordering and can hardly provide semantically consistent augmentation samples. In DuoRec, a model-level augmentation is proposed based on Dropout to enable better semantic preserving. Furthermore, a novel sampling strategy is developed, where sequences having the same target item are chosen hard positive samples. Extensive experiments conducted on five datasets demonstrate the superior performance of the proposed DuoRec model compared with baseline methods. Visualization results of the learned representations validate that DuoRec can largely alleviate the representation degeneration problem.

A Simple but Effective Bidirectional Framework for Relational Triple Extraction

Tagging based relational triple extraction methods are attracting growing research attention recently. However, most of these methods take a unidirectional extraction framework that first extracts all subjects and then extracts objects and relations simultaneously based on the subjects extracted. This framework has an obvious deficiency that it is too sensitive to the extraction results of subjects. To overcome this deficiency, we propose a bidirectional extraction framework based method that extracts triples based on the entity pairs extracted from two complementary directions. Concretely, we first extract all possible subject-object pairs from two paralleled directions. These two extraction directions are connected by a shared encoder component, thus the extraction features from one direction can flow to another direction and vice versa. By this way, the extractions of two directions can boost and complement each other. Next, we assign all possible relations for each entity pair by a biaffine model. During training, we observe that the share structure will lead to a convergence rate inconsistency issue which is harmful to performance. So we propose a share-aware learning mechanism to address it. We evaluate the proposed model on multiple benchmark datasets. Extensive experimental results show that the proposed model is very effective and it achieves state-of-the-art results on all of these datasets. Moreover, experiments show that both the proposed bidirectional extraction framework and the share-aware learning mechanism have good adaptability and can be used to improve the performance of other tagging based methods. The source code of our work is available at:

Time Masking for Temporal Language Models

Our world is constantly evolving, and so is the content on the web. Consequently, our languages, often said to mirror the world, are dynamic in nature. However, most current contextual language models are static and cannot adapt to changes over time. In this work, we propose a temporal contextual language model called TempoBERT, which uses time as an additional context of texts. Our technique is based on modifying texts with temporal information and performing time masking - specific masking for the supplementary time information. We leverage our approach for the tasks of semantic change detection and sentence time prediction, experimenting on diverse datasets in terms of time, size, genre, and language. Our extensive evaluation shows that both tasks benefit from exploiting time masking.

On Sampling Collaborative Filtering Datasets

We study the practical consequences of dataset sampling strategies on the ranking performance of recommendation algorithms. Recommender systems are generally trained and evaluated on samples of larger datasets. Samples are often taken in a naive or ad-hoc fashion: e.g. by sampling a dataset randomly or by selecting users or items with many interactions. As we demonstrate, commonly-used data sampling schemes can have significant consequences on algorithm performance. Following this observation, this paper makes three main contributions: (1) characterizing the effect of sampling on algorithm performance, in terms of algorithm and dataset characteristics (e.g. sparsity characteristics, sequential dynamics, etc.); (2) designing SVP-CF, which is a data-specific sampling strategy, that aims to preserve the relative performance of models after sampling, and is especially suited to long-tailed interaction data; and (3) developing an oracle, DATA-GENIE, which can suggest the sampling scheme that is most likely to preserve model performance for a given dataset. The main benefit of DATA-GENIE is that it will allow recommender system practitioners to quickly prototype and compare various approaches, while remaining confident that algorithm performance will be preserved, once the algorithm is retrained and deployed on the complete data. Detailed experiments show that using DATA-GENIE, we can discard upto 5x more data than any sampling strategy with the same level of performance.

Understanding and Mitigating the Effect of Outliers in Fair Ranking

Traditional ranking systems are expected to sort items in the order of their relevance and thereby maximize their utility. In fair ranking, utility is complemented with fairness as an optimization goal. Recent work on fair ranking focuses on developing algorithms to optimize for fairness, given position-based exposure. In contrast, we identify the potential of outliers in a ranking to influence exposure and thereby negatively impact fairness. An outlier in a list of items can alter the examination probabilities, which can lead to different distributions of attention, compared to position-based exposure. We formalize outlierness in a ranking, show that outliers are present in realistic datasets, and present the results of an eye-tracking study, showing that users scanning order and the exposure of items are influenced by the presence of outliers. We then introduce OMIT, a method for fair ranking in the presence of outliers. Given an outlier detection method, OMIT improves fair allocation of exposure by suppressing outliers in the top-k ranking. Using an academic search dataset, we show that outlierness optimization leads to a fairer policy that displays fewer outliers in the top-k, while maintaining a reasonable trade-off between fairness and utility.

Enumerating Fair Packages for Group Recommendations

Package-to-group recommender systems recommend a set of unified items to a group of people. Different from conventional settings, it is not easy to measure the utility of group recommendations because it involves more than one user. In particular, fairness is crucial in group recommendations. Even if some members in a group are substantially satisfied with a recommendation, it is undesirable if other members are ignored to increase the total utility. Many methods for evaluating and applying the fairness of group recommendations have been proposed in the literature. However, all these methods maximize the score and output only one package. This is in contrast to conventional recommender systems, which output several (e.g., top-K) candidates. This can be problematic because a group can be dissatisfied with the recommended package owing to some unobserved reasons, even if the score is high. To address this issue, we propose a method to enumerate fair packages efficiently. Our method furthermore supports filtering queries, such as top-K and intersection, to select favorite packages when the list is long. We confirm that our algorithm scales to large datasets and can balance several aspects of the utility of the packages.

Retrieving Black-box Optimal Images from External Databases

Suppose we have a black-box function (e.g., deep neural network) that takes an image as input and outputs a value that indicates preference. How can we retrieve optimal images with respect to this function from an external database on the Internet? Standard retrieval problems in the literature (e.g., item recommendations) assume that an algorithm has full access to the set of items. In other words, such algorithms are designed for service providers. In this paper, we consider the retrieval problem under different assumptions. Specifically, we consider how users with limited access to an image database can retrieve images using their own black-box functions. This formulation enables a flexible and finer-grained image search defined by each user. We assume the user can access the database through a search query with tight API limits. Therefore, a user needs to efficiently retrieve optimal images in terms of the number of queries. We propose an efficient retrieval algorithm Tiara for this problem. In the experiments, we confirm that our proposed method performs better than several baselines under various settings.

Evaluating Mixed-initiative Conversational Search Systems via User Simulation

Clarifying the underlying user information need by asking clarifying questions is an important feature of modern conversational search system. However, evaluation of such systems through answering prompted clarifying questions requires significant human effort, which can be time-consuming and expensive. In this paper, we propose a conversational User Simulator, called USi, for automatic evaluation of such conversational search systems. Given a description of an information need, USi is capable of automatically answering clarifying questions about the topic throughout the search session. Through a set of experiments, including automated natural language generation metrics and crowdsourcing studies, we show that responses generated by USi are both inline with the underlying information need and comparable to human-generated answers. Moreover, we make the first steps towards multi-turn interactions, where conversational search systems asks multiple questions to the (simulated) user with a goal of clarifying the user need. To this end, we expand on currently available datasets for studying clarifying questions, i.e., Qulac and ClariQ, by performing a crowdsourcing-based multi-turn data acquisition. We show that our generative, GPT2-based model, is capable of providing accurate and natural answers to unseen clarifying questions in the single-turn setting and discuss capabilities of our model in the multi-turn setting. We provide the code, data, and the pre-trained model to be used for further research on the topic.

Diversified Query Generation Guided by Knowledge Graph

Relevant articles recommendation plays an important role in online news platforms. Directly displaying recalled articles by a search engine lacks a deep understanding of the article contents. Generating clickable queries, on the other hand, summarizes an article in various aspects, which can be henceforth utilized to better connect relevant articles. Most existing approaches for generating article queries, however, do not consider the diversity of queries or whether they are appealing enough, which are essential for boosting user experience and platform drainage. To this end, we propose a Knowledge-Enhanced Diversified QuerY Generator (KEDY), which leverages an external knowledge graph (KG) as guidance. We diversify the query generation with the information of semantic neighbors of the entities in articles. We further constrain the diversification process with entity popularity knowledge to build appealing queries that users may be more interested in. The information within KG is propagated towards more popular entities with popularity-guided graph attention. We collect a news-query dataset from the search logs of a real-world search engine. Extensive experiments demonstrate our proposed KEDY can generate more diversified and insightful related queries than several strong baselines.

Finding a Concise, Precise, and Exhaustive Set of Near Bi-Cliques in Dynamic Graphs

A variety of tasks on dynamic graphs, including anomaly detection, community detection, compression, and graph understanding, have been formulated as problems of identifying constituent (near) bi-cliques (i.e., complete bipartite graphs). Even when we restrict our attention to maximal ones, there can be exponentially many near bi-cliques, and thus finding all of them is practically impossible for large graphs. Then, two questions naturally arise: (Q1) What is a ''good'' set of near bi-cliques? That is, given a set of near bi-cliques in the input dynamic graph, how should we evaluate its quality? (Q2) Given a large dynamic graph, how can we rapidly identify a high-quality set of near bi-cliques in it? Regarding Q1, we measure how concisely, precisely, and exhaustively a given set of near bi-cliques describes the input dynamic graph. We combine these three perspectives systematically on the Minimum Description Length principle. Regarding Q2, we propose CutNPeel, a fast search algorithm for a high-quality set of near bi-cliques. By adaptively re-partitioning the input graph, CutNPeel reduces the search space and at the same time improves the search quality. Our experiments using six real-world dynamic graphs demonstrate that CutNPeel is (a) High-quality: providing near bi-cliques of up to 51.2% better quality than its state-of-the-art competitors, (b) Fast: up to 68.8X faster than the next-best competitor, and (c) Scalable: scaling to graphs with 134 million edges. We also show successful applications of CutNPeel to graph compression and pattern discovery.

Using Conjunctions for Faster Disjunctive Top-k Queries

While current search engines use highly complex ranking functions with hundreds of features, they often perform an initial candidate generation step that uses a very simple ranking function to identify a limited set of promising candidates. A common approach is to use a disjunctive top-k query for this step. There are many methods for disjunctive top-k computation, but they tend to be slow for the required values of k, which are in the hundreds to thousands. We propose a new approach to safe disjunctive top-k computation that, somewhat counterintuitively, uses precomputed conjunctions of inverted lists to speed up disjunctive queries. The approach is based on a generalization of the well-known MaxScore algorithm, and utilizes recent improvements in threshold estimation techniques as well as new ideas to obtain significant improvements in performance. Our algorithms are implemented as an extension of the PISA framework for search-engine query processing, and available as open-source to support replication and follow-up work.

Attributed Graph Modeling with Vertex Replacement Grammars

Recent work at the intersection of formal language theory and graph theory has explored graph grammars for graph modeling. However, existing models and formalisms can only operate on homogeneous (i.e., untyped or unattributed) graphs. We relax this restriction and introduce the Attributed Vertex Replacement Grammar (AVRG), which can be efficiently extracted from heterogeneous (i.e., typed, colored, or attributed) graphs. Unlike current state-of-the-art methods, which train enormous models over complicated deep neural architectures, the AVRG model is unsupervised and interpretable. It is based on context-free string grammars and works by encoding graph rewriting rules into a graph grammar containing graphlets and instructions on how they fit together. We show that the AVRG can encode succinct models of input graphs yet faithfully preserve their structure and assortativity properties. Experiments on large real-world datasets show that graphs generated from the AVRG model exhibit substructures and attribute configurations that match those found in the input networks.

Sequential Modeling with Multiple Attributes for Watchlist Recommendation in E-Commerce

In e-commerce, the watchlist enables users to track items over time and has emerged as a primary feature, playing an important role in users' shopping journey. Watchlist items typically have multiple attributes whose values may change over time (e.g., price, quantity). Since many users accumulate dozens of items on their watchlist, and since shopping intents change over time, recommending the top watchlist items in a given context can be valuable. In this work, we study the watchlist functionality in e-commerce and introduce a novel watchlist recommendation task. Our goal is to prioritize which watchlist items the user should pay attention to next by predicting the next items the user will click. We cast this task as a specialized sequential recommendation task and discuss its characteristics. Our proposed recommendation model, Trans2D, is built on top of the Transformer architecture, where we further suggest a novel extended attention mechanism (Attention2D) that allows to learn complex item-item, attribute-attribute and item-attribute patterns from sequential-data with multiple item attributes. Using a large-scale watchlist dataset from eBay, we evaluate our proposed model, where we demonstrate its superiority compared to multiple state-of-the-art baselines, many of which are adapted for this task.

Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations

User interest exploration is an important and challenging topic in recommender systems, which alleviates the closed-loop effects between recommendation models and user-item interactions.Contextual bandit (CB) algorithms strive to make a good trade-off between exploration and exploitation so that users' potential interests have chances to expose. However, classical CB algorithms can only be applied to a small, sampled item set (usually hundreds), which forces the typical applications in recommender systems limited to candidate post-ranking, homepage top item ranking, ad creative selection, or online model selection (A/B test). In this paper, we introduce two simple but effective hierarchical CB algorithms to make a classical CB model (such as LinUCB and Thompson Sampling) capable to explore users' interest in the entire item space without limiting to a small item set. We first construct a hierarchy item tree via a bottom-up clustering algorithm to organize items in a coarse-to-fine manner. Then we propose ahierarchical CB (HCB) algorithm to explore users' interest on the hierarchy tree. HCB takes the exploration problem as a series of decision-making processes, where the goal is to find a path from the root to a leaf node, and the feedback will be back-propagated to all the nodes in the path. We further propose aprogressive hierarchical CB (pHCB) algorithm, which progressively extends visible nodes which reach a confidence level for exploration, to avoid misleading actions on upper-level nodes in the sequential decision-making process. Extensive experiments on two public recommendation datasets demonstrate the effectiveness and flexibility of our methods.

Choosing the Best of Both Worlds: Diverse and Novel Recommendations through Multi-Objective Reinforcement Learning

Since the inception of Recommender Systems (RS), the accuracy of the recommendations in terms of relevance has been the golden criterion for evaluating the quality of RS algorithms. However, by focusing on item relevance, one pays a significant price in terms of other important metrics: users get stuck in a "filter bubble" and their array of options is significantly reduced, hence degrading the quality of the user experience and leading to churn. Recommendation, and in particular session-based/sequential recommendation, is a complex task with multiple - and often conflicting objectives - that existing state-of-the-art approaches fail to address. In this work, we take on the aforementioned challenge and introduce Scalarized Multi-Objective Reinforcement Learning (SMORL) for the RS setting, a novel Reinforcement Learning (RL) framework that can effectively address multi-objective recommendation tasks. The proposed SMORL agent augments standard recommendation models with additional RL layers that enforce it to simultaneously satisfy three principal objectives: accuracy, diversity, and novelty of recommendations. We integrate this framework with four state-of-the-art session-based recommendation models and compare it with a single-objective RL agent that only focuses on accuracy. Our experimental results on two real-world datasets reveal a substantial increase in aggregate diversity, a moderate increase in accuracy, reduced repetitiveness of recommendations, and demonstrate the importance of reinforcing diversity and novelty as complementary objectives.

Efficient Reachability Query with Extreme Labeling Filter

Being a fundamental graph operator, reachability query has been widely studied by the data mining community in the past decades. In a directed acyclic graph (DAG), one vertex is reachable by another if there exists a chain of directed edges connecting the two vertexes. The state-of-the-art (SOTA) reachability query methods mostly first index all the vertexes in the underlying DAG and assign them with different labels, and then use these indexes and/or labels to efficiently filter out as many unreachable queries as possible. Thus, because a large portion of unreachable queries can be identified without evoking any tedious path-finding process, the overall time taken by a huge number of queries is much shortened with a tolerable compensation on the additional index and/or label preprocessing time and space. In this paper, we propose the Extreme Labeling Filter (ELF), which is a novel generic filter that can be applied to existing reachability query methods to additionally identify a large number of unreachable queries. Based on the analysis of the given DAG in a systematic and autonomous manner, ELF first determines whether to use predecessors or successors to label the vertexes. Based on such self-determined labels, ELF is then able to identify a large number of unreachable queries with a low time complexity of O(1). To evaluate the performance of ELF, we apply it on 4 reachability query methods (1 conventional and 3 SOTA, all designated for reachability query in DAGs) and conduct experiments on 17 datasets of different sizes. The experimental results show that by applying ELF, all methods significantly shorten the query time.

MonLAD: Money Laundering Agents Detection in Transaction Streams

Given a stream of money transactions between accounts in a bank, how can we accurately detect money laundering agent accounts and suspected behaviors in real-time? Money laundering agents try to hide the origin of illegally obtained money by dispersive multiple small transactions and evade detection by smart strategies. Therefore, it is challenging to accurately catch such fraudsters in an unsupervised manner. Existing approaches do not consider the characteristics of those agent accounts and are not suitable to the streaming settings. Therefore, we propose MonLAD and MonLAD-W to detect money laundering agent accounts in a transaction stream by keeping track of their residuals and other features; we devise AnoScore algorithm to find anomalies based on the robust measure of statistical deviation. Experimental results show that MonLAD outperforms the state-of-the-art baselines on real-world data and finds various suspicious behavior patterns of money laundering. Additionally, several detected suspected accounts have been manually-verified as agents in real money laundering scenario.

Graph Few-shot Class-incremental Learning

The ability to incrementally learn new classes is vital to all real-world artificial intelligence systems. A large portion of high-impact applications like social media, recommendation systems, E-commerce platforms, etc. can be represented by graph models. In this paper, we investigate the challenging yet practical problem,Graph Few-shot Class-incremental (Graph FCL) problem, where the graph model is tasked to classify both newly encountered classes and previously learned classes. Towards that purpose, we put forward a Graph Pseudo Incremental Learning paradigm by sampling tasks recurrently from the base classes, so as to produce an arbitrary number of training episodes for our model to practice the incremental learning skill. Furthermore, we design a Hierarchical-Attention-based Graph Meta-learning framework, HAG-Meta from an optimization perspective. We present a task-sensitive regularizer calculated from task-level attention and node class prototypes to mitigate overfitting onto either novel or base classes. To employ the topological knowledge, we add a node-level attention module to adjust the prototype representation. Our model not only achieves greater stability of old knowledge consolidation, but also acquires advantageous adaptability to new knowledge with very limited data samples. Extensive experiments on three real-world datasets, including Amazon-clothing, Reddit, and DBLP, show that our framework demonstrates remarkable advantages in comparison with the baseline and other related state-of-the-art methods.

Uncovering Causal Effects of Online Short Videos on Consumer Behaviors

In recent years, online short videos have become more popular, especially as an online advertising intermediary. To better understand their effects as advertisements, it is essential to analyze the causal relations of online short videos on consumer behaviors. Our study is based on fine-grained consumer behavior data from a world-leading e-commerce platform, i.e., We first decompose the total causal effects into informative effects and persuasive effects following a common practice in the economic literature. Moreover, we extract the subjectivity scores of short videos through a dictionary-based subjectivity analysis model and evaluate the correlation between the subjectivity scores and each causal effect. The findings of this paper are as follows: First, both causal effects (i.e., informative and persuasive effects) are significant. Second, these effects have a strong correlation with the short videos' subjectivity scores. Third, the signs of these correlations vary with the prices of the products. Our results not only shed light on the research of how short videos exert influence on online consumers, but also give sellers advice on better video design and recommendation.

Friend Story Ranking with Edge-Contextual Local Graph Convolutions

Social platforms have paved the way in creating new, modern ways for users to communicate with each other. In recent years, multiple platforms have introduced ''Stories'' features, which enable broadcasting of ephemeral multimedia content. Specifically, ''Friend Stories,'' or Stories meant to be consumed by one's close friends, are a popular feature, promoting significant user-user interactions by allowing people to see (visually) what their friends and family are up to. A key challenge in surfacing Friend Stories for a given user, is in ranking over each viewing user's friends to efficiently prioritize and route limited user attention. In this work, we explore the novel problem of Friend Story Ranking from a graph representation learning perspective. More generally, our problem is a link ranking task, where inferences are made over existing links (relations), unlike common node or graph-based tasks, or link prediction tasks, where the goal is to make inferences about non-existing links. We propose ELR, an edge-contextual approach which carefully considers local graph structure, differences between local edge types and directionality, and rich edge attributes, building on the backbone of graph convolutions. ELR handles social sparsity challenges by considering and attending over neighboring nodes, and incorporating multiple edge types in local surrounding egonet structures. We validate ELR on two large country-level datasets with millions of users and tens of millions of links from Snapchat. ELR shows superior performance over alternatives by 8% and 5% error reduction measured by MSE and MAE correspondingly. Further generality, data efficiency and ablation experiments confirm the advantages of ELR.

DAME: Domain Adaptation for Matching Entities

Entity matching (EM) identifies data records that refer to the same real-world entity. Despite the effort in the past years to improve the performance in EM, the existing methods still require a huge amount of labeled data in each domain during the training phase. These methods treat each domain individually, and capture the specific signals for each dataset in EM, and this leads to overfitting on just one dataset. The knowledge that is learned from one dataset is not utilized to better understand the EM task in order to make predictions on the unseen datasets with fewer labeled samples. In this paper, we propose a new domain adaptation-based method that transfers the task knowledge from multiple source domains to a target domain. Our method presents a new setting for EM where the objective is to capture the task-specific knowledge from pretraining our model using multiple source domains, then testing our model on a target domain. We study the zero-shot learning case on the target domain, and demonstrate that our method learns the EM task and transfers knowledge to the target domain. We extensively study fine-tuning our model on the target dataset from multiple domains, and demonstrate that our model generalizes better than state-of-the-art methods in EM.

A New Class of Polynomial Activation Functions of Deep Learning for Precipitation Forecasting

Precipitation forecasting, modeled as an important chaotic system in earth system science, is not explicitly solved with theory-driven models. In recent years, deep learning models have achieved great success in various applications including rainfall prediction. However, these models work in an image processing manner regardless of the nature of a physical system. We found that the non-linearity relationships learned by deep learning models, which mostly rely on the activation functions, are commonly weighted piecewise continuous functions with bounded first-order derivatives. In contrast, the polynomial is one of the most widely used classes of functions for theory-driven models, applied to numerical approximation, dynamic system modeling, etc.. Researchers started to use the polynomial activation functions (Pacs in short) for neural networks from the 1990s. In recent years, with bloomed researches that apply deep learning to scientific problems, it is weird that such a powerful class of basis functions is rarely used. In this paper, we investigate it and argue that, even though polynomials are good at information extraction, it is too fragile to train stably. We finally solve its serious data flow explosion problem with Chebyshev polynomials and prepended normalization, which enables networks to go deep with Pacs. To enhance the robustness of training, a normalization called Range Norm is further proposed. Performance on synthetic dataset and summer precipitation prediction task validates the necessity of such a class of activation functions to simulate complex physical mechanisms. The new tool for deep learning enlightens a new way of automatic theoretical physics analysis.

Learning-To-Ensemble by Contextual Rank Aggregation in E-Commerce

Ensemble models in E-commerce combine predictions from multiple sub-models for ranking and revenue improvement. Industrial ensemble models are typically deep neural networks, following the supervised learning paradigm to infer conversion rate given inputs from sub-models. However, this process has the following two problems. Firstly, the point-wise scoring approach disregards the relationships between items and leads to homogeneous displayed results, while diversified display benefits user experience and revenue. Secondly, the learning paradigm focuses on the ranking metrics and does not directly optimize the revenue. In our work, we propose a new Learning-To-Ensemble (LTE) framework RA-EGO, which replaces the ensemble model with a contextual Rank Aggregator (RA) and explores the best weights of sub-models by the Evaluator-Generator Optimization (EGO). To achieve the best online performance, we propose a new rank aggregation algorithm TournamentGreedy as a refinement of classic rank aggregators, which also produces the best average weighted Kendall Tau Distance (KTD) amongst all the considered algorithms with quadratic time complexity. Under the assumption that the best output list should be Pareto Optimal on the KTD metric for sub-models, we show that our RA algorithm has higher efficiency and coverage in exploring the optimal weights. Combined with the idea of Bayesian Optimization and gradient descent, we solve the online contextual Black-Box Optimization task that finds the optimal weights for sub-models given a chosen RA model. RA-EGO has been deployed in our online system and has improved the revenue significantly.

Knowledge Enhanced Sports Game Summarization

Sports game summarization aims at generating sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning processes, resulting in a lot of noise. Besides, current works neglect the knowledge gap between live commentaries and sports news, which limits the performance of sports game summarization. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects a large amount of data from massive games. It has 7,854 commentary-news pairs. To improve the quality, K-SportsSum employs a manual cleaning process; (2) Different from existing datasets, to narrow the knowledge gap, K-SportsSum further provides a large-scale knowledge corpus that contains the information of 523 sports teams and 14,724 sports players. Additionally, we also introduce a knowledge-enhanced summarizer that utilizes both live commentaries and the knowledge to generate sports news. Extensive experiments on K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performances. Qualitative analysis and human study further verify that our model generates more informative sports news.

MtCut: A Multi-Task Framework for Ranked List Truncation

Ranked list truncation aims to cut the ranked results in short considering user-defined objectives, which balances the overall utility and user efforts over retrieval results. The exact selection of an optimal cut-off position brings potential benefits in various real-world applications, such as patent search and legal search. However, there is significant retrieval bias in the ranked list. The result scores and the disorder of document sequences cause difficulties in judging the relevance between the queries and documents -- alleviating the existing methods' performance improvement. In this work, we investigate the characteristics of retrieval bias on altering truncation and propose a multi-task truncation model, MtCut. It employs two auxiliary tasks to make complementary for the retrieval bias. As a practical evaluation, we explore its performance on two datasets, and the results show that MtCut outperforms the state-of-the-art methods on both F1-score and DCG metrics.

A Sequence-to-Sequence Model for Large-scale Chinese Abbreviation Database Construction

Abbreviations often used in our daily communication play an important role in natural language processing. Most of the existing studies regard the Chinese abbreviation prediction as a sequence labeling problem. However, sequence labeling models usually ignore label dependencies in the process of abbreviation prediction, and the label prediction of each character should be conditioned on its previous labels. In this paper, we propose to formalize the Chinese abbreviation prediction task as a sequence generation problem, and a novel sequence-to-sequence model is designed. To boost the performance of our deep model, we further propose a multi-level pre-trained model that incorporates character, word, and concept-level embeddings. To evaluate our methods, a new dataset for Chinese abbreviation prediction is automatically built, which contains 81,351 pairs of full forms and abbreviations. Finally, we conduct extensive experiments on a public dataset and the built dataset, and the experimental results on both datasets show that our model outperforms the state-of-the-art methods. More importantly, we build a large-scale database for a specific domain, i.e., life services in Meituan Inc., with high accuracy of about 82.7%, which contains 4,134,142 pairs of full forms and abbreviations. The online A/B testing on Meituan APP and Dianping APP suggests that Click-Through Rate increases by 0.59% and 0.86% respectively when the built database is used in the searching system. We have released our API on with over 87k API calls in 9 months.

Personalized Long-distance Fuel-efficient Route Recommendation Through Historical Trajectories Mining

\beginabstract Finding fuel-efficient routes for drivers has increasingly important value in terms of saving energy, protecting the environment and saving expenses. Previous studies basically adopt simple fuel consumption calculation or prediction methods to recommend the fuel-efficient routes within a city, which have two major limitations. First, the effect of drivers' driving behavior preferences (e.g. acceleration, frequency of clutch use, etc.) on fuel consumption is not fully studied and utilized. Second, existing methods mainly focus on short-distance route recommendation. Due to the difference in the road network structure and route composition, it is not effective to directly apply the route recommendation methods designed for short-distance travel within a city on the scenario of long-distance travel among cities. In this paper, we propose a novel model PLd-FeRR for the P ersonalized L ong-d istance F uel-e fficient R oute R ecommendation. Specifically, we first identify the features reflecting the user's driving behavior preference based on the user's historical driving trajectory, and then extract the potential factors that can affect long-distance fuel consumption. As transformer can effectively capture the temporal features for long sequence data, the extracted personalized driving preference features and long-distance fuel consumption features are input into a transformer-based fuel consumption prediction model. Next, the prediction model is combined with a genetic algorithm to further improve the performance of recommending fuel-efficient routes. Extensive evaluations are conducted on the large real-world dataset, and the results show the effectiveness of our proposal. \endabstract

Hierarchical Imitation Learning via Subgoal Representation Learning for Dynamic Treatment Recommendation

Dynamic Treatment Recommendation (DTR) is a sequence of tailored treatment decision rules which can be grouped as individual sub-tasks. As the reward signals in DTR are hard to design, Imitation Learning (IL) has achieved great success as it is effective in mimicking doctors' behaviors from their demonstrations without explicit reward signals. As a patient may have several different symptoms, the behaviors in doctors' demonstrations can often be grouped to handle individual symptoms. However, a single flat policy learned by IL is difficult to mimic doctors' demonstrations with such hierarchical structure, where low-level behaviors are switching from one symptom to another controlled by high-level decisions. Due to this observation, we consider Hierarchical Imitation Learning methods as good solutions for DTR. In this paper, we propose a novel Subgoal conditioned HIL framework (short for SHIL), where a high-level policy sequentially sets a subgoal for each sub-task without prior knowledge, and the low-level policy for sub-tasks is learned to reach the subgoal. To get rid of prior knowledge, a self-supervised learning method is proposed to learn an effective representation for each subgoal. More specifically, we carefully designed to encourage diverse representations among different subgoals. To demonstrate that SHIL is able to learn meaningful high-level policy and low-level policy that accurately reproduces complex doctors' demonstrations, we conduct experiments on a real-world medical data from health care domain, MIMIC-III. Compared with state-of-the-art baselines, SHIL improves the likelihood of patient survival by a significant margin and provides explainable recommendation with hierarchical structure.

Structure Meets Sequences: Predicting Network of Co-evolving Sequences

Co-evolving sequences are ubiquitous in a variety of applications, where different sequences are often inherently inter-connected with each other. We refer to such sequences, together with their inherent connections modeled as a structured network, as network of co-evolving sequences (NoCES). Typical NoCES applications include road traffic monitoring, company revenue prediction, motion capture, etc. To date, it remains a daunting challenge to accurately model NoCES due to the coupling between network structure and sequences. In this paper, we propose to modeling \pname\ with the aim of simultaneously capturing both the dynamics and the interplay between network structure and sequences. Specifically, we propose a joint learning framework to alternatively update the network representations and sequence representations as the sequences evolve over time. A unique feature of our framework lies in that it can deal with the case when there are co-evolving sequences on both network nodes and edges. Experimental evaluations on four real datasets demonstrate that the proposed approach (1) outperforms the existing competitors in terms of prediction accuracy, and (2) scales linearly w.r.t. the sequence length and the network size.

Scalable Graph Topology Learning via Spectral Densification

Graph learning plays an important role in many data mining and machine learning tasks, such as manifold learning, data representation and analysis, dimensionality reduction, data clustering, and visualization, etc. In this work, we introduce a highly-scalable spectral graph densification approach (GRASPEL) for graph topology learning from data. By limiting the precision matrix to be a graph-Laplacian-like matrix, our approach aims to learn sparse undirected graphs from potentially high-dimensional input data. A very unique property of the graphs learned by GRASPEL is that the spectral embedding (or approximate effective-resistance) distances on the graph will encode the similarities between the original input data points. By leveraging high-performance spectral methods, sparse yet spectrally-robust graphs can be learned by identifying and including the most spectrally-critical edges into the graph. Compared with prior state-of-the-art graph learning approaches, GRASPEL is more scalable and allows substantially improving computing efficiency and solution quality of a variety of data mining and machine learning applications, such as manifold learning, spectral clustering (SC), and dimensionality reduction (DR).

Profiling the Design Space for Graph Neural Networks based Collaborative Filtering

In recent years, Graph Neural Networks (GNNs) have been widely used in Collaborative Filtering (CF), one of the most popular methods in recommender systems. However, most existing works focus on designing an individual model architecture given a specific scenario, without studying the influences of different design dimensions. Thus, it remains a challenging problem to quickly obtain a top-performing model in a new recommendation scenario. To address the problem, in this work, we make the first attempt to profile the design space of GNN-based CF methods to enrich the understanding of different design dimensions as well as provide a novel paradigm of model design. Specifically, a unified framework of GNN-based CF is proposed, on top of which a design space is developed and evaluated by extensive experiments. Interesting findings on the impacts of different design dimensions on recommendation performance are obtained. Guided by the empirical findings, we further prune the design space to obtain a compact one containing a higher concentration of top-performing models. Empirical studies demonstrate its high quality and strong generalization ability.

Contrastive Meta Learning with Behavior Multiplicity for Recommendation

A well-informed recommendation framework could not only help users identify their interested items, but also benefit the revenue of various online platforms (e.g., e-commerce, social media). Traditional recommendation models usually assume that only a single type of interaction exists between user and item, and fail to model the multiplex user-item relationships from multi-typed user behavior data, such as page view, add-to-favourite and purchase. While some recent studies propose to capture the dependencies across different types of behaviors, two important challenges have been less explored: i) Dealing with the sparse supervision signal under target behaviors (e.g., purchase). ii) Capturing the personalized multi-behavior patterns with customized dependency modeling. To tackle the above challenges, we devise a new model CML, Contrastive Meta Learning (CML), to maintain dedicated cross-type behavior dependency for different users. In particular, we propose a multi-behavior contrastive learning framework to distill transferable knowledge across different types of behaviors via the constructed contrastive loss. In addition, to capture the diverse multi-behavior patterns, we design a contrastive meta network to encode the customized behavior heterogeneity for different users. Extensive experiments on three real-world datasets indicate that our method consistently outperforms various state-of-the-art recommendation methods. Our empirical studies further suggest that the contrastive meta learning paradigm offers great potential for capturing the behavior multiplicity in recommendation. We release our model implementation at:

A Cooperative-Competitive Multi-Agent Framework for Auto-bidding in Online Advertising

In online advertising, auto-bidding has become an essential tool for advertisers to optimize their preferred ad performance metrics by simply expressing high-level campaign objectives and constraints. Previous works designed auto-bidding tools from the view of single-agent, without modeling the mutual influence between agents. In this paper, we instead consider this problem from a distributed multi-agent perspective, and propose a general \underlineM ulti-\underlineA gent reinforcement learning framework for \underlineA uto-\underlineB idding, namely MAAB, to learn the auto-bidding strategies. First, we investigate the competition and cooperation relation among auto-bidding agents, and propose a temperature-regularized credit assignment to establish a mixed cooperative-competitive paradigm. By carefully making a competition and cooperation trade-off among agents, we can reach an equilibrium state that guarantees not only individual advertiser's utility but also the system performance (i.e., social welfare). Second, to avoid the potential collusion behaviors of bidding low prices underlying the cooperation, we further propose bar agents to set a personalized bidding bar for each agent, and then alleviate the revenue degradation due to the cooperation. Third, to deploy MAAB in the large-scale advertising system with millions of advertisers, we propose a mean-field approach. By grouping advertisers with the same objective as a mean auto-bidding agent, the interactions among the large-scale advertisers are greatly simplified, making it practical to train MAAB efficiently. Extensive experiments on the offline industrial dataset and Alibaba advertising platform demonstrate that our approach outperforms several baseline methods in terms of social welfare and revenue.

Crowdsourcing-based Multi-Device Communication Cooperation for Mobile High-Quality Video Enhancement

The widespread use of mobile devices propels the development of new-fashioned video applications like 3D (3-Dimensional) stereo video and mobile cloud game via web or App, exerting more pressure on current mobile access network. To address this challenge, we adopt the crowdsourcing paradigm to offer some incentive for guiding the movement of recruited crowdsourcing users and facilitate the optimization of the movement control decision. In this paper, based on a practical 4G (4th-Generation) network throughput measurement study, we formulate the movement control decision as a cost-constrained user recruitment optimization problem. Considering the intractable complexity of this problem, we focus first on a single crowdsourcing user case and propose a pseudo-polynomial time complexity optimal solution. Then, we apply this solution to solve the more general problem of multiple users and propose a graph-partition-based algorithm. Extensive experiments show that our solutions can improve the efficiency of real-time D2D communication for mobile videos.

Improving the Applicability of Knowledge-Enhanced Dialogue Generation Systems by Using Heterogeneous Knowledge from Multiple Sources

Traditional conversational systems can only access the given query during the response generation, leading to meaningless responses. To this end, researchers proposed to enhance dialogue generation by integrating external knowledge. Although such methods have achieved remarkable gains, the use of only single-source knowledge often makes existing knowledge-enhanced methods degenerate into traditional models in real scenarios because of the insufficient knowledge coverage of single-source knowledge. To improve the applicability of knowledge-enhanced methods, we propose two novel frameworks to use heterogeneous knowledge from multiple sources. We first propose an MHKD-Seq2Seq framework, which can use different heterogeneous knowledge by identifying abstract-level knowledge behaviors; meanwhile, a Diffuse-Aggregate scheme is used to process multiple knowledge simultaneously and produce a unified result. The next framework MHKD-ARPLM can leverage the advantages of pretrained language models with Knowledge Linearization techniques. In experiments, we collected dialogues from previously open-released datasets and built a multi-source knowledge-aligned dataset TriKE-Weibo, which involves three knowledge sources: commonsense, texts, and infobox tables. Extensive evaluations demonstrate the performance leadership of our approaches against competitive baseline models.

Towards Unbiased and Robust Causal Ranking for Recommender Systems

We study the problem of optimizing ranking metrics with unbiased and robust causal estimation for recommender systems. A user may click/purchase an item regardless of whether the item is recommended or not. Thus, it is important to estimate the causal effect of recommendation and rank items higher with a larger causal effect. However, most existing works focused on improving the accuracy of recommendations, which usually have large bias and variance. Therefore, in this paper, we provide a general and theoretically rigorous framework for causal recommender systems, which enables unbiased evaluation and learning for the ranking metrics with confounding bias. We first propose a robust estimator for unbiased ranking evaluation and theoretically show that this estimator has a smaller bias and variance. We then propose a deep variational information bottleneck (IB) approach to exploit the sufficiency of the propensity score for estimation adjustment and better generalization. We also provide the learning bound and develop an unbiased learning algorithm to optimize the causal metric. Results on semi-synthetic and real-world datasets show that our evaluation and learning algorithms significantly outperform existing methods.

Long Short-Term Temporal Meta-learning in Online Recommendation

An effective online recommendation system should jointly capture users' long-term and short-term preferences in both users' internal behaviors (from the target recommendation task) and external behaviors (from other tasks). However, it is extremely challenging to conduct fast adaptations to real-time new trends while making full use of all historical behaviors in large-scale systems, due to the real-world limitations in real-time training efficiency and external behavior acquisition. To address these practical challenges, we propose a novel Long Short-Term Temporal Meta-learning framework (LSTTM) for online recommendation. It arranges user multi-source behaviors in a global long-term graph and an internal short-term graph, and conducts different GAT-based aggregators and training strategies to learn user short-term and long-term preferences separately. To timely capture users' real-time interests, we propose a temporal meta-learning method based on MAML under an asynchronous optimization strategy for fast adaptation, which regards recommendations at different time periods as different tasks. In experiments, LSTTM achieves significant improvements on both offline and online evaluations. It has been deployed on a widely-used online recommendation system named WeChat Top Stories, affecting millions of users.

A Peep into the Future: Adversarial Future Encoding in Recommendation

Personalized recommendation often relies on user historical behaviors to provide items for users. It is intuitive that future information also contains essential messages as supplements to user historical behaviors. However, we cannot directly encode future information into models, since we are unable to get future information in online serving. In this work, we propose a novel adversarial future encoding (AFE) framework to make full use of informative future features in different types of recommendation models. Specifically, AFE contains a future-aware discriminator and a generator. The future-aware discriminator takes both common features and future features as inputs, working as a recommendation prophet to judge user-item pairs. In contrast, the generator is considered as a challenger, which generates items with only common features, aiming to confuse the future-aware prophet. The future-aware discriminator can inspire the generator (to be deployed online) to produce better results. We further conduct a multi-factor optimization to enable a fast and stable model convergence via the direct learning and knowledge distillation losses. Moreover, we have adopted AFE on both a list-wise RL-based ranking model and a point-wise ranking model to verify its universality. In experiments, we conduct sufficient evaluations on two large-scale datasets, achieving significant improvements on both offline and online evaluations. Currently, we have deployed AFE on a real-world system, affecting millions of users. The source code is in

Supervised Advantage Actor-Critic for Recommender Systems

Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence.

To address the above problems, we propose negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the ''advantage'' of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.

Informed Multi-context Entity Alignment

Entity alignment is a crucial step in integrating knowledge graphs (KGs) from multiple sources. Previous attempts at entity alignment have explored different KG structures, such as neighborhood-based and path-based contexts, to learn entity embeddings, but they are limited in capturing the multi-context features. Moreover, most approaches directly utilize the embedding similarity to determine entity alignment without considering the global interaction among entities and relations. In this work, we propose an Informed Multi-context Entity Alignment (IMEA) model to address these issues. In particular, we introduce Transformer to flexibly capture the relation, path, and neighborhood contexts, and design holistic reasoning to estimate alignment probabilities based on both embedding similarity and the relation/entity functionality. The alignment evidence obtained from holistic reasoning is further injected back into the Transformer via the proposed soft label editing to inform embedding learning. Experimental results on several benchmark datasets demonstrate the superiority of our IMEA model compared with existing state-of-the-art entity alignment methods.

Identifying Cost-effective Debunkers for Multi-stage Fake News Mitigation Campaigns

Online social networks have become a fertile ground for spreading fake news. Methods to automatically mitigate fake news propagation have been proposed. Some studies focus on selecting top k influential users on social networks as debunkers, but the social influence of debunkers may not translate to wide mitigation information propagation as expected. Other studies assume a given set of debunkers and focus on optimizing intensity for debunkers to publish true news, but as debunkers are fixed, even if with high social influence and/or high intensity to post true news, the true news may not reach users exposed to fake news and therefore mitigation effect may be limited. In this paper, we propose the multi-stage fake news mitigation campaign where debunkers are dynamically selected within budget at each stage. We formulate it as a reinforcement learning problem and propose a greedy algorithm optimized by predicting future states so that the debunkers can be selected in a way that maximizes the overall mitigation effect. We conducted extensive experiments on synthetic and real-world social networks and show that our solution outperforms state-of-the-art baselines in terms of mitigation effect.

MAF: A General Matching and Alignment Framework for Multimodal Named Entity Recognition

In this paper, we study multimodal named entity recognition in social media posts. Existing works mainly focus on using a cross-modal attention mechanism to combine text representation with image representation. However, they still suffer from two weaknesses: (1) the current methods are based on a strong assumption that each text and its accompanying image are matched, and the image can be used to help identify named entities in the text. However, this assumption is not always true in real scenarios, and the strong assumption may reduce the recognition effect of theMNER model; (2) the current methods fail to construct a consistent representation to bridge the semantic gap between two modalities, which prevents the model from establishing a good connection between the text and image. To address these issues, we propose a general matching and alignment framework (MAF) for multimodal named entity recognition in social media posts. Specifically, to solve the first issue, we propose a novel cross-modal matching (CM) module to calculate the similarity score between text and image, and use the score to determine the proportion of visual information that should be retained. To solve the second issue, we propose a novel cross-modal alignment (CA) module to make the representations of the two modalities more consistent. We conduct extensive experiments, ablation studies, and case studies to demonstrate the effectiveness and efficiency of our method.The source code of this paper can be found in

Translating Human Mobility Forecasting through Natural Language Generation

Existing human mobility forecasting models follow the standard design of the time-series prediction model which takes a series of numerical values as input to generate a numerical value as a prediction. Although treating this as a regression problem seems straightforward, incorporating various contextual information such as the semantic category information of each Place-of-Interest (POI) is a necessary step, and often the bottleneck, in designing an effective mobility prediction model. As opposed to the typical approach, we treat forecasting as a translation problem and propose a novel forecasting through a language generation pipeline. The paper aims to address the human mobility forecasting problem as a language translation task in a sequence-to-sequence manner. A mobility-to-language template is first introduced to describe the numerical mobility data as natural language sentences. The core intuition of the human mobility forecasting translation task is to convert the input mobility description sentences into a future mobility description from which the prediction target can be obtained. Under this pipeline, a two-branch network, SHIFT (Translating Human Mobility Forecasting), is designed. Specifically, it consists of one main branch for language generation and one auxiliary branch to directly learn mobility patterns. During the training, we develop a momentum mode for better connecting and training the two branches. Extensive experiments on three real-world datasets demonstrate that the proposed SHIFT is effective and presents a new revolutionary approach to forecasting human mobility.

Lightweight Composite Re-Ranking for Efficient Keyword Search with BERT

Recently transformer-based ranking models have been shown to deliver high relevance for document search and the relevance-efficiency tradeoff becomes important for fast query response times. This paper presents BECR (BERT-based Composite Re-Ranking), a lightweight composite re-ranking scheme that combines deep contextual token interactions and traditional lexical term-matching features. BECR conducts query decomposition and composes a query representation using pre-computable token embeddings based on uni-grams and skip-n-grams, to seek a tradeoff of inference efficiency and relevance. Thus it does not perform expensive transformer computations during online inference, and does not require the use of GPU. This paper describes an evaluation of relevance and efficiency of BECR with several TREC datasets.

Few-shot Link Prediction in Dynamic Networks

Dynamic link prediction, which aims at forecasting future edges of a node in a dynamic network, is an important problem in network science and has a wide range of real-world applications. A key property of dynamic networks is that new nodes and links keep coming over time and these new nodes usually have only a few links at their arrivals. However, how to predict future links for these few-shot nodes in a dynamic network has not been well studied. Existing dynamic network representation learning methods were not specialized for few-shot scenarios and thus would lead to suboptimal performances. In this paper, we propose a novel model based on a meta-learning framework, dubbed as MetaDyGNN, for few-shot link prediction in dynamic networks. Specifically, we propose a meta-learner with hierarchical time interval-wise and node-wise adaptions to extract general knowledge behind this problem. We also design a simple and effective dynamic graph neural network (GNN) module to characterize the local structure of each node in meta-learning tasks. As a result, the learned general knowledge serves as model initializations, and can quickly adapt to new nodes with a fine-tuning process on only a few links. Experimental results show that our proposed MetaDyGNN significantly outperforms state-of-the-art methods on three publicly available datasets.

MAVE: A Product Dataset for Multi-source Attribute Value Extraction

Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product ranking, retrieval and recommendations. While in the real world, the attribute values of a product are usually incomplete and vary over time, which greatly hinders the practical applications. In this paper, we introduce MAVE, a new dataset to better facilitate research on product attribute value extraction. MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories. MAVE has four main and unique advantages: First, MAVE is the largest product attribute value extraction dataset by the number of attribute-value examples. Second, MAVE includes multi-source representations from the product, which captures the full product information with high attribute coverage. Third, MAVE represents a more diverse set of attributes and values relative to what previous datasets cover. Lastly, MAVE provides a very challenging zero-shot test set, as we empirically illustrate in the experiments. We further propose a novel approach that effectively extracts the attribute value from the multi-source product information. We conduct extensive experiments with several baselines and show that MAVE is an effective dataset for attribute value extraction task. It is also a very challenging task on zero-shot attribute extraction. Data is available at \url .

Interpretable Relation Learning on Heterogeneous Graphs

Relation learning, widely used in recommendation systems or relevant entity search over knowledge graphs, has attracted increasing attentions in recent years. Existing methods like network embedding and graph neural networks (GNNs), learn the node representations from neighbors and calculate the similarity score for relation prediction. Despite effective prediction performance, they lack explanations to the predicted results. We propose a novel interpretable relation learning model named IRL, which can not only predict whether relations exist between node pairs, but also make the inference more transparent and convincing. Specifically, we introduce a meta-path based path encoder to model sequential dependency between nodes through recurrent neural network. We also apply the self-supervised GNN on the extracted sub-graph to capture the graph structure by aggregating information from neighbors, which are fed into the meta-path encoder. In addition, we propose a meta-path walk pruning strategy for positive path generation and an adaptive negative sampling method for negative path generation to improve the quality of paths, which both consider the semantics of nodes in the heterogeneous graph. We conduct extensive experiments on two public heterogeneous graph data, AMiner and Delve, for different relation prediction tasks, which demonstrate significant improvements of our model over the existing embedding-based and sequential modeling-based methods.

Fast Semantic Matching via Flexible Contextualized Interaction

Deep pre-trained language models (e.g., BERT) lead to remarkable headway in many Natural Language Processing tasks. Their superior capacity in perceiving textual data is also witnessed in semantic matching tasks (e.g., question answering, web search). Particularly for matching a pair of query and text candidate, the current state-of-the-arts usually rely on the semantic representations produced by BERT, and compute relevance scores with various interaction (i.e., matching) methods. However, they may 1) miss fine-grained phrase-level interaction between the input query and candidate context or 2) lack a thoughtful consideration of both effectiveness and efficiency. Motivated by this, we propose \hyttInteractor, a BERT-based semantic matching model with a flexible contextualized interaction paradigm. It is capable of capturing fine-grained phrase-level information in the interaction, and thus is more effective to be applied for semantic matching tasks. Moreover, we further facilitate \hyttInteractor with a novel partial attention scheme, which significantly reduces the computational cost while maintaining the high effectiveness. We conduct comprehensive experimental evaluations on three datasets. The results show that \hyttInteractor achieves superior effectiveness and efficiency for semantic matching.

Assessing Algorithmic Biases for Musical Version Identification

Version identification (VI) systems now offer accurate and scalable solutions for detecting different renditions of a musical composition, allowing the use of these systems in industrial applications and throughout the wider music ecosystem. Such use can have an important impact on various stakeholders regarding recognition and financial benefits, including how royalties are circulated for digital rights management. In this work, we take a step toward acknowledging this impact and consider VI systems as socio-technical systems rather than isolated technologies. We propose a framework for quantifying performance disparities across 5 systems and 6 relevant side attributes: gender, popularity, country, language, year, and prevalence. We also consider 3 main stakeholders for this particular information retrieval use case: the performing artists of query tracks, those of reference (original) tracks, and the composers. By categorizing the recordings in our dataset using such attributes and stakeholders, we analyze whether the considered VI systems show any implicit biases. We find signs of disparities in identification performance for most of the groups we include in our analyses. We also find that learning- and rule-based systems behave differently for some attributes, which suggests an additional dimension to consider along with accuracy and scalability when evaluating VI systems. Lastly, we share our dataset to encourage VI researchers to take these aspects into account while building new systems.

Directed Network Embedding with Virtual Negative Edges

The directed network embedding problem is to represent the nodes in a given directed network as embeddings (i.e., low-dimensional vectors) that preserve the asymmetric relationships between nodes. While a number of approaches have been developed for this problem, we point out that existing approaches commonly face difficulties in accurately preserving asymmetric proximities between nodes in a sparse network containing a large number of low out- and in-degree nodes. In this paper, we focus on addressing this intrinsic difficulty caused by the lack of information. We first introduce the concept of virtual negative edges (VNEs), which represent latent negative relationships between nodes. Based on the concept, we propose a novel DIrected NE approach with VIrtual Negative Edges, named as DIVINE. DIVINE carefully decides the number and locations of VNEs to be added to the input network. Once VNEs are added, DIVINE learns embeddings by exploiting both the signs and directions of edges. Our experiments on four real-world directed networks demonstrate that adding VNEs alleviates the lack of information about low-degree nodes, thereby enabling DIVINE to yield high-quality embeddings that accurately capture asymmetric proximities between nodes. Specifically, the embeddings obtained by DIVINE lead to up to 10.16% more accurate link prediction, compared to those obtained by state-of-the-art competitors.

Bringing Your Own View: Graph Contrastive Learning without Prefabricated Data Augmentations

Self-supervision is recently surging at its new frontier of graph learning. It facilitates graph representations beneficial to downstream tasks; but its success could hinge on domain knowledge for handcraft or the often expensive trials and errors. Even its state-of-the-art representative, graph contrastive learning (GraphCL), is not completely free of those needs as GraphCL uses a prefabricated prior reflected by the ad-hoc manual selection of graph data augmentations. Our work aims at advancing GraphCL by answering the following questions: How to represent the space of graph augmented views? What principle can be relied upon to learn a prior in that space? And what framework can be constructed to learn the prior in tandem with contrastive learning? Accordingly, we have extended the prefabricated discrete prior in the augmentation set, to a learnable continuous prior in the parameter space of graph generators, assuming that graph priors per se, similar to the concept of image manifolds, can be learned by data generation. Furthermore, to form contrastive views without collapsing to trivial solutions due to the prior learnability, we have leveraged both principles of information minimization (InfoMin) and information bottleneck (InfoBN) to regularize the learned priors. Eventually, contrastive learning, InfoMin, and InfoBN are incorporated organically into one framework of bi-level optimization. Our principled and automated approach has proven to be competitive against the state-of-the-art graph self-supervision methods, including GraphCL, on benchmarks of small graphs; and shown even better generalizability on large-scale graphs, without resorting to human expertise or downstream validation. Our code is publicly released at

Sentiment Analysis of Fashion Related Posts in Social Media

The role of social media in fashion industry has been blooming as the years have continued on. In this work, we investigate sentiment analysis for fashion related posts in social media platforms. There are two main challenges of this task. On the first place, information of different modalities must be jointly considered to make the final predictions. On the second place, some unique fashion related attributes should be taken into account. While most existing works focus on traditional multimodal sentiment analysis, they always fail to exploit the fashion related attributes in this task. We propose a novel framework that jointly leverages the image vision, post text, as well as fashion attribute modality to determine the sentiment category. One characteristic of our model is that it extracts fashion attributes and integrates them with the image vision information for effective representation. Furthermore, it exploits the mutual relationship between the fashion attributes and the post texts via a mutual attention mechanism. Since there is no existing dataset for this task, we prepare a large-scale sentiment analysis dataset of over 12k fashion related social media posts. Extensive experiments are conducted to demonstrate the effectiveness of our model.

Community Trend Prediction on Heterogeneous Graph in E-commerce

In online shopping, ever-changing fashion trends make merchants need to prepare more differentiated products to meet the diversified demands, and e-commerce platforms need to capture the market trend with a prophetic vision. For the trend prediction, the attribute tags, as the essential description of items, can genuinely reflect the decision basis of consumers. However, few existing works explore the attribute trend in the specific community for e-commerce. In this paper, we focus on the community trend prediction on the item attribute and propose a unified framework that combines the dynamic evolution of two graph patterns to predict the attribute trend in a specific community. Specifically, we first design a community-attribute bipartite graph at each time step to learn the collaboration of different communities. Next, we transform the bipartite graph into a hypergraph to exploit the associations of different attribute tags in one community. Lastly, we introduce a dynamic evolution component based on the recurrent neural networks to capture the fashion trend of attribute tags. Extensive experiments on three real-world datasets in a large e-commerce platform show the superiority of the proposed approach over several strong alternatives and demonstrate the ability to discover the community trend in advance.

Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor search (NNS) in vector space. Therefore, we present RepCONC, a novel retrieval model that learns discrete Representations via CONstrained Clustering. RepCONC jointly trains dual-encoders and the Product Quantization (PQ) method to learn discrete document representations and enables fast approximate NNS with compact indexes. It models quantization as a constrained clustering process, which requires the document embeddings to be uniformly clustered around the quantization centroids and supports end-to-end optimization of the quantization method and dual-encoders. We theoretically demonstrate the importance of the uniform clustering constraint in RepCONC and derive an efficient approximate solution for constrained clustering by reducing it to an instance of the optimal transport problem. Besides constrained clustering, RepCONC further adopts a vector-based inverted file system (IVF) to support highly efficient vector search on CPUs. Extensive experiments on two popular ad-hoc retrieval benchmarks show that RepCONC achieves better ranking effectiveness than competitive vector quantization baselines under different compression ratio settings. It also substantially outperforms a wide range of existing retrieval models in terms of retrieval effectiveness, memory efficiency, and time efficiency.

Geometric Inductive Matrix Completion: A Hyperbolic Approach with Unified Message Passing

Collaborative filtering is a central task in a broad range of recommender systems. As traditional methods train latent variables for user/item individuals under a transductive setting, it requires re-training for out-of-sample inferences. Inductive matrix completion (IMC) solves this problem by learning transformation functions upon engineered features, but it sacrifices model expressiveness and highly depends on feature qualities. In this paper, we propose Geometric Inductive Matrix Completion (GIMC) by introducing hyperbolic geometry and a unified message passing scheme into this generic task. The proposed method is the earliest attempt utilizing capacious hyperbolic space to enhance the capacity of IMC. It is the first work defining continuous explicit feedback prediction within non-Euclidean space by introducing hyperbolic regression for vertex interactions. This is also the first to provide comprehensive evidence that edge semantics can significantly improve recommendations, which is ignored by previous works. The proposed method outperforms the state-of-the-art algorithms with less than 1% parameters compared to its transductive counterparts. Extensive analysis and ablation studies are conducted to reveal the design considerations and practicability for a positive impact to the research community.

ESC-GAN: Extending Spatial Coverage of Physical Sensors

Scientific discoveries and studies about our physical world have long benefited from large-scale and planetary sensing, from weather forecasting to wildfire monitoring. However, the limited deployment of sensors in the environment due to cost or physical access constraints has lagged behind our ever-growing need for increased data coverage and higher resolution, impeding timely and precise monitoring and understanding of the environment. Therefore, we seek to extend the spatial coverage of analysis based on existing sensory data, that is, to "generate" data for locations where no historical data exists. This problem is fundamentally different and more challenging than the traditional spatio-temporal imputation that assumes data for any particular location are only partially missing across time. Inspired by the success of Generative Adversarial Network (GAN) in imputation, we propose a novel ESC-GAN. We observe that there are local patterns in nearby locations, as well as trends in a global manner (e.g., temperature drops as altitude increases regardless of the location). As local patterns may exhibit at different scales (from meters to kilometers), we employ a multi-branch generator to aggregate information of different granularity. More specifically, each branch in the generator contains 1) randomly masked 3D partial convolutions at different resolutions to capture the local patterns and 2) global attention modules for global similarity. Next, we adversarially train a 3D convolution-based discriminator to distinguish the generator's output from the ground truth. Extensive experiments on three geo-sensor datasets demonstrate that ESC-GAN outperforms state-of-the-art methods on extending spatial coverage and also achieves the best results on a traditional spatio-temporal imputation task.

MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information

We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only and without any annotated training document provided. Most existing classifiers leverage textual information in each document. However, in many domains, documents are accompanied by various types of metadata (e.g., authors, venue, and year of a research paper). These metadata and their combinations may serve as strong category indicators in addition to textual contents. In this paper, we explore the potential of using metadata to help weakly supervised text classification. To be specific, we model the relationships between documents and metadata via a heterogeneous information network. To effectively capture higher-order structures in the network, we use motifs to describe metadata combinations. We propose a novel framework, named MotifClass , which (1) selects category-indicative motif instances, (2) retrieves and generates pseudo-labeled training samples based on category names and indicative motif instances, and (3) trains a text classifier using the pseudo training data. Extensive experiments on real-world datasets demonstrate the superior performance of MotifClass to existing weakly supervised text classification approaches. Further analysis shows the benefit of considering higher-order metadata information in our framework.

Leaving No One Behind: A Multi-Scenario Multi-Task Meta Learning Approach for Advertiser Modeling

Advertisers play an essential role in many e-commerce platforms like Taobao and Amazon. Fulfilling their marketing needs and supporting their business growth is critical to the long-term prosperity of platform economies. However, compared with extensive studies on user modeling such as click-through rate predictions, much less attention has been drawn to advertisers, especially in terms of understanding their diverse demands and performance. Different from user modeling, advertiser modeling generally involves many kinds of tasks (e.g. predictions of advertisers' expenditure, active-rate, or total impressions of promoted products). In addition, major e-commerce platforms often provide multiple marketing scenarios (e.g. Sponsored Search, Display Ads, Live Streaming Ads) while advertisers' behavior tend to be dispersed among many of them. This raises the necessity of multi-task and multi-scenario consideration in comprehensive advertiser modeling, which faces the following challenges: First, one model per scenario or per task simply doesn't scale; Second, it is particularly hard to model new or minor scenarios with limited data samples; Third, inter-scenario correlations are complicated, and may vary given different tasks.

To tackle these challenges, we propose a multi-scenario multi-task meta learning approach (M2M) which simultaneously predicts multiple tasks in multiple advertising scenarios. Specifically, we introduce a novel meta unit that incorporates rich scenario knowledge to learn explicit inter-scenario correlations and can easily scale to new scenarios. Furthermore, we present a meta attention module to capture diverse inter-scenario correlations given different tasks, and a meta tower module to enhance the capability of capturing the representation of scenario-specific features. Compelling results from both offline evaluation and online A/B tests demonstrate the superiority of M2M over state-of-the-art methods.

Learning Concept Prerequisite Relations from Educational Data via Multi-Head Attention Variational Graph Auto-Encoders

Recently, the topic of learning concept prerequisite relations has gained the attention of many researchers, which is crucial in the learning process for a learner to decide an optimal study order. However, the existing work still ignores three key factors. (1) People's cognitive differences could make a difference for annotating the prerequisite relation between resources (e.g., courses, textbooks) or concepts (e.g., binary tree). (2) The current vertex (resources or concepts) can be affected by the feature of the neighbor vertex in the resource or concept graph. (3) The feature information of the resource graph may affect the concept graph. To integrate the above factors, we propose an end-to-end graph network-based model called Multi-Head Attention Variational Graph Auto-Encoders (MHAVGAE ) to learn the prerequisite relation between concepts via a resource-concept graph. To address the first two problems, we introduce the multi-head attention mechanism to operate and compute the hidden representations of each vertex over the resource-concept graph. Then, we design a gated fusion mechanism to integrate the feature information of the resource and concept graphs to enrich concept content features. Finally, we conduct numerous experiments to demonstrate the effectiveness of the MHAVGAE across multiple widely used metrics compared with the state-of-the-art methods. The experimental results show that the performance of the MHAVGAE almost outperforms all the baseline methods.

A GNN-based Multi-task Learning Framework for Personalized Video Search

Watching online videos has become more and more popular and users tend to watch videos based on their personal tastes and preferences. Providing a customized ranking list to maximize the user's satisfaction has become increasingly important for online video platforms. Existing personalized search methods (PSMs) train their models with user feedback information (e.g. clicks). However, we identified that such feedback signals may indicate attractiveness but not necessarily indicate relevance in video search. Besides, the click data and user historical information are usually too sparse to train a good PSM, which is different from the conventional Web search containing users' rich historical information. To address these concerns, in this paper we propose a multi-task graph neural network architecture for personalized video search (MGNN-PVS) that can jointly model user's click behaviour and the relevance between queries and videos. To relieve the sparsity problem and learn better representation for users, queries and videos, we develop an efficient and novel GNN architecture based on neighborhood sampling and hierarchical aggregation strategy by leveraging their different hops of neighbors in the user-query and query-document click graph. Extensive experiments on a major commercial video search engine show that our model significantly outperforms state-of-the-art PSMs, which illustrates the effectiveness of our proposed framework.

GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning

Nearest Neighbor Search (NNS) has recently drawn a rapid growth of interest because of its core role in high-dimensional vector data management in data science and AI applications. The interest is fueled by the success of neural embedding, where deep learning models transform unstructured data into semantically correlated feature vectors for data analysis, e.g., recommending popular items. Among several categories of methods for fast NNS, graph-based approximate nearest neighbor search algorithms have led to the best-in-class search performance on a wide range of real-world datasets. While prior works improve graph-based NNS search efficiency mainly through exploiting the structure of the graph with sophisticated heuristic rules, in this work, we show that the frequency distributions of edge visits for graph-based NNS can be highly skewed. This finding leads to the study of pruning unnecessary edges to avoid redundant computation during graph traversal by utilizing the query distribution, an important yet under-explored aspect of graph-based NNS. In particular, we formulate graph pruning as a discrete optimization problem, and introduce a graph optimization algorithm GraSP that improves the search efficiency of similarity graphs by learning to prune redundant edges. GraSP enhances an existing similarity graph with a probabilistic model. It then performs a novel subgraph sampling and iterative refinement optimization to explicitly maximize search efficiency when removing a subset of edges in expectation over a graph for a large set of training queries. The evaluation shows that GraSP consistently improves the search efficiency on real-world datasets, providing up to 2.24X faster search speed than state-of-the-art methods without losing accuracy.

CMT-Net: A Mutual Transition Aware Framework for Taxicab Pick-ups and Drop-offs Co-Prediction

With increasing population of modern cities, accurate estimation of regional passenger demands is critical to online taxicab services as such platforms aim at a reformation of taxicab scheduling for a more efficient order dispatching. Though great efforts have been made on passenger demand predictions, existing works still have the following shortcomings: i) they mostly performed based on uniform grid partition, which results in the imbalance of demand volumes among regions and even non-vehicle regions in such partition, ii) none of previous demand forecasting efforts have highlighted the important mutual influences between pick-ups and drop-offs, which are of great significance for taxicab scheduling. To this end, we first devise a multi-kernel based clustering to achieve a taxicab-behavior and geographic-aware sub-region partition, hence a more balanced and compact regional division is obtained. Subsequently, we emphasize the essential factors with regard to mutual transition quantification in taxicab predictions, then propose a Transfer-LSTM and an Origin-Destination-based transition matrix to respectively capture the drop-to-pick and pick-to-drop spatiotemporal transition patterns. Hence, a novel mutual-transition-aware co-prediction framework is devised by capturing complex spatiotemporal interactions between pick-ups and drop-offs. Extensive experiments on two real-world taxicab datasets demonstrate our co-prediction framework is superior to state-of-the-art methods, thus providing novel perspectives to urban human mobility understanding and transition-based taxicab scheduling.

PipAttack: Poisoning Federated Recommender Systems for Manipulating Item Promotion

Due to the growing privacy concerns, decentralization emerges rapidly in personalized services, especially recommendation. Also, recent studies have shown that centralized models are vulnerable to poisoning attacks, compromising their integrity. In the context of recommender systems, a typical goal of such poisoning attacks is to promote the adversary's target items by interfering with the training dataset and/or process. Hence, a common practice is to subsume recommender systems under the decentralized federated learning paradigm, which enables all user devices to collaboratively learn a global recommender while retaining all the sensitive data locally. Without exposing the full knowledge of the recommender and entire dataset to end-users, such federated recommendation is widely regarded 'safe' towards poisoning attacks. In this paper, we present a systematic approach to backdooring federated recommender systems for targeted item promotion. The core tactic is to take advantage of the inherent popularity bias that commonly exists in data-driven recommenders. As popular items are more likely to appear in the recommendation list, our innovatively designed attack model enables the target item to have the characteristics of popular items in the embedding space. Then, by uploading carefully crafted gradients via a small number of malicious users during the model update, we can effectively increase the exposure rate of a target (unpopular) item in the resulted federated recommender. Evaluations on two real-world datasets show that 1) our attack model significantly boosts the exposure rate of the target item in a stealthy way, without harming the accuracy of the poisoned recommender; and 2) existing defenses are not effective enough, highlighting the need for new defenses against our local model poisoning attacks to federated recommender systems.

A Counterfactual Modeling Framework for Churn Prediction

Accurate churn prediction for retaining users is keenly important for online services because it determines their survival and prosperity. Recent research has specified social influence to be one of the most important reasons for user churn, and thereby many works start to model its effects on user churn to improve the prediction performance. However, existing works only use the data's correlational information while neglecting the problem's causal nature. Specifically, the fact that a user's churn is correlated with some social factors does not mean he/she is actually influenced by his/her friends, which results in inaccurate and unexplainable predictions of the existing methods. To bridge this gap, we develop a counterfactual modeling framework for churn prediction, which can effectively capture the causal information of social influence for accurate and explainable churn predictions. Specifically, we first propose a backbone framework that uses two separate embeddings to model users' endogenous churn intentions and the exogenous social influence. Then, we propose a counterfactual data augmentation module to introduce the causal information to the model by providing partially labeled counterfactual data. Finally, we design a three-headed counterfactual prediction framework to guide the model to learn causal information to facilitate churn prediction. Extensive experiments on two large-scale datasets with different types of social relations show our model's superior prediction performance compared with the state-of-the-art baselines. We further conduct an in-depth analysis of the prediction results demonstrating our proposed method's ability to capture causal information of social influence and give explainable churn predictions, which provide insights into designing better user retention strategies.

Towards Fair Classifiers Without Sensitive Attributes: Exploring Biases in Related Features

Despite the rapid development and great success of machine learning models, extensive studies have exposed their disadvantage of inheriting latent discrimination and societal bias from the training data. This phenomenon hinders their adoption on high-stake applications. Thus, many efforts have been taken for developing fair machine learning models. Most of them require that sensitive attributes are available during training to learn fair models. However, in many real-world applications, it is usually infeasible to obtain the sensitive attributes due to privacy or legal issues, which challenges existing fair-ensuring strategies. Though the sensitive attribute of each data sample is unknown, we observe that there are usually some non-sensitive features in the training data that are highly correlated with sensitive attributes, which can be used to alleviate the bias. Therefore, in this paper, we study a novel problem of exploring features that are highly correlated with sensitive attributes for learning fair and accurate classifiers. We theoretically show that by minimizing the correlation between these related features and model prediction, we can learn a fair classifier. Based on this motivation, we propose a novel framework which simultaneously uses these related features for accurate prediction and enforces fairness. In addition, the model can dynamically adjust the regularization weight of each related feature to balance its contribution on model classification and fairness. Experimental results on real-world datasets demonstrate the effectiveness of the proposed model for learning fair models with high classification accuracy.

ST-GSP: Spatial-Temporal Global Semantic Representation Learning for Urban Flow Prediction

Urban flow prediction plays a crucial role in public transportation management and smart city construction. Although previous studies have achieved success in integrating spatial-temporal information to some extents, those models lack thoughtful consideration on global information and positional information in the temporal dimension, which can be summarized by three aspects: a) The models do not consider the relative position information of time axis, resulting in that the position features of flow maps are not effectively learned. b) They overlook the correlation among temporal dependencies of different scales, which lead to inaccurate global information representation. c) Those models only predict the flow map at the end of time sequence other than more flow maps before that, which results in neglecting parts of temporal features in the learning process. To solve the problems, we propose a novel model, Spatial-Temporal Global Semantic representation learning for urban flow Prediction (ST-GSP) in this paper. Specifically, for a), we design a semantic flow encoder that extracts relative positional information of time. Besides, the encoder captures the spatial dependencies and external factors of urban flow at each time interval. For b), we model the correlation among temporal dependencies of different scales simultaneously by using the multi-head self-attention mechanism, which can learn the global temporal dependencies. For c), inspired by the idea of self-supervised learning, we mask an urban flow map on the time sequence and predict it to pre-train a deep bidirectional learning model to catch the representation from its context. We conduct extensive experiments on two types of urban flows in Beijing and New York City to show that the proposed method outperforms state-of-the-art methods.

Multi-Sparse-Domain Collaborative Recommendation via Enhanced Comprehensive Aspect Preference Learning

Cross-domain recommendation (CDR) has been attracting increasing attention of researchers for its ability to alleviate the data sparsity problem in recommender systems. However, the existing single-target or dual-target CDR methods often suffer from two drawbacks, the assumption of at least one rich domain and the heavy dependence on domain-invariant preference, which are impractical in real world where sparsity is ubiquitous and might degrade the user preference learning. To overcome these issues, we propose a Multi-Sparse-Domain Collaborative Recommendation (MSDCR) model for multi-target cross-domain recommendation. Unlike traditional CDR methods, MSDCR treats the multiple relevant domains as all sparse and can simultaneously improve the recommendation performance in each domain. We propose a Multi-Domain Separation Network (MDSN) and a Gated Aspect Preference Enhancement (GAPE) module for MSDCR to enhance a user's domain-specific aspect preferences in a domain by transferring the complementary aspect preferences in other domains, during which the uniqueness of the domain-specific preference can be preserved through the adversarial training offered by MDSN and the complementarity can be adaptively determined by GAPE. Meanwhile, we propose a Multi-Domain Adaptation Network (MDAN) for MSDCR to capture a user's domain-invariant aspect preference. With the integration of the enhanced domain-specific aspect preference and the domain-invariant aspect preference, MSDCR can reach a comprehensive understanding of a user's preference in each sparse domain. At last, the extensive experiments conducted on real datasets demonstrate the remarkable superiority of MSDCR over the state-of-the-art single-domain recommendation models and CDR models.

Joint Learning of E-commerce Search and Recommendation with a Unified Graph Neural Network

Click-through rate (CTR) prediction plays an important role in search and recommendation, which are the two most prominent scenarios in e-commerce. A number of models have been proposed to predict CTR by mining user behaviors, especially users' interactions with items. But the sparseness of user behaviors is an obstacle to the improvement of CTR prediction. Previous works only focused on one scenario, either search or recommendation. However, on a practical e-commerce platform, search and recommendation share the same set of users and items, which means joint learning of both scenarios may alleviate the sparseness of user behaviors. In this paper, we propose a novel Search and Recommendation Joint Graph (SRJGraph) neural network to jointly learn a better CTR model for both scenarios. A key question of joint learning is how to effectively share information across search and recommendation, in spite of their differences. A notable difference between search and recommendation is that there are explicit queries in search, whereas no query exists in recommendation. We address this difference by constructing a unified graph to share representations of users and items across search and recommendation, as well as represent user-item interactions uniformly. In this graph, users and items are heterogeneous nodes, and search queries are incorporated into the user-item interaction edges as attributes. For recommendation where no query exists, a special attribute is attached on user-item interaction edges. We further propose an intention and upstream-aware aggregator to explore useful information from high-order connections among users and items. We conduct extensive experiments on a large-scale dataset collected from, the largest e-commerce platform in China. Empirical results show that SRJGraph significantly outperforms the state-of-the-art approaches of CTR prediction in both search and recommendation tasks.

AngHNE: Representation Learning for Bipartite Heterogeneous Networks with Angular Loss

Real-world networks often show heterogeneity. A frequently encountered type is the bipartite heterogeneous structure, in which two types of nodes and three types of edges exist. Recently, much attention has been devoted to representation learning in these networks. One of the essential differences between heterogeneous and homogeneous learning is that the former structure requires methods to possess awareness to node and edge types. Most existing methods, including metapath-based, proximity-based and graph neural network-based, adopt inner product or vector norms to evaluate the similarities in embedding space. However, these measures either violates the triangle inequality, or show severe sensitivity to scaling transformation. The limitations often hinder the applicability to real-world problems. In view of this, in this paper, we propose a novel angle-based method for bipartite heterogeneous network representation. Specifically, we first construct training sets by generating quintuples, which contain both positive and negative samples from two different parts of networks. Then we analyze the quintuple-based problem from a geometry perspective, and transform the comparisons between preferred and non-preferred samples to the comparisons of angles. In addition, we utilize convolution modules to extract node features. A hinge loss, as the final objective, is proposed to relax the angular constraint for learning. Extensive experiments for two typical tasks show the efficacy of the proposed method, comparing with eight competitive methods.

Learning Transferable Node Representations for Attribute Extraction from Web Documents

Given a web page, extracting an object along with various attributes of interest (e.g. price, publisher, author, and genre for a book) can facilitate a variety of downstream applications such as large-scale knowledge base construction, e-commerce product search, and personalized recommendation. Prior approaches have either relied on computationally expensive visual feature engineering or required large amounts of training data to get to an acceptable precision. In this paper, we propose a novel method, LeArNing TransfErable node RepresentatioNs for Attribute Extraction (LANTERN), to tackle the problem. We model the problem as a tree node tagging task. The key insight is to learn a contextual representation for each node in the DOM tree where the context explicitly takes into account the tree structure of the neighborhood around the node. Experiments on the SWDE public dataset show that LANTERN outperforms the previous state-of-the-art (SOTA) by 1.44% (F1 score) with a dramatically simpler model architecture. Furthermore, we report that utilizing data from a different domain (for instance, using training data about web pages with cars to extract book objects) is surprisingly useful and helps beat the SOTA by a further 1.37%.

C²-CRS: Coarse-to-Fine Contrastive Learning for Conversational Recommender System

Conversational recommender systems (CRS) aim to recommend suitable items to users through natural language conversations. For developing effective CRSs, a major technical issue is how to accurately infer user preference from very limited conversation context. To address issue, a promising solution is to incorporate external data for enriching the context information. However, prior studies mainly focus on designing fusion models tailored for some specific type of external data, which is not general to model and utilize multi-type external data. To effectively leverage multi-type external data, we propose a novel coarse-to-fine contrastive learning framework to improve data semantic fusion for CRS. In our approach, we first extract and represent multi-grained semantic units from different data signals, and then align the associated multi-type semantic units in a coarse-to-fine way. To implement this framework, we design both coarse-grained and fine-grained procedures for modeling user preference, where the former focuses on more general, coarse-grained semantic fusion and the latter focuses on more specific, fine-grained semantic fusion. Such an approach can be extended to incorporate more kinds of external data. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach in both recommendation and conversation tasks.

Fighting Mainstream Bias in Recommender Systems via Local Fine Tuning

In collaborative filtering, the quality of recommendations critically relies on how easily a model can find similar users for a target user. Hence, a niche user who prefers items out of the mainstream may receive poor recommendations, while a mainstream user sharing interests with many others will likely receive recommendations of higher quality. In this work, we study this mainstream bias centering around three key thrusts. First, to distinguish mainstream and niche users, we explore four approaches based on outlier detection techniques to identify a mainstream score indicating the mainstream level for each user. Second, we empirically show that severe mainstream bias is produced by conventional recommendation models. Last, we explore both global and local methods to mitigate the bias. Concretely, we propose two global models: Distribution Calibration (DC) and Weighted Loss (WL) methods; and one local method: Local Fine Tuning (LFT) method. Extensive experiments show the effectiveness of the proposed methods to improve utility for niche users and also show that the proposed LFT can improve the utility for mainstream users at the same time.

Personalized Transfer of User Preferences for Cross-domain Recommendation

Cold-start problem is still a very challenging problem in recommender systems. Fortunately, the interactions of the cold-start users in the auxiliary source domain can help cold-start recommendations in the target domain. How to transfer user's preferences from the source domain to the target domain, is the key issue in Cross-domain Recommendation (CDR) which is a promising solution to deal with the cold-start problem. Most existing methods model a common preference bridge to transfer preferences for all users. Intuitively, since preferences vary from user to user, the preference bridges of different users should be different. Along this line, we propose a novel framework named Personalized Transfer of User Preferences for Cross-domain Recommendation (PTUPCDR). Specifically, a meta network fed with users' characteristic embeddings is learned to generate personalized bridge functions to achieve personalized transfer of preferences for each user. To learn the meta network stably, we employ a task-oriented optimization procedure. With the meta-generated personalized bridge function, the user's preference embedding in the source domain can be transformed into the target domain, and the transformed user preference embedding can be utilized as the initial embedding for the cold-start user in the target domain. Using large real-world datasets, we conduct extensive experiments to evaluate the effectiveness of PTUPCDR on both cold-start and warm-start stages. The code has been available at

DualDE: Dually Distilling Knowledge Graph Embedding for Faster and Cheaper Reasoning

Knowledge Graph Embedding (KGE) is a popular method for KG reasoning and training KGEs with higher dimension are usually preferred since they have better reasoning capability. However, high-dimensional KGEs pose huge challenges to storage and computing resources and are not suitable for resource-limited or time-constrained applications, for which faster and cheaper reasoning is necessary. To address this problem, we propose DualDE, a knowledge distillation method to build low-dimensional student KGE from pre-trained high-dimensional teacher KGE. DualDE considers the dual-influence between the teacher and the student. In DualDE, we propose a soft label evaluation mechanism to adaptively assign different soft label and hard label weights to different triples, and a two-stage distillation approach to improve the student's acceptance of the teacher. Our DualDE is general enough to be applied to various KGEs. Experimental results show that our method can successfully reduce the embedding parameters of a high-dimensional KGE by 7× - 15× and increase the inference speed by 2× - 6× while retaining a high performance. We also experimentally prove the effectiveness of our soft label evaluation mechanism and two-stage distillation approach via ablation study.

A Neighborhood-Attention Fine-grained Entity Typing for Knowledge Graph Completion

Knowledge graph (KG) entity typing focuses on inferring possible entity type instances, which is a significant subtask of knowledge graph completion (KGC). Existing entity typing methods usually exploit the entity representation to model the transmission between entities and their types, which cannot fully explore the fine-grained entity typing on identifying the semantic type of an entity. To address these issues, we propose Neighborhood-Attention Neural Fine-Grained Entity Typing (AttEt), which considers the neighborhood information of the entities from KGs to bridge entities and their types together. In this paper, AttEt first develops a type-specific attention mechanism to aggregate the neighborhood knowledge of the given entity with type-specific weights. These weights are beneficial to capture various characteristics for different types of the entity, and further imply the complex correlation among these fine-grained types. Then, AttEt adaptively integrates the aggregated neighbor-level representation with entity inherent embedding to calculate the matching score between the entity and its candidate type. Besides, many entities are sparse in their relations with other entities in KGs, which makes the entity typing task more challenging. To solve this problem, we present a smooth strategy on relation-sparsity entities to improve the robustness of the model. Extensive experiments on two real-world datasets (Freebase and YAGO) show that AttEt significantly outperforms state-of-the-art baselines in the HITS@1 by 2.11% on Freebase and by 8.42% on YAGO, respectively.

Improving Session Search by Modeling Multi-Granularity Historical Query Change

In session search, it's important to utilize historical interactions between users and the search engines to improve document retrieval. However, not all historical information contributes to document ranking. Users often express their preferences in the process of modifying the previous query, which can help us catch useful information in the historical interactions. Inspired by it, we propose to model historical query change to improve document ranking performance. Especially, we characterize multi-granularity query change between each pair of adjacent queries at both term level and semantic level. For term level query change, we calculate three types of term weights, including the retained term weights, added term weights and removed term weights. Then we perform term-based interaction between the candidate document and historical queries based on the term weights. For semantic level query change, we calculate an overall representation of user intent by integrating the representations of each historical query obtained by different types of term weights. Then we adopt representation-based matching between this representation and the candidate document. To improve the effect of query change modeling, we introduce query change classification as an auxiliary task. Experimental results on AOL and TianGong-ST search logs show that our model outperforms most existing models for session search.

SESSION: Doctoral Consortium

Improving Text Generation via Neural Discourse Planning

Recent Transformer-based approaches to NLG like GPT-2 can generate syntactically coherent original texts. However, these generated texts have serious flaws. One of them is a global discourse incoherence. We present an approach to estimate the quality of discourse structure. Empirical results confirm that the discourse structure of currently generated texts is inaccurate. We propose the research directions to plan it and fill in the text in its leaves using the pipeline consisting of two GPT-based generation models. The suggested approach is universal and can be applied to different languages.

Robust Graph Learning for Misbehavior Detection

Recent years have witnessed the thriving of online services like social media, e-commerce, and e-finance. Those services facilitate our daily lives while breeding malicious actors like fraudsters and spammers to promote misinformation, gain monetary rewards, or reap end users' privacy. Graph-based machine learning models have been playing a critical and irreplaceable role in modeling and detecting online misbehavior. With the observation that misbehaviors are different from massive regular behaviors, the graph models can leverage the relationship between data entities from a holistic view and reveal suspicious behaviors as anomalous nodes/edges/subgraphs on the graph. In this proposal, we investigate the graph-based misbehavior detection models from an adversarial perspective, considering the adversarial nature of malicious actors and real-world factors that impair graph models' robustness. We first introduce two published works enhancing the robustness of several graph-based misbehavior detectors using reinforcement learning. Then, we propose to explore: 1) the robustness of graph neural networks for misinformation detection on social media; and 2) the general robustness of graph neural networks towards unknown perturbations.

GNNs for Node Clustering in Signed and Directed Networks

With an increasing number of applications where data can be represented as graphs, graph neural networks are a useful tool to apply deep learning to graph data. In particular, node clustering is an important problem in network analysis. Signed and directed networks are important types of networks that are linked to many real-world problems; their asymmetry provides a challenge for many clustering methods.

We propose two graph neural network models for node clustering in signed networks and directed networks, respectively. The methods are end-to-end in combining embedding generation and clustering without an intermediate step. Experimental results on a synthetic signed stochastic block model, a polarized version of it, and real-world data at different scales, demonstrate that our proposed methods can achieve comparable or better results than state-of-the-art node clustering methods, for a wide range of noise and sparsity levels. The introduced models complement existing well-performing methods through the possibility of including exogenous information, in the form of node-level features or labels.

Towards Practical Robustness Evaluation and Robustness Enhancing

Deep neural networks (DNNs) have been widely applied on various machine learning tasks and have achieved significant performance across multiple domains. However, it well known that DNNs suffer from severe adversarial vulnerability. Thus it raises great concernswhen DNNs are adopted to safety-critical tasks. These concerns boost the area of adversarial machine learning, which mainly fo-cus on evaluating model robustness through adversarial attacks and gain reliable model performance through adversarial defenses.Among this board topic, my research work focus on a practical perspective. Specifically, there are three subtopics: (1) Enhancing robustness performance for adversarial learning from feature perspective. (2) Standardized and Reliable evaluation for black box attacks under different settings. (3) Building user-friendly adversarial learning tools to help evaluate model robustness. In this research statement, we will mainly focus on these three topics and we will take this opportunity to share our contribution to the relate problems.

From Uni-relational to Multi-relational Graph Neural Networks

Graph Neural Networks (GNNs), which extend deep neural networks to graph-structured data, have attracted increasing attention. They have been proven to be powerful for numerous graph related tasks that cover a variety of research areas including natural language processing, information retrieval and knowledge graph completion (KGC). GNNs are primary designed for simple homogeneous and uni-relational graphs. Due to its great success in handling the graph data, considerable studies have been developed to extend GNNs to process complex multi-relational graphs such as the knowledge graph. My research first focuses on learning effective representation of uni-relational graph to facilitate some downstream applications such as graph classification and query understanding, and show the great capacity of GNNs to advance these tasks. Although the GNNs have demonstrated its significant effectiveness on the uni-relational graph in a large range of applications, we surprisingly found it may not be as crucial as previously believed in the knowledge graph completion task. It suggests careful attention to more suitable GNNs designs for KGC task.

Trustworthy Machine Learning: Fairness and Robustness

In recent years, machine learning (ML) technologies have experienced swift developments and attracted extensive attention from both academia and industry. The applications of ML are extended to multiple domains, from computer vision, text processing, to recommendations, etc. However, recent studies have uncovered the untrustworthy side of ML applications. For example, ML algorithms could show human-like discrimination against certain individuals or groups, or make unreliable decisions in safety-critical scenarios, which implies the absence of fairness and robustness, respectively. Consequently, building trustworthy machine learning systems has become an urgent need. My research strives to help meet this demand. In particular, my research focuses on designing trustworthy ML models and spans across three main areas: (1) fairness in ML, where we aim to detect, eliminate bias and ensure fairness in various ML applications; (2) robustness in ML, where we seek to ensure the robustness of certain ML applications towards adversarial attacks; (3) specific applications of ML, where my research involves the development of ML-based natural language processing (NLP) models and recommendation systems.

Obtaining Robust Models from Imbalanced Data

The vulnerability of deep neural network (DNN) models has been verified by the existence of adversarial examples. By exploiting slight changes to input examples, the generated adversarial examples can easily cause well trained DNN models make wrong predictions. Many defense methods have been proposed to improve the robustness of DNN models against adversarial examples. Among them, adversarial training has been empirically proven to be one of the most effective methods. Almost all existing studies about adversarial training are focused on balanced datasets, where each class has an equal amount of training examples. However, as datasets collected in real-world applications cannot guarantee all contained classes are uniformly distributed, it would be much challenging to obtain robust models in those real applications where the available training datasets are imbalanced. As the initial effort to study this problem, we first investigate the different behaviors between adversarially trained models and naturally trained models using imbalanced training datasets and then explore possible solutions to facilitate adversarial training under imbalanced settings.

Fair Graph Representation Learning with Imbalanced and Biased Data

Graph-structured data is omnipresent in various fields, such as biology, chemistry, social media and transportation. Learning informative graph representations are crucial in effectively completing downstream graph-related tasks such as node/graph classification and link prediction. Graph Neural Networks (GNNs), due to their inclusiveness on handling graph-structured data and distinguished data-mining power inherited from deep learning, have achieved significant success in learning graph representations. Nonetheless, most existing GNNs are mainly designed with unrealistic data assumptions, such as the balanced and unbiased data distributions while abounding real-world networks exhibit skewed (i.e., long-tailed) node/graph class distributions and may also encode patterns of previous discriminatory decisions dominated by sensitive attributes. Even further, extensive research efforts have been invested in developing GNN architectures towards improving model utility while most of the time totally ignoring whether the obtained node/graph representations conceal any discriminatory bias, which could lead to prejudicial decisions as GNN-based machine learning models are increasingly being utilized in real-world applications. In light of the prevalence of the above two types of unfairness originated from quantity-imbalanced and discriminatory bias, my research expects to propose novel node/graph representation learning frameworks through constructing innovative GNN architectures and devising novel graph-mining algorithms to learn both fair and expressive node/graph representations that can enjoy a favorable fairness-utility tradeoff.

Doctoral Consortium of WSDM'22: Exploring the Bias of Adversarial Defenses

Deep neural networks (DNNs) have achieved extraordinary accomplishments on various machine learning tasks. However, the existence of adversarial attacks still raise great concerns when they are adopted to safety-critical tasks. As countermeasures to protect DNN models against adversarial attacks, there are various defense strategies proposed. However, we find that the robustness ("safety'') provided by the robust training algorithms usually result unequal performance either among classes or sub-populations across the whole data distribution. For example, the model can achieve extremely low accuracy / robustness on certain groups of data. As a result, the safety of the model is still under great threats. As a summary, our project is about to study the bias problems of robust trained neural networks from different perspectives, which aims to build eventually reliable and safe deep learning models. We propose to present our research works in the Doctoral Consortium in WSDM'22 and gain opportunities to share our contribution to the relate problems.

SESSION: Demonstrations

Quality Assurance of a German COVID-19 Question Answering Systems using Component-based Microbenchmarking

Question Answering (QA) has become an often used method to retrieve data as part of chatbots and other natural-language user interfaces. In particular, QA systems of official institutions have high expectations regarding the answers computed by the system, as the provided information might be critical. In this demonstration, we use the official COVID-19 QA system that was developed together with the German Federal government to provide German citizens access to data regarding incident values, number of deaths, etc. To ensure high quality, a component-based approach was used that enables exchanging data between QA components using RDF and validating the functionality of the QA system using SPARQL. Here, we will demonstrate how our solution enables developers of QA systems to use a descriptive approach to validate the quality of their implementation before the system's deployment and also within a live environment.

iLFQA: A Platform for Efficient and Accurate Long-Form Question Answering

We present an efficient and accurate long-form question-answering platform, dubbed iLFQA (i.e., short for intelligent Long-Form Question Answering). The purpose of iLFQA is to function as a platform which accepts unscripted questions and efficiently produces semantically meaningful, explanatory, and accurate long-form responses. iLFQA consists of a number of modules for zero-shot classification, text retrieval, and text generation to generate answers to questions based on an open-domain knowledge base. iLFQA is unique in the question answering space because it is an example of a deployable and efficient long-form question answering system. Question answering systems exist in many forms, but long-form question answering remains relatively unexplored, and to the best of our knowledge none of the existing long-form question answering systems are shown to be sufficiently efficient to be deployable. We have made the source code and implementation details of iLFQA available for the benefit of researchers and practitioners in this field. With this demonstration, we present iLFQA as an open-domain, deployable, and accurate open-source long-form question answering platform.

Web Search via an Efficient and Effective Brain-Machine Interface

While search technologies have evolved to be robust and ubiquitous, the fundamental interaction paradigm has remained relatively stable for decades. With the maturity of the Brain-Machine Interface(BMI), we build an efficient and effective communication system between human beings and search engines based on electroencephalogram (EEG) signals, called Brain Machine Search Interface (BMSI)system. The BMSI system provides functions including query reformulation and search result interaction. In our system, users can perform search tasks without having to use the mouse and keyboard. Therefore, it is useful for application scenarios in which hand-based interactions are infeasible, e.g, for users with severe neuromuscular disorders. Besides, based on brain signals decoding, our system can provide abundant and valuable user-side context information (e.g., real-time satisfaction feedback, extensive context information, and a clearer description of information needs) to the search engine, which is hard to capture in the previous paradigm. In our implementation, the system can decode user satisfaction from brain signals in real-time during the interaction process and re-rank the search results list based on user satisfaction feedback.The demo video is available at

Aligning the Research and Practice of Building Search Applications: Elasticsearch and Pyserini

We demonstrate, via competitive bag-of-words first-stage retrieval baselines for the MS MARCO document ranking task, seamless replicability and interoperability between Elasticsearch and the Pyserini IR toolkit, which are both built on the open-source Lucene search library. This integration highlights the benefits of recent efforts to promote the use of Lucene in information retrieval research to better align the research and practice of building search applications. Closer alignment between academia and industry is mutually beneficial: Academic researchers gain a smoother path to real-world impact because their contributions can be more easily deployed in production applications. Industry practitioners gain an easy way to benchmark their innovations in a rigorous and vendor-neutral manner by exploiting evaluation resources and infrastructure built by the academic community. This two-way exchange between academia and industry allows both parties to "have their cakes and eat them too".

Building Multi-turn Query Interpreters for E-commercial Chatbots with Sparse-to-dense Attentive Modeling

Predicting query intents is crucial for understanding user demands in chatbots. In real-world applications, accurate query intent classification can be highly challenging as human-machine interactions are often conducted in multiple turns, which requires the models to capture related information from the entire contexts. In addition, query intents tend to be fine-grained (up to hundreds of classes), containing lots of casual chats without clear intents. Hence, it is difficult for standard transformer-based models to capture complicated language characteristics of dialogues to support these applications. In this demo, we present AliMeTerp, a multi-turn query interpretation system, which can be seamlessly integrated into e-commercial chatbots in order to generate appropriate responses. Specifically, in AliMeTerp, we introduce SAM-BERT, a pre-trained language model for fine-grained query intent understanding, based on Sparse-to-dense Attentive Modeling. For model pre-training, a stack of Sparse-to-dense Attentive Encoders are employed to model the complicated dialogue structures from different levels. We further design Hierarchical Multi-grained Classification tasks for model fine-tuning. Experiments show SAM-BERT consistently outperforms strong baselines over multiple multi-turn chatbot datasets. We further show how AliMeTerp is deployed in real-world e-commercial chatbots to support real-time customer service.

Automating ETL and Mining of Ethereum Blockchain Network

The popularity of blockchain technology led to the development of many web platforms with different functionalities. Ethereum, a decentralized, open-source blockchain featuring smart contracts, introduces an interesting ecosystem of human users and autonomous agents (the contracts). It is the most actively used blockchain platform, hosting ether, the second largest cryptocurrency by market capitalization, as its native store of value. The Ethereum blockchain contains a vast amount of user-to-user, user-to-contract, contract-to -user, and contract-to-contract interactions that can be modeled as complex networks. To mine these interactions as graphs through a preferred analytics toolbox, an end-user has to extract, transform, and load (ETL) the required data into the desired network format. However, it is costly and time-consuming to manage the ETL pipeline for the massive and complex blockchain data. To support research in this domain, we develop an end-to-end, automated tool - EtherNet, which performs ETL tasks from a single source of truth (Google BigQuery), and provides graph equivalent representations for visualization and mining on the entire Ethereum blockchain network.

a2RegInf: An Interactive System for Maximizing Influence within Arbitrary Number of Arbitrary Shaped Query Regions

Recently, aside with the prevalent usage of location-based social network, location-aware influence maximization (laim) problem has received plenty of attention in viral marketing. It aims to find a set of seed users such that information propagated from them can reach the largest number of users within particular geographical regions. However, existing solutions to laim can only work on single simple query region, e.g., a rectangle, instead of complex ones. Besides, there is no ready-to-use system for users to address laim visually. In this work, we present a pair of solutions towards location-aware influence maximization problem. Both can work on queries with arbitrary number of regions and arbitrary shapes. More importantly, we implement a web-based system, namely a2RegInf, which enables viral marketers to address laim visually, with native GPU support. To the best of our knowledge, we are the first to provide a ready-to-use system for answering the problem over web-based interface that supports arbitrary number of arbitrary shaped query regions.

Node Co-occurrence based Graph Neural Networks for Knowledge Graph Link Prediction

We introduce a novel embedding model, named NoGE, which aims to integrate co-occurrence among entities and relations into graph neural networks to improve knowledge graph completion (i.e., link prediction). Given a knowledge graph, NoGE constructs a single graph considering entities and relations as individual nodes. NoGE then computes weights for edges among nodes based on the co-occurrence of entities and relations. Next, NoGE proposes Dual Quaternion Graph Neural Networks (DualQGNN) and utilizes DualQGNN to update vector representations for entity and relation nodes. NoGE then adopts a score function to produce the triple scores. Comprehensive experimental results show that NoGE obtains state-of-the-art results on three new and difficult benchmark datasets CoDEx for knowledge graph completion.

PrivacyCheck v3: Empowering Users with Higher-Level Understanding of Privacy Policies

Online privacy policies are lengthy and hard to read, yet are profoundly important as they communicate the practices of an organization pertaining to user data privacy. Privacy Enhancing Technologies, or PETs, seek to inform users by summarizing these privacy policies. Efforts in the research and development of such PETs, however, have largely been limited to tools that recap the policy or visualize it. We present the next generation of our research and publicly available tool, PrivacyCheck v3, that utilizes machine learning to inform and empower users with respect to privacy policies. PrivacyCheck v3 adds capabilities that are commonly absent from similar PETs on the web. In particular, it adds the ability to (1) find the competitors of an organization with Alexa traffic analysis and compare policies across them, (2) follow privacy policies to which the user has agreed and notify the user when policies change, (3) track policies over time and report how often policies change and their trends, (4) automatically find privacy policies in domains, and (5) provide a bird's-eye view of privacy policies. The new features of PrivacyCheck not only inform users about details of privacy policies, but also empower them to understand privacy policies at a higher level, make informed decisions, and even select competitors with better privacy policies.

RGRecSys: A Toolkit for Robustness Evaluation of Recommender Systems

Robust machine learning is an increasingly important topic that focuses on developing models resilient to various forms of imperfect data. Due to the pervasiveness of recommender systems in online technologies, researchers have carried out several robustness studies focusing on data sparsity and profile injection attacks. Instead, we propose a more holistic view of robustness for recommender systems that encompasses multiple dimensions - robustness with respect to sub-populations, transformations, distributional disparity, attack, and data sparsity. While there are several libraries that allow users to compare different recommender system models, there is no software library for comprehensive robustness evaluation of recommender system models under different scenarios. As our main contribution, we present a robustness evaluation toolkit, Robustness Gym for RecSys (RGRecSys), that allows us to quickly and uniformly evaluate the robustness of recommender system models.

Fair-SRS: A Fair Session-based Recommendation System

This paper demonstrates Fair-SRS, a Fair Session-based Recommendation System that predicts user's next click based on their historical and current sessions. Fair-SRS provides personalized and diversified recommendations in two main steps: (1) forming user's session graph embeddings based on their long- and short-term interests, and (2) computing user's level of interest in diversity based on their recently-clicked items' similarity. In real-world scenarios, users tend to interact with more or fewer contents at different times, and providers expect to receive more exposure for their items. To achieve the objectives of both sides, the proposed Fair-SRS optimizes recommendations by making a trade-off between accuracy and personalized diversity.

Crowd_Frame: A Simple and Complete Framework to Deploy Complex Crowdsourcing Tasks Off-the-shelf

Due to their relatively low cost and ability to scale, crowdsourcing based approaches are widely used to collect a large amount of human annotated data. To this aim, multiple crowdsourcing platforms exist, where requesters can upload tasks and workers can carry them out and obtain payment in return. Such platforms share a task design and deploy workflow that is often counter-intuitive and cumbersome. To address this issue, we propose Crowd\_Frame, a simple and complete framework which allows to develop and deploy diverse types of complex crowdsourcing tasks in an easy and customizable way. We show the abilities of the proposed framework and we make it available to researchers and practitioners.

An Interactive Knowledge Graph Based Platform for COVID-19 Clinical Research

Since the first identified case of COVID-19 in December 2019, a plethora of pharmaceuticals and therapeutics have been tested for COVID-19 treatment. While medical advancements and breakthroughs are well underway, the sheer number of studies, treatments, and associated reports makes it extremely challenging to keep track of the rapidly growing COVID-19 research landscape. While existing scientific literature search systems provide basic document retrieval, they fundamentally lack the ability to explore data, and in addition, do not help develop a deeper understanding of COVID-19 related clinical experiments and findings. As research expands, results do so as well, resulting in a position that is complicated and overwhelming. To address this issue, we present a named entity recognition based framework that accurately extracts COVID-19 related information from clinical test results articles, and generates an efficient and interactive visual knowledge graph. This knowledge graph platform is user friendly, and provides intuitive and convenient tools to explore and analyze COVID-19 research data and results including medicinal performances, side effects and target populations.

Personalized Information Retrieval for Touristic Attractions in Augmented Reality

The rapid advances and increasing accessibility of augmented reality (AR) in recent years opened up many new possibilities to incorporate AR into our daily lives. A very interesting area for AR is tourism where one can enhance attractions with virtual elements and provide tourists with additional information about the places they are visiting. In this paper, we present our prototype, an AR application that augments various points of interest (POIs) by showing images and facts about each POI. We also developed a simple recommender system that ensures the facts are selected based on user preferences, thus creating a unique and personalized experience for each user. Furthermore, we also conducted a live user study to assess the usability of our prototype and the usefulness of our personalization system.

SESSION: Tutorial Presentations

Search and Discovery in Personal Email Collections

Email has been an essential communication medium for many years. As a result, the information accumulated in our mailboxes has become valuable for all of our personal and professional activities. For years, researchers have developed interfaces, models, and algorithms to facilitate email search, discovery, and organization. This tutorial brings together these diverse research directions and provides both a historical background, as well as a high-level overview of the recent advances in the field. In particular, we lay out all of the components needed in the design of email search engines, including user interfaces, indexing, document and query understanding, retrieval, ranking, evaluation, and data privacy. The tutorial also goes beyond search, presenting recent work on intelligent task assistance in email and a number of interesting future directions.

Graph Minimally-supervised Learning

Graphs are widely used for abstracting complex systems of interacting objects, such as social networks, knowledge graphs, and traffic networks, as well as for modeling molecules, manifolds, and source code. To model such graph-structured data, graph learning, in particular deep graph learning with graph neural networks, has drawn much attention in both academic and industrial communities lately. Prevailing graph learning methods usually rely on learning from "big'' data, requiring a large amount of labeled data for model training. However, it is common that graphs are associated with "small'' labeled data as data annotation and labeling on graphs is always time and resource-consuming. Therefore, it is imperative to investigate graph learning with minimal human supervision for the low-resource settings where limited or even no labeled data is available. In this tutorial, we will focus on the state-of-the-art techniques of Graph Minimally-supervised Learning, in particular a series of weakly-supervised learning, few-shot learning, and self-supervised learning methods on graph-structured data as well as their real-world applications. The objectives of this tutorial are to: (1) formally categorize the problems in graph minimally-supervised learning and discuss the challenges under different learning scenarios; (2) comprehensively review the existing and recent advances of graph minimally-supervised learning; and (3) elucidate open questions and future research directions. This tutorial introduces major topics within minimally-supervised learning and offers a guide to a new frontier of graph learning. We believe this tutorial is beneficial to researchers and practitioners, allowing them to collaborate on graph learning.

Graph Neural Networks for Recommender System

Recently, graph neural network (GNN) has become the new state-of-the-art approach in many recommendation problems, with its strong ability to handle structured data and to explore high-order information. However, as the recommendation tasks are diverse and various in the real world, it is quite challenging to design proper GNN methods for specific problems. In this tutorial, we focus on the critical challenges of GNN-based recommendation and the potential solutions. Specifically, we start from an extensive background of recommender systems and graph neural networks. Then we fully discuss why GNNs are required in recommender systems and the four parts of challenges, including graph construction, network design, optimization, and computation efficiency. Then, we discuss how to address these challenges by elaborating on the recent advances of GNN-based recommendation models, with a systematic taxonomy from four critical perspectives: stages, scenarios, objectives, and applications. Last, we finalize this tutorial with conclusions and discuss important future directions.

A Tutorial on Stance Detection

Stance detection (also known as stance classification, stance prediction, and stance analysis) is a problem related to social media analysis, natural language processing, and information retrieval, which aims to determine the position of a person from a piece of text they produce, towards a target (a concept, idea, event, etc.) either explicitly specified in the text, or implied only. Common stance classes include Favor, Against, and None. In this tutorial, we will define the core concepts and other related research problems, present historical and contemporary approaches to stance detection (including shared tasks and tools employed), provide pointers to related datasets, and cover open research directions and application areas of stance detection. As solutions to stance detection can contribute to diverse applications including trend analysis, opinion surveys, user reviews, personalization, and predictions for referendums and elections, it will continue to stand as an important research problem, mostly on textual content currently, and particularly on Web content including social media.

Half-Day Tutorial on Combating Online Hate Speech: The Role of Content, Networks, Psychology, User Behavior, etc.

While the rise in popularity of social media is seen as a hugely positive development, it is also accompanied by a proliferation of hate speech, which has recently become a major concern. On the one hand, hateful content creates an unsafe environment for certain members of society. On the other hand, manual moderation causes distress to content moderators, and the volume of harmful content is far beyond what human moderators can manually flag and react to. Thus, researchers in machine learning, social computing, and other areas have worked on developing tools to help automate the process. While initially studied as a text classification problem, over time, researchers realized that hate speech is multi-faceted and requires analysis of the role of linguistic expressions, context, and network structure, while using inspiration from psychology and user behavior, among others. With this in mind, we provide a holistic view of what the research community has explored so far, and what we believe are promising future research directions.

Fact-Checking, Fake News, Propaganda, Media Bias, and the COVID-19 Infodemic

Social media have democratized content creation and have made it easy for anybody to spread information online. However, stripping traditional media from their gate-keeping role has left the public unprotected against biased, deceptive and disinformative content, which could now travel online at breaking-news speed and influence major public events. For example, during the COVID-19 pandemic, a new blending of medical and political disinformation has given rise to the first global infodemic. We offer an overview of the emerging and inter-connected research areas of fact-checking, disinformation, "fake news'', propaganda, and media bias detection. We explore the general fact-checking pipeline and important elements thereof such as check-worthiness estimation, spotting previously fact-checked claims, stance detection, source reliability estimation, detection of persuasion techniques, and detecting malicious users in social media. We also cover large-scale pre-trained language models, and the challenges and opportunities they offer for generating and for defending against neural fake news. Finally, we discuss the ongoing COVID-19 infodemic.

Modern Theoretical Tools for Understanding and Designing Next-generation Information Retrieval System

In the relatively short history of machine learning, the subtle balance between engineering and theoretical progress has been proved critical at various stages. The most recent wave of AI has brought to the IR community powerful techniques, particularly for pattern recognition. While many benefits from the burst of ideas as numerous tasks become algorithmically feasible, the balance is tilting toward the application side. The existing theoretical tools in IR can no longer explain, guide, and justify the newly-established methodologies. With no choices, we have to bet our design on black-box mechanisms that we only empirically understand. The consequences can be suffering: in stark contrast to how the IR industry has envisioned modern AI making life easier, many are experiencing increased confusion and costs in data manipulation, model selection, monitoring, censoring, and decision making. This reality is not surprising: without handy theoretical tools, we often lack principled knowledge of the pattern recognition model's expressivity, optimization property, generalization guarantee, and our decision-making process has to rely on over-simplified assumptions and human judgments from time to time. Facing all the challenges, we started researching advanced theoretical tools emerging from various domains that can potentially resolve modern IR problems. We encountered many impactful ideas and made several independent publications emphasizing different pieces. Time is now to bring the community a systematic tutorial on how we successfully adapt those tools and make significant progress in understanding, designing, and eventually productionize impactful IR systems. We emphasize systematicity because IR is a comprehensive discipline that touches upon particular aspects of learning, causal inference analysis, interactive (online) decision-making, etc. It thus requires systematic calibrations to render the actual usefulness of the imported theoretical tools to serve IR problems, as they usually exhibit unique structures and definitions. Therefore, we plan this tutorial to systematically demonstrate our learning and successful experience of using advanced theoretical tools for understanding and designing IR systems.

SESSION: Smart City Talks

The Pit Stop Problem: How to Plan Your Next Road Trip

Many online trip planning and navigation software need to routinely solve the problem of deciding where to take stops during a journey for various services such as refueling (or EV charging), rest stops, food, etc. The goal is to minimize the overhead of these stops while ensuring that the traveler is not starved of any essential resource (such as fuel, rest, or food) during the journey. In this paper, we formally model this problem and call it the pit stop problem. We design algorithms for this problem under various settings: single vs multiple types of stops, and offline vs online optimization (i.e., in advance of or during the trip). Our algorithms achieve provable guarantees in terms of approximating the optimal solution. We then extensively evaluate our algorithms on real world data and demonstrate that they significantly outperform baseline solutions.

Predicting Users' Gender and Age based on Mobile Tasks

Demographic attributes are a key factor in marketing products and services, which enable a business owner to find the ideal customer. Users' app usage behaviors could reveal rich clues regarding their personal attributes since they always determine what apps to use depending on their personal needs and interests. Prior studies [1, 2] have tried to predict users' gender and age through their app usage behavior. However, most of the existing methods for users' demographic prediction are straightforward, simply using popular used apps or app usage frequency as features, without considering the internal semantic relationship of apps usage.

Recently, mobile tasks [3] have been identified from mobile app usage logs, representing a more accurate unit for capturing users' goals and behavioral insights, where a "mobile task" can be thought of as a group of related used apps to accomplish a single discrete task. For example, to plan dinner with friends, multiple apps (e.g., WhatsApp, Yelp, Uber and Google Maps) might be accessed for completing the task. In this talk, I will introduce how we leverage the fine-grained task units for generating user representation aims at predicting users' gender and age. We analyzed the effectiveness of using tasks to infer users' demographics, especially when compared to only treating apps independently. We explored different approaches for constructing users' representation and models with both mobile apps and tasks. Finally, we validated that the two-level hierarchical structure of "apps to tasks" and "tasks to users" is an important factor that should be taken into consideration for improving mobile user modelling. This work shed light on whether and how the extracted mobile tasks could be effectively applied. We believe that the task-based representations could be further explored for improving many other applications.

Towards a Cross-domain Semantically Interoperable Ecosystem

The increasing number of IoT devices and digital services offers cross-domain sensing and control opportunities to a growing set of stakeholders. The provision of cross-domain digital services requires interoperability as a key enabler to bridge domain specifics, while inferring knowledge and allowing new data-driven services.

This work addresses H2020 InterConnect project's Interoperability Framework, highlighting the use of semantic web technologies. The interoperability framework layering is presented, particularly addressing the Semantic Interoperability layer as its cornerstone to build an interoperable ecosystem of cross-domain digital services via a federation of distributed knowledge bases.

Departing from a generic, ontology-agnostic approach that can fit any cross-domain use case, it validates the approach by considering the SAREF family of ontologies, showcasing an IoT and energy cross-domain use case.

SESSION: Industry Day Talks

Exploration in Recommender Systems

In the era of increasing choices, recommender systems are becoming indispensable in helping users navigate the million or billion pieces of content on recommendation platforms. Most of the recommender systems are powered by ML models trained on a large amount of user-item interaction data. Such a setup however induces a strong feedback loop that creates the rich gets richer phenomenon where head contents are getting more and more exposure while tail and fresh contents are not discovered. At the same time, it pigeonholes users to contents they are already familiar with. We believe exploration is key to break away from the feedback loop and to optimize long term user experience on recommendation platforms.

The exploration-exploitation tradeoff, being the foundation of bandits and RL research, has been extensively studied in RL. While effective exploration is believed to positively influence the user experience on the platform, the exact value of exploration in recommender systems has not been well established. In this talk, we examine the roles of exploration in recommender systems in three facets: 1) system exploration to surface fresh/tail recommendations based on users' known interests; 2) user exploration to identify unknown user interests or introduce users to new interests; and 3) online exploration to utilize real-time user feedback to reduce extrapolation errors in performing system and user exploration. We discuss the challenges in measurements and optimization in different types of exploration, and propose initial solutions. We showcase how each aspect of exploration contributes to the long term user experience through offline and live experiments on industrial recommendation platforms. We hope this talk can inspire more follow up work in understanding and improving exploration in recommender systems.

Studying Long-Term User Behaviour Using Dynamic Time Warping for Customer Retention

Dynamic Time Warping (DTW) is an asynchronous alignment algorithm used to measure similarities between temporal sequences. DTW is advantageous for comparing sequences when the shape of the pattern is more important than the speed of events. It has been widely used for automated speech recognition, pattern detection in stock pricing data, studying energy consumption patterns for appliances, etc. In this presentation, we discuss the use of dynamic time warping for understanding long-term usage patterns on Twitter. Time series data are more useful to understand repeat user behavior and build user narratives than aggregated and/or snapshot user features. We utilize the time series of different user metrics as temporal signals and cluster them to identify specific usage and engagement patterns for churning users (users who become inactive). This approach led to more accurate opportunity sizing for the different user personas which in turn helped prioritize interventions for customer retention. We will discuss the implementation of this approach for large-scale data, understanding the time series clusters in a human-understandable manner, and challenges associated with multi-dimensional time-series data in the presentation.

AI & Public Data for Humanitarian and Emergency Response

When an emergency event, or an incident relevant for peacekeeping or humanitarian needs first occurs, getting the right information as quickly as possible is critical in saving lives. When an event is ongoing, information on what is happening can be critical in making decisions to keep people safe and take control of the particular situation unfolding. In both cases, first responders, peacekeepers, and others have to quickly make decisions that include what resources to deploy and where. Fortunately, in most emergencies, people use social media to publicly share information. At the same time, sensor data is increasingly becoming available. But a platform to detect emergency situations and deliver the right information has to deal with ingesting thousands of noisy data points per second: sifting through and identifying relevant information, from different sources, in different formats, with varying levels of detail, in real time, so that relevant individuals and teams can be alerted at the right level and at the right time. In this talk I will describe the technical challenges in processing vast amounts of heterogenous, noisy data in real time from the web and other sources, highlighting the importance of interdisciplinary research and a human-centered approach to address problems in humanitarian and emergency response. I will give specific examples and discuss relevant future research directions in Machine Learning, NLP, Information Retrieval, Computer Vision and other fields, highlighting the role of knowledge combined with Neural and other approaches. This talk will present an overview, and draw from some of our publications at CVPR, AAAI, EMNLP, and others.

Scalable Attribute Extraction at Instacart

Structured attribute information extracted from natural text inputs has been extensively exploited in e-commerce to help improve the customer experience. For example, attributes can be extracted from the product catalog data such as product name & product descriptions; similarly, attributes can also be extracted from user queries to the search engine. Having these attribute information available can help greatly boost relevance of many different functionalities such as search, recommendation, and ads. However, with the huge space of product categories and extensive details in the product information, how to extract attribute information from text with high accuracy and high efficiency becomes an extremely challenging problem.

In this talk, we will present the scalable machine learning-based attribute extraction pipeline we have built at Instacart for our online grocery business. We start our presentation with the unique challenges at Instacart on building our meta-catalog (catalog on top of catalogs from different retailers), and how we work with a diverse set of attribute naming conventions from multiple sources. After which we will talk about how we bootstrapped our attribute extraction work from scratch following a human-in-the-loop based solution, and trained our practical machine learning-based attribute extraction solution. We then present our achievement on unifying the attribute extraction on both user search queries and product textual information, and how we tackle the problem of mitigating the vocabulary gap between user search queries and product textual information in the catalog. Finally we present applications of our work in the real production environment and our learnings.

A Practical Guide to Robust Multimodal Machine Learning and Its Application in Education

Recently we have seen a rapid rise in the amount of education data available through the digitization of education. This huge amount of education data usually exhibits in a mixture form of images, videos, speech, texts, etc. It is crucial to consider data from different modalities to build successful applications in AI in education (AIED). This talk targets AI researchers and practitioners who are interested in applying state-of-the-art multimodal machine learning techniques to tackle some of the hard-core AIED tasks. These include tasks such as automatic short answer grading, student assessment, class quality assurance, knowledge tracing, etc.

In this talk, I will share some recent developments of successfully applying multimodal learning approaches in AIED, with a focus on those classroom multimodal data. Beyond introducing the recent advances of computer vision, speech, natural language processing in education respectively, I will discuss how to combine data from different modalities and build AI driven educational applications on top of these data. Participants will learn about recent trends and emerging challenges in this topic, representative tools and learning resources to obtain ready-to-use models, and how related models and techniques benefit real-world AIED applications.

Mining Frequent Patterns on Knowledge Graphs

The adoption of knowledge graphs is growing across various domains. And their construction, when automated, can result in verbose sparse graphs. In this talk, we discuss how to mine frequent patterns and subgraphs in a calculation knowledge graph in order to factor out verbose representations and allow for each of maintenance, reduction of storage, and an increase in runtime speed.

Near Real Time AI Personalization for Notifications at LinkedIn

Notifications at LinkedIn are very crucial for our members to stay informed about their network, discover professionally relevant content, conversations and courses, as well as identify potential career opportunities. For the Notifications AI team, our mission is to use AI to notify the right members, about the right content, at the right time and frequency through the right channel (push, in app or email) to maximize member value. In this talk we will give an overview of the AI systems and models behind these decisions.

We will present the candidate generation systems as well as the final relevance layer, built on top of the Air Traffic Controller (ATC), to enable volume optimization, notification channel (badge, push or email) selection and state aware message spacing based delivery time optimization. We describe how we formulated a multi-objective optimization problem, considering multiple objectives that capture member and business impact on the entire ecosystem. This problem considers three types of utilities: whether a member visits, their engagement on the notifications, and their overall engagement on LinkedIn. We will explain the final decision function, derived from the multi-objective optimization formulation, and show that it can be applied in a streaming fashion. The final decision function is tuned online, through a hyperparameter tuning solution developed at Linkedin which allows us to fine tune tradeoffs in the multi-objective optimization approach. We will conclude with a discussion on some of the wins this has enabled, managing most of the notifications sent to our 774million+ members.

Successes and Opportunities in Enterprise AI

The advances in AI over the last decade has led to significantly better outcomes and improved experiences in consumer applications. Meanwhile, successful applications of AI in enterprises have been modest during this period. In this talk, I will provide an overview of the scope of Enterprise AI, how it is similar and different from AI for consumer applications, a few successful applications, open problems and challenges and the enormous opportunity to create better outcomes and transform the experiences for employees and customers of enterprises.

Experiments with Predictive Long Term Guardrail Metrics

Product experiments today need a long term view of impact to make shipping decisions truly effective. Here we will discuss the challenges in the traditional metrics used in experiment analysis and how long term forecast metrics enable better decisions.

Most tech companies such as Google, Amazon, Netflix etc run thousands of experiments (also known as A/B test) a year [1]. The aim is to measure the impact new features have on core Key Predictive Indicators (KPIs) before deciding to launch it to production.

Traditional A/B testing metrics will usually measure the impact of the feature on core KPIs in the short-term. However, for many lines of business (such as loyalty and memberships), this is not enough, as we want to understand the impact of the features in the mid/long term. This reality can force companies to run experiments to 6+ months duration, or use a correlated leading metric (such as user activity, engagement level) with estimated impact in the long term. Both these situations are not ideal, the first slows down the rate of innovation while the second does not account for multiple factors that define the future results.

At Lyft, this reality is shared, and one that becomes a challenge for innovation as we need to know the long term impact before we decide to ship new features. As a solution we design forecasted metrics for retention and revenue at a user level that can be used to measure the impact of experiments in the long term. In this talk we will discuss challenges and learnings from this approach, when applied in practice.

Challenges in Data Production for AI with Human-in-the-Loop

Today, successful Artificial Intelligence applications rely on three pillars: machine learning algorithms, hardware for running them, and data for training and evaluating models. Although algorithms and hardware have already become commodities, obtaining up-to-date and high-quality data at scale is still challenging-but possible by building hybrid human-computer pipelines called human-in-the-loop. This talk will show how to make a significant business impact using human-in-the-loop pipelines that combine machine learning with crowdsourcing. We will share the experience of one of the world's largest search engines, Yandex.

After a brief introduction to human-in-the-loop, we will describe two insightful case studies with a significant business impact at Yandex. First, we will show how to use human-in-the-loop with subjective human opinions to gather training data for learning-to-rank models in the online setting, crucial for the recommendation, e-commerce, and search applications. Second, we will show how human-in-the-loop combined with spatial crowdsourcing enables keeping information on brick-and-mortar businesses up-to-date and transformed into structured data, essential for social impactful applications like online maps and directories.

Then, we will present the practical challenges of deploying human-in-the-loop pipelines, focusing on common issues with task design and quality control. We will demonstrate the end-to-end task design techniques that better fit for open-ended and subjective questions compared to widely-used classification tasks. We will present our recent advances in this field, including the use of large-scale language models (like BART and T5) for sequence aggregation. Also, we will show the new evaluation datasets for textual and subjective annotation, which are publicly available at We will discuss the problem of reliable quality control in crowdsourcing by describing the relevant computational methods for aggregation, quality estimation, and model selection. Finally, we will demonstrate Crowd-Kit, an open-source library that offers battle-tested and platform-agnostic implementations of all the above-described methods in Python:

Overall, we will share our experience in running impactful human-in-the-loop pipelines in production while overcoming the common practical challenges using the available and reliable open-source technologies, datasets, and tools.

Rethink e-Commerce Search

The quality of the search experience on an e-commerce site plays a critical role in customer conversion and the growth of the e-commerce business. In this talk, I will discuss the current status and challenges of product search. In particular, I will highlight the significant amount of effort it takes to create a high-quality product search engine using classical information retrieval methods. Then, I will discuss how recent advances in NLP and deep learning, especially the advent of large pre-trained language models, may change the status quo. While embedding-based retrieval has the potential to improve classical information retrieval methods, creating a machine learning-based, end-to-end system for general-purpose, web search is still extremely difficult. Nevertheless, I will argue that product search for e-commerce may prove to be an area where deep learning can create the first disruption to classical information retrieval systems.

The Incentives Platform at Lyft

The Incentives Platform team at Lyft has developed a platform for applying new methodologies at the intersection of causal inference, machine learning, and reinforcement learning to problems at scale. We utilize heterogeneous treatment effect algorithms to predict how different users (riders, drivers) will respond to a specific treatment (coupon, incentive, message, etc.). We then can apply various optimization algorithms to choose which users get which treatment while using bandit methodologies to balance an explore/exploit trade-off. This platform dramatically increases the degree to which we can customize the user experience and hit business goals while reducing the operational load of doing so.

This platform lets us understand how our users differ; letting us optimally target users based on individual treatment effect predictions; and evaluate the results of these predictions. The platform is built in a flexible way to allow us to plug-and-play with different algorithms, which lets us compare performance and develop improvements. We have integrated Off-Policy Evaluation into the platform allowing us to make unbiased (backtesting) evaluations of causal effects, without needing to run an AB test.

While the scale of our data and complexity of these algorithms requires substantial engineering infrastructure, we have built the platform in a modular way that allows for separation between the science and engineering code. This makes it easy for data scientists to iterate on these models without worrying (as much) about infrastructure or distributed systems.

Graph Neural Networks for the Global Economy with Microsoft DeepGraph

Graph Neural Networks (GNNs) are AI models that learn embeddings for the nodes in a graph and use the embeddings to perform prediction tasks. In this talk, we present how we developed GNNs for the LinkedIn economic graph. LinkedIn economic graph is a digital representation of the global economy with 1B nodes and 200B edges, consisting of social graphs about members' connections, activity graphs between members and other economic entities, and knowledge graphs about members', companies', job postings' attributes. By applying GNN to this graph, we can utilize the full potential of the economic graph in many search and recommendation products across LinkedIn.

The biggest challenge was to scale up GNNs to a massive scale of billion nodes and edges. To address this challenge, we developed Microsoft DeepGraph, an open source library for large scale GNN development. DeepGraph allows for training GNNs on large graphs by serving the graph in a distributed fashion with graph engine servers. In this talk, we will highlight the strengths of DeepGraph, such as support for both PyTorch and TensorFlow, and integration with Azure ML and Azure Kubernetes Service.

We will share lessons and findings from developing GNNs for various applications around the LinkedIn economic graph. We will explain how we combine graphs such as social graph, activity graph, knowledge graphs into one gigantic heterogeneous graph, and what algorithms we employed for this heterogenous graph. We will present a few case studies, such as how we identify job postings with vague titles and replace them with more specific titles using GNNs.

The Rise of Data Observability: Architecting the Future of Data Trust

As companies become increasingly data driven, the technologies underlying these rich insights have grown more and more nuanced and complex. While our ability to collect, store, aggregate, and visualize this data has largely kept up with the needs of modern data teams (think: domain-oriented data meshes, cloud warehouses, data visualization tools, and data modeling solutions), the mechanics behind data quality and integrity has lagged. To keep pace with data's clock speed of innovation, data engineers need to invest not only in the latest modeling and analytics tools, but also technologies that can increase data accuracy and prevent broken pipelines. The solution? Data observability, the next frontier of data engineering. I'll discuss why data observability matters to building a better data quality strategy and tactics best-in-class organizations use to address it -- including org structure, culture, and technology.