CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

SESSION: Keynote Talks

Anomaly Mining: Past, Present and Future

Leman Akoglu

Anomaly mining finds high-stakes applications in various real-world domains such as cybersecurity, finance, and environmental monitoring, to name a few. Therefore, it has been studied widely and a large body of detection techniques exists [1]. Today, many real-world settings necessitate detection at speed for streaming/evolving data, and/or detection at scale for massive datasets stored in a distributed environment [2]. Despite the plethora of detection algorithms, selecting an algorithm to use on a new task as well as setting the values for its hyperparameter(s), known as the model selection problem, is an open challenge for unsupervised anomaly detection. This issue is only exacerbated by the recent advent of detectors based on deep neural networks that exhibit a long list of hyperparameters. The challenge stems from two main factors: the lack of labeled data and the lack of a widely accepted anomaly loss function. Toward automation, one can explore internal evaluation strategies [3], or capitalize on the experience from historical detection tasks through meta-learning [4]. However, the problem remains far from solved. In deployment, many real-world use cases of anomaly detection require the flagged anomalies from a detector to be screened or audited by a human expert, typically for vetting purposes, where taking automatic actions can be costly (e.g. directly charging a flagged medical provider with fraud). While a vast majority of the literature focuses on novel detection algorithms, as humans are often involved with(in) the process, anomaly mining also concerns various human-centric problems that are beyond mere detection, namely explanation [5, 6], human interaction [7], and fairness [8]. These aspects of the field are under-studied and pose many open challenges.

The Primacy of Data in Deep Learning NLP for Conversational AI

Mark Johnson

Computational Linguistics and Natural Language Processing have changed considerably in the past few decades. Early research focused on representing and using linguistic knowledge in computational processes such as parsers, while these days the field focuses on practically-useful tasks such as information retrieval and chatbots. Currently our Deep Learning models have little to do with linguistic theory.

For example, the Oracle Digital Assistant is built on top of generic "Foundation" Deep Learning models. An intermediate Focusing step adapts these models to specific enterprise domains. Transfer Learning is used to refocus these models onto specific customer-oriented tasks such as Intent Classification, Named Entity Recognition, as well as more advanced models such as text-to-SQL sequence-to-sequence models. These technologies have revolutionised the application of NLP to practical problems with commercial relevance, enabling us to build better systems faster and cheaper than ever before.

Linguistic insights aren't gone from the field, however; they play a critical role in data manufacturing and evaluation. This talk explains how we use hundreds of different evaluations to understand the strengths and weaknesses of our models in the Oracle Digital Assistant, and how we automatically use this in hyper-parameter tuning. It also describes areas where additional research is still required before we can claim that NLP has become an engineering field.

Towards Reliable and Practicable Algorithmic Recourse

Himabindu Lakkaraju

As predictive models are increasingly being deployed in high-stakes decision making (e.g., loan approvals), there has been growing interest in developing post hoc techniques which provide recourse to individuals who have been adversely impacted by predicted outcomes. For example, when an individual is denied a loan by a predictive model deployed by a bank, they should be informed about the reasons for this decision and what can be done to reverse it. While several approaches have been proposed to tackle the problem of generating recourses, these techniques rely heavily on various restrictive assumptions. For instance, these techniques generate recourses under the assumption that the underlying predictive models do not change. In practice, however, models are often updated for a variety of reasons including data distribution shifts. There is little to no research that systematically investigates and addresses these limitations.

In this talk, I will discuss some of our recent work that sheds light on and addresses the aforementioned challenges, thereby paving the way for making algorithmic recourse practicable and reliable. First, I will present theoretical and empirical results which demonstrate that the recourses generated by state-of-the-art approaches are often invalidated due to model updates. Next, I will introduce a novel algorithmic framework based on adversarial training to generate recourses that remain valid even if the underlying models are updated. I will conclude the talk by presenting theoretical and empirical evidence for the efficacy of our solutions, and also discussing other open problems in the burgeoning field of algorithmic recourse.

SESSION: Full Paper Track

Model-agnostic vs. Model-intrinsic Interpretability for Explainable Product Search

Qingyao Ai
Lakshmi Narayanan.R

Product retrieval systems have served as the main entry for customers to discover and purchase products online. With increasing concerns about the transparency and accountability of AI systems, studies on explainable information retrieval have received more and more attention in the research community. Interestingly, in the domain of e-commerce, despite the extensive studies on explainable product recommendation, the study of explainable product search is still in an early stage. In this paper, we study how to construct effective explainable product search by comparing model-agnostic explanation paradigms with model-intrinsic paradigms and analyzing the important factors that determine the performance of product search explanations. We propose an explainable product search model with model-intrinsic interpretability and conduct crowdsourcing to compare it with the state-of-the-art explainable product search model with model-agnostic interpretability. We observe that both paradigms have their own advantages and that the effectiveness of search explanations on different properties is affected by different factors. For example, explanation fidelity is more important for users' overall satisfaction with the system, while explanation novelty may be more useful in attracting user purchases. These findings could have important implications for the future study and design of explainable product search engines.

Analysing Mixed Initiatives and Search Strategies during Conversational Search

Mohammad Aliannejadi
Leif Azzopardi
Hamed Zamani
Evangelos Kanoulas
Paul Thomas
Nick Craswell

Information seeking conversations between users and Conversational Search Agents (CSAs) consist of multiple turns of interaction. While users initiate a search session, ideally a CSA should sometimes take the lead in the conversation by obtaining feedback from the user by offering query suggestions or asking for query clarifications, i.e., mixed initiative. This creates the potential for more engaging conversational searches, but substantially increases the complexity of modelling and evaluating such scenarios due to the large interaction space coupled with the trade-offs between the costs and benefits of the different interactions. In this paper, we present a model for conversational search -- from which we instantiate different observed conversational search strategies, where the agent elicits: (i) Feedback-First, or (ii) Feedback-After. Using 49 TREC WebTrack Topics, we performed an analysis comparing how well these different strategies combine with different mixed initiative approaches: (i) Query Suggestions vs. (ii) Query Clarifications. Our analysis reveals that there is no superior or dominant combination; instead, it shows that query clarifications are better when asked first, while query suggestions are better when asked after presenting results. We also show that the best strategy and approach depend on the trade-offs between the relative costs of querying and giving feedback, the performance of the initial query, the number of assessments per query, and the total amount of gain required. While this work highlights the complexities and challenges involved in analyzing CSAs, it provides the foundations for evaluating conversational strategies and conversational search agents in batch/offline settings.

Automated Selection of Multiple Datasets for Extension by Integration

Yael Amsterdamer
Moran Cohen

Organizations often seek to extend their data by integration with available datasets originating from external sources. While there are many tools that recommend how to perform the integration for given datasets, the selection of what datasets to integrate is often challenging in itself. First, the relevant candidates must be efficiently identified among irrelevant ones. Next, relevant datasets need to be evaluated according to issues such as low quality or poor matching to the target data and schema. Last, jointly integrating multiple datasets may have significant benefits such as increasing completeness and information gain, but may also greatly complicate the task due to dependencies in the integration process.

To assist administrators in this task, we quantify to what extent an integration of multiple datasets is valuable as an extension of an initial dataset and formalize the computational problem of finding the most valuable subset to integrate by this measure. We formally analyze the problem, showing that it is NP-hard; we nevertheless introduce efficient heuristic algorithms, which our experiments show to be near-optimal in practice and highly effective in finding the most valuable integration.

Skyline in Crowdsourcing with Imprecise Comparisons

Aris Anagnostopoulos
Adriano Fazzone
Giacomo Vettraino

Given an input set of objects, each one represented as a vector of features in a feature space, the problem of finding the skyline is the problem of determining the subset of objects that are not dominated by any other input object. An example of an application is to find the best hotel(s) with respect to some features (location, price, cleanliness, etc.).

The use of the crowd for solving this problem is useful when a score of items according to their features is not available. Yet the crowd can give inconsistent answers. In this paper we study the computation of the skyline when the comparisons between objects are performed by humans. We model the problem using the threshold model [Ajtai et al, TALG 2015] in which the comparison of two objects may create errors/inconsistencies if the objects are close to each other. We provide algorithms for the problem and we analyze the required number of human comparisons and lower bounds. We also evaluate the effectiveness and efficiency of our algorithms using synthetic and real-world data.
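The skyline definition above can be illustrated with a minimal sketch. It assumes exact numeric feature scores where larger is better on every feature, which is precisely what the crowdsourced setting of the paper does not have; the function names and the hotel data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every feature
    and strictly better on at least one (larger values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(objects: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the objects not dominated by any other object.

    Quadratic-time reference version with exact comparisons; the paper
    instead studies comparisons answered by a possibly inconsistent crowd.
    """
    return [o for o in objects if not any(dominates(p, o) for p in objects)]


# Hypothetical hotels scored as (location, cleanliness).
hotels = [(9, 3), (7, 7), (4, 8), (6, 6), (9, 2)]
print(skyline(hotels))
```

Here (6, 6) is dominated by (7, 7) and (9, 2) by (9, 3), so only the remaining three hotels form the skyline.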

Random Sampling Plus Fake Data: Multidimensional Frequency Estimates With Local Differential Privacy

Héber H. Arcolezi
Jean-François Couchot
Bechara Al Bouna
Xiaokui Xiao

With local differential privacy (LDP), users can privatize their data and thus guarantee privacy properties before transmitting it to the server (a.k.a. the aggregator). One primary objective of LDP is frequency (or histogram) estimation, in which the aggregator estimates the number of users for each possible value. In practice, when a study with rich content on a population is desired, the interest is in the multiple attributes of the population, that is to say, in multidimensional data (d ≥ 2). However, contrary to the problem of frequency estimation of a single attribute (the majority of the works), the multidimensional aspect requires paying particular attention to the privacy budget, which can grow extremely quickly due to the composition theorem. To the authors' knowledge, two solutions seem to stand out for this task: 1) splitting the privacy budget across the attributes, i.e., sending each value with ε/d-LDP (Spl), and 2) randomly sampling a single attribute and spending all the privacy budget to send it with ε-LDP (Smp). Although Smp adds additional sampling error, it has proven to provide higher data utility than the former Spl solution. However, we argue that aggregators (who are also seen as attackers) are aware of the sampled attribute and its LDP value, which is protected by a "less strict" e^ε probability bound (rather than e^(ε/d)). We therefore propose a solution named Random Sampling plus Fake Data (RS+FD), which allows creating uncertainty over the sampled attribute by generating fake data for each non-sampled attribute; RS+FD further benefits from amplification by sampling. We theoretically and experimentally validate our proposed solution on both synthetic and real-world datasets to show that RS+FD achieves nearly the same or better utility than the state-of-the-art Smp solution.
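To make the difference between the two baseline strategies concrete, the following is a minimal sketch of Spl and Smp. It assumes generalized randomized response (GRR) as the underlying LDP protocol, which the abstract does not specify; all function names are hypothetical, and the fake-data step of RS+FD is not implemented here.

```python
import math
import random
from typing import List, Sequence, Tuple


def grr(value: int, k: int, eps: float, rng: random.Random) -> int:
    """Generalized randomized response over a domain {0, ..., k-1} (k >= 2).

    Reports the true value with probability p = e^eps / (e^eps + k - 1),
    otherwise a uniformly random *different* value.
    """
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(k - 1)          # pick among the k-1 other values
    return other if other < value else other + 1


def spl(record: Sequence[int], sizes: Sequence[int], eps: float,
        rng: random.Random) -> List[int]:
    """Spl: split the budget and report every attribute with eps/d-LDP."""
    d = len(record)
    return [grr(v, k, eps / d, rng) for v, k in zip(record, sizes)]


def smp(record: Sequence[int], sizes: Sequence[int], eps: float,
        rng: random.Random) -> Tuple[int, int]:
    """Smp: sample one attribute and report only it with the full eps-LDP.

    The sampled index j is sent in the clear, which is the disclosure
    the abstract argues against.
    """
    j = rng.randrange(len(record))
    return j, grr(record[j], sizes[j], eps, rng)


def estimate_frequencies(reports: Sequence[int], k: int, eps: float) -> List[float]:
    """Unbiased GRR frequency estimate from the reports of one attribute."""
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + k - 1)
    q = 1.0 / (math.exp(eps) + k - 1)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return [(c / n - q) / (p - q) for c in counts]


# Hypothetical usage: 3 attributes with domain sizes 2, 3 and 4.
rng = random.Random(0)
sizes = [2, 3, 4]
record = [1, 0, 3]
print(spl(record, sizes, eps=1.0, rng=rng))
print(smp(record, sizes, eps=1.0, rng=rng))
```

With Smp, the aggregator would estimate each attribute from the roughly n/d reports that sampled it, using the full ε in `estimate_frequencies`; with Spl, all n reports are available per attribute but each was perturbed with only ε/d.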

Non-Visual Accessibility Assessment of Videos

Ali Selman Aydin
Yu-Jung Ko
Utku Uckun
IV Ramakrishnan
Vikas Ashok

Video accessibility is crucial for blind screen-reader users as online videos are increasingly playing an essential role in education, employment, and entertainment. While there exist quite a few techniques and guidelines that focus on creating accessible videos, there is a dearth of research that attempts to characterize the accessibility of existing videos. Therefore in this paper, we define and investigate a diverse set of video and audio-based accessibility features in an effort to characterize accessible and inaccessible videos. As a ground truth for our investigation, we built a custom dataset of 600 videos, in which each video was assigned an accessibility score based on the number of its wins in a Swiss-system tournament, where human annotators performed pairwise accessibility comparisons of videos. In contrast to existing accessibility research where the assessments are typically done by blind users, we recruited sighted users for our effort, since videos comprise a special case where sight could be required to better judge if any particular scene in a video is presently accessible or not. Subsequently, by examining the extent of association between the accessibility features and the accessibility scores, we could determine the features that significantly (positively or negatively) impact video accessibility and therefore serve as good indicators for assessing the accessibility of videos. Using the custom dataset, we also trained machine learning models that leveraged our handcrafted features to either classify an arbitrary video as accessible/inaccessible or predict an accessibility score for the video. Evaluation of our models yielded an F1 score of 0.675 for binary classification and a mean absolute error of 0.53 for score prediction, thereby demonstrating their potential in video accessibility assessment while also illuminating their current limitations and the need for further research in this area.

GAM: Explainable Visual Similarity and Classification via Gradient Activation Maps

Oren Barkan
Omri Armstrong
Amir Hertz
Avi Caciularu
Ori Katz
Itzik Malkiel
Noam Koenigstein

We present Gradient Activation Maps (GAM) - a machinery for explaining predictions made by visual similarity and classification models. By gleaning localized gradient and activation information from multiple network layers, GAM offers improved visual explanations, when compared to existing alternatives. The algorithmic advantages of GAM are explained in detail, and validated empirically, where it is shown that GAM outperforms its alternatives across various tasks and datasets.

Representation Learning via Variational Bayesian Networks

Oren Barkan
Avi Caciularu
Idan Rejwan
Ori Katz
Jonathan Weill
Itzik Malkiel
Noam Koenigstein

We present Variational Bayesian Network (VBN) - a novel Bayesian entity representation learning model that utilizes hierarchical and relational side information and is particularly useful for modeling entities in the "long-tail", where the data is scarce. VBN provides better modeling for long-tail entities via two complementary mechanisms: First, VBN employs informative hierarchical priors that enable information propagation between entities sharing common ancestors. Additionally, VBN models explicit relations between entities that enforce complementary structure and consistency, guiding the learned representations towards a more meaningful arrangement in space. Second, VBN represents entities by densities (rather than vectors), hence modeling uncertainty that plays a complementary role in coping with data scarcity. Finally, we propose a scalable Variational Bayes optimization algorithm that enables fast approximate Bayesian inference. We evaluate the effectiveness of VBN on linguistic, recommendations, and medical inference tasks. Our findings show that VBN outperforms other existing methods across multiple datasets, and especially in the long-tail.

HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations

Anson Bastos
Kuldeep Singh
Abhishek Nadgeri
Saeedeh Shekarpour
Isaiah Onando Mulang
Johannes Hoffart

Recently, several Knowledge Graph Embedding (KGE) approaches have been devised to represent entities and relations in a dense vector space and employed in downstream tasks such as link prediction. A few KGE techniques address interpretability, i.e., mapping the connectivity patterns of the relations (symmetric/asymmetric, inverse, and composition) to a geometric interpretation such as rotation. Other approaches model the representations in higher dimensional space such as four-dimensional space (4D) to enhance the ability to infer the connectivity patterns (i.e., expressiveness). However, modeling relation and entity in a 4D space often comes at the cost of interpretability. We propose HopfE, a novel KGE approach aiming to achieve the interpretability of inferred relations in the four-dimensional space. HopfE models the structural embeddings in 3D Euclidean space. Next, we map the entity embedding vector from a 3D Euclidean space to a 4D hypersphere using the inverse Hopf Fibration, in which we embed the semantic information from the KG ontology. Thus, HopfE considers the structural and semantic properties of the entities without losing expressivity and interpretability. Our empirical results on four well-known benchmarks achieve state-of-the-art performance for KG completion.

Influence Maximization With Co-Existing Seeds

Ruben Becker
Gianlorenzo D'Angelo
Hugo Gilbert

In the classical influence maximization problem we aim to select a set of nodes, called seeds, to start an efficient information diffusion process. More precisely, the goal is to select seeds such that the expected number of nodes reached by the diffusion process is maximized. In this work we study a variant of this problem where an unknown (up to a probability distribution) set of nodes, referred to as co-existing seeds, joins in starting the diffusion process even if not selected. This setting allows us to model that, in certain situations, some nodes are willing to act as "voluntary seeds" even if not chosen by the campaign organizer. This may for example be due to the positive nature of the information campaign (e.g., public health awareness programs, HIV prevention, financial aid programs), or due to external social driving effects (e.g., nodes are friends of selected seeds in real life or in other social media).

In this setting, we study two types of optimization problems. While the first one aims to maximize the expected number of reached nodes, the second one endeavors to maximize the expected increment in the number of reached nodes in comparison to a non-intervention strategy. The problems (particularly the second one) are motivated by cooperative game theory. For various probability distributions on co-existing seeds, we obtain several algorithms with approximation guarantees as well as hardness and hardness of approximation results. We conclude with experiments that demonstrate the usefulness of our approach when co-existing seeds exist.
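The second objective, the expected increment over a non-intervention strategy, can be sketched with a simple Monte Carlo estimate. The sketch assumes the independent cascade diffusion model with a single edge probability and co-existing seeds that join independently with given probabilities; the abstract fixes neither choice, and every name below is hypothetical.

```python
import random
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]


def simulate_spread(graph: Graph, seeds: Set[Hashable], p: float,
                    rng: random.Random) -> int:
    """One independent-cascade run: each newly active node gets a single
    chance to activate each inactive out-neighbour with probability p."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def expected_increment(graph: Graph, chosen: Set[Hashable],
                       coexist_prob: Dict[Hashable, float], p: float,
                       runs: int, rng: random.Random) -> float:
    """Estimate E[spread(chosen + co-existing) - spread(co-existing only)].

    Co-existing seeds are drawn independently per run (an assumption);
    the two cascades in a run share the seeds but not the edge randomness.
    """
    total = 0
    for _ in range(runs):
        co = {v for v, q in coexist_prob.items() if rng.random() < q}
        with_campaign = simulate_spread(graph, chosen | co, p, rng)
        without_campaign = simulate_spread(graph, co, p, rng)
        total += with_campaign - without_campaign
    return total / runs


# Hypothetical toy network: a -> b -> c and d -> e; node d always volunteers.
toy = {"a": ["b"], "b": ["c"], "d": ["e"]}
print(expected_increment(toy, {"a"}, {"d": 1.0}, p=1.0, runs=10,
                         rng=random.Random(0)))
```

In the toy network, selecting a node that the volunteer d would reach anyway yields no increment, which is the intuition behind optimizing the increment rather than the raw spread.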

Cross-Market Product Recommendation

Hamed Bonab
Mohammad Aliannejadi
Ali Vardasbi
Evangelos Kanoulas
James Allan

We study the problem of recommending relevant products to users in relatively resource-scarce markets by leveraging data from similar, resource-richer auxiliary markets. We hypothesize that data from one market can be used to improve performance in another. Only a few studies have been conducted in this area, partly due to the lack of publicly available experimental data. To this end, we collect and release XMarket, a large dataset covering 18 local markets on 16 different product categories, featuring 52.5 million user-item interactions.

We introduce and formalize the problem of cross-market product recommendation, i.e., market adaptation. We explore different market-adaptation techniques inspired by state-of-the-art domain-adaptation and meta-learning approaches and propose a novel neural approach for market adaptation, named FOREC. Our model follows a three-step procedure - pre-training, forking, and fine-tuning - in order to fully utilize the data from an auxiliary market as well as the target market. We conduct extensive experiments studying the impact of market adaptation on different pairs of markets. Our proposed approach demonstrates robust effectiveness, consistently improving the performance on target markets compared to competitive baselines selected for our analysis. In particular, FOREC improves on average 24% and up to 50% in terms of nDCG@10, compared to the NMF baseline. Our analysis and experiments suggest specific future directions in this research area. We release our data and code for academic purposes.

ASTERYX: A model-Agnostic SaT-basEd appRoach for sYmbolic and score-based eXplanations

Ryma Boumazouza
Fahima Cheikh-Alili
Bertrand Mazure
Karim Tabia

The ever-increasing complexity of machine learning techniques, used more and more in practice, gives rise to the need to explain the outcomes of these models, which are often used as black boxes. Explainable AI approaches are either numerical feature-based, aiming to quantify the contribution of each feature in a prediction, or symbolic, providing certain forms of symbolic explanations such as counterfactuals. This paper proposes a generic agnostic approach named ASTERYX that generates both symbolic explanations and score-based ones. Our approach is declarative and is based on encoding the model to be explained in an equivalent symbolic representation. The latter serves to generate, in particular, two types of symbolic explanations: sufficient reasons and counterfactuals. We then associate scores reflecting the relevance of the explanations and the features w.r.t. some properties. Our experimental results show the feasibility of the proposed approach and its effectiveness in providing symbolic and score-based explanations.

Certification and Trade-off of Multiple Fairness Criteria in Graph-based Spam Detection

Kai Burkholder
Kenny Kwock
Yuesheng Xu
Jiaxin Liu
Chao Chen
Sihong Xie

Spamming reviews are prevalent in review systems to manipulate seller reputation and mislead customers. Spam detectors based on graph neural networks (GNN) exploit graph patterns to achieve state-of-the-art detection accuracy. The detection can influence a large number of real-world entities and it is ethical to treat different groups of entities as equally as possible. However, due to skewed distributions of the graphs, GNN can fail to meet diverse fairness criteria designed for different parties. We formulate linear systems of the input features and the adjacency matrix of the review graphs for the certification of multiple fairness criteria. When the criteria are competing, we relax the certification and design a multi-objective optimization (MOO) algorithm to explore multiple efficient trade-offs, so that no objective can be improved without harming another objective. We prove that the algorithm converges to a Pareto efficient solution using duality and the implicit function theorem. Since there can be exponentially many trade-offs of the criteria, we propose a data-driven stochastic search algorithm to approximate Pareto fronts consisting of multiple efficient trade-offs. Experimentally, we show that the algorithms converge to solutions that dominate baselines based on fairness regularization and adversarial training.

ScarceGAN: Discriminative Classification Framework for Rare Class Identification for Longitudinal Data with Weak Prior

Surajit Chakrabarty
Rukma Talwadker
Tridib Mukherjee

This paper introduces ScarceGAN, which focuses on identification of extremely rare or scarce samples from multi-dimensional longitudinal telemetry data with a small and weak label prior. We specifically address: (i) severe scarcity in the positive class, stemming from both underlying organic skew in the data, as well as extremely limited labels; (ii) the multi-class nature of the negative samples, with uneven density distributions and partially overlapping feature distributions; and (iii) massively unlabelled data leading to a tiny and weak prior on both positive and negative classes, and the possibility of unseen or unknown behavior in the unlabelled set, especially in the negative class. Although related to PU learning problems, we contend that knowledge (or lack of it) on the negative class can be leveraged to learn the complement of it (i.e., the positive class) better in a semi-supervised manner. To this effect, ScarceGAN re-formulates semi-supervised GAN by accommodating weakly labelled multi-class negative samples and the available positive samples. It relaxes the supervised discriminator's constraint on exact differentiation between negative samples by introducing a 'leeway' term for samples with noisy prior. We propose modifications to the cost objectives of the discriminator, in the supervised and unsupervised paths, as well as that of the generator. For identifying risky players in skill gaming, this formulation as a whole gives us a recall of over 85% (~60% jump over vanilla semi-supervised GAN) on our scarce class with very minimal verbosity in the unknown space. Further, ScarceGAN outperforms the recall benchmarks established by recent GAN-based specialized models for the positive imbalanced class identification and establishes a new benchmark in identifying one of the rare attack classes (0.09%) in the intrusion dataset from the KDDCUP99 challenge. We establish ScarceGAN to be one of the new competitive benchmark frameworks in rare class identification for longitudinal telemetry data.

Geometric Heuristics for Transfer Learning in Decision Trees

Siddhesh Chaubal
Mateusz Rzepecki
Patrick K. Nicholson
Guangyuan Piao
Alessandra Sala

Motivated by a network fault detection problem, we study how recall can be boosted in a decision tree classifier, without sacrificing too much precision. This problem is relevant and novel in the context of transfer learning (TL), in which few target domain training samples are available. We define a geometric optimization problem for boosting the recall of a decision tree classifier, and show it is NP-hard. To solve it efficiently, we propose several near-linear time heuristics, and experimentally validate these heuristics in the context of TL. Our evaluation includes 7 public datasets, as well as 6 network fault datasets, and we compare our heuristics with several existing TL algorithms, as well as exact mixed integer linear programming (MILP) solutions to our optimization problem. We find that our heuristics boost recall in a manner similar to optimal MILP solutions, yet require several orders of magnitude less compute time. In many cases the F1 score of our approach is competitive with, and often better than, other TL algorithms. Moreover, our approach can be used as a building block to apply transfer learning to more powerful ensemble methods, such as random forests.

LiteGT: Efficient and Lightweight Graph Transformers

Cong Chen
Chaofan Tao
Ngai Wong

Transformers have shown great potential for modeling long-term dependencies in natural language processing and computer vision. However, few studies have applied transformers to graphs, which is challenging due to the poor scalability of the attention mechanism and the under-exploration of graph inductive bias. To bridge this gap, we propose a Lite Graph Transformer (LiteGT) that learns on arbitrary graphs efficiently. First, a node sampling strategy is proposed to sparsify the considered nodes in self-attention with only O(N log N) time. Second, we devise two kernelization approaches to form two-branch attention blocks, which not only leverage graph-specific topology information, but also reduce computation further to O((1/2) N log N). Third, the nodes are updated with different attention schemes during training, thus largely mitigating over-smoothing problems when the model layers deepen. Extensive experiments demonstrate that LiteGT achieves competitive performance on both node classification and link prediction on datasets with millions of nodes. Specifically, the Jaccard + Sampling + Dim. reducing setting reduces computation by more than 100x and halves the model size without performance degradation.

Incorporating Query Reformulating Behavior into Web Search Evaluation

Jia Chen
Yiqun Liu
Jiaxin Mao
Fan Zhang
Tetsuya Sakai
Weizhi Ma
Min Zhang
Shaoping Ma

While batch evaluation plays a central part in Information Retrieval (IR) research, most evaluation metrics are based on user models which mainly focus on browsing and clicking behaviors. As users' perceived satisfaction may also be impacted by their search intent, constructing different user models for different search intents may help design better evaluation metrics. However, user intents are usually unobservable in practice. As users' query reformulating behaviors may reflect their search intents to a certain extent and highly correlate with their perceived satisfaction for a specific query, these observable factors may be beneficial for the design of evaluation metrics. How to incorporate the search intent behind query reformulation into user behavior and satisfaction models remains under-investigated. To investigate the relationships among query reformulations, search intent, and user satisfaction, we explore a publicly available web search dataset and find that query reformulations can be a good proxy for inferring user intent, and therefore, reformulating actions may be beneficial for designing better web search effectiveness metrics. A group of Reformulation-Aware Metrics (RAMs) is then proposed to improve existing click model-based metrics. Experimental results on two public session datasets have shown that RAMs have significantly higher correlations with user satisfaction than existing evaluation metrics. In the robustness test, we have found that RAMs can achieve good performance when only a small proportion of satisfaction training labels are available. We further show that RAMs can be directly applied in a new dataset for offline evaluation once trained. This work shows the possibility of designing better evaluation metrics by incorporating fine-grained search context factors.

FedMatch: Federated Learning Over Heterogeneous Question Answering Data

Jiangui Chen
Ruqing Zhang
Jiafeng Guo
Yixing Fan
Xueqi Cheng

Question Answering (QA), a popular and promising technique for intelligent information access, faces a dilemma about data, like most other AI techniques. On one hand, modern QA methods rely on deep learning models which are typically data-hungry. Therefore, one would expect to collect and fuse all the available QA datasets together in a common site for developing a powerful QA model. On the other hand, real-world QA datasets are typically distributed in the form of isolated islands belonging to different parties. Due to the increasing awareness of privacy and security, it is almost impossible to integrate the scattered data, or the cost of doing so is prohibitive. A possible solution to this dilemma is a new approach known as federated learning, which is a privacy-preserving machine learning technique over distributed datasets. In this work, we propose to adopt federated learning for QA with a special concern for the statistical heterogeneity of the QA data. Here the heterogeneity refers to the fact that, in practice, annotated QA data are typically not independent and identically distributed (non-IID) and are unbalanced in size. Traditional federated learning methods may sacrifice the accuracy of individual models under the heterogeneous situation. To tackle this problem, we propose a novel Federated Matching framework for QA, named FedMatch, with a backbone-patch architecture. The shared backbone is to distill the common knowledge of all the participants while the private patch is a compact and efficient module to retain the domain information for each participant. To facilitate the evaluation, we build a benchmark collection based on several QA datasets from different domains to simulate the heterogeneous situation in practice. Empirical studies demonstrate that our model can achieve significant improvements against the baselines over all the datasets.

HetMAML: Task-Heterogeneous Model-Agnostic Meta-Learning for Few-Shot Learning Across Modalities

Jiayi Chen
Aidong Zhang

Most existing gradient-based meta-learning approaches to few-shot learning assume that all tasks have the same input feature space. However, in real-world scenarios, there are many cases in which the input structures of tasks differ, that is, tasks may vary in the number of input modalities or data types. Existing meta-learners cannot handle the heterogeneous task distribution (HTD), as there is not only global meta-knowledge shared across tasks but also type-specific knowledge that distinguishes each type of task. To deal with task heterogeneity and promote fast within-task adaptation for each type of task, in this paper, we propose HetMAML, a task-heterogeneous model-agnostic meta-learning framework, which captures both the type-specific and globally shared knowledge and balances knowledge customization and generalization. Specifically, we design a multi-channel backbone module that encodes the input of each type of task into a sequence of modality-specific embeddings of the same length. Then, we propose a task-aware iterative feature aggregation network that automatically takes into account the context of task-specific input structures and adaptively projects the heterogeneous input spaces to the same lower-dimensional embedding space of concepts. Our experiments on six task-heterogeneous datasets demonstrate that HetMAML successfully leverages type-specific and globally shared meta-parameters for heterogeneous tasks and achieves fast within-task adaptation for each type of task.

Generative Inverse Deep Reinforcement Learning for Online Recommendation

Xiaocong Chen
Lina Yao
Aixin Sun
Xianzhi Wang
Xiwei Xu
Liming Zhu

Deep reinforcement learning enables an agent to capture users' interests through dynamic interactions with the environment. It uses a reward function to learn users' interests and to control the learning process, and has attracted great interest in recommendation research. However, most reward functions are manually designed; they are either unrealistic or too imprecise to reflect the variety, dimensionality, and non-linearity of the recommendation problem. This impedes the agent from learning an optimal policy in highly dynamic online recommendation scenarios. To address this issue, we propose a generative inverse reinforcement learning approach that avoids the need to define an elaborate reward function. In particular, we model the recommendation problem as an automatic policy learning problem. We first generate policies based on observed users' preferences and then evaluate the learned policy by a measurement based on a discriminative actor-critic network. We conduct experiments on an online platform, VirtualTB, and demonstrate the feasibility and effectiveness of our proposed approach via comparisons with several state-of-the-art methods.

Robust Road Network Representation Learning: When Traffic Patterns Meet Traveling Semantics

Yile Chen
Xiucheng Li
Gao Cong
Zhifeng Bao
Cheng Long
Yiding Liu
Arun Kumar Chandran
Richard Ellison

In this work, we propose a robust road network representation learning framework called Toast, which serves as a cornerstone for boosting the performance of numerous demanding transport planning tasks. Specifically, we first propose a traffic context aware skip-gram module that incorporates auxiliary tasks of predicting the traffic context of a target road segment. Furthermore, we propose a trajectory-enhanced Transformer module that utilizes trajectory data to extract traveling semantics on road networks. Apart from producing effective road segment representations, this module also enables us to obtain route representations. With these two modules, we can learn representations that capture multi-faceted characteristics of road networks and can be applied in both road segment based applications and trajectory based applications. Finally, we design a benchmark containing four typical transport planning tasks to evaluate the usefulness of Toast, and comprehensive experiments verify that Toast consistently outperforms the state-of-the-art baselines across all tasks.

DCAP: Deep Cross Attentional Product Network for User Response Prediction

Zekai Chen
Fangtian Zhong
Zhumin Chen
Xiao Zhang
Robert Pless
Xiuzhen Cheng

User response prediction, which aims to predict the probability that a user will provide a predefined positive response in a given context, such as clicking on an ad or purchasing an item, is crucial to many industrial applications such as online advertising, recommender systems, and search ranking. For these tasks and many other machine learning tasks, an indispensable part of success is feature engineering, where cross features are a significant type of feature transformation. However, due to the high dimensionality and extreme sparsity of the data collected in these tasks, handcrafting cross features is inevitably time-consuming. Prior studies in predicting user response leveraged feature interactions by enhancing feature vectors with products of features to model second-order or high-order cross features, either explicitly or implicitly. However, these existing methods can be hindered by not learning sufficient cross features due to model architecture limitations, or by modeling all high-order feature interactions with equal weights. Different features should contribute differently to the prediction, and not all cross features have the same predictive power.

This work aims to fill this gap by proposing a novel architecture, the Deep Cross Attentional Product Network (DCAP), which keeps the cross network's benefits in modeling high-order feature interactions explicitly at the vector-wise level. By computing the inner product or outer product between attentional feature embeddings and original input embeddings as each layer's output, we can model cross features of higher order as the network's depth increases. We concatenate the outputs from all layers, which further helps the model capture rich information on cross features of different orders. Beyond that, inspired by the multi-head attention mechanism and the Product Neural Network (PNN), it can differentiate the importance of different cross features in each network layer, allowing practitioners to perform a more in-depth analysis of user behaviors. Additionally, our proposed model can be easily implemented and trained in parallel. We conduct comprehensive experiments on three real-world datasets. The results robustly demonstrate that our proposed model DCAP achieves superior prediction performance compared with the state-of-the-art models. The code is publicly available at https://github.com/zachstarkk/DCAP.

Learning Dual Dynamic Representations on Time-Sliced User-Item Interaction Graphs for Sequential Recommendation

Zeyuan Chen
Wei Zhang
Junchi Yan
Gang Wang
Jianyong Wang

Sequential Recommendation aims to recommend items that a target user will interact with in the near future based on the historically interacted items. While modeling temporal dynamics is crucial for sequential recommendation, most existing studies concentrate solely on the user side while overlooking the sequential patterns existing in the counterpart, i.e., the item side. Although a few studies investigate the dynamics involved in both sides, the complex user-item interactions are not fully exploited from a global perspective to derive dynamic user and item representations. In this paper, we devise a novel Dynamic Representation Learning model for Sequential Recommendation (DRL-SRe). To better model the user-item interactions for characterizing the dynamics from both sides, the proposed model builds a global user-item interaction graph for each time slice and exploits time-sliced graph neural networks to learn user and item representations. Moreover, to enable the model to capture fine-grained temporal information, we propose an auxiliary temporal prediction task over consecutive time slices based on a temporal point process. Comprehensive experiments on three public real-world datasets demonstrate that DRL-SRe outperforms the state-of-the-art sequential recommendation models by a large margin.

An Effective Non-Autoregressive Model for Spoken Language Understanding

Lizhi Cheng
Weijia Jia
Wenmian Yang

Spoken Language Understanding (SLU), a core component of the task-oriented dialogue system, demands low inference latency because human users are impatient. Non-autoregressive SLU models clearly increase the inference speed but suffer from uncoordinated-slot problems caused by the lack of sequential dependency information among slot chunks. To address this shortcoming, in this paper, we propose a novel non-autoregressive SLU model named Layered-Refine Transformer, which contains a Slot Label Generation (SLG) task and a Layered Refine Mechanism (LRM). SLG is defined as generating the next slot label from the token sequence and the previously generated slot labels. With SLG, the non-autoregressive model can efficiently obtain dependency information during training and spends no extra time in inference. LRM predicts the preliminary SLU results from the Transformer's middle states and utilizes them to guide the final prediction. Experiments on two public datasets indicate that our model significantly improves SLU performance (1.5% on overall accuracy) while substantially speeding up the inference process (by more than 10 times) over the state-of-the-art baseline.

LT-OCF: Learnable-Time ODE-based Collaborative Filtering

Jeongwhan Choi
Jinsung Jeon
Noseong Park

Collaborative filtering (CF) is a long-standing problem of recommender systems. Many novel methods have been proposed, ranging from classical matrix factorization to recent graph convolutional network-based approaches. After recent fierce debates, researchers started to focus on linear graph convolutional networks (GCNs) with a layer combination, which show state-of-the-art accuracy in many datasets. In this work, we extend them based on neural ordinary differential equations (NODEs), because the linear GCN concept can be interpreted as a differential equation, and present the method of Learnable-Time ODE-based Collaborative Filtering (LT-OCF). The main novelty in our method is that after redesigning linear GCNs on top of the NODE regime, i) we learn the optimal architecture rather than relying on manually designed ones, ii) we learn smooth ODE solutions that are considered suitable for CF, and iii) we test with various ODE solvers that internally build a diverse set of neural network connections. We also present a novel training method specialized to our method. In our experiments with three benchmark datasets, our method consistently outperforms existing methods in terms of various evaluation metrics. One more important discovery is that our best accuracy was achieved by dense connections.
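
One common way to read a linear GCN as a differential equation can be sketched as follows. This is an illustrative assumption, not necessarily the exact formulation used in LT-OCF: a weight-free propagation layer E_next = A E can be rewritten as E + (A - I) E, which is one explicit Euler step of size 1 for dE/dt = (A - I) E. The helper names and the toy 3-node normalized adjacency are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def propagate(adj_norm, emb):
    """One linear GCN layer: no weight matrix, no nonlinearity."""
    return matmul(adj_norm, emb)


def euler_step(adj_norm, emb, dt):
    """One explicit Euler step of dE/dt = (A_norm - I) E with step size dt."""
    prop = matmul(adj_norm, emb)
    return [
        [e + dt * (p - e) for e, p in zip(e_row, p_row)]
        for e_row, p_row in zip(emb, prop)
    ]


# Symmetrically normalized adjacency of a triangle graph (toy example).
A = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
E = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

layer = propagate(A, E)          # discrete layer
full = euler_step(A, E, 1.0)     # same result via an Euler step of size 1
half = euler_step(A, E, 0.5)     # a smaller step gives an intermediate state
```

Under this reading, the number of layers corresponds to integration time, which is what makes a learnable time a natural quantity to optimize.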

Evaluating Relevance Judgments with Pairwise Discriminative Power

Zhumin Chu
Jiaxin Mao
Fan Zhang
Yiqun Liu
Tetsuya Sakai
Min Zhang
Shaoping Ma

Relevance judgments play an essential role in the evaluation of information retrieval systems. As many different relevance judgment settings have been proposed in recent years, an evaluation metric to compare relevance judgments in different annotation settings has become a necessity. Traditional metrics, such as κ, Krippendorff's α, and Φ, have mainly focused on inter-assessor consistency to evaluate the quality of relevance judgments. They encounter a "reliable but useless" problem when employed to compare different annotation settings (e.g., binary judgment vs. 4-grade judgment). Meanwhile, other popular existing metrics such as discriminative power (DP) are not designed to compare relevance judgments across different annotation settings; they therefore suffer from limitations, such as requiring result ranking lists from different systems. How to design an evaluation metric to compare relevance judgments under different grade settings thus needs further investigation. In this work, we propose a novel metric named pairwise discriminative power (PDP) to evaluate the quality of relevance judgment collections. By leveraging a small number of document-level preference tests, PDP estimates the ability of relevance judgments to separate ranking lists of various qualities. With comprehensive experiments on both synthetic and real-world datasets, we show that PDP maintains a high degree of consistency with annotation quality in various grade settings. Compared with existing metrics (e.g., Krippendorff's α, Φ, and DP), it provides reliable evaluation results with affordable additional annotation effort.

Query Definability and Its Approximations in Ontology-based Data Management

Gianluca Cima
Federico Croce
Maurizio Lenzerini

Given an input dataset (i.e., a set of tuples), query definability in Ontology-based Data Management (OBDM) amounts to finding a query over the ontology whose certain answers coincide with the tuples in the given dataset. We refer to such a query as a characterization of the dataset with respect to the OBDM system. Our first contribution is to propose approximations of perfect characterizations in terms of recall (complete characterizations) and precision (sound characterizations). A second contribution is to present a thorough complexity analysis of three computational problems, namely verification (check whether a given query is a perfect, or an approximated characterization of a given dataset), existence (check whether a perfect, or a best approximated characterization of a given dataset exists), and computation (compute a perfect, or best approximated characterization of a given dataset).

Answering POI-recommendation Questions using Tourism Reviews

Danish Contractor
Krunal Shah
Aditi Partap
Parag Singla
Mausam Mausam

We introduce the novel and challenging task of answering Points-of-interest (POI) recommendation questions, using a collection of reviews that describe candidate answer entities (POIs). We harvest a QA dataset that contains 47,124 paragraph-sized user questions from travelers seeking POI recommendations for hotels, attractions and restaurants. Each question can have thousands of candidate entities to choose from and each candidate is associated with a collection of unstructured reviews. Questions can include requirements based on physical location, budget, timings as well as other subjective considerations related to ambience, quality of service etc. Our dataset requires reasoning over a large number of candidate answer entities (over 5300 per question on average) and we find that running commonly used neural architectures for QA is prohibitively expensive. Further, commonly used retriever-ranker based methods also do not work well for our task due to the nature of review-documents. Thus, as a first attempt at addressing some of the novel challenges of reasoning-at-scale posed by our task, we present a task specific baseline model that uses a three-stage cluster-select-rerank architecture. The model first clusters text for each entity to identify exemplar sentences describing an entity. It then uses a neural information retrieval (IR) module to select a set of potential entities from the large candidate set. A reranker uses a deeper attention-based architecture to pick the best answers from the selected entities. This strategy performs better than a pure retrieval or a pure attention-based reasoning approach yielding nearly 25% relative improvement in Hits@3 over both approaches. To the best of our knowledge we are the first to present an unstructured QA-style task for POI-recommendation, using real-world tourism questions and POI-reviews.
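
For readers unfamiliar with the metric, Hits@3 and a relative improvement can be computed as in the sketch below. The definition assumed here is the usual one (a question counts as a hit if any gold entity appears in the top k ranked candidates); the abstract does not spell it out, and the function names and example data are hypothetical.

```python
def hits_at_k(ranked_lists, gold_sets, k=3):
    """Fraction of questions with at least one gold entity in the top k."""
    hits = sum(
        1
        for ranked, gold in zip(ranked_lists, gold_sets)
        if any(entity in gold for entity in ranked[:k])
    )
    return hits / len(ranked_lists)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, e.g. 0.25 means 25%."""
    return (improved - baseline) / baseline


# Two toy questions: the first has a gold entity at rank 3, the second has none.
ranked_lists = [["a", "b", "c", "d"], ["x", "y", "z"]]
gold_sets = [{"c"}, {"w"}]
score = hits_at_k(ranked_lists, gold_sets, k=3)
```

Note that a relative improvement of 25% is a ratio of scores, not a gain of 25 percentage points.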

Into the Unobservables: A Multi-range Encoder-decoder Framework for COVID-19 Prediction

Yue Cui
Chen Zhu
Guanyu Ye
Ziwei Wang
Kai Zheng

The ongoing COVID-19 pandemic has dramatically changed people's daily lives. A robust forecasting model for COVID-19 infections is essential for governments and institutions to plan timely and accurate interventions. Mainstream solutions for COVID-19 prediction fit reported data by considering observed cases only. However, neglecting the facts that positive samples are incomplete and that much about the novel disease is unknown is prone to cause severe error accumulation, especially in long-term predictions. To fully understand the spreading patterns of the virus, we propose an encoder-decoder framework: (i) in the encoder, we embed historical case data into multiple expose-infection ranges and learn message passing between time slices and across ranges, with coarse-grained human mobility data incorporated; (ii) in the decoder, we decode the embedded features based on reported cases as well as deaths to jointly consider the effect of both observed and hidden data. We model the spreading of the disease in over 60 counties of California and New York, which are two of the most metropolitan areas in the US. The proposed framework significantly outperforms state-of-the-art baselines on the JHU COVID-19 dataset on both weekly and daily prediction tasks. We design detailed ablation studies to verify the effectiveness of each key module and find that the model works not only with the assistance of mobility data but also with cases and deaths alone, which implies broad application scenarios.

Towards Self-Explainable Graph Neural Network

Enyan Dai
Suhang Wang

Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have achieved great success in modeling graphs. However, as an extension of deep learning for graphs, GNNs lack explainability, which largely limits their adoption in scenarios that demand model transparency. Though many efforts have been made to improve the explainability of deep learning, they mainly focus on i.i.d. data and cannot be directly applied to explain the predictions of GNNs, because GNNs utilize both node features and graph topology to make predictions. There is very little work on the explainability of GNNs, and it focuses on post-hoc explanations. Since post-hoc explanations are not directly obtained from the GNNs, they can be biased and misrepresent the true explanations. Therefore, in this paper, we study a novel problem of self-explainable GNNs, which simultaneously give predictions and explanations. We propose a new framework that finds the K nearest labeled nodes for each unlabeled node to give explainable node classification, where the nearest labeled nodes are found by an interpretable similarity module in terms of both node similarity and local structure similarity. Extensive experiments on real-world and synthetic datasets demonstrate the effectiveness of the proposed framework for explainable node classification.

Scaling Up Distance-generalized Core Decomposition

Qiangqiang Dai
Rong-Hua Li
Lu Qin
Guoren Wang
Weihua Yang
Zhiwei Zhang
Ye Yuan

Core decomposition is a fundamental operator in network analysis. In this paper, we study a problem of computing distance-generalized core decomposition on a network. A distance-generalized core, also termed (k, h)-core, is a maximal subgraph in which every vertex has at least k other vertices at distance no larger than h. The state-of-the-art algorithm for solving this problem is based on a peeling technique which iteratively removes the vertex (denoted by v) from the graph that has the smallest h-hop degree. The h-hop degree of a vertex v denotes the number of other vertices that are reachable from v within h hops. Such a peeling algorithm, however, needs to frequently recompute the h-hop degrees of v's neighbors after deleting v, which is typically very costly for a large h. To overcome this limitation, we propose an efficient peeling algorithm based on a novel h-hop degree updating technique. Instead of recomputing the h-hop degrees, our algorithm can dynamically maintain the h-hop degrees for all vertices via exploring a very small subgraph, after peeling a vertex. We show that such an h-hop degree updating procedure can be efficiently implemented by an elegant bitmap technique. In addition, we also propose a sampling-based algorithm and a parallelization technique to further improve the efficiency. Finally, we conduct extensive experiments on 12 real-world graphs to evaluate our algorithms. The results show that, when h≥3, our exact and sampling-based algorithms can achieve up to 10x and 100x speedup over the state-of-the-art algorithm, respectively.
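
The baseline peeling procedure described above can be sketched as follows. This is the naive variant that recomputes every h-hop degree after each deletion, which is exactly the cost the paper's updating technique is designed to avoid; it is not the proposed algorithm. The function names and the adjacency-dict input format are assumptions for illustration.

```python
from collections import deque


def h_hop_degree(adj, alive, v, h):
    """Number of other alive vertices reachable from v within h hops."""
    seen = {v}
    queue = deque([(v, 0)])
    while queue:
        u, dist = queue.popleft()
        if dist == h:
            continue  # do not expand beyond h hops
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                queue.append((w, dist + 1))
    return len(seen) - 1


def naive_kh_core_decomposition(adj, h):
    """Return the (k, h)-core index of every vertex by naive peeling.

    adj maps each vertex to a list of neighbors (undirected graph).
    All h-hop degrees are recomputed after every removal.
    """
    alive = set(adj)
    core = {}
    k = 0
    while alive:
        degrees = {v: h_hop_degree(adj, alive, v, h) for v in alive}
        v = min(alive, key=lambda u: (degrees[u], u))  # smallest h-hop degree
        k = max(k, degrees[v])  # core indices never decrease during peeling
        core[v] = k
        alive.remove(v)
    return core


# Path graph 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
core_h1 = naive_kh_core_decomposition(path, h=1)
core_h2 = naive_kh_core_decomposition(path, h=2)
```

With h = 1 this reduces to the classic k-core decomposition; larger h makes each breadth-first search more expensive, which motivates maintaining the degrees incrementally.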

FiShNet: Fine-Grained Filter Sharing for Resource-Efficient Multi-Task Learning

Xin Dai
Xiangnan Kong
Tian Guo
Xinlu He

Multi-task learning has attracted much attention in recent years, where the goal is to learn multiple tasks by exploiting the similarities and differences between the tasks. Previous research on multi-task learning mainly focuses on flexible methods for feature sharing (e.g., soft sharing) under resource-sufficient settings (e.g., on GPU servers). However, in many real-world applications, we often need to deploy multi-task learning models on resource-constrained platforms (e.g., mobile devices). The high resource requirements of soft-sharing methods can make them hard to deploy on mobile devices. In this paper, we study the problem of Resource-efficient Multi-Task Learning (MTL), where the goal is to design a resource-friendly model that suits resource-constrained inference environments, e.g., security cameras or mobile devices. We formulate the Resource-efficient MTL problem as a fine-grained filter sharing problem, i.e., learning how to share filters at any given convolutional layer among multiple tasks. We propose a novel solution for parameter sharing, called FiShNet. Unlike soft-sharing approaches, where the computational cost per task grows with the number of other tasks, FiShNet achieves accuracy comparable to soft-sharing approaches while consuming only a constant computational cost per task. Unlike hard-sharing approaches, where the parameter sharing structures are hand-picked, FiShNet learns how to share parameters directly from the training data with finer-grained sharing. We evaluate FiShNet on a number of problem settings and datasets for multi-task learning. We show that FiShNet achieves high accuracy when compared with state-of-the-art methods in multi-task learning, while requiring only a fraction of the computational resources.

Mitigating Negative Influence Diffusion is Hard

Gianlorenzo D'Angelo
Mohammad Abouei Mehrizi

How the influence of a set of users diffuses in a social network has been widely studied in the last decades. Most of the work has focused on maximizing the spread of influence or the diffusion of information (e.g., a viral marketing message) starting from a set of initial nodes called seeds. Unfortunately, malicious users can use these algorithms to spread negative messages, consisting of racist or hateful content, misinformation, or fake news. We consider a scenario in which a malicious entity, the attacker, spreads a negative message and another entity, the defender, tries to mitigate the effects of the negative message by spreading another message that invalidates the former with some evidence that its content is wrong. The attacker has the advantage of playing first, knowing that the defender will play afterward, while the defender has the advantage of observing the attacker's spread. We define two optimization problems: the attacker, who is aware of the defender and her budget, selects a set of seeds to maximize the number of influenced nodes; when the attacker's diffusion process is finished, the defender selects her own seeds with the aim of minimizing the number of nodes that remain influenced by the attacker.

Understanding Event Predictions via Contextualized Multilevel Feature Learning

Songgaojun Deng
Huzefa Rangwala
Yue Ning

Deep learning models have been studied to forecast human events using vast volumes of data, yet they still cannot be trusted in certain applications such as healthcare and disaster assistance due to the lack of interpretability. Providing explanations for event predictions not only helps practitioners understand the underlying mechanism of prediction behavior but also enhances the robustness of event analysis. Improving the transparency of event prediction models is challenging given the following factors: (i) multilevel features exist in event data which creates a challenge to cross-utilize different levels of data; (ii) features across different levels and time steps are heterogeneous and dependent; and (iii) static model-level interpretations cannot be easily adapted to event forecasting given the dynamic and temporal characteristics of the data. Recent interpretation methods have proven their capabilities in tasks that deal with graph-structured or relational data. In this paper, we present a Contextualized Multilevel Feature learning framework, CMF, for interpretable temporal event prediction. It consists of a predictor for forecasting events of interest and an explanation module for interpreting model predictions. We design a new context-based feature fusion method to integrate multiple levels of heterogeneous features. We also introduce a temporal explanation module to determine sequences of text and subgraphs that have crucial roles in a prediction. We conduct extensive experiments on several real-world datasets of political and epidemic events. We demonstrate that the proposed method is competitive compared with the state-of-the-art models while possessing favorable interpretation capabilities.

Deep Adversarial Network Alignment

Tyler Derr
Hamid Karimi
Xiaorui Liu
Jiejun Xu
Jiliang Tang

Network alignment, in general, seeks to discover the hidden underlying correspondence between nodes across two (or more) networks when given their network structure. However, most existing network alignment methods make additional assumptions or rely on additional constraints to guide the alignment, such as having a set of seed node-node correspondences across the networks or the existence of side-information. Instead, we seek to develop a general unsupervised network alignment algorithm that makes no additional assumptions. Recently, network embedding has proven effective in many network analysis tasks, but embeddings of different networks are not aligned. Thus, we present our Deep Adversarial Network Alignment (DANA) framework, which first uses deep adversarial learning to discover complex mappings for aligning the embedding distributions of the two networks. Then, using our learned mapping functions, DANA performs an efficient nearest neighbor node alignment. Furthermore, we present an unsupervised heuristic to perform model selection for DANA. We perform experiments on real-world datasets to show the effectiveness of our framework for first aligning the graph embedding distributions and then discovering node alignments that outperform existing methods.

Explanations for Data Repair Through Shapley Values

Daniel Deutch
Nave Frost
Amir Gilad
Oren Sheffer

Data repair, i.e., the identification and fixing of errors in the data, is a central component of the Data Science cycle. As such, significant research effort has been devoted to automating the repair process. Yet it still requires significant manual labor from Data Scientists, who tweak and optimize repair modules (up to 80% of their time, according to surveys).

To this end, we propose in this paper a novel framework for explaining the results of any data repair module. Explanations involve identifying the table cells and database constraints having the strongest influence on the process. Influence, in turn, is quantified through the game-theoretic notion of Shapley values, commonly used for explaining Machine Learning classifier results. The main technical challenge is that exact computation of Shapley values incurs exponential time. We consequently devise and optimize novel approximation algorithms, and analyze them both theoretically and empirically. Our results show the efficiency of our approach when compared to the alternative of adapting existing Shapley value computation techniques to the data repair settings.
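
To make the cost issue concrete, the sketch below contrasts exact Shapley computation, which enumerates all orderings of the players, with the generic permutation-sampling estimator. This estimator is a standard stand-in and is not the paper's own approximation algorithm. The toy value function, in which a repair fires only when a hypothetical constraint `c1` and cell `t1` are both present, is invented for illustration.

```python
import itertools
import random


def _add_marginals(order, value, totals):
    """Add each player's marginal contribution along one ordering."""
    coalition = set()
    prev = value(frozenset(coalition))
    for player in order:
        coalition.add(player)
        cur = value(frozenset(coalition))
        totals[player] += cur - prev
        prev = cur


def exact_shapley(players, value):
    """Exact Shapley values by enumerating all n! orderings."""
    totals = {p: 0.0 for p in players}
    count = 0
    for order in itertools.permutations(players):
        _add_marginals(order, value, totals)
        count += 1
    return {p: t / count for p, t in totals.items()}


def sampled_shapley(players, value, num_samples, seed=0):
    """Monte Carlo estimate from uniformly random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_samples):
        rng.shuffle(order)
        _add_marginals(order, value, totals)
    return {p: t / num_samples for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical repair outcome: 1 if constraint c1 and cell t1 are both present."""
    return 1.0 if {"c1", "t1"} <= coalition else 0.0


players = ["c1", "t1", "t2"]
exact = exact_shapley(players, toy_value)
approx = sampled_shapley(players, toy_value, num_samples=2000)
```

In this toy game the credit is split between `c1` and `t1`, and the irrelevant cell `t2` receives none; the sampled estimate needs only a fixed number of orderings regardless of how many players there are.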

A Deep Learning Framework for Self-evolving Hierarchical Community Detection

Daizong Ding
Mi Zhang
Hanrui Wang
Xudong Pan
Min Yang
Xiangnan He

Hierarchical community detection, which aims at discovering the hierarchical structure of a graph, attracts increasing attention due to its wide range of applications. However, due to the difficulty of parametrizing the community tree, existing methods mainly rely on heuristic algorithms, which are limited by their low accuracy and inability to handle new observations. As far as we know, how to leverage deep learning techniques to better discover hierarchical communities remains largely unexplored in the existing literature. In this paper, we present the first deep learning framework for hierarchical community detection, called ReinCom. To address the challenge of parametrizing the community tree, we propose a novel growing-up process where, at each step, we first partition nodes into the community tree and then adjust the community tree according to the partition results. To learn an optimal growing-up process, we propose an embedding agent and a community agent to implement the two sub-steps, respectively. Furthermore, we propose an online learning strategy for new observations on the graph. Empirical results show that our proposed model is more effective than the state-of-the-art methods. For example, in terms of modularity, the performance of ReinCom is 33% higher than that of previous community detection methods. Besides, with the aid of the learned node embeddings, we also devise a graph visualization algorithm that consistently reflects the latent hierarchical structure of a graph.

Semi-deterministic and Contrastive Variational Graph Autoencoder for Recommendation

Yue Ding
Yuxiang Shi
Bo Chen
Chenghua Lin
Hongtao Lu
Jie Li
Ruiming Tang
Dong Wang

Variational AutoEncoder (VAE) is a popular deep generative framework with a solid theoretical basis. There are many research efforts on improving VAE. Among the existing works, a recently proposed deterministic Regularized AutoEncoder (RAE) provides a new scheme for generative modeling. RAE fixes the variance of the inferred Gaussian approximate posterior distribution as a hyperparameter, and substitutes the stochastic encoder by injecting noise into the input of a deterministic decoder. However, the deterministic RAE has three limitations: 1) RAE needs to fit the variance; 2) RAE requires ex-post density estimation to ensure sample quality; 3) RAE employs an additional gradient regularization to ensure training smoothness. This raises an interesting research question: Can we maintain the flexibility of variational inference while simplifying VAE, and at the same time ensure a smooth training process to obtain good generative performance? Based on the above motivation, in this paper, we propose a novel Semi-deterministic and Contrastive Variational Graph autoencoder (SCVG) for item recommendation. The core design of SCVG is to learn the variance of the approximate Gaussian posterior distribution in a semi-deterministic manner by aggregating inferred mean vectors from other connected nodes via a graph convolution operation. We analyze the expressive power of SCVG for the Weisfeiler-Lehman graph isomorphism test, and we deduce the simplified form of the evidence lower bound of SCVG. Besides, we introduce an efficient contrastive regularization instead of gradient regularization. We empirically show that the contrastive regularization makes the learned user/item latent representations more personalized and helps to smooth the training process. We conduct extensive experiments on three real-world datasets to show the superiority of our model over state-of-the-art methods for the item recommendation task. The code is available at https://github.com/syxkason/SCVG.

AdaGNN: Graph Neural Networks with Adaptive Frequency Response Filter

Yushun Dong
Kaize Ding
Brian Jalaian
Shuiwang Ji
Jundong Li

Graph Neural Networks have recently become a prevailing paradigm for various high-impact graph analytical problems. Existing efforts can be mainly categorized as spectral-based and spatial-based methods. The major challenge for the former is to find an appropriate graph filter to distill discriminative information from input signals for learning. Recently, many efforts have been made to design better graph filters, e.g., the Graph Convolutional Network (GCN), which leverages Chebyshev polynomial truncation to approximate graph filters and bridges these two families of methods. Nevertheless, recent studies have shown that GCN and its variants essentially employ fixed low-pass filters to denoise information. Thus their learning capability is rather limited, and they may over-smooth node representations at deeper layers. To tackle these problems, we develop a novel graph neural network framework AdaGNN with a well-designed adaptive frequency response filter. At its core, AdaGNN leverages a simple but elegant trainable filter that spans across multiple layers to capture the varying importance of different frequency components for node representation learning. The inherent differences among different feature channels are also well captured by the filter. As such, it empowers AdaGNN with stronger expressiveness and naturally alleviates the over-smoothing problem. We empirically validate the effectiveness of the proposed framework on various benchmark datasets. Theoretical analysis is also provided to show the superiority of the proposed AdaGNN. The open-source implementation of AdaGNN can be found here: https://github.com/yushundong/AdaGNN.
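As a rough illustration of what a per-channel trainable frequency filter can look like, the sketch below applies X - L X diag(phi) on a tiny graph, so that channel j has frequency response 1 - phi[j] * lambda on a Laplacian eigenvalue lambda. This specific form, the function names, and the toy values are assumptions made for illustration; the abstract does not state the exact filter, and the linked repository is the authoritative implementation.

```python
def matmul(A, B):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def adaptive_filter(L, X, phi):
    """Return X - L X diag(phi).

    L   : normalized graph Laplacian (n x n)
    X   : node features (n x c)
    phi : one filter strength per feature channel (length c);
          in a trained model these would be learned (assumed form).
    """
    LX = matmul(L, X)
    return [[X[i][j] - phi[j] * LX[i][j] for j in range(len(phi))]
            for i in range(len(X))]


# Two nodes joined by one edge: the normalized Laplacian has eigenvalues 0 and 2.
L = [[1.0, -1.0],
     [-1.0, 1.0]]
# Both channels carry the same signal (node 0 -> 1.0, node 1 -> 3.0).
X = [[1.0, 1.0],
     [3.0, 3.0]]
# Channel 0 removes the high-frequency component (1 - 0.5 * 2 = 0);
# channel 1 leaves the signal untouched (phi = 0).
phi = [0.5, 0.0]

filtered = adaptive_filter(L, X, phi)
print(filtered)
```

With these toy values, channel 0 is averaged across the two nodes while channel 1 is preserved, which is the kind of channel-specific behavior a single fixed low-pass filter cannot express.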

AdaRNN: Adaptive Learning and Forecasting of Time Series

Yuntao Du
Jindong Wang
Wenjie Feng
Sinno Pan
Tao Qin
Renjun Xu
Chongjun Wang

Time series has wide applications in the real world and is known to be difficult to forecast. Since its statistical properties change over time, its distribution also changes temporally, which causes a severe distribution shift problem for existing methods. However, modeling time series from the distribution perspective remains unexplored. In this paper, we term this problem Temporal Covariate Shift (TCS). This paper proposes Adaptive RNNs (AdaRNN) to tackle the TCS problem by building an adaptive model that generalizes well on the unseen test data. AdaRNN is sequentially composed of two novel algorithms. First, we propose Temporal Distribution Characterization to better characterize the distribution information in the time series (TS). Second, we propose Temporal Distribution Matching to reduce the distribution mismatch in TS to learn the adaptive TS model. AdaRNN is a general framework with flexible distribution distances integrated. Experiments on human activity recognition, air quality prediction, and financial analysis show that AdaRNN outperforms the latest methods by a classification accuracy of 2.6% and significantly reduces the RMSE by 9.0%. We also show that the temporal distribution matching algorithm can be extended to the Transformer structure to boost its performance.

Query-Variant Advertisement Text Generation with Association Knowledge

Siyu Duan
Wei Li
Jing Cai
Yancheng He
Yunfang Wu

Online advertising is an important revenue source for many IT companies. In the search advertising scenario, advertisement text that meets the need of the search query would be more attractive to the user. However, the manual creation of query-variant advertisement texts for massive items is expensive. Traditional text generation methods tend to focus on the general searching needs with high frequency while ignoring the diverse personalized searching needs with low frequency. In this paper, we propose the query-variant advertisement text generation task that aims to generate candidate advertisement texts for different web search queries with various needs based on queries and item keywords. To solve the problem of ignoring low-frequency needs, we propose a dynamic association mechanism to expand the receptive field based on external knowledge, which can obtain associated words to be added to the input. These associated words can serve as bridges to transfer the ability of the model from the familiar high-frequency words to the unfamiliar low-frequency words. With association, the model can make use of various personalized needs in queries and generate query-variant advertisement texts. Both automatic and human evaluations show that our model can generate more attractive advertisement text than baselines.

Fine and Coarse Granular Argument Classification before Clustering

Lorik Dumani
Tobias Wiesenfeldt
Ralf Schenkel

Computational argumentation and especially argument mining together with retrieval enjoys increasing popularity. In contrast to standard search engines that focus on finding documents relevant to a query, argument retrieval aims at finding the best supporting and attacking premises given a query claim, e.g., from a predefined collection of arguments. Here, a claim is the central part of an argument representing the standpoint of a speaker with the goal to persuade the audience, and a premise serves as evidence to the claim. In addition to the actual retrieval process, existing work has focused on (1) classifying polarities of arguments into supporting or opposing, (2) classifying arguments by their frames (such as economic or environmental), and (3) clustering similar arguments by their meaning to avoid repetitions in the result list. For experiments, either hand-made argument collections or arguments extracted from debate portals were used. In this paper, we extend existing work on argument clustering, making the following contributions: First, we introduce a novel pipeline for clustering arguments. While previous work classified arguments either by polarity, frame, or meaning, our pipeline incorporates these three, allowing a more systematic presentation of arguments. Second, we introduce a new dataset consisting of 365 argument graphs accompanying more than 11,000 high-quality arguments that, contrary to previous datasets, have been generated, displayed, and verified by journalists and were published in newspapers. A thorough evaluation with this dataset provides a first baseline for future work.

Continuous-Time Sequential Recommendation with Temporal Graph Collaborative Transformer

Ziwei Fan
Zhiwei Liu
Jiawei Zhang
Yun Xiong
Lei Zheng
Philip S. Yu

In order to model the evolution of user preference, we should learn user/item embeddings based on time-ordered item purchasing sequences, which is defined as the Sequential Recommendation (SR) problem. Existing methods leverage sequential patterns to model item transitions. However, most of them ignore crucial temporal collaborative signals, which are latent in evolving user-item interactions and coexist with sequential patterns. Therefore, we propose to unify sequential patterns and temporal collaborative signals to improve the quality of recommendation, which is rather challenging. Firstly, it is hard to simultaneously encode sequential patterns and collaborative signals. Secondly, it is non-trivial to express the temporal effects of collaborative signals.

Hence, we design a new framework, Temporal Graph Sequential Recommender (TGSRec), upon our defined continuous-time bipartite graph. We propose a novel Temporal Collaborative Transformer (TCT) layer in TGSRec, which advances the self-attention mechanism by adopting a novel collaborative attention. The TCT layer can simultaneously capture collaborative signals from both users and items while also considering temporal dynamics inside sequential patterns. We propagate the information learned from the TCT layer over the temporal graph to unify sequential patterns and temporal collaborative signals. Empirical results on five datasets show that TGSRec significantly outperforms other baselines, with average absolute improvements of up to 22.5% and 22.1% in Recall@10 and MRR, respectively.

Large-scale Secure XGB for Vertical Federated Learning

Wenjing Fang
Derun Zhao
Jin Tan
Chaochao Chen
Chaofan Yu
Li Wang
Lei Wang
Jun Zhou
Benyu Zhang

Privacy-preserving machine learning has drawn increasing attention recently, especially as various privacy regulations come into force. In this situation, Federated Learning (FL) has emerged to facilitate privacy-preserving joint modeling among multiple parties. Although many federated algorithms have been extensively studied, there is still a lack of secure and practical gradient tree boosting models (e.g., XGB) in the literature. In this paper, we aim to build large-scale secure XGB under the vertical federated learning setting. We guarantee data privacy from three aspects. Specifically, (1) we employ secure multi-party computation techniques to avoid leaking intermediate information during training, (2) we store the output model in a distributed manner in order to minimize information release, and (3) we provide a novel algorithm for secure XGB prediction with the distributed model. Furthermore, by proposing secure permutation protocols, we can improve the training efficiency and make the framework scale to large datasets. We conduct extensive experiments on both public datasets and real-world datasets, and the results demonstrate that our proposed XGB models provide not only competitive accuracy but also practical performance.

HyperGraph Convolution Based Attributed HyperGraph Clustering

Barakeel Fanseu Kamhoua
Lin Zhang
Kaili Ma
James Cheng
Bo Li
Bo Han

Attributed Graph Clustering (AGC) and Attributed Hypergraph Clustering (AHC) are important topics in graph mining with many applications. For AGC, amongst the unsupervised methods that combine the graph structure with node attributes, graph convolution has been shown to achieve impressive results. However, the effects of graph convolution on AGC have not yet been adequately studied. In this paper, we show that graph convolution attempts to find the best trade-off between node attribute distance and the number of inter-cluster edges. On the one hand, we show that compared to clustering node attributes directly, graph convolution produces a greater distance between node attributes in the same cluster and a smaller distance between node attributes in different clusters (which is detrimental for clustering). On the other hand, we show that graph convolution benefits clustering by considerably reducing the number of edges among different clusters. We then extend our result on AGC to AHC and leverage the hypergraph convolution to propose an unsupervised, fast, and memory-efficient algorithm (GRAC) for AHC, which achieves excellent performance on popular supervised clustering measures.

CANN: Coupled Approximation Neural Network for Partial Domain Adaptation

Cheng Feng
Chaoliang Zhong
Jie Wang
Jun Sun
Yasuto Yokota

Unsupervised domain adaptation (UDA) methods aim to transfer knowledge from a labeled source domain to an unlabeled target domain. Most existing UDA methods try to learn domain-invariant features so that the classifier trained by the source labels can automatically be adapted to the target domain. However, recent works have shown the limitations of these methods when label distributions differ between the source and target domains. In particular, in partial domain adaptation (PDA), where the source domain holds many labels (private labels) that do not appear in the target domain, the domain-invariant features can cause catastrophic performance degradation. In this paper, based on the originally favorable underlying structures of the two domains, we learn two kinds of target features, i.e., the source-approximate features and target-approximate features, instead of the domain-invariant features. The source-approximate features utilize the consistency of the two domains to estimate the distribution of the source private labels. The target-approximate features enhance the feature discrimination in the target domain while detecting the hard (outlier) target samples. A novel Coupled Approximation Neural Network (CANN) is proposed to co-train the source-approximate and target-approximate features by two parallel sub-networks without sharing the parameters. We apply CANN to three prevalent transfer learning benchmark datasets, Office-Home, Office-31, and Visda2017, with both UDA and PDA settings. The results show that CANN outperforms all baselines by a large margin in PDA and also performs best in UDA.

Zero Shot on the Cold-Start Problem: Model-Agnostic Interest Learning for Recommender Systems

Philip J. Feng
Pingjun Pan
Tingting Zhou
Hongxiang Chen
Chuanjiang Luo

User behavior has been validated to be effective in revealing personalized preferences for commercial recommendations. However, few user-item interactions can be collected for new users, which results in a nullspace for their interests, i.e., the cold-start dilemma. In this paper, a two-tower framework, namely, the model-agnostic interest learning (MAIL) framework, is proposed to address the cold-start recommendation (CSR) problem for recommender systems. In MAIL, one unique tower is constructed to tackle the CSR from a zero-shot view, and the other tower focuses on the general ranking task. Specifically, the zero-shot tower first performs cross-modal reconstruction with dual autoencoders to obtain virtual behavior data from highly aligned hidden features for new users; the ranking tower can then output recommendations for users based on the data completed by the zero-shot tower. Practically, the ranking tower in MAIL is model-agnostic and can be implemented with any embedding-based deep model. Based on the co-training of the two towers, MAIL presents an end-to-end method for recommender systems that shows an incremental performance improvement. The proposed method has been successfully deployed on the live recommendation system of NetEase Cloud Music to achieve a click-through rate improvement of 13% to 15% for millions of users. Offline experiments on real-world datasets also show its superior performance in CSR. Our code is available.

CMML: Contextual Modulation Meta Learning for Cold-Start Recommendation

Xidong Feng
Chen Chen
Dong Li
Mengchen Zhao
Jianye Hao
Jun Wang

Practical recommender systems experience a cold-start problem when observed user-item interactions in the history are insufficient. Meta learning, especially the gradient-based variety, can be adopted to tackle this problem by learning initial parameters of the model and thus allowing fast adaptation to a specific task from limited data examples. Although it yields significant performance improvements, it commonly suffers from two critical issues: incompatibility with mainstream industrial deployment and a heavy computational burden, both due to the inner-loop gradient operation. These two issues make it hard to apply in practical recommender systems. To enjoy the benefits of the meta learning framework and mitigate these problems, we propose a recommendation framework called Contextual Modulation Meta Learning (CMML). CMML is composed of fully feed-forward operations, so it is computationally efficient and completely compatible with mainstream industrial deployment. CMML consists of three components: a context encoder that generates a context embedding to represent a specific task, a hybrid context generator that aggregates specific user-item features with task-level context, and a contextual modulation network that modulates the recommendation model to adapt effectively. We validate our approach in both scenario-specific and user-specific cold-start settings on various real-world datasets, showing that CMML can achieve performance comparable to or even better than gradient-based methods, with higher computational efficiency and better interpretability.

Popcorn: Human-in-the-loop Popularity Debiasing in Conversational Recommender Systems

Zuohui Fu
Yikun Xian
Shijie Geng
Gerard de Melo
Yongfeng Zhang

Recent conversational recommender systems (CRS) provide a promising solution to accurately capture a user's preferences by communicating with users in natural language to interactively guide them while pro-actively eliciting their current interests. Previous research on this mainly focused on either learning a supervised model with semantic features extracted from the user's responses, or training a policy network to control the dialogue state. However, none of them has considered the issue of popularity bias in a CRS. This paper proposes a human-in-the-loop popularity debiasing framework that integrates real-time semantic understanding of open-ended user utterances as well as historical records, while also effectively managing the dialogue with the user. This allows the CRS to balance the recommendation performance as well as the item popularity so as to avoid the well-known "long-tail" effect. We demonstrate the effectiveness of our approach via experiments on two conversational recommendation datasets, and the results confirm that our proposed approach achieves high-accuracy recommendation while mitigating popularity bias.

Fast and Accurate Anchor Graph-based Label Prediction

Yasuhiro Fujiwara
Yasutoshi Ida
Atsutoshi Kumagai
Sekitoshi Kanai
Naonori Ueda

Anchor graphs are a popular tool used in label prediction of sparsely labeled data. In anchor graphs, labels of labeled data are propagated to unlabeled data via anchor points; anchor points are the centers of k-means clusters. Anchor graph-based label prediction determines local weights between data points and anchor points by exploiting Nesterov's method to obtain the graph's adjacency matrix, and it inverts a matrix obtained from the adjacency matrix to predict labels. This approach, however, incurs high computation cost since (1) Nesterov's method is applied to all closest anchor points to compute local weights, and (2) the computation cost of the matrix inversion is cubic in the number of anchor points. We propose an approach that can efficiently perform anchor graph-based label prediction because of its two key advances: (1) it prunes unnecessary anchor points so they are not passed to Nesterov's method, and (2) it applies the conjugate gradient method in computing labels of data points to avoid matrix inversion. In addition, we propose to exploit basis vectors computed by SVD as anchor points to improve label prediction accuracy. Experiments show that our approach outperforms the previous approaches in terms of efficiency and accuracy.
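To make the second advance concrete, the following is a minimal sketch of the textbook conjugate gradient method, which solves A x = b for a symmetric positive-definite A using only matrix-vector products and never forms the inverse. The matrix, right-hand side, and function name here are toy assumptions; how the actual linear system is built from the anchor graph is not described in the abstract and is not reproduced.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (list of rows)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # residual norm small enough: converged
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        # New direction is conjugate to the previous ones with respect to A.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Toy system (assumed values): the exact solution is x = [1/11, 7/11].
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)
```

Each iteration costs one matrix-vector product, so a sparse or well-conditioned system can be solved far more cheaply than by a cubic-time inversion.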

Pruning Meta-Trained Networks for On-Device Adaptation

Dawei Gao
Xiaoxi He
Zimu Zhou
Yongxin Tong
Lothar Thiele

Adapting neural networks to unseen tasks with few training samples on resource-constrained devices benefits various Internet-of-Things applications. Such neural networks should learn the new tasks in few shots and be compact in size. Meta-learning enables few-shot learning, yet the meta-trained networks can be over-parameterised. However, a naive combination of standard compression techniques like network pruning with meta-learning jeopardises the ability for fast adaptation. In this work, we propose adaptation-aware network pruning (ANP), a novel pruning scheme that works with existing meta-learning methods for a compact network capable of fast adaptation. ANP uses a weight importance metric that is based on the sensitivity of the meta-objective rather than the conventional loss function, and adopts an approximation of derivatives and layer-wise pruning techniques to reduce the overhead of computing the new importance metric. Evaluations on few-shot classification benchmarks show that ANP can prune meta-trained convolutional and residual networks by 85% without affecting their fast adaptation.

Learning An End-to-End Structure for Retrieval in Large-Scale Recommendations

Weihao Gao
Xiangjun Fan
Chong Wang
Jiankai Sun
Kai Jia
Wenzi Xiao
Ruofan Ding
Xingyan Bin
Hui Yang
Xiaobing Liu

One of the core problems in large-scale recommendations is to retrieve top relevant candidates accurately and efficiently, preferably in sub-linear time. Previous approaches are mostly based on a two-step procedure: first learn an inner-product model, and then use some approximate nearest neighbor (ANN) search algorithm to find top candidates. In this paper, we present Deep Retrieval (DR), to learn a retrievable structure directly with user-item interaction data (e.g. clicks) without resorting to the Euclidean space assumption in ANN algorithms. DR's structure encodes all candidate items into a discrete latent space. Those latent codes for the candidates are model parameters and learnt together with other neural network parameters to maximize the same objective function. With the model learnt, a beam search over the structure is performed to retrieve the top candidates for reranking. Empirically, we first demonstrate that DR, with sub-linear computational complexity, can achieve almost the same accuracy as the brute-force baseline on two public datasets. Moreover, we show that, in a live production recommendation system, a deployed DR approach significantly outperforms a well-tuned ANN baseline in terms of engagement metrics. To the best of our knowledge, DR is among the first non-ANN algorithms successfully deployed at the scale of hundreds of millions of items for industrial recommendation systems.
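The retrieval step can be pictured as a beam search over layer-by-layer choices in a discrete structure. The sketch below is generic: the scoring interface `step_log_probs`, the toy probability table, and the depth and width values are hypothetical and stand in for the learned model, and the mapping from retrieved paths back to candidate items is omitted.

```python
import math


def beam_search(step_log_probs, depth, num_nodes, beam_width):
    """Return the `beam_width` best paths as (path, total_log_prob) pairs.

    step_log_probs(prefix) must return `num_nodes` log-probabilities for the
    next node given the path prefix chosen so far (hypothetical interface).
    """
    beams = [((), 0.0)]
    for _ in range(depth):
        candidates = []
        for path, score in beams:
            log_probs = step_log_probs(path)
            for node in range(num_nodes):
                candidates.append((path + (node,), score + log_probs[node]))
        # Keep only the highest-scoring partial paths for the next layer.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_scorer(prefix):
    """Toy stand-in for a learned model: fixed probabilities per layer."""
    table = [[0.7, 0.2, 0.1],   # layer 0
             [0.1, 0.6, 0.3]]   # layer 1
    return [math.log(p) for p in table[len(prefix)]]


top_paths = beam_search(toy_scorer, depth=2, num_nodes=3, beam_width=2)
for path, score in top_paths:
    print(path, round(math.exp(score), 2))
```

The work per layer is proportional to beam width times nodes per layer, independent of the total number of items, which is how a search of this kind can stay sub-linear in the catalog size.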

Evaluating Human-AI Hybrid Conversational Systems with Chatbot Message Suggestions

Zihan Gao
Jiepu Jiang

AI chatbots can offer suggestions to help humans answer questions by reducing text entry effort and providing relevant knowledge for unfamiliar questions. We study whether chatbot suggestions can help people answer knowledge-demanding questions in a conversation and influence response quality and efficiency. We conducted a large-scale crowdsourcing user study and evaluated 20 hybrid system variants and a human-only baseline. The hybrid systems used four chatbots of varied response quality and differed in the number of suggestions and whether to preset the message box with top suggestions.

Experimental results show that chatbot suggestions, even using poor-performing chatbots, have consistently improved response efficiency. Compared with the human-only setting, hybrid systems have reduced response time by 12% to 35% and keystrokes by 33% to 60%, and users have adopted a suggestion for the final response without any changes in 44% to 68% of the cases. In contrast, crowd workers in the human-only setting typed most of the response texts and copied 5% of the answers from other sites.

However, we also found that chatbot suggestions did not always help response quality. Specifically, in hybrid systems equipped with poor-performing chatbots, users responded with lower-quality answers than others in the human-only setting. It seems that users would not simply ignore poor suggestions and compose responses as they could without seeing the suggestions. Besides, presetting the message box has improved reply efficiency without hurting response quality. We did not find that showing more suggestions helps or hurts response quality or efficiency consistently. Our study reveals how and when AI chatbot suggestions can help people answer questions in hybrid conversational systems.

Computing and Maintaining Provenance of Query Result Probabilities in Uncertain Knowledge Graphs

Garima Gaur
Abhishek Dang
Arnab Bhattacharya
Srikanta Bedathur

Knowledge graphs (KG) model relationships between entities as labeled edges (or facts). They are mostly constructed using a suite of automated extractors, thereby inherently leading to uncertainty in the extracted facts. Modeling the uncertainty as probabilistic confidence scores results in a probabilistic knowledge graph. Graph queries over such probabilistic KGs require answer computation along with the computation of result probabilities, i.e., probabilistic inference. We propose a system, HAPPI (How Provenance of Probabilistic Inference), to handle such query processing and inference. Complying with the standard provenance semiring model, we propose a novel commutative semiring to symbolically compute the probability of the result of a query. These provenance-polynomial-like symbolic expressions encode fine-grained information about the probability computation process. We leverage this encoding to efficiently compute as well as maintain probabilities of results even as the underlying KG changes. Focusing on conjunctive basic graph pattern queries, we observe that HAPPI is more efficient than knowledge compilation for answering commonly occurring queries with a lower range of probability derivation complexity. We propose an adaptive system that leverages the strengths of both HAPPI and compilation-based techniques, not only to perform efficient probabilistic inference and compute its provenance, but also to incrementally maintain them.
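The idea of keeping a symbolic record of how a result was derived, and computing its probability from that record, can be sketched in a generic way. Below, each derivation of a result is the set of facts it uses, facts are assumed independent, and the probability is obtained by inclusion-exclusion. This is not HAPPI's semiring; the fact names, probabilities, and function name are illustrative assumptions, and the approach is exponential in the number of derivations.

```python
from itertools import combinations


def result_probability(derivations, fact_prob):
    """P(at least one derivation holds), assuming independent facts.

    derivations : list of frozensets of fact ids (one set per derivation)
    fact_prob   : dict mapping fact id -> probability
    """
    total = 0.0
    for r in range(1, len(derivations) + 1):
        sign = 1.0 if r % 2 == 1 else -1.0
        for subset in combinations(derivations, r):
            # All derivations in the subset hold iff every fact they use holds.
            facts = frozenset().union(*subset)
            term = 1.0
            for f in facts:
                term *= fact_prob[f]
            total += sign * term
    return total


# Toy provenance: the result holds via (e1 AND e2) or via (e1 AND e3).
fact_prob = {"e1": 0.9, "e2": 0.5, "e3": 0.8}
derivations = [frozenset({"e1", "e2"}), frozenset({"e1", "e3"})]
p = result_probability(derivations, fact_prob)
print(round(p, 4))

# If a fact's confidence changes, the stored derivations are simply
# re-evaluated; the graph query itself does not need to be re-run.
fact_prob["e2"] = 0.0
p_updated = result_probability(derivations, fact_prob)
print(round(p_updated, 4))
```

Because the derivations are kept symbolically, an update to the underlying facts only requires re-evaluating the expression, which is the maintenance benefit the abstract alludes to.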

To Be or not to Be, Tail Labels in Extreme Multi-label Learning

Zhiqi Ge
Ximing Li

EXtreme Multi-label Learning (XML) aims to predict, for each instance, its most relevant subset of labels from an extremely large label space, which often contains a million or more labels in real applications. In XML scenarios, the labels exhibit a long-tail distribution, where a significant number of labels, referred to as tail labels, appear in very few instances. Unfortunately, due to the lack of positive instances, the tail labels are hard to learn as well as to predict. Several previous studies even suggested that the tail labels can be directly removed by referring to their label frequencies. We consider that such a crude principle may miss many significant tail labels, because the predictive accuracy is not strictly consistent with the label frequency, especially for tail labels. In this paper, we are interested in finding a reasonable principle to determine whether a tail label should be removed that does not depend only on its label frequency. To this end, we investigate a method named Nearest Neighbor Positive Proportion Score (N2P2S) to score the tail labels by annotations of the instance neighbors. Extensive empirical results indicate that the proposed N2P2S can effectively screen the tail labels, where many preserved tail labels can be learned and accurately predicted even with very few positive instances.

Hierarchical Semantics Matching For Heterogeneous Spatio-temporal Sources

Daniel Glake
Norbert Ritter
Florian Ocker
Nima Ahmady-Moghaddam
Daniel Osterholz
Ulfia Lenfers
Thomas Clemen

Spatio-temporal data are semantically valuable information used for various analytical tasks to identify spatially relevant and temporally limited correlations within a domain. The increasing availability and data acquisition from multiple sources with their typically high heterogeneity are getting more and more attention. However, these sources often lack interconnecting shared keys, making their integration a challenging problem. For example, publicly available parking data that consist of point data on parking facilities with fluctuating occupancy and static location data on parking spaces cannot be directly correlated. Both data sets describe two different aspects from distinct sources in which parking spaces and fluctuating occupancy are part of the same semantic model object. Especially for ad hoc analytical tasks on integrated models, these missing relationships cannot be handled using join operations as usual in relational databases. The reason lies in the lack of equijoin relationships, comparing for equality of strings and additional overhead in loading data up before processing. This paper addresses the optimization problem of finding suitable partners in the absence of equijoin relations for heterogeneous spatio-temporal data, applicable to ad hoc analytics. We propose a graph-based approach that achieves good recall and performance scaling via hierarchically separating the semantics along spatial, temporal, and domain-specific dimensions. We evaluate our approach using public data, showing that it is suitable for many standard join scenarios and highlighting its limitations.

Zero-shot Relation Classification from Side Information

Jiaying Gong
Hoda Eldardiry

We propose a zero-shot learning relation classification (ZSLRC) framework that improves on the state of the art by its ability to recognize novel relations that were not present in training data. The zero-shot learning approach mimics the way humans learn and recognize new concepts with no prior knowledge. To achieve this, ZSLRC uses advanced prototypical networks that are modified to utilize weighted side (auxiliary) information. ZSLRC's side information is built from keywords, hypernyms of named entities, and labels and their synonyms. ZSLRC also includes an automatic hypernym extraction framework that acquires hypernyms of various named entities directly from the web. ZSLRC improves on state-of-the-art few-shot learning relation classification methods that rely on labeled training data and is therefore applicable more widely even in real-world scenarios where some relations have no corresponding labeled examples for training. We present results using extensive experiments on two public datasets (NYT and FewRel) and show that ZSLRC significantly outperforms state-of-the-art methods on supervised learning, few-shot learning, and zero-shot learning tasks. Our experimental results also demonstrate the effectiveness and robustness of our proposed model.

Driving the Herd: Search Engines as Content Influencers

Gregory Goren
Oren Kurland
Moshe Tennenholtz
Fiana Raiber

In competitive search settings such as the Web, many documents' authors (publishers) opt to have their documents highly ranked for some queries. To this end, they modify the documents (specifically, their content) in response to induced rankings. Thus, the search engine affects the content in the corpus via its ranking decisions. We present a first study of the ability of search engines to drive pre-defined, targeted, content effects in the corpus using simple techniques. The first is based on the herding phenomenon, a celebrated result from the economics literature, and the second is based on biasing the relevance ranking function. The types of content effects we study are either topical or touch on specific document properties: length and inclusion of query terms. Analysis of ranking competitions we organized between incentivized publishers shows that the types of content effects we target can indeed be attained by applying our suggested techniques. These findings have important implications with regard to the role of search engines in shaping the corpus.

Mining Bursty Groups from Interaction Data

Alexander Gorovits
Lin Zhang
Ekta Gujral
Evangelos Papalexakis
Petko Bogdanov

Empirical studies and theoretical models both highlight burstiness as a common temporal pattern in online behavior. A key driver for burstiness is the self-exciting nature of online interactions. For example, posts in online groups often incite posts in response. Such temporal dependencies are easily lost when interaction data is aggregated in snapshots which are subsequently analyzed independently. An alternative is to model individual interactions as a multi-dimensional self-exciting process, thus enforcing both temporal and network dependencies. Point processes, however, are challenging to employ for large real-world datasets as fitting them incurs super-linear cost in the number of events. How can we efficiently detect online groups exhibiting bursty self-exciting temporal behavior in large real-world datasets?

We propose a bursty group detection framework, called MYRON, which explicitly models self-exciting behavior within groups while also accounting for network-wide baseline activity. MYRON imposes bursty temporal structure within a scalable tensor factorization framework to decouple within-group interactions as interpretable factors. Our framework can incorporate different "shapes" of temporal burstiness via wavelet decomposition or kernels for self-exciting behavior. Our evaluation on both synthetic and real-world data demonstrates MYRON's utility in community detection. It is up to 40% more effective in detecting ground truth groups compared to state-of-the-art baselines. In addition, MYRON is able to uncover interpretable bursty patterns of behavior from user-photo interactions in Flickr.
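For readers unfamiliar with self-exciting behavior, the sketch below evaluates the conditional intensity of a one-dimensional Hawkes process with an exponential kernel, a standard textbook model in which every past event temporarily raises the rate of future events. The exponential kernel, the parameter values, and the function name are assumptions chosen for illustration; this is not MYRON's tensor factorization model.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Intensity mu + sum over past events of alpha * exp(-beta * (t - t_i)).

    mu    : baseline rate
    alpha : jump in intensity caused by each event
    beta  : decay rate of that excitation
    """
    excitation = sum(alpha * math.exp(-beta * (t - ti))
                     for ti in event_times if ti < t)
    return mu + excitation


# Toy burst (assumed values): two posts in quick succession.
events = [1.0, 1.2]
mu, alpha, beta = 0.1, 0.5, 1.0

before_burst = hawkes_intensity(0.5, events, mu, alpha, beta)
during_burst = hawkes_intensity(1.3, events, mu, alpha, beta)
long_after = hawkes_intensity(5.0, events, mu, alpha, beta)
print(before_burst, during_burst, long_after)
```

The intensity equals the baseline before any event, spikes right after the burst, and decays back toward the baseline, which is the temporal pattern lost when events are aggregated into independent snapshots.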

Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing

Sindhu C. M. Gowda
Shalmali Joshi
Haoran Zhang
Marzyeh Ghassemi

Machine learning models achieve state-of-the-art performance on many supervised learning tasks. However, prior evidence suggests that these models may learn to rely on "shortcut" biases or spurious correlations (intuitively, correlations that do not hold in the test as they hold in train) for good predictive performance. Such models cannot be trusted in deployment environments to provide accurate predictions. While viewing the problem from a causal lens is known to be useful, the seamless integration of causation techniques into machine learning pipelines remains cumbersome and expensive. In this work, we study and extend a causal pre-training debiasing technique called causal bootstrapping (CB) under five practical confounded-data generation-acquisition scenarios (with known and unknown confounding). Under these settings, we systematically investigate the effect of confounding bias on deep learning model performance, demonstrating their propensity to rely on shortcut biases when these biases are not properly accounted for. We demonstrate that such a causal pre-training technique can significantly outperform existing base practices to mitigate confounding bias on real-world domain generalization benchmarking tasks. This systematic investigation underlines the importance of accounting for the underlying data-generating mechanisms and fortifying data-preprocessing pipelines with a causal framework to develop methods robust to confounding biases.

VPALG: Paper-publication Prediction with Graph Neural Networks

Renchu Guan
Yonghao Liu
Xiaoyue Feng
Ximing Li

Paper-publication venue prediction aims to predict candidate publication venues that effectively suit given submissions. This technology is developing rapidly with the popularity of machine learning models. However, most previous methods ignore the structure information of papers, while modeling them with graphs can naturally solve this drawback. Meanwhile, they use either hand-crafted or bag-of-words features to represent the papers, ignoring the ones that involve high-level semantics. Moreover, existing methods treat the venue where a paper is published as the correct venue for data annotation, which is unrealistic. One paper can be relevant to many venues. In this paper, we attempt to address the problems above and develop a novel prediction model, namely Venue Prediction with Abstract-Level Graph (VPALG), which can serve as an effective decision-making tool for venue selection. Specifically, to achieve more discriminative paper abstract representations, we construct each abstract as a semantic graph and perform a dual attention message passing neural network for representation learning. Then, the proposed model can be trained over the learned abstract representations with their labels and generalized via self-training. Empirically, we employ the PubMed dataset and further collect two new datasets from the top journals and conferences in computer science. Experimental results indicate the superior performance of VPALG, consistently outperforming the existing baseline methods.

NED: Niche Detection in User Content Consumption Data

Ekta Gujral
Leonardo Neves
Evangelos Papalexakis
Neil Shah

Explainable machine learning methods have attracted increased interest in recent years. In this work, we pose and study the niche detection problem, which imposes an explainable lens on the classical problem of co-clustering interactions across two modes. In the niche detection problem, our goal is to identify niches, or co-clusters with node-attribute oriented explanations. Niche detection is applicable to many social content consumption scenarios, where an end goal is to describe and distill high-level insights about user-content associations: not only that certain users like certain types of content, but rather the types of users and content, explained via node attributes. Some examples are an e-commerce platform with who-buys-what interactions and user and product attributes, or a mobile call platform with who-calls-whom interactions and user attributes. Discovering and characterizing niches has powerful implications for user behavior understanding, as well as marketing and targeted content production. Unlike prior works, ours focuses on the intersection of explainable methods and co-clustering. First, we formalize the niche detection problem and discuss preliminaries. Next, we design an end-to-end framework, NED, which operates in two steps: discovering co-clusters of user behaviors based on interaction densities, and explaining them using attributes of involved nodes. Finally, we show experimental results on several public datasets, as well as a large-scale industrial dataset from Snapchat, demonstrating that NED improves in both co-clustering (20% accuracy) and explanation-related objectives (12% average precision) compared to state-of-the-art methods.

Learning Discriminative and Unbiased Representations for Few-Shot Relation Extraction

Jiale Han
Bo Cheng
Guoshun Nan

Few-shot relation extraction (FSRE) aims to predict the relation for a pair of entities in a sentence by exploring a few labeled instances for each relation type. Current methods mainly rely on meta-learning to learn generalized representations by optimizing the network parameters based on various collections of tasks sampled from training data. However, these methods may suffer from two main issues. 1) Insufficient supervision of meta-learning to learn discriminative representations on very few training instances, which are sampled from a large amount of base class data. 2) Spurious correlations between entities and relation types due to the biased training procedure that focuses more on entity pairs than on context. To learn more discriminative and unbiased representations for FSRE, this paper proposes a two-stage approach via supervised contrastive learning and sentence- and entity-level prototypical networks. In the first (pre-training) stage, we introduce a supervised contrastive pre-training method, which is able to yield more discriminative representations by learning from the entire set of training instances, such that semantically related representations are close to each other, and far away otherwise. In the second (meta-learning) stage, we propose a novel sentence- and entity-level prototypical network equipped with a fine-grained feature-wise fusion strategy to learn unbiased representations, where the networks are initialized with the parameters trained in the first stage. Specifically, the proposed network consists of a sentence branch and an entity branch, taking entire sentences and entity mentions as inputs, respectively. The entity branch explicitly captures the correlation between entity pairs and relations, and then dynamically adjusts the sentence branch's prediction distributions. By doing so, the spurious correlations issue caused by biased training samples can be properly mitigated. Extensive experiments on two FSRE benchmarks demonstrate the effectiveness of our approach.
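For readers unfamiliar with prototypical networks, which the abstract above builds on, the core classification rule can be sketched as follows. This is only the generic nearest-prototype step on fixed toy embeddings; it does not include the paper's sentence and entity branches, fusion strategy, or contrastive pre-training. The relation labels and vectors are made up for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(query, support):
    """Assign `query` to the relation whose prototype is nearest.

    support maps each relation label to the embeddings of its few labeled
    instances; a prototype is the mean of those embeddings.
    """
    prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-way 2-shot episode with 2-dimensional embeddings.
support_set = {
    "born_in": [[0.0, 0.0], [0.0, 2.0]],
    "works_for": [[10.0, 10.0], [12.0, 10.0]],
}
print(classify([1.0, 1.0], support_set))
```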

Multi-view Interaction Learning for Few-Shot Relation Classification

Yi Han
Linbo Qiao
Jianming Zheng
Zhigang Kan
Linhui Feng
Yifu Gao
Yu Tang
Qi Zhai
Dongsheng Li
Xiangke Liao

Conventional deep learning-based Relation Classification (RC) methods rely heavily on large-scale training datasets and fail to generalize to unseen classes when training data is scant. This work concentrates on RC tasks in few-shot scenarios, in which models classify unlabeled samples given only a few labeled samples. Existing few-shot RC models treat the dataset as a series of individual instances and have not fully utilized the interaction information among them. Interaction information helps to indicate the important areas and to produce discriminative representations. This paper therefore proposes a novel interactive attention network (IAN), which uses inter-instance and intra-instance interactive information to classify relations. Inter-instance interactive information is first introduced to solve the low-resource problem by capturing the semantic relevance between an instance pair. Intra-instance interactive information is then introduced to address the ambiguous relation classification issue by extracting the entity information within an instance. Extensive numerical experimental results demonstrate that the proposed method improves the accuracy of the downstream task.

GraphITE: Estimating Individual Effects of Graph-structured Treatments

Shonosuke Harada
Hisashi Kashima

Outcome estimation of treatments for individual targets is a crucial foundation for decision making based on causal relations. Most of the existing outcome estimation methods deal with binary or multiple-choice treatments; however, in some applications, the number of interventions can be very large, while the treatments themselves have rich information. In this study, we consider one important instance of such cases, that is, the outcome estimation problem of graph-structured treatments such as drugs. Due to the large number of possible interventions, the counterfactual nature of observational data, which appears in conventional treatment effect estimation, becomes a more serious issue in this problem. Our proposed method GraphITE (pronounced 'graphite') obtains the representations of the graph-structured treatments using graph neural networks, and also mitigates the observation biases by using the HSIC regularization that increases the independence of the representations of the targets and the treatments. In contrast with the existing methods, which cannot deal with "zero-shot" treatments that are not included in observational data, GraphITE can efficiently handle them thanks to its capability of incorporating graph-structured treatments. The experiments using the two real-world datasets show GraphITE outperforms baselines especially in cases with a large number of treatments.
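The HSIC (Hilbert-Schmidt Independence Criterion) regularizer mentioned above penalizes statistical dependence between two sets of representations. The sketch below computes one common biased empirical estimate, tr(KHLH)/n^2, for scalar samples with a Gaussian kernel. The kernel choice, the bandwidth `sigma`, and the normalization are assumptions; the paper may use different ones, and in GraphITE the inputs would be learned representations rather than scalars.

```python
import math


def gaussian_kernel_matrix(values, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel matrix for a list of scalars."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in values]
            for a in values]


def center(matrix):
    """Double-center a square matrix, i.e. compute H M H with H = I - 11^T/n."""
    n = len(matrix)
    row_means = [sum(row) / n for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[matrix[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between paired scalar samples xs and ys."""
    n = len(xs)
    k_centered = center(gaussian_kernel_matrix(xs, sigma))
    l_matrix = gaussian_kernel_matrix(ys, sigma)
    # tr(K H L H) equals the element-wise product sum of (H K H) and L,
    # because both matrices are symmetric.
    total = sum(k_centered[i][j] * l_matrix[i][j]
                for i in range(n) for j in range(n))
    return total / (n * n)


xs = [0.0, 1.0, 2.0, 3.0]
print(hsic(xs, xs))          # strongly dependent: clearly positive
print(hsic(xs, [5.0] * 4))   # constant second variable: approximately zero
```

Adding such a term to a training loss pushes the two representations towards independence, which is how the abstract describes the mitigation of observation bias.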

Learning Multiple Intent Representations for Search Queries

Helia Hashemi
Hamed Zamani
W. Bruce Croft

Representation learning has always played an important role in information retrieval (IR) systems. Most retrieval models, including recent neural approaches, use representations to calculate similarities between queries and documents to find relevant information from a corpus. Recent models use large-scale pre-trained language models for query representation. The typical use of these models, however, has a major limitation in that they generate only a single representation for a query, which may have multiple intents or facets. The focus of this paper is to address this limitation by considering neural models that support multiple intent representations for each query. Specifically, we propose the NMIR (Neural Multiple Intent Representations) model that can generate semantically different query intents and their appropriate representations. We evaluate our model on query facet generation using a large-scale dataset of real user queries sampled from the Bing search logs. We also provide an extrinsic evaluation of the proposed model using a clarifying question selection task. The results show that NMIR significantly outperforms competitive baselines.

Estimating Average Treatment Effects via Orthogonal Regularization

Tobias Hatt
Stefan Feuerriegel

Decision-making often requires accurate estimation of treatment effects from observational data. This is challenging as outcomes of alternative decisions are not observed and have to be estimated. Previous methods estimate outcomes based on unconfoundedness but neglect any constraints that unconfoundedness imposes on the outcomes. In this paper, we propose a novel regularization framework for estimating average treatment effects that exploits unconfoundedness. To this end, we formalize unconfoundedness as an orthogonality constraint, which ensures that the outcomes are orthogonal to the treatment assignment. This orthogonality constraint is then included in the loss function via a regularization. Based on our regularization framework, we develop deep orthogonal networks for unconfounded treatments (DONUT), which learn outcomes that are orthogonal to the treatment assignment. Using a variety of benchmark datasets for estimating average treatment effects, we demonstrate that DONUT outperforms the state-of-the-art substantially.
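One way to picture "an orthogonality constraint included in the loss function via a regularization" is the sketch below. It adds to a squared-error loss the squared empirical inner product between outcome residuals and centered treatment assignments. This specific form, the function names, and the weight `lam` are assumptions made for illustration; the abstract does not give DONUT's exact loss.

```python
def orthogonal_penalty(residuals, treatments, propensities):
    """Squared empirical inner product of residuals and (t - propensity)."""
    n = len(residuals)
    inner = sum(r * (t - p)
                for r, t, p in zip(residuals, treatments, propensities)) / n
    return inner ** 2


def regularized_loss(y_true, y_pred, treatments, propensities, lam=1.0):
    """Mean squared error plus lam times the orthogonality penalty.

    lam is a hypothetical regularization weight.
    """
    residuals = [y - f for y, f in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / len(residuals)
    return mse + lam * orthogonal_penalty(residuals, treatments, propensities)


# Residuals that co-vary with treatment assignment are penalized:
# the MSE is 1.0 and the penalty is 0.25.
print(regularized_loss([1.0, -1.0], [0.0, 0.0], [1, 0], [0.5, 0.5]))
```

The penalty vanishes when the residuals carry no linear information about who was treated, which is the intuition behind treating unconfoundedness as an orthogonality condition.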

Click-Through Rate Prediction with Multi-Modal Hypergraphs

Li He
Hongxu Chen
Dingxian Wang
Shoaib Jameel
Philip Yu
Guandong Xu

Advertising is critical to many online e-commerce platforms such as eBay and Amazon. One of the important signals that these platforms rely upon is the click-through rate (CTR) prediction. The recent popularity of multi-modal sharing platforms such as TikTok has led to an increased interest in online micro-videos. It is, therefore, useful to consider micro-videos to help a merchant target micro-video advertising better and find users' favourites to enhance user experience. Existing works on CTR prediction largely exploit unimodal content to learn item representations. Relatively little effort has been made to leverage multi-modal information exchange among users and items. We propose a model to exploit the temporal user-item interactions to guide the representation learning with multi-modal features, and further predict the user click rate of the micro-video item. We design a Hypergraph Click-Through Rate prediction framework (HyperCTR) built upon the hyperedge notion of hypergraph neural networks, which can yield modal-specific representations of users and micro-videos to better capture user preferences. We construct a time-aware user-item bipartite network with multi-modal information and enrich the representation of each user and item with the generated interests-based user hypergraph and item hypergraph. Through extensive experiments on three public datasets, we demonstrate that our proposed model significantly outperforms various state-of-the-art methods.

Stock Trend Prediction with Multi-granularity Data: A Contrastive Learning Approach with Adaptive Fusion

Min Hou
Chang Xu
Yang Liu
Weiqing Liu
Jiang Bian
Le Wu
Zhi Li
Enhong Chen
Tie-Yan Liu

Stock trend prediction plays a crucial role in quantitative investing. Given the prediction task on a certain granularity (e.g., daily trend), a large portion of existing studies merely leverage market data of the same granularity (e.g., daily market data). In financial investment scenarios, however, there exist amounts of finer-grained information (e.g., high-frequency data) that contain more detailed investment signals beyond the original granularity data. This motivates us to investigate how to leverage multi-granularity market data to enhance the accuracy of stock trend prediction. Some straightforward methods, such as concatenating finer-grained data as features or fusing with a model based on finer-grained features, may not lead to more precise stock trend prediction due to some unique challenges. First, the inconsistency of granularity between the target trend and finer-grained data could substantially increase optimization difficulty, such as the relative sparsity of the target trend compared with higher dimensions of finer-grained features. Moreover, the continuously changing financial market state could result in varying efficacy of heterogeneous multi-granularity information, which consequently requires a dynamic approach for proper fusion among them. In this paper, we propose the Contrastive Multi-Granularity Learning Framework (CMLF) to address these challenges. Particularly, we first design two novel contrastive learning objectives at the pre-training stage to address the inconsistency issue by constructing additional self-supervised signals relying on the inherent character of stock data. We also design a gate mechanism based on market-aware technical indicators to fuse the multi-granularity features at each time step adaptively. Extensive experiments on three real-world datasets show significant improvements of our approach over the state-of-the-art baselines on stock trend prediction and profitability in real investing scenarios.

What is Next when Sequential Prediction Meets Implicitly Hard Interaction?

Kaixi Hu
Lin Li
Qing Xie
Jianquan Liu
Xiaohui Tao

Learning hard interactions between source sequences and their next targets is challenging, and such interactions arise in a myriad of sequential prediction tasks. During the training process, most existing methods focus on explicitly hard interactions caused by wrong responses. However, a model might produce correct responses by capturing only a subset of learnable patterns, which results in implicitly hard interactions with some unlearned patterns. As such, its generalization performance is weakened. The problem becomes more serious in sequential prediction due to interference from a substantial number of similar candidate targets.

To this end, we propose a Hardness Aware Interaction Learning framework (HAIL) that mainly consists of two base sequential learning networks and mutual exclusivity distillation (MED). The base networks are initialized differently to learn distinctive view patterns, thus gaining different training experiences. The experiences in the form of the unlikelihood of correct responses are drawn from each other by MED, which provides mutual exclusivity knowledge to figure out implicitly hard interactions. Moreover, we deduce that the unlikelihood essentially introduces additional gradients to push the pattern learning of correct responses. Our framework can be easily extended to more peer base networks. Evaluation is conducted on four datasets covering cyber and physical spaces. The experimental results demonstrate that our framework outperforms several state-of-the-art methods in terms of top-k based metrics.

Rectifying Pseudo Labels: Iterative Feature Clustering for Graph Representation Learning

Zhihui Hu
Guang Kou
Haoyu Zhang
Na Li
Ke Yang
Lin Liu

Graph Convolutional Networks (GCNs) are powerful representation learning methods for non-Euclidean data. Compared with the Euclidean data, labeling the non-Euclidean data is more expensive. Meanwhile, most existing GCNs only utilize few labeled data but ignore most of the unlabeled data. To address this issue, we design a novel end-to-end Iterative Feature Clustering Graph Convolutional Networks (IFC-GCN) that enhances the standard GCN with an Iterative Feature Clustering (IFC) module. The proposed IFC module constrains node features iteratively based on the predicted pseudo labels and feature clustering. Further, we design an EM-like framework for IFC-GCN training, which improves the network performance by rectifying the pseudo labels and the node features alternately. Theoretical analysis and experimental results show that our proposed IFC module can effectively modify the node features. Experimental results on public datasets demonstrate that IFC-GCN outperforms state-of-the-art methods on the semi-supervised node classification task.

Two-tier Graph Contextual Embedding for Cross-device User Matching

Hongren Huang
Shu Guo
Chen Li
Jiawei Sheng
Lihong Wang
Jianxin Li
Jing Liu
Shenghai Zhong

The cross-device user matching task is to identify the behavior-logs (i.e., behavior sequences) on multiple devices that belong to one real person. Due to its anonymous and long-term properties, most previous methods of learning behavior embeddings cannot effectively capture two important features in the sequences, namely high-order connections and long-range dependencies. To this end, we propose a novel framework called Two-tier Graph Contextual Embedding (TGCE) to solve the above problems simultaneously. In the first tier, we construct behavior evolutionary graphs (BEGs) for behavior sequences and design an order-preserving neighbor aggregation network to collectively model transitions of behaviors with their neighbors. As repeated behaviors can be grouped into single nodes, our model joins neighboring environments around behaviors in a collective way, and behavior embeddings can be enriched. In the second tier, we further build scaled shortcut graphs (SSGs) by refining BEGs with random walk-based edge addition; a position-aware graph attention network is then imposed on SSGs to facilitate fast information propagation. As distant graph nodes can be directly connected by shortcut edges, we can further capture long-range dependencies. By stacking two graph tiers, our approach can obtain graph contextual embeddings for behaviors to further improve user matching. Experimental results on the benchmark dataset show that our model outperforms various baselines in the user matching task. Our code is released on https://github.com/13061051/TGCE_2021.

Signed Bipartite Graph Neural Networks

Junjie Huang
Huawei Shen
Qi Cao
Shuchang Tao
Xueqi Cheng

Signed networks are social networks that have both positive and negative links. Many theories and algorithms have been developed to model such networks (e.g., balance theory). However, previous work mainly focuses on unipartite signed networks, where the nodes have the same type. Signed bipartite networks differ from classical signed networks in that they contain two different node sets and signed links between the two node sets. Signed bipartite networks can be commonly found in many fields including business, politics, and academics, but have been less studied. In this work, we first define the signed relationship of the same set of nodes and provide a new perspective for analyzing signed bipartite networks. Then we conduct a comprehensive analysis of balance theory from two perspectives on several real-world datasets. Specifically, in the peer review dataset, we find that the ratio of balanced isomorphism in signed bipartite networks increased after rebuttal phases. Guided by these two perspectives, we propose novel Signed Bipartite Graph Neural Networks (SBGNNs) to learn node embeddings for signed bipartite networks. SBGNNs follow the message-passing scheme of most GNNs, but we design new message functions, aggregation functions, and update functions for signed bipartite networks. We validate the effectiveness of our model on four real-world datasets on the Link Sign Prediction task, which is the main machine learning task for signed networks. Experimental results show that our SBGNN model achieves significant improvement compared with strong baseline methods, including feature-based methods and network embedding methods.

Spatio-Temporal-Social Multi-Feature-based Fine-Grained Hot Spots Prediction for Content Delivery Services in 5G Era

Shaoyuan Huang
Heng Zhang
Xiaofei Wang
Min Chen
Jianxin Li
Victor C. M. Leung

The arrival of 5G networks has extensively promoted the growth of content delivery services (CDSs). Understanding and predicting the spatio-temporal distribution of CDSs are beneficial to mobile users, Internet Content Providers and carriers. Conventional methods for predicting the spatio-temporal distribution of CDSs are mostly base-station (BS) centric, leading to weak generalization and coarse spatial granularity. To improve the spatial accuracy and generalization of modeling, we propose user-centric methods for CDSs spatio-temporal analysis. With geocoding and spatio-temporal graph modeling algorithms, CDSs records collected from mobile devices are modeled as dynamic graphs with spatio-temporal attributes. Moreover, we propose a spatio-temporal-social multi-feature extraction framework for spatially fine-grained CDSs hot spots prediction. Specifically, an edge-enhanced graph convolutional block is designed to encode CDSs information based on the social relations and the spatial dependence features. Besides, we introduce Long Short-Term Memory (LSTM) to further capture the temporal dependence. Experiments on two real-world CDSs datasets verify the effectiveness of the proposed framework, and ablation studies are conducted to evaluate the importance of each feature.

Mixed Attention Transformer for Leveraging Word-Level Knowledge to Neural Cross-Lingual Information Retrieval

Zhiqi Huang
Hamed Bonab
Sheikh Muhammad Sarwar
Razieh Rahimi
James Allan

Pre-trained contextualized representations offer great success for many downstream tasks, including document ranking. The multilingual versions of such pre-trained representations provide a possibility of jointly learning many languages with the same model. Although large gains are expected from such joint training, in the case of cross-lingual information retrieval (CLIR), the models under a multilingual setting are not achieving the same level of performance as those under a monolingual setting. We hypothesize that the performance drop is due to the translation gap between query and documents. In the monolingual retrieval task, because of the same lexical inputs, it is easier for the model to identify the query terms that occurred in documents. However, in multilingual pre-trained models, where the words in different languages are projected into the same hyperspace, the model tends to "translate" query terms into related terms - i.e., terms that appear in a similar context - in addition to or sometimes rather than synonyms in the target language. This property makes it difficult for the model to connect terms that co-occur in both query and document. To address this issue, we propose a novel Mixed Attention Transformer (MAT) that incorporates external word-level knowledge, such as a dictionary or translation table. We design a sandwich-like architecture to embed MAT into the recent transformer-based deep neural models. By encoding the translation knowledge into an attention matrix, the model with MAT is able to focus on the mutually translated words in the input sequence. Experimental results demonstrate the effectiveness of the external knowledge and the significant improvement of the MAT-embedded neural reranking model on the CLIR task.

A Neural Conversation Generation Model via Equivalent Shared Memory Investigation

Changzhen Ji
Yating Zhang
Xiaozhong Liu
Adam Jatowt
Changlong Sun
Conghui Zhu
Tiejun Zhao

Conversation generation as a challenging task in Natural Language Generation (NLG) has been increasingly attracting attention over the last years. A number of recent works adopted sequence-to-sequence structures along with external knowledge, which successfully enhanced the quality of generated conversations. Nevertheless, few works utilized the knowledge extracted from similar conversations for utterance generation. Taking conversations in customer service and court debate domains as examples, it is evident that essential entities/phrases, as well as their associated logic and inter-relationships, can be extracted and borrowed from similar conversation instances. Such information could provide useful signals for improving conversation generation. In this paper, we propose a novel reading and memory framework called Deep Reading Memory Network (DRMN) which is capable of remembering useful information of similar conversations for improving utterance generation. We apply our model to two large-scale conversation datasets of justice and e-commerce fields. Experiments prove that the proposed model outperforms the state-of-the-art approaches.

Reinforcement Learning to Optimize Lifetime Value in Cold-Start Recommendation

Luo Ji
Qi Qin
Bingqing Han
Hongxia Yang

Recommender systems play a crucial role in modern e-commerce platforms. Due to the lack of historical interactions between users and items, cold-start recommendation is a challenging problem. In order to alleviate the cold-start issue, most existing methods introduce content and contextual information as auxiliary information. Nevertheless, these methods assume the recommended items behave steadily over time, while in a typical e-commerce scenario, items generally have very different performances throughout their life period. In such a situation, it would be beneficial to consider the long-term return from the item perspective, which is usually ignored in conventional methods. Reinforcement learning (RL) naturally fits such a long-term optimization problem, in which the recommender could identify high-potential items and proactively allocate more user impressions to boost their growth, thereby improving the multi-period cumulative gains. Inspired by this idea, we model the process as a Partially Observable and Controllable Markov Decision Process (POC-MDP), and propose an actor-critic RL framework (RL-LTV) to incorporate the item lifetime values (LTV) into the recommendation. In RL-LTV, the critic studies historical trajectories of items and predicts the future LTV of fresh items, while the actor suggests a score-based policy which maximizes the future LTV expectation. Scores suggested by the actor are then combined with classical ranking scores in a dual-rank framework, so that the recommendation is balanced with the LTV consideration. Our method outperforms the strong live baseline with a relative improvement of 8.67% and 18.03% on IPV and GMV of cold-start items, on one of the largest e-commerce platforms.

Complex Temporal Question Answering on Knowledge Graphs

Zhen Jia
Soumajit Pramanik
Rishiraj Saha Roy
Gerhard Weikum

Question answering over knowledge graphs (KG-QA) is a vital topic in IR. Questions with temporal intent are a special class of practical importance, but have not received much attention in research. This work presents EXAQT, the first end-to-end system for answering complex temporal questions that have multiple entities and predicates, and associated temporal conditions. EXAQT answers natural language questions over KGs in two stages, one geared towards high recall, the other towards precision at top ranks. The first step computes question-relevant compact subgraphs within the KG, and judiciously enhances them with pertinent temporal facts, using Group Steiner Trees and fine-tuned BERT models. The second step constructs relational graph convolutional networks (R-GCNs) from the first step's output, and enhances the R-GCNs with time-aware entity embeddings and attention over temporal relations. We evaluate EXAQT on TimeQuestions, a large dataset of 16k temporal questions we compiled from a variety of general purpose KG-QA benchmarks. Results show that EXAQT outperforms three state-of-the-art systems for answering complex questions over KGs, thereby justifying specialized treatment of temporal QA.

Contrastive Pre-Training of GNNs on Heterogeneous Graphs

Xunqiang Jiang
Yuanfu Lu
Yuan Fang
Chuan Shi

While graph neural networks (GNNs) emerge as the state-of-the-art representation learning methods on graphs, they often require a large amount of labeled data to achieve satisfactory performance, which is often expensive or unavailable. To relieve the label scarcity issue, some pre-training strategies have been devised for GNNs, to learn transferable knowledge from the universal structural properties of the graph. However, existing pre-training strategies are only designed for homogeneous graphs, in which each node and edge belongs to the same type. In contrast, a heterogeneous graph embodies rich semantics, as multiple types of nodes interact with each other via different kinds of edges, which are neglected by existing strategies. In this paper, we propose a novel Contrastive Pre-Training strategy of GNNs on Heterogeneous Graphs (CPT-HG), to capture both the semantic and structural properties in a self-supervised manner. Specifically, we design semantic-aware pre-training tasks at both the relation- and subgraph-levels, and further enhance their representativeness by employing contrastive learning. We conduct extensive experiments on three real-world heterogeneous graphs, and promising results demonstrate the superior ability of our CPT-HG to transfer knowledge to various downstream tasks via pre-training.

Graph Feature Gating Networks

Wei Jin
Xiaorui Liu
Yao Ma
Tyler Derr
Charu Aggarwal
Jiliang Tang

Graph neural networks (GNNs) have received tremendous attention due to their power in learning effective representations for graphs. Most GNNs follow a message-passing scheme where the node representations are updated by aggregating and transforming the information from the neighborhood. Meanwhile, they adopt the same strategy in aggregating the information from different feature dimensions. However, as suggested by social dimension theory and spectral embedding, there are potential benefits to treating the dimensions differently during the aggregation process. In this work, we investigate how to enable heterogeneous contributions of feature dimensions in GNNs. In particular, we propose a general graph feature gating network (GFGN) based on the graph signal denoising problem and then correspondingly introduce three graph filters under GFGN to allow different levels of contributions from feature dimensions. Extensive experiments on various real-world datasets demonstrate the effectiveness and robustness of the proposed frameworks.

Dynamic Early Exit Scheduling for Deep Neural Network Inference through Contextual Bandits

Weiyu Ju
Wei Bao
Liming Ge
Dong Yuan

Recent advances in Deep Neural Networks (DNNs) have dramatically improved the accuracy of DNN inference, but also introduce larger latency. In this paper, we investigate how to utilize early exit, a novel method that allows inference to exit at earlier exit points at the cost of an acceptable amount of accuracy. Scheduling the optimal exit point on a per-instance basis is challenging because the realized performance (i.e., confidence and latency) of each exit point is random and the statistics vary in different scenarios. Moreover, the performance has dependencies among the exit points, further complicating the problem. Therefore, the optimal exit scheduling decision cannot be known in advance but should be learned in an online fashion. To this end, we propose Dynamic Early Exit (DEE), a real-time online learning algorithm based on contextual bandit analysis. DEE observes the performance at each exit point as context and decides whether to exit or keep processing. Unlike standard contextual bandit analyses, the rewards of the decisions in our problem are temporally dependent. Furthermore, the performances of the earlier exit points are inevitably explored more compared to the later ones, which poses an unbalanced exploration-exploitation trade-off. DEE addresses the aforementioned challenges, where its regret per inference asymptotically approaches zero. We compare DEE with four benchmark schemes in a real-world experiment. The experimental results show that DEE can improve the overall performance by up to 98.1% compared to the best benchmark scheme.
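
The exit-or-continue decision described above can be illustrated with a much simpler rule than DEE: a fixed confidence threshold. This is a minimal sketch, not the authors' algorithm; DEE learns its decisions online with contextual bandits, whereas the threshold here is a hand-set assumption, and `early_exit_inference` and the toy exit heads are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Each exit head returns (predicted_label, confidence) when evaluated.
ExitHead = Callable[[], Tuple[int, float]]


def early_exit_inference(
    exit_heads: Sequence[ExitHead], threshold: float
) -> Tuple[int, int, float]:
    """Evaluate exit points in order and stop at the first confident one.

    Returns (exit_index, predicted_label, confidence). If no exit point
    reaches the threshold, the result of the last exit point is returned.
    """
    if not exit_heads:
        raise ValueError("at least one exit point is required")
    for index, head in enumerate(exit_heads):
        label, confidence = head()
        if confidence >= threshold:
            break  # confident enough: skip the remaining (slower) exits
    return index, label, confidence


# Toy exit heads standing in for classifiers attached to a shared backbone.
heads = [lambda: (3, 0.40), lambda: (7, 0.90), lambda: (7, 0.99)]
print(early_exit_inference(heads, threshold=0.8))
```

A fixed threshold ignores the randomness and scenario dependence of confidence and latency that the abstract highlights, which is the gap an online learning approach is meant to close.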

Norma: A Hybrid Feature Alignment for Class-Aware Unsupervised Domain Adaptation

Mahsa Keramati
Zahra Zohrevand
Uwe Glässer

Unsupervised domain adaptation is the problem of transferring extracted knowledge from a labeled source domain to an unlabeled target domain. To achieve discriminative domain adaptation recent studies take advantage of target sample pseudo-labels to impose class-aware distribution alignment across the source and target domains. Still, they have some shortcomings such as making decisions based on inaccurate pseudo-labeled samples that mislead the adaptation process. In this paper, we propose a progressive deep feature alignment, called Norma, to tackle class-aware unsupervised domain adaptation for image classification by enforcing inter-class compactness and intra-class discrepancy through a hybrid learning process. To this end, Norma's optimization process is defined based on a novel triplet loss which not only addresses soft prototype alignment but also pushes away multiple negative centroids. Also, to extract maximum discriminative domain knowledge per iteration, we propose a joint positive and negative learning procedure along with an uncertainty-guided progressive pseudo-labeling on the basis of prototype-based clustering and conditional probability. Our experimental results on several benchmarks demonstrate that Norma outperforms the state-of-the-art methods.

Semantic Concept Annotation for Tabular Data

Udayan Khurana
Sainyam Galhotra

Determining the semantic concepts of columns in tabular data is of use for many applications ranging from data integration, cleaning, search to feature engineering and model building in machine learning. Several prior works have proposed supervised learning-based or heuristic-based approaches to semantic type annotation. These techniques suffer from poor generalizability over a large number of concepts or examples. Recent neural network based supervised learning methods generalize to different datasets but require large amounts of curated training data and also present scalability issues. Furthermore, none of the known methods works well for numerical data. We present C2, a system that maps each column to a concept based on a maximum likelihood estimation approach through ensembles. It is able to effectively utilize vast amounts of, albeit somewhat noisy, openly available table corpora in addition to two popular knowledge graphs (Wikidata and DBpedia), to perform effective and efficient concept annotation for tabular data. Specifically, we utilize a collection of 32 million openly available webtables from several sources. We also present efficient indexing techniques for categorical string, numeric and mixed-type data, and novel techniques for table context utilization. We demonstrate the effectiveness and efficiency of C2 over available techniques on 9 real-world datasets containing a wide variety of concepts.

Query Reformulation for Descriptive Queries of Jargon Words Using a Knowledge Graph based on a Dictionary

Bosung Kim
Hyewon Choi
Haeun Yu
Youngjoong Ko

Query reformulation (QR) is a key factor in overcoming the problems faced by the lexical chasm in information retrieval (IR) systems. In particular, when searching for jargon, people tend to use descriptive queries, such as "a medical examination of the colon" rather than "colonoscopy," or they often use them interchangeably. Thus, transforming users' descriptive queries into appropriate jargon queries helps to retrieve more relevant documents. In this paper, we propose a new graph-based QR system that uses a dictionary, where the model does not require human-labeled data. Given a descriptive query, our system predicts the corresponding jargon word over a graph consisting of pairs of a headword and its description in the dictionary. First, we train a graph neural network to represent the relational properties between words and to infer a jargon word using compositional information of the descriptive query's words. Moreover, we propose a graph search model that finds the target node in real time using the relevance scores of neighborhood nodes. By adding this fast graph search model to the front of the proposed system, we reduce the reformulating time significantly. Experimental results on two datasets show that the proposed method can effectively reformulate descriptive queries to corresponding jargon words as well as improve retrieval performance under several search frameworks.

ALADDIN: Asymmetric Centralized Training for Distributed Deep Learning

Yunyong Ko
Kibong Choi
Hyunseung Jei
Dongwon Lee
Sang-Wook Kim

To speed up the training of massive deep neural network (DNN) models, distributed training has been widely studied. In general, centralized training, a type of distributed training, suffers from the communication bottleneck between a parameter server (PS) and workers. On the other hand, decentralized training suffers from increased parameter variance among workers that causes slower model convergence. Addressing this dilemma, in this work, we propose a novel centralized training algorithm, ALADDIN, employing "asymmetric" communication between PS and workers for the PS bottleneck problem and novel updating strategies for both local and global parameters to mitigate the increased variance problem. Through a convergence analysis, we show that the convergence rate of ALADDIN is O(1/√(nk)) on the non-convex problem, where n is the number of workers and k is the number of training iterations. The empirical evaluation using ResNet-50 and VGG-16 models demonstrates that (1) ALADDIN shows significantly better training throughput with up to 191% and 34% improvement compared to a synchronous algorithm and the state-of-the-art decentralized algorithm, respectively, (2) models trained by ALADDIN converge to accuracies comparable to those of the synchronous algorithm within the shortest time, and (3) the convergence of ALADDIN is robust under various heterogeneous environments.

Fast Extraction of Word Embedding from Q-contexts

Junsheng Kong
Weizhao Li
Zeyi Liu
Ben Liao
Jiezhong Qiu
Chang-Yu Hsieh
Yi Cai
Shengyu Zhang

The notion of word embedding plays a fundamental role in natural language processing (NLP). However, pre-training word embedding for very large-scale vocabulary is computationally challenging for most existing methods. In this work, we show that with merely a small fraction of contexts (Q-contexts) which are typical in the whole corpus (and their mutual information with words), one can construct high-quality word embedding with negligible errors. Mutual information between contexts and words can be encoded canonically as a sampling state, thus, Q-contexts can be fast constructed. Furthermore, we present an efficient and effective WEQ method, which is capable of extracting word embedding directly from these typical contexts. In practical scenarios, our algorithm runs 11 ~ 13 times faster than well-established methods. By comparing with well-known methods such as matrix factorization, word2vec, GloVe and fasttext, we demonstrate that our method achieves comparable performance on a variety of downstream NLP tasks, and in the meanwhile maintains run-time and resource advantages over all these baselines.

Efficient Multi-Scale Feature Generation Adaptive Network

Gwanghan Lee
Minha Kim
Minha Kim
Simon S. Woo

Recently, an early exit network, which dynamically adjusts the model complexity during inference time, has achieved remarkable performance and neural network efficiency to be used for various applications. So far, many researchers have been focusing on reducing the redundancy of input sample or model architecture. However, they were unsuccessful at resolving the performance drop of early classifiers that make predictions with insufficient high-level feature information. Consequently, the performance degradation of early classifiers had a devastating effect on the entire network performance sharing the backbone. Thus, in this paper, we propose an Efficient Multi-Scale Feature Generation Adaptive Network (EMGNet), which not only reduces the redundancy of the architecture but also generates multi-scale features to improve the performance of the early exit network. Our approach renders multi-scale feature generation highly efficient through sharing weights in the center of the convolution kernel. Also, our gating network effectively learns to automatically determine the proper multi-scale feature ratio required for each convolution layer in different locations of the network. We demonstrate that our proposed model outperforms the state-of-the-art adaptive networks on CIFAR10, CIFAR100, and ImageNet datasets. The implementation code is available at https://github.com/lee-gwang/EMGNet

Certifying One-Phase Technology-Assisted Reviews

David D. Lewis
Eugene Yang
Ophir Frieder

Technology-assisted review (TAR) workflows based on iterative active learning are widely used in document review applications. Most stopping rules for one-phase TAR workflows lack valid statistical guarantees, which has discouraged their use in some legal contexts. Drawing on the theory of quantile estimation, we provide the first broadly applicable and statistically valid sample-based stopping rules for one-phase TAR. We further show theoretically and empirically that overshooting a recall target, which has been treated as innocuous or desirable in past evaluations of stopping rules, is a major source of excess cost in one-phase TAR workflows. Counterintuitively, incurring a larger sampling cost to reduce excess recall leads to lower total cost in almost all scenarios.

Detecting the Fake Candidate Instances: Ambiguous Label Learning with Generative Adversarial Networks

Changchun Li
Ximing Li
Jihong Ouyang
Yiming Wang

Ambiguous Label Learning (ALL), as an emerging paradigm of weakly supervised learning, aims to induce the prediction model from training datasets with ambiguous supervision, where, specifically, each training instance is annotated with a set of candidate labels but only one is valid. To handle this task, the existing shallow methods mainly disambiguate the candidate labels by leveraging various regularization techniques. Inspired by the great success of deep generative adversarial networks, we apply it to perform effective candidate label disambiguation from a new instance-pivoted perspective. Specifically, for each ALL instance, we recombine its feature representation with each of candidate labels to generate a set of candidate instances, where only one is real and all others are fake. We formulate a unified adversarial objective with respect to three players, i.e., a discriminator, a generator, and a classifier. The discriminator is used to detect the fake candidate instances, so that the classifier can be trained without them. With this insight, we develop a novel ALL method, namely Adversarial Ambiguous Label Learning with Candidate Instance Detection (A2L2CID). Theoretically, we analyze that there is a global equilibrium point between the three players. Empirically, extensive experimental results indicate that A2L2CID outperforms the state-of-the-art ALL methods.

Integrating Static and Time-Series Data in Deep Recurrent Models for Oncology Early Warning Systems

Dingwen Li
Patrick Lyons
Jeff Klaus
Brian Gage
Marin Kollef
Chenyang Lu

Machine learning techniques have shown promise in predicting clinical deterioration of hospitalized patients based on electronic health record (EHR). However, building accurate early warning systems (EWS) remains challenging in practice. EHRs are heterogeneous, comprising both static and time-series data. Moreover, missing values are prevalent in both static and time-series data, and the missingness of certain data can be correlated to clinical outcomes. This paper proposes a novel approach for integrating static and time-series clinical data in deep recurrent models through multi-modal fusion. Furthermore, we exploit the correlation of static and time-series data through cross-modal imputation in an integrated recurrent model. We apply the proposed approaches to a dataset extracted from the EHR of 20,700 hospitalizations of adult oncology patients in a research hospital. The experiments demonstrate the proposed approaches outperform the state-of-the-art models in terms of predictive accuracy in generating early warnings for clinical deterioration. A case study further establishes the efficacy of the predictive model for early warning systems under realistic clinical settings.

Cache-based GNN System for Dynamic Graphs

Haoyang Li
Lei Chen

Graph Neural Networks (GNNs) have achieved great success in downstream applications due to their ability to learn node representations. However, in many applications, graphs are not static. They often evolve with changes, such as the adjustment of node attributes or graph structures. These changes require node representations to be updated accordingly. It is non-trivial to apply current GNNs to update node representations in a scalable manner. Recent research proposes two types of solutions. The first solution, sampling neighbors for the influenced nodes, requires expensive processing for each node. The second solution, reducing the repeated computations by merging the shared neighbors, cannot speed up the updating process if the influenced nodes do not share neighbors. Most importantly, the above solutions ignore the hidden representations obtained in the previous times that can be reused to accelerate the representation updating. In this paper, we propose a general cache-based GNN system to accelerate the representation updating. Specifically, we cache a set of hidden representations obtained in the previous times, and then reuse them in the next time. To identify valuable hidden representations, we first estimate the number of hidden representations and their combinations that can be reused. Secondly, we formulate the k-assembler problem that selects k representations to maximize the saved time for the next updating process. Experiments on three real-world graphs show that the cache-based GNN system can significantly speed up the representation updating for various GNNs.

Privacy-Preserving Batch-based Task Assignment in Spatial Crowdsourcing with Untrusted Server

Maocheng Li
Jiachuan Wang
Libin Zheng
Han Wu
Peng Cheng
Lei Chen
Xuemin Lin

In this paper, we study the privacy-preserving task assignment problem in spatial crowdsourcing, where the locations of both workers and tasks, prior to their release to the server, are perturbed with Geo-Indistinguishability (a differential privacy notion for location-based systems). Different from the previously studied online setting, where each task is assigned immediately upon arrival, we target the batch-based setting, where the server maximizes the number of successfully assigned tasks after a batch of tasks arrive. To achieve this goal, we propose the k-Switch solution, which first divides the workers into small groups based on the perturbed distance between workers/tasks, and then utilizes Homomorphic Encryption (HE) based secure computation to enhance the task assignment. Furthermore, we expedite HE-based computation by limiting the size of the small groups under k. Extensive experiments demonstrate that, in terms of the number of successfully assigned tasks, the k-Switch solution improves batch-based baselines by 5.9X and the existing online solution by 1.74X, with no privacy leak.

Block Access Pattern Discovery via Compressed Full Tensor Transformer

Xing Li
Qiquan Shi
Gang Hu
Lei Chen
Hui Mao
Yiyuan Yang
Mingxuan Yuan
Jia Zeng
Zhuo Cheng

The discovery and prediction of block access patterns in hybrid storage systems is of crucial importance for effective tier management. Existing methods are usually based on heuristics and unable to handle complex patterns. This work introduces transformers to block access pattern prediction. We remark that block accesses in the tier management systems are aggregated temporally and spatially as multivariate time series of block access frequency, so the runtime requirements are relaxed, making complex models applicable for deployment. Moreover, the enormous number of rarely accessed blocks in storage systems and the structure of traditional transformer models would result in millions of redundant parameters and make them impractical to deploy. We combine Tensor-Train Decomposition (TTD) with the transformer and propose the Compressed Full Tensor Transformer (CFTT), in which all linear layers in the vanilla transformer are replaced with tensor-train layers. Weights of input and output layers are shared to further reduce parameters and reuse knowledge implicitly. CFTT can significantly reduce the model size and computation cost, which is critical to save storage space and inference time. Extensive experiments are conducted on synthetic and real-world datasets. The results demonstrate that transformers achieve state-of-the-art performance stably in terms of top-k hit rates. Moreover, the proposed CFTT compresses transformers 16× to 461× and speeds up inference 5× without sacrificing performance on the whole, which facilitates its applications in tier management in hybrid storage systems.
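
To make the source of the compression concrete, the following sketch compares the weight count of a dense linear layer with that of a tensor-train (TT) layer of the same input and output size. It uses the standard TT-matrix parameter count, where core i has shape r_{i-1} × m_i × n_i × r_i. The mode sizes and ranks below are illustrative assumptions, not the configuration used in CFTT, and biases are ignored.

```python
from math import prod
from typing import Sequence


def dense_params(in_modes: Sequence[int], out_modes: Sequence[int]) -> int:
    """Weights in a dense layer mapping prod(in_modes) -> prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_params(
    in_modes: Sequence[int], out_modes: Sequence[int], ranks: Sequence[int]
) -> int:
    """Weights in a tensor-train layer with the given mode sizes and TT-ranks.

    `ranks` has one more entry than there are modes, and the boundary
    ranks must both be 1.
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1:
        raise ValueError("need d input modes, d output modes and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT-ranks must be 1")
    # Core i holds ranks[i] * in_modes[i] * out_modes[i] * ranks[i + 1] weights.
    return sum(
        ranks[i] * in_modes[i] * out_modes[i] * ranks[i + 1] for i in range(d)
    )


# Hypothetical 4096 x 4096 layer factored into four modes of size 8.
in_modes, out_modes, ranks = (8, 8, 8, 8), (8, 8, 8, 8), (1, 4, 4, 4, 1)
print(dense_params(in_modes, out_modes), tt_params(in_modes, out_modes, ranks))
```

The achievable ratio depends on the chosen ranks: larger ranks give the layer more capacity but less compression.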

Lightweight Self-Attentive Sequential Recommendation

Yang Li
Tong Chen
Peng-Fei Zhang
Hongzhi Yin

Modern deep neural networks (DNNs) have greatly facilitated the development of sequential recommender systems by achieving state-of-the-art recommendation performance on various sequential recommendation tasks. Given a sequence of interacted items, existing DNN-based sequential recommenders commonly embed each item into a unique vector to support subsequent computations of the user interest. However, due to the potentially large number of items, the over-parameterised item embedding matrix of a sequential recommender has become a memory bottleneck for efficient deployment in resource-constrained environments, e.g., smartphones and other edge devices. Furthermore, we observe that the widely-used multi-head self-attention, though being effective in modelling sequential dependencies among items, heavily relies on redundant attention units to fully capture both global and local item-item transition patterns within a sequence.

In this paper, we introduce a novel lightweight self-attentive network (LSAN) for sequential recommendation. To aggressively compress the original embedding matrix, LSAN leverages the notion of compositional embeddings, where each item embedding is composed by merging a group of selected base embedding vectors derived from substantially smaller embedding matrices. Meanwhile, to account for the intrinsic dynamics of each item, we further propose a temporal context-aware embedding composition scheme. Besides, we develop an innovative twin-attention network that alleviates the redundancy of the traditional multi-head self-attention while retaining full capacity for capturing long- and short-term (i.e., global and local) item dependencies. Comprehensive experiments demonstrate that LSAN significantly advances the accuracy and memory efficiency of existing sequential recommenders.
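
As an illustration of compositional embeddings in general, the sketch below uses the quotient-remainder trick: each item id is mapped to one row in each of two small tables, and the two rows are merged element-wise. This is one known scheme chosen for brevity. The abstract does not specify how LSAN selects and merges its base vectors, and the temporal context-aware part is not modelled here; `QREmbedding` and its parameters are hypothetical.

```python
import random
from typing import List


class QREmbedding:
    """Compositional item embedding built from two small base tables."""

    def __init__(self, num_items: int, num_buckets: int, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_items = num_items
        self.num_buckets = num_buckets
        self.dim = dim
        num_quotients = -(-num_items // num_buckets)  # ceiling division
        self.quotient_table = self._init_table(rng, num_quotients, dim)
        self.remainder_table = self._init_table(rng, num_buckets, dim)

    @staticmethod
    def _init_table(rng: random.Random, rows: int, dim: int) -> List[List[float]]:
        return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]

    def lookup(self, item_id: int) -> List[float]:
        """Compose the embedding of one item from its two base vectors."""
        if not 0 <= item_id < self.num_items:
            raise IndexError("item_id out of range")
        q, r = divmod(item_id, self.num_buckets)
        # Each item has a unique (q, r) pair, so the merged vectors differ
        # even though the tables are far smaller than one row per item.
        return [a * b for a, b in zip(self.quotient_table[q], self.remainder_table[r])]

    def num_parameters(self) -> int:
        return (len(self.quotient_table) + len(self.remainder_table)) * self.dim


emb = QREmbedding(num_items=10_000, num_buckets=100, dim=8)
print(emb.num_parameters(), "parameters instead of", 10_000 * 8)
```

With 10,000 items and 100 buckets, the two tables hold 200 rows in total, against 10,000 rows for a conventional embedding matrix.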

Learning to Cluster via Same-Cluster Queries

Yi Li
Yan Song
Qin Zhang

We study the problem of learning to cluster data points using an oracle which can answer same-cluster queries. Different from previous approaches, we do not assume that the total number of clusters is known at the beginning and do not require that the true clusters are consistent with a predefined objective function such as the K-means. These relaxations are critical from the practical perspective and, meanwhile, make the problem more challenging. We propose two algorithms with provable theoretical guarantees and verify their effectiveness via an extensive set of experiments on both synthetic and real-world data.
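
The query model can be illustrated with a naive baseline that needs neither the number of clusters nor an objective function: compare each new point with one representative of every cluster found so far, and open a new cluster if none matches. This sketch is not one of the two algorithms proposed in the paper. It assumes a noiseless oracle that behaves as an equivalence relation, and the function name is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cluster_with_oracle(
    points: Sequence[T], same_cluster: Callable[[T, T], bool]
) -> Tuple[List[List[T]], int]:
    """Cluster points using only same-cluster queries.

    Returns the clusters and the number of oracle queries issued.
    """
    clusters: List[List[T]] = []
    queries = 0
    for point in points:
        for cluster in clusters:
            queries += 1
            # One query against the cluster's first member is enough when
            # the oracle is consistent.
            if same_cluster(cluster[0], point):
                cluster.append(point)
                break
        else:
            clusters.append([point])  # no match: a new cluster is discovered
    return clusters, queries


# Toy oracle: two integers share a cluster when they agree modulo 3.
clusters, queries = cluster_with_oracle(
    [1, 2, 3, 4, 5, 6, 7], lambda a, b: a % 3 == b % 3
)
print(clusters, queries)
```

This baseline issues at most one query per point per discovered cluster, so its cost grows with the product of the number of points and the number of clusters.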

Hyperbolic Hypergraphs for Sequential Recommendation

Yicong Li
Hongxu Chen
Xiangguo Sun
Zhenchao Sun
Lin Li
Lizhen Cui
Philip S. Yu
Guandong Xu

Hypergraphs have become a popular choice to model complex, non-pairwise, and higher-order interactions for recommender systems. However, compared with traditional graph-based methods, the constructed hypergraphs are usually much sparser, which leads to a dilemma when balancing the benefits of hypergraphs and the modelling difficulty. Moreover, existing sequential hypergraph recommendation overlooks the temporal modelling among user relationships, which neglects rich social signals from the recommendation data. To tackle the above shortcomings of the existing hypergraph-based sequential recommendations, we propose a novel architecture named Hyperbolic Hypergraph representation learning method for Sequential Recommendation (H2SeqRec) with a pre-training phase. Specifically, we design three self-supervised tasks to obtain the pre-training item embeddings to feed or fuse into the following recommendation architecture (with two ways to use the pre-trained embeddings). In the recommendation phase, we learn multi-scale item embeddings via a hierarchical structure to capture multiple time-span information. To alleviate the negative impact of sparse hypergraphs, we utilize a hyperbolic space-based hypergraph convolutional neural network to learn the dynamic item embeddings. Also, we design an item enhancement module to capture dynamic social information at each timestamp to improve effectiveness. Extensive experiments are conducted on two real-world datasets to prove the effectiveness and high performance of the model.

Extracting Attentive Social Temporal Excitation for Sequential Recommendation

Yunzhe Li
Yue Ding
Bo Chen
Xin Xin
Yule Wang
Yuxiang Shi
Ruiming Tang
Dong Wang

In collaborative filtering, making full use of social information is an important way to improve recommendation quality, and it has proved effective because a user's behavior is affected by her friends. However, existing works leverage the social relationship to aggregate user features from friends' historical behavior sequences in a user-level indirect paradigm. A significant defect of the indirect paradigm is that it ignores the temporal relationships between behavior events across users. In this paper, we propose a novel time-aware sequential recommendation framework called Social Temporal Excitation Networks (STEN), which introduces temporal point processes to model the fine-grained impact of friends' behaviors on the user's dynamic interests in an event-level direct paradigm. Moreover, we propose to decompose the temporal effect in sequential recommendation into social mutual temporal effect and ego temporal effect. Specifically, we employ a social heterogeneous graph embedding layer to refine user representation via structural information. To enhance temporal information propagation, STEN directly extracts the fine-grained temporal mutual influence of friends' behaviors through the mutually exciting temporal network. Besides, the user's dynamic interests are captured through the self-exciting temporal network. Extensive experiments on three real-world datasets show that STEN outperforms state-of-the-art baseline methods. Moreover, STEN provides event-level recommendation explainability, which is also illustrated experimentally.
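
The split into an ego (self-exciting) effect and a social (mutually exciting) effect can be illustrated with the classical exponential-kernel Hawkes intensity, in which every past event raises the current event rate and the boost decays over time. This is a textbook sketch under stated assumptions, not STEN itself: the abstract describes neural temporal networks whose exact form is not given here, and the function name, parameter names and values are hypothetical.

```python
import math
from typing import Sequence


def hawkes_intensity(
    t: float,
    own_events: Sequence[float],
    friend_events: Sequence[float],
    mu: float,
    alpha_self: float,
    alpha_social: float,
    beta: float,
) -> float:
    """Event rate at time t from a base rate plus two excitation terms."""

    def excitation(events: Sequence[float], alpha: float) -> float:
        # Only events strictly before t contribute; each decays at rate beta.
        return sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)

    ego = excitation(own_events, alpha_self)          # self-exciting part
    social = excitation(friend_events, alpha_social)  # mutually exciting part
    return mu + ego + social


rate = hawkes_intensity(
    t=1.0, own_events=[0.0], friend_events=[0.5],
    mu=0.1, alpha_self=0.5, alpha_social=0.3, beta=1.0,
)
print(round(rate, 4))
```

Because each friend event enters the sum with its own timestamp, the contribution of an individual event to the current rate can be read off directly, which is the kind of event-level attribution the abstract refers to.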

Unsupervised Large-Scale Social Network Alignment via Cross Network Embedding

Zhehan Liang
Yu Rong
Chenxin Li
Yunlong Zhang
Yue Huang
Tingyang Xu
Xinghao Ding
Junzhou Huang

Nowadays, it is common for a person to possess different identities on multiple social platforms. Social network alignment aims to match the identities that come from different networks. Recently, unsupervised network alignment methods have received significant attention since no identity anchor is required. However, to capture the relevance between identities, the existing unsupervised methods generally rely heavily on user profiles, which are often unobtainable and unreliable in real-world scenarios. In this paper, we propose an unsupervised alignment framework named Large-Scale Network Alignment (LSNA) to integrate the network information and reduce the requirement on user profiles. The embedding module of LSNA, named Cross Network Embedding Model (CNEM), aims to integrate the topology information and the network correlation to simultaneously guide the embedding process. Moreover, in order to adapt LSNA to large-scale networks, we propose a network disassembling strategy to divide the costly large-scale network alignment problem into multiple executable sub-problems. The proposed method is evaluated over multiple real-world social network datasets, and the results demonstrate that the proposed method outperforms the state-of-the-art methods.

Grammatical Error Correction with Dependency Distance

Haowen Lin
Jinlong Li
Xu Zhang
Huanhuan Chen

The Grammatical Error Correction (GEC) task is commonly treated as a low-resource machine translation task that translates a sentence from an ungrammatical language into a grammatical one. As the state-of-the-art approach to GEC, a transformer-based neural machine translation model takes the input sentence as a token sequence without the sentence's structural information, and may be misled by unusual ungrammatical contexts. In response, to pay more attention to a given token's correct collocation rather than to the misleading tokens, we propose dependent self-attention, which relatively increases the attention score between correct collocations according to the dependency distance between tokens. However, as the source sentence is ungrammatical in the GEC task, the correct collocations can hardly be extracted by a normal dependency parser. Therefore, we propose a dependency parser for ungrammatical sentences to obtain the dependency distance between tokens in an ungrammatical sentence. Our method achieves competitive results on the BEA-2019 shared task, the CoNLL-2014 shared task, and the JFLEG test sets.

Deep Self-Adaptive Hashing for Image Retrieval

Qinghong Lin
Xiaojun Chen
Qin Zhang
Shangxuan Tian
Yudong Chen

Hashing technology has been widely used in image retrieval due to its computational and storage efficiency. Recently, deep unsupervised hashing methods have attracted increasing attention due to the high cost of human annotations in the real world and the superiority of deep learning technology. However, most deep unsupervised hashing methods pre-compute a similarity matrix to model the pairwise relationship in the pre-trained feature space. This similarity matrix is then used to guide hash learning, in which most of the data pairs are treated equivalently. This process suffers from the following defects: 1) The pre-computed similarity matrix is inalterable and disconnected from the hash learning process, so it cannot explore the underlying semantic information. 2) The informative data pairs may be buried by the large number of less-informative data pairs. To solve the aforementioned problems, we propose a Deep Self-Adaptive Hashing (DSAH) model to adaptively capture the semantic information with two special designs: Adaptive Neighbor Discovery (AND) and Pairwise Information Content (PIC). Firstly, we adopt AND to initially construct a neighborhood-based similarity matrix, and then refine this initial similarity matrix with a novel update strategy to further investigate the semantic structure behind the learned representation. Secondly, we measure the priorities of data pairs with PIC and assign adaptive weights to them, which relies on the assumption that more dissimilar data pairs contain more discriminative information for hash learning. Extensive experiments on several datasets demonstrate that the above two technologies enable the deep hashing model to achieve superior performance.

Multi-Relational Graph based Heterogeneous Multi-Task Learning in Community Question Answering

Zizheng Lin
Haowen Ke
Ngo-Yin Wong
Jiaxin Bai
Yangqiu Song
Huan Zhao
Junpeng Ye

Various data mining tasks have been proposed to study Community Question Answering (CQA) platforms like Stack Overflow. The relatedness between some of these tasks provides useful learning signals to each other via Multi-Task Learning (MTL). However, due to the high heterogeneity of these tasks, few existing works manage to jointly solve them in a unified framework. To tackle this challenge, we develop a multi-relational graph based MTL model called Heterogeneous Multi-Task Graph Isomorphism Network (HMTGIN) which efficiently solves heterogeneous CQA tasks. In each training forward pass, HMTGIN embeds the input CQA forum graph by an extension of Graph Isomorphism Network and skip connections. The embeddings are then shared across all task-specific output layers to compute respective losses. Moreover, two cross-task constraints based on the domain knowledge about tasks' relationships are used to regularize the joint learning. In the evaluation, the embeddings are shared among different task-specific output layers to make corresponding predictions. To the best of our knowledge, HMTGIN is the first MTL model capable of tackling CQA tasks from the aspect of multi-relational graphs. To evaluate HMTGIN's effectiveness, we build a novel large-scale multi-relational graph CQA dataset with over two million nodes from Stack Overflow. Extensive experiments show that: (1) HMTGIN is superior to all baselines on five tasks; (2) The proposed MTL strategy and cross-task constraints have substantial advantages.

Discovering Urban Functions of High-Definition Zoning with Continuous Human Traces

Chunyu Liu
Yongjian Yang
Zijun Yao
Yuanbo Xu
Weitong Chen
Lin Yue
Haomeng Wu

Identifying the dynamic functions of different urban zones enables a variety of smart city applications, such as intelligent urban planning, real-time traffic scheduling, and community precision management. Traditional urban function research using government administrative zoning systems is often conducted at a coarse resolution with a fixed split, and ignores the reshaping of zones by city growth. To solve this problem, we propose a two-stage framework to represent the high-definition distribution of urban function across the city, by analyzing continuous human traces extracted from dense, widespread, and full-time cellular data. At the representation stage, we embed the locations of base stations by modeling the user movements with staying and transfer events, along with the consideration of dynamic trip purposes in continuous human traces. At the annotation stage, we first divide the city into the finest unit zones, each covering at least one base station. By clustering the base stations, we further group the unit zones into functional zones. Last, we annotate functional zones based on the local point-of-interest (POI) information. In experiments, we evaluate the proposed high-definition function study in two tasks: (i) in-zone crowd flow prediction, and (ii) zone-enhanced POI recommendation. The results demonstrate the advantage of the proposed method in both the effectiveness of the city split and the quality of the function annotation.

SCMGR: Using Social Context and Multi-Granularity Relations for Unsupervised Social Summarization

Huanyu Liu
Ruifang He
Liangliang Zhao
Haocheng Wang
Ruifang Wang

Social summarization aims to produce a concise summary that describes the core content of a collection of posts on a specific topic. Existing methods tend to produce sparse or ambiguous representations of posts because they use only short and informal text content. Recent studies use social relations to improve the diversity of summaries, yet they model social relations as a regularization term, which has poor flexibility and generalization. Those methods cannot capture the deep semantic and social interactions among posts, so the summaries still suffer from redundancy. We propose to use Social Context and Multi-Granularity Relations (SCMGR) to improve unsupervised social summarization. It learns more informative representations of posts by considering both text semantics and social structure information without any annotated data. First, we design two sociologically motivated meta-paths to construct a social context graph among posts, and adopt a graph convolutional network to aggregate social context information from neighbors. Second, we design a multi-granularity relation decoder to capture the deeper semantic and social interactions from post-word and post-post aspects respectively, which can provide guidance for summary selection from semantic and social structure perspectives. Finally, a sparse reconstruction-based extractor is used to select posts that can best reconstruct the original content and social network structure as summaries. Our approach improves the coverage and diversity of summaries. Experimental results on both English and Chinese corpora demonstrate the effectiveness of our model.

Mining Cross Features for Financial Credit Risk Assessment

Qiang Liu
Zhaocheng Liu
Haoli Zhang
Yuntian Chen
Jun Zhu

For reliability, machine learning models in some areas, e.g., finance and healthcare, are required to be both accurate and globally interpretable. Among them, credit risk assessment is a major application of machine learning for financial institutions to evaluate the credit of users and detect default or fraud. Simple white-box models, such as Logistic Regression (LR), are usually used for credit risk assessment, but are not powerful enough to model complex nonlinear interactions among features. In contrast, complex black-box models are powerful at modeling, but lack interpretability, especially global interpretability. Fortunately, automatic feature crossing is a promising way to find cross features that make simple classifiers more accurate without heavy handcrafted feature engineering. However, existing automatic feature crossing methods are inefficient on credit risk assessment, because the corresponding data usually contain hundreds of feature fields.

In this work, we find that local interpretations of a specific feature in Deep Neural Networks (DNNs) are usually inconsistent among different samples. We demonstrate that this is caused by nonlinear feature interactions in the hidden layers of the DNN. Thus, we can mine feature interactions in the DNN and use them as cross features in LR, which makes mining cross features more efficient. Accordingly, we propose a novel automatic feature crossing method called DNN2LR. The final model generated by DNN2LR, which is an LR model empowered with cross features, is a white-box model. We conduct experiments on both public and business datasets from real-world credit risk assessment applications, which show that DNN2LR outperforms both conventional models used for credit assessment and several feature crossing methods. Moreover, compared with a state-of-the-art feature crossing method, i.e., AutoCross, the proposed DNN2LR method is about 10 to 40 times faster on financial credit assessment datasets, which contain hundreds of feature fields.

A Knowledge-Aware Recommender with Attention-Enhanced Dynamic Convolutional Network

Yi Liu
Bohan Li
Yalei Zang
Aoran Li
Hongzhi Yin

Sequential recommendation systems seek to learn users' preferences to predict their next actions based on the items engaged recently. Static behavior of users requires a long time to form, but short-term interactions with items usually meet some actual needs in reality and are more variable. RNN-based models are always constrained by the strong order assumption and struggle to model complex and changeable data flexibly. Most CNN-based models are limited to a fixed convolutional kernel. All these methods are suboptimal when modeling the dynamics of item-to-item transitions. It is difficult for them to describe items with complex relations and extract fine-grained user preferences from the interaction sequence. To address these issues, we propose a knowledge-aware sequential recommender with an attention-enhanced dynamic convolutional network (KAeDCN). Our model combines the dynamic convolutional network with attention mechanisms to capture changing dependencies in the sequence. Meanwhile, we enhance the representations of items with Knowledge Graph (KG) information through an information fusion module to capture fine-grained user preferences. The experiments on four public datasets demonstrate that KAeDCN outperforms most of the state-of-the-art sequential recommenders. Furthermore, experimental results also show that KAeDCN can enhance the representations of items effectively and improve the extractability of sequential dependencies.

ECMA: An Efficient Convoy Mining Algorithm for Moving Objects

Yiyang Liu
Hua Dai
Bohan Li
Jiawei Li
Geng Yang
Jun Wang

With the popularity of mobile devices equipped with positioning devices, it is convenient to obtain enormous amounts of trajectory data. The development promotes the study of extracting moving patterns from trajectory data of moving objects. One such pattern is the convoy, which refers to a group of objects moving together for a period of time. The existing convoy mining algorithms have a large time cost because they adopt a density-based clustering algorithm over global objects. In this paper, we propose an efficient convoy mining algorithm (ECMA) that adopts the divide-and-conquer methodology. A block-based partition model (BP-Model) is designed to divide objects into multiple maximized connected nonempty block areas (MOBAs). The convoy mining problem is then solved by processing each MOBA sequentially, which significantly reduces the time cost of convoy mining. In the experiments, we evaluate the performance of our algorithm on real-world datasets. The results show that the ECMA is more efficient than existing convoy mining algorithms.

Concept-Aware Denoising Graph Neural Network for Micro-Video Recommendation

Yiyu Liu
Qian Liu
Yu Tian
Changping Wang
Yanan Niu
Yang Song
Chenliang Li

Recently, micro-video sharing platforms such as Kuaishou and TikTok have become a major source of information in people's lives. Owing to the large traffic volume, short video lifespan and streaming fashion of these services, it has become more and more pressing to improve the existing recommender systems to accommodate these challenges in a cost-effective way. In this paper, we propose a novel concept-aware denoising graph neural network (named Conde) for micro-video recommendation. Conde consists of a three-phase graph convolution process to derive user and micro-video representations: warm-up propagation, graph denoising and preference refinement. A heterogeneous tripartite graph is constructed by connecting user nodes with video nodes, and video nodes with associated concept nodes, extracted from captions and comments of the videos. To address the noisy information in the graph, we introduce a user-oriented graph denoising phase to extract a subgraph which can better reflect the user's preference. Although this paper focuses mainly on micro-video recommendation, we also show that our method can be generalized to other types of tasks. Therefore, we also conduct empirical studies on a well-known public E-commerce dataset. The experimental results suggest that the proposed Conde achieves significantly better recommendation performance than the existing state-of-the-art solutions.

Meta Hyperparameter Optimization with Adversarial Proxy Subsets Sampling

Yue Liu
Xin Wang
Xue Xu
Jianbo Yang
Wenwu Zhu

Hyperparameter optimization (HPO), aiming at automatically searching optimal hyperparameter configurations, has attracted increasing attention in the machine learning community. HPO generally suffers from high search costs when dealing with large-scale real-world datasets since training the model with a certain hyperparameter configuration is time-consuming. Existing works suggest sampling subsets uniformly to represent the full dataset for HPO, but ignore the complex and dynamic distribution in real-world scenarios and the exploration of hyperparameter transfer. To tackle this problem, we propose a novel meta hyperparameter optimization model with an adversarial proxy subsets sampling strategy (Meta-HPO), which can transfer hyperparameters optimized on the sampled proxy subsets to the full dataset and further adapt to new data in an out-of-sample updating manner. In particular, a perturbation-aware adversarial sampling strategy is designed to select the proxy subsets that significantly influence the model performance. With the searched hyperparameter configurations and corresponding performance scores on the proxy subsets, we propose a meta transfer framework, named "hp-learner", to build the connection between the distribution of the dataset and the optimal hyperparameter configuration. Our Meta-HPO provides a flexible and efficient hyperparameter optimization algorithm. Extensive experiments on real-world datasets validate the advantages of our proposed Meta-HPO model against existing state-of-the-art benchmarks.

POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling

Zeyang Liu
Ke Zhou
Jiaxin Mao
Max L. Wilson

Conversational search systems, such as Google Assistant and Microsoft Cortana, provide a new search paradigm where users are allowed, via natural language dialogues, to communicate with search systems. Evaluating such systems is very challenging since search results are presented in the format of natural language sentences. Given the unlimited number of possible responses, collecting relevance assessments for all the possible responses is infeasible. In this paper, we propose POSSCORE, a simple yet effective automatic evaluation method for conversational search. The proposed embedding-based metric takes the influence of part of speech (POS) of the terms in the response into account. To the best of our knowledge, our work is the first to systematically demonstrate the importance of incorporating syntactic information, such as POS labels, for conversational search evaluation. Experimental results demonstrate that our metrics correlate with human preference, achieving significant improvements over state-of-the-art baseline metrics.

Tracking Semantic Evolutionary Changes in Large-Scale Ontological Knowledge Bases

Zhao Liu
Chang Lu
Ghadah Alghamdi
Renate A. Schmidt
Yizheng Zhao

This paper is concerned with the problem of computing the semantic difference between different versions of large-scale ontological knowledge bases using a uniform interpolation (UI) approach. The semantic difference between two versions of an ontology consists of the axioms entailed by one version but not the other, reflecting the evolutionary changes of the content of the ontology. In general, computing such axioms is not computationally feasible, since there are infinitely many of them. UI is an advanced reasoning technique that seeks to create restricted views of ontologies; it provides an effective means for computing a finite representation of the difference between two ontologies. While existing UI methods are designed for languages that are either more expressive or less expressive than the description logic ELH, the underlying language of typical large-scale ontologies, in this paper we introduce a practical UI method tailored for the task of computing the semantic difference in large-scale ELH-ontologies. The method is terminating, sound, and can always compute UI results possibly including fresh definer symbols. Two case studies on different versions of the SNOMED CT terminology show that the method has overcome major limitations of existing UI methods and can be used to reveal modeling changes that have occurred over successive releases of SNOMED CT.

Graph Embedding Based on Euclidean Distance Matrix and its Applications

Zhihong Liu
Huiyu Li
Ruixin Li
Yong Zeng
Jianfeng Ma

Graph embedding converts a graph into a multi-dimensional space in which the graph structural information or graph properties are maximally preserved. It is an effective and efficient way to provide users with a deeper understanding of what is behind the data and thus can benefit many useful applications. However, most graph embedding methods suffer from high computation and space costs. In this paper, we present a simple graph embedding method that directly embeds the graph into its Euclidean distance space. This method does not require the learned representations to be low dimensional, but it has several good characteristics. We find that the centrality of nodes/edges can be represented by the position of nodes or the length of edges when a graph is embedded. Besides, the edge length is closely related to the density of regions in a graph. We then apply this graph embedding method to graph analytics, such as community detection, graph compression, and wormhole detection. Our evaluation shows the effectiveness and efficiency of this embedding method and suggests that it is a promising approach to graph analytics.

BNN: Boosting Neural Network Framework Utilizing Limited Amount of Data

Amit Livne
Roy Dor
Bracha Shapira
Lior Rokach

Deep learning (DL) algorithms have played a major role in achieving state-of-the-art (SOTA) performance in various learning applications, including computer vision, natural language processing, and recommendation systems (RSs). However, these methods are based on a vast amount of data and do not perform as well when there is a limited amount of data available. Moreover, some of these applications (e.g., RSs) suffer from other issues such as data sparsity and the cold-start problem. While recent research on RSs used DL models based on side information (SI) (e.g., product reviews, film plots, etc.) to tackle these challenges, we propose boosting neural network (BNN), a new DL framework for capturing complex patterns, which requires just a limited amount of data. Unlike conventional boosting, BNN does not sum the predictions generated by its components. Instead, it uses these predictions as new SI features which enhances accuracy. Our framework can be utilized for many problems, including classification, regression, and ranking. In this paper, we demonstrate BNN's use for addressing a classification task. Comprehensive experiments conducted to illustrate BNN's effectiveness on three real-world datasets demonstrated its ability to outperform existing SOTA models for classification tasks (e.g., clickthrough rate prediction).

Social Recommendation with Self-Supervised Metagraph Informax Network

Xiaoling Long
Chao Huang
Yong Xu
Huance Xu
Peng Dai
Lianghao Xia
Liefeng Bo

In recent years, researchers have attempted to utilize online social information to alleviate data sparsity for collaborative filtering, based on the rationale that social networks offer insights into behavioral patterns. However, because they overlook inter-dependent knowledge across items (e.g., knowledge graph dependencies between products), existing social recommender systems are insufficient to distill the heterogeneous collaborative signals from both the user and item sides. In this work, we propose the Self-Supervised Metagraph Informax Network (SMIN), which investigates the potential of jointly incorporating social- and knowledge-aware relational structures into the user preference representation framework. To model relation heterogeneity, we design a metapath-guided heterogeneous graph neural network to aggregate feature embeddings from different types of meta-relations across users and items, empowering SMIN to maintain dedicated representations for multifaceted user- and item-wise dependencies. Additionally, to inject high-order collaborative signals into recommendation, we generalize the mutual information learning paradigm from vector space to a self-supervised graph-based collaborative filtering. This enables expressive modeling of user-item interactive patterns, by exploring global-level collaborative relations and the underlying isomorphic transformation property of graph topology. Experimental results on several real-world datasets demonstrate the effectiveness of our model over various state-of-the-art recommendation methods. Further analysis provides insights into the performance superiority of our new recommendation framework. We release our source code at https://github.com/SocialRecsys/SMIN.

Detecting Communities from Heterogeneous Graphs: A Context Path-based Graph Neural Network Model

Linhao Luo
Yixiang Fang
Xin Cao
Xiaofeng Zhang
Wenjie Zhang

Community detection, aiming to group the graph nodes into clusters with dense inner-connection, is a fundamental graph mining task. Recently, it has been studied on the heterogeneous graph, which contains multiple types of nodes and edges, posing great challenges for modeling the high-order relationship between nodes. With the surge of graph embedding techniques, they have also been adopted for community detection. A notable group of works uses the meta-path to capture the high-order relationship between nodes and embeds it into the nodes' embeddings to facilitate community detection. However, defining meaningful meta-paths requires much domain knowledge, which largely limits their applications, especially on schema-rich heterogeneous graphs like knowledge graphs. To alleviate this issue, in this paper, we propose to exploit the context path to capture the high-order relationship between nodes, and build a Context Path-based Graph Neural Network (CP-GNN) model. It recursively embeds the high-order relationship between nodes into the node embedding with attention mechanisms to discriminate the importance of different relationships. By maximizing the expectation of the co-occurrence of nodes connected by context paths, the model can learn node embeddings that both preserve the high-order relationship between nodes well and are helpful for community detection. Extensive experimental results on four real-world datasets show that CP-GNN outperforms the state-of-the-art community detection methods.

GF-VAE: A Flow-based Variational Autoencoder for Molecule Generation

Changsheng Ma
Xiangliang Zhang

Generating novel molecules with desired properties is a fundamental problem in modern drug discovery. This is a challenging problem because it requires the optimization of the given objectives while obeying the rules of chemical valence. An effective approach is to incorporate the molecular graph with deep generative models. However, recent high-performance generative models are still computationally expensive. In this paper, we propose GF-VAE, a flow-based variational autoencoder (VAE) model for molecular graph generation. Specifically, the model equips the VAE with a lightweight flow model as its decoder, in which the encoder aims to accelerate the training process of the decoder, while the decoder in turn optimizes the performance of the encoder. Thanks to the invertibility of the flow model, the generation process is easily accomplished by reversing the decoder. Additionally, the final generated molecules are processed by validity correction. Therefore, our GF-VAE inherits the advantages of both VAE and flow-based methods. We validate our model on molecule generation and reconstruction, smoothness of learned latent space, property optimization and constrained property optimization. The results show that our model achieves state-of-the-art performance on these tasks. Moreover, the time performance of GF-VAE on two classical datasets improves by 31.3% and 62.9%, respectively, over the state-of-the-art model.

LEReg: Empower Graph Neural Networks with Local Energy Regularization

Xiaojun Ma
Hanyue Chen
Guojie Song

Research on analyzing graphs with Graph Neural Networks (GNNs) has been receiving more and more attention because of the great expressive power of graphs. GNNs map the adjacency matrix and node features to node representations by message passing through edges on each convolution layer. However, the message passed through GNNs is not always beneficial for all parts in a graph. Specifically, as the data distribution is different over the graph, the receptive field (the farthest nodes that a node can obtain information from) needed to gather information is also different. Existing GNNs treat all parts of the graph uniformly, which makes it difficult to adaptively pass the most informative message for each unique part. To solve this problem, we propose two regularization terms that consider message passing locally: (1) Intra-Energy Reg and (2) Inter-Energy Reg. Through experiments and theoretical discussion, we first show that the speed of smoothing of different parts varies enormously and the topology of each part affects the way of smoothing. With Intra-Energy Reg, we strengthen the message passing within each part, which is beneficial for getting more useful information. With Inter-Energy Reg, we improve the ability of GNNs to distinguish different nodes. With the proposed two regularization terms, GNNs are able to filter the most useful information adaptively, learn more robustly and gain higher expressiveness. Moreover, the proposed LEReg can be easily applied to other GNN models with plug-and-play characteristics. Extensive experiments on several benchmarks verify that GNNs with LEReg outperform or match the state-of-the-art methods. The effectiveness and efficiency are also empirically visualized with elaborate experiments.

A Unified View on Graph Neural Networks as Graph Signal Denoising

Yao Ma
Xiaorui Liu
Tong Zhao
Yozen Liu
Jiliang Tang
Neil Shah

Graph Neural Networks (GNNs) have risen to prominence in learning representations for graph structured data. A single GNN layer typically consists of a feature transformation and a feature aggregation operation. The former normally uses feed-forward networks to transform features, while the latter aggregates the transformed features over the graph. Numerous recent works have proposed GNN models with different designs in the aggregation operation. In this work, we establish mathematically that the aggregation processes in a group of representative GNN models including GCN, GAT, PPNP, and APPNP can be regarded as (approximately) solving a graph denoising problem with a smoothness assumption. Such a unified view across GNNs not only provides a new perspective to understand a variety of aggregation operations but also enables us to develop a unified graph neural network framework UGNN. To demonstrate its promising potential, we instantiate a novel GNN model, ADA-UGNN, derived from UGNN, to handle graphs with adaptive smoothness across nodes. Comprehensive experiments show the effectiveness of ADA-UGNN.
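The denoising view described in this abstract can be illustrated with a short derivation. This is a sketch under assumed notation that the abstract itself does not give: S is the input signal (the transformed features), L is a symmetric normalized graph Laplacian written as L = I - Â for a normalized adjacency matrix Â, c > 0 weights the smoothness term, and b is a gradient step size.

```latex
\begin{aligned}
\min_{F}\; \mathcal{L}(F) &= \lVert F - S \rVert_F^2 + c\,\operatorname{tr}\!\left(F^\top L F\right),\\
\nabla_F \mathcal{L}(F) &= 2\,(F - S) + 2c\,L F,
\qquad \nabla_F \mathcal{L}(F)\big|_{F=S} = 2c\,L S,\\
F &\leftarrow S - b \cdot 2c\,L S = (I - 2bc\,L)\,S
\;\overset{b = \frac{1}{2c}}{=}\; (I - L)\,S = \hat{A}\,S .
\end{aligned}
```

Under these assumptions, a single gradient step started at F = S reduces to one round of neighborhood aggregation with Â, which is the sense in which an aggregation step can be read as approximately solving a smoothness-regularized denoising problem.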

Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need

Zhengyi Ma
Zhicheng Dou
Wei Xu
Xinyu Zhang
Hao Jiang
Zhao Cao
Ji-Rong Wen

Designing pre-training objectives that more closely resemble the downstream tasks for pre-trained language models can lead to better performance at the fine-tuning stage, especially in the ad-hoc retrieval area. Existing pre-training approaches tailored for IR tried to incorporate weakly supervised signals, such as query-likelihood based sampling, to construct pseudo query-document pairs from the raw textual corpus. However, these signals rely heavily on the sampling method. For example, the query likelihood model may introduce much noise into the constructed pre-training data. In this paper, we propose to leverage large-scale hyperlinks and anchor texts to pre-train the language model for ad-hoc retrieval. Since the anchor texts are created by webmasters and can usually summarize the target document, they can help to build more accurate and reliable pre-training samples than a specific algorithm. Considering different views of the downstream ad-hoc retrieval, we devise four pre-training tasks based on the hyperlinks. We then pre-train the Transformer model to predict the pair-wise preference, jointly with the Masked Language Model objective. Experimental results on two large-scale ad-hoc retrieval datasets show the significant improvement of our model compared with the existing methods.

Generating Compositional Color Representations from Text

Paridhi Maheshwari
Nihal Jain
Praneetha Vaddamanu
Dhananjay Raut
Shraiysh Vaishay
Vishwa Vinay

We consider the cross-modal task of producing color representations for text phrases. Motivated by the fact that a significant fraction of user queries on an image search engine follow an (attribute, object) structure, we propose a generative adversarial network that generates color profiles for such bigrams. We design our pipeline to learn composition - the ability to combine seen attributes and objects into unseen pairs. We propose a novel dataset curation pipeline from existing public sources. We describe how a set of phrases of interest can be compiled using a graph propagation technique, and then mapped to images. While this dataset is specialized for our investigations on color, the method can be extended to other visual dimensions where composition is of interest. We provide detailed ablation studies that test the behavior of our GAN architecture with loss functions from the contrastive learning literature. We show that the generative model achieves lower Fréchet Inception Distance than discriminative ones, and therefore predicts color profiles that better match those from real images. Finally, we demonstrate improved performance in image retrieval and classification, indicating the crucial role that color plays in these downstream tasks.

Principled Multi-Aspect Evaluation Measures of Rankings

Maria Maistro
Lucas Chaves Lima
Jakob Grue Simonsen
Christina Lioma

Information Retrieval evaluation has traditionally focused on defining principled ways of assessing the relevance of a ranked list of documents with respect to a query. Several methods extend this type of evaluation beyond relevance, making it possible to evaluate different aspects of a document ranking (e.g., relevance, usefulness, or credibility) using a single measure (multi-aspect evaluation). However, these methods either (i) are tailor-made for specific aspects and do not extend to other types or numbers of aspects, or (ii) have theoretical anomalies, e.g., they assign the maximum score to a ranking where all documents are labelled with the lowest grade with respect to all aspects (e.g., not relevant, not credible, etc.).

We present a theoretically principled multi-aspect evaluation method that can be used for any number, and any type, of aspects. A thorough empirical evaluation using up to 5 aspects and a total of 425 runs officially submitted to 10 TREC tracks shows that our method is more discriminative than the state-of-the-art and overcomes theoretical limitations of the state-of-the-art.

SimpleX: A Simple and Strong Baseline for Collaborative Filtering

Kelong Mao
Jieming Zhu
Jinpeng Wang
Quanyu Dai
Zhenhua Dong
Xi Xiao
Xiuqiang He

Collaborative filtering (CF) is a widely studied research topic in recommender systems. The learning of a CF model generally depends on three major components, namely interaction encoder, loss function, and negative sampling. While many existing studies focus on the design of more powerful interaction encoders, the impacts of loss functions and negative sampling ratios have not yet been well explored. In this work, we show that the choice of loss function as well as negative sampling ratio is equally important. More specifically, we propose the cosine contrastive loss (CCL) and further incorporate it into a simple unified CF model, dubbed SimpleX. Extensive experiments have been conducted on 10 benchmark datasets and compared with 28 existing CF models in total. Surprisingly, the results show that, under our CCL loss and a large negative sampling ratio, SimpleX can surpass most sophisticated state-of-the-art models by a large margin (e.g., max 48.5% improvement in NDCG@20 over LightGCN). We believe that SimpleX could not only serve as a simple strong baseline to foster future research on CF, but also shed light on the potential research direction towards improving loss function and negative sampling.
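The abstract names the cosine contrastive loss but does not spell out its form. The following is a minimal sketch that assumes one plausible shape for such a loss: the positive user-item pair is pulled toward cosine similarity 1, and each sampled negative is penalized only when its cosine similarity exceeds a margin. The function names and the `margin` and `neg_weight` parameters are illustrative assumptions, not taken from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_contrastive_loss(user, pos_item, neg_items, margin=0.5, neg_weight=1.0):
    """Assumed form of a cosine contrastive loss for one user (illustrative only).

    pos_term: distance of the positive pair from cosine similarity 1.
    neg_term: mean hinge penalty over negatives whose similarity exceeds `margin`.
    """
    pos_term = 1.0 - cosine(user, pos_item)
    hinge = [max(0.0, cosine(user, neg) - margin) for neg in neg_items]
    neg_term = sum(hinge) / len(neg_items)
    return pos_term + neg_weight * neg_term


# Usage: one negative is orthogonal to the user (no penalty),
# the other is identical to the user (penalty of 1 - margin).
loss = cosine_contrastive_loss(
    user=[1.0, 0.0],
    pos_item=[1.0, 0.0],
    neg_items=[[0.0, 1.0], [1.0, 0.0]],
)
```

In this sketch, negatives that are already dissimilar enough contribute nothing, so a larger negative sampling ratio mainly adds signal from hard negatives; that reading is an interpretation of the assumed form, not a claim from the abstract.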

UltraGCN: Ultra Simplification of Graph Convolutional Networks for Recommendation

Kelong Mao
Jieming Zhu
Xi Xiao
Biao Lu
Zhaowei Wang
Xiuqiang He

With the recent success of graph convolutional networks (GCNs), they have been widely applied for recommendation, and achieved impressive performance gains. The core of GCNs lies in their message passing mechanism to aggregate neighborhood information. However, we observed that message passing largely slows down the convergence of GCNs during training, especially for large-scale recommender systems, which hinders their wide adoption. LightGCN makes an early attempt to simplify GCNs for collaborative filtering by omitting feature transformations and nonlinear activations. In this paper, we take one step further to propose an ultra-simplified formulation of GCNs (dubbed UltraGCN), which skips infinite layers of message passing for efficient recommendation. Instead of explicit message passing, UltraGCN resorts to directly approximating the limit of infinite-layer graph convolutions via a constraint loss. Meanwhile, UltraGCN allows for more appropriate edge weight assignments and flexible adjustment of the relative importance among different types of relationships. This finally yields a simple yet effective UltraGCN model, which is easy to implement and efficient to train. Experimental results on four benchmark datasets show that UltraGCN not only outperforms the state-of-the-art GCN models but also achieves more than 10x speedup over LightGCN.

Are Negative Samples Necessary in Entity Alignment?: An Approach with High Performance, Scalability and Robustness

Xin Mao
Wenting Wang
Yuanbin Wu
Man Lan

Entity alignment (EA) aims to find the equivalent entities in different KGs, which is a crucial step in integrating multiple KGs. However, most existing EA methods have poor scalability and are unable to cope with large-scale datasets. We summarize three issues leading to such high time-space complexity in existing EA methods: (1) Inefficient graph encoders, (2) Dilemma of negative sampling, and (3) "Catastrophic forgetting" in semi-supervised learning. To address these challenges, we propose a novel EA method with three new components to enable high Performance, high Scalability, and high Robustness (PSR): (1) Simplified graph encoder with relational graph sampling, (2) Symmetric negative-free alignment loss, and (3) Incremental semi-supervised learning. Furthermore, we conduct detailed experiments on several public datasets to examine the effectiveness and efficiency of our proposed method. The experimental results show that PSR not only surpasses the previous SOTA in performance but also has impressive scalability and robustness.

A Projected Gradient Method for Opinion Optimization with Limited Changes of Susceptibility to Persuasion

Naoki Marumo
Atsushi Miyauchi
Akiko Takeda
Akira Tanaka

Many social phenomena are triggered by public opinion that is formed in the process of opinion exchange among individuals. To date, from the engineering point of view, a large body of work has been devoted to studying how to manipulate individual opinions so as to guide public opinion towards the desired state. Recently, Abebe et al. (KDD 2018) have initiated the study of the impact of interventions at the level of susceptibility rather than the interventions that directly modify individual opinions themselves. For the model, Chan et al. (The Web Conference 2019) designed a local search algorithm to find an optimal solution in polynomial time. However, it can be seen that the solution obtained by solving the above model might not be implemented in real-world scenarios. In fact, as we do not consider the amount of changes of the susceptibility, it would be too costly to change the susceptibility values for agents based on the solution.

In this paper, we study an opinion optimization model that is able to limit the amount of changes of the susceptibility in various forms. First we introduce a novel opinion optimization model, where the initial susceptibility values are given as additional input and the feasible region is defined using the ℓp-ball centered at the initial susceptibility vector. For the proposed model, we design a projected gradient method that is applicable to the case where there are millions of agents. Finally we conduct thorough experiments using a variety of real-world social networks and demonstrate that the proposed algorithm outperforms baseline methods.
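
To make the projected gradient idea concrete, here is a minimal sketch for the special case p = 2: each gradient step is followed by a Euclidean projection back onto the ball centered at the initial vector. The toy quadratic objective, the step size, and all function names are assumptions for illustration; the abstract does not specify the actual opinion objective or any additional constraints on susceptibility values.

```python
import math


def project_l2_ball(x, center, radius):
    """Euclidean projection of x onto the l2-ball of the given radius around center."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(x)  # already feasible
    scale = radius / norm
    return [ci + scale * d for ci, d in zip(center, diff)]


def projected_gradient(grad, x0, center, radius, step=0.1, iters=200):
    """Gradient descent where every iterate is projected back onto the feasible ball."""
    x = project_l2_ball(x0, center, radius)
    for _ in range(iters):
        g = grad(x)
        moved = [xi - step * gi for xi, gi in zip(x, g)]
        x = project_l2_ball(moved, center, radius)
    return x


# Toy stand-in objective f(x) = ||x - target||^2 (hypothetical, not the paper's objective).
target = [1.0, 1.0]
initial = [0.0, 0.0]
result = projected_gradient(
    lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)],
    x0=initial, center=initial, radius=0.5,
)
```

Because the unconstrained minimizer lies outside the ball, the iterates settle on the boundary point closest to `target`, which is how the radius caps the total change from the initial vector.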

L2NAS: Learning to Optimize Neural Architectures via Continuous-Action Reinforcement Learning

Keith G. Mills
Fred X. Han
Mohammad Salameh
Seyed Saeed Changiz Rezaei
Linglong Kong
Wei Lu
Shuo Lian
Shangling Jui
Di Niu

Neural architecture search (NAS) has achieved remarkable results in deep neural network design. Differentiable architecture search converts the search over discrete architectures into a hyperparameter optimization problem which can be solved by gradient descent. However, questions have been raised regarding the effectiveness and generalizability of gradient methods for solving non-convex architecture hyperparameter optimization problems. In this paper, we propose L2NAS, which learns to intelligently optimize and update architecture hyperparameters via an actor neural network based on the distribution of high-performing architectures in the search history. We introduce a quantile-driven training procedure which efficiently trains L2NAS in an actor-critic framework via continuous-action reinforcement learning. Experiments show that L2NAS achieves state-of-the-art results on NAS-Bench-201 benchmark as well as DARTS search space and Once-for-All MobileNetV3 search space. We also show that search policies generated by L2NAS are generalizable and transferable across different training datasets with minimal fine-tuning.

POSHAN: Cardinal POS Pattern Guided Attention for News Headline Incongruence

Rahul Mishra
Shuo Zhang

Automatic detection of click-baits and incongruent news headlines is crucial to maintain the reliability of the Web and has attracted much research attention. However, most existing methods perform poorly when a news headline contains contextually important cardinal values such as a quantity or an amount. In this work, we focus on this particular case and propose a neural attention based solution, which uses a novel cardinal Part of Speech (POS) tag pattern based hierarchical attention network, namely POSHAN, to learn effective representations of sentences in the news article. In addition, we investigate a novel cardinal phrase guided attention, which uses word embeddings of the contextually important cardinal value and neighbouring words. In the experiments conducted on two publicly available datasets, we observe that the proposed method gives appropriate significance to cardinal values and outperforms all the baselines. An ablation study of POSHAN shows that the cardinal POS-tag pattern based hierarchical attention is very effective for the cases in which the headline contains cardinal values.

50 Ways to Bake a Cookie: Mapping the Landscape of Procedural Texts

Moran Mizrahi
Dafna Shahaf

The web is full of guidance on a wide variety of tasks, from changing the oil in your car to baking an apple pie. However, as content is created independently, a single task could have thousands of corresponding procedural texts. This makes it difficult for users to view the bigger picture and understand the multiple ways the task could be accomplished. In this work we propose an unsupervised learning approach for summarizing multiple procedural texts into an intuitive graph representation, allowing users to easily explore commonalities and differences. We demonstrate our approach on recipes, a prominent example of procedural texts. User studies show that our representation is intuitive and coherent and that it has the potential to help users with several sensemaking tasks, including adapting recipes for a novice cook and finding creative ways to spice up a dish.

Agenda: Robust Personalized PageRanks in Evolving Graphs

Dingheng Mo
Siqiang Luo

Given a source node s and a target node t in a graph G, the Personalized PageRank (PPR) from s to t is the probability that a random walk starting from s terminates at t. PPR is a classic measure of the relevance among different nodes in a graph, and has been applied in numerous real-world systems. However, existing techniques for PPR queries are not robust to dynamic real-world graphs, which typically have different evolving speeds. Their performance is significantly degraded either at a lower graph evolving rate (e.g., many more queries than updates) or a higher rate.

To address the above deficiencies, we propose Agenda to efficiently process, with strong approximation guarantees, the single-source PPR (SSPPR) queries on dynamically evolving graphs with various evolving speeds. Compared with previous methods, Agenda has significantly better workload robustness, while ensuring the same result accuracy. Agenda also has theoretically-guaranteed small query and update costs. Experiments on up to billion-edge scale graphs show that Agenda significantly outperforms state-of-the-art methods for various query/update workloads, while maintaining better or comparable approximation accuracies.
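
For readers unfamiliar with the PPR definition above, the following sketch computes single-source PPR on a small static graph by propagating walk probability mass step by step. It is a plain iterative baseline and not the Agenda algorithm; the termination probability `alpha`, the adjacency-dict representation, and the handling of nodes without out-neighbors are assumptions.

```python
def single_source_ppr(graph, source, alpha=0.2, iters=100):
    """Approximate PPR from `source` to every node.

    graph: dict mapping each node to a list of out-neighbors (all nodes are keys).
    alpha: probability that the walk terminates at its current node at each step.
    """
    ppr = {v: 0.0 for v in graph}
    alive = {v: 0.0 for v in graph}  # probability mass of walks still running
    alive[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for v, mass in alive.items():
            if mass == 0.0:
                continue
            neighbors = graph[v]
            if not neighbors:
                # Assumption: a walk stuck at a node with no out-edges stops there.
                ppr[v] += mass
                continue
            ppr[v] += alpha * mass  # walks terminating at v in this step
            share = (1.0 - alpha) * mass / len(neighbors)
            for u in neighbors:
                nxt[u] += share  # walks moving on to a uniformly random neighbor
        alive = nxt
    return ppr


# Hypothetical three-node graph: s points to a and b, both point back to s.
toy_graph = {"s": ["a", "b"], "a": ["s"], "b": ["s"]}
scores = single_source_ppr(toy_graph, "s")
```

Every query here recomputes from scratch, which is exactly the cost that index-based and dynamic methods try to avoid when the graph keeps changing.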

Learning Ideological Embeddings from Information Cascades

Corrado Monti
Giuseppe Manco
Cigdem Aslay
Francesco Bonchi

Modeling information cascades in a social network through the lens of the ideological leaning of its users can help in understanding phenomena such as misinformation propagation and confirmation bias, and in devising techniques for mitigating their toxic effects.

In this paper we propose a stochastic model to learn the ideological leaning of each user in a multidimensional ideological space, by analyzing the way politically salient content propagates. In particular, our model assumes that information propagates from one user to another if both users are interested in the topic and ideologically aligned with each other. To infer the parameters of our model, we devise a gradient-based optimization procedure maximizing the likelihood of an observed set of information cascades. Our experiments on real-world political discussions on Twitter and Reddit confirm that our model is able to learn the political stance of the social media users in a multidimensional ideological space.

Defining an Optimal Configuration Set for Selective Search Strategy - A Risk-Sensitive Approach

Josiane Mothe
Md Zia Ullah

A search engine generally applies a single search strategy to any user query. The search combines many component processes (e.g., indexing, query expansion, search-weighting model, document ranking) and their hyperparameters, whose values are optimized based on past queries and then applied to all future queries. Even an optimized system may perform poorly on some queries, however, whereas another system might perform better on those queries. Selective search strategy aims to select the most appropriate combination of components and hyperparameter values to apply for each individual query. The number of candidate combinations is huge. To adapt best to any query, the ideal system would use many combinations. In the real world it would be too costly to use and maintain thousands of configurations. A trade-off must therefore be found between performance and cost. In this paper, we describe a risk-sensitive approach to optimize the set of configurations that should be included in a selective search strategy. This approach solves the problem of which and how many configurations to include in the system. We show that the use of 20 configurations results in significantly greater effectiveness than current approaches when tested on three TREC reference collections, by about 23% when compared to L2R documents and about 10% when compared to other selective approaches, and that it offers an appropriate trade-off between system complexity and system effectiveness.

Revisiting State Augmentation methods for Reinforcement Learning with Stochastic Delays

Somjit Nath
Mayank Baranwal
Harshad Khadilkar

Several real-world scenarios, such as remote control and sensing, involve action and observation delays. The presence of delays degrades the performance of reinforcement learning (RL) algorithms, often to such an extent that algorithms fail to learn anything substantial. This paper formally describes the notion of Markov Decision Processes (MDPs) with stochastic delays and shows that delayed MDPs can be transformed into equivalent standard MDPs (without delays) with significantly simplified cost structure. We employ this equivalence to derive a model-free Delay-Resolved RL framework and show that even a simple RL algorithm built upon this framework achieves near-optimal rewards in environments with stochastic delays in actions and observations. The delay-resolved deep Q-network (DRDQN) algorithm is benchmarked on a variety of environments comprising multi-step and stochastic delays and results in better performance, both in terms of achieving near-optimal rewards and minimizing the computational overhead thereof, with respect to the currently established algorithms.
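
The general state-augmentation idea can be sketched for the simplest case of a constant action delay: the agent's state is the latest observation together with the queue of actions that have been issued but not yet executed. This is a simplified illustration and not the paper's Delay-Resolved framework; the toy environment, the no-op action, and all class names are hypothetical, and stochastic or observation delays are not handled.

```python
from collections import deque


class ToyChain:
    """Hypothetical 1-D environment: integer state, actions -1, 0 (no-op), or +1."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action
        reward = 1.0 if self.state == 3 else 0.0
        return self.state, reward


class ActionDelayWrapper:
    """Executes each action `delay` steps late and exposes an augmented state."""

    def __init__(self, env, delay, noop=0):
        self.env = env
        self.delay = delay
        self.noop = noop
        self.pending = deque()

    def reset(self):
        obs = self.env.reset()
        # Before any real action arrives, the queue is filled with no-ops.
        self.pending = deque([self.noop] * self.delay)
        return (obs, tuple(self.pending))

    def step(self, action):
        self.pending.append(action)        # the newly chosen action joins the queue
        executed = self.pending.popleft()  # the oldest pending action takes effect
        obs, reward = self.env.step(executed)
        # Augmented state: observation plus the actions still in flight.
        return (obs, tuple(self.pending)), reward
```

A learner that conditions on the augmented tuple sees a Markovian process again, at the cost of a state space that grows with the delay length.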

Disentangling Preference Representations for Recommendation Critiquing with β-VAE

Preksha Nema
Alexandros Karatzoglou
Filip Radlinski

Modern recommender systems usually embed users and items into a learned vector space representation. Similarity in this space is used to generate recommendations, and recommendation methods are agnostic to the structure of the embedding space. Motivated by the need for recommendation systems to be more transparent and controllable, we postulate that it is beneficial to assign meaning to some of the dimensions of user and item representations. Disentanglement is one technique commonly used for this purpose. We present a novel supervised disentangling approach for recommendation tasks. Our model learns embeddings where attributes of interest are disentangled, while requiring only a very small number of labeled items at training time. The model can then generate interactive and critiquable recommendations for all users, without requiring any labels at recommendation time, and without sacrificing any recommendation performance. Our approach thus provides users with levers to manipulate, critique and fine-tune recommendations, and gives insight into why particular recommendations are made. Given only user-item interactions at recommendation time, we show that it identifies user tastes with respect to the attributes that have been disentangled, allowing users to manipulate recommendations across these attributes.

Interpreting Convolutional Sequence Model by Learning Local Prototypes with Adaptation Regularization

Jingchao Ni
Zhengzhang Chen
Wei Cheng
Bo Zong
Dongjin Song
Yanchi Liu
Xuchao Zhang
Haifeng Chen

In many high-stakes applications of machine learning models, outputting only predictions or providing statistical confidence is usually insufficient to gain trust from end users, who often prefer a transparent reasoning paradigm. Despite the recent encouraging developments on deep networks for sequential data modeling, due to the highly recursive functions, the underlying rationales of their predictions are difficult to explain. Thus, in this paper, we aim to develop a sequence modeling approach that explains its own predictions by breaking input sequences down into evidencing segments (i.e., sub-sequences) in its reasoning. To this end, we build our model upon convolutional neural networks, which, in their vanilla forms, associate local receptive fields with outputs in an obscure manner. To unveil it, we resort to case-based reasoning, and design prototype modules whose units (i.e., prototypes) resemble exemplar segments in the problem domain. Each prediction is obtained by combining the comparisons between the prototypes and the segments of an input. To enhance interpretability, we propose a training objective that delicately adapts the distribution of prototypes to the data distribution in latent spaces, and design an algorithm to map prototypes to human-understandable segments. Through extensive experiments in a variety of domains, we demonstrate that our model can generally achieve high interpretability, together with accuracy competitive with state-of-the-art approaches.

Efficient Hyperparameter Optimization under Multi-Source Covariate Shift

Masahiro Nomura
Yuta Saito

A typical assumption in supervised machine learning is that the train (source) and test (target) datasets follow completely the same distribution. This assumption is, however, often violated in uncertain real-world applications, which motivates the study of learning under covariate shift. In this setting, the naive use of adaptive hyperparameter optimization methods such as Bayesian optimization does not work as desired since it does not address the distributional shift among different datasets. In this work, we consider a novel hyperparameter optimization problem under multi-source covariate shift, whose goal is to find the optimal hyperparameters for a target task of interest using only unlabeled data in a target task and labeled data in multiple source tasks. To conduct efficient hyperparameter optimization for the target task, it is essential to estimate the target objective using only the available information. To this end, we construct a variance-reduced estimator that unbiasedly approximates the target objective with a desirable variance property. Building on the proposed estimator, we provide a general and tractable hyperparameter optimization procedure, which works preferably in our setting with a no-regret guarantee. The experiments demonstrate that the proposed framework broadens the applications of automated hyperparameter optimization.

Influence-guided Data Augmentation for Neural Tensor Completion

Sejoon Oh
Sungchul Kim
Ryan A. Rossi
Srijan Kumar

How can we predict missing values in multi-dimensional data (or tensors) more accurately? The task of tensor completion is crucial in many applications such as personalized recommendation, image and video restoration, and link prediction in social networks. Many tensor factorization and neural network-based tensor completion algorithms have been developed to predict missing entries in partially observed tensors. However, they can produce inaccurate estimations as real-world tensors are very sparse, and these methods tend to overfit on the small amount of data. Here, we overcome these shortcomings by presenting a data augmentation technique for tensors. In this paper, we propose DAIN, a general data augmentation framework that enhances the prediction accuracy of neural tensor completion methods. Specifically, DAIN first trains a neural model and finds tensor cell importances with influence functions. After that, DAIN aggregates the cell importance to calculate the importance of each entity (i.e., an index of a dimension). Finally, DAIN augments the tensor by weighted sampling of entity importances and a value predictor. Extensive experimental results show that DAIN outperforms all data augmentation baselines in terms of enhancing imputation accuracy of neural tensor completion on four diverse real-world tensors. Ablation studies of DAIN substantiate the effectiveness of each component of DAIN. Furthermore, we show that DAIN scales near linearly to large datasets.
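
The aggregation and sampling steps described above can be sketched as follows. The sketch assumes that cell importances are already available (the influence-function step is omitted), that a cell's importance is credited to each of its entities by summing absolute values, and that new cell indices are drawn independently per dimension; all of these choices and names are illustrative assumptions, and the value predictor that would fill in the sampled cells is not shown.

```python
import random
from collections import defaultdict


def entity_importance(cell_importance, n_modes):
    """Aggregate per-cell importance into per-entity importance for each dimension."""
    per_mode = [defaultdict(float) for _ in range(n_modes)]
    for index, score in cell_importance.items():
        for mode, entity in enumerate(index):
            # Assumption: an entity's importance is the sum of |importance| of its cells.
            per_mode[mode][entity] += abs(score)
    return [dict(m) for m in per_mode]


def sample_new_cells(per_mode, n_samples, rng):
    """Draw new cell indices, picking one entity per dimension by importance weight."""
    cells = []
    for _ in range(n_samples):
        index = tuple(
            rng.choices(list(m.keys()), weights=list(m.values()), k=1)[0]
            for m in per_mode
        )
        cells.append(index)
    return cells


# Hypothetical importances for three observed cells of a 3-way tensor.
cell_scores = {(0, 0, 1): 0.5, (0, 1, 0): -0.25, (1, 1, 1): 1.0}
importance = entity_importance(cell_scores, n_modes=3)
new_cells = sample_new_cells(importance, n_samples=4, rng=random.Random(0))
```

Weighted sampling concentrates the synthetic cells on entities that the trained model found most influential, rather than spreading them uniformly over a sparse tensor.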

Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching

Liang Pang
Yanyan Lan
Xueqi Cheng

Neural text matching models have been widely used in community question answering, information retrieval, and dialogue. However, these models, designed for short texts, cannot adequately address the long-form text matching problem, because many contexts in long-form texts cannot be directly aligned with each other, and it is difficult for existing models to capture the key matching signals from such noisy data. Besides, these models are computationally expensive because they simply use all textual data indiscriminately. To tackle the effectiveness and efficiency problem, we propose a novel hierarchical noise filtering model, namely Match-Ignition. The main idea is to plug the well-known PageRank algorithm into the Transformer, to identify and filter both sentence and word level noisy information in the matching process. Noisy sentences are usually easy to detect because previous work has shown that their similarity can be explicitly evaluated by word overlap, so we directly use PageRank to filter such information based on a sentence similarity graph. Unlike sentences, words rely on their contexts to express concrete meanings, so we propose to jointly learn the filtering and matching process, to better capture the critical word-level matching signals. Specifically, a word graph is first built based on the attention scores in each self-attention block of Transformer, and key words are then selected by applying PageRank on this graph. In this way, noisy words will be filtered out layer by layer in the matching process. Experimental results show that Match-Ignition outperforms both SOTA short text matching models and recent long-form text matching models. We also conduct detailed analysis to show that Match-Ignition efficiently captures important sentences and words, to facilitate the long-form text matching process.

Learning Saliency Maps to Explain Deep Time Series Classifiers

Prathyush S. Parvatharaju
Ramesh Doddaiah
Thomas Hartvigsen
Elke A. Rundensteiner

Explainable classification is essential to high-impact settings where practitioners require evidence to support their decisions. However, state-of-the-art deep learning models lack transparency in how they make their predictions. One increasingly popular solution is attribution-based explainability, which finds the impact of input features on the model's predictions. While this is popular for computer vision, little has been done to explain deep time series classifiers. In this work, we study this problem and propose PERT, a novel perturbation-based explainability method designed to explain deep classifiers' decisions on time series. PERT extends beyond recent perturbation methods to generate a saliency map that assigns importance values to the timesteps of the instance-of-interest.

First, PERT uses a novel Prioritized Replacement Selector to learn which alternative time series from a larger dataset are most useful to perform this perturbation. Second, PERT mixes the instance with the replacements using a Guided Perturbation Strategy, which learns to what degree each timestep can be perturbed without altering the classifier's final prediction. These two steps jointly learn to identify the fewest and most impactful timesteps that explain the classifier's prediction. We evaluate PERT using three metrics on nine popular datasets with two black-box models. We find that PERT consistently outperforms all five state-of-the-art methods. Using a case study, we also demonstrate that PERT succeeds in finding the relevant regions of the input time series.

Differentially Private Federated Knowledge Graphs Embedding

Hao Peng
Haoran Li
Yangqiu Song
Vincent Zheng
Jianxin Li

Knowledge graph embedding plays an important role in knowledge representation, reasoning, and data mining applications. However, for multiple cross-domain knowledge graphs, state-of-the-art embedding models cannot make full use of the data from different knowledge domains while preserving the privacy of exchanged data. In addition, the centralized embedding model may not scale to the extensive real-world knowledge graphs. Therefore, we propose a novel decentralized scalable learning framework, Federated Knowledge Graphs Embedding (FKGE), where embeddings from different knowledge graphs can be learnt in an asynchronous and peer-to-peer manner while being privacy-preserving. FKGE exploits adversarial generation between pairs of knowledge graphs to translate identical entities and relations of different domains into near embedding spaces. In order to protect the privacy of the training data, FKGE further implements a privacy-preserving neural network structure to guarantee no raw data leakage. We conduct extensive experiments to evaluate FKGE on 11 knowledge graphs, demonstrating a significant and consistent improvement in model quality with at most 17.85% and 7.90% increases in performance on triple classification and link prediction tasks.

Sparse Shield: Social Network Immunization vs. Harmful Speech

Alexandru Petrescu
Ciprian-Octavian Truică
Elena-Simona Apostol
Panagiotis Karras

With the rise of social media users and the general shift of communication from traditional media to online platforms, the spread of harmful content (e.g., hate speech, misinformation, fake news) has been exacerbated. Harmful content in the form of hate speech causes a person distress or harm, having a negative impact on the individual's mental health, with even more detrimental effects on the psychology of children and teenagers. In this paper, we propose an end-to-end solution to detect harmful content in real time and mitigate its spread over the network. Our main contribution is Sparse Shield, a novel method that out-scales existing state-of-the-art methods for network immunization. We also propose a novel architecture for harmful speech mitigation that maximizes the impact of immunization. Our solution aims to identify a set of users for which to move harmful content to the bottom of the user feed, rather than censoring users. By immunizing certain network nodes in this manner, we minimize the negative impact on the network and minimize the interference with and limitation of individual freedoms: the information is not hidden but rather not as easy to reach without an explicit search. Our analysis is based on graphs built on real-world data collected from Twitter; these graphs reflect real user behavior. We perform extensive scalability experiments to prove the superiority of our method over existing state-of-the-art network immunization techniques. We also perform extensive experiments to showcase that Sparse Shield outperforms existing techniques on the task of harmful speech mitigation on a real-world dataset.

MentalSpot: Effective Early Screening for Depression Based on Social Contagion

Jahandad Pirayesh
Haiquan Chen
Xiao Qin
Wei-Shinn Ku
Da Yan

While depression is rated as the most important leading factor to global disability, early detection of depression is a non-trivial task. Existing depression detection mechanisms harvesting social media data suffer from two major limitations. First, existing solutions rely heavily on the amount, quality, and variety of content types (textual, visual, etc.) posted by users to make accurate inferences, therefore suffering from the cold-start problem when coping with users with limited training data (e.g., most existing works exclude users with fewer than 25 tweets). Second, existing approaches ignore the social impact or indication from users' social circles that can be leveraged to enhance the inference results. In this paper, we present MentalSpot, a social-contagion based depression early-screening framework using meta-learning. Specifically, we first construct a social-contagion driven data repository PsycheNet, filling the void of social-circle based depression datasets. We design a triplet network to extract users' embeddings based on the similarities of the linguistic features extracted from written texts. Afterwards, for each target user, we employ dynamic mean shift pruning to select her top-k homogeneous friends in the metric space, the texts written by whom will then be leveraged to train a friend based depression detection model. Extensive experiments show that MentalSpot outperforms the state of the art in terms of all effectiveness metrics, especially for users with very few tweets. Specifically, by using only five tweets per user, MentalSpot successfully yields an F1 score that would otherwise be achieved by the state-of-the-art methods requiring at least twenty tweets. Our approach represents a step forward to address the cold-start problem that deep learning techniques struggle with for their applications in psychiatric diagnosis. 
The principal beneficiaries of this study are healthcare professionals in medical institutions to determine timely and targeted interventions in a clinical setting. This study also supports non-profit groups in reaching out to people with mental health issues, helping in a global health task that cannot be fully covered by clinicians.

Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning

Paul J. Pritz
Liang Ma
Kin K. Leung

While reinforcement learning has achieved considerable successes in recent years, state-of-the-art models are often still limited by the size of state and action spaces. Model-free reinforcement learning approaches use some form of state representations and the latest work has explored embedding techniques for actions, both with the aim of achieving better generalization and applicability. However, these approaches consider only states or actions, ignoring the interaction between them when generating embedded representations. In this work, we establish the theoretical foundations for the validity of training a reinforcement learning agent using embedded states and actions. We then propose a new approach for jointly learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning, which can be applied in both discrete and continuous domains. Specifically, we use a model of the environment to obtain embeddings for states and actions and present a generic architecture that leverages these to learn a policy. In this way, the embedded representations obtained via our approach enable better generalization over both states and actions by capturing similarities in the embedding spaces. Evaluations of our approach on several gaming, robotic control, and recommender systems show it significantly outperforms state-of-the-art models in both discrete/continuous domains with large state/action spaces, thus confirming its efficacy.

Unsupervised Domain Adaptation for Static Malware Detection based on Gradient Boosting Trees

Panpan Qi
Wei Wang
Lei Zhu
See Kiong Ng

Static malware detection is important for protection against malware by allowing for malicious files to be detected prior to execution. It is also especially suitable for machine learning-based approaches. Recently, gradient boosting decision trees (GBDT) models, e.g., LightGBM (a popular implementation of GBDT), have shown outstanding performance for malware detection. However, as malware programs are known to evolve rapidly, malware classification models trained on the (source) training data often fail to generalize to the target domain, i.e., the deployed environment. To handle the underlying data distribution drifts, unsupervised domain adaptation techniques have been proposed for machine learning models including deep learning models. However, unsupervised domain adaptation for GBDT has remained challenging. In this paper, we adapt the adversarial learning framework for unsupervised domain adaptation to enable GBDT to learn domain-invariant features and alleviate performance degradation in the target domain. In addition, to fully exploit the unlabelled target data, we merge them into the training dataset after pseudo-labelling. We propose a new weighting scheme integrated into GBDT for sampling instances in each boosting round to reduce the negative impact of wrongly labelled target instances. Experiments on two large malware datasets demonstrate the superiority of our proposed method.

Learning Implicit User Profile for Personalized Retrieval-Based Chatbot

Hongjin Qian
Zhicheng Dou
Yutao Zhu
Yueyuan Ma
Ji-Rong Wen

In this paper, we explore the problem of developing personalized chatbots. A personalized chatbot is designed as a digital chatting assistant for a user. The key characteristic of a personalized chatbot is that it should have a personality consistent with that of the corresponding user. It can talk the same way as the user when it is delegated to respond to others' messages. Many methods have been proposed to assign a personality to dialogue chatbots, but most of them utilize explicit user profiles, including several persona descriptions or key-value-based personal information. In a practical scenario, however, users might be reluctant to write detailed persona descriptions, and obtaining a large number of explicit user profiles requires tremendous manual labour. To tackle the problem, we present a retrieval-based personalized chatbot model, namely IMPChat, to learn an implicit user profile from the user's dialogue history. We argue that the implicit user profile is superior to the explicit user profile regarding accessibility and flexibility. IMPChat aims to learn an implicit user profile by modeling the user's personalized language style and personalized preferences separately. To learn a user's personalized language style, we elaborately build language models from shallow to deep using the user's historical responses; to model a user's personalized preferences, we explore the conditional relations underneath each post-response pair of the user. The personalized preferences are dynamic and context-aware: we assign higher weights to those historical pairs that are topically related to the current query when aggregating the personalized preferences. We match each response candidate with the personalized language style and personalized preference, respectively, and fuse the two matching signals to determine the final ranking score. We conduct comprehensive experiments on two large datasets, and the results show that our method outperforms all baseline models.

Learning to Augment Imbalanced Data for Re-ranking Models

Zi-Hao Qiu
Ying-Chun Jian
Qing-Guo Chen
Lijun Zhang

The conventional solution to learning-to-rank problems greedily ranks individual documents by their prediction scores. Recently emerged re-ranking models, which take initial lists as input, aim to capture document interdependencies and directly generate the optimal ordered lists. Typically, a re-ranking model is learned from a set of labeled data and can achieve favorable performance on average. However, it can be suboptimal for individual queries because the available training data is usually highly imbalanced. This problem is challenging due to the absence of informative data for some queries and, furthermore, the lack of a good data augmentation policy.

In this paper, we propose a novel method named Learning to Augment (LTA), which mitigates the imbalance issue through learning to augment the initial lists for re-ranking models. Specifically, we first design a data generation model based on Gaussian Mixture Variational Autoencoder (GMVAE) for generating informative data. GMVAE imposes a mixture of Gaussians on the latent space, which allows it to cluster queries in an unsupervised manner and then generate new data with different query types using the learned components. Then, to obtain a good augmentation strategy (instead of heuristics), we design a teacher model that consists of two intelligent agents to determine how to generate new data for a given list and how to rank both the raw data and generated data to produce augmented lists, respectively. The teacher model leverages the feedback from the re-ranking model to optimize its augmentation policy by means of reinforcement learning. Our method offers a general learning paradigm that is applicable to both supervised and reinforced re-ranking models. Experimental results on benchmark learning to rank datasets show that our proposed method can significantly improve the performance of re-ranking models.

Natural Language Understanding with Privacy-Preserving BERT

Chen Qu
Weize Kong
Liu Yang
Mingyang Zhang
Michael Bendersky
Marc Najork

Privacy preservation remains a key challenge in data mining and Natural Language Understanding (NLU). Previous research shows that the input text or even text embeddings can leak private information. This concern motivates our research on effective privacy preservation approaches for pretrained Language Models (LMs). We investigate the privacy and utility implications of applying dχ-privacy, a variant of Local Differential Privacy, to BERT fine-tuning in NLU applications. More importantly, we further propose privacy-adaptive LM pretraining methods and show that our approach can boost the utility of BERT dramatically while retaining the same level of privacy protection. We also quantify the level of privacy preservation and provide guidance on privacy configuration. Our experiments and findings lay the groundwork for future explorations of privacy-preserving NLU with pretrained LMs.

A Study of Explainability Features to Scrutinize Faceted Filtering Results

Jiaming Qu
Jaime Arguello
Yue Wang

Faceted search systems enable users to filter results by selecting values along different dimensions or facets. Traditionally, facets have corresponded to properties of information items that are part of the document metadata. Recently, faceted search systems have begun to use machine learning to automatically associate documents with facet-values that are more subjective and abstract. Examples include search systems that support topic-based filtering of research articles, concept-based filtering of medical documents, and tag-based filtering of images. While machine learning can be used to infer facet-values when the collection is too large for manual annotation, machine-learned classifiers make mistakes. In such cases, it is desirable to have a scrutable system that explains why a filtered result is relevant to a facet-value. Such explanations are missing from current systems. In this paper, we investigate how explainability features can help users interpret results filtered using machine-learned facets. We consider two explainability features: (1) showing prediction confidence values and (2) highlighting rationale sentences that played an influential role in predicting a facet-value. We report on a crowdsourced study involving 200 participants. Participants were asked to scrutinize movie plot summaries predicted to satisfy multiple genres and indicate their agreement or disagreement with the system. Participants were exposed to four interface conditions. We found that both explainability features had a positive impact on participants' perceptions and performance. While both features helped, the sentence-highlighting feature played a more instrumental role in enabling participants to reject false positive cases. We discuss implications for designing tools to help users scrutinize automatically assigned facet-values.

Fairness-Aware Training of Decision Trees by Abstract Interpretation

Francesco Ranzato
Caterina Urban
Marco Zanella

We study the problem of formally verifying individual fairness of decision tree ensembles, as well as training tree models which maximize both accuracy and individual fairness. In our approach, fairness verification and fairness-aware training both rely on a notion of stability of a classifier, which is a generalization of the standard notion of robustness to input perturbations used in adversarial machine learning. Our verification and training methods leverage abstract interpretation, a well-established mathematical framework for designing computable, correct, and precise approximations of potentially infinite behaviors. We implemented our fairness-aware learning method by building on a tool for adversarial training of decision trees. We evaluated it in practice on the reference datasets in the literature on fairness in machine learning. The experimental results show that our approach is able to train tree models exhibiting a high degree of individual fairness with respect to the natural state-of-the-art CART trees and random forests. Moreover, as a by-product, these fairness-aware decision trees turn out to be significantly compact, which naturally enhances their interpretability.

QuAX: Mining the Web for High-utility FAQ

Muhammad Shihab Rashid
Fuad Jamour
Vagelis Hristidis

Frequently Asked Questions (FAQ) are a form of semi-structured data that provides users with commonly requested information and enables several natural language processing tasks. Given the plethora of such question-answer pairs on the Web, there is an opportunity to automatically build large FAQ collections for any domain, such as COVID-19 or Plastic Surgery. These collections can be used by several information-seeking portals and applications, such as AI chatbots. Automatically identifying and extracting such high-utility question-answer pairs is a challenging endeavor, which has been tackled by little research work. For a question-answer pair to be useful to a broad audience, it must (i) provide general information -- not be specific to the Web site or Web page where it is hosted -- and (ii) be self-contained -- not have references to other entities in the page or missing terms (ellipses) that render the question-answer pair ambiguous. Although identifying general, self-contained questions may seem like a straightforward binary classification problem, the limited availability of training data for this task and the countless domains make building machine learning models challenging. Existing efforts in extracting FAQs from the Web typically focus on FAQ retrieval without much regard to the utility of the extracted FAQ. We propose QuAX: a framework for extracting high-utility (i.e., general and self-contained) domain-specific FAQ lists from the Web. QuAX receives a set of keywords from a user, and works in a pipelined fashion to find relevant web pages and extract general and self-contained question-answer pairs. We experimentally show how QuAX generates high-utility FAQ collections with little and domain-agnostic training data, and how the individual stages of the pipeline improve on the corresponding state-of-the-art.

AdaSim: A Recursive Similarity Measure in Graphs

Masoud Rehyani Hamedani
Sang-Wook Kim

In the literature, various link-based similarity measures such as Adamic/Adar (in short Ada), SimRank, and random walk with restart (RWR) have been proposed. Contrary to SimRank and RWR, Ada is a non-recursive measure, which exploits the local graph structure in similarity computation. Motivated by Ada's promising results in various graph-related tasks, along with the fact that SimRank is a recursive generalization of the co-citation measure, in this paper, we propose AdaSim, a recursive similarity measure based on the Ada philosophy. Our AdaSim provides identical accuracy to that of Ada on the first iteration and it is applicable to both directed and undirected graphs. To accelerate our iterative form, we also propose a matrix form that is dramatically faster while providing the exact AdaSim scores. We conduct extensive experiments with five real-world datasets to evaluate both the effectiveness and efficiency of our AdaSim in comparison with those of existing similarity measures and graph embedding methods in the task of similarity computation of nodes. Our experimental results show that 1) AdaSim significantly improves the effectiveness of Ada and outperforms other competitors, 2) its efficiency is comparable to that of SimRank* while being better than the others, 3) AdaSim is not sensitive to parameter tuning, and 4) similarity measures are better than embedding methods at computing the similarity of nodes.
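
As a point of reference for the non-recursive Ada measure mentioned above, the following minimal sketch computes the standard Adamic/Adar score (the sum of 1/log(degree) over common neighbors) on an undirected graph. It is not the AdaSim algorithm; the adjacency-dictionary input format, the function name, and the toy graph are assumptions made for illustration.

```python
import math


def adamic_adar(adj, u, v):
    """Standard (non-recursive) Adamic/Adar score of two distinct nodes.

    adj maps each node to the set of its neighbors; this input format is
    a hypothetical choice for the sketch, not taken from the paper.
    """
    score = 0.0
    for z in adj[u] & adj[v]:
        degree = len(adj[z])
        # A common neighbor of two distinct nodes has degree >= 2,
        # so log(degree) is strictly positive.
        score += 1.0 / math.log(degree)
    return score


# Toy undirected graph (made-up data): a and b share neighbors c and d.
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

score_ab = adamic_adar(graph, "a", "b")  # 1/log(2) + 1/log(3)
score_ae = adamic_adar(graph, "a", "e")  # only d is shared: 1/log(3)
```

Low-degree common neighbors contribute more to the score than hubs, which is the local-structure intuition that a recursive variant would build on.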

Power to the Relational Inductive Bias: Graph Neural Networks in Electrical Power Grids

Martin Ringsquandl
Houssem Sellami
Marcel Hildebrandt
Dagmar Beyer
Sylwia Henselmeyer
Sebastian Weber
Mitchell Joblin

The application of graph neural networks (GNNs) to the domain of electrical power grids has high potential impact on smart grid monitoring. Even though there is a natural correspondence of power flow to message-passing in GNNs, their performance on power grids is not well understood. We argue that there is a gap because GNN research is driven by benchmarks containing graphs that differ from power grids in several important aspects. Additionally, inductive learning of GNNs across multiple power grid topologies has not been explored with real-world data.

We address this gap by means of (i) defining power grid graph datasets in inductive settings, (ii) an exploratory analysis of graph properties, and (iii) an empirical study of the concrete learning task of state estimation on real-world power grids. Our results show that GNNs are more robust to noise with up to 400% lower error compared to baselines. Furthermore, due to the unique properties of electrical grids, we do not observe the well-known over-smoothing phenomenon of GNNs and find the best performing models to be exceptionally deep with up to 13 layers. This is in stark contrast to existing benchmark datasets where the consensus is that 2-3 layer GNNs perform best. Our results demonstrate that a key challenge in this domain is to effectively handle long-range dependence.

Corrective Guidance and Learning for Dialogue Management

Mahdin Rohmatillah
Jen-Tzung Chien

Establishing a robust dialogue policy with low computation cost is challenging, especially for multi-domain task-oriented dialogue management, due to the high complexity of the state and action spaces. Previous works, which mostly use deterministic policy optimization, attain only moderate performance. Meanwhile, the state-of-the-art result, which uses an end-to-end approach, is computationally demanding since it relies on a large-scale language model based on the generative pre-trained transformer-2 (GPT-2). In this study, a new learning procedure consisting of three learning stages is presented to improve multi-domain dialogue management with corrective guidance. First, behavior cloning with an auxiliary task is developed to build a robust pre-trained model by mitigating the causal confusion problem in imitation learning. Next, the pre-trained model is rectified by reinforcement learning via proximal policy optimization. Lastly, a human-in-the-loop learning strategy is applied to enhance the agent's performance by directly providing corrective feedback from a rule-based agent, so that the agent is prevented from becoming trapped in confounded states. The experiments on end-to-end evaluation show that the proposed learning method achieves a state-of-the-art result by performing nearly identically to the rule-based agent. This method outperforms the second-place entry of the 9th Dialog System Technology Challenge (DSTC9) track 2, which uses GPT-2 as the core model in dialogue management.

The Shapley Value of Classifiers in Ensemble Games

Benedek Rozemberczki
Rik Sarkar

What is the value of an individual model in an ensemble of binary classifiers? We answer this question by introducing a class of transferable utility cooperative games called ensemble games. In machine learning ensembles, pre-trained models cooperate to make classification decisions. To quantify the importance of models in these ensemble games, we define Troupe - an efficient algorithm that allocates payoffs based on approximate Shapley values of the classifiers. We argue that the Shapley value of models in these games is an effective decision metric for choosing a high-performing subset of models from the ensemble. Our analytical findings prove that our Shapley value estimation scheme is precise and scalable; its performance increases with the size of the dataset and ensemble. Empirical results on real-world graph classification tasks demonstrate that our algorithm produces high-quality estimates of the Shapley value. We find that Shapley values can be utilized for ensemble pruning and that adversarial models receive a low valuation. Complex classifiers are frequently found to be responsible for both correct and incorrect classification decisions.
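
To make the notion of a classifier's Shapley value concrete, here is a minimal sketch that computes exact Shapley values by enumerating all orderings of a tiny ensemble. It is not the Troupe algorithm, which the abstract describes as an efficient approximation. The game definition used here (a coalition's value is the accuracy of its strict-majority vote), the model names, and the correctness table are all assumptions for illustration.

```python
from itertools import permutations


def coalition_value(members, correct):
    """Accuracy of a strict-majority vote among `members`.

    correct[m][i] is True when model m classifies example i correctly.
    The empty coalition is given value 0 (an assumption of this sketch).
    """
    if not members:
        return 0.0
    n_examples = len(next(iter(correct.values())))
    wins = 0
    for i in range(n_examples):
        votes = sum(correct[m][i] for m in members)
        if 2 * votes > len(members):
            wins += 1
    return wins / n_examples


def exact_shapley(correct):
    """Average marginal contribution of each model over all orderings.

    Enumeration is factorial in the ensemble size, so this is only
    feasible for very small ensembles.
    """
    models = list(correct)
    totals = {m: 0.0 for m in models}
    orderings = list(permutations(models))
    for order in orderings:
        coalition = []
        previous = 0.0
        for m in order:
            coalition.append(m)
            current = coalition_value(coalition, correct)
            totals[m] += current - previous
            previous = current
    return {m: total / len(orderings) for m, total in totals.items()}


# Made-up correctness table for three hypothetical models on four examples.
correct = {
    "good": [True, True, True, True],
    "ok": [True, True, False, False],
    "bad": [False, False, False, False],
}
shapley = exact_shapley(correct)
```

In this toy game the always-wrong model receives a negative value, and the values sum to the accuracy of the full ensemble, which is the efficiency property of the Shapley value.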

odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks

Ilie Sarpe
Fabio Vandin

Counting the number of occurrences of small connected subgraphs, called temporal motifs, has become a fundamental primitive for the analysis of temporal networks, whose edges are annotated with the time of the event they represent. One of the main complications in studying temporal motifs is the large number of motifs that can be built even with a limited number of vertices or edges. As a consequence, since in many applications motifs are employed for exploratory analyses, the user needs to iteratively select and analyze several motifs that represent different aspects of the network, resulting in an inefficient, time-consuming process. This problem is exacerbated in large networks, where the analysis of even a single motif is computationally demanding. As a solution, in this work we propose and study the problem of simultaneously counting the number of occurrences of multiple temporal motifs, all corresponding to the same (static) topology (e.g., a triangle). Given that for large temporal networks computing the exact counts is unfeasible, we propose odeN, a sampling-based algorithm that provides an accurate approximation of all the counts of the motifs. We provide analytical bounds on the number of samples required by odeN to compute rigorous, probabilistic, relative approximations. Our extensive experimental evaluation shows that odeN enables the approximation of the counts of motifs in temporal networks in a fraction of the time needed by state-of-the-art methods, and that it also reports more accurate approximations than such methods.

ClaSP - Time Series Segmentation

Patrick Schäfer
Arik Ermshaus
Ulf Leser

The study of biological or physical processes often results in long sequences of temporally-ordered values, aka time series (TS). Changes in the observed processes, e.g., caused by natural events or internal state changes, result in changes of the measured values. Time series segmentation (TSS) tries to find such changes in TS to deduce changes in the underlying process. TSS is typically approached as an unsupervised learning problem aiming at the identification of segments distinguishable by some statistical property. We present ClaSP, a novel and highly accurate method for TSS. ClaSP hierarchically splits a TS into two parts, where each split point is determined by training a binary TS classifier for each possible split point and selecting the one with highest accuracy, i.e., the one that is best at identifying which of the two partitions a subsequence comes from. In our experimental evaluation using a benchmark of 98 datasets, we show that ClaSP outperforms the state-of-the-art in terms of accuracy and is also faster than the second best method. We highlight properties of ClaSP using several real-life time series.

One-shot Transfer Learning for Population Mapping

Erzhuo Shao
Jie Feng
Yingheng Wang
Tong Xia
Yong Li

Fine-grained population distribution data is of great importance for many applications, e.g., urban planning, traffic scheduling, epidemic modeling, and risk control. However, due to the limitations of data collection, including infrastructure density, user privacy, and business security, such fine-grained data is hard to collect and usually only coarse-grained data is available. Thus, obtaining fine-grained population distribution from coarse-grained distribution becomes an important problem. To tackle this problem, existing methods mainly rely on sufficient fine-grained ground truth for training, which is often unavailable for the majority of cities. That limits the applications of these methods and makes it necessary to transfer knowledge from data-sufficient source cities to data-scarce target cities.

In the knowledge transfer scenario, we employ a single reference fine-grained ground truth in the target city, which is easy to obtain via remote sensing or questionnaire, as the ground truth to inform the large-scale urban structure and support the knowledge transfer in the target city. By this approach, we transform the fine-grained population mapping problem into a one-shot transfer learning problem.

In this paper, we propose a novel one-shot transfer learning framework, PSRNet, to transfer spatial-temporal knowledge across cities from three views. From the view of network structure, we build a dense connection-based population mapping network with temporal feature enhancement to capture the complicated spatial-temporal correlation between population distributions of different granularities. From the view of data, we design a generative model to synthesize fine-grained population samples with the POI distribution and the single fine-grained ground truth in the data-scarce target city. From the view of optimization, after combining the above structure and data, we propose a pixel-level adversarial domain adaptation mechanism for universal feature extraction and knowledge transfer during training with scarce ground truth for supervision.

Experiments on real-life datasets of 4 cities demonstrate that PSRNet has significant advantages over 8 state-of-the-art baselines by reducing RMSE and MAE by more than 25%. Our code and datasets are released on GitHub (https://github.com/erzhuoshao/PSRNet-CIKM).

Identifying Untrustworthy Samples: Data Filtering for Open-domain Dialogues with Bayesian Optimization

Lei Shen
Haolan Zhan
Xin Shen
Hongshen Chen
Xiaofang Zhao
Xiaodan Zhu

Being able to reply with a related, fluent, and informative response is an indispensable requirement for building high-quality conversational agents. In order to generate better responses, some approaches have been proposed, such as feeding extra information by collecting large-scale datasets with human annotations, designing neural conversational models (NCMs) with complex architecture and loss functions, or filtering out untrustworthy samples based on a dialogue attribute, e.g., Relatedness or Genericness. In this paper, we follow the third research branch and present a data filtering method for open-domain dialogues, which identifies untrustworthy samples from training data with a quality measure that linearly combines seven dialogue attributes. The attribute weights are obtained via Bayesian Optimization (BayesOpt) that aims to optimize an objective function for dialogue generation iteratively on the validation set. Then we score training samples with the quality measure, sort them in descending order, and filter out those at the bottom. Furthermore, to accelerate the "filter-train-evaluate" iterations involved in BayesOpt on large-scale datasets, we propose a training framework that integrates maximum likelihood estimation (MLE) and negative training method (NEG). The training method updates the parameters of a trained NCM on two small sets with newly maintained and removed samples, respectively. Specifically, MLE is applied to maximize the log-likelihood of newly maintained samples, while NEG is used to minimize the log-likelihood of newly removed ones. Experimental results on two datasets show that our method can effectively identify untrustworthy samples, and NCMs trained on the filtered datasets achieve better performance.
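
The score-sort-filter step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the weights are fixed by hand here, whereas the abstract obtains them via BayesOpt; the example uses two made-up attributes per sample instead of seven; and the function names, sample identifiers, and drop fraction are hypothetical.

```python
def quality_score(attributes, weights):
    """Linear combination of a sample's dialogue attribute values."""
    return sum(w * a for w, a in zip(weights, attributes))


def filter_samples(samples, weights, drop_fraction):
    """Rank samples by quality (descending) and drop the bottom fraction.

    samples is a list of (sample_id, attribute_vector) pairs.
    Returns (kept, removed), both in descending order of quality.
    """
    ranked = sorted(
        samples,
        key=lambda sample: quality_score(sample[1], weights),
        reverse=True,
    )
    n_keep = len(ranked) - int(len(ranked) * drop_fraction)
    return ranked[:n_keep], ranked[n_keep:]


# Made-up samples with two hypothetical attribute values each.
samples = [
    ("s1", [0.9, 0.8]),
    ("s2", [0.2, 0.1]),
    ("s3", [0.6, 0.7]),
    ("s4", [0.4, 0.8]),
]
weights = [0.5, 0.5]  # hand-picked; the paper learns these with BayesOpt

kept, removed = filter_samples(samples, weights, drop_fraction=0.25)
kept_ids = [sample_id for sample_id, _ in kept]
removed_ids = [sample_id for sample_id, _ in removed]
```

In a full "filter-train-evaluate" loop, each new weight vector proposed by the optimizer would change which samples land in `kept` and `removed`, and those two difference sets are what the MLE and NEG updates would operate on.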

Inductive Matrix Completion Using Graph Autoencoder

Wei Shen
Chuheng Zhang
Yun Tian
Liang Zeng
Xiaonan He
Wanchun Dou
Xiaolong Xu

Recently, the graph neural network (GNN) has shown great power in matrix completion by formulating a rating matrix as a bipartite graph and then predicting the link between the corresponding user and item nodes. The majority of GNN-based matrix completion methods are based on Graph Autoencoder (GAE), which considers the one-hot index as input, maps a user (or item) index to a learnable embedding, applies a GNN to learn the node-specific representations based on these learnable embeddings and finally aggregates the representations of the target user and its corresponding item nodes to predict missing links. However, without node content (i.e., side information) for training, the user (or item) specific representation cannot be learned in the inductive setting, that is, a model trained on one group of users (or items) cannot adapt to new users (or items). To this end, we propose an inductive matrix completion method using GAE (IMC-GAE), which utilizes the GAE to learn both the user-specific (or item-specific) representation for personalized recommendation and local graph patterns for inductive matrix completion. Specifically, we design two informative node features and employ a layer-wise node dropout scheme in GAE to learn local graph patterns which can be generalized to unseen data. The main contribution of our paper is the capability to efficiently learn local graph patterns in GAE, with good scalability and superior expressiveness compared to previous GNN-based matrix completion methods. Furthermore, extensive experiments demonstrate that our model achieves state-of-the-art performance on several matrix completion benchmarks.

How Powerful is Graph Convolution for Recommendation?

Yifei Shen
Yongji Wu
Yao Zhang
Caihua Shan
Jun Zhang
B. Khaled Letaief
Dongsheng Li

Graph convolutional networks (GCNs) have recently enabled a popular class of algorithms for collaborative filtering (CF). Nevertheless, the theoretical underpinnings of their empirical successes remain elusive. In this paper, we endeavor to obtain a better understanding of GCN-based CF methods via the lens of graph signal processing. By identifying the critical role of smoothness, a key concept in graph signal processing, we develop a unified graph convolution-based framework for CF. We prove that many existing CF methods are special cases of this framework, including the neighborhood-based methods, low-rank matrix factorization, linear auto-encoders, and LightGCN, corresponding to different low-pass filters. Based on our framework, we then present a simple and computationally efficient CF baseline, which we refer to as Graph Filter based Collaborative Filtering (GF-CF). Given an implicit feedback matrix, GF-CF can be obtained in a closed form instead of expensive training with back-propagation. Experiments show that GF-CF achieves competitive or better performance against deep learning-based methods on three well-known datasets, notably with a 70% performance gain over LightGCN on the Amazon-book dataset.

DataType-Aware Knowledge Graph Representation Learning in Hyperbolic Space

Yuxin Shen
Zhao Li
Xin Wang
Jianxin Li
Xiaowang Zhang

Knowledge Graph (KG) representation learning aims to encode both entities and relations into a continuous low-dimensional vector space. Most existing methods only concentrate on learning representations from structural triples in Euclidean space, which cannot fully exploit the rich semantic information with hierarchical structure in KGs. In this paper, we propose a novel DataType-aware hyperbolic knowledge representation learning model called DT-GCN, which has the advantage of fully embedding the attribute values of different data types. We refine data types into five primitive modalities, including integer, double, Boolean, temporal, and textual. For each modality, an encoder is specifically designed to learn its embedding. In addition, we define a unified space based on Euclidean, spherical, and hyperbolic space, which is a continuous curvature space that combines the advantages of the three spaces. Extensive experiments on both synthetic and real-world datasets show that our model is consistently better than the state-of-the-art models. The average performance is improved by 2.19% and 3.46% over the best baseline model on node classification and link prediction tasks, respectively. The results of ablation experiments demonstrate the advantages of embedding data type information and leveraging the unified space.

Integrating Pattern- and Fact-based Fake News Detection via Model Preference Learning

Qiang Sheng
Xueyao Zhang
Juan Cao
Lei Zhong

To defend against fake news, researchers have developed various methods based on texts. These methods can be grouped as 1) pattern-based methods, which focus on shared patterns among fake news posts rather than the claim itself; and 2) fact-based methods, which retrieve from external sources to verify the claim's veracity without considering patterns. The two groups of methods, which have different preferences of textual clues, actually play complementary roles in detecting fake news. However, few works consider their integration. In this paper, we study the problem of integrating pattern- and fact-based models into one framework via modeling their preference differences, i.e., making the pattern- and fact-based models focus on their respective preferred parts in a post and mitigating interference from non-preferred parts as much as possible. To this end, we build a Preference-aware Fake News Detection Framework (Pref-FEND), which learns the respective preferences of pattern- and fact-based models for joint detection. We first design a heterogeneous dynamic graph convolutional network to generate the respective preference maps, and then use these maps to guide the joint learning of pattern- and fact-based models for final prediction. Experiments on two real-world datasets show that Pref-FEND effectively captures model preferences and improves the performance of models based on patterns, facts, or both.

WG4Rec: Modeling Textual Content with Word Graph for News Recommendation

Shaoyun Shi
Weizhi Ma
Zhen Wang
Min Zhang
Kun Fang
Jingfang Xu
Yiqun Liu
Shaoping Ma

News recommendation plays an indispensable role in helping users acquire daily news. Previous studies make great efforts to model high-order feature interactions between users and items, where various neural models are applied (e.g., RNN, GNN). However, we find that few efforts are made to get better representations for news. Most previous methods simply adopt pre-trained word embeddings to represent news and also suffer from cold-start users.

In this work, we propose a new textual content representation method by building a word graph for recommendation, which is named WG4Rec. Three types of word associations are adopted in WG4Rec for content representation and user preference modeling, namely: 1) semantic similarity according to pre-trained word vectors, 2) co-occurrence in documents, and 3) co-click by users across documents. As extra information can easily be unified by adding nodes/edges to the word graph, WG4Rec can flexibly make use of cross-platform and cross-domain context for recommendation to alleviate the cold-start issue. To the best of our knowledge, this is the first attempt to use these relationships for news recommendation to better model textual content and adopt cross-platform information. Experimental results on two large-scale real-world datasets show that WG4Rec significantly outperforms state-of-the-art algorithms, especially for cold users in the online environment. Besides, WG4Rec achieves better performance when cross-platform information is utilized.

XPM: An Explainable Deep Reinforcement Learning Framework for Portfolio Management

Si Shi
Jianjun Li
Guohui Li
Peng Pan
Ke Liu

Reinforcement learning-based portfolio management has recently attracted extensive attention. However, deep reinforcement learning methods are unexplainable and are therefore considered potentially risky and difficult for users to trust and regulate. To address these problems, we propose an eXplainable reinforcement learning framework for Portfolio Management, named XPM, which is efficient, concise, and can provide faithful explanations for network outputs. Specifically, we first design a policy network for portfolio management, which uses a temporal convolutional network (TCN) to extract temporal features of multiple time series in the portfolio. Then, we employ global average pooling (GAP) and a fully connected layer to integrate the global feature maps to handle asset correlations. Finally, we utilize softmax to determine the output portfolio weights. To build explainability into our model, we employ an explainable artificial intelligence method, class activation mapping (CAM), to explain the network outputs, which computes an activation map for an asset of interest. The map highlights the important assets and time intervals in the input state. In this way, end users can understand which parts of the portfolio's recent price movements drive the network's decision to invest in the target asset. Experimental results show that XPM outperforms the current state-of-the-art portfolio management methods in NASDAQ and NYSE markets, and can provide faithful and informative explanations to end users.
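
The GAP, fully connected, softmax, and CAM steps named above can be illustrated with a small sketch. This is not the XPM implementation: the TCN feature extractor is omitted, the feature maps and layer weights are made-up numbers, and the layout (one feature map per channel, rows as assets, columns as time steps) is an assumption. The CAM here follows the standard definition for GAP-based networks, a channel-weighted sum of feature maps using the fully connected weights of the output of interest.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each feature map (a list of rows) to its mean value."""
    pooled = []
    for fmap in feature_maps:
        values = [x for row in fmap for x in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fully_connected(inputs, weights, biases):
    """One output score per weight row: dot(row, inputs) + bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Numerically stable softmax; outputs are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_activation_map(feature_maps, weights, output_index):
    """Channel-weighted sum of feature maps for one output (standard CAM)."""
    n_rows = len(feature_maps[0])
    n_cols = len(feature_maps[0][0])
    return [
        [
            sum(weights[output_index][c] * feature_maps[c][i][j]
                for c in range(len(feature_maps)))
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


# Made-up feature maps: 2 channels, each 2 assets x 3 time steps.
feature_maps = [
    [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]],
    [[0.5, 0.5, 0.5], [2.0, 2.0, 2.0]],
]
fc_weights = [[0.2, -0.1], [0.4, 0.3]]  # hypothetical: 2 outputs x 2 channels
fc_biases = [0.0, 0.1]

scores = fully_connected(global_average_pool(feature_maps), fc_weights, fc_biases)
portfolio_weights = softmax(scores)
cam = class_activation_map(feature_maps, fc_weights, output_index=1)
```

Because pooling and the fully connected layer are both linear, the mean of the activation map for an output equals that output's score minus its bias, so each map cell can be read as a contribution to the pre-softmax score.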

SGCL: Contrastive Representation Learning for Signed Graphs

Lin Shu
Erxin Du
Yaomin Chang
Chuan Chen
Zibin Zheng
Xingxing Xing
Shaofeng Shen

Graph contrastive representation learning aims to learn discriminative node representations by contrasting positive and negative samples. It helps models learn more generalized representations to achieve better performances on downstream tasks, which has aroused increasing research interest in recent years. Simultaneously, signed graphs consisting of both positive and negative links have become ubiquitous with the growing popularity of social media. However, existing works on graph contrastive representation learning are only proposed for unsigned graphs (containing only positive links) and it remains unexplored how they could be applied to signed graphs due to the distinct semantics and complex relations between positive and negative links. Therefore we propose a novel Signed Graph Contrastive Learning model (SGCL) to bridge this gap, which to the best of our knowledge is the first research to employ graph contrastive representation learning on signed graphs. Concretely, we design two types of graph augmentations specific to signed graphs based on a significant signed social theory, i.e., balance theory. Besides, inter-view and intra-view contrastive learning are proposed to learn discriminative node representations from perspectives of graph augmentations and signed structures respectively. Experimental results demonstrate the superiority of the proposed model over state-of-the-art methods on both real-world social datasets and online game datasets.

Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

Yaman Kumar Singla
Avyakt Gupta
Shaurya Bagga
Changyou Chen
Balaji Krishnamurthy
Rajiv Ratn Shah

Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate's speaking proficiency in a language. ASS systems face many challenges like open grammar, variable pronunciations, and unstructured or semi-structured content. Recent deep learning approaches have shown some promise in this domain. However, most of these approaches focus on extracting features from single audio, making them suffer from the lack of speaker-specific context required to model such a complex task. We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modelling. In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context vectors from these responses and feed them as additional speaker-specific context to our network to score a particular response. We compare our technique with strong baselines and find that such modelling improves the model's average performance by 6.92% (maximum = 12.86%, minimum = 4.51%). We further show both quantitative and qualitative insights into the importance of this additional context in solving the problem of ASS.

SciClops: Detecting and Contextualizing Scientific Claims for Assisting Manual Fact-Checking

Panayiotis Smeros
Carlos Castillo
Karl Aberer

This paper describes SciClops, a method to help combat online scientific misinformation. Although automated fact-checking methods have gained significant attention recently, they require pre-existing ground-truth evidence, which, in the scientific context, is sparse and scattered across a constantly-evolving scientific literature. Existing methods do not exploit this literature, which can effectively contextualize and combat science-related fallacies. Furthermore, these methods rarely require human intervention, which is essential for the convoluted and critical domain of scientific misinformation.

SciClops involves three main steps to process scientific claims found in online news articles and social media postings: extraction, clustering, and contextualization. First, the extraction of scientific claims takes place using a domain-specific, fine-tuned transformer model. Second, similar claims extracted from heterogeneous sources are clustered together with related scientific literature using a method that exploits their content and the connections among them. Third, check-worthy claims, broadcast by popular yet unreliable sources, are highlighted together with an enhanced fact-checking context that includes related verified claims, news articles, and scientific papers. Extensive experiments show that SciClops adequately handles these three steps and effectively assists non-expert fact-checkers in the verification of complex scientific claims, outperforming commercial fact-checking systems.

Metric Sentiment Learning for Label Representation

Chengyu Song
Fei Cai
Jianming Zheng
Wanyu Chen
Zhiqiang Pan

Label representation aims to generate a so-called verbalizer to an input text, which has a broad application in the field of text classification, event detection, question answering, etc. Previous works on label representation, especially in a few-shot setting, mainly define the verbalizers manually, which is accurate but time-consuming. Other models fail to correctly produce antonymous verbalizers for two semantically opposite classes. Thus, in this paper, we propose a metric sentiment learning framework (MSeLF) to generate the verbalizers automatically, which can capture the sentiment differences between the verbalizers accurately. In detail, MSeLF consists of two major components, i.e., the contrastive mapping learning (CML) module and the equal-gradient verbalizer acquisition (EVA) module. CML learns a transformation matrix to project the initial word embeddings to the antonym-aware embeddings by enlarging the distance between the antonyms. After that, in the antonym-aware embedding space, EVA first takes a pair of antonymous words as verbalizers for two opposite classes and then applies a sentiment transition vector to generate verbalizers for intermediate classes. We use the generated verbalizers for the downstream text classification task in a few-shot setting on two publicly available fine-grained datasets. The results indicate that our proposal outperforms the state-of-the-art baselines in terms of accuracy. In addition, we find CML can be used as a flexible plug-in component in other verbalizer acquisition approaches.

CBML: A Cluster-based Meta-learning Model for Session-based Recommendation

Jiayu Song
Jiajie Xu
Rui Zhou
Lu Chen
Jianxin Li
Chengfei Liu

Session-based recommendation aims to predict an anonymous user's next action based on the user's historical actions in the current session. However, the cold-start problem of a limited number of actions at the beginning of an anonymous session makes it difficult to model the user's behavior, i.e., hard to capture the user's various and dynamic preferences within the session. This severely affects the accuracy of session-based recommendation. Although some existing meta-learning based approaches have alleviated the cold-start problem by borrowing preferences from other users, they are still weak in modeling the behavior of the current user. To tackle the challenge, we propose a novel cluster-based meta-learning model for session-based recommendation. Specifically, we adopt a soft-clustering method and design a parameter gate to better transfer shared knowledge across similar sessions and preserve the characteristics of the session itself. Besides, we apply two self-attention blocks to capture the transition patterns of sessions in both item and feature aspects. Finally, comprehensive experiments are conducted on two real-world datasets and demonstrate the superior performance of CBML over existing approaches.

Semi-supervised Multi-label Learning for Graph-structured Data

Zixing Song
Ziqiao Meng
Yifei Zhang
Irwin King

The semi-supervised multi-label classification problem primarily deals with Euclidean data, such as text with a 1D grid of tokens and images with a 2D grid of pixels. However, non-Euclidean graph-structured data naturally and constantly appears in semi-supervised multi-label learning tasks from various domains like social networks, citation networks, and protein-protein interaction (PPI) networks. Moreover, the existing popular node embedding methods, like Graph Neural Networks (GNN), focus on graphs with simplex labels and tend to neglect label correlations in the multi-label setting, so a straightforward adaptation proves empirically ineffective. Therefore, graph representation learning for the semi-supervised multi-label learning task is crucial and challenging. In this work, we incorporate the idea of label embedding into our proposed model to capture both network topology and higher-order multi-label correlations. The label embedding is generated along with the node embedding based on the topological structure to serve as the prototype center for each class. Moreover, the similarity of the label embedding and node embedding can be used as a confidence vector to guide the label smoothing process, which is formulated as a margin ranking optimization problem to learn the second-order relations between labels. Extensive experiments on real-world datasets from various domains demonstrate that our model significantly outperforms the state-of-the-art models for node-level tasks.

PeriodicMove: Shift-aware Human Mobility Recovery with Graph Neural Network

Hao Sun
Changjie Yang
Liwei Deng
Fan Zhou
Feiteng Huang
Kai Zheng

Human mobility recovery is of great importance for a wide range of location-based services. However, recovering human mobility is not trivial because of three challenges: 1) complex transition patterns among locations; 2) multi-level periodicity and shifting periodicity of human mobility; 3) sparsity of the collected trajectory data. In this paper, we propose PeriodicMove, a neural attention model based on graph neural network for human mobility recovery from lengthy and sparse trajectories. In PeriodicMove, we first construct a directed graph for each trajectory and capture complex location transition patterns using graph neural network. Then, we design two attention mechanisms which capture multi-level periodicity and shifting periodicity of human mobility respectively. Finally, a spatial-aware loss function is proposed to incorporate spatial proximity into the model optimization, which alleviates the data sparsity problem. We perform extensive experiments and the evaluation results demonstrate that PeriodicMove yields significant improvements over the competitors on two representative real-life mobility datasets. In addition, by providing high-quality mobility data, our model can benefit a variety of mobility-oriented downstream applications.

Tabular Functional Block Detection with Embedding-based Agglomerative Cell Clustering

Kexuan Sun
Fei Wang
Muhao Chen
Jay Pujara

Tables are a widely-used format for data curation. The diversity of domains, layouts, and content of tables makes knowledge extraction challenging. Understanding table layouts is an important step for automatically harvesting knowledge from tabular data. Since table cells are spatially organized into regions, correctly identifying such regions and inferring their functional roles, referred to as functional block detection, is a critical part of understanding table layouts. Earlier functional block detection approaches fail to leverage spatial relationships and higher-level structure, either depending on cell-level predictions or relying on data types as signals for identifying blocks. In this paper, we introduce a flexible functional block detection method by applying agglomerative clustering techniques which merge smaller blocks into larger blocks using two merging strategies. Our proposed method uses cell embeddings with a customized dissimilarity function which utilizes local and margin distances, as well as block coherence metrics to capture cell, block, and table scoped features. Given the diversity of tables in real-world corpora, we also introduce a sampling-based approach for automatically tuning distance thresholds for each table. Experimental results show that our method improves over the earlier state-of-the-art method in terms of several evaluation metrics.

Budget-constrained Truss Maximization over Large Graphs: A Component-based Approach

Xin Sun
Xin Huang
Zitan Sun
Di Jin

Cohesive substructure identification is a fundamental task of graph analytics. Recently, the problem of dense subgraph maximization has attracted significant attention. It aims at enlarging a dense subgraph pattern using a few new edge insertions, e.g., k-core maximization. As a more cohesive subgraph of the k-core, the k-truss requires that each edge is contained in at least k-2 triangles within this subgraph. However, the problem of k-truss maximization has not been studied yet. In this paper, we motivate and formulate a new problem of budget-constrained k-truss maximization. Given a budget of b edges and an integer k≥2, the problem is to find and insert b new edges into a graph G such that the resulting k-truss of G is maximized. We theoretically prove the NP-hardness of the k-truss maximization problem. To efficiently tackle it, we analyze the non-submodular property of the k-truss newcomers function and develop non-conventional heuristic strategies for edge insertions. We first identify high-quality candidate edges with regard to (k-1)-light subgraphs and propose a greedy algorithm using per-edge insertion. Besides further improving the efficiency by pruning disqualified candidate edges, we finally develop a component-based dynamic programming algorithm to maximally enlarge the k-truss, which balances the budget assignment and inserts multiple edges simultaneously into all (k-1)-light components. Extensive experiments on nine real-world graphs demonstrate the efficiency and effectiveness of our proposed methods.
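
The k-truss condition stated above (every edge lies in at least k-2 triangles inside the subgraph) can be checked with a simple peeling loop. This is a minimal sketch of the definition only, not the paper's budget-constrained maximization algorithm, and it is not tuned for large graphs; the function name and the toy graph are hypothetical.

```python
def k_truss(edges, k):
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Repeatedly delete edges whose endpoints share fewer than k-2 neighbours,
    # i.e. edges that lie in fewer than k-2 triangles of the current subgraph.
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    # Return the surviving edges as unordered pairs.
    return {frozenset((u, v)) for u in adj for v in adj[u]}

# Toy graph: a 4-clique on nodes 1..4 plus a pendant edge (4, 5).
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
truss_4 = k_truss(toy_edges, 4)
truss_5 = k_truss(toy_edges, 5)
```

Here each clique edge lies in two triangles, so the 4-truss keeps the six clique edges and drops the pendant edge, while the 5-truss is empty.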

DESYR: Definition and Syntactic Representation Based Claim Detection on the Web

Megha Sundriyal
Parantak Singh
Md. Shad Akhtar
Shubhashis Sengupta
Tanmoy Chakraborty

The formulation of a claim rests at the core of argument mining. To demarcate between a claim and a non-claim is arduous for both humans and machines, owing to latent linguistic variance between the two and the inadequacy of extensive definition-based formalization. Furthermore, the increase in the usage of online social media has resulted in an explosion of unsolicited information on the web presented as informal text. To account for the aforementioned, in this paper, we propose DESYR. It is a framework that aims to address these issues for informal web-based text by leveraging a combination of hierarchical representation learning (dependency-inspired Poincaré embedding), definition-based alignment, and feature projection. We do away with fine-tuning compute-heavy language models in favor of a more domain-centric but lighter approach. Experimental results indicate that DESYR improves upon the state-of-the-art system across four benchmark claim datasets, most of which were constructed with informal texts. We see an increase of 3 claim-F1 points on the LESA-Twitter dataset, an increase of 1 claim-F1 point and 9 macro-F1 points on the Online Comments (OC) dataset, an increase of 24 claim-F1 points and 17 macro-F1 points on the Web Discourse (WD) dataset, and an increase of 8 claim-F1 points and 5 macro-F1 points on the Micro Texts (MT) dataset. We also perform an extensive analysis of the results. We make a 100-D pre-trained version of our Poincaré-variant available along with the source code.

Actionable Insights in Urban Multivariate Time-series

Anika Tabassum
Supriya Chinthavali
Varisara Tansakul
B. Aditya Prakash

Multivariate time-series data are gaining popularity in various urban applications, such as emergency management, public health, etc. Segmentation algorithms mostly focus on identifying discrete events with changing phases in such data. For example, consider a power outage scenario during a hurricane. Each time-series can represent the number of power failures in a county for a time period. Segments in such time-series are found in terms of different phases, such as when a hurricane starts, when counties face severe damage, and when the hurricane ends. Disaster management domain experts typically want to identify the most affected counties (time-series of interest) during these phases. These can be effective for retrospective analysis and decision-making for resource allocation to those regions to lessen the damage. However, getting these actionable counties directly (either by simple visualization or looking into the segmentation algorithm) is typically hard. Hence we introduce and formalize a novel problem RaTSS (Rationalization for time-series segmentation) that aims to find such time-series (rationalizations), which are actionable for the segmentation. We also propose an algorithm Find-RaTSS to find them for any black-box segmentation. We show that Find-RaTSS outperforms non-trivial baselines on generalized synthetic and real data, and also provides actionable insights in multiple urban domains, especially disasters and public health.

Counterfactual Explainable Recommendation

Juntao Tan
Shuyuan Xu
Yingqiang Ge
Yunqi Li
Xu Chen
Yongfeng Zhang

By providing explanations for users and system designers to facilitate better understanding and decision making, explainable recommendation has been an important research problem. In this paper, we propose Counterfactual Explainable Recommendation (CountER), which takes the insights of counterfactual reasoning from causal inference for explainable recommendation. CountER is able to formulate the complexity and the strength of explanations, and it adopts a counterfactual learning framework to seek simple (low complexity) and effective (high strength) explanations for the model decision. Technically, for each item recommended to each user, CountER formulates a joint optimization problem to generate minimal changes on the item aspects so as to create a counterfactual item, such that the recommendation decision on the counterfactual item is reversed. These altered aspects constitute the explanation of why the original item is recommended. The counterfactual explanation helps both the users for better understanding and the system designers for better model debugging.

Another contribution of the work is the evaluation of explainable recommendation, which has been a challenging task. Fortunately, counterfactual explanations are very suitable for standard quantitative evaluation. To measure the explanation quality, we design two types of evaluation metrics, one from user's perspective (i.e. why the user likes the item), and the other from model's perspective (i.e. why the item is recommended by the model). We apply our counterfactual learning algorithm on a black-box recommender system and evaluate the generated explanations on five real-world datasets. Results show that our model generates more accurate and effective explanations than state-of-the-art explainable recommendation models. Source code is available at https://github.com/chrisjtan/counter.

Single Node Injection Attack against Graph Neural Networks

Shuchang Tao
Qi Cao
Huawei Shen
Junjie Huang
Yunfan Wu
Xueqi Cheng

Node injection attack on Graph Neural Networks (GNNs) is an emerging and practical attack scenario in which the attacker injects malicious nodes rather than modifying original nodes or edges to affect the performance of GNNs. However, existing node injection attacks ignore extremely limited scenarios, namely the injected nodes might be excessive such that they may be perceptible to the target GNN. In this paper, we focus on an extremely limited scenario of single node injection evasion attack, i.e., the attacker is only allowed to inject one single node during the test phase to hurt the GNN's performance. The discreteness of network structure and the coupling effect between network structure and node features bring great challenges to this extremely limited scenario. We first propose an optimization-based method to explore the performance upper bound of single node injection evasion attack. Experimental results show that 100%, 98.60%, and 94.98% of nodes on three public datasets are successfully attacked even when only injecting one node with one edge, confirming the feasibility of single node injection evasion attack. However, such an optimization-based method needs to be re-optimized for each attack, which is computationally prohibitive. To solve the dilemma, we further propose a Generalizable Node Injection Attack model, namely G-NIA, to improve the attack efficiency while ensuring the attack performance. Experiments are conducted across three well-known GNNs. Our proposed G-NIA significantly outperforms state-of-the-art baselines and is 500 times faster than the optimization-based method at inference time.

On Influencing the Influential: Disparity Seeding

Ya-Wen Teng
Hsi-Wen Chen
De-Nian Yang
Yvonne-Anne Pignolet
Ting-Wei Li
Lydia Chen

Online social networks have become a crucial medium to disseminate the latest political, commercial, and social information. Users with high visibility are often selected as seeds to spread information and affect their adoption in target groups. We study how gender differences and similarities can impact the information spreading process. Using a large-scale Instagram dataset and a small-scale Facebook dataset, we first conduct a multi-faceted analysis taking the interaction type, directionality and frequency into account. To this end, we explore a variety of existing and new single and multihop centrality measures. Our analysis unveils that males and females interact differently depending on the interaction types, e.g., likes or comments, and they feature different support and promotion patterns. We complement prior work showing that females do not reach top visibility (often referred to as the glass ceiling effect) jointly factoring in the connectivity and interaction intensity, both of which were previously mainly discussed independently.

Inspired by these observations, we propose a novel seeding framework, called Disparity Seeding, which aims at maximizing spread while reaching a target user group, e.g., a certain percentage of females - promoting the influence of under-represented groups. Disparity Seeding ranks influential users with two gender-aware measures, the Target HI-index and the Embedding index. Extensive simulations comparing Disparity Seeding with target-agnostic algorithms show that Disparity Seeding meets the target percentage while effectively maximizing the spread. Disparity Seeding can be generalized to counter different types of inequality, e.g., race, and proactively promote minorities in the society.

Self-supervised Representation Learning on Dynamic Graphs

Sheng Tian
Ruofan Wu
Leilei Shi
Liang Zhu
Tao Xiong

Graph representation learning has now become the de facto standard when dealing with graph-structured data. Using powerful tools from deep learning and graph neural networks, recent works have applied graph representation learning to time-evolving dynamic graphs and shown promising results. However, all the previous dynamic graph models require labeled samples to train, which might be costly to acquire in practice. Self-supervision offers a principled way of utilizing unlabeled data and has achieved great success in the computer vision community. In this paper we propose debiased dynamic graph contrastive learning (DDGCL), the first self-supervised representation learning framework on dynamic graphs. The proposed model extends the contrastive learning idea to dynamic graphs via contrasting two nearby temporal views of the same node identity, with a time-dependent similarity critic. Inspired by recent theoretical developments in contrastive learning, we propose a novel debiased GAN-type contrastive loss as the learning objective in order to correct the sampling bias that occurs in the negative sample construction process. We conduct extensive experiments on benchmark datasets, testing the DDGCL framework under two different self-supervision schemes: pretraining and finetuning, and multi-task learning. The results show that using a simple time-aware GNN encoder, the performance of downstream tasks is significantly improved under either scheme to closely match, or even outperform, state-of-the-art dynamic graph models with more elegant encoder architectures. Further empirical evaluations suggest that the proposed approach offers more performance improvement than previously established self-supervision mechanisms over static graphs.

Recipe Representation Learning with Networks

Yijun Tian
Chuxu Zhang
Ronald Metoyer
Nitesh V. Chawla

Learning effective representations for recipes is essential in food studies for recommendation, classification, and other applications. Unlike what has been developed for learning textual or cross-modal embeddings for recipes, the structural relationship among recipes and food items is less explored. In this paper, we formalize the problem of recipe representation learning with networks to incorporate both the textual feature and the structural relational feature into recipe representations. Specifically, we first present RecipeNet, a new and large-scale corpus of recipe data to facilitate network-based food studies and recipe representation learning research. We then propose a novel heterogeneous recipe network embedding model, rn2vec, to learn recipe representations. The proposed model is able to capture textual, structural, and nutritional information through several neural network modules, including textual CNN, inner-ingredients transformer, and a graph neural network with hierarchical attention. We further design a combined objective function of node classification and link prediction to jointly optimize the model. Extensive experiments show that our model outperforms state-of-the-art baselines on two classic food study tasks. Dataset and codes are available at https://github.com/meettyj/rn2vec.

Conditional Graph Attention Networks for Distilling and Refining Knowledge Graphs in Recommendation

Ke Tu
Peng Cui
Daixin Wang
Zhiqiang Zhang
Jun Zhou
Yuan Qi
Wenwu Zhu

Knowledge graphs are generally incorporated into recommender systems to improve overall performance. Due to the generality and scale of the knowledge graph, most knowledge relationships are not helpful for a target user-item prediction. To exploit the knowledge graph to capture target-specific knowledge relationships in recommender systems, we need to distill the knowledge graph to retain the useful information and refine the knowledge to capture the users' preferences. To address these issues, we propose Knowledge-aware Conditional Attention Networks (KCAN), an end-to-end model that incorporates the knowledge graph into a recommender system. Specifically, we first use knowledge-aware attention propagation to obtain the node representations, which capture the global semantic similarity on the user-item network and the knowledge graph. Then given a target, i.e., a user-item pair, we automatically distill the knowledge graph into the target-specific subgraph based on the knowledge-aware attention. Afterward, by applying a conditional attention aggregation on the subgraph, we refine the knowledge graph to obtain target-specific node representations. Therefore, we gain both representability and personalization to improve overall performance. Experimental results on real-world datasets demonstrate the effectiveness of our framework over the state-of-the-art algorithms.

Attention Based Dynamic Graph Learning Framework for Asset Pricing

Ajim Uddin
Xinyuan Tao
Dantong Yu

Recent studies suggest that financial networks play an essential role in asset valuation and investment decisions. Unlike road networks, financial networks are neither given nor static, posing significant challenges in learning meaningful networks and promoting their applications in price prediction. In this paper, we first apply the attention mechanism to connect the "dots" (firms) and learn dynamic network structures among stocks over time. Next, the end-to-end graph neural networks pipeline diffuses and propagates the firms' accounting fundamentals into the learned networks and ultimately predicts stock future returns. The proposed model reduces the prediction errors by 6% compared to the state-of-the-art models. Our results are robust with different assessment measures. We also show that portfolios based on our model outperform the S&P-500 index by 34% in terms of Sharpe Ratio, suggesting that our model is better at capturing the dynamic inter-connection among firms and identifying stocks with fast recovery from major events. Further investigation on the learned networks reveals that the network structure aligns closely with the market conditions. Finally, with an ablation study, we investigate different alternative versions of our model and the contribution of each component.

Mixture-Based Correction for Position and Trust Bias in Counterfactual Learning to Rank

Ali Vardasbi
Maarten de Rijke
Ilya Markov

In counterfactual learning to rank (CLTR) user interactions are used as a source of supervision. Since user interactions come with bias, an important focus of research in this field lies in developing methods to correct for the bias of interactions. Inverse propensity scoring (IPS) is a popular method suitable for correcting position bias. Affine correction (AC) is a generalization of IPS that corrects for position bias and trust bias. IPS and AC provably remove bias, conditioned on an accurate estimation of the bias parameters. Estimating the bias parameters, in turn, requires an accurate estimation of the relevance probabilities. This cyclic dependency introduces practical limitations in terms of sensitivity, convergence and efficiency.

We propose a new correction method for position and trust bias in CLTR in which, unlike the existing methods, the correction does not rely on relevance estimation. Our proposed method, mixture-based correction (MBC), is based on the assumption that the distribution of the CTRs over the items being ranked is a mixture of two distributions: the distribution of CTRs for relevant items and the distribution of CTRs for non-relevant items. We prove that our method is unbiased. The validity of our proof is not conditioned on accurate bias parameter estimation. Our experiments show that MBC, when used in different bias settings and accompanied by different LTR algorithms, outperforms AC, the state-of-the-art method for correcting position and trust bias, in some settings, while performing on par in other settings. Furthermore, MBC is orders of magnitude more efficient than AC in terms of the training time.
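
For readers unfamiliar with the IPS and AC baselines this abstract compares against, the following sketch shows why dividing by the bias parameters removes bias in expectation. It assumes the affine click model P(click) = alpha * relevance + beta per rank, which is how the prior literature usually writes position and trust bias; the model, the function names, and the numbers are assumptions for illustration and are not taken from this abstract. It does not implement MBC.

```python
def affine_estimate(click, alpha, beta):
    # Affine correction: invert the assumed click model click ~ alpha * relevance + beta.
    return (click - beta) / alpha

def ips_estimate(click, propensity):
    # Inverse propensity scoring is the special case beta = 0, alpha = examination propensity.
    return affine_estimate(click, propensity, 0.0)

def expected_affine_estimate(relevance, alpha, beta):
    # Exact expectation of the corrected click over the two outcomes click = 1 and click = 0.
    p_click = alpha * relevance + beta
    return (p_click * affine_estimate(1, alpha, beta)
            + (1 - p_click) * affine_estimate(0, alpha, beta))

# With accurate bias parameters the expectation recovers the true relevance.
recovered_ac = expected_affine_estimate(relevance=0.7, alpha=0.5, beta=0.1)
recovered_ips = expected_affine_estimate(relevance=0.4, alpha=0.25, beta=0.0)
```

The correction is only as good as the estimates of alpha and beta, which is the dependency on bias-parameter estimation that the abstract identifies as a practical limitation.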

Reliable and Privacy-Preserving Task Matching in Blockchain-Based Crowdsourcing

Baolai Wang
Shaojing Fu
Xuyun Zhang
Tao Xie
Lingjuan Lyu
Yuchuan Luo

With the number of users in crowdsourcing increasing rapidly, task matching services are attracting more and more attention. However, they also raise many security concerns, one of which is the leakage of sensitive information. Privacy-preserving task matching techniques can protect the private information of task requesters and workers. However, existing privacy-preserving task matching schemes are constructed on a central server and may therefore suffer from potential wrongdoings of a malicious server. In addition, most of them only provide exact task matching, which means that they cannot tolerate keyword spelling errors, leading to a decline in task matching accuracy. In this paper, we propose a Reliable and Privacy-preserving Task Matching scheme (RPTM) for crowdsourcing. To guarantee the reliability of task matching results, RPTM employs smart contracts to ensure that operations of RPTM are faithfully performed. However, it may still disclose the privacy of users due to the transparency of the blockchain. To deal with this problem, RPTM performs the task matching service without compromising the privacy of task requesters and workers by leveraging a novel integer vector encryption scheme. Moreover, RPTM supports multi-keyword fuzzy matching by exploiting locality sensitive hashing and Bloom filters, which can tolerate keyword spelling errors and different expression formats. Extensive analysis and experiments based on a test net of EOS show that RPTM is efficient and secure.

Enhancing User Interest Modeling with Knowledge-Enriched Itemsets for Sequential Recommendation

Chunyang Wang
Yanmin Zhu
Haobing Liu
Wenze Ma
Tianzi Zang
Jiadi Yu

Sequential recommendation, which aims to predict a user's next interaction based on his/her previous behaviors, has attracted great attention. Recent studies mainly employ deep recurrent neural networks or self-attention networks to capture dynamic user preferences. However, existing methods merely focus on modeling users' clear interests in interacted items. We argue that for an interaction, the user may also have ambiguous interests in items that are semantically related to the interacted one. For comprehensively capturing user preferences, it is beneficial to discover potential interests from historical interactions at a broader itemset level. Therefore, in this paper, we propose a knowledge graph enhanced sequential recommendation model, namely KGIE, which focuses on enhancing user interest modeling with knowledge-enriched itemsets by incorporating the knowledge graph. Specifically, in addition to item-level interest modeling with interacted items, we further construct knowledge-enriched itemsets that are extracted via high-order knowledge associations with the interacted items. For capturing personalized itemset-level interests, we design an attentive aggregation unit to combine item embeddings considering both inherent and contextual personalization signals. Furthermore, to balance the contributions of the two levels of interest modeling, we adaptively learn high-level preference representations with a gating fusion unit. Extensive experiments on three real-world datasets demonstrate that our model outperforms state-of-the-art methods and offers interpretable recommendations.

Neural Information Diffusion Prediction with Topic-Aware Attention Network

Hao Wang
Cheng Yang
Chuan Shi

Information diffusion prediction aims to forecast how information items spread among a set of users. Recently, neural networks have been widely used in modeling information diffusion, owing to the great successes of deep learning. However, in real-world information diffusion scenarios, users are likely to behave differently toward information items from different topics. Existing neural-based methods fail to model the topic-specific diffusion patterns and dependencies, which have been shown to be useful in conventional non-neural methods. In this paper, we propose the Topic-aware Attention Network (TAN) to take advantage of both topic-specific diffusion modeling and deep learning techniques. We jointly model the text content of information items and cascade sequences by incorporating topical context and user/position dependencies into user representations via attention mechanisms. A time-decayed aggregation module is further employed to integrate user representations into cascade representations, which can encode the topic-specific diffusion dependencies independently. Experimental results on diffusion prediction tasks over three realistic cascade datasets show that our model can achieve a relative improvement of up to 9% over the best performing baseline in terms of Hits@10.

A Lightweight Knowledge Graph Embedding Framework for Efficient Inference and Storage

Haoyu Wang
Yaqing Wang
Defu Lian
Jing Gao

Knowledge graphs, which consist of entities and their relations, have become a popular way to store structured knowledge. Knowledge graph embedding (KGE), which derives a representation for each entity and relation, has been widely used to capture the semantics of the information in the knowledge graphs, and has demonstrated great success in many downstream applications, such as the extraction of similar entities in response to a query entity. However, existing KGE methods cannot work well on emerging large-scale knowledge graphs due to constraints on storage and inference efficiency. In this paper, we propose a lightweight KGE model, LightKG, which significantly reduces the storage as well as the running time needed for inference. Instead of storing a continuous vector for every entity, LightKG only needs to store a few codebooks, each of which contains some codewords that correspond to the representatives among the embeddings, and the indices that correspond to the codeword selections for entities. Hence LightKG can achieve highly efficient storage. The efficiency of the downstream querying process can also be significantly boosted with the proposed LightKG model, as the relevance score between the query and an entity can be efficiently calculated via a quick look-up in a table that contains the scores between the query and codewords. The storage and inference efficiency of LightKG is achieved by its novel design. LightKG is an end-to-end framework that automatically infers codebooks and codewords and generates an approximated embedding for each entity. A residual module is included in LightKG to induce diversity among codebooks, and a continuous function is adopted to approximate codeword selection, which is non-differentiable. In addition, to further improve the performance of KGE, we propose a novel dynamic negative sampling method based on quantization, which can be applied to the proposed LightKG or other KGE methods.
We conduct extensive experiments on five public datasets. The experiments show that LightKG is search- and memory-efficient with high approximate search accuracy. Also, the dynamic negative sampling can dramatically improve model performance, with over 19% improvement on average.
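The codebook storage and table look-up described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the LightKG implementation: it assumes that an entity's approximate embedding is the sum of one codeword per codebook and that relevance is an inner product; the codebook values, entity names, and function names are all made up for the example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def reconstruct(codebooks, codes):
    """Approximate embedding: sum of the selected codeword from each codebook."""
    dim = len(codebooks[0][0])
    vec = [0] * dim
    for book, idx in zip(codebooks, codes):
        for d in range(dim):
            vec[d] += book[idx][d]
    return vec


def build_score_table(codebooks, query):
    """table[b][j] = <query, codeword j of codebook b>, computed once per query."""
    return [[dot(query, word) for word in book] for book in codebooks]


def score_entity(table, codes):
    """Score an entity with one table look-up per codebook."""
    return sum(table[b][idx] for b, idx in enumerate(codes))


# Two codebooks with three 4-dimensional codewords each (toy values).
codebooks = [
    [[1, 0, 0, 2], [0, 1, 1, 0], [2, 2, 0, 0]],
    [[0, 0, 1, 1], [1, -1, 0, 0], [0, 3, 0, -1]],
]
# Each entity stores only one index per codebook instead of a full vector.
entity_codes = {"e1": [0, 2], "e2": [1, 1], "e3": [2, 0]}
query = [1, 2, 0, 1]

table = build_score_table(codebooks, query)
scores = {e: score_entity(table, c) for e, c in entity_codes.items()}
# scores == {"e1": 8, "e2": 1, "e3": 7}
```

Because the inner product is linear, each looked-up score equals the inner product between the query and the reconstructed embedding, but the per-entity cost is one addition per codebook rather than one multiplication per dimension.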

Addressing the Hardness of k-Facility Relocation Problem: A Pair of Approximate Solutions

Hu Wang
Hui Li
Meng Wang
Jiangtao Cui

Facility Relocation (FR), which is an effort to reallocate the placement of facilities to adapt to changes in urban planning and population distribution, has a remarkable impact on many application areas. Existing solutions to the FR problem either focus on relocating one facility (i.e., 1-FR) or fail to guarantee the result quality when relocating k>1 facilities (i.e., k-FR). As the k-FR problem is NP-hard and is neither submodular nor non-decreasing, the traditional hill-climbing approximation algorithm cannot be directly applied. In light of that, we propose to transform k-FR into another facility placement problem, which is submodular and non-decreasing. We theoretically prove that the optimal solutions of the two problems are equivalent. Accordingly, we are able to present the first approximate solution to k-FR, namely FR2FP. Our extensive comparison of FR2FP and the state-of-the-art heuristic solution shows that FR2FP, although it provides an approximation guarantee, does not necessarily give superior results to the heuristic solution. The comparison motivates and, more importantly, directs us to present an advanced approximate solution, namely FR2FP-ex. An extensive experimental study over both real-world and synthetic datasets verifies that FR2FP-ex demonstrates the best result quality. In addition, we precisely identify the scenarios in which the state-of-the-art heuristic fails to provide satisfactory results in practice.

Fast k-NN Graph Construction by GPU based NN-Descent

Hui Wang
Wan-Lei Zhao
Xiangxiang Zeng
Jianye Yang

NN-Descent is a classic k-NN graph construction approach. It is still widely employed in machine learning, computer vision, and information retrieval tasks due to its efficiency and genericness. However, the current design only works well on CPU. In this paper, NN-Descent is redesigned to adapt to the GPU architecture. A new graph update strategy called selective update is proposed. It significantly reduces the data exchange between GPU cores and GPU global memory, which is the processing bottleneck under the GPU computation architecture. This redesign leads to full exploitation of the parallelism of the GPU hardware. Meanwhile, the genericness as well as the simplicity of NN-Descent are well preserved. Moreover, a procedure that allows k-NN graphs to be merged efficiently on GPU is proposed. It makes the construction of high-quality k-NN graphs for out-of-GPU-memory datasets tractable. Our approach is 100-250× faster than the single-thread NN-Descent and is 2.5-5× faster than the existing GPU-based approaches, as tested on million-scale as well as billion-scale datasets.

Top-k Tree Similarity Join

Jianhua Wang
Jianye Yang
Wenjie Zhang

Tree similarity join is useful for analyzing tree-structured data. The traditional threshold-based tree similarity join requires a similarity threshold, which is usually difficult for users to specify. To remedy this issue, we advocate the problem of top-k tree similarity join. Given a collection of trees and a parameter k, the top-k tree similarity join aims to find the k tree pairs with minimum tree edit distance (TED). Although we show that this problem can be solved by utilizing the threshold-based join, the efficiency is unsatisfactory. In this paper, we propose an efficient algorithm, namely TopKTJoin, which generates the candidate tree pairs incrementally using an inverted index. We also derive a TED lower bound for the unseen tree pairs. Together with the TED value of the k-th best join result seen so far, this lower bound gives us a chance to terminate the algorithm early without missing any correct results. To further improve the efficiency, we propose two optimization techniques in terms of index structure and verification mechanism. We conduct comprehensive performance studies on real and synthetic datasets. The experimental results demonstrate that TopKTJoin significantly outperforms the baseline method.
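The early-termination idea, comparing a lower bound for unseen pairs against the k-th best result so far, can be sketched generically. The example below is not TopKTJoin: string edit distance stands in for TED, the length difference stands in for the paper's TED lower bound, and candidates are simply sorted by lower bound rather than generated incrementally from an inverted index. All function names are hypothetical.

```python
import heapq
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (stand-in for TED)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def lower_bound(a, b):
    """The size difference never exceeds the edit distance."""
    return abs(len(a) - len(b))


def top_k_join(items, k):
    """Return the k closest pairs as (distance, a, b) and the number of verified pairs."""
    candidates = sorted(combinations(items, 2), key=lambda p: lower_bound(*p))
    heap = []        # max-heap via negated distances; holds the k best pairs so far
    verified = 0
    for a, b in candidates:
        if len(heap) == k and lower_bound(a, b) >= -heap[0][0]:
            break    # no remaining pair can beat the current k-th best
        d = edit_distance(a, b)
        verified += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, a, b))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, a, b))
    return sorted((-nd, a, b) for nd, a, b in heap), verified


items = ["kitten", "sitten", "sitting", "a", "ab"]
result, verified = top_k_join(items, k=2)
```

Since candidates arrive in non-decreasing order of lower bound, once one candidate's bound reaches the k-th best distance, every later candidate's true distance is at least as large, so stopping cannot miss a strictly better pair. On this toy input only a fraction of the ten pairs needs an exact distance computation.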

Popularity-Enhanced News Recommendation with Multi-View Interest Representation

Jingkun Wang
Yipu Chen
Zichun Wang
Wen Zhao

News recommendation is of vital importance to alleviating information overload. Recent research shows that precise modeling of news content and user interests has become critical for news recommendation. Existing methods usually utilize information such as news titles, abstracts, and entities to predict the Click-Through Rate (CTR), or add auxiliary tasks to a multi-task learning framework. However, none of them directly incorporates predicted news popularity and the degree of users' attention to popular news into the CTR prediction results. Meanwhile, multiple interests may arise throughout users' browsing history. Thus it is hard to represent user interests via a single user vector. In this paper, we propose PENR, a Popularity-Enhanced News Recommendation method, which integrates a popularity prediction task to improve the performance of the news encoder. The news popularity score is predicted and added to the final CTR, while news popularity is utilized to model the degree of users' tendency to follow hot news. Moreover, user interests are modeled from different perspectives via a subspace projection method that projects the browsing history onto multiple subspaces. In this way, we capture users' multi-view interest representations. Experiments on a real-world dataset validate the effectiveness of our PENR approach.

Modeling Heterogeneous Graph Network on Fraud Detection: A Community-based Framework with Attention Mechanism

Li Wang
Peipei Li
Kai Xiong
Jiashu Zhao
Rui Lin

Fraud activities in e-commerce, such as spam reviews and fake shopping behaviors, significantly mislead customers' decision making, damage the platforms' reputation, and reduce enterprises' revenue. In recent years, GNN-based models have been widely adopted in fraud detection tasks and have shown better performance than conventional rule-based methods and feature-based models. Most GNN-based models focus on homogeneous graphs, usually including user-to-user or item-to-item connections. These types of graphs are limited in that they eliminate certain types of connections, such as user-item connections. In addition, GNN-based models aggregate neighborhood information based on the assumption that neighbors share similar structure and content. However, in fraud detection tasks, two major inconsistency issues arise: a severe mixture of structure-inconsistency due to extremely unbalanced positive and negative samples, and a mixture of content-inconsistency due to the differences between various item categories. To address the above issues, we propose a Community-based Framework with ATtention mechanism for large-scale Heterogeneous graphs (C-FATH). In order to utilize the entire heterogeneous graph, we model the heterogeneous graph directly and combine it with homogeneous graphs. The structure-inconsistent nodes are filtered by introducing community information when constructing neighbors. Content-inconsistent nodes are selected with lower probability by a similarity-based sampling strategy. Further, the model is trained in a multi-task manner in which each node type (e.g. user, item, device, order, and review) is associated with a specific loss function. Comprehensive experiments are conducted on two public review datasets and two large-scale datasets from JD.com, and the experimental results demonstrate the effectiveness and scalability of the proposed C-FATH compared to the state-of-the-art approaches.

Behind the Scenes: An Exploration of Trigger Biases Problem in Few-Shot Event Classification

Peiyi Wang
Runxin Xun
Tianyu Liu
Damai Dai
Baobao Chang
Zhifang Sui

Few-Shot Event Classification (FSEC) aims at developing a model for event prediction that can generalize to new event types with a limited amount of annotated data. Existing FSEC studies have achieved high accuracy on different benchmarks. However, we find they suffer from trigger biases that signify the statistical homogeneity between some trigger words and target event types, which we summarize as trigger overlapping and trigger separability. The biases can result in a context-bypassing problem, i.e., correct classifications can be obtained by looking at only the trigger words while ignoring the entire context. Therefore, existing models can be weak in generalizing to unseen data in real scenarios. To further uncover the trigger biases and assess the generalization ability of the models, we propose two new sampling methods, Trigger-Uniform Sampling (TUS) and COnfusion Sampling (COS), for meta-task construction during evaluation. In addition, to cope with the context-bypassing problem in FSEC models, we introduce adversarial training and trigger reconstruction techniques. Experiments show that these techniques not only improve the performance but also enhance the generalization ability of models.

REFORM: Error-Aware Few-Shot Knowledge Graph Completion

Song Wang
Xiao Huang
Chen Chen
Liang Wu
Jundong Li

Knowledge graphs (KGs) are of great importance in various artificial intelligence systems, such as question answering, relation extraction, and recommendation. Nevertheless, most real-world KGs are highly incomplete, with many missing relations between entities. To discover new triples (i.e., head entity, relation, tail entity), many KG completion algorithms have been proposed in recent years. However, a vast majority of existing studies require a large number of training triples for each relation, which contradicts the fact that the frequency distribution of relations in KGs often follows a long-tail distribution, meaning that a majority of relations have only very few triples. Meanwhile, since most existing large-scale KGs are constructed automatically by extracting information from crowd-sourced data using heuristic algorithms, plenty of errors are inevitably incorporated due to the lack of human verification, which greatly reduces the performance of KG completion. To tackle the aforementioned issues, in this paper, we study a novel problem of error-aware few-shot KG completion and present a principled KG completion framework, REFORM. Specifically, we formulate the problem under the few-shot learning framework, and our goal is to accumulate meta-knowledge across different meta-tasks and generalize the accumulated knowledge to the meta-test task for error-aware few-shot KG completion. To address the associated challenges resulting from insufficient training samples and inevitable errors, we propose three essential modules in each meta-task: neighbor encoder, cross-relation aggregation, and error mitigation. Extensive experiments on three widely used KG datasets demonstrate the superiority of the proposed framework REFORM over competitive baseline methods.

Adaptive Posterior Knowledge Selection for Improving Knowledge-Grounded Dialogue Generation

Weichao Wang
Wei Gao
Shi Feng
Ling Chen
Daling Wang

In open-domain dialogue systems, knowledge information such as unstructured persona profiles, text descriptions, and structured knowledge graphs can help incorporate abundant background facts for delivering more engaging and informative responses. Existing studies attempt to model a general posterior distribution over candidate knowledge by considering the entire response utterance as a whole at the beginning of the decoding process for knowledge selection. However, a single smooth distribution could fail to model the variability of knowledge selection patterns over different decoding steps, and make the knowledge expression less consistent. To remedy this issue, we propose an adaptive posterior knowledge selection framework, which sequentially introduces a series of discriminative distributions to dynamically control when and what knowledge should be used in specific decoding steps. The adaptive distributions can also capture knowledge-relevant semantic dependencies between adjacent words to refine response generation. In particular, for knowledge graph-grounded dialogue generation, we further incorporate the adaptive distributions into generative word distributions to help express the knowledge entity words. The experimental results show that our developed methods outperform strong baseline systems by large margins.

Improving Chinese Character Representation with Formation Graph Attention Network

Xiaosu Wang
Yun Xiong
Hao Niu
Jingwen Yue
Yangyong Zhu
Philip S. Yu

Chinese characters are often composed of subcharacter components which are also semantically informative, and the component-level internal semantic features of a Chinese character inherently bring additional information that benefits the semantic representation of the character. Therefore, there have been several studies that utilize subcharacter component information (e.g. radicals, fine-grained components, and stroke n-grams) to improve Chinese character representation.

However, we argue that the best way of modeling and encoding a Chinese character has not been fully explored. To improve the representation of a Chinese character, existing methods introduce more component-level internal semantic features but also more semantically irrelevant subcharacter component information, and these semantically irrelevant subcharacter components act as noise when representing a Chinese character. Moreover, existing methods are unable to discriminate the importance of the introduced subcharacter components, and accordingly they cannot filter out the introduced noisy subcharacter component information.

In this paper, we first decompose Chinese characters into components according to their formations, then model a Chinese character and its decomposed components as a graph structure named the Chinese character formation graph. The Chinese character formation graph can preserve the azimuth relationship among subcharacter components and is advantageous for explicitly modeling the component-level internal semantic features of a Chinese character. Furthermore, we propose a novel model, the Chinese Character Formation Graph Attention Network (FGAT), which is able to discriminate the importance of the introduced subcharacter components and extract component-level internal semantic features of a Chinese character efficiently. To demonstrate the effectiveness of our research, we have conducted extensive experiments. The experimental results show that our model achieves better results than state-of-the-art (SOTA) approaches.

Using Knowledge Concept Aggregation towards Accurate Cognitive Diagnosis

Xinping Wang
Caidie Huang
Jinfang Cai
Liangyu Chen

Cognitive diagnosis is a crucial task in the field of educational measurement and psychology, which aims to periodically mine and analyze a student's level of knowledge during his or her learning process. While a number of approaches and tools have been developed to diagnose the learning states of students, they do not fully learn the relationship between students, exercises, and knowledge concepts in the learning system, or do not consider the fact that it is easier to complete diagnosis when focusing on a small part of the knowledge concepts rather than all of them. To address these limitations, we develop CDGK, a model based on artificial neural networks, to deal with cognitive diagnosis. Our method not only captures non-linear interactions between exercise features, student scores, and their mastery of each knowledge concept, but also performs an aggregation of the knowledge concepts by converting them into a graph structure and only considering the leaf nodes in the knowledge concept tree, which can reduce the dimension of the model without accuracy loss. In our evaluation on two real-world datasets, CDGK outperforms the state-of-the-art related approaches in terms of accuracy, reasonableness, and interpretability.

Multi-hop Reading on Memory Neural Network with Selective Coverage for Medication Recommendation

Yanda Wang
Weitong Chen
Dechang Pi
Lin Yue
Miao Xu
Xue Li

Medication recommendation, which aims at accurate prescription, is a significant clinical application that assists caregivers in the professional practice of medicine, and obtaining informative patient representations plays an important role in building effective recommendation models. Meanwhile, conducting attentive multi-hop reading on a Memory Neural Network (MemNN) that stores knowledge from previous admissions is widely applied to derive contextual patterns for accurate patient representations. However, regular attentive reading may repeatedly attend to the same slots of the MemNN. Although the coverage mechanism has been proposed to tackle the problem, it is based on the assumption that there is a one-to-one alignment between source information and target outputs, which medical records do not follow. In pursuit of a valuable model for medication recommendation, we propose Multi-hop Reading with Selective Coverage (MRSC). MRSC first conducts information selection on the MemNN based on the coverage of each slot. Then the method incorporates coverage into the attention calculation during the multi-hop reading on the MemNN, making sure that all important historical records are fully utilized by balancing attention within the selected information. Experiments on a real-world clinical dataset demonstrate that MRSC successfully derives informative patient representations for the recommendation by conducting selection on the MemNN and limiting attention adjustment to the selected information.

The Skyline of Counterfactual Explanations for Machine Learning Decision Models

Yongjie Wang
Qinxu Ding
Ke Wang
Yue Liu
Xingyu Wu
Jinglong Wang
Yong Liu
Chunyan Miao

Counterfactual explanations are minimum changes to a given input that alter the original prediction of a machine learning model, usually from an undesirable prediction to a desirable one. Previous works frame this problem as a constrained cost minimization, where the cost is defined as the L1/L2 distance (or variants) over multiple features to measure the change. In real-life applications, features of different types are hardly comparable and it is difficult to measure the changes of heterogeneous features by a single cost function. Moreover, existing approaches do not support interactive exploration of counterfactual explanations. To address the above issues, we propose skyline counterfactual explanations, which define the skyline of counterfactual explanations as all non-dominated changes. We solve this problem as a multi-objective optimization over actionable features. This approach does not require any cost function over heterogeneous features. With the skyline, users can interactively and incrementally refine their goals on the features and magnitudes to be changed, especially when lacking prior knowledge to express their needs precisely. Extensive experimental results on three real-life datasets demonstrate that the skyline method provides a friendly way of finding interesting counterfactual explanations and achieves superior results compared to the state-of-the-art methods.
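The notion of a skyline as the set of non-dominated changes can be illustrated with a small sketch. It assumes that a set of candidate counterfactuals is already available (the multi-objective search that produces them is not shown) and that each candidate is described by per-feature change magnitudes where smaller is better; the feature meanings and values are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no larger on every feature and strictly smaller on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(changes):
    """Keep only the candidate change vectors that no other candidate dominates."""
    return [c for c in changes
            if not any(dominates(other, c) for other in changes)]


# Each tuple is the per-feature change of one candidate that flips the prediction
# (hypothetical features: income increase in $1k, extra years of employment).
candidates = [(5, 0), (0, 3), (2, 1), (6, 1), (2, 2), (5, 3)]
front = skyline(candidates)
# front == [(5, 0), (0, 3), (2, 1)]
```

No single cost function is needed to compare the two features: `(6, 1)` is dropped because `(5, 0)` changes less on both, while `(5, 0)`, `(0, 3)`, and `(2, 1)` are mutually incomparable trade-offs and are all kept for the user to choose from.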

Tree Decomposed Graph Neural Network

Yu Wang
Tyler Derr

Graph Neural Networks (GNNs) have achieved significant success in learning better representations by performing feature propagation and transformation iteratively to leverage neighborhood information. Nevertheless, iterative propagation forces the information of higher-layer neighborhoods to be transported through and fused with that of the lower-layer neighborhoods, which unavoidably results in feature smoothing between neighborhoods in different layers and can thus compromise the performance, especially on heterophily networks. Furthermore, most deep GNNs only recognize the importance of higher-layer neighborhoods and have yet to fully explore the importance of multi-hop dependency within the context of different layer neighborhoods in learning better representations. In this work, we first theoretically analyze the feature smoothing between neighborhoods in different layers and empirically demonstrate the variance of the homophily level across neighborhoods at different layers. Motivated by these analyses, we further propose a tree decomposition method to disentangle neighborhoods in different layers to alleviate feature smoothing among these layers. Moreover, we characterize the multi-hop dependency via graph diffusion within our tree decomposition formulation to construct the Tree Decomposed Graph Neural Network (TDGNN), which can flexibly incorporate information from large receptive fields and aggregate this information utilizing the multi-hop dependency. Comprehensive experiments demonstrate the superior performance of TDGNN on both homophily and heterophily networks under a variety of node classification settings. Extensive parameter analysis highlights the ability of TDGNN to prevent over-smoothing and incorporate features from shallow layers with deeper multi-hop dependencies, which provides new insights towards deeper graph neural networks.

An Efficient Quantitative Approach for Optimizing Convolutional Neural Networks

Yuke Wang
Boyuan Feng
Xueqiao Peng
Yufei Ding

With the increasing popularity of deep learning, Convolutional Neural Networks (CNNs) have been widely applied in various domains, such as image classification and object detection, and have achieved stunning success, with higher accuracy than traditional statistical methods. To exploit the potential of CNN models, a huge amount of research and industry effort has been devoted to optimizing CNNs. Among these endeavors, CNN architecture design has attracted tremendous attention because of its great potential for improving model accuracy or reducing model complexity. However, existing work either introduces repeated training overhead in the search process or lacks an interpretable metric to guide the design.

To clear these hurdles, we propose 3D-Receptive Field (3DRF), an explainable and easy-to-compute metric, to estimate the quality of a CNN architecture and guide the search process of designs. To validate the effectiveness of 3DRF, we build a static optimizer to improve CNN architectures at both the stage level and the kernel level. Our optimizer not only provides a clear and reproducible procedure but also avoids unnecessary training effort in the architecture search process. Extensive experiments and studies show that the models generated by our optimizer can achieve up to 5.47% accuracy improvement and up to 65.38% parameter reduction, compared with state-of-the-art CNN structures like MobileNet and ResNet.

Spatio-Temporal-Categorical Graph Neural Networks for Fine-Grained Multi-Incident Co-Prediction

Zhaonan Wang
Renhe Jiang
Zekun Cai
Zipei Fan
Xin Liu
Kyoung-Sook Kim
Xuan Song
Ryosuke Shibasaki

Forecasting incident occurrences (e.g. crime, EMS, traffic accident) is a crucial task for emergency service providers and transportation agencies in performing response time optimization and dynamic fleet management. However, such events are by nature rare and sparse, which causes the label imbalance problem and inferior performance of models relying on data sufficiency. The existing studies circumvent, instead of truly solving, this issue by defining the incident prediction problem in a coarse-grained temporal (e.g. daily) setting, which leaves the proposed models not robust to fine-grained dynamics and of little use for real-world decision making. In this paper, we tackle the temporally fine-grained incident prediction problem in a sparse setting by explicitly exploiting the behind-the-scenes chainlike triggering mechanism. Moreover, this chain effect is rooted in multiple domains (i.e. spatial, categorical), is further entangled with the temporal dimension, and happens to be time-variant. To be specific, we propose a novel deep learning framework, namely Spatio-Temporal-Categorical Graph Neural Networks (STC-GNN), to handle the multidimensional and dynamic chain effect for performing fine-grained multi-incident co-prediction. Extensive experiments on three real-world city-level incident datasets verify the insightfulness of our perspective and the effectiveness of the proposed model.

Task Assignment with Worker Churn Prediction in Spatial Crowdsourcing

Ziwei Wang
Yan Zhao
Xuanhao Chen
Kai Zheng

The pervasiveness of GPS-enabled devices and wireless communication technologies has fueled the market for Spatial Crowdsourcing (SC), which consists of location-based tasks and requires workers to physically be at specific locations to complete them. In this work, we study the problem of Worker Churn based Task Assignment in SC, where tasks are to be assigned by considering workers' churn. In particular, we aim to achieve the highest total rewards of task assignments based on the worker churn prediction. To solve the problem, we propose a two-phase framework, which consists of a worker churn prediction phase and a task assignment phase. In the first phase, we use an LSTM-based model to extract the latent feelings of workers based on the historical data and then estimate the idle time intervals of workers. In the assignment phase, we design an efficient greedy algorithm and a Kuhn-Munkres (KM)-based algorithm that can achieve the optimal task assignment. Extensive experiments offer insight into the effectiveness and efficiency of the proposed solutions.

Region Semantically Aligned Network for Zero-Shot Learning

Ziyang Wang
Yunhao Gou
Jingjing Li
Yu Zhang
Yang Yang

Zero-shot learning (ZSL) aims to recognize unseen classes based on the knowledge of seen classes. Previous methods focused on learning direct embeddings from global features to the semantic space in the hope of transferring knowledge from seen classes to unseen classes. However, an unseen class shares local visual features with a set of seen classes, and leveraging global visual features makes the knowledge transfer ineffective. To tackle this problem, we propose a Region Semantically Aligned Network (RSAN), which maps local features of unseen classes to their semantic attributes. Instead of using global features, which are obtained by an average pooling layer after an image encoder, we directly utilize the output of the image encoder, which maintains local information of the image. Concretely, we obtain each attribute from a specific region of the output and exploit these attributes for recognition. As a result, the knowledge of seen classes can be successfully transferred to unseen classes in a region-based manner. In addition, we regularize the image encoder through attribute regression with semantic knowledge to extract robust and attribute-related visual features. Experiments on several standard ZSL datasets reveal the benefit of the proposed RSAN method, which outperforms state-of-the-art methods.

Pooling Architecture Search for Graph Classification

Lanning Wei
Huan Zhao
Quanming Yao
Zhiqiang He

Graph classification is an important problem with applications across many domains, like chemistry and bioinformatics, for which graph neural networks (GNNs) have been state-of-the-art (SOTA) methods. GNNs are designed to learn node-level representations based on neighborhood aggregation schemes, and to obtain graph-level representations, pooling methods are applied after the aggregation operation in existing GNN models to generate coarse-grained graphs. However, due to the highly diverse applications of graph classification, the performance of existing pooling methods varies across graphs. In other words, it is a challenging problem to design a universal pooling architecture that performs well in most cases, leading to a demand for data-specific pooling methods in real-world applications. To address this problem, we propose to use neural architecture search (NAS) to search for adaptive pooling architectures for graph classification. First, we design a unified framework consisting of four modules: Aggregation, Pooling, Readout, and Merge, which can cover existing human-designed pooling methods for graph classification. Based on this framework, a novel search space is designed by incorporating popular operations in human-designed architectures. Then, to enable efficient search, a coarsening strategy is proposed to continuously relax the search space, so that a differentiable search method can be adopted. Extensive experiments on six real-world datasets from three domains are conducted, and the results demonstrate the effectiveness and efficiency of the proposed framework.

AutoIAS: Automatic Integrated Architecture Searcher for Click-Trough Rate Prediction

Zhikun Wei
Xin Wang
Wenwu Zhu

Automating architecture design for recommendation tasks has become a trending topic because it saves expert effort and is expected to yield better performance. Neural Architecture Search (NAS) has been introduced in recent works to discover powerful CTR prediction model architectures. A CTR prediction model usually consists of three components: an embedding layer, an interaction layer, and a deep neural network. However, existing automation works focus on searching a single component and leave the other components hand-crafted. Such isolated searching causes incompatibility among components and leads to weak generalization ability. Moreover, there is no unified framework for integrated CTR prediction model architecture search. This paper presents Automatic Integrated Architecture Searcher (AutoIAS), a framework that provides a practical and general method to find an optimal CTR prediction model architecture automatically. In AutoIAS, we unify existing interaction-based CTR prediction model architectures and propose an integrated search space for a complete CTR prediction model. We utilize a supernet to predict the performance of sub-architectures, and the supernet is trained with Knowledge Distillation (KD) to enhance consistency among sub-architectures. To efficiently explore the search space, we design an architecture generator network that explicitly models the architecture dependencies among components and generates a conditioned architecture distribution for each component. Experiments on public datasets show the outstanding performance and generalization ability of AutoIAS. An ablation study shows the effectiveness of the KD-based supernet training method and the Architecture Generator Network.

Predicting Instance Type Assertions in Knowledge Graphs Using Stochastic Neural Networks

Tobias Weller
Maribel Acosta

Instance type information is particularly relevant to perform reasoning and obtain further information about entities in knowledge graphs (KGs). However, during automated or pay-as-you-go KG construction processes, instance types might be incomplete or missing in some entities. Previous work focused mostly on representing entities and relations as embeddings based on the statements in the KG. While the computed embeddings encode semantic descriptions and preserve the relationship between the entities, the focus of these methods is often not on predicting schema knowledge, but on predicting missing statements between instances for completing the KG. To fill this gap, we propose an approach that first learns a KG representation suitable for predicting instance type assertions. Then, our solution implements a neural network architecture to predict instance types based on the learned representation. Results show that our representations of entities are much more separable with respect to their associations with classes in the KG, compared to existing methods. For this reason, the performance of predicting instance types on a large number of KGs, in particular on cross-domain KGs with a high variety of classes, is significantly better in terms of F1-score than previous work.

SeeQuery: An Automatic Method for Recommending Translations of Ontology Competency Questions into SPARQL-OWL

Dawid Wisniewski
Jedrzej Potoniec
Agnieszka Lawrynowicz

Ontology authoring is a complicated and error-prone process since the knowledge being modeled is expressed using logic-based formalisms, in which logical consequences of the knowledge have to be foreseen. To make that process easier, competency questions (CQs), i.e., questions expressed in natural language, are often stated to trace both the correctness and completeness of the ontology at a given time. However, CQs have to be translated into a formal language, such as the ontology query language SPARQL-OWL, to query the ontology. Since the translation step is time-consuming and requires familiarity with the query language used, in this paper we propose an automatic method named SeeQuery, which recommends SPARQL-OWL queries that are translations of CQs stated against a given ontology. It consists of a pipeline of transformations based on template matching and filling, motivated by the largest publicly available CQ to SPARQL-OWL datasets to date. We provide a detailed description of SeeQuery and evaluate the method on a separate set of 2 ontologies with their CQs. It is, to date, the only automatic method available for recommending SPARQL-OWL queries from CQs. The source code of SeeQuery is available at: https://github.com/dwisniewski/SeeQuery.

Clustering of Conversational Bandits for User Preference Learning and Elicitation

Junda Wu
Canzhe Zhao
Tong Yu
Jingyang Li
Shuai Li

Conversational recommender systems elicit user preference via conversational interactions. By introducing conversational key-terms, existing conversational recommenders can effectively reduce the need for extensive exploration in a traditional interactive recommender. However, existing conversational recommender approaches that elicit user preference via key-terms still have limitations. First, the key-term data of the items needs to be carefully labeled, which requires considerable human effort. Second, the number of human-labeled key-terms is limited and the granularity of the key-terms is fixed, while the elicited user preference usually moves from coarse-grained to fine-grained during the conversations. In this paper, we propose a clustering of conversational bandits algorithm. To avoid the human labeling effort and automatically learn key-terms with the proper granularity, we cluster the items online and generate meaningful key-terms for the items during the conversational interactions. Our algorithm is general and can also be used for user clustering when feedback from multiple users is available, which further leads to more accurate learning and generation of conversational key-terms. We analyze the regret bound of our learning algorithm. In the empirical evaluations, without using any human-labeled key-terms, our algorithm effectively generates meaningful coarse-to-fine grained key-terms and performs as well as or better than the state-of-the-art baseline.

DisenKGAT: Knowledge Graph Embedding with Disentangled Graph Attention Network

Junkang Wu
Wentao Shi
Xuezhi Cao
Jiawei Chen
Wenqiang Lei
Fuzheng Zhang
Wei Wu
Xiangnan He

Knowledge graph completion (KGC) has become a focus of attention across the deep learning community owing to its excellent contribution to numerous downstream tasks. Although recent years have witnessed a surge of work on KGC, existing methods are still insufficient to accurately capture complex relations, since they adopt single and static representations. In this work, we propose a novel Disentangled Knowledge Graph Attention Network (DisenKGAT) for KGC, which leverages both micro-disentanglement and macro-disentanglement to exploit the representations behind knowledge graphs (KGs). To achieve micro-disentanglement, we put forward a novel relation-aware aggregation to learn diverse component representations. For macro-disentanglement, we leverage mutual information as a regularization to enhance independence. With the assistance of disentanglement, our model is able to generate adaptive representations for the given scenario. Besides, our approach is robust and flexible enough to adapt to various score functions. Extensive experiments on public benchmark datasets have been conducted to validate the superiority of DisenKGAT over existing methods in terms of both accuracy and explainability.

DynSTGAT: Dynamic Spatial-Temporal Graph Attention Network for Traffic Signal Control

Libing Wu
Min Wang
Dan Wu
Jia Wu

Adaptive traffic signal control plays a significant role in the construction of smart cities. This task is challenging because of many essential factors, such as cooperation among neighboring intersections and dynamic traffic scenarios. First, to facilitate the cooperation of traffic signals, existing work adopts graph neural networks to incorporate the temporal and spatial influences of the surrounding intersections into the target intersection, where spatial-temporal information is used separately. However, one drawback of these methods is that the spatial-temporal correlations are not adequately exploited to obtain a better control scheme. Second, in a dynamic traffic environment, the historical state of the intersection is also critical for predicting future signal switching. Previous work mainly solves this problem using the current intersection's state, neglecting the fact that traffic flow is continuously changing both spatially and temporally and does not handle the historical state.

In this paper, we propose a novel neural network framework named DynSTGAT, which integrates dynamic historical state into a new spatial-temporal graph attention network to address the above two problems. More specifically, our DynSTGAT model employs a novel multi-head graph attention mechanism, which aims to adequately exploit the joint relations of spatial-temporal information. Then, to efficiently utilize the historical state information of the intersection, we design a sequence model with the temporal convolutional network (TCN) to capture the historical information and further merge it with the spatial information to improve its performance. Extensive experiments conducted in the multi-intersection scenario on synthetic data and real-world data confirm that our method can achieve superior performance in travel time and throughput against the state-of-the-art methods.

Seq2Bubbles: Region-Based Embedding Learning for User Behaviors in Sequential Recommenders

Qitian Wu
Chenxiao Yang
Shuodian Yu
Xiaofeng Gao
Guihai Chen

User behavior sequences contain rich information about user interests and are exploited to predict users' future clicks in sequential recommendation. Existing approaches, especially recently proposed deep learning models, often embed a sequence of clicked items into a single vector, i.e., a point in vector space, which suffers from limited expressiveness for complex distributions of user interests with multi-modality and heterogeneous concentration. In this paper, we propose a new representation model, named Seq2Bubbles, for sequential user behaviors. It embeds an input sequence into a set of bubbles, each of which is represented by a center vector and a radius vector in the embedding space. The bubble embedding can effectively identify and accommodate multi-modal user interests and diverse concentration levels. Furthermore, we design an efficient scheme to compute the distance between a target item and the bubble embedding of a user sequence to achieve next-item recommendation. We also develop a self-supervised contrastive loss based on our bubble embeddings as an effective regularization approach. Extensive experiments on four benchmark datasets demonstrate that our bubble embedding can consistently outperform state-of-the-art sequential recommendation models.

Incremental Graph Convolutional Network for Collaborative Filtering

Jiafeng Xia
Dongsheng Li
Hansu Gu
Tun Lu
Peng Zhang
Ning Gu

Graph neural networks (GNNs) have recently achieved huge success in collaborative filtering (CF) thanks to the useful graph structure information. However, users continuously interact with items, which causes the user-item interaction graphs to change over time and well-trained GNN models to become out-of-date quickly. Naive solutions such as periodic retraining lose important temporal information and are computationally expensive. Recent works that leverage recurrent neural networks to keep GNNs up-to-date may suffer from the "catastrophic forgetting" issue and experience a cold start with new users and items. To this end, we propose the incremental graph convolutional network (IGCN), a pure graph convolutional network (GCN) based method to update GNN models when new user-item interactions are available. IGCN consists of two main components: 1) a historical feature generation layer, which generates the initial user/item embedding via model-agnostic meta-learning and ensures good initial states and fast model adaptation; 2) a temporal feature learning layer, which first aggregates the features from the local neighborhood to update the embedding of each user/item within each subgraph via a graph convolutional network and then fuses the user/item embeddings from the last subgraph and the current subgraph via an incremental temporal convolutional network. Experimental studies on real-world datasets show that IGCN can outperform state-of-the-art CF algorithms in sequential recommendation tasks.

Self-Supervised Graph Co-Training for Session-based Recommendation

Xin Xia
Hongzhi Yin
Junliang Yu
Yingxia Shao
Lizhen Cui

Session-based recommendation targets next-item prediction by exploiting user behaviors within a short time period. Compared with other recommendation paradigms, session-based recommendation suffers more from the problem of data sparsity due to the very limited short-term interactions. Self-supervised learning, which can discover ground-truth samples from the raw data, holds vast potential to tackle this problem. However, existing self-supervised recommendation models mainly rely on item/segment dropout to augment data, which is not suited to session-based recommendation because the dropout leads to sparser data, creating unserviceable self-supervision signals. In this paper, for informative session-based data augmentation, we combine self-supervised learning with co-training, and then develop a framework to enhance session-based recommendation. Technically, we first exploit the session-based graph to augment two views that exhibit the internal and external connectivities of sessions, and then we build two distinct graph encoders over the two views, which recursively leverage the different connectivity information to generate ground-truth samples to supervise each other by contrastive learning. In contrast to the dropout strategy, the proposed self-supervised graph co-training preserves the complete session information and fulfills genuine data augmentation. Extensive experiments on multiple benchmark datasets show that session-based recommendation can be remarkably enhanced under the regime of self-supervised graph co-training, achieving state-of-the-art performance.

iMap: Incremental Node Mapping between Large Graphs Using GNN

Yikuan Xia
Jun Gao
Bin Cui

Node mapping between large graphs (or network alignment) plays a key preprocessing role in joint-graph data mining applications like social link prediction, cross-platform recommendation, etc. Most existing approaches attempt to perform alignment at the granularity of entire graphs, but handling whole graphs may limit scalability, and noisy nodes/edges in the graphs may hurt effectiveness. From the observation that potential node mappings always appear near known corresponding nodes, we propose iMap, a novel sub-graph expansion based alignment framework that incrementally constructs meaningful sub-graphs and performs alignment on each sub-graph pair iteratively, which reduces unnecessary computation on the original raw networks and improves effectiveness by excluding possible noise. Specifically, iMap initially builds a candidate sub-graph around known matched nodes. In each following iteration, iMap trains an alignment model to infer the node mapping relationship between sub-graphs, from which the sub-graphs are further extended and refined. In addition, we design a Graph Neural Network (GNN) based model named MAP for each sub-graph pair in the iMap framework. MAP utilizes trainable Multi-layer Perceptron (MLP) prediction heads for similarity computation and employs a mixed loss function consisting of the ranking loss for contrastive learning and the cross-entropy loss for classification. Extensive experiments conducted on real social networks demonstrate superior efficiency and effectiveness (above 12% improvement) of our proposed method compared to several state-of-the-art methods.

Neural PathSim for Inductive Similarity Search in Heterogeneous Information Networks

Wenyi Xiao
Huan Zhao
Vincent W. Zheng
Yangqiu Song

PathSim is a widely used meta-path-based similarity in heterogeneous information networks. Numerous applications rely on the computation of PathSim, including similarity search and clustering. Computing PathSim scores on large graphs is computationally challenging due to its high time and storage complexity. In this paper, we propose to transform the problem of approximating the ground truth PathSim scores into a learning problem. We design an encoder-decoder based framework, NeuPath, where the algorithmic structure of PathSim is considered. Specifically, the encoder module identifies Top T optimized path instances, which can approximate the ground truth PathSim, and maps each path instance to an embedding vector. The decoder transforms each embedding vector into a scalar respectively, which identifies the similarity score. We perform extensive experiments on two real-world datasets in different domains, ACM and IMDB. Our results demonstrate that NeuPath performs better than state-of-the-art baselines in the PathSim approximation task and similarity search task.
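For reference, the exact PathSim score that NeuPath approximates can be computed from the commuting matrix M of a symmetric meta-path as s(x, y) = 2·M[x][y] / (M[x][x] + M[y][y]). The sketch below is a plain-Python illustration of that standard definition on a toy Author-Paper-Author meta-path. The matrix `W` and the function names are hypothetical, and the code is not part of NeuPath.

```python
def commuting_matrix(w):
    """M = W * W^T for a symmetric meta-path such as Author-Paper-Author.

    w[i][k] is 1 if author i wrote paper k, so M[i][j] counts the
    path instances between authors i and j.
    """
    n = len(w)
    return [[sum(a * b for a, b in zip(w[i], w[j])) for j in range(n)]
            for i in range(n)]

def pathsim(m, x, y):
    """PathSim score: path count between x and y, normalized by self-path counts."""
    denom = m[x][x] + m[y][y]
    return 2.0 * m[x][y] / denom if denom else 0.0

# Hypothetical toy data: rows are authors, columns are papers.
W = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
M = commuting_matrix(W)
print(pathsim(M, 0, 1))  # authors 0 and 1 share both of their papers
print(pathsim(M, 0, 2))  # authors 0 and 2 share one paper
```

Building M explicitly requires a matrix product over the whole network, which is the time and storage cost that motivates a learned approximation.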

WebKE: Knowledge Extraction from Semi-structured Web with Pre-trained Markup Language Model

Chenhao Xie
Wenhao Huang
Jiaqing Liang
Chengsong Huang
Yanghua Xiao

The World Wide Web contains rich up-to-date information for knowledge graph construction. However, most current relation extraction techniques are designed for free text and thus do not handle semi-structured web content well. In this paper, we propose a novel multi-phase machine reading framework, called WebKE. It processes the web content at different granularities by first detecting areas of interest at the DOM tree node level and then extracting relational triples for each area. We also propose HTMLBERT as an encoder for the web content. It is a pre-trained markup language model that fully leverages the visual layout information and DOM-tree structure, without the need for hand-engineered features. Experimental results show that the proposed approach outperforms state-of-the-art methods by a considerable margin. The source code is available at https://github.com/redreamality/webke.

Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images

Zhongwei Xie
Ling Liu
Lin Li
Luo Zhong

This paper presents a three-tier modality alignment approach to learning text-image joint embedding, coined as JEMA, for cross-modal retrieval of cooking recipes and food images. The first tier improves recipe text embedding by optimizing the LSTM networks with term extraction and ranking enhanced sequence patterns, and optimizes the image embedding by combining the ResNeXt-101 image encoder with the category embedding using wideResNet-50 with word2vec. The second tier modality alignment optimizes the textual-visual joint embedding loss function using a double batch-hard triplet loss with soft-margin optimization. The third modality alignment incorporates two types of cross-modality alignments as the auxiliary loss regularizations to further reduce the alignment errors in the joint learning of the two modality-specific embedding functions. The category-based cross-modal alignment aims to align the image category with the recipe category as a loss regularization to the joint embedding. The cross-modal discriminator-based alignment aims to add the visual-textual embedding distribution alignment to further regularize the joint embedding loss. Extensive experiments with the one-million recipes benchmark dataset Recipe1M demonstrate that the proposed JEMA approach outperforms the state-of-the-art cross-modal embedding methods for both image-to-recipe and recipe-to-image retrievals.
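To make the second-tier loss more concrete, here is a minimal sketch of a batch-hard triplet loss with a soft margin, applied in both retrieval directions. It assumes that "double" means summing the image-to-recipe and recipe-to-image terms, that distances are Euclidean, and that row i of each batch is a matching pair; none of these details are confirmed by the abstract, and the function names are made up for this example.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def softplus(x):
    """log(1 + exp(x)), computed in a numerically stable way."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def batch_hard_soft_margin(anchors, candidates):
    """For each anchor, compare its paired candidate with the hardest negative.

    anchors[i] and candidates[i] form the positive pair; every other
    candidate in the batch is a negative. Requires a batch of at least 2.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        d_pos = euclidean(anchors[i], candidates[i])
        d_neg = min(euclidean(anchors[i], candidates[j])
                    for j in range(n) if j != i)
        # The soft margin replaces the hinge max(0, m + d_pos - d_neg).
        total += softplus(d_pos - d_neg)
    return total / n

def bidirectional_loss(image_emb, recipe_emb):
    """Assumed 'double' form: image-to-recipe plus recipe-to-image."""
    return (batch_hard_soft_margin(image_emb, recipe_emb)
            + batch_hard_soft_margin(recipe_emb, image_emb))

images = [[0.0, 0.0], [3.0, 4.0]]
aligned = [[0.0, 0.0], [3.0, 4.0]]
swapped = [[3.0, 4.0], [0.0, 0.0]]
print(bidirectional_loss(images, aligned) < bidirectional_loss(images, swapped))
```

The soft-margin form avoids choosing a margin hyperparameter and keeps a small gradient even for pairs that are already well separated.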

Counterfactual Review-based Recommendation

Kun Xiong
Wenwen Ye
Xu Chen
Yongfeng Zhang
Wayne Xin Zhao
Binbin Hu
Zhiqiang Zhang
Jun Zhou

Incorporating review information into the recommender system has been demonstrated to be an effective method for boosting recommendation performance. Previous research mainly focuses on designing advanced architectures to better profile the users and items. However, the review information in reality can be highly sparse and imbalanced, which poses great challenges for effective user/item representations and satisfactory performance enhancement. To alleviate this problem, in this paper, we propose to improve review-based recommendation by counterfactually augmenting the training samples. We focus on a common setting, feature-aware recommendation, and the main building block of our idea lies in the counterfactual question: "what would be the user's decision if her feature-level preference had been different?". When augmenting the training samples, we actively change the user preference (also called intervention), and predict the user feedback on the items based on pre-trained recommender models. Instead of changing the user preference in a random manner, we design a learning-based method to discover the samples which are more effective for model optimization. In order to improve the sample quality, we equip our model with two strategies: constrained feature perturbation and frequency-based sampling. Since the sample generation model may be imperfect, we theoretically analyze the relation between the model prediction error and the number of generated samples. As a byproduct, our framework can explain the user's pair-wise preference, which is complementary to the traditional point-wise explanations. Extensive experiments demonstrate that our model can significantly improve the performance of the state-of-the-art methods.

Speedup Robust Graph Structure Learning with Low-Rank Information

Hui Xu
Liyao Xiang
Jiahao Yu
Anqi Cao
Xinbing Wang

Recent studies have shown that graph neural networks (GNNs) are vulnerable to unnoticeable adversarial perturbations, which largely confines their deployment in many safety-critical domains. Robust graph structure learning has been proposed to improve the GNN performance in the face of adversarial attacks. In particular, the low-rank methods are utilized to purify the perturbed graphs. However, these methods are mostly computationally expensive with O(n^3) time complexity and O(n^2) space complexity. We propose LRGNN, a fast and robust graph structure learning framework, which exploits the low-rank property as prior knowledge to speed up optimization. To eliminate adversarial perturbation, LRGNN decouples the adjacency matrix into a low-rank component and a sparse one, and learns by minimizing the rank of the first part while suppressing the second part. Its sparse variant is formed to reduce the memory footprint further. Experimental results on various attack settings have shown LRGNN acquires comparable robustness with the state-of-the-art much more efficiently, boasting a significant advantage on large-scale graphs.

Expanding Relationship for Cross Domain Recommendation

Kun Xu
Yuanzhen Xie
Liang Chen
Zibin Zheng

Cross-domain recommendation is a promising way to alleviate data sparsity issues by transferring knowledge from an auxiliary domain to a target domain. However, most existing works focus on utilizing the users shared among different domains, while ignoring domain-specific users, who form the majority in real-world circumstances. In this paper, we propose a novel cross-domain learning approach, Relation Expansion based Cross-Domain Recommendation (ReCDR), to improve recommendation accuracy on domains with small overlap. ReCDR first models the interactions in each domain as a local graph. It then forms a shared network by expanding relationships using pre-trained node similarities. On the enhanced graph, ReCDR adopts a hierarchical attention mechanism. The output embedding is finally combined with the local feature to balance the result for the dual-target task. The proposed model is thoroughly evaluated on three real-world datasets. Experiments demonstrate superior performance compared to state-of-the-art methods.

Topic-aware Heterogeneous Graph Neural Network for Link Prediction

Siyong Xu
Cheng Yang
Chuan Shi
Yuan Fang
Yuxin Guo
Tianchi Yang
Luhao Zhang
Maodi Hu

Heterogeneous graphs (HGs), consisting of multiple types of nodes and links, can characterize a variety of real-world complex systems. Recently, heterogeneous graph neural networks (HGNNs), as a powerful graph embedding method to aggregate heterogeneous structure and attribute information, have earned a lot of attention. Despite the ability of HGNNs to capture rich semantics which reveal different aspects of nodes, they still stay at a coarse-grained level which simply exploits structural characteristics. In fact, the rich unstructured text content of nodes also carries latent but more fine-grained semantics arising from multi-facet topic-aware factors, which fundamentally manifest why nodes of different types would connect and form a specific heterogeneous structure. However, little effort has been devoted to factorizing them.

In this paper, we propose a Topic-aware Heterogeneous Graph Neural Network, named THGNN, to hierarchically mine topic-aware semantics for learning multi-facet node representations for link prediction in HGs. Specifically, our model mainly applies an alternating two-step aggregation mechanism including intra-metapath decomposition and inter-metapath mergence, which can distinctively aggregate rich heterogeneous information according to the inferential topic-aware factors and preserve hierarchical semantics. Furthermore, a topic prior guidance module is also designed to keep the quality of multi-facet topic-aware embeddings relying on the global knowledge from unstructured text content in HGs. It helps to simultaneously improve both performance and interpretability. Experimental results on three real-world HGs demonstrate that our proposed model can effectively outperform the state-of-the-art methods in the link prediction task, and show the potential interpretability of learnt multi-facet topic-aware representations.

PATROL: A Velocity Control Framework for Autonomous Vehicle via Spatial-Temporal Reinforcement Learning

Zhi Xu
Shuncheng Liu
Ziniu Wu
Xu Chen
Kai Zeng
Kai Zheng
Han Su

The largest portion of urban congestion is caused by 'phantom' traffic jams, which cause significant travel delays, fuel waste, and air pollution. They frequently occur in high-density traffic without any obvious signs of accidents or roadworks. The root cause of 'phantom' traffic jams in one-lane traffic is the sudden change in velocity of some vehicles (i.e. harsh driving behavior (HDB)), which may generate a chain reaction with accumulated impact throughout the vehicles along the lane. This paper makes the first attempt to address this notorious problem in a one-lane traffic environment through velocity control of autonomous vehicles. Specifically, we propose a velocity control framework, called PATROL (sPAtial-temporal ReinfOrcement Learning). First, we design a spatial-temporal graph inside the reinforcement learning model to process and extract the information (e.g. velocity and distance difference) of multiple vehicles ahead across several historical time steps in the interactive environment. Then, we propose an attention mechanism to characterize the vehicle interactions and an LSTM structure to understand the vehicles' driving patterns through time. Finally, we modify the reward function used in previous velocity control works to enable the autonomous driving agent to predict the HDB of preceding vehicles and smoothly adjust its velocity, which could alleviate the chain reaction caused by HDB. We conduct extensive experiments to demonstrate the effectiveness and superiority of PATROL in alleviating the 'phantom' traffic jam in simulation environments. Further, on the real-world velocity control dataset, our method significantly outperforms the existing methods in terms of driving safety, comfort, and efficiency.

Node2Grids: A Cost-Efficient Uncoupled Training Framework for Large-Scale Graph Learning

Dalong Yang
Chuan Chen
Youhao Zheng
Zibin Zheng
Shih-wei Liao

Graph Convolutional Network (GCN) has been widely used in graph learning tasks. However, GCN-based models (GCNs) are inherently coupled training frameworks that repetitively conduct recursive neighborhood aggregation, which leads to high computational and memory overheads when processing large-scale graphs. To tackle these issues, we present Node2Grids, a cost-efficient uncoupled training framework that leverages independently mapped data to obtain the embedding. Instead of directly processing the coupled nodes as GCNs do, Node2Grids supports a more efficacious method in practice, mapping the coupled graph data into independent grid-like data which can be fed into uncoupled models such as a Convolutional Neural Network (CNN). This simple but valid strategy significantly saves memory and computational resources while achieving results comparable to the leading GCN-based models. Specifically, in order to support a general and convenient mapping approach, Node2Grids selects the most influential neighborhood with central node fusion information to construct the grid-like data. To further improve the efficiency of the downstream tasks, a simple CNN-based neural network is employed to capture the significant information from the mapped grid-like data. Moreover, a grid-level attention mechanism is implemented, which enables implicitly specifying different weights for the extracted grids of the CNN. In addition to the typical transductive and inductive learning tasks, we also verify our framework on million-scale graphs to demonstrate its superior cost performance against the state-of-the-art GCN-based approaches. The codes are available on the GitHub link.

Multi-task Learning for Bias-Free Joint CTR Prediction and Market Price Modeling in Online Advertising

Haizhi Yang
Tengyun Wang
Xiaoli Tang
Qianyu Li
Yueyue Shi
Siyu Jiang
Han Yu
Hengjie Song

The rapid rise of real-time bidding-based online advertising has brought significant economic benefits and attracted extensive research attention. From the perspective of an advertiser, it is crucial to perform accurate utility estimation and cost estimation for each individual auction in order to achieve cost-effective advertising. These problems are known as the click through rate (CTR) prediction task and the market price modeling task, respectively. However, existing approaches treat CTR prediction and market price modeling as two independent tasks to be optimized without regard to each other, thus resulting in suboptimal performance. Moreover, they do not make full use of unlabeled data from the losing bids during estimations, which makes them suffer from the sample selection bias issue. To address these limitations, we propose Multi-task Advertising Estimator (MTAE), an end-to-end joint optimization framework which performs both CTR prediction and market price modeling simultaneously. Through multi-task learning, both estimation tasks can take advantage of knowledge transfer to achieve improved feature representation and generalization abilities. In addition, we leverage the abundant bid price signals in the full-volume bid request data and introduce an auxiliary task of predicting the winning probability into the framework for unbiased learning. Through extensive experiments on two large-scale real-world public datasets, we demonstrate that our proposed approach has achieved significant improvements over the state-of-the-art models under various performance metrics.

Cycle or Minkowski: Which is More Appropriate for Knowledge Graph Embedding?

Han Yang
Leilei Zhang
Bingning Wang
Ting Yao
Junfei Liu

Knowledge graph (KG) embedding aims to encode entities and relations into low-dimensional vector spaces, which in turn can support various machine learning models on KG-related tasks with good performance. However, existing methods for knowledge graph embedding fail to consider the influence of the embedding space, which makes them still unsatisfactory in practical applications. In this study, we try to improve the expressiveness of the embedding space from the perspective of the metric. Specifically, we first point out the implications of the Minkowski metric used in KG embedding and then make a quantitative analysis. To address its limitations, we introduce a new metric, named the Cycle metric, based on the oscillation property of periodic functions. Furthermore, we find that the function period has a significant influence on the expressiveness of the embedding space. Given a fully trained model, the smaller the period, the better the expressive ability. Finally, to validate the findings, we propose a new model, named CyclE, by combining the Cycle metric with popular KG embedding models. Comprehensive experimental results show that Cycle is more appropriate than Minkowski for KG embedding.

Knowledge Graph Representation Learning as Groupoid: Unifying TransE, RotatE, QuatE, ComplEx

Han Yang
Junfei Liu

Knowledge graph (KG) representation learning, which aims to encode entities and relations into low-dimensional spaces, has been widely used in KG completion and link prediction. Although existing KG representation learning models have shown promising performance, the theoretical mechanism behind them is much less well understood. It is challenging to accurately portray the internal connections between models and to build a competitive model systematically. To overcome this problem, a unified KG representation learning framework, called GrpKG, is proposed in this paper to model KG representation learning from a generic groupoid perspective. We discover that many existing models are essentially the same in the sense of groupoid isomorphism and further provide transformation methods between different models. Moreover, we explore the applications of GrpKG in model classification as well as other processes. The experiments on several benchmark data sets validate the effectiveness and superiority of our framework by comparing two proposed models (GrpQ8 and GrpM2) with the state-of-the-art models.

CIExplore: Curiosity and Influence-based Exploration in Multi-Agent Cooperative Scenarios with Sparse Rewards

Huanhuan Yang
Dianxi Shi
Chenran Zhao
Guojun Xie
Shaowu Yang

Learning in a sparse-reward setting is a well-known challenge in RL (Reinforcement Learning). In the single-agent domain, this challenge can be addressed by introducing exploration bonuses driven by intrinsic motivation to encourage agents to visit unseen states. However, naively applying these methods in MARL (Multi-Agent Reinforcement Learning) cooperative settings with sparse rewards results in some inevitable problems, such as misunderstanding environmental knowledge and a lack of collaboration among agents. Based on this, in this paper, we propose the Curiosity and Influence-based Explore (CIExplore) method, which includes a new form of intrinsic reward and an internal counterfactual advantage function. Concretely, the intrinsic reward is a combination of a joint curiosity reward and an influence reward. The former is the variance of outputs across an ensemble of prediction models that take the joint observations and actions of all agents as inputs to predict the joint observations at the next time step. The latter quantifies the influence of one agent's behavior on other agents' state-value functions. Given that the joint curiosity reward is shared by all agents, we compute an internal counterfactual advantage function to address this intrinsic reward assignment problem. We demonstrate the efficacy of CIExplore in multi-agent grid-world environments and show that it is compatible with both on-policy and off-policy MARL algorithms and is scalable to complex settings where the number of agents or the environment randomness increases.
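
As a rough illustration of the joint curiosity reward described above (variance of outputs across an ensemble of prediction models), the following minimal sketch computes the disagreement among ensemble predictions. The function name `joint_curiosity_reward` and the choice to average the per-dimension variance are assumptions made for this example, not details taken from the paper.

```python
from statistics import pvariance


def joint_curiosity_reward(predictions):
    """Disagreement of an ensemble about the next joint observation.

    predictions: one predicted observation vector per ensemble member.
    Averaging the per-dimension variance is an assumption of this sketch.
    """
    dims = len(predictions[0])
    per_dim = [pvariance([p[d] for p in predictions]) for d in range(dims)]
    return sum(per_dim) / dims


# Members that agree yield no bonus; disagreement yields a positive bonus.
agree = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
disagree = [[0.0, 2.0], [1.0, 2.0], [2.0, 2.0]]
print(joint_curiosity_reward(agree), joint_curiosity_reward(disagree))
```

A state on which the ensemble members disagree is treated as less explored, so it earns a larger bonus.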

Entity and Relation Matching Consensus for Entity Alignment

Jinzhu Yang
Ding Wang
Wei Zhou
Wanhui Qian
Xin Wang
Jizhong Han
Songlin Hu

Entity alignment aims to match synonymous entities across different knowledge graphs, which is a fundamental task for knowledge integration. Recently, researchers have devoted effort to leveraging rich information within relations to enhance entity alignment. They explicitly incorporate relations in entity representation and alignment, demonstrating remarkable results. However, affected by the semantic assumptions from early works, these works represent a relation by combining all the entities it connects, ignoring the semantic independence between entity and relation. Moreover, since these works perform alignment by comparing embedding similarity, they fail to consider a graph-level alignment and tend to find local false correspondences.

In this paper, we propose Entity and Relation Matching Consensus (ERMC), a two-stage matching schema based on graph matching consensus that jointly models and aligns entities and relations and retains their semantic independence at the same time. In the first stage, we design a bidirectional relation-aware graph convolutional network to jointly learn entity and relation embeddings based on the triadic graph by a novel message passing mechanism. Then, we jointly align the entities and relations by computing a graph-level matching consensus. In the second stage, we introduce a refinement strategy to detect and correct false alignments in the first stage. Experimental results on three real-world multilingual datasets demonstrate that ERMC outperforms some state-of-the-art models on both entity alignment and relation alignment tasks.

Top-N Recommendation with Counterfactual User Preference Simulation

Mengyue Yang
Quanyu Dai
Zhenhua Dong
Xu Chen
Xiuqiang He
Jun Wang

Top-N recommendation, which aims to learn user ranking-based preference, has long been a fundamental problem in a wide range of applications. Traditional models usually motivate themselves by designing complex or tailored architectures based on different assumptions. However, the training data of recommender systems can be extremely sparse and imbalanced, which poses great challenges for boosting the recommendation performance. To alleviate this problem, in this paper, we propose to reformulate the recommendation task within the causal inference framework, which enables us to counterfactually simulate user ranking-based preferences to handle the data scarcity problem. The core of our model lies in the counterfactual question: "what would the user's decision be if the recommended items had been different?". To answer this question, we first formulate the recommendation process with a series of structural equation models (SEMs), whose parameters are optimized based on the observed data. Then, we actively indicate many recommendation lists (called interventions in the causal inference terminology) which are not recorded in the dataset, and simulate user feedback according to the learned SEMs to generate new training samples. Instead of randomly intervening on the recommendation list, we design a learning-based method to discover more informative training samples. Considering that the learned SEMs may be imperfect, we finally analyze theoretically the relation between the number of generated samples and the model prediction error, based on which a heuristic method is designed to control the negative effect brought by the prediction error. Extensive experiments are conducted based on both synthetic and real-world datasets to demonstrate the effectiveness of our framework.

Adversarial Kernel Sampling on Class-imbalanced Data Streams

Peng Yang
Ping Li

This paper investigates online active learning in the setting of class-imbalanced data streams, where labels can be queried only within a limited budget. In this setup, conventional learning would be biased towards majority classes and consequently harm the performance. To address this issue, imbalance learning techniques adopt both asymmetric losses and asymmetric queries to tackle the imbalance. Although this approach is effective, it may not guarantee the performance in an adversarial setting where the actual labels are unknown and may be chosen by the adversary.

To learn a promising hypothesis in a class-imbalanced and adversarial environment, we propose an asymmetric min-max optimization framework for online classification. The derived algorithm can track the imbalance and bound the choices of an adversary simultaneously. Despite the promising result, this algorithm assumes that the label is provided for every input, while labels are scarce and labeling is expensive in real-world applications. To this end, we design a confidence-based sampling strategy to query the informative labels within a budget. We theoretically analyze this algorithm in terms of its mistake bound and two asymmetric measures. Empirically, we evaluate the algorithms on multiple real-world imbalanced tasks. Promising results are achieved on various application domains.

HASTE: A Distributed System for Hybrid and Adaptive Processing on Streaming Spatial-Textual Data

Zhong Yang
Bolong Zheng
Chengdong Tong
Lianggui Weng
Chenliang Li
Guohui Li

Streaming spatial-textual data that contains geographic and textual information, e.g., geo-tagged tweets, is growing at an unprecedented rate. As one of the basic operations, continuous spatial-textual queries that retrieve real-time results continuously on large-scale spatial-textual streams call for means of efficient distributed processing. However, existing proposals either are spatial-aware only, or superficially exploit textual information for pruning. We propose a distributed system, called HASTE, for hybrid and adaptive processing on streaming spatial-textual data. The novelty lies in three aspects: (1) We propose a novel method to reduce the workload beforehand by dividing objects and queries into mutually exclusive types; (2) We develop a novel load partitioning strategy and a novel cost model that consider both spatial and textual properties; (3) We design a multi-level load adjustment strategy that adaptively copes with different degrees of load imbalance. We report on extensive experiments with real-world data that offer insight into the performance of the solution, and show that the solution is capable of outperforming the state-of-the-art proposals.

USER: A Unified Information Search and Recommendation Model based on Integrated Behavior Sequence

Jing Yao
Zhicheng Dou
Ruobing Xie
Yanxiong Lu
Zhiping Wang
Ji-Rong Wen

Search and recommendation are the two most common approaches used by people to obtain information. They share the same goal -- satisfying the user's information need at the right time. There are already many Internet platforms and Apps providing both search and recommendation services, showing us the demand and opportunity to simultaneously handle both tasks. However, most platforms consider these two tasks independently -- they tend to train separate search and recommendation models, without exploiting the relatedness and dependency between them. In this paper, we argue that jointly modeling these two tasks will benefit both of them and ultimately improve overall user satisfaction. We investigate the interactions between these two tasks in the specific information content service domain. We propose first integrating the user's behaviors in search and recommendation into a heterogeneous behavior sequence, then utilizing a joint model for handling both tasks based on the unified sequence. More specifically, we design the Unified Information SEarch and Recommendation model (USER), which mines user interests from the integrated sequence and accomplishes the two tasks in a unified way. Experiments on a dataset from a real-world information content service platform verify that our model outperforms separate search and recommendation baselines.

An Interactive Neural Network Approach to Keyphrase Extraction in Talent Recruitment

Kaichun Yao
Chuan Qin
Hengshu Zhu
Chao Ma
Jingshuai Zhang
Yi Du
Hui Xiong

As a fundamental task of document content analysis, keyphrase extraction (KE) aims at predicting a set of lexical units that conveys the core information of the document. In this paper, we study the problem of KE in talent recruitment. This problem is critical for the development of a variety of intelligent recruitment services, such as person-job fit, market trend analysis and course recommendation. However, unlike traditional textual data, the texts from the recruitment domain, such as resumes and job postings, often have unique characteristics of abbreviation and succinctness, resulting in a large number of keyphrases consisting of nonconsecutive words that are hard for existing KE methods to fully capture. To this end, we propose an interactive neural network approach, INKE, for facilitating KE in talent recruitment. To be specific, we first introduce a novel keyphrase indicator that captures the explicit hint information for each keyphrase. Then, we design a dynamically-initialized decoder which can generate keyphrases in an interactive manner. Moreover, we propose a hierarchical reinforcement learning algorithm to enhance the interaction between the hint information capture and keyphrase generation. Finally, extensive experiments on real-world data clearly validate the effectiveness and interpretability of INKE compared with state-of-the-art baselines.

AMPPERE: A Universal Abstract Machine for Privacy-Preserving Entity Resolution Evaluation

Yixiang Yao
Tanmay Ghai
Srivatsan Ravi
Pedro Szekely

Entity resolution is the task of identifying records in different datasets that refer to the same entity in the real world. In sensitive domains (e.g. financial accounts, hospital health records), entity resolution must meet privacy requirements to avoid revealing sensitive information such as personally identifiable information to untrusted parties. Existing solutions are either too algorithmically-specific or come with an implicit trade-off between accuracy of the computation, privacy, and run-time efficiency. We propose AMPPERE, an abstract computation model for performing universal privacy-preserving entity resolution. AMPPERE offers abstractions that encapsulate multiple algorithmic and platform-agnostic approaches using variants of Jaccard similarity to perform private data matching and entity resolution. Specifically, we show that two parties can perform entity resolution over their data, without leaking sensitive information. We rigorously compare and analyze the feasibility, performance overhead and privacy-preserving properties of these approaches on the Sharemind multi-party computation (MPC) platform as well as on PALISADE, a lattice-based homomorphic encryption library. The AMPPERE system demonstrates the efficacy of privacy-preserving entity resolution for real-world data while providing a precise characterization of the induced cost of preventing information leakage.
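
For readers unfamiliar with the similarity measure mentioned above, the following minimal sketch shows plain (non-private) Jaccard similarity applied to record matching. It runs entirely in the clear and does not reproduce the secure computation described in the abstract; the character-bigram tokenization, the 0.5 threshold, and the helper names `bigrams` and `is_match` are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def bigrams(text):
    """Character bigrams of a lowercased string (hypothetical tokenization)."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def is_match(record_a, record_b, threshold=0.5):
    """Declare two records the same entity if their similarity is high enough."""
    return jaccard(bigrams(record_a), bigrams(record_b)) >= threshold


print(is_match("John Smith", "Jon Smith"))    # near-duplicate spelling
print(is_match("John Smith", "Mary Jones"))   # unrelated names
```

In a privacy-preserving setting, the set intersection and union sizes would be computed under MPC or homomorphic encryption rather than on plaintext sets as done here.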

Task Allocation with Geographic Partition in Spatial Crowdsourcing

Guanyu Ye
Yan Zhao
Xuanhao Chen
Kai Zheng

Recent years have witnessed a revolution in Spatial Crowdsourcing (SC), in which people with mobile connectivity can perform spatio-temporal tasks that involve travel to specified locations. In this paper, we identify and study in depth a new multi-center-based task allocation problem in the context of SC, where multiple allocation centers exist. In particular, we aim to maximize the total number of the allocated tasks while minimizing the average allocated task number difference. To solve the problem, we propose a two-phase framework, called Task Allocation with Geographic Partition, consisting of a geographic partition phase and a task allocation phase. The first phase is to divide the whole study area based on the allocation centers by using both a basic Voronoi diagram-based algorithm and an adaptive weighted Voronoi diagram-based algorithm. In the allocation phase, we utilize a Reinforcement Learning method to achieve the task allocation, where a graph neural network with the attention mechanism is used to learn the embeddings of allocation centers, delivery points and workers. Extensive experiments give insight into the effectiveness and efficiency of the proposed solutions.
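
The basic Voronoi-based partition mentioned above amounts to assigning every location to its nearest allocation center. The sketch below illustrates only that basic step under the assumption of planar Euclidean distance; the adaptive weighted variant and the reinforcement learning allocation phase are not covered, and the function name `voronoi_partition` is hypothetical.

```python
import math


def voronoi_partition(centers, points):
    """Group points by their nearest center (Euclidean distance)."""
    cells = {i: [] for i in range(len(centers))}
    for point in points:
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(centers[i], point))
        cells[nearest].append(point)
    return cells


centers = [(0.0, 0.0), (10.0, 0.0)]          # allocation centers
tasks = [(1.0, 1.0), (9.0, 0.0), (4.0, 0.0)]  # task locations
print(voronoi_partition(centers, tasks))
```

Each resulting cell holds the locations served by one center, which is the input a per-center allocation step would work on.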

MedRetriever: Target-Driven Interpretable Health Risk Prediction via Retrieving Unstructured Medical Text

Muchao Ye
Suhan Cui
Yaqing Wang
Junyu Luo
Cao Xiao
Fenglong Ma

The broad adoption of electronic health record (EHR) systems and the advances of deep learning technology have motivated the development of health risk prediction models, which mainly depend on the expressiveness and temporal modeling capacity of deep neural networks (DNNs) to improve prediction performance. Some further augment the prediction by using external knowledge; however, a great deal of EHR information is inevitably lost during the knowledge mapping. In addition, predictions made by existing models usually lack reliable interpretation, which undermines their reliability in guiding clinical decision-making. To solve these challenges, we propose MedRetriever, an effective and flexible framework that leverages unstructured medical text collected from authoritative websites to augment health risk prediction as well as to provide understandable interpretation. Besides, MedRetriever explicitly takes the target disease documents into consideration, which provide key guidance for the model to learn in a target-driven direction, i.e., from the target disease to the input EHR. Specifically, MedRetriever can flexibly choose its backbone from major predictive models to learn the EHR embedding for each visit. After that, the EHR embedding and features of target disease documents are aggregated into a query by self-attention to retrieve highly relevant text segments from the medical text pool, which are stored in the dynamically updated text memory. Finally, the comprehensive EHR embedding and the text memory are used for prediction and interpretation. We evaluate MedRetriever against nine state-of-the-art approaches across three real-world EHR datasets, where it consistently achieves the best performance in AUC and recall metrics and outperforms the best baseline by at least 4.8% in recall on the three test datasets. Furthermore, we conduct case studies to show the easy-to-understand interpretation provided by MedRetriever.

Robust Dynamic Clustering for Temporal Networks

Jingyi You
Chenlong Hu
Hidetaka Kamigaito
Kotaro Funakoshi
Manabu Okumura

Dynamic community detection (or graph clustering) in temporal networks has attracted much attention because it is promising for revealing the underlying mechanism of complex real-world systems. Current methods are criticized for the independence of graph representation learning and graph clustering, considerable noise during temporal information smoothing, and high time complexity. We propose a Robust Temporal Smoothing Clustering method (RTSC), which involves joint graph representation learning and graph clustering, to solve these problems. RTSC can be formulated as a constrained multi-objective optimization problem. Specifically, three-order successive snapshots are first projected into the same subspace via graph embedding. We then use the embedding matrices to learn a common low-rank block-diagonal matrix that contains current clustering information and specific noise matrices with a sparse constraint to remove noise at each time step. To efficiently solve the challenging optimization problem, we also propose an optimization procedure based on the augmented Lagrangian multiplier (ALM) scheme. Experimental results on six artificial datasets and four real-world dynamic network datasets indicate that RTSC performs better than six state-of-the-art algorithms for dynamic clustering in temporal networks.

Learning to Learn the Future: Modeling Concept Drifts in Time Series Prediction

Xiaoyu You
Mi Zhang
Daizong Ding
Fuli Feng
Yuanmin Huang

Time series prediction has great practical value in a wide range of real-world scenarios such as the stock market and retail. Existing methods typically face the model aging issue caused by concept drift: the model performance degrades over time. Undoubtedly, the model aging issue can cause serious damage in practical usage, e.g. wrong predictions of stock prices may cause catastrophic losses in the financial domain. Therefore, it is essential to address the model aging issue so as to ensure the predictor's performance in the future. In this paper, we propose a novel solution to address the issue. First, we uncover the theoretical connection between the complex concept drift in time series data and the gradients of deep neural networks. Based on this, we propose a novel framework called learning to learn the future. Specifically, we develop a learning method to model the concept drift during the inference stage, which can help the model generalize well in the future. Furthermore, to mitigate the impact of noise and randomness in time series data, we propose to enhance the framework by leveraging similar series in concept drift modeling. To the best of our knowledge, our approach is the first general solution to the model aging issue in time series prediction. We conduct extensive experiments on three real-world datasets, which validate the effectiveness of our framework. For instance, it achieves a relative improvement of 33% in stock price prediction over the state-of-the-art methods.

Assorted Attention Network for Cross-Lingual Language-to-Vision Retrieval

Tan Yu
Yi Yang
Hongliang Fei
Yi Li
Xiaodong Chen
Ping Li

In this paper, we tackle the cross-lingual language-to-vision (CLLV) retrieval task. Given a text query in one language, the CLLV retrieval task seeks to retrieve the relevant images/videos from the database based on the visual content of the images/videos and their captions in another language. As CLLV retrieval bridges the modal gap and the language gap, it makes many international cross-modal applications feasible. To tackle CLLV retrieval, in this paper, we propose an assorted attention network (A2N) to synchronously overcome the language gap, bridge the modal gap and fuse features of the two modalities in an elegant and effective manner. It represents each text query as a set of word features and represents each image/video as a set of its caption's word features in another language and a set of its local visual features. In this case, the relevance between the text query and the image/video is obtained by matching the set of the query's word features against the two sets of image/video features. To enhance the effectiveness of the matching, A2N merges the query's word features and the image/video's visual and word features into an assorted set and further conducts the self-attention operation on items of the assorted set. On one hand, thanks to the attention between the query's word features and the video/image's visual features, some important word features or visual features of the image/video can be emphasized. On the other hand, thanks to the attention between the video/image's visual features and its caption word features, the image/video's visual content and the text information can be fused in a more effective manner. Systematic experiments conducted on four datasets demonstrate the effectiveness of the proposed A2N in the CLLV retrieval task.

Multiple Exemplars Learning for Fast Image Retrieval

Tan Yu
Ping Li

Over the past decade, we have witnessed rapid progress in compact representation learning for fast image retrieval. In the unsupervised scenario, product quantization (PQ) is one of the promising methods to generate compact image representations for fast and accurate retrieval. Inspired by the great success of deep neural networks (DNNs) in computer vision, many works have attempted to integrate PQ into DNNs for end-to-end supervised training. Nevertheless, in existing deep PQ methods, data samples from different classes share the same codebook. Thus, they might be entangled with each other in the feature space. Meanwhile, existing deep PQ methods relying on triplet or pairwise loss require a huge number of training triplets or pairs, which are expensive in computation and scale poorly.

In this work, we propose a multiple exemplars learning (MEL) approach to improve retrieval accuracy and training efficiency. For each class, we learn a class-specific codebook consisting of multiple exemplars to partition the class-specific feature space. Since the feature space as well as the codebook is class-specific, samples of different classes are disentangled in the feature space. We incorporate the proposed MEL in a convolutional neural network, supporting end-to-end training. Moreover, we propose MEL loss which trains the network in a considerably more efficient manner than existing deep product quantization approaches based on pairwise or triplet loss. Systematic experiments conducted on two public benchmarks demonstrate the effectiveness and efficiency of our method.

Semi-Supervised and Self-Supervised Classification with Multi-View Graph Neural Networks

Jinliang Yuan
Hualei Yu
Meng Cao
Ming Xu
Junyuan Xie
Chongjun Wang

Graph Neural Networks (GNNs) have achieved significant success in handling graph-structured data, such as knowledge graphs, citation networks, molecular structures, etc. However, most of them are usually shallow structures because of the over-smoothing problem, in which the representations of nodes become indistinguishable when many layers are stacked. Several recent studies have tried to design deep GNNs for powerful expression ability by enlarging the receptive fields to aggregate information from high-order neighbors. But deep models may give rise to the overfitting problem. In this paper, we propose a novel insight to aggregate more useful information based on multiple views, which does not require deep structures. Specifically, we first design two complementary views to describe the global topology and feature similarity of nodes. Then we devise an attention strategy to fuse node representations, named Multi-View Graph Convolutional Network (MV-GCN). Further, we introduce a self-supervised technique to learn node representations by contrastive learning on different views, which can learn distinctive node embeddings from a large amount of unlabeled data, named Multi-View Contrastive Graph Convolutional Network (MV-CGC). Finally, we conduct extensive experiments on six public datasets for node classification, which demonstrate the superiority of the two proposed models compared with state-of-the-art methods.

Reinforced Active Entity Alignment

Weixin Zeng
Xiang Zhao
Jiuyang Tang
Changjun Fan

Entity alignment (EA) is the task of detecting equivalent entities from different knowledge graphs (KGs). Although this problem has been intensively studied during the last few years, the majority of state-of-the-art methods rely heavily on labeled data, which are difficult to obtain in practice. This calls for the study of EA with scarce supervision. To resolve this issue, we put forward a reinforced active entity alignment framework to select the entities to be manually labeled with the aim of enhancing alignment performance with minimal labeling effort. Under this framework, we further devise an unsupervised contrastive loss to contrast different views of entity representations and augment the limited supervision signals by exploiting the vast unlabeled data. We empirically evaluate our proposal on eight popular KG pairs, and the results demonstrate that our proposed model and its components consistently boost the alignment performance under scarce supervision.

Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance

Jingtao Zhan
Jiaxin Mao
Yiqun Liu
Jiafeng Guo
Min Zhang
Shaoping Ma

Recently, the Information Retrieval community has witnessed fast-paced advances in Dense Retrieval (DR), which performs first-stage retrieval with embedding-based search. Despite the impressive ranking performance, previous studies usually adopt brute-force search to acquire candidates, which is prohibitive in practical Web search scenarios due to its tremendous memory usage and time cost. To overcome these problems, vector compression methods have been adopted in many practical embedding-based retrieval applications. One of the most popular methods is Product Quantization (PQ). However, although existing vector compression methods including PQ can help improve the efficiency of DR, they incur severely degraded retrieval performance due to the separation between encoding and compression. To tackle this problem, we present JPQ, which stands for Joint optimization of query encoding and Product Quantization. It trains the query encoder and PQ index jointly in an end-to-end manner based on three optimization strategies, namely ranking-oriented loss, PQ centroid optimization, and end-to-end negative sampling. We evaluate JPQ on two publicly available retrieval benchmarks. Experimental results show that JPQ significantly outperforms popular vector compression methods. Compared with previous DR models that use brute-force search, JPQ almost matches the best retrieval performance with 30x compression on index size. The compressed index further brings 10x speedup on CPU and 2x speedup on GPU in query latency.
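
As background for the compression step discussed above, the following minimal sketch shows standard Product Quantization encoding and decoding with fixed, hand-written codebooks. It illustrates only how PQ replaces a vector by a few small centroid indices; it does not implement the joint training of JPQ, and the toy codebook values and function names are assumptions made for this example.

```python
import math


def pq_encode(vector, codebooks):
    """Split the vector into len(codebooks) equal parts and store, for each
    part, the index of its nearest centroid in the matching codebook."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * sub_dim:(j + 1) * sub_dim]
        nearest = min(range(len(codebook)),
                      key=lambda k: math.dist(sub, codebook[k]))
        codes.append(nearest)
    return codes


def pq_decode(codes, codebooks):
    """Approximate the original vector by concatenating chosen centroids."""
    approx = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx


# Two sub-spaces of dimension 2, each with a toy codebook of two centroids.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [5.0, 5.0]],
]
codes = pq_encode([0.9, 1.2, 4.0, 6.0], codebooks)
print(codes, pq_decode(codes, codebooks))
```

The index stores only the integer codes, which is where the memory saving comes from; the gap between the decoded and original vector is the quantization error that degrades ranking when the encoder and codebooks are trained separately.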

Fraud Detection under Multi-Sourced Extremely Noisy Annotations

Chuang Zhang
Qizhou Wang
Tengfei Liu
Xun Lu
Jin Hong
Bo Han
Chen Gong

Fraud detection in e-commerce, which is critical to protecting the capital safety of users and financial corporations, aims at determining whether an online transaction or other activity is fraudulent or not. This problem has been previously addressed by various fully supervised learning methods. However, the true labels for training a supervised fraud detection model are difficult to collect in many real-world cases. To circumvent this issue, a series of automatic annotation techniques are employed instead to generate multiple noisy annotations for each unknown activity. In order to utilize these low-quality, multi-sourced annotations in achieving reliable detection results, we propose an iterative two-staged fraud detection framework with multi-sourced extremely noisy annotations. In the label aggregation stage, multi-sourced labels are integrated by voting with adaptive weights; and in the label correction stage, the correctness of the aggregated labels is properly estimated with the help of a handful of exactly labeled data and the results are used to train a robust fraud detector. These two stages benefit from each other, and the iterative executions lead to steadily improved detection results. Therefore, our method is termed "Label Aggregation and Correction" (LAC). Experimentally, we collect millions of transaction records from Alipay in two different fraud detection scenarios, i.e., credit card theft and promotion abuse fraud. When compared with state-of-the-art counterparts, our method can achieve at least 0.019 and 0.117 improvements in terms of average AUC on the two collected datasets, which clearly demonstrates its effectiveness.
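
The label aggregation stage described above integrates multi-sourced labels by weighted voting. The sketch below shows one weighted vote over binary annotations with the weights supplied by the caller; how the weights are adapted across iterations is not specified in the abstract, so the function `weighted_vote`, the 0.5 decision threshold, and the example weights are assumptions made for illustration.

```python
def weighted_vote(annotations, weights):
    """Aggregate binary annotations (1 = fraud, 0 = normal) from several
    sources into one label and a soft score in [0, 1]."""
    total = sum(weights)
    fraud_score = sum(w for a, w in zip(annotations, weights) if a == 1) / total
    label = 1 if fraud_score >= 0.5 else 0  # threshold assumed for this sketch
    return label, fraud_score


# One trusted source can outvote two weak ones when weights differ.
print(weighted_vote([1, 0, 0], [0.8, 0.1, 0.1]))
# With equal weights the same annotations reduce to a majority vote.
print(weighted_vote([1, 0, 0], [1.0, 1.0, 1.0]))
```

The soft score can be kept alongside the hard label, for example as a confidence signal for a later correction step.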

Topic Modeling for Multi-Aspect Listwise Comparisons

Delvin Ce Zhang
Hady W. Lauw

As a well-established probabilistic method, topic models seek to uncover latent semantics from plain text. In addition to having textual content, we observe that documents are usually compared in listwise rankings based on their content. For instance, countries worldwide are compared in an international ranking in terms of electricity production based on their national reports. Such document comparisons constitute additional information that reveals documents' relative similarities. Incorporating them into topic modeling could yield comparative topics that help to differentiate and rank documents. Furthermore, based on different comparison criteria, the observed document comparisons usually cover multiple aspects, each expressing a distinct ranked list. For example, a country may be ranked higher in terms of electricity production, but fall behind others in terms of life expectancy or government budget. Each comparison criterion, or aspect, observes a distinct ranking. Considering such multiple aspects of comparisons based on different ranking criteria allows us to derive one set of topics that inform heterogeneous document similarities. We propose a generative topic model aimed at learning topics that are well aligned to multi-aspect listwise comparisons. Comprehensive experiments on public datasets demonstrate the advantage of the proposed method over baselines in jointly modeling topics and ranked lists.

Relation Prediction via Graph Neural Network in Heterogeneous Information Networks with Missing Type Information

Han Zhang
Yu Hao
Xin Cao
Yixiang Fang
Won-Yong Shin
Wei Wang

Relation prediction is a fundamental task in network analysis which aims to predict the relationship between two nodes. It thus differs from the traditional link prediction problem of predicting whether a link exists between a pair of nodes, which can be viewed as a binary classification task. However, in the heterogeneous information network (HIN) which contains multiple types of nodes and multiple relations between nodes, the relation prediction task is more challenging. In addition, the HIN might have missing relation types on some edges and missing node types on some nodes, which makes the problem even harder.

In this work, we propose RPGNN, a novel relation prediction model based on the graph neural network (GNN) and multi-task learning to solve this problem. Existing GNN models for HIN representation learning usually focus on the node classification/clustering task. They require the type information of all edges and nodes and always learn a weight matrix for each type, thus requiring a large number of learning parameters on HINs with rich schema. In contrast, our model directly encodes and learns relations in HIN and avoids the requirement of type information during message passing in GNN. Hence, our model is more robust to the missing types for the relation prediction task on HINs. The experiments on real HINs show that our model can consistently achieve better performance than several state-of-the-art HIN representation learning methods.

DSDD: Domain-Specific Dataset Discovery on the Web

Haoxiang Zhang
Aécio Santos
Juliana Freire

With the push for transparency and open data, many datasets and data repositories are becoming available on the Web. This opens new opportunities for data-driven exploration, from empowering analysts to answer new questions and obtain insights to improving predictive models through data augmentation. But as datasets are spread over a plethora of Web sites, finding data that are relevant for a given task is difficult. In this paper, we take a first step towards the construction of domain-specific data lakes. We propose an end-to-end dataset discovery system, targeted at domain experts, which given a small set of keywords, automatically finds potentially relevant datasets on the Web. The system makes use of search engines to hop across Web sites, uses online learning to incrementally build a model to recognize sites that contain datasets, utilizes a set of discovery actions to broaden the search, and applies a multi-armed bandit based algorithm to balance the trade-offs of different discovery actions. We report the results of an extensive experimental evaluation over multiple domains, and demonstrate that our strategy is effective and outperforms state-of-the-art content discovery methods.

RxNet: Rx-refill Graph Neural Network for Overprescribing Detection

Jianfei Zhang
Ai-Te Kuo
Jianan Zhao
Qianlong Wen
Erin Winstanley
Chuxu Zhang
Yanfang Ye

Prescription (aka Rx) drugs can be easily overprescribed and lead to drug abuse or opioid overdose. Accordingly, a state-run prescription drug monitoring program (PDMP) in the United States has been developed to reduce overprescribing. However, PDMP has limited capability in detecting patients' potential overprescribing behaviors, impairing its effectiveness in preventing drug abuse and overdose in patients. Although a few machine-learning-based methods have been proposed for detecting overprescribing, they usually ignore the patient prescribing behavior and their performance is not satisfactory. In light of this, we propose a novel model RxNet for overprescribing detection in PDMP. RxNet builds a dynamic heterogeneous graph to model Rx refills that are essentially prescribing and dispensing (P&D) relationships among various Rx entries (e.g., patients) whose representations are encoded by a graph neural network. In addition, to explore the dynamic Rx-refill behavior and medical condition variation of patients, an RxLSTM network is designed to update representations of patients. Based on the output of RxLSTM, a dosing-adaptive network is leveraged to extract and recalibrate dosing patterns and obtain the refined patient representations which are finally utilized for overprescribing detection. The extensive experimental results on 1-year Ohio PDMP data demonstrate that RxNet consistently outperforms state-of-the-art methods in predicting patients at high risk of opioid overdose and drug abuse, with an average of 5.7% and 7.3% improvement on F1 score respectively.

Delve into the Performance Degradation of Differentiable Architecture Search

Jiuling Zhang
Zhiming Ding

Differentiable architecture search (DARTS) is widely considered prone to overfitting the validation set, which leads to performance degradation. We first employ a series of exploratory experiments to verify that neither high-strength regularization of the architecture parameters nor a warmup training scheme can effectively solve this problem. Based on the insights from the experiments, we conjecture that the performance of DARTS does not depend on the well-trained supernet weights and argue that the architecture parameters should be trained by the gradients which are obtained in the early stage rather than the final stage of training. This argument is then verified by exchanging the learning rate schemes of weights and parameters. Experimental results show that the simple swap of the learning rates can effectively solve the degradation and achieve competitive performance. Further empirical evidence suggests that the degradation is not a simple problem of validation set overfitting but is linked to the operation selection bias within bilevel optimization dynamics. We demonstrate the generalization of this bias and propose to utilize this bias to achieve an operation-magnitude-based selective stop.

Double-Scale Self-Supervised Hypergraph Learning for Group Recommendation

Junwei Zhang
Min Gao
Junliang Yu
Lei Guo
Jundong Li
Hongzhi Yin

With the prevalence of social media, there has recently been a proliferation of recommenders that shift their focus from individual modeling to group recommendation. Since the group preference is a mixture of various predilections from group members, the fundamental challenge of group recommendation is to model the correlations among members. Existing methods mostly adopt heuristic or attention-based preference aggregation strategies to synthesize group preferences. However, these models mainly focus on the pairwise connections of users and ignore the complex high-order interactions within and beyond groups. Besides, group recommendation suffers seriously from the problem of data sparsity due to severely sparse group-item interactions. In this paper, we propose a self-supervised hypergraph learning framework for group recommendation to achieve two goals: (1) capturing the intra- and inter-group interactions among users; (2) alleviating the data sparsity issue with the raw data itself. Technically, for (1), a hierarchical hypergraph convolutional network based on the user- and group-level hypergraphs is developed to model the complex tuplewise correlations among users within and beyond groups. For (2), we design a double-scale node dropout strategy to create self-supervision signals that can regularize user representations with different granularities against the sparsity issue. The experimental analysis on multiple benchmark datasets demonstrates the superiority of the proposed model and also elucidates the rationality of the hypergraph modeling and the double-scale self-supervision.

SNPR: A Serendipity-Oriented Next POI Recommendation Model

Mingwei Zhang
Yang Yang
Rizwan Abbas
Ke Deng
Jianxin Li
Bin Zhang

Next Point-of-Interest (POI) recommendation plays an important role in location-based services. The state-of-the-art methods utilize recurrent neural networks (RNNs) to model users' check-in sequences and have shown promising results. However, they tend to recommend POIs similar to those that the user has often visited. As a result, users become bored with obvious recommendations. To address this issue, we propose Serendipity-oriented Next POI Recommendation model (SNPR), a supervised multi-task learning problem, with objective to recommend unexpected and relevant POIs only. To this end, we define the quantitative serendipity as a trade-off of relevance and unexpectedness in the context of next POI recommendation, and design a dedicated neural network with Transformer to capture complex interdependencies between POIs in user's check-in sequence. Extensive experimental results show that our model can improve relevance significantly while the unexpectedness outperforms the state-of-the-art serendipity-oriented recommendation methods.

Comprehensively Computing Link-based Similarities by Building A Random Surfer Graph

Mingxi Zhang
Xifeng Yan
Wei Wang

Link-based similarity computation arises in many real applications, including web search, clustering and recommender systems. Many similarity measures have been proposed recently, but they share one undesirable drawback, called the "path missing" issue, i.e., the paths between objects are not fully considered for similarity computation. For example, SimRank considers only in-coming paths of equal length from a common "center" object, and a large portion of other paths are fully neglected. A comprehensive measure can be modeled by tallying all the possible paths between objects, but a large number of traversals would be required for these paths to fetch the similarities, which might increase the computational difficulty. In this paper, we propose a comprehensive similarity measure, namely RG-SimRank (Random surfer Graph-based SimRank), which resolves the "path missing" issue while inheriting the philosophy of SimRank. We build a random surfer graph by allowing the surfer to stay at the current object, or go to other objects against in-links or along out-links. RG-SimRank adopts SimRank to compute similarities in the random surfer graph instead of the original network, which has the same form as SimRank and hence inherits the optimization techniques on similarity computation. We prove that RG-SimRank considers all the possible paths of any direction and any length. It also provides a general solution to assess similarities, under which many existing similarity measures become its special cases. Other similarity measures besides SimRank can also be enhanced similarly using the random surfer graph. Extensive experiments on real datasets demonstrate the performance of the proposed approach.
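
The path-missing effect can be seen on a three-node chain a -> b -> c. The sketch below is illustrative only and is not the RG-SimRank formulation: `simrank` is the textbook iterative SimRank, while `surfer_neighbors` is a hypothetical simplification in which the surfer may stay, step against an in-link, or step along an out-link, all with uniform probability (the uniform weighting is an assumption).

```python
def simrank(nbrs, c=0.8, iters=10):
    """Iterative SimRank over a neighbor map {node: set of neighbors}."""
    nodes = list(nbrs)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iters):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not nbrs[a] or not nbrs[b]:
                    new[a][b] = 0.0
                else:
                    total = sum(sim[i][j] for i in nbrs[a] for j in nbrs[b])
                    new[a][b] = c * total / (len(nbrs[a]) * len(nbrs[b]))
        sim = new
    return sim


def in_neighbors(edges, nodes):
    """Neighbor map used by plain SimRank: in-links only."""
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[v].add(u)
    return nbrs


def surfer_neighbors(edges, nodes):
    """Assumed surfer graph: stay in place, go against in-links, or along out-links."""
    nbrs = {v: {v} for v in nodes}
    for u, v in edges:
        nbrs[v].add(u)  # step against the in-link u -> v
        nbrs[u].add(v)  # step along the out-link u -> v
    return nbrs


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
plain = simrank(in_neighbors(edges, nodes))
augmented = simrank(surfer_neighbors(edges, nodes))
```

On this chain, plain SimRank assigns a and c a similarity of zero even though a directed path connects them, whereas the augmented neighbor map yields a positive score.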

Multi-Factors Aware Dual-Attentional Knowledge Tracing

Moyu Zhang
Xinning Zhu
Chunhong Zhang
Yang Ji
Feng Pan
Changchuan Yin

With the increasing demands of personalized learning, knowledge tracing, which traces students' knowledge states based on their historical practices, has become important. Factor analysis methods mainly use two kinds of factors which are separately related to students and questions to model students' knowledge states. These methods use the total number of attempts of students to model students' learning progress and hardly highlight the impact of the most recent relevant practices. Besides, current factor analysis methods ignore rich information contained in questions. In this paper, we propose Multi-Factors Aware Dual-Attentional model (MF-DAKT) which enriches question representations and utilizes multiple factors to model students' learning progress based on a dual-attentional mechanism. More specifically, we propose a novel student-related factor which records the most recent attempts on relevant concepts of students to highlight the impact of recent exercises. To enrich question representations, we use a pre-training method to incorporate two kinds of question information including questions' relation and difficulty level. We also add a regularization term about questions' difficulty level to restrict pre-trained question representations to fine-tuning during the process of predicting students' performance. Moreover, we apply a dual-attentional mechanism to differentiate contributions of factors and factor interactions to final prediction in different practice records. Finally, we conduct experiments on several real-world datasets and results show that MF-DAKT can outperform existing knowledge tracing methods. We also conduct several studies to validate the effects of each component of MF-DAKT.

Desirable Companion for Vertical Federated Learning: New Zeroth-Order Gradient Based Algorithm

Qingsong Zhang
Bin Gu
Zhiyuan Dang
Cheng Deng
Heng Huang

Vertical federated learning (VFL) attracts increasing attention due to the emerging demands of multi-party collaborative modeling and concerns of privacy leakage. A complete list of metrics to evaluate VFL algorithms should include model applicability, privacy security, communication cost, and computation efficiency, where privacy security is especially important to VFL. However, to the best of our knowledge, there does not exist a VFL algorithm satisfying all these criteria very well. To address this challenging problem, in this paper, we reveal that zeroth-order optimization (ZOO) is a desirable companion for VFL. Specifically, ZOO can 1) improve the model applicability of VFL framework, 2) prevent VFL framework from privacy leakage under curious, colluding, and malicious threat models, 3) support inexpensive communication and efficient computation. Based on that, we propose a novel and practical VFL framework with black-box models, which is inseparably interconnected to the promising properties of ZOO. We believe that it takes one stride towards designing a practical VFL framework matching all the criteria. Under this framework, we raise two novel asynchronous zeroth-order algorithms for vertical federated learning (AsyREVEL) with different smoothing techniques. We theoretically derive the convergence rates of AsyREVEL algorithms under nonconvex condition. More importantly, we prove the privacy security of our proposed framework under existing VFL attacks on different levels. Extensive experiments on benchmark datasets demonstrate the favorable model applicability, satisfactory privacy security, inexpensive communication, efficient computation, scalability and losslessness of our framework.
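
For readers unfamiliar with zeroth-order optimization, the core idea is to estimate a gradient from function values alone, which is why it suits black-box models. The following is a minimal sketch of a standard two-point estimator with Gaussian smoothing. It is not the AsyREVEL algorithm; the function name `zo_gradient`, the smoothing radius `mu`, and the sample count are assumptions chosen for illustration.

```python
import random


def zo_gradient(f, x, mu=1e-3, samples=1000, seed=0):
    """Estimate the gradient of f at x using only evaluations of f.

    Each sample draws a Gaussian direction u and uses the forward
    difference (f(x + mu * u) - f(x)) / mu as a directional slope.
    """
    rng = random.Random(seed)
    fx = f(x)
    grad = [0.0] * len(x)
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - fx) / mu
        for i, ui in enumerate(u):
            grad[i] += slope * ui / samples
    return grad


def objective(x):
    """Hypothetical black-box objective; its true gradient at (1, 2) is (2, 3)."""
    return x[0] ** 2 + 3.0 * x[1]


estimate = zo_gradient(objective, [1.0, 2.0], samples=20000)
```

The estimate is noisy, and its variance shrinks as the number of sampled directions grows; no derivative of `objective` is ever computed.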

HORNET: Enriching Pre-trained Language Representations with Heterogeneous Knowledge Sources

Taolin Zhang
Zerui Cai
Chengyu Wang
Peng Li
Yang Li
Minghui Qiu
Chengguang Tang
Xiaofeng He
Jun Huang

Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the language understanding abilities of deep language models by leveraging the rich semantic knowledge from knowledge graphs, other than plain pre-training texts. However, previous efforts mostly use homogeneous knowledge (especially structured relation triples in knowledge graphs) to enhance the context-aware representations of entity mentions, whose performance may be limited by the coverage of knowledge graphs. Also, it is unclear whether these KEPLMs truly understand the injected semantic knowledge due to the "black-box" training mechanism. In this paper, we propose a novel KEPLM named HORNET, which integrates Heterogeneous knowledge from various structured and unstructured sources into the Roberta NETwork and hence takes full advantage of both linguistic and factual knowledge simultaneously. Specifically, we design a hybrid attention heterogeneous graph convolution network (HaHGCN) to learn heterogeneous knowledge representations based on the structured relation triplets from knowledge graphs and the unstructured entity description texts. Meanwhile, we propose the explicit dual knowledge understanding tasks to help induce a more effective infusion of the heterogeneous knowledge, promoting our model for learning the complicated mappings from the knowledge graph embedding space to the deep context-aware embedding space and vice versa. Experiments show that our HORNET model outperforms various KEPLM baselines on knowledge-aware tasks including knowledge probing, entity typing and relation extraction. Our model also achieves substantial improvement over several GLUE benchmark datasets, compared to other KEPLMs.

Adversarial Separation Network for Cross-Network Node Classification

Xiaowen Zhang
Yuntao Du
Rongbiao Xie
Chongjun Wang

Node classification is an important yet challenging task in various network applications, and many effective methods have been developed for a single network. For cross-network scenarios, however, neither single network embedding nor traditional domain adaptation can directly solve the task. Existing approaches have been proposed to combine network embedding and domain adaptation for cross-network node classification. However, they only focus on domain-invariant features, ignoring the individual features of each network, and they only utilize 1-hop neighborhood information (local consistency), ignoring the global consistency information. To tackle the above problems, in this paper, we propose a novel model, Adversarial Separation Network (ASN), to learn effective node representations between source and target networks. We explicitly separate domain-private and domain-shared information. Two domain-private encoders are employed to extract the domain-specific features in each network and a shared encoder is employed to extract the domain-invariant shared features across networks. Moreover, in each encoder, we combine local and global consistency to capture network topology information more comprehensively. ASN integrates deep network embedding with adversarial domain adaptation to reduce the distribution discrepancy across domains. Extensive experiments on real-world datasets show that our proposed model achieves state-of-the-art performance in cross-network node classification tasks compared with existing algorithms.

CoPE: Modeling Continuous Propagation and Evolution on Interaction Graph

Yao Zhang
Yun Xiong
Dongsheng Li
Caihua Shan
Kan Ren
Yangyong Zhu

Human interactions with items are being constantly logged, which enables advanced representation learning and facilitates various tasks. Instead of generating static embeddings at the end of training, several temporal embedding methods were recently proposed to learn user and item embeddings as functions of time, where each entity has a trajectory of embedding vectors aiming to encode the full dynamics. However, these methods may not be optimal to encode the dynamical behaviors on the interaction graphs in that they can not generate "fully"-temporal embeddings and do not consider information propagation. In this paper, we tackle the issues and propose CoPE (Continuous Propagation and Evolution). We use an ordinary differential equation based graph neural network to model information propagation and more sophisticated evolution patterns. We train CoPE on sequences of interactions with the help of meta-learning to ensure fast adaptation to the most recent interactions. We evaluate CoPE on three tasks and prove its effectiveness.

Meta-Learning Based Hyper-Relation Feature Modeling for Out-of-Knowledge-Base Embedding

Yufeng Zhang
Weiqing Wang
Wei Chen
Jiajie Xu
An Liu
Lei Zhao

Knowledge graph (KG) embedding aims to encode both entities and relations into a continuous vector space. Most existing methods require that all entities should be observed during training while ignoring the evolving nature of KG. Major recent efforts on this issue embed new entities by aggregating neighborhood information from existing entities and relations with Graph Neural Network (GNN). However, these methods rely on the neighbors seen during training and struggle to embed new entities with insufficient triplets or triplets with the unseen-to-unseen form. To relieve this problem, we propose a two-stage learning model referred to as Hyper-Relation Feature Learning Network (HRFN) for effective out-of-knowledge-base embedding. For the first stage, HRFN learns pre-representations for emerging entities using hyper-relation features meta-learned from the training set. A novel feature aggregating network that involves an entity-centered Graph Convolutional Network (GCN) and a relation-centered GCN is proposed to aggregate information from both new entities themselves and their neighbors. For stage two, a transductive learning network is employed to learn finer-grained embeddings based on the above-mentioned pre-representations of new entities. Experimental results on the link prediction task demonstrate the superiority of our model. Further analysis is also done to validate the effectiveness and efficiency of pre-representing emerging entities with the hyper-relation feature.

Pareto-optimal Community Search on Large Bipartite Graphs

Yuting Zhang
Kai Wang
Wenjie Zhang
Xuemin Lin
Ying Zhang

In many real-world applications, bipartite graphs are naturally used to model relationships between two types of entities. Community discovery over bipartite graphs is a fundamental problem and has attracted much attention recently. However, all existing studies overlook the weight (e.g., influence or importance) of vertices in forming the community, thus missing useful properties of the community. In this paper, we propose a novel cohesive subgraph model named Pareto-optimal (α, β)-community, which is the first to consider both structure cohesiveness and weight of vertices on bipartite graphs. The proposed Pareto-optimal (α, β)-community model follows the concept of (α, β)-core by imposing degree constraints for each type of vertices, and integrates the Pareto-optimality in modelling the weight information from two different types of vertices. An online query algorithm is developed to retrieve Pareto-optimal (α, β)-communities with the time complexity of O(p · m) where p is the number of resulting communities, and m is the number of edges in the bipartite graph G. To support efficient query processing over large graphs, we also develop index-based approaches. A complete index i is proposed, and the query algorithm based on i achieves linear query processing time regarding the result size (i.e., the algorithm is optimal). Nevertheless, the index i incurs prohibitively expensive space complexity. To strike a balance between query efficiency and space complexity, a space-efficient compact index 𝕀 is proposed. Computation-sharing strategies are devised to improve the efficiency of the index construction process for the index 𝕀. Extensive experiments on 9 real-world graphs validate both the effectiveness and the efficiency of our query processing algorithms and indexing techniques.
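
The degree constraints of the (α, β)-core mentioned above can be made concrete with the usual peeling procedure: repeatedly delete upper-side vertices with degree below α and lower-side vertices with degree below β. This is a minimal sketch of that standard core computation only; the Pareto-optimal weight handling and the indexes are not reproduced, and the function name `alpha_beta_core` and the edge-list input format are assumptions.

```python
def alpha_beta_core(edges, alpha, beta):
    """Return the (upper, lower) vertex sets of the (alpha, beta)-core.

    edges is a list of (u, v) pairs with u on the upper side and v on
    the lower side of the bipartite graph.
    """
    upper, lower = {}, {}
    for u, v in edges:
        upper.setdefault(u, set()).add(v)
        lower.setdefault(v, set()).add(u)
    adj = {"U": upper, "L": lower}
    need = {"U": alpha, "L": beta}
    other = {"U": "L", "L": "U"}

    # Start with every vertex that already violates its degree constraint.
    stack = [
        (side, x)
        for side in ("U", "L")
        for x, nbrs in adj[side].items()
        if len(nbrs) < need[side]
    ]
    queued = set(stack)
    while stack:
        side, x = stack.pop()
        opp = other[side]
        # Deleting x lowers the degree of each remaining neighbor.
        for y in adj[side].pop(x):
            adj[opp][y].discard(x)
            if len(adj[opp][y]) < need[opp] and (opp, y) not in queued:
                queued.add((opp, y))
                stack.append((opp, y))
    return set(upper), set(lower)


# Hypothetical toy graph: u3 has a single edge and is peeled for alpha = 2.
toy_edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v2")]
core_upper, core_lower = alpha_beta_core(toy_edges, 2, 2)
```

Each edge is touched a constant number of times, so the peeling runs in time linear in the number of edges.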

Minimizing Spectral Radius of Non-Backtracking Matrix by Edge Removal

Zuobai Zhang
Zhongzhi Zhang
Guanrong Chen

The spectral radius of the non-backtracking matrix for an undirected graph plays an important role in various dynamic processes running on the graph. For example, its reciprocal provides an excellent approximation of epidemic and edge percolation thresholds. In this paper, we study the problem of minimizing the spectral radius of the non-backtracking matrix of a graph with n nodes and m edges, by deleting k selected edges. We show that the objective function of this combinatorial optimization problem is not submodular, although it is monotone. Since any straightforward approach to solving the optimization problem is computationally infeasible, we present an effective, scalable approximation algorithm with complexity O(n + km). Extensive experimental results for a large set of real-world networks verify the effectiveness and efficiency of our algorithm, and demonstrate that our algorithm outperforms several baseline schemes.
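
To make the objective concrete: the non-backtracking matrix is indexed by directed edges, with an entry of 1 from (u -> v) to (v -> w) whenever w differs from u. The sketch below estimates its spectral radius by power iteration on the shifted matrix B + I, which avoids oscillation on periodic graphs. It only evaluates the objective before and after removing one edge; it is not the approximation algorithm of the paper. The function name `nb_spectral_radius` and the iteration count are assumptions, and convergence can be slow for graphs whose non-backtracking matrix is nilpotent (e.g. trees).

```python
def nb_spectral_radius(edges, iters=300):
    """Estimate the spectral radius of the non-backtracking matrix.

    edges is a list of undirected edges (u, v), each listed once.
    """
    arcs = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    if not arcs:
        return 0.0
    out = {}
    for u, v in arcs:
        out.setdefault(u, []).append((u, v))
    # Successors of arc u -> v: arcs v -> w that do not return to u.
    succ = {(u, v): [a for a in out[v] if a[1] != u] for u, v in arcs}

    x = {a: 1.0 for a in arcs}
    shifted = 1.0
    for _ in range(iters):
        # Apply (B + I): B is nonnegative, so rho(B + I) = rho(B) + 1.
        y = {a: x[a] + sum(x[b] for b in succ[a]) for a in arcs}
        norm = sum(y.values())
        shifted = norm / sum(x.values())
        x = {a: val / norm for a, val in y.items()}
    return shifted - 1.0


# Complete graph on four nodes, then the same graph with edge (2, 3) removed.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
rho_before = nb_spectral_radius(k4)
rho_after = nb_spectral_radius(k4[:-1])
```

For a d-regular graph the spectral radius is d - 1, so the complete graph on four nodes gives a value of 2, and deleting an edge lowers it.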

Action Sequence Augmentation for Early Graph-based Anomaly Detection

Tong Zhao
Bo Ni
Wenhao Yu
Zhichun Guo
Neil Shah
Meng Jiang

The proliferation of web platforms has created incentives for online abuse. Many graph-based anomaly detection techniques are proposed to identify the suspicious accounts and behaviors. However, most of them detect the anomalies once the users have performed many such behaviors. Their performance is substantially hindered when the users' observed data is limited at an early stage, which needs to be improved to minimize financial loss. In this work, we propose Eland, a novel framework that uses action sequence augmentation for early anomaly detection. Eland utilizes a sequence predictor to predict next actions of every user and exploits the mutual enhancement between action sequence augmentation and user-action graph anomaly detection. Experiments on three real-world datasets show that Eland improves the performance of a variety of graph-based anomaly detection methods. With Eland, anomaly detection performance at an earlier stage is better than non-augmented methods that need significantly more observed data by up to 15% on the Area under the ROC curve.

k-sums Clustering: A Stochastic Optimization Approach

Wan-Lei Zhao
Shi-Ying Lan
Run-Qing Chen
Chong-Wah Ngo

In this paper, we revisit the decades-old clustering method k-means. The chicken-and-egg loop in traditional k-means has been replaced by a pure stochastic optimization procedure. The optimization is undertaken from the perspective of each individual sample. Different from existing incremental k-means, an individual sample is tentatively joined into a new cluster to evaluate its distance to the corresponding new centroid, in which the contribution from this sample is accounted for. The sample is moved to this new cluster only after we find that the reallocation makes the sample closer to the new centroid than it is to the current one. Compared with traditional k-means and other variants, this new procedure allows the clustering to converge faster to a better local minimum. This fundamental modification over the k-means loop leads to the redefinition of a family of k-means variants, such as hierarchical k-means and Sequential k-means. As an extension, a new target function that minimizes the summation of pairwise distances within clusters is presented. Under the ℓ2-norm, it could be solved under the same stochastic optimization procedure. The re-defined traditional k-means, hierarchical k-means, as well as Sequential k-means all show considerable performance improvement over their traditional counterparts under different settings and on various types of datasets.
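
The sample-wise move rule described above can be sketched as follows. This is one reading of the abstract rather than the authors' implementation: the function name `stochastic_reassign`, the running-sum bookkeeping, the rule that a cluster's last member is never moved, and the choice of the closest tentative centroid among several candidates are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stochastic_reassign(points, labels, k, epochs=10, shuffle=True, seed=0):
    """Refine cluster labels by visiting one sample at a time."""
    rng = random.Random(seed)
    dim = len(points[0])
    labels = list(labels)
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += p[j]

    order = list(range(len(points)))
    for _ in range(epochs):
        if shuffle:
            rng.shuffle(order)
        for i in order:
            p, cur = points[i], labels[i]
            if counts[cur] == 1:
                continue  # assumption: never empty a cluster
            cur_centroid = [s / counts[cur] for s in sums[cur]]
            best, best_dist = cur, sq_dist(p, cur_centroid)
            for c in range(k):
                if c == cur:
                    continue
                # Tentative centroid of cluster c with this sample included.
                trial = [(s + x) / (counts[c] + 1) for s, x in zip(sums[c], p)]
                dist = sq_dist(p, trial)
                if dist < best_dist:
                    best, best_dist = c, dist
            if best != cur:
                for j in range(dim):
                    sums[cur][j] -= p[j]
                    sums[best][j] += p[j]
                counts[cur] -= 1
                counts[best] += 1
                labels[i] = best
    return labels


# Hypothetical 1-D data with two obvious groups and a poor initial labeling.
data = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
refined = stochastic_reassign(data, [0, 1, 0, 1, 0, 1], k=2)
```

Because the running sums are updated immediately after each move, later samples in the same pass already see the improved centroids, which is the contrast with the batch assign-then-update loop of traditional k-means.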

When Hardness Makes a Difference: Multi-Hop Knowledge Graph Reasoning over Few-Shot Relations

Shangfei Zheng
Wei Chen
Pengpeng Zhao
An Liu
Junhua Fang
Lei Zhao

Knowledge graph (KG) reasoning is a significant method for KG completion. To enhance the explainability of KG reasoning, some studies adopt reinforcement learning (RL) to complete the multi-hop reasoning. However, RL-based reasoning methods are severely limited by few-shot relations (relations that contain only a few triplets). To tackle the problem, recent studies introduce meta-learning into RL-based methods to improve reasoning performance. However, the generalization abilities of their models are limited due to the problem of low reasoning accuracies over hard relations (e.g., language and title). To overcome this problem, we propose a novel model called THML (Two-level Hardness-aware Meta-reinforcement Learning). Specifically, the model contains the following two components: (1) A hardness-aware meta-reinforcement learning method is proposed to predict the missing element by training hardness-aware batches. (2) A two-level hardness-aware sampling is proposed to effectively generate new hardness-aware batches from relation level and relation-cluster level. The generalization ability of our model is significantly improved by alternating between these two components. The experimental results demonstrate that THML notably outperforms the state-of-the-art approaches in few-shot scenarios.

Automated Query Graph Generation for Querying Knowledge Graphs

Weiguo Zheng
Mei Zhang

Natural language question answering over knowledge graphs is an important and interesting task as it enables common users to gain accurate answers in an easy and intuitive manner. However, it remains a challenge to bridge the gap between unstructured questions and structured knowledge graphs. To address the problem, a natural discipline is building a structured query to represent the input question. Searching the structured query over the knowledge graph can produce answers to the question. Distinct from the existing methods that are based on semantic parsing or templates, we propose an effective approach qaSQP powered by a novel notion, structural query pattern, in this paper. Given an input question, we first generate its query sketch that is compatible with the underlying structure of the knowledge graph. Then, we complete the query graph by labeling the nodes and edges under the guidance of the structural query pattern. Finally, answers can be retrieved by executing the constructed query graph over the knowledge graph. In order to improve the overall performance of answering questions, we propose the mutual optimization technique. Evaluations on three question-answering benchmarks show that our proposed approach outperforms state-of-the-art methods significantly.

Understanding the Property of Long Term Memory for the LSTM with Attention Mechanism

Wendong Zheng
Putian Zhao
Kai Huang
Gang Chen

Recent trends of incorporating LSTM network with different attention mechanisms in time series forecasting have led researchers to consider the attention module as an essential component. While existing studies revealed the effectiveness of attention mechanism with some visualization experiments, the underlying rationale behind their outstanding performance on learning long-term dependencies remains hitherto obscure. In this paper, we aim to elaborate on this fundamental question by conducting a thorough investigation of the memory property for LSTM network with attention mechanism. We present a theoretical analysis of LSTM integrated with attention mechanism, and demonstrate that it is capable of generating an adaptive decay rate which dynamically controls the memory decay according to the obtained attention score. In particular, our theory shows that attention mechanism brings significantly slower decays than the exponential decay rate of a standard LSTM. Experimental results on four real-world time series datasets demonstrate the superiority of the attention mechanism for maintaining long-term memory when compared to the state-of-the-art methods, and further corroborate our theoretical analysis.

Relation Network and Causal Reasoning for Image Captioning

Dongming Zhou
Jing Yang

Image captioning is a cross-modal problem combining computer vision and natural language processing. A typical image captioning model uses a convolutional neural network to extract the features of an image and then uses a Long Short-Term Memory network to transform the representations of the features. However, this method has problems such as not including high-level semantics in the visual network and exposure bias in the language network. To overcome these problems, this paper proposes a novel image captioning model that combines relationship awareness and reinforcement learning. First, we design a relational awareness network as the visual network to mine the latent relationships between objects in an image. Then, a context semantic relational network is proposed to improve the accuracy of image captioning. The context semantic network can generate feature representations for arbitrary pixel positions in an image without association with any specific visual concepts. Subsequently, the high-level context semantics are used as external knowledge to guide the language network in generating sentences. Finally, a policy gradient training algorithm is designed to simplify the state value function in reinforcement learning. We have verified the effectiveness of the model on the MS-COCO and Flickr 30K datasets. The experimental results show that the model proposed in this paper achieves state-of-the-art results.

Understanding and Resolving Performance Degradation in Deep Graph Convolutional Networks

Kuangqi Zhou
Yanfei Dong
Kaixin Wang
Wee Sun Lee
Bryan Hooi
Huan Xu
Jiashi Feng

A Graph Convolutional Network (GCN) stacks several layers and in each layer performs a PROPagation operation (PROP) and a TRANsformation operation (TRAN) for learning node representations over graph-structured data. Though powerful, GCNs tend to suffer a performance drop when the model gets deep. Previous works focus on PROPs to study and mitigate this issue, but the role of TRANs is barely investigated. In this work, we study performance degradation of GCNs by experimentally examining how stacking only TRANs or PROPs works. We find that TRANs contribute significantly, or even more than PROPs, to declining performance, and moreover that they tend to amplify node-wise feature variance in GCNs, causing variance inflammation that we identify as a key factor for causing performance drop. Motivated by such observations, we propose a variance-controlling technique termed Node Normalization (NodeNorm), which scales each node's features using its own standard deviation. Experimental results validate the effectiveness of NodeNorm in addressing performance degradation of GCNs. Specifically, it enables deep GCNs to outperform shallow ones in cases where deep models are needed, and to achieve comparable results with shallow ones on 6 benchmark datasets. NodeNorm is a generic plug-in and generalizes well to other GNN architectures. Code is publicly available at https://github.com/miafei/NodeNorm.
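
The scaling step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `node_norm`, the `eps` stabilizer, and the use of the population standard deviation over feature dimensions are all assumptions, and the published method may differ in details such as centering or exponents.

```python
import statistics

def node_norm(features, eps=1e-6):
    """Divide each node's feature vector by that node's own standard deviation.

    `features` is a list of per-node feature lists. `eps` (an assumed
    stabilizer) avoids division by zero for constant feature vectors.
    """
    normalized = []
    for row in features:
        std = statistics.pstdev(row)  # spread across this node's feature dimensions
        normalized.append([x / (std + eps) for x in row])
    return normalized

# Two nodes whose features differ only in scale end up nearly identical.
feats = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
out = node_norm(feats)
```

After the call, every row of `out` has a standard deviation close to 1, so a node with high-variance features no longer dominates a node with low-variance features.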

#StayHome or #Marathon?: Social Media Enhanced Pandemic Surveillance on Spatial-temporal Dynamic Graphs

Yichao Zhou
Jyun-Yu Jiang
Xiusi Chen
Wei Wang

COVID-19 has caused lasting damage to almost every domain in public health, society, and economy. To monitor the pandemic trend, existing studies rely on the aggregation of traditional statistical models and epidemic spread theory. In other words, historical statistics of COVID-19, as well as the population mobility data, become the essential knowledge for monitoring the pandemic trend. However, these solutions can barely provide precise prediction and satisfactory explanations on the long-term disease surveillance while the ubiquitous social media resources can be the key enabler for solving this problem. For example, serious discussions may occur on social media before and after some breaking events take place. To take advantage of the social media data, we propose a novel framework, Social Media enhAnced pandemic suRveillance Technique (SMART), which is composed of two modules: (i) information extraction module to construct heterogeneous knowledge graphs based on the extracted events and relationships among them; (ii) time series prediction module to provide both short-term and long-term forecasts of the confirmed cases and fatality at the state-level in the United States and to discover risk factors for COVID-19 interventions. Extensive experiments show that our method largely outperforms the state-of-the-art baselines by 7.3% and 7.4% in confirmed case/fatality prediction, respectively.

PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling

Yujia Zhou
Zhicheng Dou
Yutao Zhu
Ji-Rong Wen

Personalized search plays a crucial role in improving user search experience owing to its ability to build user profiles based on historical behaviors. Previous studies have made great progress in extracting personal signals from the query log and learning user representations. However, neural personalized search is extremely dependent on sufficient data to train the user model. Data sparsity is an inevitable challenge for existing methods to learn high-quality user representations. Moreover, the overemphasis on final ranking quality leads to rough data representations and impairs the generalizability of the model. To tackle these issues, we propose a Personalized Search framework with Self-supervised Learning (PSSL) to enhance data representations. Specifically, we adopt a contrastive sampling method to extract paired self-supervised information from sequences of user behaviors in query logs. Four auxiliary tasks are designed to pre-train the sentence encoder and the sequence encoder used in the ranking model. They are optimized by contrastive loss which aims to close the distance between similar user sequences, queries, and documents. Experimental results on two datasets demonstrate that our proposed model PSSL achieves state-of-the-art performance compared with existing baselines.
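
The abstract does not state which contrastive loss is used. As a hedged illustration, the sketch below implements the widely used InfoNCE form, which pulls an anchor representation toward a similar (positive) sample and away from dissimilar (negative) ones. The function names, the cosine similarity, and the `temperature` value are assumptions made for this example only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive pair's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]

# The loss is small when the positive is aligned with the anchor, large otherwise.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```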

Open Benchmarking for Click-Through Rate Prediction

Jieming Zhu
Jinyang Liu
Shuai Yang
Qi Zhang
Xiuqiang He

Click-through rate (CTR) prediction is a critical task for many applications, as its accuracy has a direct impact on user experience and platform revenue. In recent years, CTR prediction has been widely studied in both academia and industry, resulting in a wide variety of CTR prediction models. Unfortunately, there is still a lack of standardized benchmarks and uniform evaluation protocols for CTR prediction research. This leads to non-reproducible or even inconsistent experimental results among existing studies, which largely limits the practical value and potential impact of their research. In this work, we aim to perform open benchmarking for CTR prediction and present a rigorous comparison of different models in a reproducible manner. To this end, we ran over 7,000 experiments for more than 12,000 GPU hours in total to re-evaluate 24 existing models on multiple dataset settings. Surprisingly, our experiments show that with sufficient hyper-parameter search and model tuning, many deep models have smaller differences than expected. The results also reveal that making real progress on the modeling of CTR prediction is indeed a very challenging research task. We believe that our benchmarking work could not only allow researchers to gauge the effectiveness of new models conveniently but also let them compare fairly with the state of the art. We have publicly released the benchmarking tools, evaluation protocols, and experimental settings of our work to promote reproducible research in this field.

Summarizing Long-Form Document with Rich Discourse Information

Tianyu Zhu
Wen Hua
Jianfeng Qu
Xiaofang Zhou

The development of existing extractive summarization models for long-form document summarization is hindered by two factors: 1) the computation of the summarization model will dramatically increase due to the sheer size of the input long document; 2) the discourse structural information in the long-form document has not been fully exploited. To address the two deficiencies, we propose HEROES, a novel extractive summarization model for summarizing long-form documents with rich discourse structural information. In particular, the HEROES model consists of two modules: 1) a content ranking module that ranks and selects salient sections and sentences to compose a short digest that empowers complex summarization models and serves as its input; 2) an extractive summarization module based on a heterogeneous graph with nodes from different discourse levels and elaborately designed edge connections to reflect the discourse hierarchy of the document and restrain the semantic drifts across section boundaries. Experimental results on benchmark datasets show that HEROES can achieve significantly better performance compared with various strong baselines.

Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking

Yutao Zhu
Jian-Yun Nie
Zhicheng Dou
Zhengyi Ma
Xinyu Zhang
Pan Du
Xiaochen Zuo
Hao Jiang

Context information in search sessions has proven to be useful for capturing user search intent. Existing studies explored user behavior sequences in sessions in different ways to enhance query suggestion or document ranking. However, a user behavior sequence has often been viewed as a definite and exact signal reflecting a user's behavior. In reality, it is highly variable: a user's queries for the same intent can vary, and different documents can be clicked. To learn a more robust representation of the user behavior sequence, we propose a method based on contrastive learning, which takes into account the possible variations in users' behavior sequences. Specifically, we propose three data augmentation strategies to generate similar variants of user behavior sequences and contrast them with other sequences. In so doing, the model is forced to be more robust regarding the possible variations. The optimized sequence representation is incorporated into document ranking. Experiments on two real query log datasets show that our proposed model outperforms the state-of-the-art methods significantly, which demonstrates the effectiveness of our method for context-aware document ranking.

SESSION: Short Paper Track

Discovering Time-invariant Causal Structure from Temporal Data

Saima Absar
Lu Zhang

Discovering causal structure from temporal data is an important problem in many fields in science. Existing methods usually suffer from several limitations such as assuming linear dependencies among features, limiting to discrete time series, and/or assuming stationarity, i.e., causal dependencies are repeated with the same time lag and strength at all time points. In this paper, we propose an algorithm called the μ-PC that addresses these limitations. It is based on the theory of μ-separation and extends the well-known PC algorithm to the time domain. To be applicable to both discrete and continuous time series, we develop a conditional independence testing technique for time series by leveraging the Recurrent Marked Temporal Point Process (RMTPP) model. Experiments using both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm.

VerSaChI: Finding Statistically Significant Subgraph Matches using Chebyshev's Inequality

Shubhangi Agarwal
Sourav Dutta
Arnab Bhattacharya

Approximate subgraph matching, an important primitive for many applications like question answering, community detection, and motif discovery, often involves large labeled graphs such as knowledge graphs, social networks, and protein sequences. Effective methods for extracting matching subgraphs, in terms of label and structural similarities to a query, should exhibit accuracy, computational efficiency, and robustness to noise. In this paper, we propose VerSaChI for finding the top-k most similar subgraphs based on 2-hop label and structural overlap similarity with the query. The similarity is characterized using Chebyshev's inequality to compute the chi-square statistical significance for measuring the degree of matching of the subgraphs. Experiments on real-life graph datasets showcase significant improvements in terms of accuracy compared to state-of-the-art methods, as well as robustness to noise.

Goal-Directed Extractive Summarization of Financial Reports

Yash Agrawal
Vivek Anand
Manish Gupta
S Arunachalam
Vasudeva Varma

Financial reports filed by various companies discuss compliance, risks, and future plans, such as goals and new projects, which directly impact their stock price. Quick consumption of such information is critical for financial analysts and investors to make stock buy/sell decisions and for equity evaluations. Hence, we study the problem of extractive summarization of 10-K reports. Recently, Transformer-based summarization models have become very popular. However, lack of in-domain labeled summarization data is a major roadblock to train such finance-specific summarization models. We also show that zero-shot inference on such pretrained models is not as effective either. In this paper, we address this challenge by modeling 10-K report summarization using a goal-directed setting where we leverage summaries with labeled goal-related data for the stock buy/sell classification goal. Further, we provide improvements by considering a multi-task learning method with an industry classification auxiliary task. Intrinsic evaluation as well as extrinsic evaluation for the stock buy/sell classification and portfolio construction tasks shows that our proposed method significantly outperforms strong baselines.

Accurate Online Tensor Factorization for Temporal Tensor Streams with Missing Values

Dawon Ahn
Seyun Kim
U Kang

Given a time-evolving tensor stream with missing values, how can we accurately discover latent factors in an online manner to predict missing values? Online tensor factorization is a crucial task with many important applications including the analysis of climate, network traffic, and epidemic disease. However, existing online methods have disregarded temporal locality and thus have limited accuracy.

In this paper, we propose STF (Streaming Tensor Factorization), an accurate online tensor factorization method for real-world temporal tensor streams with missing values. We exploit an attention-based temporal regularization to learn inherent temporal patterns of the streams. We also propose an efficient online learning algorithm which allows each row of the temporal factor matrix to be updated from past and future information. Extensive experiments show that the proposed method gives the state-of-the-art accuracy, and quickly processes each tensor slice.

Variational Graph Normalized AutoEncoders

Seong Jin Ahn
MyoungHo Kim

Link prediction is one of the key problems for graph-structured data. With the advancement of graph neural networks, graph autoencoders (GAEs) and variational graph autoencoders (VGAEs) have been proposed to learn graph embeddings in an unsupervised way. It has been shown that these methods are effective for link prediction tasks. However, they do not work well in link prediction when a node whose degree is zero (i.e., an isolated node) is involved. We have found that GAEs/VGAEs make embeddings of isolated nodes close to zero regardless of their content features. In this paper, we propose a novel Variational Graph Normalized AutoEncoder (VGNAE) that utilizes L2-normalization to derive better embeddings for isolated nodes. We show that our VGNAEs outperform the existing state-of-the-art models for link prediction tasks. The code is available at https://github.com/SeongJinAhn/VGNAE.
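
To illustrate why L2-normalization helps with near-zero embeddings, here is a minimal sketch under stated assumptions. The `scale` and `eps` parameters and the function name are hypothetical; where exactly the normalization is applied inside VGNAE is not specified in this abstract.

```python
import math

def l2_normalize(vec, scale=1.0, eps=1e-12):
    """Rescale `vec` to have L2 norm `scale` (assumed hyperparameter).

    `eps` guards against division by zero for an all-zero vector,
    which is returned unchanged (all zeros).
    """
    norm = math.sqrt(sum(x * x for x in vec))
    return [scale * x / max(norm, eps) for x in vec]

# A tiny embedding, such as one an isolated node might receive,
# is mapped onto the same sphere as a large one.
tiny = l2_normalize([0.003, 0.004])
large = l2_normalize([300.0, 400.0])
```

Both `tiny` and `large` point in the same direction and have unit norm after the call, so the magnitude lost by the isolated node no longer affects downstream similarity scores.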

Structure Aware Experience Replay for Incremental Learning in Graph-based Recommender Systems

Kian Ahrabian
Yishi Xu
Yingxue Zhang
Jiapeng Wu
Yuening Wang
Mark Coates

Large-scale recommender systems are integral parts of many services. With the recent rapid growth of accessible data, the need for efficient training methods has arisen. Given the high computational cost of training state-of-the-art graph neural network (GNN) based models, it is infeasible to train them from scratch with every new set of interactions. In this work, we present a novel framework for incrementally training GNN-based models. Our framework takes advantage of an experience replay technique built on top of a structurally aware reservoir sampling method tailored for this setting. This framework addresses catastrophic forgetting, allowing the model to preserve its understanding of users' long-term behavioral patterns while adapting to new trends. Our experiments demonstrate the superior performance of our framework on numerous datasets when combined with state-of-the-art GNN-based models.
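
For readers unfamiliar with reservoir sampling, the sketch below shows the classic uniform variant (often called Algorithm R), which keeps a fixed-size sample of a stream in one pass. It is not the structurally aware method proposed in the paper, whose sampling weights are not described in this abstract; the function name and the `rng` parameter are assumptions for illustration.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from an iterable."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

# Retain 5 past interactions out of a stream of 100 for later replay.
replay_buffer = reservoir_sample(range(100), 5, random.Random(0))
```

In an incremental training loop, the retained items would be mixed with each new batch of interactions so that older patterns are still seen by the model.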

How to Leverage a Multi-layered Transformer Language Model for Text Clustering: an Ensemble Approach

Mira Ait-Saada
François Role
Mohamed Nadif

Pre-trained Transformer-based word embeddings are now widely used in text mining where they are known to significantly improve supervised tasks such as text classification, named entity recognition and question answering. Since the Transformer models create several different embeddings for the same input, one at each layer of their architecture, various studies have already tried to identify which of these embeddings contribute most to the success of the above-mentioned tasks. In contrast, the same performance analysis has not yet been carried out in the unsupervised setting. In this paper we evaluate the effectiveness of Transformer models on the important task of text clustering. In particular, we present a clustering ensemble approach that harnesses all the network's layers. Numerical experiments carried out on real datasets with different Transformer models show the effectiveness of the proposed method compared to several baselines.

Scalable Contrast Pattern Mining over Data Streams

Elaheh Alipourchavary
Sarah M. Erfani
Christopher Leckie

Incremental contrast pattern mining (CPM) is an important task in various fields such as network traffic analysis, medical diagnosis, and customer behavior analysis. Due to increases in the speed and dimension of data streams, a major challenge for CPM is to deal with the huge number of generated candidate patterns. While there are some works on incremental CPM, their approaches are not scalable in dense and high dimensional data streams, and the problem of CPM over an evolving dataset is an open challenge. In this work we focus on extracting the most specific set of contrast patterns (CPs) to discover significant changes between two data streams. We devise a novel algorithm to extract CPs using previously mined patterns instead of generating all patterns in each window from scratch. Our experimental results on a wide variety of datasets demonstrate the advantages of our approach over the state of the art in terms of efficiency.

XPL-CF: Explainable Embeddings for Feature-based Collaborative Filtering

Faisal M. Almutairi
Nicholas D. Sidiropoulos
Bo Yang

Collaborative filtering (CF) methods are making an impact on our daily lives in a wide range of applications, including recommender systems and personalization. Latent factor methods, e.g., matrix factorization (MF), have been the state-of-the-art in CF; however, they lack interpretability and do not provide a straightforward explanation for their predictions. Explainability is gaining momentum in recommender systems for accountability, and because a good explanation can swing an undecided user. Most recent explainable recommendation methods require auxiliary data such as review text or item content on top of item ratings. In this paper, we address the case where no additional data are available and propose augmenting the classical MF framework for CF with a prior that encodes each user's embedding as a sparse linear combination of item embeddings, and vice versa for each item embedding. Our XPL-CF approach automatically reveals these user-item relationships, which underpin the latent factors and explain how the resulting recommendations are formed. We showcase the effectiveness of XPL-CF on real data from various application domains. We also evaluate the explainability of the user-item relationship obtained from XPL-CF through numeric evaluation and case study examples.

A Formal Analysis of Recommendation Quality of Adversarially-trained Recommenders

Vito Walter Anelli
Yashar Deldjoo
Tommaso Di Noia
Felice Antonio Merra

Recommender systems (RSs) employ user-item feedback, e.g., ratings, to match customers to personalized lists of products. Approaches to top-k recommendation mainly rely on Learning-To-Rank algorithms and, among them, the most widely adopted is Bayesian Personalized Ranking (BPR), which is based on a pair-wise optimization approach. Recently, BPR has been found vulnerable against adversarial perturbations of its model parameters. Adversarial Personalized Ranking (APR) mitigates this issue by robustifying BPR via an adversarial training procedure. The empirical improvements of APR's accuracy performance on BPR have led to its wide use in several recommender models. However, a key overlooked aspect has been the beyond-accuracy performance of APR, i.e., novelty, coverage, and amplification of popularity bias, considering that recent results suggest that BPR, the building block of APR, is sensitive to the intensification of biases and reduction of recommendation novelty. In this work, we model the learning characteristics of the BPR and APR optimization frameworks to give mathematical evidence that, when the feedback data have a tailed distribution, APR amplifies the popularity bias more than BPR due to an unbalanced number of received positive updates from short-head items. Using matrix factorization (MF), we empirically validate the theoretical results by performing preliminary experiments on two public datasets to compare BPR-MF and APR-MF performance on accuracy and beyond-accuracy metrics. The experimental results consistently show the degradation of novelty and coverage measures and a worrying amplification of bias.
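
The pair-wise objective behind BPR can be sketched for a single (user, positive item, negative item) triple with matrix-factorization scores. This is a simplified illustration: regularization and the adversarial perturbation used by APR are omitted, and the function and variable names are assumptions.

```python
import math

def dot(a, b):
    """Inner product, used here as the MF preference score."""
    return sum(x * y for x, y in zip(a, b))

def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """Pair-wise BPR loss for one triple: -ln(sigmoid(score_pos - score_neg))."""
    diff = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    # -ln(sigmoid(diff)) equals ln(1 + exp(-diff)); adequate for moderate |diff|.
    return math.log1p(math.exp(-diff))

user = [1.0, 0.0]
loss_correct_order = bpr_loss(user, [1.0, 0.0], [0.0, 1.0])  # positive ranked higher
loss_wrong_order = bpr_loss(user, [0.0, 1.0], [1.0, 0.0])    # negative ranked higher
```

Each gradient step on this loss pushes the positive item's score up, which is why frequently interacted (popular) items receive many more positive updates than rare ones.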

BERT-QPP: Contextualized Pre-trained transformers for Query Performance Prediction

Negar Arabzadeh
Maryam Khodabakhsh
Ebrahim Bagheri

Query Performance Prediction (QPP) is focused on estimating the difficulty of satisfying a user query for a certain retrieval method. While most state-of-the-art QPP methods are based on term frequency and corpus statistics, more recent work in this area has started to explore the utility of pretrained neural embeddings, neural architectures and contextual embeddings. Such approaches extract features from pretrained or contextual embeddings for the sake of training a supervised performance predictor. In this paper, we adopt contextual embeddings to perform performance prediction, but distinguish ourselves from the state of the art by proposing to directly fine-tune a contextual embedding, i.e., BERT, specifically for the task of query performance prediction. As such, our work allows the fine-tuned contextual representations to estimate the performance of a query based on the association between the representation of the query and the retrieved documents. We compare the performance of our approach with the state of the art based on the MS MARCO passage retrieval corpus and its three associated query sets: (1) MS MARCO development set, (2) TREC DL 2019, and (3) TREC DL 2020. We show that our approach not only shows significantly improved prediction performance compared to all the state-of-the-art methods, but also, unlike past neural predictors, it shows significantly lower latency, making it possible to use in practice.

Predicting Efficiency/Effectiveness Trade-offs for Dense vs. Sparse Retrieval Strategy Selection

Negar Arabzadeh
Xinyi Yan
Charles L. A. Clarke

Over the last few years, contextualized pre-trained transformer models such as BERT have provided substantial improvements on information retrieval tasks. Traditional sparse retrieval methods such as BM25 rely on high-dimensional, sparse, bag-of-words query representations to retrieve documents. On the other hand, recent approaches based on pre-trained transformer models such as BERT fine-tune dense low-dimensional contextualized representations of queries and documents in embedding space. While these dense retrievers enjoy substantial retrieval effectiveness improvements compared to sparse retrievers, they are computationally intensive, requiring substantial GPU resources, and dense retrievers are known to be more expensive from both time and resource perspectives. In addition, sparse retrievers have been shown to retrieve complementary information with respect to dense retrievers, leading to proposals for hybrid retrievers. These hybrid retrievers leverage low-cost, exact-matching based sparse retrievers along with dense retrievers to bridge the semantic gaps between query and documents. In this work, we address this trade-off between the cost and utility of sparse vs. dense retrievers by proposing a classifier to select a suitable retrieval strategy (i.e., sparse vs. dense vs. hybrid) for individual queries. Leveraging sparse retrievers for queries which can be answered with sparse retrievers decreases the number of calls to GPUs. Consequently, while utility is maintained, query latency decreases. Although we use fewer computational resources and spend less time, we still achieve improved performance. Our classifier can select between sparse and dense retrieval strategies based on the query alone.
We conduct experiments on the MS MARCO passage dataset demonstrating an improved range of efficiency/effectiveness trade-offs between purely sparse, purely dense or hybrid retrieval strategies, allowing an appropriate strategy to be selected based on a target latency and resource budget.

Understanding Multi-channel Customer Behavior in Retail

Mozhdeh Ariannezhad
Sami Jullien
Pim Nauts
Min Fang
Sebastian Schelter
Maarten de Rijke

Online shopping is gaining popularity. Traditional retailers with physical stores adjust to this trend by allowing their customers to shop online as well as offline, in-store. Increasingly, customers can browse and purchase products across multiple shopping channels. Understanding how customer behavior relates to the availability of multiple shopping channels is an important prerequisite for many downstream machine learning tasks, such as recommendation and purchase prediction. However, previous work in this domain is limited to analyzing single-channel behavior only.

In this paper, we provide the first insights into multi-channel customer behavior in retail based on a large sample of 2.8 million transactions originating from 300,000 customers of a food retailer in Europe. Our analysis reveals significant differences in customer behavior across online and offline channels, for example with respect to the repeat ratio of item purchases and basket size. Based on these findings, we investigate the performance of a next basket recommendation model under multi-channel settings. We find that the recommendation performance differs significantly for customers based on their choice of shopping channel, which strongly indicates that future research on recommenders in this area should take into account the particular characteristics of multi-channel retail shopping.

Time-Aware Recommender System via Continuous-Time Modeling

Jianghan Bao
Yu Zhang

Information overload on the Internet has become ubiquitous, which makes the role of recommender systems more important. In recommender systems, the interest of users and popularity of items are not static, but can change drastically. Thus modeling the temporal dynamics of user-item interactions is crucial in recommender systems. The newly proposed Neural Ordinary Differential Equation (NODE) method is able to model the temporal mechanism of a system with neural networks. By using the ODE-LSTM method, which unites the ability of NODE to handle continuous time and that of LSTM to address sequential data, in this paper we achieve significant improvements for the recommendation task on several real-world datasets with time irregularity. To handle sessions with different timestamps in ODE-LSTM, we propose a collective timeline technique that contributes substantially to the performance improvement. Moreover, we find that reducing the scale of time intervals in sessions significantly improves the recommendation performance.

Anchor-based Collaborative Filtering

Oren Barkan
Roy Hirsch
Ori Katz
Avi Caciularu
Noam Koenigstein

Modern-day recommender systems are often based on learning representations in a latent vector space that encode user and item preferences. In these models, each user/item is represented by a single vector and user-item interactions are modeled by some function over the corresponding vectors. This paradigm is common to a large body of collaborative filtering models that repeatedly demonstrated superior results. In this work, we break away from this paradigm and present ACF: Anchor-based Collaborative Filtering. Instead of learning unique vectors for each user and each item, ACF learns a spanning set of anchor-vectors that commonly serve both users and items. In ACF, each anchor corresponds to a unique "taste" and users/items are represented as a convex combination over the spanning set of anchors. Additionally, ACF employs two novel constraints: (1) exclusiveness constraint on item-to-anchor relations that encourages each item to pick a single representative anchor, and (2) an inclusiveness constraint on anchors-to-items relations that encourages full utilization of all the anchors. We compare ACF with other state-of-the-art alternatives and demonstrate its effectiveness on multiple datasets.
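
Representing a user or item as a convex combination of shared anchors can be sketched as below. Using a softmax to obtain non-negative weights that sum to one is an assumption made for this example (the abstract does not say how the weights are parameterized), and all names are hypothetical.

```python
import math

def softmax(logits):
    """Map arbitrary scores to non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combine_anchors(logits, anchors):
    """Return the convex combination of anchor vectors weighted by softmax(logits)."""
    weights = softmax(logits)
    dim = len(anchors[0])
    return [sum(w * a[d] for w, a in zip(weights, anchors)) for d in range(dim)]

# Two anchors ("tastes") shared by all users and items.
anchors = [[1.0, 0.0], [0.0, 1.0]]
balanced_user = combine_anchors([0.0, 0.0], anchors)   # equal affinity to both tastes
focused_item = combine_anchors([10.0, 0.0], anchors)   # almost entirely the first taste
```

A sharply peaked weight vector such as the one behind `focused_item` is the kind of outcome an exclusiveness constraint on item-to-anchor relations would encourage.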

Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps

Oren Barkan
Edan Hauon
Avi Caciularu
Ori Katz
Itzik Malkiel
Omri Armstrong
Noam Koenigstein

Transformer-based language models significantly advanced the state-of-the-art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM) - a novel gradient-based method that analyzes self-attention units and identifies the input elements that explain the model's prediction the best. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.

Boosting Few-shot Abstractive Summarization with Auxiliary Tasks

Qiwei Bi
Haoyuan Li
Hanfang Yang

For summarization in niche domains, data is not enough to fine-tune the large pre-trained model. In order to alleviate the few-shot problem, we design several auxiliary tasks to assist the main task, abstractive summarization. In this paper, we employ BART as the base sequence-to-sequence model and incorporate the main and auxiliary tasks under the multi-task framework. We transform all the tasks in the format of machine reading comprehension [19]. Moreover, we utilize the task-specific adapter to effectively share knowledge across tasks and the adaptive weight mechanism to adjust the contribution of auxiliary tasks to the main task. Experiments show the effectiveness of our method for few-shot datasets. We also propose to firstly pre-train the model on unlabeled datasets, and the methods proposed in this paper can further improve the model performance.

Misbeliefs and Biases in Health-Related Searches

Alexander Bondarenko
Ekaterina Shirshakova
Marina Driker
Matthias Hagen
Pavel Braslavski

The quality of search engine results returned to health-related questions is critical, since a searcher may directly trust any suggestion in the top results. We analyze search questions that mention diseases / symptoms and remedies that are potential health-related misbeliefs. Using lists of medical and alternative medicine terms, we extract health-related search questions from 1.5 billion questions submitted to Yandex. As an initial study, we sample 30 frequent questions that contain a disease-remedy pair like "Can hepatitis be cured with milk thistle?". For each question, we carefully identify a ground truth answer in the medical literature and annotate the top-10 Yandex search result snippets as confirming the belief, rejecting it, or giving no answer. Our analysis shows that about 44% of the snippets (that users may simply interpret as definitive answers!) confirm some untrue beliefs and are wrong, and only a few include health risk warnings about using toxic plants.

Uncertainty-Aware Self-Training for Semi-Supervised Event Temporal Relation Extraction

Pengfei Cao
Xinyu Zuo
Yubo Chen
Kang Liu
Jun Zhao
Wei Bi

Extracting event temporal relations is an important task for natural language understanding. Many works have been proposed for supervised event temporal relation extraction, which typically requires a large amount of human-annotated data for model training. However, the data annotation for this task is very time-consuming and challenging. To this end, we study the problem of semi-supervised event temporal relation extraction. Self-training, a widely used semi-supervised learning method, can be utilized for this problem. However, it suffers from the noisy pseudo-labeling problem. In this paper, we propose an uncertainty-aware self-training framework (UAST) to quantify the model uncertainty for coping with pseudo-labeling errors. Specifically, UAST utilizes (1) an Uncertainty Estimation module to compute the model uncertainty for pseudo-labeling unlabeled data; (2) a Sample Selection with Exploration module to select informative samples based on uncertainty estimates; and (3) an Uncertainty-Aware Learning module to explicitly incorporate the model uncertainty into the self-training process. Experimental results indicate that our approach significantly outperforms previous state-of-the-art methods.

Spectral Graph Attention Network with Fast Eigen-approximation

Heng Chang
Yu Rong
Tingyang Xu
Wenbing Huang
Somayeh Sojoudi
Junzhou Huang
Wenwu Zhu

Variants of Graph Neural Networks (GNNs) for representation learning have been proposed recently and have achieved fruitful results in various fields. Among them, Graph Attention Network (GAT) first employs a self-attention strategy to learn attention weights for each edge in the spatial domain. However, learning attention over edges can only focus on the local information of graphs and greatly increases the computational cost. In this paper, we first introduce the attention mechanism in the spectral domain of graphs and present Spectral Graph Attention Network (SpGAT), which learns representations for different frequency components regarding weighted filters and graph wavelet bases. In this way, SpGAT can better capture global patterns of graphs in an efficient manner with far fewer learned parameters than GAT. Further, to reduce the computational cost of SpGAT brought by the eigen-decomposition, we propose a fast approximation variant, SpGAT-Cheby. We thoroughly evaluate the performance of SpGAT and SpGAT-Cheby in semi-supervised node classification tasks and verify the effectiveness of the learned attentions in the spectral domain.

New Tight Relaxations of Rank Minimization for Multi-Task Learning

Wei Chang
Feiping Nie
Rong Wang
Xuelong Li

Multi-task learning, which has been studied by many researchers, supposes that different tasks can share a common yet latent low-rank subspace. This means that learning multiple tasks jointly is better than learning them independently. In this paper, we propose two novel multi-task learning formulations based on two regularization terms, which can learn the optimal shared latent subspace by minimizing exactly the k smallest singular values. The proposed regularization terms are tighter approximations of rank minimization than the trace norm. However, solving the exact rank minimization problem is NP-hard. Therefore, we design a novel re-weighted iterative strategy to solve our models, which can tactically handle the exact rank minimization problem by setting a large penalizing parameter. Experimental results on benchmark datasets demonstrate that our methods can correctly recover the low-rank structure shared across tasks and outperform related multi-task learning methods.

Constructing Noise Free Economic Policy Uncertainty Index

Chung-Chi Chen
Hen-Hsen Huang
Yu-Lieh Huang
Hsin-Hsi Chen

The economic policy uncertainty (EPU) index is one of the important text-based indexes in the fields of finance and economics. The EPU indexes of more than 26 countries have been constructed to reflect the policy uncertainty on country-level economic environments and serve as an important leading economic indicator. The EPU indexes are calculated based on the number of news articles containing manually selected keywords related to the economy, uncertainty, and policy. We find that the keyword-based EPU indexes contain noise, which influences their explainability and predictability. In our experimental dataset, over 40% of news articles with the selected keywords are not related to the EPU. Instead of using keywords only, our proposed models take contextual information into account and perform well at identifying the articles unrelated to EPU. The noise-free EPU index performs better than the keyword-based EPU index in both explainability and predictability.

Distilling Numeral Information for Volatility Forecasting

Chung-Chi Chen
Hen-Hsen Huang
Yu-Lieh Huang
Hsin-Hsi Chen

The volatility of a stock's price reflects the risk of the stock and influences the risk of an investor's portfolio. It is also a crucial part of pricing derivative securities. Researchers have worked on predicting stock volatility from different kinds of textual data. However, most of them focus on using word information only. Few touch on capturing the numeral information in textual data, which provides fine-grained clues for financial document understanding. In this paper, we present a novel dataset, ECNum, for understanding the numerals in the transcripts of earnings conference calls. We propose a simple but efficient method, Numeral-Aware Model (NAM), for enhancing the numeral understanding capacity of neural network models. We employ the distilled information in the stock volatility forecasting task and achieve the best performance compared to previous works in short-term scenarios.

NQuAD: 70,000+ Questions for Machine Comprehension of the Numerals in Text

Chung-Chi Chen
Hen-Hsen Huang
Hsin-Hsi Chen

Numeral information plays an important role in narratives of several domains such as medicine, engineering, and finance. Previous works focus on foundational explorations of numeracy and show that fine-grained numeracy is a challenging task. In machine reading comprehension, our statistics show that only a few numeral-related questions appear in previous datasets. This indicates that few benchmark datasets are designed for numeracy learning. In this paper, we present a Numeral-related Question Answering Dataset, NQuAD, for fine-grained numeracy, and propose several baselines for future work. We compare NQuAD with three machine reading comprehension datasets and show that NQuAD is more challenging than the numeral-related questions in other datasets. NQuAD is published under the CC BY-NC-SA 4.0 license for academic purposes.

Mitigating Deep Double Descent by Concatenating Inputs

John Chen
Qihan Wang
Anastasios Kyrillidis

The double descent curve is one of the most intriguing properties of deep neural networks. It contrasts the classical bias-variance curve with the behavior of modern neural networks, occurring where the number of samples nears the number of parameters. In this work, we explore the connection between the double descent phenomenon and the number of samples in the deep neural network setting. In particular, we propose a construction which augments the existing dataset by artificially increasing the number of samples. This construction empirically mitigates the double descent curve in this setting. We reproduce existing work on deep double descent, and observe a smooth descent into the overparameterized region for our construction. This occurs both with respect to the model size and with respect to the number of epochs.

Adversarial Reprogramming of Pretrained Neural Networks for Fraud Detection

Lingwei Chen
Yujie Fan
Yanfang Ye

Machine learning models have been widely used for fraud detection, but developing and maintaining these models often suffers from significant limitations in terms of training data scarcity and constrained resources. To address these issues, in this paper, we leverage the vulnerability of machine learning to adversarial attacks, and design a novel model, AdvRFD, that Adversarially Reprograms an ImageNet classification neural network for the Fraud Detection task. AdvRFD first embeds transaction features into a host image to construct new ImageNet data, and then learns a universal perturbation to be added to all inputs, such that the outputs of the pretrained model can be accordingly mapped to the final detection decisions for all transactions. Extensive experiments on two datasets of transactions made over Ethereum and credit cards have demonstrated that AdvRFD is effective at detecting fraud using limited data and resources.

Adversarial Learning for Incentive Optimization in Mobile Payment Marketing

Xuanying Chen
Zhining Liu
Li Yu
Sen Li
Lihong Gu
Xiaodong Zeng
Yize Tan
Jinjie Gu

Many payment platforms hold large-scale marketing campaigns, which allocate incentives to encourage users to pay through their applications. To maximize the return on investment, incentive allocations are commonly solved in a two-stage procedure. After training a response estimation model to estimate the users' mobile payment probabilities (MPP), a linear programming process is applied to obtain the optimal incentive allocation. However, the large amount of biased data in the training set, generated by the previous biased allocation policy, causes a biased estimation. This bias deteriorates the performance of the response model and misleads the linear programming process, dramatically degrading the performance of the resulting allocation policy. To overcome this obstacle, we propose a bias correction adversarial network. Our method leverages the small set of unbiased data obtained under a full-randomized allocation policy to train an unbiased model and then uses it to reduce the bias with adversarial learning. Offline and online experimental results demonstrate that our method outperforms state-of-the-art approaches and significantly improves the performance of the resulting allocation policy in a real-world marketing campaign.

MGNETS: Multi-Graph Neural Networks for Table Search

Zhiyu Chen
Mohamed Trabelsi
Jeff Heflin
Dawei Yin
Brian D. Davison

Table search aims to retrieve a list of tables given a user's query. Previous methods only consider the textual information of tables and the structural information is rarely used. In this paper, we propose to model the complex relations in the table corpus as one or more graphs and then utilize graph neural networks to learn representations of queries and tables. We show that the text-based table retrieval methods can be further improved by graph-based predictions which fuse multiple field-level information.

Using Topic Modeling and Adversarial Neural Networks for Fake News Video Detection

Hyewon Choi
Youngjoong Ko

Fake news videos are being actively produced and uploaded on YouTube to attract public attention. In this paper, we propose a topic-agnostic fake news video detection model based on adversarial learning and topic modeling. The proposed model estimates the topic distribution of a video from its title/description and comments by topic modeling and tries to identify differences in stance through the topic distribution difference between title/description and comments. Then, it constructs an adversarial neural network to extract topic-agnostic features effectively. The proposed model can effectively detect topic changes for stance analysis and easily shift among various topics. In this study, it achieves an F1-score 2.68 percentage points higher than previous models in fake news video detection.

Variational Cross-Network Embedding for Anonymized User Identity Linkage

Xiaokai Chu
Xinxin Fan
Zhihua Zhu
Jingping Bi

The user identity linkage (UIL) task aims to infer identical users across different social networks/platforms. Existing models leverage labeled inter-linkages or high-quality user attributes to make predictions. Nevertheless, it is often difficult or even impossible to obtain such information in real-world applications. To this end, this paper focuses on the Anonymized User Identity Linkage (AUIL) problem, wherein neither labeled anchor users nor attributes are available. To handle such a practical and challenging task, we propose a novel and concise unsupervised embedding method, VCNE, which utilizes the network structural information. Concretely, considering the inherent structural diversity in the AUIL problem, we introduce a variational cross-network embedding learning framework to jointly study Gaussian embeddings instead of the existing deterministic embeddings from the angle of vector space. Multi-facet experiments on both real-world and synthetic datasets demonstrate that VCNE not only outperforms all baselines to a large extent but is also more robust to different levels of diversity and sparsity in the networks.

ST-PIL: Spatial-Temporal Periodic Interest Learning for Next Point-of-Interest Recommendation

Qiang Cui
Chenrui Zhang
Yafeng Zhang
Jinpeng Wang
Mingchen Cai

Point-of-Interest (POI) recommendation is an important task in location-based social networks. It facilitates the relation modeling between users and locations. Recently, researchers have recommended POIs by long- and short-term interests and achieved success. However, they fail to capture periodic interest well. People tend to visit similar places at similar times or in similar areas. Existing models try to capture this kind of periodicity through the user's mobility status or time slot, which limits how well periodic interest is modeled. To this end, we propose to learn spatial-temporal periodic interest. Specifically, in the long-term module, we learn the temporal periodic interest of daily granularity, then utilize intra-level attention to form long-term interest. In the short-term module, we construct various short-term sequences to acquire the spatial-temporal periodic interest of hourly, areal, and hourly-areal granularities, respectively. Finally, we apply inter-level attention to automatically integrate multiple interests. Experiments on two real-world datasets demonstrate the state-of-the-art performance of our method.

Historical Inertia: A Neglected but Powerful Baseline for Long Sequence Time-series Forecasting

Yue Cui
Jiandong Xie
Kai Zheng

Long sequence time-series forecasting (LSTF) has become increasingly popular for its wide range of applications. Though superior models have been proposed to enhance the prediction effectiveness and efficiency, it is reckless to neglect or underestimate one of the most natural and basic temporal properties of time series: history has inertia. In this paper, we introduce a new baseline for LSTF, named historical inertia (HI). In HI, the most recent historical data points in the input time series are adopted as the prediction results. We experimentally evaluate HI on 4 public real-world datasets and 2 LSTF tasks. The results demonstrate that up to 82% relative improvement over state-of-the-art works can be achieved. We further discuss why HI works and potential ways of benefiting from it.
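
As an illustration of the HI idea described above (the most recent input points are copied out as the forecast), here is a minimal Python sketch. The function name `historical_inertia` and the choice to raise an error when the horizon exceeds the input length are assumptions made for this example; the abstract does not specify how that case is handled.

```python
import numpy as np


def historical_inertia(history: np.ndarray, horizon: int) -> np.ndarray:
    """Forecast the next `horizon` steps by copying the last `horizon` observed steps.

    `history` has shape (time_steps, num_variables).
    Assumption: horizons longer than the history are rejected, since the
    abstract does not say how such cases are treated.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if horizon > history.shape[0]:
        raise ValueError("horizon exceeds the length of the input series")
    return history[-horizon:].copy()


# Toy univariate series 0, 1, ..., 9 with a 3-step forecast horizon.
series = np.arange(10, dtype=float).reshape(-1, 1)
forecast = historical_inertia(series, horizon=3)
```

Because the baseline has no trainable parameters, it can be evaluated on any dataset with the same error metrics used for learned models.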

Does Adversarial Oversampling Help us?

Tanmoy Dam
Md Meftahul Ferdaus
Sreenatha G. Anavatti
Senthilnath Jayavelu
Hussein A Abbass

Traditional oversampling methods are generally employed to handle class imbalance in datasets. This oversampling approach is independent of the classifier; thus, it does not offer an end-to-end solution. To overcome this, we propose a three-player adversarial game-based end-to-end method, where a domain-constraints mixture of generators, a discriminator, and a multi-class classifier are used. Rather than adversarial minority oversampling, we propose an adversarial oversampling (AO) and a data-space oversampling (DO) approach. In AO, the generator updates by fooling both the classifier and the discriminator; in DO, however, it updates by favoring the classifier and fooling the discriminator. While updating the classifier, it considers both the real and synthetically generated samples in AO. In DO, however, it favors the real samples and fools the subset class-specific generated samples. To mitigate the bias of a classifier towards the majority class, minority samples are over-sampled at a fractional rate. Such an implementation is shown to provide more robust classification boundaries. The effectiveness of our proposed method has been validated with high-dimensional, highly imbalanced, and large-scale multi-class tabular datasets. The results, as measured by average class specific accuracy (ACSA), clearly indicate that the proposed method provides better classification accuracy (improvement in the range of 0.7% to 49.27%) compared to the baseline classifier.
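
To make the ACSA metric mentioned above concrete, the following Python sketch computes it under the common reading that ACSA is the unweighted mean of per-class accuracies (per-class recall). That reading, the function name, and the toy labels are assumptions for illustration; the abstract does not give the formula.

```python
import numpy as np


def average_class_specific_accuracy(y_true, y_pred) -> float:
    """Mean over classes of the fraction of that class's samples predicted correctly.

    Assumption: every class of interest appears at least once in `y_true`.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(per_class))


# Imbalanced toy example: a classifier that always predicts the majority class.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
acsa = average_class_specific_accuracy(y_true, y_pred)
plain_accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

In this toy case plain accuracy is 0.8 while ACSA is 0.5, which shows why a class-averaged metric is preferred on highly imbalanced data.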

Question Rewriting for Open-Domain Conversational QA: Best Practices and Limitations

Marco Del Tredici
Gianni Barlacchi
Xiaoyu Shen
Weiwei Cheng
Adriá de Gispert

Open-domain conversational QA (ODCQA) calls for effective question rewriting (QR), as the questions in a conversation typically lack proper context for the QA model to interpret. In this paper, we compare two types of QR approaches, generative and expansive QR, in end-to-end ODCQA systems with the recently released QReCC and OR-QuAC benchmarks. While it is common practice to apply the same QR approach for both the retriever and the reader in the QA system, our results show that such a strategy is generally suboptimal and suggest that expansive QR is better for the sparse retriever and generative QR is better for the reader. Furthermore, while conversation history modeling with dense representations outperforms QR, we show the advantages of applying both jointly, as QR boosts the performance especially when limited history turns are considered.

Towards Anomaly-resistant Graph Neural Networks via Reinforcement Learning

Kaize Ding
Xuan Shan
Huan Liu

In general, graph neural networks (GNNs) adopt the message-passing scheme to capture the information of a node (i.e., nodal attributes and local graph structure) by iteratively transforming and aggregating the features of its neighbors. Nonetheless, recent studies show that the performance of GNNs can be easily hampered by the existence of abnormal or malicious nodes due to the vulnerability of neighborhood aggregation. Thus, it is necessary to learn anomaly-resistant GNNs without prior knowledge of ground-truth anomalies, given that labeling anomalies is costly and requires intensive domain knowledge. Though removing anomalies through unsupervised anomaly detection methods could be a possible solution, it may yield unsatisfactory GNN performance on target tasks due to the non-differentiable gap between the two learning procedures. In order to preserve the effectiveness of GNNs on anomaly-contaminated graphs, in this paper, we propose a new framework named RARE-GNN (Reinforced Anomaly-REsistant Graph Neural Networks), which can detect anomalies from the input graph and learn anomaly-resistant GNNs simultaneously. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed framework.

Simulated Annealing for Emotional Dialogue Systems

Chengzhang Dong
Chenyang Huang
Osmar Zaïane
Lili Mou

Explicitly modeling emotions in dialogue generation has important applications, such as building empathetic personal companions. In this study, we consider the task of expressing a specific emotion for dialogue generation. Previous approaches take the emotion as a training signal, which may be ignored during inference. Here, we propose a search-based emotional dialogue system using simulated annealing (SA). Specifically, we first define a scoring function that combines contextual coherence and emotional correctness. Then, SA iteratively edits a general response and searches for a generation with a high score. In this way, we enforce the presence of the desired emotion. We evaluate our system on the NLPCC2017 dataset. The proposed method shows about a 12% improvement in emotion accuracy compared with the previous state-of-the-art method, without hurting the generation quality (measured by BLEU).
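
The search procedure described above (score a response, propose an edit, accept or reject it) follows the general simulated annealing pattern, which can be sketched in Python as below. Everything specific here is a stand-in chosen for illustration: the toy scoring function, the word-replacement proposal, the vocabulary, and the cooling schedule are assumptions, not the scoring function or edit operations used by the authors.

```python
import math
import random

VOCAB = ["i", "am", "so", "happy", "glad", "today", "fine", "okay"]  # hypothetical
HAPPY_WORDS = {"happy", "glad"}  # hypothetical emotion lexicon


def toy_score(words):
    """Toy stand-in for 'coherence + emotion': word diversity plus emotion-word ratio."""
    emotion = sum(w in HAPPY_WORDS for w in words) / len(words)
    coherence = len(set(words)) / len(words)
    return emotion + coherence


def toy_propose(words, rng):
    """Propose a local edit: replace one randomly chosen word."""
    edited = list(words)
    edited[rng.randrange(len(edited))] = rng.choice(VOCAB)
    return edited


def simulated_annealing(initial, propose, score, steps=200, t0=1.0, cooling=0.98, seed=0):
    """Maximize `score` by accepting worse candidates with a temperature-dependent probability."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept degradations with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature = max(temperature * cooling, 1e-6)
    return best, best_score


start = ["i", "am", "fine", "today"]
best, best_score = simulated_annealing(start, toy_propose, toy_score)
```

Tracking the best candidate separately from the current one guarantees that the returned response never scores below the starting response, even though the search itself may temporarily accept worse edits.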

Reevaluating the Change Point Detection Problem with Segment-based Bayesian Online Detection

Erick Draayer
Huiping Cao
Yifan Hao

Change point detection is widely used for finding transitions between states of data generation within a time series. Methods for change point detection currently assume this transition is instantaneous and therefore focus on finding a single point of data to classify as a change point. However, this assumption is flawed because many time series actually display short periods of transitions between different states of data generation. Previous work has shown Bayesian Online Change Point Detection (BOCPD) to be the most effective method for change point detection on a wide range of different time series. This paper explores adapting the change point detection algorithms to detect abrupt changes over short periods of time. We design a segment-based mechanism to examine a window of data points within a time series, rather than a single data point, to determine if the window captures abrupt change. We test our segment-based Bayesian change detection algorithm on 36 different time series and compare it to the original BOCPD algorithm. Our results show that, for some of these 36 time series, the segment-based approach for detecting abrupt changes can much more accurately identify change points based on standard metrics.

Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning Approach for Semantic Code Search

Lun Du
Xiaozhou Shi
Yanlin Wang
Ensheng Shi
Shi Han
Dongmei Zhang

Recently, deep learning methods have become mainstream in code search since they do better at capturing semantic correlations between code snippets and search queries and have promising performance. However, code snippets have diverse information from different dimensions, such as business logic, specific algorithm, and hardware communication, so it is hard for a single code representation module to cover all the perspectives. On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents. In this paper, we propose MuCoS, a multi-model ensemble learning architecture for semantic code search. It combines several individual learners, each of which emphasizes a specific perspective of code snippets. We train the individual learners on different datasets which contain different perspectives of code information, and we use a data augmentation strategy to get these different datasets. Then we ensemble the learners to capture comprehensive features of code snippets. The experiments show that MuCoS has better results than the existing state-of-the-art methods. Our source code and data are anonymously available at https://github.com/Xzh0u/MuCoS.

Fair and Robust Classification Under Sample Selection Bias

Wei Du
Xintao Wu

To address the sample selection bias between the training and test data, previous research works focus on reweighing biased training data to match the test data and then building classification models on the reweighed training data. However, how to achieve fairness in the built classification models is under-explored. In this paper, we propose a framework for robust and fair learning under sample selection bias. Our framework adopts the reweighing estimation approach for bias correction and the minimax robust estimation approach for achieving robustness on prediction accuracy. Moreover, during the minimax optimization, fairness is achieved under the worst case, which guarantees the model's fairness on test data. We further develop two algorithms to handle sample selection bias, one for when test data is available and one for when it is unavailable.

FairER: Entity Resolution With Fairness Constraints

Vasilis Efthymiou
Kostas Stefanidis
Evaggelia Pitoura
Vassilis Christophides

There is an urgent call to detect and prevent "biased data" at the earliest possible stage of the data pipelines used to build automated decision-making systems. In this paper, we are focusing on controlling the data bias in entity resolution (ER) tasks aiming to discover and unify records/descriptions from different data sources that refer to the same real-world entity. We formally define the ER problem with fairness constraints ensuring that all groups of entities have similar chances to be resolved. Then, we introduce FairER, a greedy algorithm for solving this problem for fairness criteria based on equal matching decisions. Our experiments show that FairER achieves similar or higher accuracy against two baseline methods over 7 datasets, while guaranteeing minimal bias.

Collaborative Experts Discovery in Social Coding Platforms

Roohollah Etemadi
Morteza Zihayat
Kuan Feng
Jason Adelman
Ebrahim Bagheri

The popularity of online social coding (SC) platforms such as GitHub is growing due to their social functionalities and tremendous support during the product development lifecycle. The rich information of experts' contributions on repositories can be leveraged to recruit experts for new/existing projects. In this paper, we define the problem of collaborative experts finding in SC platforms. Given a project, we model an SC platform as an attributed heterogeneous network, learn latent representations of network entities in an end-to-end manner and utilize them to discover collaborative experts to complete a project. Extensive experiments on real-world datasets from GitHub indicate the superiority of the proposed approach over the state-of-the-art in terms of a range of performance measures.

Recommending Datasets for Scientific Problem Descriptions

Michael Färber
Ann-Kathrin Leisinger

The steadily rising number of datasets is making it increasingly difficult for researchers and practitioners to be aware of all datasets, particularly of the most relevant datasets for a given research problem. To this end, dataset search engines have been proposed. However, they are based on user's keywords and, thus, have difficulty determining precisely fitting datasets for complex research problems. In this paper, we propose a system that recommends suitable datasets based on a given research problem description. The recommendation task is designed as a domain-specific text classification task. As shown in a comprehensive offline evaluation using various state-of-the-art models, as well as 88,000 paper abstracts and 265,000 citation contexts as research problem descriptions, we obtain an F1-score of 0.75. In an additional user study, we show that users in real-world settings are 88% satisfied in all test cases. We therefore see promising future directions for dataset recommendation.

Modeling Sequences as Distributions with Uncertainty for Sequential Recommendation

Ziwei Fan
Zhiwei Liu
Shen Wang
Lei Zheng
Philip S. Yu

The sequential patterns within user interactions are pivotal for representing the user's preference and capturing latent relationships among items. Recent advances in sequence modeling with Transformers encourage the community to devise more effective encoders for sequential recommendation. Most existing sequential methods assume users are deterministic. However, item-item transitions might fluctuate significantly in several item aspects and exhibit randomness of user interests. These stochastic characteristics create a strong demand for including uncertainties in representing sequences and items. Additionally, modeling sequences and items with uncertainties expands users' and items' interaction spaces, thus further alleviating cold-start problems.

In this work, we propose a Distribution-based Transformer for Sequential Recommendation (DT4SR), which injects uncertainties into sequential modeling. We use Elliptical Gaussian distributions to describe items and sequences with uncertainty, and we adopt the Wasserstein distance to measure the similarity between distributions. We devise two novel Transformers for modeling mean and covariance, which guarantees the positive-definite property of distributions. The proposed method significantly outperforms the state-of-the-art methods. The experiments on three benchmark datasets also demonstrate its effectiveness in alleviating cold-start issues. The code is available at https://github.com/DyGRec/DT4SR.
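
For readers unfamiliar with the Wasserstein distance between Gaussians, the sketch below computes the squared 2-Wasserstein distance in the special case of diagonal covariance matrices, where it reduces to the squared difference of means plus the squared difference of standard deviations. Treating "Elliptical Gaussian" as diagonal covariance and using the order-2 distance are assumptions made for this example; the abstract specifies neither, and the function name is hypothetical.

```python
import numpy as np


def w2_squared_diag_gaussians(mu1, var1, mu2, var2) -> float:
    """Squared 2-Wasserstein distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).

    Closed form for diagonal covariances:
        ||mu1 - mu2||^2 + ||sqrt(var1) - sqrt(var2)||^2
    """
    mu1, var1, mu2, var2 = (np.asarray(x, dtype=float) for x in (mu1, var1, mu2, var2))
    if np.any(var1 < 0) or np.any(var2 < 0):
        raise ValueError("variances must be non-negative")
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(mean_term + cov_term)


# Same spread, different means: only the mean term contributes.
d_means = w2_squared_diag_gaussians([0, 0], [1, 1], [3, 4], [1, 1])
# Same mean, different spread: only the covariance term contributes.
d_spread = w2_squared_diag_gaussians([0, 0], [4, 9], [0, 0], [1, 1])
```

Unlike the KL divergence, this distance stays finite and symmetric even when one distribution has a variance close to zero, which is one common reason for preferring it when comparing embedded distributions.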

Towards Robustness to Label Noise in Text Classification via Noise Modeling

Siddhant Garg
Goutham Ramakrishnan
Varun Thumbe

Large datasets in NLP tend to suffer from noisy labels due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign a probability score to each training sample of having a clean or noisy label, using a two-component beta mixture model fitted on the training losses at an early epoch. Using this, we jointly train the classifier and the noise model through a novel de-noising loss having two components: (i) cross-entropy of the noise model prediction with the input label, and (ii) cross-entropy of the classifier prediction with the input label, weighted by the probability of the sample having a clean label. Our empirical evaluation on two text classification tasks and two types of label noise: random and input-conditional, shows that our approach can improve classification accuracy, and prevent over-fitting to the noise.
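
The two-component loss described above can be sketched in Python as follows. The sketch takes the per-sample clean-label probabilities as given (in the abstract they come from a beta mixture model, which is not reproduced here), and it assumes the two components are combined by a weighted sum with a hypothetical coefficient `beta`; the abstract does not state how they are combined.

```python
import numpy as np


def cross_entropy(probs, labels, eps=1e-12):
    """Per-sample cross-entropy of predicted class probabilities against integer labels."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(np.clip(picked, eps, 1.0))


def denoising_loss(clf_probs, noise_probs, labels, p_clean, beta=1.0) -> float:
    """(i) CE of the noise model vs. the input label, plus
    (ii) CE of the classifier vs. the input label, weighted by P(label is clean).

    Assumption: the components are summed, with `beta` scaling component (ii).
    """
    ce_noise = cross_entropy(noise_probs, labels)
    ce_clf = cross_entropy(clf_probs, labels)
    weights = np.asarray(p_clean, dtype=float)
    return float(np.mean(ce_noise) + beta * np.mean(weights * ce_clf))


# One sample, two classes, both models maximally unsure, label believed clean.
loss_clean = denoising_loss([[0.5, 0.5]], [[0.5, 0.5]], [0], [1.0])
# Same sample, but the label is believed to be noisy: the classifier term vanishes.
loss_noisy = denoising_loss([[0.5, 0.5]], [[0.5, 0.5]], [0], [0.0])
```

The weighting means that samples judged likely to be mislabeled contribute little to the classifier's gradient, while the noise model is still trained to reproduce the observed labels.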

EasyFlinkCEP: Big Event Data Analytics for Everyone

Nikos Giatrakos
Eleni Kougioumtzi
Antonios Kontaxakis
Antonios Deligiannakis
Yannis Kotidis

FlinkCEP is the Complex Event Processing (CEP) API of the Flink Big Data platform. The high expressive power of the language of FlinkCEP comes at the cost of cumbersome parameterization of the queried patterns, acting as a barrier for FlinkCEP's adoption. Moreover, properly configuring a FlinkCEP program to run over a computer cluster requires advanced skills on modern hardware administration which non-expert programmers do not possess. In this work (i) we build a novel, logical CEP operator that receives CEP pattern queries in the form of extended regular expressions and seamlessly re-writes them to FlinkCEP programs, (ii) we build a CEP Optimizer that automatically decides good job configurations for these FlinkCEP programs. We also present an experimental evaluation which demonstrates the significant benefits of our approach.

Density-Based Dynamic Curriculum Learning for Intent Detection

Yantao Gong
Cao Liu
Jiazhen Yuan
Fan Yang
Xunliang Cai
Guanglu Wan
Jiansong Chen
Ruiyao Niu
Houfeng Wang

Pre-trained language models have achieved noticeable performance on the intent detection task. However, due to assigning an identical weight to each sample, they suffer from the overfitting of simple samples and the failure to learn complex samples well. To handle this problem, we propose a density-based dynamic curriculum learning model. Our model defines the sample's difficulty level according to their eigenvectors' density. In this way, we exploit the overall distribution of all samples' eigenvectors simultaneously. Then we apply a dynamic curriculum learning strategy, which pays distinct attention to samples of various difficulty levels and alters the proportion of samples during the training process. Through the above operation, simple samples are well-trained, and complex samples are enhanced. Experiments on three open datasets verify that the proposed density-based algorithm can distinguish simple and complex samples significantly. Besides, our model obtains obvious improvement over the strong baselines.

Using Neighborhood Context to Improve Information Extraction from Visual Documents Captured on Mobile Phones

Kalpa Gunaratna
Vijay Srinivasan
Sandeep Nama
Hongxia Jin

Information Extraction from visual documents enables convenient and intelligent assistance to end users. We present a Neighborhood-based Information Extraction (NIE) approach that uses contextual language models and pays attention to the local neighborhood context in the visual documents to improve information extraction accuracy. We collect two different visual document datasets and show that our approach outperforms the state-of-the-art global context-based IE technique. In fact, NIE outperforms existing approaches in both small and large model sizes. Our on-device implementation of NIE on a mobile platform that generally requires small models showcases NIE's usefulness in practical real-world applications.

Integrating Transductive and Inductive Embeddings Improves Link Prediction Accuracy

Chitrank Gupta
Yash Jain
Abir De
Soumen Chakrabarti

In recent years, inductive graph embedding models, viz., graph neural networks (GNNs), have become increasingly accurate at link prediction (LP) in online social networks. The performance of such networks depends strongly on the input node features, which vary across networks and applications. Selecting appropriate node features remains application-dependent and generally an open question. Moreover, owing to privacy and ethical issues, use of personalized node features is often restricted. In fact, many publicly available datasets from online social networks do not contain any node features (e.g., demography). In this work, we provide a comprehensive experimental analysis which shows that harnessing a transductive technique (e.g., Node2Vec) for obtaining initial node representations, after which an inductive node embedding technique takes over, leads to substantial improvements in link prediction accuracy. We demonstrate that, for a wide variety of GNN variants, node representation vectors obtained from Node2Vec serve as high quality input features to GNNs, thereby improving LP performance.

CauSeR: Causal Session-based Recommendations for Handling Popularity Bias

Priyanka Gupta
Ankit Sharma
Pankaj Malhotra
Lovekesh Vig
Gautam Shroff

Recommender Systems (RS) tend to recommend more popular items instead of the relevant long-tail items. Mitigating such popularity bias is crucial to ensure that less popular but relevant items are part of the recommendation list shown to the user. In this work, we study the phenomenon of popularity bias in session-based RS (SRS) obtained via deep learning (DL) models. We observe that DL models trained on the historical user-item interactions in session logs (having long-tailed item-click distributions) tend to amplify popularity bias. To understand the source of this bias amplification, we consider potential sources of bias at two distinct stages in the modeling process: i. the data-generation stage (user-item interactions captured as session logs), ii. the DL model training stage. We highlight that the popularity of an item has a causal effect on i. user-item interactions via conformity bias, as well as ii. item ranking from DL models via biased training process due to class (target item) imbalance. While most existing approaches in literature address only one of these effects, we consider a comprehensive causal inference framework that identifies and mitigates the effects at both stages. Through extensive empirical evaluation on simulated and real-world datasets, we show that our approach improves upon several strong baselines from literature for popularity bias and long-tailed classification. Ablation studies show the advantage of our comprehensive causal analysis to identify and handle bias in data generation as well as training stages.

Region Invariant Normalizing Flows for Mobility Transfer

Vinayak Gupta
Srikanta Bedathur

There exists a high variability in mobility data volumes across different regions, which deteriorates the performance of spatial recommender systems that rely on region-specific data. In this paper, we propose a novel transfer learning framework called Reformd, for continuous-time location prediction for regions with sparse checkin data. Specifically, we model user-specific checkin-sequences in a region using a marked temporal point process (MTPP) with normalizing flows to learn the inter-checkin time and geo-distributions. Later, we transfer the model parameters of spatial and temporal flows trained on a data-rich origin region for the next check-in and time prediction in a target region with scarce checkin data. We capture the evolving region-specific checkin dynamics for MTPP and spatial-temporal flows by maximizing the joint likelihood of next checkin with three channels (1) checkin-category prediction, (2) checkin-time prediction, and (3) travel distance prediction. Extensive experiments on different user mobility datasets across the U.S. and Japan show that our model significantly outperforms state-of-the-art methods for modeling continuous-time sequences. Moreover, we also show that Reformd can be easily adapted for product recommendations i.e., sequences without any spatial component.

Counterfactual Generative Smoothing for Imbalanced Natural Language Classification

Hojae Han
Seungtaek Choi
Myeongho Jeong
Jin-woo Park
Seung-won Hwang

Classification datasets are often biased in observations, leaving only a few observations for minority classes. Our key contribution is detecting and reducing Under-represented (U-) and Over-represented (O-) artifacts from dataset imbalance, by proposing a Counterfactual Generative Smoothing approach on both feature-space and data-space, namely CGS_f and CGS_d. Our technical contribution is smoothing majority and minority observations, by sampling a majority seed and transferring to minority. Our proposed approaches not only outperform state-of-the-art methods on both synthetic and real-life datasets, but also effectively reduce both artifact types.

GLocal-K: Global and Local Kernels for Recommender Systems

Soyeon Caren Han
Taejun Lim
Siqu Long
Bernd Burgstaller
Josiah Poon

Recommender systems typically operate on high-dimensional sparse user-item matrices. Matrix completion is a very challenging task to predict one's interest based on millions of other users having each seen a small subset of thousands of items. We propose a Global-Local Kernel-based matrix completion framework, named GLocal-K, that aims to generalise and represent a high-dimensional sparse user-item matrix entry into a low dimensional space with a small number of important features. Our GLocal-K can be divided into two major stages. First, we pre-train an auto encoder with the local kernelised weight matrix, which transforms the data from one space into the feature space by using a 2d-RBF kernel. Then, the pre-trained auto encoder is fine-tuned with the rating matrix, produced by a convolution-based global kernel, which captures the characteristics of each item. We apply our GLocal-K model under the extreme low-resource setting, which includes only a user-item rating matrix, with no side information. Our model outperforms the state-of-the-art baselines on three collaborative filtering benchmarks: ML-100K, ML-1M, and Douban.

Unsupervised Cross-system Log Anomaly Detection via Domain Adaptation

Xiao Han
Shuhan Yuan

Log anomaly detection, which focuses on detecting anomalous log records, has become an active research problem because of its importance in developing stable and sustainable systems. Currently, many unsupervised log anomaly detection approaches are developed to address the challenge of limited anomalous samples. However, collecting enough data to train an unsupervised model is not practical when the system is newly deployed online. To tackle this challenge, we propose a transferable log anomaly detection (LogTAD) framework that leverages the adversarial domain adaptation technique to make log data from different systems have a similar distribution so that the detection model is able to detect anomalies from multiple systems. Experimental results show that LogTAD can achieve high accuracy on cross-system anomaly detection by using a small number of logs from the new system.

Automatic Error Correction Using the Wikipedia Page Revision History

Md Kamrul Hasan
Mohammad Mahdavi

Error correction is one of the most crucial and time-consuming steps of data preprocessing. State-of-the-art error correction systems leverage various signals, such as predefined data constraints or user-provided correction examples, to fix erroneous values in a semi-supervised manner. While these approaches reduce human involvement to a few labeled tuples, they still need supervision to fix data errors. In this paper, we propose a novel error correction approach to automatically fix data errors of dirty datasets. Our approach pretrains a set of error corrector models on correction examples extracted from the Wikipedia page revision history. It then fine-tunes these models on the dirty dataset at hand without any required user labels. Finally, our approach aggregates the fine-tuned error corrector models to find the actual correction of each data error. As our experiments show, our approach automatically fixes a large portion of data errors of various dirty datasets with high precision.

Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling

Arezoo Hatefi
Xuan-Son Vu
Monowar Bhuyan
Frank Drewes

We propose a semi-supervised learning method called Cformer for automatic clustering of text documents in cases where clusters are described by a small number of labeled examples, while the majority of training examples are unlabeled. We motivate this setting with an application in contextual programmatic advertising, a type of content placement on news pages that does not exploit personal information about visitors but relies on the availability of a high-quality clustering computed on the basis of a small number of labeled samples.

To enable text clustering with little training data, Cformer leverages the teacher-student architecture of Meta Pseudo Labels. In addition to unlabeled data, Cformer uses a small amount of labeled data to describe the clusters aimed at. Our experimental results confirm that the performance of the proposed model improves the state-of-the-art if a reasonable amount of labeled data is available. The models are comparatively small and suitable for deployment in constrained environments with limited computing resources. The source code is available at https://github.com/Aha6988/Cformer

Centerpoint Query Authentication

Magnus Haxen
Morten Raeburn
Peyman Afshani
Panagiotis Karras

The rise of online map services drives data owners to outsource spatial data to potentially untrusted database providers. Query results are provided along with verification objects that allow confirming their authenticity. Such authentication schemes have been proposed for several spatial and geometric queries, as well as for median queries in one dimension. However, to date, no authentication mechanism exists for centerpoint queries, which return a point lying in the middle of other points in multidimensional space. In this paper, we propose an authentication scheme for centerpoint queries, grounded on the algorithm for centerpoint queries on a finite planar set of points and authenticated aggregation R-trees and accompanying authenticated aggregation queries. We also provide methods for finding the centerpoint of a subset of the complete data set, and implement a range-based method. Our solution has a worst-case time-complexity of O(n log n) and space-complexity of O(n). Our experimental study confirms these claims.

Locker: Locally Constrained Self-Attentive Sequential Recommendation

Zhankui He
Handong Zhao
Zhe Lin
Zhaowen Wang
Ajinkya Kale
Julian Mcauley

Recently, self-attentive models have shown promise in sequential recommendation, given their potential to capture user long-term preferences and short-term dynamics simultaneously. Despite their success, we argue that self-attention modules, as a non-local operator, often fail to capture short-term user dynamics accurately due to a lack of inductive local bias. To examine our hypothesis, we conduct an analytical experiment on controlled 'short-term' scenarios. We observe a significant performance gap between self-attentive recommenders with and without local constraints, which implies that short-term user dynamics are not sufficiently learned by existing self-attentive recommenders. Motivated by this observation, we propose a simple framework (Locker) for self-attentive recommenders in a plug-and-play fashion. By combining the proposed local encoders with existing global attention heads, Locker enhances short-term user dynamics modeling, while retaining the long-term semantics captured by standard self-attentive encoders. We investigate Locker with five different local methods, outperforming state-of-the-art self-attentive recommenders on three datasets by 17.19% (NDCG@20) on average.

LTPHM: Long-term Traffic Prediction based on Hybrid Model

Chuyin Huang
Weiyang Kong
Genan Dai
Yubao Liu

Traffic prediction is a classical spatial-temporal prediction problem with many real-world applications. In general, existing traffic prediction methods capture the complex spatial-temporal features by an iterative mechanism or a non-iterative mechanism. However, the iterative mechanism often causes prediction error accumulation, and the non-iterative mechanism struggles to capture the dynamic propagation information. The shortcomings of both mechanisms lead to their poor performance in long-term prediction tasks. Targeting the shortcomings of existing methods, in this paper, we propose a novel deep learning framework called Long-term Traffic Prediction based on Hybrid Model (LTPHM), which is designed to simulate the dynamic transmission process of traffic information on the road network by connecting the prediction values of the current step with the next step. Each spatial-temporal module uses graph convolution (GCN) with an adaptive matrix to capture spatial dependence. Besides, we use Gated Dilated Convolution Networks (GDCN) and Gated Linear Unit convolution networks (GLU) to capture temporal dependence. Since LTPHM integrates the advantages of both iterative and non-iterative prediction, it can efficiently capture the complex and dynamic spatial-temporal features, especially the long-range temporal sequences. Experiments with three real-world traffic datasets demonstrate the effectiveness of our proposed model.

Entity-aware Collaborative Relation Network with Knowledge Graph for Recommendation

Ruoran Huang
Chuanqi Han
Li Cui

As the source of side information, knowledge graph (KG) plays a critical role in recommender systems. Recently, graph neural networks (GNN) have shown their technical advancements at boosting recommendation performances. Existing GNN-based models mainly focus on aggregation technique and regularization allocation, ignoring the rich entity-aware information hidden in the relation network of KG. In this paper, we explore the relational semantics at the granularity of entities behind a user-item interaction by leveraging knowledge graph, named Entity-aware Collaborative Relation Network (ECRN). Technically, we construct multiple meta-paths from users to entities based on the user-item interaction and item-entity connectivity to obtain user representation, while designing a relation-aware self-attention mechanism to aggregate collaborative signals of items. Empirical results on three benchmarks show that ECRN significantly outperforms state-of-the-art baselines.

When is Nearest Neighbor Meaningful: Sequential Data

Aaron Hui
Byron J. Gao

Nearest neighbor search is a fundamental problem in data management and analytics with vast applications. However, a seminal paper by Beyer et al. demonstrated the curse of dimensionality, where under certain conditions with high dimensionality, all the data points tend to be equidistant and thus the nearest neighbor problem is meaningless. This influential work has spawned a series of investigations of the concentration phenomenon, which, for the most part, are limited to the vector space. In this paper, we extend this investigation to sequence data, which do not have an inherent notion of dimensions or attributes. For similarity measures we consider the commonly used edit distance and longest common subsequence. We perform theoretical analysis and prove conditions under which sequences will concentrate. We also conduct experiments on synthetic data to verify the theoretical findings. Rather than the curse of dimensionality as previous studies demonstrate, we attempt to demonstrate the curse of length for sequential data.
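
The concentration effect described above can be probed empirically. The following is a minimal sketch, not the authors' experimental setup: it assumes uniformly random strings over a four-letter alphabet, computes edit distances from one query string to a sample of others, and reports the relative contrast (d_max - d_min) / d_min, a quantity that shrinks toward zero when distances concentrate. The function names, sample size, and alphabet are illustrative choices.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def relative_contrast(length: int, n_points: int = 50,
                      alphabet: str = "ACGT", seed: int = 0) -> float:
    """(d_max - d_min) / d_min over random sequences of a given length.

    Assumption: sequences are drawn uniformly at random, which is only
    one of many possible synthetic data models.
    """
    rng = random.Random(seed)

    def rand_seq() -> str:
        return "".join(rng.choice(alphabet) for _ in range(length))

    query = rand_seq()
    dists = [edit_distance(query, rand_seq()) for _ in range(n_points)]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min if d_min > 0 else float("inf")


if __name__ == "__main__":
    # Print the contrast for increasing sequence lengths.
    for length in (8, 32, 128):
        print(length, round(relative_contrast(length), 3))
```

Running the script for increasing lengths gives a quick, informal view of whether the nearest and farthest neighbors become harder to tell apart as sequences grow.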

The Effect of News Article Quality on Ad Consumption

Kojiro Iizuka
Yoshifumi Seki
Makoto P. Kato

Practical news feed platforms generate a hybrid list of news articles and advertising items (e.g., products, services, or information) and many platforms optimize the position of news articles and advertisements independently. However, they should be arranged with careful consideration of each other, as we show in this study, since user behaviors toward advertisements are significantly affected by the news articles. This paper investigates the effect of news articles on users' ad consumption and shows the dependency between news and ad effectiveness. We conducted a service log analysis and showed that sessions with high-quality news article exposure had more ad consumption than those with low-quality news article exposure. Based on this result, we hypothesized that exposure to high-quality articles will lead to a high ad consumption rate. Thus, we conducted million-scale A/B testing to investigate the effect of high-quality articles on ad consumption, in which we prioritized high-quality articles in the ranking for the treatment group. The A/B test showed that the treatment group's ad consumption, such as the number of clicks, conversions, and sales, increased significantly while the number of article clicks decreased. We also found that users who prefer a social or economic topic had more ad consumption by stratified analysis. These insights regarding news articles and advertisements will help optimize news and ad effectiveness in rankings considering their mutual influence.

CANCN-BERT: A Joint Pre-Trained Language Model for Classical and Modern Chinese

Zijing Ji
Xin Wang
Yuxin Shen
Guozheng Rao

Pre-Trained Models (PTMs) can learn general knowledge representations and perform well in Natural Language Processing (NLP) tasks. For the Chinese language, several PTMs have been developed; however, most existing methods concentrate on modern Chinese and are not ideal for processing classical Chinese due to the differences in grammar and semantics between these two forms. In this paper, in order to process two forms of Chinese uniformly, we propose a novel Classical and Modern Chinese pre-trained language model (CANCN-BERT), with the advantage of effectively processing both classical and modern Chinese, which is an extension of BERT. Form-aware pre-training tasks are elaborately designed to train our model, so as to better adapt it to classical and modern Chinese corpus. Moreover, we define a joint model, proposing dedicated optimization methods through different paths with the control of the switch mechanism. Our model merges characteristics of both classical and modern Chinese, which can adequately and efficiently enhance the representation ability for both forms. Extensive experiments show that our model outperforms baseline models on processing classical and modern Chinese and achieves significant and consistent improvements. Also, the results of ablation experiments demonstrate the effectiveness of each module.

Robust Adaptive-weighting Multi-view Classification

Bingbing Jiang
Junhao Xiang
Xingyu Wu
Wenda He
Libin Hong
Weiguo Sheng

As data sources become ever more numerous, classification for multi-view data represented by heterogeneous features has been involved in many data mining applications. Most existing methods either directly concatenate all views or separately tackle each view, neglecting the correlation and diversity among views. Moreover, they often encounter an extra hyper-parameter that needs to be manually tuned, degenerating the applicability of models. In this paper, we present a robust supervised learning framework for multi-view classification, seeking a better representation and fusion of multiple views. Specifically, our framework discriminates different views with adaptively optimized view-wise weight factors and coalesces them to learn a joint projection subspace compatible across multiple views in an adaptive-weighting manner, thereby avoiding the intractable hyper-parameter. Meanwhile, the consensus and complementary information of original views can be naturally integrated into the learned subspace, in turn enhancing the discrimination of the subspace for subsequent classification. An efficient convergent algorithm is developed to iteratively optimize the formulated framework. Experiments on real datasets demonstrate the effectiveness and superiority of the proposed method.

ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning

Ming Jin
Yixin Liu
Yu Zheng
Lianhua Chi
Yuan-Fang Li
Shirui Pan

Anomaly detection on graphs plays a significant role in various domains, including cybersecurity, e-commerce, and financial fraud detection. However, existing methods on graph anomaly detection usually consider the view in a single scale of graphs, which results in their limited capability to capture the anomalous patterns from different perspectives. Towards this end, we introduce a novel graph anomaly detection framework, namely ANEMONE, to simultaneously identify the anomalies in multiple graph scales. Concretely, ANEMONE first leverages a graph neural network backbone encoder with multi-scale contrastive learning objectives to capture the pattern distribution of graph data by learning the agreements between instances at the patch and context levels concurrently. Then, our method employs a statistical anomaly estimator to evaluate the abnormality of each node according to the degree of agreement from multiple perspectives. Experiments on three benchmark datasets demonstrate the superiority of our method.

RABERT: Relation-Aware BERT for Target-Oriented Opinion Words Extraction

Taegwan Kang
Minwoo Lee
Nakyeong Yang
Kyomin Jung

Targeted Opinion Word Extraction (TOWE) is a subtask of aspect-based sentiment analysis, which aims to identify the corresponding opinion terms for given opinion targets in a review. To solve the TOWE task, recent works mainly focus on learning the target-aware context representation that infuses target information into context representation by using various neural networks. However, it has been unclear how to encode the target information to BERT, a powerful pre-trained language model. In this paper, we propose a novel TOWE model, RABERT (Relation-Aware BERT), that can fully utilize BERT to obtain target-aware context representations. To introduce the target information into BERT layers clearly, we design a simple but effective encoding method that adds target markers indicating the opinion targets to the sentence. In addition, we find that the neighbor word information is also important for extracting the opinion terms. Therefore, RABERT employs the target-sentence relation network and the neighbor-aware relation network to consider both the opinion target and the neighbor words information. Our experimental results on four benchmark datasets show that RABERT significantly outperforms the other baselines and achieves state-of-the-art performance. We also demonstrate the effectiveness of each component of RABERT in further analysis.

Question Answering using Web Lists

Anoop R. Katti
Kai Hui
Adria de Gispert
Hagen Fuerstenau

There are many natural questions that are best answered with a list. We address the problem of answering such questions using lists that occur on the Web, i.e. List Question Answering (ListQA). The diverse formats of lists on the Web make this task challenging. We describe state-of-the-art methods for list extraction and ranking, that also consider the text surrounding the lists as context. Due to the lack of realistic public datasets for ListQA, we present three novel datasets that together are realistic, reproducible and test out-of-domain generalization. We benchmark the above steps on these datasets, with and without context. On the hardest setting (realistic and out-of-domain), we achieve an end-to-end Precision@1 of 51.28% and HITs@5 of 79.38%, effectively demonstrating the difficulty of the task and quantifying the immediate opportunity for improvement. We highlight some future directions through error analysis and release the datasets for further research.

CoSEM: Contextual and Semantic Embedding for App Usage Prediction

Yonchanok Khaokaew
Mohammad Saiedur Rahaman
Ryen W. White
Flora D. Salim

App usage prediction is important for smartphone system optimization to enhance user experience. Existing modeling approaches utilize historical app usage logs along with a wide range of semantic information to predict the app usage; however, they are only effective in certain scenarios and cannot be generalized across different situations. This paper addresses this problem by developing a model called Contextual and Semantic Embedding model for App Usage Prediction (CoSEM) that leverages integration of 1) semantic information embedding and 2) contextual information embedding based on historical app usage of individuals. Extensive experiments show that the combination of semantic information and historical app usage information enables our model to outperform the baselines on three real-world datasets, achieving MRR scores over 0.55, 0.57, and 0.86 and Hit rate scores of more than 0.71, 0.75, and 0.95, respectively.

Self-supervised Fine-tuning for Efficient Passage Re-ranking

Meoungjun Kim
Youngjoong Ko

Passage retrievers based on neural language models have recently achieved significant performance improvements in ranking tasks. Such ranking models have the advantage of finding the contextual features of queries and documents better than traditional keyword-based methods. However, these deep learning-based models are limited by the large amounts of training data required. We propose a new fine-tuning method based on a masked language model (MLM) that is typically used in pre-trained language models. Our model improves the ranking performance using the MLM while efficiently utilizing less training data via data augmentation. The proposed approach applies self-supervised learning to information retrieval without needing additional expensive labeled data. In addition, because masking important terms during the fine-tuning stage can undermine ranking performance, the importance values of each term and sentence in a passage are calculated using the BM25 scheme and applied to the fine-tuning task such that the more important terms are masked less often. Our model is trained with the dataset from the MS MARCO re-ranking leaderboard and achieves state-of-the-art MRR@10 performance on the leaderboard, except for the ensemble-based method.
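
One way to read the importance-aware masking idea is sketched below. This is an illustrative assumption rather than the authors' procedure: each term in a passage receives a BM25-style weight (an IDF factor times a saturated term-frequency factor, with the usual k1 and b parameters), and its masking probability decreases linearly with its normalized weight. The function names, the p_max and p_min bounds, and the linear mapping from weight to probability are all hypothetical.

```python
import math
from collections import Counter


def bm25_term_weights(passage, corpus, k1=1.2, b=0.75):
    """BM25-style weight for each distinct term of `passage`.

    `passage` is a list of tokens and `corpus` is a list of such lists.
    """
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    term_freq = Counter(passage)
    weights = {}
    for term, f in term_freq.items():
        df = doc_freq[term]  # 0 if the term never occurs in the corpus
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf_part = f * (k1 + 1) / (f + k1 * (1 - b + b * len(passage) / avg_len))
        weights[term] = idf * tf_part
    return weights


def mask_probabilities(weights, p_max=0.3, p_min=0.05):
    """Map weights to masking probabilities: higher weight, lower probability.

    Assumption: a linear interpolation between p_max (least important term)
    and p_min (most important term).
    """
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    return {t: p_max - (p_max - p_min) * (w - lo) / span
            for t, w in weights.items()}
```

With a toy corpus in which "the" appears in every document, "the" would receive the highest masking probability and rarer content terms the lowest, which matches the stated goal of masking important terms less often.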

Query-driven Segment Selection for Ranking Long Documents

Youngwoo Kim
Razieh Rahimi
Hamed Bonab
James Allan

Transformer-based rankers have shown state-of-the-art performance. However, their self-attention operation is mostly unable to process long sequences. One of the common approaches to train these rankers is to heuristically select some segments of each document, such as the first segment, as training data. However, these segments may not contain the query-related parts of documents. To address this problem, we propose query-driven segment selection from long documents to build training data. The segment selector provides relevant samples with more accurate labels and non-relevant samples which are harder to predict. The experimental results show that the basic BERT-based ranker trained with the proposed segment selector significantly outperforms that trained by the heuristically selected segments, and performs equally to the state-of-the-art model with localized self-attention that can process longer input sequences. Our findings open up a new direction for designing efficient transformer-based rankers.
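
The contrast between heuristic and query-driven segment selection can be illustrated with a small sketch. The scoring function used here (the number of query-term occurrences in a segment) is a stand-in assumption and not the selector used by the authors; the segment length and function names are likewise hypothetical.

```python
def split_segments(tokens, seg_len):
    """Split a token list into consecutive, non-overlapping segments."""
    return [tokens[i:i + seg_len] for i in range(0, len(tokens), seg_len)]


def first_segment(tokens, seg_len):
    """Heuristic baseline: always take the first segment of the document."""
    return tokens[:seg_len]


def query_driven_segment(query_tokens, doc_tokens, seg_len):
    """Pick the segment with the most query-term occurrences.

    Assumption: simple term overlap as the relevance signal. On ties,
    max() keeps the earliest segment.
    """
    segments = split_segments(doc_tokens, seg_len)
    if not segments:
        return []
    query = set(query_tokens)
    return max(segments, key=lambda seg: sum(tok in query for tok in seg))
```

For a document whose query-related content appears only near the end, `first_segment` returns unrelated text while `query_driven_segment` returns the later segment, which is the situation the abstract describes.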

UQJG: Identifying Transactions that Collaborate to Violate an SQL Assertion

Toon Koppelaars
Xavier Oriol
Ernest Teniente
Sergi Curto
Eduard Pujol

An SQL assertion is a declarative statement about data that must always be satisfied in any database state. Assertions were introduced in the SQL92 standard but no commercial DBMS has implemented them so far. Some approaches have been proposed to incrementally determine whether a transaction violates an SQL assertion, but they assume that transactions are applied in isolation, hence not considering the problem of concurrent transaction executions that collaborate to violate an assertion. This is the main stopper for its commercial implementation. To handle this problem, we have developed a technique for efficiently serializing concurrent transactions that might interact to violate an SQL assertion.

Exploratory Search of GANs with Contextual Bandits

Ivan Kropotov
Alan Medlar
Dorota Glowacka

Interactive image retrieval involves users searching a collection of images to satisfy their subjective information needs. However, even large image collections are finite and therefore may not be able to satisfy users. An alternate approach would be to explore a generative adversarial network (GAN) and model users' search intents directly in terms of the latent space used by the GAN to generate images. In this article, we present a simulation study exploring the performance of Gaussian Process bandits in the context of interactive GAN exploration. We used recent advances in interpretable GAN controls to investigate the scalability of different approaches in terms of image space dimensionality. While we present several experiments with promising results, none of the approaches tested scale sufficiently well to explore the entire GAN image space.

Asterisk-Shaped Features for Tabular Data

Yuki Kurauchi
Yoshiaki Takimoto
Shuhei Yamamoto
Shunichi Seko
Hiroyuki Toda

Data often accumulates in tabular format with many attribute items, and prediction using machine learning adds value to data for business. However, studies on machine learning for tabular data only input attribute values, which reduces accuracy. Therefore, we propose an inference method that inputs attribute values and values from aggregated tabular data that has varying attribute values for each attribute item. In an experiment, we compared our proposed method with AutoGluon-Tabular using AutoML benchmark datasets. Our proposed method achieved the highest accuracy for 21 out of 39 datasets.

Boosting Graph Alignment Algorithms

Alexander Frederiksen Kyster
Simon Daugaard Nielsen
Judith Hermanns
Davide Mottin
Panagiotis Karras

The problem of graph alignment is to find corresponding nodes between a pair of graphs. Past work has treated the problem in a monolithic fashion, with the graph as input and the alignment as output, offering limited opportunities to adapt the algorithm to task requirements or input graph characteristics. Recently, node embedding techniques are utilized for graph alignment. In this paper, we study two state-of-the-art graph alignment algorithms utilizing node representations, CONE-Align and GRASP, and describe them in terms of an overarching modular framework. In a targeted experimental study, we exploit this modularity to develop enhanced algorithm variants that are more effective in the alignment task.

Attention Based Subgraph Classification for Link Prediction by Network Re-weighting

Darong Lai
Zheyi Liu
Junyao Huang
Zhihong Chong
Weiwei Wu
Christine Nardini

Supervised link prediction aims at finding missing links in a network by learning directly from the data suitable criteria for classifying link types into existent or non-existent. Recently, along this line, subgraph-based methods learning a function that maps subgraph patterns to link existence have witnessed great successes. However, these approaches still have drawbacks. First, the construction of the subgraph relies on an arbitrary node selection, which is often ineffective. Second, the inability of such approaches to evaluate node importance adaptively reduces flexibility in node feature aggregation, an important step in subgraph classification. To address these issues, a novel graph-classification based link-prediction model is proposed: Attention and Re-weighting based subgraph Classification for Link prediction (ARCLink). ARCLink first extracts a subgraph around the two nodes whose link should be predicted, by network reweighting, i.e. attributing a weight in the range 0-1 to all links of the original network, and then learns a function to map the subgraph to a continuous vector for classification, thus revealing the nature (non-existence/existence) of the unknown link. For learning the mapping function, ARCLink generates a vector representation of the extracted subgraph by hierarchically aggregating node features according to node importance. In contrast to previous studies that either fully ignore or use fixed schemes to compute node importance, ARCLink instead learns node importance adaptively by employing an attention mechanism. Through extensive experiments, ARCLink was validated on a series of real-world networks against state-of-the-art link prediction methods, consistently demonstrating its superior performance.

SCOPA: Soft Code-Switching and Pairwise Alignment for Zero-Shot Cross-lingual Transfer

Dohyeon Lee
Jaeseong Lee
Gyewon Lee
Byung-gon Chun
Seung-won Hwang

The recent advent of cross-lingual embeddings, such as multilingual BERT (mBERT), provides a strong baseline for zero-shot cross-lingual transfer. There is also increasing research attention on reducing the alignment discrepancy of cross-lingual embeddings between source and target languages, via generating code-switched sentences by substituting randomly selected words in the source languages with their counterparts of the target languages. Although these approaches improve the performance, naively code-switched sentences can have inherent limitations. In this paper, we propose SCOPA, a novel technique to improve the performance of zero-shot cross-lingual transfer. Instead of using the embeddings of code-switched sentences directly, SCOPA mixes them softly with the embeddings of original sentences. In addition, SCOPA utilizes an additional pairwise alignment objective, which aligns the vector differences of word pairs instead of word-level embeddings, in order to transfer contextualized information between different languages while preserving language-specific information. Experiments on the PAWS-X and MLDoc datasets show the effectiveness of SCOPA.
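
The two ideas in this abstract, soft mixing and alignment of vector differences, can be sketched in a few lines. The formulas below are assumptions made for illustration, not the authors' definitions: soft mixing is written as a convex combination with a hypothetical weight `lam`, and the pairwise objective as the mean squared distance between difference vectors of aligned word pairs. Plain Python lists stand in for embedding tensors.

```python
def soft_mix(orig, switched, lam=0.5):
    """Convex combination of an original and a code-switched embedding.

    Assumption: a single scalar mixing weight `lam` in [0, 1].
    """
    if len(orig) != len(switched):
        raise ValueError("embeddings must have the same dimension")
    return [lam * o + (1 - lam) * s for o, s in zip(orig, switched)]


def pairwise_alignment_loss(src, tgt):
    """Mean squared distance between difference vectors of word pairs.

    `src[k]` and `tgt[k]` are embeddings of a word and its translation.
    The loss compares (src[i] - src[j]) with (tgt[i] - tgt[j]) rather than
    the embeddings themselves, so a constant per-language offset costs nothing.
    """
    total, pairs = 0.0, 0
    for i in range(len(src)):
        for j in range(i + 1, len(src)):
            diff_src = [a - b for a, b in zip(src[i], src[j])]
            diff_tgt = [a - b for a, b in zip(tgt[i], tgt[j])]
            total += sum((x - y) ** 2 for x, y in zip(diff_src, diff_tgt))
            pairs += 1
    return total / pairs if pairs else 0.0
```

In this sketch, shifting every target embedding by the same vector leaves the loss at zero, which is one simple way to see how aligning differences can preserve language-specific information while aligning word-level embeddings directly would not.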

CrossAug: A Contrastive Data Augmentation Method for Debiasing Fact Verification Models

Minwoo Lee
Seungpil Won
Juae Kim
Hwanhee Lee
Cheoneum Park
Kyomin Jung

Fact verification datasets are typically constructed using crowdsourcing techniques due to the lack of text sources with veracity labels. However, the crowdsourcing process often produces undesired biases in data that cause models to learn spurious patterns. In this paper, we propose CrossAug, a contrastive data augmentation method for debiasing fact verification models. Specifically, we employ a two-stage augmentation pipeline to generate new claims and evidences from existing samples. The generated samples are then paired cross-wise with the original pair, forming contrastive samples that facilitate the model to rely less on spurious patterns and learn more robust representations. Experimental results show that our method outperforms the previous state-of-the-art debiasing technique by 3.6% on the debiased extension of the FEVER dataset, with a total performance boost of 10.13% from the baseline. Furthermore, we evaluate our approach in data-scarce settings, where models can be more susceptible to biases due to the lack of training data. Experimental results demonstrate that our approach is also effective at debiasing in these low-resource conditions, exceeding the baseline performance on the Symmetric dataset with just 1% of the original data.

Dual Correction Strategy for Ranking Distillation in Top-N Recommender System

Youngjune Lee
Kee-Eung Kim

Knowledge Distillation (KD), which transfers the knowledge of a well-trained large model (teacher) to a small model (student), has become an important area of research for practical deployment of recommender systems. Recently, Relaxed Ranking Distillation (RRD) has shown that distilling the ranking information in the recommendation list significantly improves the performance. However, the method still has limitations in that 1) it does not fully utilize the prediction errors of the student model, which makes the training less efficient, and 2) it only distills the user-side ranking information, which provides an insufficient view under sparse implicit feedback. This paper presents the Dual Correction strategy for Distillation (DCD), which transfers the ranking information from the teacher model to the student model in a more efficient manner. Most importantly, DCD uses the discrepancy between the teacher model and the student model predictions to decide which knowledge to distill. By doing so, DCD essentially provides learning guidance tailored to "correcting" what the student model has failed to accurately predict. This process is applied to transfer the ranking information from the user side as well as the item side to address sparse implicit user feedback. Our experiments show that the proposed method outperforms the state-of-the-art baselines, and ablation studies validate the effectiveness of each component.

Capsule Graph Neural Networks with EM Routing

Yu Lei
Jing Zhang

To effectively classify graph instances, graph neural networks need to have the capability to capture the part-whole relationship existing in a graph. A capsule is a group of neurons representing complicated properties of entities, which has shown its advantages in traditional convolutional neural networks. This paper proposes novel Capsule Graph Neural Networks that use the EM routing mechanism (CapsGNNEM) to generate high-quality graph embeddings. Experimental results on a number of real-world graph datasets demonstrate that the proposed CapsGNNEM outperforms nine state-of-the-art models in graph classification tasks.

Hubness-aware User Identity Linkage

Chaozhuo Li
Senzhang Wang
Feiran Huang
Jie Xu
Philip Yu

Nowadays, it is common for one natural person to join multiple social networks to enjoy different types of services. User identity linkage (UIL), which aims to link identical identities across different social platforms, has attracted increasing research interest recently. Most existing approaches focus on the sophisticated architecture engineering of the linkage model but ignore the challenge of hubness in the post-processing nearest neighbor search phase. Hubness appears when some identities in a social platform, called hubs, are extraordinarily close to the identities in the other platform, which degrades the alignment performance. Different from existing heuristic methods, in this paper we propose a hubness-aware user identity linkage model, HAUIL, to smoothly learn hubless linkage signals. A carefully designed objective function is presented to explicitly mitigate the hubness information from the pre-learned linkage guidance. HAUIL can be easily adapted to most existing UIL models. Empirically, we evaluate HAUIL over multiple publicly available datasets, and the experimental results demonstrate its superiority.
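
To make the hubness phenomenon concrete, the sketch below computes the k-occurrence count, a common diagnostic that records how often each target-platform vector appears among the k nearest neighbours of the source-platform vectors. It only measures hubness and is not the HAUIL objective; the function name and the choice of squared Euclidean distance are assumptions.

```python
def k_occurrence(source, target, k=1):
    """Count how often each target vector appears among the k nearest
    neighbours (by squared Euclidean distance) of the source vectors."""
    counts = [0] * len(target)
    for s in source:
        dists = [sum((x - y) ** 2 for x, y in zip(s, t)) for t in target]
        nearest = sorted(range(len(target)), key=dists.__getitem__)[:k]
        for idx in nearest:
            counts[idx] += 1
    return counts
```

A heavily skewed count vector, where a few target identities collect most of the nearest-neighbour slots, indicates the hubs that the abstract describes.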

Graph-based Semi-Supervised Learning by Strengthening Local Label Consistency

Chen Li
Xutan Peng
Hao Peng
Jia Wu
Lihong Wang
Philip S. Yu
Jianxin Li
Lichao Sun

Graph-based algorithms have drawn much attention thanks to their impressive success in semi-supervised setups. For better model performance, previous studies have learned to transform the topology of the input graph. However, these works only focus on optimizing the original nodes and edges, leaving the direction of augmenting existing data insufficiently explored. In this paper, we propose a novel heuristic pre-processing technique, namely Local Label Consistency Strengthening (LLCS), which automatically expands new nodes and edges to refine the label consistency within a dense subgraph. Our framework can effectively benefit downstream models by substantially enlarging the original training set with high-quality generated labeled data and refining the original graph topology. To justify the generality and practicality of LLCS, we couple it with the popular graph convolution network and graph attention network to perform extensive evaluations on three standard datasets. In all setups tested, our method boosts the average accuracy by a large margin of 4.7% and consistently outperforms the state-of-the-art.

AdaptiveGCN: Efficient GCN Through Adaptively Sparsifying Graphs

Dongyue Li
Tao Yang
Lun Du
Zhezhi He
Li Jiang

Graph Convolutional Networks (GCNs) have become the prevailing approach to efficiently learn representations from graph-structured data. Current GCN models adopt a neighborhood aggregation mechanism based on two primary operations, aggregation and combination. The workload of these two processes is determined by the input graph structure, making the graph input the bottleneck of GCN processing. Meanwhile, a large amount of task-irrelevant information in the graphs can hurt the model's generalization performance. This brings the opportunity of studying how to remove the redundancy in the graphs. In this paper, we aim to accelerate GCN models by removing the task-irrelevant edges in the graph. We present AdaptiveGCN, an efficient and supervised graph sparsification framework. AdaptiveGCN adopts an edge predictor module that learns edge selection strategies from the downstream task feedback signals, separately and adaptively for each GCN layer, in the training stage; in the test stage, inference uses only the selected edges to speed up the GCN computation. The experimental results indicate that AdaptiveGCN yields an average GCN model speed-up of 43% (on CPU) and 39% (on GPU) with comparable model performance on public graph learning benchmarks.

Multi-subspace Implicit Alignment for Cross-modal Retrieval on Cooking Recipes and Food Images

Lin Li
Ming Li
Zichen Zan
Qing Xie
Jianquan Liu

Cross-modal retrieval technology can help people quickly obtain mutual information between cooking recipes and food images. Both the embeddings of the image and the recipe consist of multiple representation subspaces. We argue that multiple aspects in the recipe are related to multiple regions in the food image. It is challenging to improve the cross-modal retrieval quality by making full use of the implicit connection between multiple subspaces of recipes and images. In this paper, we propose a multi-subspace implicit alignment cross-modal retrieval framework for recipes and images. Our framework learns multi-subspace information about cooking recipes and food images with multi-head attention networks; the implicit alignment at the subspace level helps narrow the semantic gap between recipe embeddings and food image embeddings; triple loss and adversarial loss are combined to support cross-modal learning. The experimental results show that our framework significantly outperforms state-of-the-art methods in terms of MedR and R@K on Recipe 1M.

OTCMR: Bridging Heterogeneity Gap with Optimal Transport for Cross-modal Retrieval

Mingyang Li
Shao-Lun Huang
Lin Zhang

Cross-modal retrieval is a classic task in the multimedia community, which aims to search for semantically similar results from different modalities. The core of cross-modal retrieval is to learn the most correlated features in a common feature space for the multi-modal data so that the similarity can be directly measured. In this paper, we propose a novel model using optimal transport for bridging the heterogeneity gap in cross-modal retrieval tasks. Specifically, we calculate the optimal transport plans between feature distributions of different modalities and then minimize the transport cost by optimizing the feature embedding functions. In this way, the feature distributions of multi-modal data can be well aligned in the common feature space. In addition, our model combines the complementary losses in different levels: 1) semantic level, 2) distributional level, and 3) pairwise level for improving cross-modal retrieval performance. In extensive experiments, our method outperforms many other cross-modal retrieval methods, which proves the efficacy of using optimal transport in cross-modal retrieval tasks.
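
As a rough illustration of computing a transport plan and its cost between two feature distributions, the sketch below uses Sinkhorn iterations for entropy-regularised optimal transport. The abstract does not say which solver the authors use, so Sinkhorn, the function names, and the `reg` and `iters` parameters are all assumptions chosen for brevity.

```python
import math


def sinkhorn_plan(cost, a, b, reg=0.1, iters=200):
    """Entropy-regularised transport plan between histograms `a` and `b`
    for the given cost matrix (list of rows)."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of moving mass according to `plan`."""
    return sum(p * c for p_row, c_row in zip(plan, cost) for p, c in zip(p_row, c_row))
```

In a training loop, the cost matrix would be recomputed from the current image and text embeddings, and the resulting transport cost would serve as one loss term for the embedding functions.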

Span-Level Emotion Cause Analysis by BERT-based Graph Attention Network

Xiangju Li
Wei Gao
Shi Feng
Daling Wang
Shafiq Joty

We study the task of span-level emotion cause analysis (SECA), which is focused on identifying the specific emotion cause span(s) triggering a certain emotion in the text. Compared to the popular clause-level emotion cause analysis (CECA), it is a finer-grained emotion cause analysis (ECA) task. In this paper, we design a BERT-based graph attention network for emotion cause span(s) identification. The proposed model takes advantage of the structure of BERT to capture the relationship information between emotion and text, and utilizes a graph attention network to model the structure information of the text. Our SECA method can be easily used for extracting clause-level emotion causes for CECA as well. Experimental results show that the proposed method consistently outperforms the state-of-the-art ECA methods on a benchmark emotion cause dataset.

Span-level Emotion Cause Analysis with Neural Sequence Tagging

Xiangju Li
Wei Gao
Shi Feng
Daling Wang
Shafiq Joty

This paper addresses the task of span-level emotion cause analysis (SECA). It is a finer-grained emotion cause analysis (ECA) task, which aims to identify the specific emotion cause span(s) behind certain emotions in text. In this paper, we formalize SECA as a sequence tagging task, for which we propose several variants of neural network-based sequence tagging models to extract specific emotion cause span(s) in the given context. These models combine different types of encoding and decoding approaches. Furthermore, to make our models more "emotionally sensitive", we utilize the multi-head attention mechanism to enhance the representation of context. Experimental evaluations conducted on two benchmark datasets demonstrate the effectiveness of the proposed models.

Vandalism Detection in OpenStreetMap via User Embeddings

Yinxiao Li
Jennings Anderson
Yiqi Niu

OpenStreetMap (OSM) is a free and openly-editable database of geographic information. Over the years, OSM has evolved into the world's largest open knowledge base of geospatial data, and protecting OSM from the risk of vandalized and falsified information has become paramount to ensuring its continued success. However, despite the increasing usage of OSM and a wide interest in vandalism detection on open knowledge bases such as Wikipedia and Wikidata, OSM has not attracted as much attention from the research community, partially due to the lack of a publicly available vandalism corpus. In this paper, we report on the construction of the first OSM vandalism corpus, and release it publicly. We describe a user embedding approach to create OSM user embeddings and add embedding features to a machine learning model to improve vandalism detection in OSM. We validate the model against our vandalism corpus, and observe solid improvements in key metrics. The validated model is deployed into production for vandalism detection on Daylight Map.

Graph Representation Learning via Adversarial Variational Bayes

Yunhe Li
Yaochen Hu
Yingxue Zhang

Methods that learn representations of nodes in a graph play an important role in network analysis. Most of the existing methods of graph representation learning have focused on embedding each node in a graph as a single vector in a low-dimensional continuous space. However, these methods have a crucial limitation: they do not model the uncertainty about the representation. In this work, inspired by Adversarial Variational Bayes (AVB) [22], we propose GraphAVB, a probabilistic generative model to learn node representations that preserve connectivity patterns and capture the uncertainties in the graph. Unlike Graph2Gauss [3], which embeds each node as a Gaussian distribution, we represent each node as an implicit distribution parameterized by a neural network in the latent space, which is more flexible and expressive for capturing the complex uncertainties in real-world graph-structured datasets. To perform the designed variational inference algorithm with neural samplers, we introduce an auxiliary discriminative network that is used to infer the log probability ratio terms in the objective function and allows us to cast maximizing the objective function as a two-player game. Experimental results on multiple real-world graph datasets demonstrate the effectiveness of our proposed method GraphAVB, outperforming many competitive baselines on the task of link prediction. The superior performance of GraphAVB also demonstrates that downstream tasks can benefit from the captured uncertainty.

Enhancing Aspect-Based Sentiment Analysis with Supervised Contrastive Learning

Bin Liang
Wangda Luo
Xiang Li
Lin Gui
Min Yang
Xiaoqi Yu
Ruifeng Xu

Most existing aspect-based sentiment analysis (ABSA) research efforts are devoted to extracting the aspect-dependent sentiment features from the sentence towards the given aspect. However, it is observed that about 60% of the testing aspects in commonly used public datasets are unknown to the training set. That is, some sentiment features carry the same polarity regardless of the aspects they are associated with (aspect-invariant sentiment), which props up the high accuracy of existing ABSA models when inevitably inferring sentiment polarities for those unknown testing aspects. Therefore, in this paper, we revisit ABSA from a novel perspective by deploying a novel supervised contrastive learning framework to leverage the correlation and difference among different sentiment polarities and between different sentiment patterns (aspect-invariant/-dependent). This allows improving the sentiment prediction for (unknown) testing aspects in the light of distinguishing the roles of valuable sentiment features. Experimental results on 5 benchmark datasets show that our proposed approach substantially outperforms state-of-the-art baselines in ABSA. We further extend existing neural network-based ABSA models with our proposed framework and achieve improved performance.

Multivariate and Propagation Graph Attention Network for Spatial-Temporal Prediction with Outdoor Cellular Traffic

Chung-Yi Lin
Hung-Ting Su
Shen-Lung Tung
Winston H. Hsu

Spatial-temporal prediction is a critical problem for intelligent transportation, which is helpful for tasks such as traffic control and accident prevention. Previous studies rely on large-scale traffic data collected from sensors. However, it is impractical to deploy sensors in all regions due to the device and maintenance costs. This paper addresses the problem via outdoor cellular traffic distilled from over two billion records per day in a telecom company, because outdoor cellular traffic induced by user mobility is highly related to transportation traffic. We study road intersections in urban areas and aim to predict the future outdoor cellular traffic of all intersections given historic outdoor cellular traffic. Furthermore, we propose a new model for multivariate spatial-temporal prediction, mainly consisting of two extended graph attention networks (GAT). The first GAT is used to explore correlations among multivariate cellular traffic. The second GAT incorporates the attention mechanism into graph propagation to increase the efficiency of capturing spatial dependency. Experiments show that the proposed model significantly outperforms the state-of-the-art methods on our dataset.

GGP: A Graph-based Grouping Planner for Explicit Control of Long Text Generation

Xuming Lin
Shaobo Cui
Zhongzhou Zhao
Wei Zhou
Ji Zhang
Haiqing Chen

Existing data-driven methods can handle short text generation well. However, when applied to long text generation scenarios such as story generation or advertising text generation in the commercial scenario, these methods may generate illogical and uncontrollable texts. To address these issues, we propose a graph-based grouping planner (GGP) following the idea of first-plan-then-generate. Specifically, given a collection of key phrases, GGP first encodes these phrases into an instance-level sequential representation and a corpus-level graph-based representation separately. With these two synergic representations, we then regroup these phrases into a fine-grained plan, based on which we generate the final long text. We conduct our experiments on three long text generation datasets, and the experimental results reveal that GGP significantly outperforms baselines, which indicates that GGP can control long text generation by knowing how to say things and in what order.

State-Aware Meta-Evaluation of Evaluation Metrics in Interactive Information Retrieval

Jiqun Liu
Ran Yu

In interactive IR (IIR), users often seek to achieve different goals (e.g. exploring a new topic, finding a specific known item) at different search iterations and thus may evaluate system performance differently. Without a state-aware approach, it would be extremely difficult to simulate and achieve real-time adaptive search evaluation and recommendation. To address this gap, our work identifies users' task states from interactive search sessions and meta-evaluates a series of online and offline evaluation metrics under varying states based on a user study dataset consisting of 1548 unique query segments from 450 search sessions. Our results indicate that: 1) users' individual task states can be identified and predicted from search behaviors and implicit feedback; 2) the effectiveness of mainstream evaluation measures (measured based upon their respective correlations with user satisfaction) varies significantly across task states. This study demonstrates the implicit heterogeneity in user-oriented IR evaluation and connects studies on complex search tasks with evaluation techniques. It also informs future research on the design of state-specific, adaptive user models and evaluation metrics.

Deep Active Learning for Text Classification with Diverse Interpretations

Qiang Liu
Yanqiao Zhu
Zhaocheng Liu
Yufeng Zhang
Shu Wu

Recently, Deep Neural Networks (DNNs) have made remarkable progress in text classification, but they still require a large amount of labeled data. To train high-performing models with minimal annotation cost, active learning is proposed to select and label the most informative samples, yet it is still challenging to measure the informativeness of samples used in DNNs. In this paper, inspired by the piece-wise linear interpretability of DNNs, we propose a novel Active Learning with DivErse iNterpretations (ALDEN) approach. With local interpretations in DNNs, ALDEN identifies linearly separable regions of samples. Then, it selects samples according to the diversity of their local interpretations and queries their labels. To tackle the text classification problem, we choose the word with the most diverse interpretations to represent the whole sentence. Extensive experiments demonstrate that ALDEN consistently outperforms several state-of-the-art deep active learning methods.

Fine-Grained Element Identification in Complaint Text of Internet Fraud

Tong Liu
Siyuan Wang
Jingchao Fu
Lei Chen
Zhongyu Wei
Yaqi Liu
Heng Ye
Liaosa Xu
Weiqiang Wang
Xuanjing Huang

Existing systems dealing with online complaints provide a final decision without explanations. We propose to analyse the complaint text of internet fraud in a fine-grained manner. Considering that the complaint text includes multiple clauses with various functions, we propose to identify the role of each clause and classify the clauses into different types of fraud element. We construct a large labeled dataset originating from a real finance service platform. We build an element identification model on top of BERT and propose two additional modules that utilize the context of the complaint text for better element label classification, namely, a global context encoder and a label refiner. Experimental results show the effectiveness of our model.

Age Inference Using A Hierarchical Attention Neural Network

Yaguang Liu
Lisa Singh

While demographic attributes, such as age, gender, and location, have been extensively studied, most previous studies usually combine different sources of data, such as the user's biography, pictures, posts, and the user's network to obtain reasonable inference accuracies. However, it is not always practical to collect all those different forms of data. Therefore, in this paper, we consider methods for inferring age that only use Twitter posts (tweet text and emojis). We propose a hierarchical attention neural model that integrates independent linguistic knowledge gained from text and emojis when making a prediction. This hierarchical model is able to capture the intra-post relationship between these different post components, as well as the inter-post relationships of a user's posts. Our empirical evaluation using a data set generated from Wikidata demonstrates that our model achieves better performance than the state-of-the-art models, and still performs well when the number of posts per user is reduced in the training data set.

Learning Representations of Inactive Users: A Cross Domain Approach with Graph Neural Networks

Ziqi Liu
Yue Shen
Xiaocheng Cheng
Qiang Li
Jianping Wei
Zhiqiang Zhang
Dong Wang
Xiaodong Zeng
Jinjie Gu
Jun Zhou

Understanding inactive users is the key to user growth and engagement for many Internet companies. However, learning inactive users' representations and their preferences is still challenging because features are missing and positive responses or labels are insufficient. In this paper, we propose a cross-domain learning approach to exclusively recommend customized items to inactive users by leveraging the knowledge of active users. In particular, we represent users, whether active or inactive, by their friends' browsing behaviors based on a graph neural network (GNN) layer atop a heterogeneous graph defined on social networks (user-user friendships) and browsing behaviors (user-page clicks). We jointly optimize the learning tasks of active users in the source domain and inactive users in the target domain based on the domain-invariant features extracted from the embedding of our GNN layer; these features are learned to benefit both tasks on active and inactive users, and are indiscriminate with respect to the shift between the domains. Extensive experiments using both public data and real-world data at Alipay show that our approach can capture the preferences of inactive users well.

FedSkel: Efficient Federated Learning on Heterogeneous Systems with Skeleton Gradients Update

Junyu Luo
Jianlei Yang
Xucheng Ye
Xin Guo
Weisheng Zhao

Federated learning aims to protect users' privacy while performing data analysis from different participants. However, it is challenging to guarantee the training efficiency on heterogeneous systems due to varying computational capabilities and communication bottlenecks. In this work, we propose FedSkel to enable computation-efficient and communication-efficient federated learning on edge devices by only updating the model's essential parts, named skeleton networks. FedSkel is evaluated on real edge devices with imbalanced datasets. Experimental results show that it can achieve up to 5.52x speedups for CONV layers' back-propagation and 1.82x speedups for the whole training process, and reduce communication cost by 64.8%, with negligible accuracy loss.

Operation Diagnosis on Procedure Graph: The Task and Dataset

Ruipu Luo
Qi Zhu
Qin Chen
Siyuan Wang
Zhongyu Wei
Weijian Sun
Shuang Tang

Users usually consult the manufacturers or the internet when they encounter operation questions with an electronic product. In this paper, we explore representing an operation question as a procedure graph and formulate the problem of operation diagnosis as two sub-tasks on top of the graph, namely error node detection and correction. We construct the first benchmark for this task and propose a transformer-based model that integrates external knowledge and context information to enhance the performance. Experimental results show the effectiveness of our proposed model.

Review-Aware Neural Recommendation with Cross-Modality Mutual Attention

Songyin Luo
Xiangkui Lu
Jun Wu
Jianbo Yuan

Two-tower neural networks are popularly used in review-aware recommender systems, in which two encoders are separately employed to learn representations for users and items from reviews. However, such an architecture isolates the information exchange between two encoders, resulting in suboptimal recommendation accuracy. To this end, we propose a novel two-tower style Neural Recommendation with Cross-modality Mutual Attention (NRCMA), which bridges user encoder and item encoder crossing reviews and ratings, in order to select informative words and reviews to learn better representation for users and items. Extensive experiments on three benchmark datasets demonstrate that the cross-modality mutual attention is beneficial to two-tower neural networks, and NRCMA consistently outperforms state-of-the-art review-aware item recommendation techniques.

Leveraging Domain Information to Classify Financial Documents via Unsupervised Graph Momentum Contrast

Xueni Luo
Dawei Cheng
Haorui Ma
Junhao Wang
Mengzhen Fan
Yifeng Luo

Financial documents often contain rich domain information, such as named entities, which could be used to indicate the documents' classification categories. Existing classification methods either ignore this financial domain information, resulting in suboptimal performance, or train document representations in supervised ways, with expensive data labeling costs. In this paper, we propose to leverage domain information to improve classification performance for financial documents, via a graph representation learning model, namely G-MoCo, based on unsupervised graph momentum contrast. With G-MoCo, we can extract latent features from massive unlabeled raw data, and then further use the learned representations for document classification. Compared with the state-of-the-art baselines, representations learned by our method improve performance by significant margins on a financial document dataset and three non-financial public graph datasets.

Smoothing with Fake Label

Ziyang Luo
Yadong Xi
Xiaoxi Mao

Label Smoothing is a widely used technique in many areas. It can prevent the network from being over-confident. However, it assumes that the prior distribution of all classes is uniform. Here, we abandon this assumption and propose a new smoothing method, called Smoothing with Fake Label. It assigns a part of the prediction probability to a new fake class. Our experimental results show that the method can increase the performance of the models on most tasks and outperform Label Smoothing on text classification and cross-lingual transfer tasks.
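
The contrast between uniform label smoothing and moving probability mass to an extra class can be sketched as follows. The abstract does not give the exact formulation, so the target construction in `fake_label_target` (all smoothing mass `eps` placed on one additional class index) is an assumption for illustration, as are the function names.

```python
import math


def label_smoothing_target(true_class, num_classes, eps=0.1):
    """Standard label smoothing: spread eps uniformly over all classes."""
    target = [eps / num_classes] * num_classes
    target[true_class] += 1.0 - eps
    return target


def fake_label_target(true_class, num_classes, eps=0.1):
    """Assumed variant: place eps on one extra 'fake' class, stored at
    index num_classes, and leave the other real classes at zero."""
    target = [0.0] * (num_classes + 1)
    target[true_class] = 1.0 - eps
    target[num_classes] = eps
    return target


def cross_entropy(target, logits):
    """Cross-entropy between a soft target and softmax(logits)."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target, logits))
```

Under this reading, the model would output one more logit than there are real classes during training, and the fake class would simply be ignored at prediction time.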

Learning Sparse Binary Code for Maximum Inner Product Search

Changyi Ma
Fangchen Yu
Yueyao Yu
Wenye Li

Maximum inner product search (MIPS), combined with the hashing method, has become a standard solution to similarity search problems. It often achieves an order of magnitude speedup over nearest neighbor search (NNS) under similar settings. Motivated by the work and achievements along this line, in this paper, we developed a sparse binary hashing method for MIPS to preserve the pairwise similarities with the support of two asymmetric hash functions. We proposed a simple and efficient algorithm that learns two hash functions for the query database and the search database respectively. We conducted experiments to evaluate the proposed method, relying on image retrieval tasks on four benchmark datasets. The empirical results clearly demonstrated the algorithm's promising potential on practical applications in terms of search accuracy and scalability.

Temporal Network Embedding via Tensor Factorization

Jing Ma
Qiuchen Zhang
Jian Lou
Li Xiong
Joyce C. Ho

Representation learning on static graph-structured data has shown a significant impact on many real-world applications. However, less attention has been paid to the evolving nature of temporal networks, in which the edges are often changing over time. The embeddings of such temporal networks should encode both graph-structured information and the temporally evolving pattern. Existing approaches in learning temporally evolving network representations fail to capture the temporal interdependence. In this paper, we propose Toffee, a novel approach for temporal network representation learning based on tensor decomposition. Our method exploits the tensor-tensor product operator to encode the cross-time information, so that the periodic changes in the evolving networks can be captured. Experimental results demonstrate that Toffee outperforms existing methods on multiple real-world temporal networks in generating effective embeddings for the link prediction tasks.

On Approximate Nearest Neighbour Selection for Multi-Stage Dense Retrieval

Craig Macdonald
Nicola Tonellotto

Dense retrieval, which describes the use of contextualised language models such as BERT to identify documents from a collection by leveraging approximate nearest neighbour (ANN) techniques, has been increasing in popularity. Two families of approaches have emerged, depending on whether documents and queries are represented by single or multiple embeddings. ColBERT, the exemplar of the latter, uses an ANN index and approximate scores to identify a set of candidate documents for each query embedding, which are then re-ranked using accurate document representations. In this manner, a large number of documents can be retrieved for each query, hindering the efficiency of the approach. In this work, we investigate the use of ANN scores for ranking the candidate documents, in order to decrease the number of candidate documents being fully scored. Experiments conducted on the MSMARCO passage ranking corpus demonstrate that, by cutting off the candidate set to only 200 documents using the approximate scores, we can still obtain an effective ranking without statistically significant differences in effectiveness, while achieving a 2x speedup.
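
The pruning step described here, keeping only the best candidates by approximate score before running the expensive exact scoring, can be sketched as below. The function name, the dictionary-based interface, and the assumption that approximate scores have already been aggregated per document are illustrative choices, not part of ColBERT or of the paper's code.

```python
def prune_then_rerank(approx_scores, exact_score, k=200):
    """Keep the k best documents by approximate (ANN) score, then
    re-rank only those with the expensive exact scorer.

    approx_scores: dict mapping doc_id -> approximate score
    exact_score:   callable mapping doc_id -> exact score
    """
    candidates = sorted(approx_scores, key=approx_scores.get, reverse=True)[:k]
    rescored = [(doc_id, exact_score(doc_id)) for doc_id in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

The cost saving comes from calling `exact_score` at most `k` times per query instead of once per retrieved candidate.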

GDFM: Gene Vectors Embodied Deep Attentional Factorization Machines for Interaction prediction

Sameen Mansha
Tayyab Khalid
Faisal Kamiran
Masroor Hussain
Syed Fawad Hussain
Hongzhi Yin

Gene Network Graphs (GNGs) are comprised of biomedical data. Deriving structural information from these graphs remains a prime area of research in the domain of biomedical and health informatics. In this paper, we propose Gene Vectors Embodied Deep Attentional Factorization Machines (GDFMs) for gene-to-gene interaction prediction. We first initialize GDFM with vector embeddings learned from gene locality configuration and an expression equivalence criterion that preserves their innate similar traits. GDFM uses an attention-based mechanism that manipulates different positions to learn the representation of a sequence before calculating the pairwise factorized interactions. We further use hidden layers, batch normalization, and dropout to stabilize the performance of our deep structured architecture. An extensive comparison with several state-of-the-art approaches, using the Ecoli and Yeast datasets for gene-gene interaction prediction, shows the significance of our proposed framework.

Neuron Campaign for Initialization Guided by Information Bottleneck Theory

Haitao Mao
Xu Chen
Qiang Fu
Lun Du
Shi Han
Dongmei Zhang

Initialization plays a critical role in the training of deep neural networks (DNNs). Existing initialization strategies mainly focus on stabilizing the training process to mitigate vanishing and exploding gradient problems. However, these initialization methods give little consideration to how to enhance generalization ability. The Information Bottleneck (IB) theory is a well-known framework for explaining the generalization of DNNs. Guided by the insights provided by IB theory, we design two criteria for better initializing DNNs. We further design a neuron campaign initialization algorithm to efficiently select a good initialization for a neural network on a given dataset. Experiments on the MNIST dataset show that our method can lead to better generalization performance with faster convergence.

On Skipping Behaviour Types in Music Streaming Sessions

Francesco Meggetto
Crawford Revie
John Levine
Yashar Moshfeghi

The ability to skip songs is a core feature in modern online streaming services. Its introduction has led to a new music listening paradigm and has changed the way users interact with the underlying services. Thus, understanding their skipping activity during listening sessions has acquired considerable importance. This is because such an implicit feedback signal can be considered a measure of users' satisfaction (dissatisfaction or lack of interest), affecting their engagement with the platforms. Prior work has mainly focused on analysing the skipping activity at an individual song level. In this work, we investigate different behaviours during entire listening sessions with regard to the users' session-based skipping activity. To this end, we propose a data transformation and clustering-based approach to identify and categorise skipping types. Experimental results on a real-world music streaming dataset (Spotify) indicate four main types of session skipping behaviour. A subsequent analysis of short, medium, and long listening sessions demonstrates that these session skipping types are consistent across sessions of varying length. Furthermore, we discuss their distributional differences under various listening context information, i.e., day types (weekday and weekend), times of the day, and playlist types.

Multi-objective Few-shot Learning for Fair Classification

Ishani Mondal
Procheta Sen
Debasis Ganguly

In this paper, we propose a general framework for mitigating the disparities of the predicted classes with respect to secondary attributes within the data (e.g., race, gender etc.). Our proposed method involves learning a multi-objective function that in addition to learning the primary objective of predicting the primary class labels from the data, also employs a clustering-based heuristic to minimize the disparities of the class label distribution with respect to the cluster memberships, with the assumption that each cluster should ideally map to a distinct combination of attribute values. Experiments demonstrate effective mitigation of cognitive biases on a benchmark dataset without the use of annotations of secondary attribute values (the zero-shot case) or with the use of a small number of attribute value annotations (the few-shot case).

MDFEND: Multi-domain Fake News Detection

Qiong Nan
Juan Cao
Yongchun Zhu
Yanyan Wang
Jintao Li

Fake news spreads widely on social media in various domains, which leads to real-world threats in many areas such as politics, disasters, and finance. Most existing approaches focus on single-domain fake news detection (SFND), which leads to unsatisfying performance when these methods are applied to multi-domain fake news detection. As an emerging field, multi-domain fake news detection (MFND) is increasingly attracting attention. However, data distributions, such as word frequency and propagation patterns, vary from domain to domain, a phenomenon known as domain shift. Facing the challenge of serious domain shift, existing fake news detection techniques perform poorly in multi-domain scenarios. Therefore, a specialized model for MFND is needed. In this paper, we first design a benchmark fake news dataset for MFND with annotated domain labels, namely Weibo21, which consists of 4,488 fake news items and 4,640 real news items from 9 different domains. We further propose an effective Multi-domain Fake News Detection Model (MDFEND) that utilizes a domain gate to aggregate multiple representations extracted by a mixture of experts. The experiments show that MDFEND can significantly improve the performance of multi-domain fake news detection. Our dataset and code are available at https://github.com/kennqiang/MDFEND-Weibo21.

Discovery of Temporal Graph Functional Dependencies

Levin Noronha
Fei Chiang

Temporal Graph Functional Dependencies (TGFDs) are a class of data quality rules imposing topological, attribute dependency constraints over a period of time. To make TGFDs usable in practice, we study the TGFD discovery problem, and show the satisfiability, implication, and validation problems for k-bounded TGFDs are in PTIME. We introduce the TGFDMiner algorithm, which discovers minimal, frequent TGFDs. Our evaluation shows the efficiency and effectiveness of TGFDMiner, and the utility of TGFDs.

Lightweight Visual Question Answering using Scene Graphs

Sai Vidyaranya Nuthalapati
Ramraj Chandradevan
Eleonora Giunchiglia
Bowen Li
Maxime Kayser
Thomas Lukasiewicz
Carl Yang

Visual question answering (VQA) is a challenging problem in machine perception, which requires a deep joint understanding of both visual and textual data. Recent research has advanced the automatic generation of high-quality scene graphs from images, while powerful yet elegant models like graph neural networks (GNNs) have shown great power in reasoning over graph-structured data. In this work, we propose to bridge the gap between scene graph generation and VQA by leveraging GNNs. In particular, we design a new model called Conditional Enhanced Graph ATtention network (CE-GAT) to encode pairs of visual and semantic scene graphs with both node and edge features, which is seamlessly integrated with a textual question encoder to generate answers through question-graph conditioning. Moreover, to alleviate the training difficulties of CE-GAT towards VQA, we enforce more useful inductive biases in the scene graphs through novel question-guided graph enriching and pruning. Finally, we evaluate the framework on one of the largest available VQA datasets (namely, GQA) with ground-truth scene graphs, achieving an accuracy of 77.87%, compared with 63.17% for the state of the art (namely, the neural state machine (NSM)). Notably, by leveraging existing scene graphs, our framework is much lighter than end-to-end VQA methods (e.g., about 95.3% fewer parameters than a typical NSM).

Training Neural Networks with Random Noise Images for Adversarial Robustness

Ji-Young Park
Lin Liu
Jiuyong Li
Jixue Liu

Despite their high accuracy, deep neural networks (DNNs) are vulnerable to adversarial examples. Currently, adversarial training is the mainstream defense approach against adversarial examples. However, given the unknown nature of adversarial attacks in real life, this approach has fundamental limitations in practical use, as it is impossible to obtain sufficient adversarial examples for the training. In this paper, we propose RanTrain, a simple training approach which employs a background class with random noise images to augment the original DNN model and training data, without requiring any adversarial examples. Experiments have shown that RanTrain works effectively with different datasets and various DNN structures, and it significantly increases the robustness of DNNs to adversarial examples.
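
The idea of a background class filled with random noise images can be sketched as a data augmentation step. This is only an illustration under stated assumptions: uniform pixel noise, the extra class index `num_classes`, and the helper name `add_noise_background_class` are all hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # a grayscale image as rows of pixel intensities


def add_noise_background_class(
    images: List[Image],
    labels: List[int],
    num_classes: int,
    n_noise: int,
    height: int,
    width: int,
    seed: int = 0,
) -> Tuple[List[Image], List[int]]:
    """Append `n_noise` random noise images labelled with an extra background class."""
    rng = random.Random(seed)
    background_label = num_classes  # assumed index of the added (K+1)-th class
    noise_images = [
        # Assumption: pixel values drawn uniformly from [0, 1).
        [[rng.random() for _ in range(width)] for _ in range(height)]
        for _ in range(n_noise)
    ]
    return list(images) + noise_images, list(labels) + [background_label] * n_noise


originals = [[[0.0, 1.0], [1.0, 0.0]]]
aug_images, aug_labels = add_noise_background_class(
    originals, [1], num_classes=3, n_noise=2, height=2, width=2
)
```

A classifier trained on the augmented set would need one additional output unit for the background class.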

Evaluating Fairness in Argument Retrieval

Sachin Pathiyan Cherumanal
Damiano Spina
Falk Scholer
W. Bruce Croft

Existing commercial search engines often struggle to represent different perspectives of a search query. Argument retrieval systems address this limitation of search engines and provide both positive (PRO) and negative (CON) perspectives about a user's information need on a controversial topic (e.g., climate change). The effectiveness of such argument retrieval systems is typically evaluated based on topical relevance and argument quality, without taking into account the often differing number of documents shown for the argument stances (PRO or CON). Therefore, systems may retrieve relevant passages, but with a biased exposure of arguments. In this work, we analyze a range of non-stochastic fairness-aware ranking and diversity metrics to evaluate the extent to which argument stances are fairly exposed in argument retrieval systems.

Using the official runs of the argument retrieval task Touché at CLEF 2020, as well as synthetic data to control the amount and order of argument stances in the rankings, we show that systems with the best effectiveness in terms of topical relevance are not necessarily the most fair or the most diverse in terms of argument stance. The relationships we found between (un)fairness and diversity metrics shed light on how to evaluate group fairness -- in addition to topical relevance -- in argument retrieval settings.

Evaluating the Prediction Bias Induced by Label Imbalance in Multi-label Classification

Luca Piras
Ludovico Boratto
Guilherme Ramos

Prediction bias is a well-known problem in classification algorithms, which tend to be skewed towards more represented classes. This phenomenon is even more remarkable in multi-label scenarios, where the number of underrepresented classes is usually larger. In light of this, we hereby present the Prediction Bias Coefficient (PBC), a novel measure that aims to assess the bias induced by label imbalance in multi-label classification. The approach leverages Spearman's rank correlation coefficient between the label frequencies and the F-scores obtained for each label individually. After describing the theoretical properties of the proposed indicator, we illustrate its behaviour on a classification task performed with state-of-the-art methods on two real-world datasets, and we compare it experimentally with other metrics described in the literature.
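
The core quantity named in this abstract, Spearman's rank correlation between label frequencies and per-label F-scores, can be computed as in the sketch below. Whether the published PBC applies any further normalization is not stated here, so the function `spearman` should be read as the underlying correlation only; the example numbers are made up for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


label_frequencies = [500, 120, 40, 10]   # hypothetical label counts
per_label_f1 = [0.9, 0.7, 0.5, 0.2]      # hypothetical per-label F-scores
rho = spearman(label_frequencies, per_label_f1)
```

A value near 1 would indicate that frequent labels are systematically predicted better than rare ones, while a value near 0 would indicate no monotonic relationship.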

Trilateral Spatiotemporal Attention Network for User Behavior Modeling in Location-based Search

Yi Qi
Ke Hu
Bo Zhang
Jia Cheng
Jun Lei

In location-based search, a user's click behavior is naturally bound to trilateral spatiotemporal information, i.e., the locations of historical user requests, the locations of the corresponding clicked items, and the times at which historical clicks occurred. Appropriate modeling of the trilateral spatiotemporal user click behavior sequence is key to the success of any location-based search service. Though abundant and helpful, existing user behavior modeling methods are insufficient for modeling the rich patterns in trilateral spatiotemporal sequences in that they ignore the interplay among the request's geographic information, the item's geographic information, and the click time. In this work, we study the user behavior modeling problem in location-based search systematically. We propose TRISAN, short for Trilateral Spatiotemporal Attention Network, a novel attention-based neural model that incorporates temporal relatedness into both the modeling of the item's geographic closeness and the modeling of the request's geographic closeness through a fusion mechanism. In addition, we propose to model the geographic closeness both by distance and by semantic similarity. Extensive experiments demonstrate that the proposed method outperforms existing methods by a large margin and that every part of our modeling strategy contributes to its final success.

Reputation Equity in Ranking Systems

Guilherme Ramos
Ludovico Boratto
Mirko Marras

The impact of ranking systems on humans is an aspect that is getting a lot of attention. In this paper, we consider a class of algorithms, known as reputation-based ranking systems, which rank the items based on a reputation score automatically computed for each user. Recent literature introduced the concept of reputation independence, which considers a sensitive attribute of the users (such as gender or age) and makes the reputation scores independent from that attribute. Here, we show that if we consider a different sensitive attribute w.r.t. a user to introduce independence, reputation scores are still biased. To overcome this issue, we propose an approach to attain equity in the reputation scores computation, independently of any sensitive attribute that characterizes the users.

Student Can Also be a Good Teacher: Extracting Knowledge from Vision-and-Language Model for Cross-Modal Retrieval

Jun Rao
Tao Qian
Shuhan Qi
Yulin Wu
Qing Liao
Xuan Wang

Astounding results from transformer models with Vision-and-Language Pretraining (VLP) on joint vision-and-language downstream tasks have intrigued the multi-modal community. On the one hand, these models are usually so large that they are difficult to fine-tune and to serve in real-time online applications. On the other hand, compressing the original transformer block ignores the difference in information between modalities, which leads to a sharp decline in retrieval accuracy.

In this work, we present a very light and effective cross-modal retrieval model compression method. With this method, by adopting a novel random replacement strategy and knowledge distillation, our module can learn the knowledge of the teacher with nearly half the number of parameters. Furthermore, our compression method achieves nearly 130x acceleration with acceptable accuracy. To overcome the sharp decline in retrieval accuracy caused by compression, we introduce a co-attention interaction module to reflect the different information and interaction information. Experiments show that a multi-modal co-attention block is more suitable for cross-modal retrieval tasks than the source transformer encoder block.

Accelerating Variant Calling on Human Genomes Using a Commodity Cluster

Praveen Rao
Arun Zachariah
Deepthi Rao
Peter Tonellato
Wesley Warren
Eduardo Simoes

Variant calling is a fundamental task that is performed to identify variants in an individual's genome compared to a reference human genome. This task can enable better understanding of an individual's risk to diseases and eventually lead to new innovations in precision medicine and drug discovery. However, variant calling on a large number of human genome sequences requires significant computing and storage resources. While access to such resources is possible today (e.g., through cloud computing), reducing the cost of analyzing genomes has become a major challenge. Motivated by these reasons, we address the problem of accelerating the variant calling pipeline on a large number of human genome sequences using a commodity cluster. We propose a novel approach that synergistically combines data and task parallelism for different stages of the variant calling pipeline across different sequences with minimal synchronization. Our approach employs futures to enable asynchronous computations in order to improve the overall cluster utilization and thereby, accelerate the variant calling pipeline. On a 16-node cluster, we observed that our approach was 3X-4.7X faster than the state-of-the-art Big Data Genomics software.

A Conditional Cascade Model for Relational Triple Extraction

Feiliang Ren
Longhui Zhang
Shujuan Yin
Xiaofeng Zhao
Shilei Liu
Bochao Li

Tagging-based methods are among the mainstream methods in relational triple extraction. However, most of them suffer greatly from the class imbalance issue. Here we propose a novel tagging-based model that addresses this issue from the following two aspects. First, at the model level, we propose a three-step extraction framework that can greatly reduce the total number of samples, which implicitly decreases the severity of the mentioned issue. Second, at the intra-model level, we propose a confidence-threshold-based cross entropy loss that can directly neglect some samples in the major classes. We evaluate the proposed model on NYT and WebNLG. Extensive experiments show that it can address the mentioned issue effectively and achieves state-of-the-art results on both datasets. The source code of our model is available at: https://github.com/neukg/ConCasRTE.
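
One way to read a cross entropy loss that neglects some samples in the major classes is sketched below. The skipping rule (drop a major-class sample once the model's probability for its true class reaches a threshold), the threshold value, and the function name `thresholded_cross_entropy` are assumptions for illustration; the paper's exact formulation may differ.

```python
import math
from typing import Iterable, Sequence, Set


def thresholded_cross_entropy(
    true_class_probs: Sequence[float],
    labels: Sequence[str],
    major_classes: Set[str],
    threshold: float = 0.9,
) -> float:
    """Mean cross entropy over samples that are not confidently solved major-class cases.

    `true_class_probs[i]` is the predicted probability of sample i's true label.
    """
    total, counted = 0.0, 0
    for prob, label in zip(true_class_probs, labels):
        # Assumed rule: ignore major-class samples the model already predicts confidently.
        if label in major_classes and prob >= threshold:
            continue
        total += -math.log(max(prob, 1e-12))  # clamp to avoid log(0)
        counted += 1
    return total / counted if counted else 0.0


loss = thresholded_cross_entropy(
    true_class_probs=[0.95, 0.6, 0.5],
    labels=["O", "O", "B"],
    major_classes={"O"},
    threshold=0.9,
)
```

In this example the first sample is skipped, so the loss averages over the remaining two samples only.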

Knowledge-Aware Neural Networks for Medical Forum Question Classification

Soumyadeep Roy
Sudip Chakraborty
Aishik Mandal
Gunjan Balde
Prakhar Sharma
Anandhavelu Natarajan
Megha Khosla
Shamik Sural
Niloy Ganguly

Online medical forums have become a predominant platform for answering health-related information needs of consumers. However, with a significant rise in the number of queries and the limited availability of experts, it is necessary to automatically classify medical queries based on a consumer's intention, so that these questions may be directed to the right set of medical experts. Here, we develop a novel medical knowledge-aware BERT-based model (MedBERT) that explicitly gives more weight to medical concept-bearing words and utilizes domain-specific side information obtained from a popular medical knowledge base. We also contribute a multi-label dataset for the Medical Forum Question Classification (MFQC) task. MedBERT achieves state-of-the-art performance on two benchmark datasets and performs very well in low-resource settings.

Will Sorafenib Help?: Treatment-aware Reranking in Precision Medicine Search

Maciej Rybinski
Sarvnaz Karimi

High-quality evidence from the biomedical literature is crucial for decision making of oncologists who treat cancer patients. Search for evidence on a specific treatment for a patient is the challenge set by the precision medicine track of TREC in 2020. To address this challenge, we propose a two-step method to incorporate treatment into the query formulation and ranking. Training of such ranking function uses a zero-shot setup to incorporate the novel focus on treatments which did not exist in any of the previous TREC tracks. Our treatment-aware neural reranking approach, FAT, achieves state-of-the-art effectiveness for TREC Precision Medicine 2020. Our analysis indicates that the BERT-based rerankers automatically learn to score documents through identifying concepts relevant to precision medicine, similar to hand-crafted heuristics successful in the earlier studies.

DeepGroup: Group Recommendation with Implicit Feedback

Sarina Sajjadi Ghaemmaghami
Amirali Salehi-Abari

We focus on making recommendations for a new group of users whose preferences are unknown, but we are given the decisions of other groups. By formulating this problem as group recommendation from group implicit feedback, we focus on two of its practical instances: group decision prediction and reverse social choice. Given a set of groups and their observed decisions, group decision prediction intends to predict the decision of a new group of users, whereas reverse social choice aims to infer the preferences of those users involved in observed group decisions. These two problems are of interest to not only group recommendation, but also to personal privacy when the users intend to conceal their personal preferences but have participated in group decisions. To tackle these two problems, we propose and study DeepGroup---a deep learning approach for group recommendation with group implicit data. We empirically assess the predictive power of DeepGroup on various real-world datasets and group decision rules. Our extensive experiments not only demonstrate the efficacy of DeepGroup but also shed light on the privacy-leakage concerns of some decision-making processes.

Locate Who You Are: Matching Geo-location to Text for User Identity Linkage

Jiangli Shao
Yongqing Wang
Hao Gao
Huawei Shen
Yangyang Li
Xueqi Cheng

Nowadays, users are encouraged to be active across multiple online social networks simultaneously. User identity linkage, which aims to reveal the correspondence among different accounts across networks, has been regarded as a fundamental problem for user profiling, marketing, cybersecurity, and recommendation. Existing methods mainly address the prediction problem by utilizing profile, content, or structural features of users in symmetric ways. However, encouraged by online services, information from different social platforms may also be asymmetric, such as geo-locations and texts. This leads to an emerging challenge in aligning users with asymmetric information across networks. Instead of the similarity evaluation applied in previous works, we formalize the correlation between geo-locations and texts and propose a novel user identity linkage framework for matching users across networks. Moreover, our model can alleviate the label scarcity problem by introducing external text-location pairs. Experimental results on real-world datasets show that our approach outperforms existing methods and achieves state-of-the-art results.

Determining Subjective Bias in Text through Linguistically Informed Transformer based Multi-Task Network

Manjira Sinha
Tirthankar Dasgupta

The predominance of biased articles and their consumption by readers is becoming a considerable issue. Researchers across domains have made efforts to mitigate biases in language. However, due to the subjective nature of the problem, it is not trivial to detect bias embedded in a text. In this paper, we propose a deep linguistically informed multi-task transformer-based model to automatically detect bias in written text. The model is fine-tuned with a domain-specific corpus and further trained for learning the objectives. We evaluate the performance of the proposed model with respect to baseline systems across multiple datasets. We observe that augmenting contextual embeddings with linguistic features improves the performance of the neural network model in automatically detecting bias in text.

Predicting Success of a Persuasion through Joint Modeling of Utterance Categorization

Manjira Sinha
Tirthankar Dasgupta

Persuasive conversation leverages conversational strategies by the persuader to change the attitude or behavior of a persuadee towards achieving a specific goal. It involves understanding the linguistic and cognitive principles underlying the organization of strategic disclosures and appeals employed in human persuasion. One of the main challenges of such conversation is the inability of a persuader to detect the outcome of their conversation during the interaction. Such prior knowledge can help a persuader to change their conversation strategy and pre-empt possible conversation failures. In this paper, we propose a technique that analyses conversations to predict whether the persuader is going to successfully persuade the persuadee. We propose a joint model of latent utterance categorization to predict the success or the failure of a persuasive conversation. This latent categorization allows the model to identify high-level conversational contexts that influence patterns of language in a persuasive conversation. We evaluate the performance of our model on an openly available dataset. Our preliminary results demonstrate that the proposed model outperforms competitive baselines.

Self-Supervised Learning based on Sentiment Analysis with Word Weight Calculation

Dongcheol Son
Youngjoong Ko

Learning domain information for a downstream task is important to improve the performance of sentiment analysis. However, the labeling task to obtain a sufficient amount of training data in an application domain tends to be highly time-consuming and tedious. To solve this problem, we propose a novel method to effectively learn domain information and improve sentiment analysis performance with a small amount of training data. We use the masked language model (MLM), which is a self-supervised learning model, to calculate word weights and improve a downstream fine-tuning task for sentiment analysis. In particular, the MLM with the calculated word weights is executed simultaneously with the fine-tuning task. The results show that the proposed model achieves better performances than previous models in four different datasets for sentiment analysis.

Location-Aware Named Entity Disambiguation

Maithrreye Srinivasan
Davood Rafiei

Named Entity Disambiguation (NED) and linking have traditionally been evaluated on natural language content that is both well-written and contextually rich. However, many NED approaches display poor performance on text sources that are short and noisy. In this paper, we study the problem of entity disambiguation for short text and propose a location-aware NED framework that resolves ambiguities in text with few other contextual cues. We show that the spatial dimension is crucial in disambiguating named entities and that location inference is under-utilized in many NED systems. Our proposed framework integrates (in an unsupervised manner) spatial signals that are readily available for many sources that emit short text (e.g., micro-blogs, search queries, and news streams). Our evaluation on news headlines and tweets reveals that a simple spatial embedding improves the accuracy of competitive baseline NED approaches from the literature by 8% on news headlines and by 4% on tweets.

AGCNT: Adaptive Graph Convolutional Network for Transformer-based Long Sequence Time-Series Forecasting

Hongyang Su
Xiaolong Wang
Yang Qin

Long sequence time-series forecasting (LSTF) plays an important role in a variety of real-world application scenarios, such as electricity forecasting, weather forecasting, and traffic flow forecasting. It has previously been observed that transformer-based models have achieved outstanding results on LSTF tasks, which can reduce the complexity of the model and maintain stable prediction accuracy. Nevertheless, there are still some issues that limit the performance of transformer-based models for LSTF tasks: (i) the potential correlation between sequences is not considered; (ii) the inherent structure of the encoder-decoder is difficult to expand after being optimized from the aspect of complexity. In order to solve these two problems, we propose a transformer-based model, named AGCNT, which is efficient and can capture the correlation between the sequences in the multivariate LSTF task without causing a memory bottleneck. Specifically, AGCNT has several characteristics: (i) a probsparse adaptive graph self-attention, which maps long sequences into a low-dimensional dense graph structure with an adaptive graph generation and captures the relationships between sequences with an adaptive graph convolution; (ii) a stacked encoder with distilling probsparse graph self-attention, which integrates the graph attention mechanism and retains the dominant attention of the cascade layer, preserving the correlation between sparse queries from long sequences; (iii) a stacked decoder with generative inference, which generates all prediction values in one forward operation and can thus improve the inference speed of long-term predictions. Experimental results on 4 large-scale datasets demonstrate that AGCNT outperforms state-of-the-art baselines.

Talking Face Generation Based on Information Bottleneck and Complementary Representations

Jie Tang
Yiling Wu
Minglei Li
Zhu Wang

Audio-driven talking face generation is an active research direction in the field of virtual reality. The main challenge is that the generated lip shape of the speaker is out of sync with the input audio. To address this challenge, we propose a novel solution to synthesize lip-synchronized, high-quality, realistic video given input audio. We first decompose the target person's video frames into 3D face model parameters, and the information bottleneck is inserted into the audio-to-expression network to learn the mapping between audio features and expression parameters. Then, we replace the expression parameters in the target video frame with the extracted expression parameters from audio and re-render the face. Finally, we add high-level audio embedding extracted from the raw audio and lip landmarks embedding in the neural rendering network. The 3D face shapes, 2D landmarks, and audio embedding provide complementary information for the neural rendering network which guarantees the generation of lip-synchronized high-quality video portraits from the synthesized rendered faces. Experimental results show that compared with other talking face generation methods, our method is the best concerning lip synchronization with high video definition.

HiCoVA: Hierarchical Conditional Variational Autoencoder for Keyphrase Generation

Tokala Yaswanth Sri Sai Santosh
Nikhil Reddy Varimalla
Anoop Vallabhajosyula
Debarshi Kumar Sanyal
Partha Pratim Das

The task of keyphrase generation, unlike extraction, aims to generate phrases that succinctly capture the key information of the source text, including phrases that are absent from the document (i.e., do not match any contiguous sub-sequence of the source text). Despite the significant progress achieved by sequence-to-sequence (seq2seq) models in modelling such a high-entropy task, their deterministic modelling capability limits the generation of a diverse set of keyphrases. To address this limitation, in this paper, we propose to incorporate a Conditional Variational Autoencoder (CoVA) into seq2seq models for its ability to represent a set of keyphrases as a probabilistic distribution, which improves the diversity of the generated keyphrases. We model the probabilistic distribution using a hierarchical latent structure where a global latent variable tries to model the diversity among the keyphrases and local latent variables control the generation of each keyphrase to make them coherent. Experimental results on four benchmark datasets of research papers demonstrate the effectiveness of our proposed approach in achieving a large improvement in diversity along with modest gains in quality with respect to previous models.

Query Embedding Pruning for Dense Retrieval

Nicola Tonellotto
Craig Macdonald

Recent advances in dense retrieval techniques have offered the promise of being able not just to re-rank documents using contextualised language models such as BERT, but also to use such models to identify documents from the collection in the first place. However, when using dense retrieval approaches that use multiple embedded representations for each query, a large number of documents can be retrieved for each query, hindering the efficiency of the method. Hence, this work is the first to consider efficiency improvements in the context of a dense retrieval approach (namely ColBERT), by pruning query term embeddings that are estimated not to be useful for retrieving relevant documents. Our proposed query embedding pruning reduces the cost of the dense retrieval operation, as well as reducing the number of documents that are retrieved and hence need to be fully scored. Experiments conducted on the MSMARCO passage ranking corpus demonstrate that, when reducing the number of query embeddings used from 32 to 3 based on the collection frequency of the corresponding tokens, query embedding pruning results in no statistically significant differences in effectiveness, while reducing the number of documents retrieved by 70%. In terms of mean response time for the end-to-end system, this results in a 2.65x speedup.
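
Pruning query embeddings by the collection frequency of their tokens can be sketched as follows. The abstract does not say which direction of frequency is preferred, so keeping the rarest tokens is an assumption here, as are the function name `prune_query_embeddings` and the toy frequency table.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def prune_query_embeddings(
    tokens: Sequence[str],
    embeddings: Sequence[Vector],
    collection_freq: Dict[str, int],
    k: int = 3,
) -> Tuple[List[str], List[Vector]]:
    """Keep the k query token embeddings whose tokens are rarest in the collection.

    Assumption: tokens missing from `collection_freq` are treated as frequency 0.
    """
    by_rarity = sorted(
        range(len(tokens)),
        key=lambda i: (collection_freq.get(tokens[i], 0), i),
    )
    keep = sorted(by_rarity[:k])  # restore the original query order
    return [tokens[i] for i in keep], [list(embeddings[i]) for i in keep]


query_tokens = ["what", "is", "sorafenib", "dosage"]
query_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
frequencies = {"what": 1000, "is": 5000, "sorafenib": 3, "dosage": 40}
kept_tokens, kept_embeddings = prune_query_embeddings(
    query_tokens, query_embeddings, frequencies, k=2
)
```

Only the kept embeddings would then be sent to the ANN index, which is where the reduction in retrieved documents would come from.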

Vector-Quantized Autoencoder With Copula for Collaborative Filtering

Guanyu Wang
Ting Zhong
Xovee Xu
Kunpeng Zhang
Fan Zhou
Yong Wang

In theory, the variational auto-encoder (VAE) is not suitable for recommendation tasks, although it has been successfully utilized for collaborative filtering (CF) models. In this paper, we propose a Gaussian Copula-Vector Quantized Autoencoder (GC-VQAE) model that differs from prior art in two key ways: (1) the Gaussian copula helps to model the dependencies among latent variables, which are used to construct a more complex distribution than the one given by mean-field theory; and (2) by incorporating a vector quantisation method into the encoders, our model can learn discrete representations that are consistent with the observed data rather than directly sampling from simple Gaussian distributions. Our approach is able to circumvent the "posterior collapse" issue and break the prior constraint to improve the flexibility of latent vector encoding and learning ability. Empirically, GC-VQAE can significantly improve the recommendation performance compared to existing state-of-the-art methods.

SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary

Jiaan Wang
Zhixu Li
Qiang Yang
Jianfeng Qu
Zhigang Chen
Qingsheng Liu
Guoping Hu

Sports game summarization aims to generate news articles from live text commentaries. A recent state-of-the-art work, SportsSum, not only constructs a large benchmark dataset, but also proposes a two-step framework. Despite its great contributions, the work has three main drawbacks: 1) the noise in the SportsSum dataset degrades summarization performance; 2) neglecting the lexical overlap between news and commentaries results in a low-quality pseudo-labeling algorithm; 3) directly concatenating rewritten sentences to form news limits its practicability. In this paper, we publish a new benchmark dataset, SportsSum2.0, together with a modified summarization framework. In particular, to obtain a clean dataset, we employ crowd workers to manually clean the original dataset. Moreover, the degree of lexical overlap is incorporated into the generation of pseudo labels. Further, we introduce a reranker-enhanced summarizer to take into account the fluency and expressiveness of the summarized news. Extensive experiments show that our model outperforms the state-of-the-art baseline.

Template-guided Clarifying Question Generation for Web Search Clarification

Jian Wang
Wenjie Li

Clarification has attracted much attention because of its many potential applications especially in Web search. Since search queries are very short, the underlying user intents are often ambiguous. This makes it challenging for search engines to return the appropriate results that pertain to the users' actual information needs. To address this issue, asking clarifying questions has been recognized as a critical technique. Although previous studies have analyzed the importance of asking to clarify, generating clarifying questions for Web search remains under-explored. In this paper, we tackle this problem in a template-guided manner. Our objective is jointly learning to select question templates and fill question slots, using Transformer-based networks. We conduct experiments on MIMICS, a collection of datasets containing real Web search queries sampled from Bing's search logs. Our method is demonstrated to achieve significant improvements over various competitive baselines.

Embedding Node Structural Role Identity Using Stress Majorization

Lili Wang
Chenghan Huang
Weicheng Ma
Ying Lu
Soroush Vosoughi

Nodes in networks may have one or more functions that determine their role in the system. As opposed to local proximity, which captures the local context of nodes, the role identity captures the functional "role" that nodes play in a network, such as being the center of a group, or the bridge between two groups. This means that nodes far apart in a network can have similar structural role identities. Several recent works have explored methods for embedding the roles of nodes in networks. However, these methods all rely on either approximation or indirect modeling of structural equivalence. In this paper, we present a novel and flexible framework that uses stress majorization to transform the high-dimensional role identities in networks directly (without approximation or indirect modeling) to a low-dimensional embedding space. Our method is also flexible, in that it does not rely on specific structural similarity definitions. We evaluated our method on the tasks of node classification, clustering, and visualization, using three real-world and five synthetic networks. Our experiments show that our framework achieves better results than existing methods in learning node role representations.

Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization

Lili Wang
Chenghan Huang
Weicheng Ma
Xinyuan Cao
Soroush Vosoughi

Recent years have seen a rise in the development of representational learning methods for graph data. Most of these methods, however, focus on node-level representation learning at various scales (e.g., microscopic, mesoscopic, and macroscopic node embedding). In comparison, methods for representation learning on whole graphs are currently relatively sparse. In this paper, we propose a novel unsupervised whole graph embedding method. Our method uses spectral graph wavelets to capture topological similarities on each k-hop sub-graph between nodes and uses them to learn embeddings for the whole graph. We evaluate our method against 12 well-known baselines on 4 real-world datasets and show that our method achieves the best performance across all experiments, outperforming the current state-of-the-art by a considerable margin.

Fully Hyperbolic Graph Convolution Network for Recommendation

Liping Wang
Fenyu Hu
Shu Wu
Liang Wang

Recently, Graph Convolution Network (GCN) based methods have achieved outstanding performance for recommendation. These methods embed users and items in Euclidean space, and perform graph convolution on user-item interaction graphs. However, real-world datasets usually exhibit tree-like hierarchical structures, which make Euclidean space less effective in capturing user-item relationship. In contrast, hyperbolic space, as a continuous analogue of a tree-graph, provides a promising alternative. In this paper, we propose a fully hyperbolic GCN model for recommendation, where all operations are performed in hyperbolic space. Utilizing the advantage of hyperbolic space, our method is able to embed users/items with less distortion and capture user-item interaction relationship more accurately. Extensive experiments on public benchmark datasets show that our method outperforms both Euclidean and hyperbolic counterparts and requires far lower embedding dimensionality to achieve comparable performance.

Label-informed Graph Structure Learning for Node Classification

Liping Wang
Fenyu Hu
Shu Wu
Liang Wang

Graph Neural Networks (GNNs) have achieved great success among various domains. Nevertheless, most GNN methods are sensitive to the quality of graph structures. To tackle this problem, some studies exploit different graph structure learning strategies to refine the original graph structure. However, these methods only consider feature information while ignoring available label information. In this paper, we propose a novel label-informed graph structure learning framework which incorporates label information explicitly through a class transition matrix. We conduct extensive experiments on seven node classification benchmark datasets and the results show that our method outperforms or matches the state-of-the-art baselines.

BiCMTS: Bidirectional Coupled Multivariate Learning of Irregular Time Series with Missing Values

Qinfen Wang
Siyuan Ren
Yong Xia
Longbing Cao

Multivariate time series (MTS) such as multiple medical measures in intensive care units (ICU) are irregularly acquired and contain missing values. Conducting learning tasks on such irregular MTS with missing values, e.g., predicting the mortality of ICU patients, poses a significant challenge to existing MTS forecasting models and recurrent neural networks (RNNs), which capture the temporal dependencies within a time series. This work proposes a bidirectional coupled MTS learning (BiCMTS) method to represent both forward and backward value couplings within a time series by RNNs and between MTS by self-attention networks; the learned bidirectional intra- and inter-time series coupling representations are fused to estimate missing values. We test BiCMTS on both data imputation and mortality prediction for ICU patients, showing the great potential of leveraging the deep and hidden relations captured in RNNs by the BiCMTS-learned intra- and inter-time series value couplings in MTS.

Adversarial Domain Adaptation for Cross-lingual Information Retrieval with Multilingual BERT

Runchuan Wang
Zhao Zhang
Fuzhen Zhuang
Dehong Gao
Yi Wei
Qing He

Transformer-based language models (e.g. BERT, RoBERTa, GPT, etc.) have shown remarkable performance in many natural language processing tasks, and their multilingual variants make it easier to handle cross-lingual tasks without using a machine translation system. In this paper, we apply multilingual BERT to the cross-lingual information retrieval (CLIR) task with a triplet loss to learn the relevance between queries and documents written in different languages. Moreover, we align the token embeddings from different languages via adversarial networks to help the language model learn cross-lingual sentence representations. We achieve the state-of-the-art result on the newly published CLIR dataset, CLIRMatrix. Furthermore, we show that the adversarial multilingual BERT can also achieve competitive results in the zero-shot setting for some languages when we lack CLIR training data in a specific language.
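
As a rough illustration of the triplet loss mentioned in this abstract, the sketch below scores one (query, relevant document, non-relevant document) triplet. The use of cosine similarity, the margin of 0.2, and the function names are assumptions made here; the abstract does not specify them.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(
    query: Sequence[float],
    pos_doc: Sequence[float],
    neg_doc: Sequence[float],
    margin: float = 0.2,  # assumed value, not taken from the paper
) -> float:
    """Hinge-style triplet loss: zero once the relevant document is more
    similar to the query than the non-relevant one by at least `margin`."""
    pos_sim = cosine_similarity(query, pos_doc)
    neg_sim = cosine_similarity(query, neg_doc)
    return max(0.0, margin - pos_sim + neg_sim)


# Hypothetical embeddings: the query and the relevant document coincide.
well_ranked = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
badly_ranked = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In a cross-lingual setting the query and documents would be encoded by the same multilingual model, so the loss pulls relevant pairs together regardless of language.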

Modeling Inter-Claim Interactions for Verifying Multiple Claims

Shuai Wang
Wenji Mao

To inhibit the spread of rumorous information, fact checking aims at retrieving evidence to verify the truthfulness of a given statement. Fact checking methods typically use knowledge graphs (KGs) as external repositories and develop reasoning methods to retrieve evidence from KGs. As real-world statements are often complex and contain multiple claims, multi-claim fact verification is not only necessary but more important for practical applications. However, existing methods only focus on verifying a single claim (i.e. a single-claim statement). Multiple claims imply rich context information, and modeling the interrelations between claims can facilitate better verification of a multi-claim statement as a whole. In this paper, we propose a computational method to model inter-claim interactions for multi-claim fact checking. To focus on relevant claims within a statement, our method first extracts topics from the statement and connects the triple claims in the statement to form a claim graph. It then learns a policy-based agent to sequentially select topic-related triples from the claim graph. To fully exploit information from the statement, our method further employs multiple agents and develops a hierarchical attention mechanism to verify multiple claims as a whole. Experimental results on two real-world datasets show the effectiveness of our method for multi-claim fact verification.

Low-dimensional Alignment for Cross-Domain Recommendation

Tianxin Wang
Fuzhen Zhuang
Zhiqiang Zhang
Daixin Wang
Jun Zhou
Qing He

The cold-start problem is one of the most challenging and long-standing problems in recommender systems, and cross-domain recommendation (CDR) methods are effective for tackling it. Most cold-start related CDR methods require training a mapping function between high-dimensional embedding spaces using overlapping user data. However, the overlapping data is scarce in many recommendation tasks, which makes it difficult to train the mapping function. In this paper, we propose a new approach for CDR, which aims to alleviate the training difficulty. The proposed method can be viewed as a special parameterization of the mapping function without hurting expressiveness, which makes use of non-overlapping user data and leads to effective optimization. Extensive experiments on two real-world CDR tasks are performed to evaluate the proposed method. When there is little overlapping data, the proposed method outperforms the existing state-of-the-art method by 14% (relative improvement).

DSKReG: Differentiable Sampling on Knowledge Graph for Recommendation with Relational GNN

Yu Wang
Zhiwei Liu
Ziwei Fan
Lichao Sun
Philip S. Yu

In the information explosion era, recommender systems (RSs) are widely studied and applied to discover user-preferred information. An RS performs poorly when suffering from the cold-start issue, which can be alleviated by incorporating Knowledge Graphs (KGs) as side information. However, most existing works neglect the facts that node degrees in KGs are skewed and that a massive number of interactions in KGs are recommendation-irrelevant. To address these problems, in this paper, we propose Differentiable Sampling on Knowledge Graph for Recommendation with Relational GNN (DSKReG), which learns the relevance distribution of connected items from KGs and samples suitable items for recommendation following this distribution. We devise a differentiable sampling strategy, which enables the selection of relevant items to be jointly optimized with the model training procedure. The experimental results demonstrate that our model outperforms state-of-the-art KG-based recommender systems. The code is available online at https://github.com/YuWang-1024/DSKReG.

Graph Structure Aware Contrastive Knowledge Distillation for Incremental Learning in Recommender Systems

Yuening Wang
Yingxue Zhang
Mark Coates

Personalized recommender systems are playing an increasingly important role for online services. Graph Neural Network (GNN) based recommender models have demonstrated a superior capability to model users' interests thanks to rich relational information encoded in graphs. However, with the ever-growing volume of online information and the high computational complexity of training GNNs, it is difficult to perform frequent updates to provide the most up-to-date recommendations. There have been several attempts towards training GNN models in an incremental fashion to enable faster training times and permit more frequent model updates using the latest training data. The main technique is knowledge distillation, which aims to allow model updates while preserving key aspects of the model that were learned from the historical data. In this work, we develop a novel Graph Structure Aware Contrastive Knowledge Distillation for Incremental Learning in recommender systems, which is tailored to focus on the rich relational information in the recommendation context. We combine the contrastive distillation formulation with intermediate layer distillation to inject layer-level supervision. We demonstrate the effectiveness of our proposed distillation framework for GNN based recommendation systems on four commonly used datasets, showing consistent improvement over state-of-the-art alternatives.

Improving Irregularly Sampled Time Series Learning with Time-Aware Dual-Attention Memory-Augmented Networks

Zhen Wang
Yang Zhang
Ai Jiang
Ji Zhang
Zhao Li
Jun Gao
Ke Li
Chenhao Lu
Zujie Ren

Irregularly, asynchronously and sparsely sampled multivariate time series (IASS-MTS) are characterized by sparse non-uniform time intervals between successive observations and different sampling rates amongst series. Those properties pose substantial challenges to mainstream machine learning models for learning complicated relations within and across IASS-MTS. This is because most of the models assume that the time series in question are even, complete (fixed-dimensional features) and synchronous. To address these challenges, we present a novel time-aware Dual-Attention and Memory-Augmented Network (DAMA-Net). The proposed model can leverage the time irregularity, multiple sampling rates, and global temporal pattern information inherent in IASS-MTS so as to learn more effective representations for improving prediction performance. Comprehensive experiments on real datasets show that DAMA-Net outperforms the state-of-the-art methods in the multivariate time series classification task.

AutoHERI: Automated Hierarchical Representation Integration for Post-Click Conversion Rate Estimation

Penghui Wei
Weimin Zhang
Zixuan Xu
Shaoguo Liu
Kuang-chih Lee
Bo Zheng

Post-click conversion rate (CVR) estimation is a crucial task in online advertising and recommendation systems. To address the sample selection bias problem in traditional CVR models trained in click space, recent studies perform entire space multi-task learning based on the probability of events in user behavior funnels like "impression-click-conversion". However, those models learn the feature representation of each task independently, and omit potential inter-task correlations that can help improve the CVR estimation performance. In this paper, we propose AutoHERI, an entire space CVR model with automated hierarchical representation integration, which leverages the interplay across the representation learning of multiple tasks. It performs neural architecture search to learn optimal connections between layer-wise representations of different tasks. In addition, AutoHERI achieves better search efficiency with a one-shot search algorithm, and thus it can be easily extended to new scenarios that have more complex user behaviors. Both offline and online experimental results on large-scale real-world datasets verify that AutoHERI outperforms previous entire space models significantly.

Evidential Relational-Graph Convolutional Networks for Entity Classification in Knowledge Graphs

Tobias Weller
Heiko Paulheim

Despite the vast amount of information encoded in knowledge graphs, they often remain incomplete. Neural networks, in particular Graph Convolutional Neural Networks, have been shown to be effective predictors to complete information about the class affiliation of entities in knowledge graphs. However, these models remain ignorant of the confidence of their predictions because they use a point estimate of a softmax output. In this paper, we combine Graph Convolutional Neural Networks with recent developments in the field of Evidential Learning by placing a Dirichlet distribution on the class probabilities to overcome this problem. We use the continuous output of a Graph Convolutional Neural Network as parameters for a Dirichlet distribution. In this way, the predictions of the model are represented as a distribution over possible softmax outputs, rather than a point estimate of a softmax output. The experiments show that better performance in predicting class affiliations can be achieved compared to recent models. In addition, the experiments show that this approach overcomes the well-known problem of overconfident predictions of deterministic neural networks.
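
The idea of reading network outputs as Dirichlet parameters can be sketched as follows. This follows a common evidential-learning formulation and is an assumption here, not necessarily the mapping used in the paper: raw outputs pass through a softplus to give non-negative evidence, the Dirichlet parameters are evidence plus one, and the uncertainty is the number of classes divided by the sum of the parameters. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dirichlet_prediction(outputs: Sequence[float]) -> Tuple[List[float], float]:
    """Map raw per-class outputs to expected class probabilities and an
    uncertainty score in (0, 1].

    Assumed mapping: alpha_k = softplus(output_k) + 1.
    """
    alpha = [softplus(x) + 1.0 for x in outputs]
    strength = sum(alpha)                     # total Dirichlet strength
    probs = [a / strength for a in alpha]     # mean of the Dirichlet
    uncertainty = len(alpha) / strength       # 1.0 when there is no evidence
    return probs, uncertainty


# Strong evidence for class 0 versus almost no evidence at all.
confident_probs, confident_u = dirichlet_prediction([10.0, 0.0, 0.0])
vague_probs, vague_u = dirichlet_prediction([-50.0, -50.0, -50.0])
```

With almost no evidence the expected probabilities stay close to uniform and the uncertainty approaches 1, which is the behaviour a single softmax point estimate cannot express.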

Structural Deep Incomplete Multi-view Clustering Network

Jie Wen
Zhihao Wu
Zheng Zhang
Lunke Fei
Bob Zhang
Yong Xu

In recent years, incomplete multi-view clustering has drawn increasing attention due to the existence of large amounts of unlabeled incomplete data whose views are not fully observed in practical applications. Although many traditional methods have been extended to address the incomplete learning problem, most of them exploit shallow models and ignore the geometric structure. To address these issues, we propose a structural deep incomplete multi-view clustering network. Specifically, the proposed method can simultaneously explore the high-level features and high-order geometric structure information of data with several view-specific graph convolutional encoder networks and can directly obtain the optimal clustering indicator matrix in one stage. Experimental results on several datasets, in comparison with state-of-the-art methods, validate the superiority of the proposed method.

SMAD: Scalable Multi-view Ad Retrieval System for E-Commerce Sponsored Search

Shiyang Wen
Yiran Chen
Zhi Yang
Yan Zhang
Di Zhang
Liang Wang
Bo Zheng

Ad retrieval in sponsored search aims to understand user search intentions (user queries) and retrieve a set of ads inferred as being relevant to the queries. Due to the huge amount of search traffic and multiple views of relevance (such as co-clicking, co-bidding or textual similarity), it is highly desirable but remains challenging to achieve a large-scale, multi-view matching between queries and ads, particularly in industrial settings. In this paper, we propose a scalable multi-view ad retrieval engine, SMAD, that we developed and deployed at Taobao, the largest e-commerce platform in China. We construct a multi-relation query-item-ad graph capturing different views of query-ad relevance, which is large in scale and complex in structure. Since queries and products on an e-commerce platform are organized into a category tree, to deal with the large scale of the graph, we propose a category-constrained graph sampling and partition method to enable distributed parallel offline training. To tackle the complex multi-view structure, we propose a multi-view parallel deep neural network (DNN) model to combine the information from different views in a principled way. According to offline experiments and online A/B tests, our framework significantly outperforms baselines in terms of relevance, coverage, and revenue.

Fairness-Aware Unsupervised Feature Selection

Xiaoying Xing
Hongfu Liu
Chen Chen
Jundong Li

Feature selection is a prevalent data preprocessing paradigm for various learning tasks. Due to the high cost of acquiring supervision information, unsupervised feature selection has sparked great interest recently. However, existing unsupervised feature selection algorithms do not have fairness considerations and suffer from a high risk of amplifying discrimination by selecting features that are over-associated with protected attributes such as gender, race, and ethnicity. In this paper, we make an initial investigation of the fairness-aware unsupervised feature selection problem and develop a principled framework, which leverages kernel alignment to find a subset of high-quality features that can best preserve the information in the original feature space while being minimally correlated with protected attributes. Specifically, different from the mainstream in-processing debiasing methods, our proposed framework can be regarded as a model-agnostic debiasing strategy that eliminates biases and discrimination before downstream learning algorithms are involved. Experimental results on real-world datasets demonstrate that our framework achieves a good trade-off between preserving feature utility and promoting feature fairness.

Disentangled Self-Attentive Neural Networks for Click-Through Rate Prediction

Yichen Xu
Yanqiao Zhu
Feng Yu
Qiang Liu
Shu Wu

Click-Through Rate (CTR) prediction, whose aim is to predict the probability of whether a user will click on an item, is an essential task for many online applications. Due to the nature of data sparsity and high dimensionality of CTR prediction, a key to making effective prediction is to model high-order feature interaction. An efficient way to do this is to perform inner product of feature embeddings with self-attentive neural networks. To better model complex feature interaction, in this paper we propose a novel DisentanglEd Self-atTentIve NEtwork (DESTINE) framework for CTR prediction that explicitly decouples the computation of unary feature importance from pairwise interaction. Specifically, the unary term models the general importance of one feature on all other features, whereas the pairwise interaction term contributes to learning the pure impact for each feature pair. We conduct extensive experiments using two real-world benchmark datasets. The results show that DESTINE not only maintains computational efficiency but achieves consistent improvements over state-of-the-art baselines.

DESTINE: Dense Subgraph Detection on Multi-Layered Networks

Zhe Xu
Si Zhang
Yinglong Xia
Liang Xiong
Jiejun Xu
Hanghang Tong

Dense subgraph detection is a fundamental building block for a variety of applications. Most of the existing methods aim to discover dense subgraphs within either a single network or a multi-view network while ignoring the informative node dependencies across multiple layers of networks in a complex system. To date, it largely remains a daunting task to detect dense subgraphs on multi-layered networks. In this paper, we formulate the problem of dense subgraph detection on multi-layered networks based on the cross-layer consistency principle. We further propose a novel algorithm, DESTINE, based on projected gradient descent, with the following advantages. First, armed with the cross-layer dependencies, DESTINE is able to detect significantly more accurate and meaningful dense subgraphs at each layer. Second, it scales linearly w.r.t. the number of links in the multi-layered network. Extensive experiments demonstrate the efficacy of the proposed DESTINE algorithm in various cases.

Binary Code based Hash Embedding for Web-scale Applications

Bencheng Yan
Pengjie Wang
Jinquan Liu
Wei Lin
Kuang-Chih Lee
Jian Xu
Bo Zheng

Nowadays, deep learning models are widely adopted in web-scale applications such as recommender systems and online advertising. In these applications, embedding learning of categorical features is crucial to the success of deep learning models. In these models, a standard method is to assign each categorical feature value a unique embedding vector that can be learned and optimized. Although this method can capture the characteristics of the categorical features well and promises good performance, it can incur a huge memory cost to store the embedding table, especially for web-scale applications. Such a huge memory cost significantly holds back the effectiveness and usability of EDRMs. In this paper, we propose a binary code based hash embedding method which allows the size of the embedding table to be reduced at an arbitrary scale without compromising too much performance. Experimental evaluation results show that, with our proposed method, one can still achieve 99% performance even if the embedding table is 1000× smaller than the original one.

Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer

Bencheng Yan
Pengjie Wang
Kai Zhang
Wei Lin
Kuang-Chih Lee
Jian Xu
Bo Zheng

Embedding learning for categorical features is crucial for the deep learning-based recommendation models (DLRMs). Each feature value is mapped to an embedding vector via an embedding learning process. Conventional methods configure a fixed and uniform embedding size to all feature values from the same feature field. However, such a configuration is not only sub-optimal for embedding learning but also memory costly. Existing methods that attempt to resolve these problems, either rule-based or neural architecture search (NAS)-based, need extensive efforts on the human design or network training. They are also not flexible in embedding size selection or in warm-start-based applications. In this paper, we propose a novel and effective embedding size selection scheme. Specifically, we design an Adaptively-Masked Twins-based Layer (AMTL) behind the standard embedding layer. AMTL generates a mask vector to mask the undesired dimensions for each embedding vector. The mask vector brings flexibility in selecting the dimensions and the proposed layer can be easily added to either untrained or trained DLRMs. Extensive experimental evaluations show that the proposed scheme outperforms competitive baselines on all the benchmark tasks, and is also memory-efficient, saving 60% memory usage without compromising any performance metrics.

Relation-aware Heterogeneous Graph for User Profiling

Qilong Yan
Yufeng Zhang
Qiang Liu
Shu Wu
Liang Wang

User profiling has long been an important problem that investigates user interests in many real applications. Some recent works regard users and the objects they interact with as entities of a graph and turn the problem into a node classification task. However, they neglect the differences between distinct interaction types, e.g. a user clicking an item vs. a user purchasing an item, and thus cannot incorporate such information well. To solve these issues, we propose to leverage the relation-aware heterogeneous graph method for user profiling, which also allows capturing significant meta relations. We adopt the query, key, and value mechanism in a transformer fashion for heterogeneous message passing so that entities can effectively interact with each other. Via such interactions on different relation types, our model can generate representations with rich information for the user profile prediction. We conduct experiments on two real-world e-commerce datasets and observe a significant performance boost from our approach.

Multi-Sentence Argument Linking via An Event-Aware Hierarchical Encoder

Hang Yang
Yubo Chen
Kang Liu
Jun Zhao
Taifeng Wang

Multi-sentence argument linking aims at detecting implicit event arguments across sentences, which is indispensable when textual events span across multiple sentences in a document. Previous studies suffer from the inherent limitations of error propagation and lack the explicit modeling of the local and non-local interactions in a textual event. In this paper, we propose an event-aware hierarchical encoder for multi-sentence argument linking. Specifically, we introduce a hierarchical encoder to explicitly capture the local and global interactions in a textual event. Furthermore, we introduce an auxiliary task to predict the event-relevant context in a manner of multi-task learning, which can implicitly benefit the argument linking model to be aware of the event-relevant context. The empirical results on the widely used argument linking dataset show that our model significantly outperforms the baselines, which demonstrates the effectiveness of our proposed method.

SCI: Subspace Learning Based Counterfactual Inference for Individual Treatment Effect Estimation

Liuyi Yao
Yaliang Li
Sheng Li
Mengdi Huai
Jing Gao
Aidong Zhang

Inferring causal effects from observational data has attracted much attention from various domains. Under the potential outcome framework, the estimation of counterfactuals is crucial for the investigation of causal effects at the individual level. Existing representation learning approaches focus on learning one balanced feature space, which ignores certain information predictive of the outcomes. To fully utilize the predictive information, we propose a Subspace learning based Counterfactual Inference (SCI) method to estimate causal effects at the individual level. Different from existing work, SCI learns both a common subspace, which preserves the information across all the treatment groups, and treatment-specific subspaces, which retain the information associated with each specific treatment. Learning from two kinds of subspaces helps SCI obtain better causal effect estimations than state-of-the-art methods, as demonstrated by a series of experiments on synthetic and real-world datasets.

GraphEvolveDroid: Mitigate Model Degradation in the Scenario of Android Ecosystem Evolution

Yonghao Gu
Liangxun Li

Machine learning-based Android malware detection models suffer from model degradation over time due to ecosystem evolution, which means models trained on historical data perform poorly on newly arrived data. Existing solutions to this problem focus on sophisticated feature engineering to find stable features, which is labor-intensive. In this paper, we try to mitigate model degradation by changing the representation paradigm from Euclidean (vector) to non-Euclidean (graph) without changing features, and propose a graph-based Android malware detection model called GraphEvolveDroid. Specifically, we first construct a directed evolutionary network with the KNN model, where each node represents an APP and the starting APP node of each edge is the ancestor of the ending APP node. Then we use stacked GCN layers to transmit the information of ancestor nodes to child nodes so that the shift of the distribution of child nodes can be suppressed. Experimental results on a large real dataset spanning three years demonstrate that GraphEvolveDroid can significantly mitigate model degradation by slowing down the shift of the data distribution.
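
The graph construction step in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each app has a feature vector and a release timestamp, that ancestors are the k nearest earlier apps under Euclidean distance, and that edges point from ancestor to child; the function name `build_evolution_edges` is hypothetical and the paper's exact procedure may differ.

```python
import math
from typing import List, Sequence, Tuple


def build_evolution_edges(
    features: Sequence[Sequence[float]],
    timestamps: Sequence[float],
    k: int = 2,
) -> List[Tuple[int, int]]:
    """Return directed edges (ancestor, child) linking each app to its
    k nearest neighbours among apps with an earlier timestamp."""
    edges: List[Tuple[int, int]] = []
    for child in range(len(features)):
        # Only apps released strictly earlier can be ancestors.
        candidates = [
            a for a in range(len(features)) if timestamps[a] < timestamps[child]
        ]
        # Nearest first; ties broken by index for determinism.
        candidates.sort(key=lambda a: (math.dist(features[a], features[child]), a))
        edges.extend((ancestor, child) for ancestor in candidates[:k])
    return edges


# Hypothetical toy data: four apps in a 2-dimensional feature space.
toy_features = [[0.0, 0.0], [1.0, 0.0], [0.9, 0.0], [5.0, 5.0]]
toy_timestamps = [0, 1, 2, 3]
toy_edges = build_evolution_edges(toy_features, toy_timestamps, k=1)
```

A graph convolution over these edges would then pass information from each ancestor to its children, which is the mechanism the abstract credits with damping distribution shift.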

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

HongChien Yu
Chenyan Xiong
Jamie Callan

Dense retrieval systems conduct first-stage retrieval using embedded representations and simple similarity metrics to match a query to documents. Their effectiveness depends on encoded embeddings to capture the semantics of queries and documents, a challenging task due to the shortness and ambiguity of search queries. This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval. ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels. It also keeps the document index unchanged to reduce overhead. ANCE-PRF significantly outperforms ANCE and other recent dense retrieval systems on several datasets. Analysis shows that the PRF encoder effectively captures the relevant and complementary information from PRF documents, while ignoring the noise with its learned attention mechanism.

MixBERT for Image-Ad Relevance Scoring in Advertising

Tan Yu
Xiaokang Li
Jianwen Xie
Ruiyang Yin
Qing Xu
Ping Li

For a good advertising effect, images in the ad should be highly relevant to the ad title. The images in an ad are normally selected from the gallery based on their relevance scores with the ad's title. To ensure the selected images are relevant to the title, a reliable text-image matching model is necessary. The state-of-the-art text-image matching model, cross-modal BERT, only understands the visual content in the image, which is sub-optimal when the image description is available. In this work, we present MixBERT, an ad-image relevance scoring model. It models the ad-image relevance by matching the ad title with the image description and visual content. MixBERT adopts a two-stream architecture. It adaptively selects the useful information from noisy image description and suppresses the noise impeding effective matching. To effectively describe the details in visual content of the image, a set of local convolutional features is used as the initial representation of the image. Moreover, to enhance the perceptual capability of our model in key entities which are important to advertising, we upgrade masked language modeling in vanilla BERT to masked key entity modeling. Offline and online experiments demonstrate its effectiveness.

Aspect Sentiment Triplet Extraction Using Reinforcement Learning

Samson Yu Bai Jian
Tapas Nayak
Navonil Majumder
Soujanya Poria

Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting triplets of aspect terms, their associated sentiments, and the opinion terms that provide evidence for the expressed sentiments. Previous approaches to ASTE usually simultaneously extract all three components or first identify the aspect and opinion terms, then pair them up to predict their sentiment polarities. In this work, we present a novel paradigm, ASTE-RL, by regarding the aspect and opinion terms as arguments of the expressed sentiment in a hierarchical reinforcement learning (RL) framework. We first focus on sentiments expressed in a sentence, then identify the target aspect and opinion terms for that sentiment. This takes into account the mutual interactions among the triplet's components while improving exploration and sample efficiency. Furthermore, this hierarchical RL setup enables us to deal with multiple and overlapping triplets. In our experiments, we evaluate our model on existing datasets from laptop and restaurant domains and show that it achieves state-of-the-art performance. The implementation of this work is publicly available at https://github.com/declare-lab/ASTE-RL.

Storing Multi-model Data in RDBMSs based on Reinforcement Learning

Gongsheng Yuan
Jiaheng Lu
Shuxun Zhang
Zhengtong Yan

How to manage various data in a unified way is a significant research topic in the field of databases. To address this problem, researchers have proposed multi-model databases to support multiple data models in a uniform platform with a single unified query language. However, since relational databases are predominant in the current market, it is expensive to replace them with others. Besides, because the theories and technologies of RDBMSs have been refined over decades, it is hard to develop, within a few years, a multi-model database that can compare with existing RDBMSs in handling security, query optimization, transaction management, etc. In this paper, we reconsider employing relational databases to store and query multi-model data. Unfortunately, the mismatch between the complexity of multi-model data structure and the simplicity of flat relational tables makes this difficult. To address this challenge, we utilize the reinforcement learning (RL) method to learn a relational schema by interacting with an RDBMS. Instead of using the classic Q-learning algorithm, we propose a variant Q-learning algorithm, called Double Q-tables, to reduce the dimension of the original Q-table and improve learning efficiency. Experimental results show that our approach can learn a relational schema that outperforms the existing multi-model storage schema in terms of query time and space consumption.

Multi-objective Privacy-preserving Text Representation Learning

Huixin Zhan
Kun Zhang
Chenyi Hu
Victor Sheng

Private information can either take the form of key phrases that are explicitly contained in the text or be implicit. For example, demographic information about the author of a text can be predicted with above-chance accuracy from linguistic cues in the text itself. Regardless of its explicitness, some of the private information correlates with the output labels and therefore can be learned by a neural network. In such a case, there is a tradeoff between the utility of the representation (measured by the accuracy of the classification network) and its privacy. This problem is inherently a multi-objective problem because these two objectives may conflict, necessitating a trade-off. Thus, we explicitly cast this problem as multi-objective optimization (MOO) with the overall objective of finding a Pareto stationary solution. We, therefore, propose a multiple-gradient descent algorithm (MGDA) that enables the efficient application of the Frank-Wolfe algorithm [10] using the line search. Experimental results on sentiment analysis and part-of-speech (POS) tagging show that MGDA produces higher-performing models than most recent proxy objective approaches, and performs as well as single objective baselines.

Projective Ranking: A Transferable Evasion Attack Method on Graph Neural Networks

He Zhang
Bang Wu
Xiangwen Yang
Chuan Zhou
Shuo Wang
Xingliang Yuan
Shirui Pan

Graph Neural Networks (GNNs) have emerged as a series of effective learning methods for graph-related tasks. However, GNNs have been shown to be vulnerable to adversarial attacks, where attackers can fool GNNs into making wrong predictions on adversarial samples with well-designed perturbations. Specifically, we observe that the current evasion attacks suffer from two limitations: (1) the attack strategy based on the reinforcement learning method might not be transferable when the attack budget changes; (2) the greedy mechanism in the vanilla gradient-based method ignores the long-term benefits of each perturbation operation. In this paper, we propose a new attack method named projective ranking to overcome the above limitations. Our idea is to learn a powerful attack strategy considering the long-term benefits of perturbations, then adjust it as little as possible to generate adversarial samples under different budgets. We further employ mutual information to measure the long-term benefits of each perturbation and rank them accordingly, so the learned attack strategy has better attack performance. Our method dramatically reduces the adaptation cost of learning a new attack strategy by projecting the attack strategy when the attack budget changes. Our preliminary evaluation results on synthesized and real-world datasets demonstrate that our method achieves strong attack performance and effective transferability.

Causally Attentive Collaborative Filtering

Jingsen Zhang
Xu Chen
Wayne Xin Zhao

Attention-based recommender models hold the promise of improving performance by learning to discriminate different user/item feature importances. However, due to the existence of latent confounders, the correlations captured by attention mechanisms may fail to reflect the true influence of the features on the targets (i.e., spurious correlation). In this paper, we propose to empower the attention mechanism with causal inference, which is a powerful tool to identify the real causal effects. Our model is based on the potential outcome framework, where the item features are regarded as the treatment and the outcome is the predicted user preference. Specifically, the causal relation of each feature on the outcome is measured by the individual treatment effect (ITE). In order to distill the causal information into the attention learning process, we minimize the distance between the traditional attention weights and the normalized ITE. With such causal regularization, the learned attention weights can capture the real causal effects, which are expected to correct the feature importances for improving performance. We conduct extensive experiments based on three real-world datasets to demonstrate the effectiveness.

SIFN: A Sentiment-aware Interactive Fusion Network for Review-based Item Recommendation

Kai Zhang
Hao Qian
Qi Liu
Zhiqiang Zhang
Jun Zhou
Jianhui Ma
Enhong Chen

Recent studies in recommender systems have managed to achieve significantly improved performance. However, despite being extensively studied, these methods still suffer from two limitations. First, previous studies either encode the document or extract latent sentiment via neural networks, which makes it difficult to interpret the sentiment of reviewers intuitively. Second, they neglect the personalized interaction of reviews with user/item, i.e., each review has different contributions when modeling the preference of user/item.

To remedy these issues, we propose a Sentiment-aware Interactive Fusion Network (SIFN) for review-based item recommendation. Specifically, we first encode user/item reviews via BERT and propose a light-weighted sentiment learner to extract semantic features of each review. Then, we propose a sentiment prediction task that guides the sentiment learner to extract sentiment-aware features via explicit sentiment labels. Finally, we design a rating prediction task that contains a rating learner with an interactive and fusion module to fuse the identity (i.e., user and item ID) and each review representation so that various interactive features can synergistically influence the final rating score. Experimental results demonstrate that the proposed model is superior to state-of-the-art models.

Role-oriented Network Embedding Based on Adversarial Learning between Higher-order and Local Features

Wang Zhang
Xuan Guo
Ting Pan
Chaochao Liu
Pengfei Jiao
Lin Pan
Wenjun Wang

Roles of nodes are defined as classes of equivalent nodes. Nodes that have similar local connective patterns may share the same role. As a complementary concept of community, role can also help to recognize real-world entities. For example, it can denote identity or function in social networks. Role has been studied over the past decades, and learning role-based network representations is crucial to many downstream tasks. The important step for role-based network embedding method is extracting features to measure structural similarity instead of proximity. Although some methods have been developed to capture role features to learn structural similarities between nodes, they all design these features of fixed types, such as the global, local, and higher-order features. These features can only represent a certain type of structure, and it is very difficult to model the complex relationship between different scale features in the field of role-based network embedding. Therefore, we propose a novel role-oriented network embedding framework based on adversarial learning between higher-order and local features (ARHOL) to generate powerful role-based node representations. The higher-order features are discrete so we leverage the Auto-Encoder on them to obtain continuous representations. Then we apply the GIN on its outputs to aggregate local information. Finally, we consider the GIN as the generator and design an adversarial game between local features and GIN outputs to integrate these two aspects of features, which can enhance each other and improve the robustness. The extensive experiments on real-world networks demonstrate the superiority and efficiency of our model, and prove the effectiveness of integrating higher-order and local features.

Supervised Contrastive Learning for Multimodal Unreliable News Detection in COVID-19 Pandemic

Wenjia Zhang
Lin Gui
Yulan He

As the digital news industry becomes the main channel of information dissemination, the adverse impact of fake news is explosively magnified. The credibility of a news report should not be considered in isolation. Rather, previously published news articles on similar events could be used to assess the credibility of a news report. Inspired by this, we propose a BERT-based multimodal unreliable news detection framework, which captures both textual and visual information from unreliable articles utilising the contrastive learning strategy. The contrastive learner interacts with the unreliable news classifier to push similar credible news (or similar unreliable news) closer while moving news articles with similar content but opposite credibility labels away from each other in the multimodal embedding space. Experimental results on a COVID-19 related dataset, ReCOVery, show that our model outperforms a number of competitive baselines in unreliable news detection.

DML: Dynamic Multi-Granularity Learning for BERT-Based Document Reranking

Xuanyu Zhang
Qing Yang

Recently, pre-trained language models have been successfully applied to the task of text retrieval and ranking. However, in real scenes, users' click behavior is usually affected by selection, position, or exposure bias, which may lead to insufficient positive annotations and introduce additional noise. And for different candidate documents of the same query, the previous optimization objectives usually use a single granularity and static loss weights. It makes the performance of ranking models more susceptible to the bias issue mentioned above. Thus, in this paper, we focus on BERT-based document reranking and propose Dynamic Multi-Granularity Learning (DML). By introducing Gaussian distribution into traditional loss functions, the weights of different documents can change dynamically according to the prediction probability to avoid the impact of unlabeled positive documents. Besides, both document-granularity and instance-granularity are considered to balance relative relations and absolute scores of candidate documents. Extensive experiments show that DML significantly outperforms previous state-of-the-art models on the MS MARCO document ranking dataset.

DialogueBERT: A Self-Supervised Learning based Dialogue Pre-training Encoder

Zhenyu Zhang
Tao Guo
Meng Chen

With the rapid development of artificial intelligence, conversational bots have become prevalent in mainstream E-commerce platforms, where they provide convenient and timely customer service. To satisfy the user, the conversational bots need to understand the user's intention, detect the user's emotion, and extract the key entities from the conversational utterances. However, understanding dialogues is regarded as a very challenging task. Different from common language understanding, utterances in dialogues appear alternately from different roles and are usually organized as hierarchical structures. To facilitate the understanding of dialogues, in this paper, we propose a novel contextual dialogue encoder (i.e. DialogueBERT) based on the popular pre-trained language model BERT. Five self-supervised learning pre-training tasks are devised for learning the particularity of dialogue utterances. Four different input embeddings are integrated to catch the relationship between utterances, including turn embedding, role embedding, token embedding and position embedding. DialogueBERT was pre-trained with 70 million dialogues in real scenario, and then fine-tuned in three different downstream dialogue understanding tasks. Experimental results show that DialogueBERT achieves exciting results with 88.63% accuracy for intent recognition, 94.25% accuracy for emotion recognition and 97.04% F1 score for named entity recognition, which outperforms several strong baselines by a large margin.

Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging

Zhiling Zhang
Zelin Zhou
Haifeng Tang
Guangwei Li
Mengyue Wu
Kenny Q. Zhu

Audio tagging aims at predicting the sound events that occur in a recording. Traditional models require an enormous amount of laborious annotation; without it, performance degrades. Therefore, we investigate robust audio tagging models in low-resource scenarios with the enhancement of knowledge graphs. Besides existing ontological knowledge, we further propose a semi-automatic approach that can construct temporal knowledge graphs on diverse domain-specific label sets. Moreover, we leverage a variant of relation-aware graph neural network, D-GCN, to combine the strength of the two knowledge types. Experiments on AudioSet and SONYC urban sound tagging datasets suggest the effectiveness of the introduced temporal knowledge, and the advantage of the combined KGs with D-GCN over a single knowledge source.

Multimodal Graph Meta Contrastive Learning

Feng Zhao
Donglin Wang

In recent years, graph contrastive learning has achieved promising node classification accuracy using graph neural networks (GNNs), which can learn representations in an unsupervised manner. However, such representations cannot be generalized to unseen novel classes with only few-shot labeled samples in spite of exhibiting good performance on seen classes. In order to assign generalization capability to graph contrastive learning, we propose multimodal graph meta contrastive learning (MGMC) in this paper, which integrates multimodal meta learning into graph contrastive learning. On one hand, MGMC achieves fast adaptation to unseen novel classes with the aid of bilevel meta optimization to solve few-shot problems. On the other hand, MGMC can generalize quickly to a generic dataset with multimodal distribution by introducing the FiLM-based modulation module. In addition, MGMC incorporates the latest graph contrastive learning method that does not rely on the construction of augmentations and negative examples. To the best of our knowledge, this is the first work to investigate graph contrastive learning for few-shot problems. Extensive experimental results on three graph-structure datasets demonstrate the effectiveness of our proposed MGMC in few-shot node classification tasks.

Multi-Task Self-Supervised Learning for Script Event Prediction

Bo Zhou
Yubo Chen
Kang Liu
Jun Zhao
Jiexin Xu
Xiaojian Jiang
Jinlong Li

Most existing approaches to script event prediction rely heavily on manually labeled data, which is often expensive to obtain. To cope with the training data bottleneck, we investigate methods of combining multiple self-supervised tasks, i.e. tasks where models are explicitly trained with automatically generated labels. We propose two self-supervised pre-training tasks: one is End Identification and the other is Contrastive Scoring. A multi-task learning framework is then leveraged to combine these two tasks to jointly train the model. The pre-trained model is then fine-tuned using human-annotated script event prediction training data. Experimental results on the commonly used dataset show that, using just 10% of the training data, our approach can achieve competitive performance compared to the previous models which are trained with the whole dataset, and our model trained on the whole dataset outperforms previous models significantly.

SeDyT: A General Framework for Multi-Step Event Forecasting via Sequence Modeling on Dynamic Entity Embeddings

Hongkuan Zhou
James Orme-Rogers
Rajgopal Kannan
Viktor Prasanna

Temporal Knowledge Graphs store events in the form of subjects, relations, objects, and timestamps which are often represented by dynamic heterogeneous graphs. Event forecasting is a critical and challenging task in Temporal Knowledge Graph reasoning that predicts the subject or object of an event in the future. To obtain temporal embeddings multi-step away in the future, existing methods learn generative models that capture the joint distribution of the observed events. To reduce the high computation costs, these methods rely on unrealistic assumptions of independence and approximations in training and inference. In this work, we propose SeDyT, a discriminative framework that performs sequence modeling on the dynamic entity embeddings to solve the multi-step event forecasting problem. SeDyT consists of two components: a Temporal Graph Neural Network that generates dynamic entity embeddings in the past and a sequence model that predicts the entity embeddings in the future. Compared with the generative models, SeDyT does not rely on any heuristic-based probability model and has low computation complexity in both training and inference. SeDyT is compatible with most Temporal Graph Neural Networks and sequence models. We also design an efficient training method that trains the two components in one gradient descent propagation. We evaluate the performance of SeDyT on five popular datasets. By combining temporal Graph Neural Network models and sequence models, SeDyT achieves an average of 2.4% MRR improvement when not using the validation set and more than 10% MRR improvement when using the validation set.
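
For readers unfamiliar with the reported metric, mean reciprocal rank (MRR) averages the reciprocal of the rank at which the correct entity appears for each query. The following is a minimal sketch of the standard definition only; the function name and the example ranks are illustrative and not taken from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries.

    `ranks` holds the 1-based position of the correct entity in each
    ranked candidate list (illustrative input format).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)

# Three toy queries whose correct answers were ranked 1st, 2nd and 4th.
example_mrr = mean_reciprocal_rank([1, 2, 4])
print(round(example_mrr, 4))
```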

Subtractive Aggregation for Attributed Network Anomaly Detection

Shuang Zhou
Qiaoyu Tan
Zhiming Xu
Xiao Huang
Fu-lai Chung

Attributed network anomaly detection is essential in various networked systems. It aims to detect nodes that significantly deviate from their corresponding background. In conventional anomaly detection, the background is defined as the vast majority. But in networks, anomalies can be local and look normal when compared with the majority. While several efforts have explored to consider communities as the background, it remains challenging to learn suitable communities for effective anomaly detection. Also, the patterns of anomalies are unknown and it is nontrivial to define criteria of anomalies. To bridge the gap, in this paper, we argue that, by using appropriate models, it is sufficient to simply consider neighbor nodes as the background to detect anomalies. Correspondingly, we propose a novel abnormality-aware graph neural network (AAGNN). It utilizes subtractive aggregation to represent each node as the deviation from its neighbors (the background). Normal nodes with high confidence are employed as labels to learn a tailored hypersphere as the criterion of anomalies. Experiments demonstrate that AAGNN surpasses state-of-the-art methods significantly.
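
The idea of representing a node as its deviation from its neighbors can be sketched as follows. This is a simplified illustration that subtracts the plain mean of the neighbors' feature vectors; it omits any learnable transformation or weighting the actual AAGNN layer may use, and the function name and toy graph are hypothetical.

```python
def subtractive_aggregate(features, neighbors):
    """Return, for each node, its features minus the mean of its neighbors' features.

    Assumption: unweighted mean over neighbors, no learned parameters.
    """
    out = []
    for i, x in enumerate(features):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            # Isolated node: no background to subtract.
            out.append(list(x))
            continue
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs)
                for d in range(len(x))]
        out.append([xd - md for xd, md in zip(x, mean)])
    return out

# Toy graph: nodes 0 and 1 look alike, node 2 differs from both.
features = [[1.0, 1.0], [1.0, 1.0], [4.0, 0.0]]
neighbors = {0: [1, 2], 1: [0], 2: [0, 1]}
deviations = subtractive_aggregate(features, neighbors)
print(deviations)
```

In this toy graph, node 2 ends up with the largest deviation because it differs from both of its neighbors, which is the kind of local contrast the abstract describes.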

Tabular Data Concept Type Detection Using Star-Transformers

Yiwei Zhou
Siffi Singh
Christos Christodoulopoulos

Tabular data is an invaluable information resource for search, information extraction and question answering about the world. It is critical to understand the semantic concept types for table columns in order to fully exploit the information in tabular data. In this paper, we focus on learning-based approaches for column concept type detection without relying on any metadata or queries to existing knowledge bases. We propose a model that employs both statistical and semantic features of table columns, and uses Star-Transformers to gather and scatter information across the whole table to boost the performance on individual columns. We apply distant supervision to construct a tabular dataset with columns annotated with DBpedia classes. Our experiment results show that our model achieves 93.57 accuracy on the dataset, exceeding that of the state-of-the-art baselines.

REFINE: Random RangE FInder for Network Embedding

Hao Zhu
Piotr Koniusz

Network embedding approaches have recently attracted considerable interest as they learn low-dimensional vector representations of nodes. Embeddings based on matrix factorization are effective but they are usually computationally expensive due to the eigen-decomposition step. In this paper, we propose a Random RangE FInder based Network Embedding (REFINE) algorithm, which can perform embedding on one million nodes (YouTube) within 30 seconds in a single thread. REFINE is 10x faster than ProNE, which is 10-400x faster than other methods such as LINE, DeepWalk, Node2Vec, GraRep, and Hope. Firstly, we formulate our network embedding approach as a skip-gram model, but with an orthogonal constraint, and we reformulate it into the matrix factorization problem. Instead of using randomized tSVD (truncated SVD) as other methods do, we employ the Randomized Blocked QR decomposition to obtain the node representation fast. Moreover, we design a simple but efficient spectral filter for network enhancement to obtain higher-order information for node representation. Experimental results show that REFINE is very efficient on datasets of different sizes (from thousands to millions of nodes and edges) for node classification, while enjoying good performance.
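
To give a feel for the "random range finder" idea, the sketch below implements the generic randomized range finder: multiply the matrix by a random Gaussian test matrix and orthonormalize the result. It is not the paper's Randomized Blocked QR; the orthonormalization here is plain Gram-Schmidt, and all function names and the toy matrix are illustrative assumptions.

```python
import random

def matmul(A, B):
    """Dense matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def orthonormalize(Y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y; drops near-zero columns."""
    basis = []
    for col in transpose(Y):
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > tol:
            basis.append([a / norm for a in v])
    return transpose(basis)

def randomized_range_finder(A, rank, oversample=2, seed=0):
    """Return Q with orthonormal columns approximately spanning range(A)."""
    rng = random.Random(seed)
    n = len(A[0])
    # Gaussian test matrix with a few extra columns for safety.
    omega = [[rng.gauss(0, 1) for _ in range(rank + oversample)] for _ in range(n)]
    return orthonormalize(matmul(A, omega))

# Toy rank-2 matrix: Q Q^T A should reproduce A up to rounding error.
A = [[1, 2, 1, 1], [2, 2, 0, 0], [3, 2, -1, -1], [4, 4, 0, 0]]
Q = randomized_range_finder(A, rank=2)
A_hat = matmul(Q, matmul(transpose(Q), A))
max_err = max(abs(a - b) for ra, rb in zip(A, A_hat) for a, b in zip(ra, rb))
print(len(Q[0]), max_err < 1e-8)
```

The appeal of this family of methods is that the expensive factorization is applied only to the small projected matrix rather than to the full input.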

Self-Supervised Embedding for Subspace Clustering

Wenjie Zhu
Bo Peng
Chunchun Chen

Subspace clustering based on data self-expressive model aims to represent each data point as a linear combination of other data points on the dataset. Most existing methods focus on developing the regularization of self-expressive coefficients to solve the transductive unsupervised problem. In this paper, we propose a novel Self-Supervised Embedding for Subspace Clustering (S2ESC), which exploits a low-dimensional feature space where the data points belonging to the same category can be well-connected. Specifically, each data point is encouraged to be close to its transformation which is termed as self-supervised embedding learning. Therefore, the self-expressive coefficients can be learned by representing the data points over the corresponding transformations in the feature space. We introduce the low-rank and sparse regularization to illustrate the performance of self-supervised embedding. Extensive experiments on benchmark datasets demonstrate that our method outperforms the compared traditional subspace clustering methods.

SESSION: Applied Research Paper Track

Leveraging Semantic Information to Facilitate the Discovery of Underserved Podcasts

Maryam Aziz
Alice Wang
Aasish Pappu
Hugues Bouchard
Yu Zhao
Benjamin Carterette
Mounia Lalmas

Podcasts are a popular medium for rapid dissemination of information, entertainment, and casual conversations. Content aggregators are taking an increased interest in recommending podcasts to listeners to help them build larger audiences. With many podcasts released every day, many podcasts that would be of interest to listeners remain underserved by these recommendation systems. In this paper, we study variables related to podcast appeal to listeners selected at random in a large online study, in a production setting, involving more than five million recommendations. We present the results of two observational studies, which suggest that underserved podcasts have the potential to grow their audiences. To mitigate the rich-get-richer effect, we propose leveraging semantic information, by means of knowledge graphs, to recommend underserved podcasts to listeners. Finally, we conduct empirical experiments that show our method is effective at recommending underserved podcasts, in comparison to baseline methods that rely on listening behavior.

Enabling Efficiency-Precision Trade-offs for Label Trees in Extreme Classification

Tavor Z. Baharav
Daniel L. Jiang
Kedarnath Kolluri
Sujay Sanghavi
Inderjit S. Dhillon

Extreme multi-label classification (XMC) aims to learn a model that can tag data points with a subset of relevant labels from an extremely large label set. Real world e-commerce applications like personalized recommendations and product advertising can be formulated as XMC problems, where the objective is to predict for a user a small subset of items from a catalog of several million products. For such applications, a common approach is to organize these labels into a tree, enabling training and inference times that are logarithmic in the number of labels [23]. While training a model once a label tree is available is well studied, designing the structure of the tree is a difficult task that is not yet well understood, and can dramatically impact both model latency and statistical performance. Existing approaches to tree construction either optimize exclusively for statistical performance or optimize exclusively for latency. We propose an efficient information theory inspired algorithm to construct intermediate operating points that trade off between the benefits of both, which was not previously possible. We corroborate our theoretical analysis with numerical results, showing that on the Wiki-500K [4] benchmark dataset our method can reduce a proxy for expected latency by up to 28% while maintaining the same accuracy as Parabel [23]. On several datasets derived from e-commerce customer logs, our modified label tree is able to improve this expected latency metric by up to 20% while maintaining the same accuracy. Finally, we discuss challenges in realizing these latency improvements in deployed models.

Multi-Property Molecular Optimization using an Integrated Poly-Cycle Architecture

Guy Barshatski
Galia Nordon
Kira Radinsky

Molecular lead optimization is an important task of drug discovery focusing on generating molecules similar to a drug candidate but with enhanced properties. Most prior works focused on optimizing a single property. However, in real settings, we wish to find molecules that satisfy multiple constraints, e.g., potency and safety. Simultaneously optimizing these constraints was shown to be difficult, mostly due to the lack of training examples satisfying all constraints. In this work, we present a novel approach for multi-property optimization. Unlike prior approaches, that require a large training set of pairs of a lead molecule and an enhanced molecule, our approach is unpaired. Our architecture learns a transformation for each property optimization separately, while constraining the latent embedding space between all transformations. This allows generating a molecule which optimizes multiple properties simultaneously. We present a novel adaptive loss which balances the separate transformations and stabilizes the optimization process. We evaluate our method on optimizing for two properties: dopamine receptor (DRD2) and drug likeness (QED), and show our method outperforms previous state-of-the-art, especially when training examples satisfying all constraints are sparse.

Contrastive Curriculum Learning for Sequential User Behavior Modeling via Data Augmentation

Shuqing Bian
Wayne Xin Zhao
Kun Zhou
Jing Cai
Yancheng He
Cunxiang Yin
Ji-Rong Wen

Within online platforms, it is critical to capture the semantics of sequential user behaviors for accurately modeling user interests. However, dynamic characteristics and sparse behaviors make it difficult to train effective user representations for sequential user behavior modeling.

Inspired by the recent progress in contrastive learning, we propose a novel Contrastive Curriculum Learning framework for producing effective representations for modeling sequential user behaviors. We make important technical contributions in two aspects, namely data quality and sample ordering. Firstly, we design a model-based data generator that generates high-quality samples conforming to users' attribute information. Given a target user, it can leverage the fused attribute semantics to generate more close-to-real sequences. Secondly, we propose a curriculum learning strategy to conduct contrastive learning via an easy-to-difficult learning process. The core component is a learnable difficulty evaluator, which can score augmented sequences and schedule them into curriculums. Extensive results on both public and industry datasets demonstrate the effectiveness of our approach on downstream tasks.

Structural Temporal Graph Neural Networks for Anomaly Detection in Dynamic Graphs

Lei Cai
Zhengzhang Chen
Chen Luo
Jiaping Gui
Jingchao Ni
Ding Li
Haifeng Chen

Detecting anomalies in dynamic graphs is a vital task, with numerous practical applications in areas such as security, finance, and social media. Existing network embedding based methods have mostly focused on learning good node representations, whereas largely ignoring the subgraph structural changes related to the target nodes in a given time window. In this paper, we propose StrGNN, an end-to-end structural temporal Graph Neural Network model for detecting anomalous edges in dynamic graphs. In particular, we first extract the h-hop enclosing subgraph centered on the target edge and propose a node labeling function to identify the role of each node in the subgraph. Then, we leverage the graph convolution operation and Sortpooling layer to extract the fixed-size feature from each snapshot/timestamp. Based on the extracted features, we utilize the Gated Recurrent Units to capture the temporal information for anomaly detection. We fully implement StrGNN and deploy it into a real enterprise security system, and it greatly helps detect advanced threats and optimize the incident response. Extensive experiments on six benchmark datasets also demonstrate the effectiveness of StrGNN.
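The first step described above, extracting the h-hop enclosing subgraph around a target edge, can be illustrated with a small sketch. This is not the StrGNN implementation: the function names are hypothetical, the subgraph is assumed to be the union of the h-hop neighborhoods of the two endpoints, and the role label (hop distance to each endpoint) is only a stand-in for the paper's node labeling function, which the abstract does not specify.

```python
from collections import deque


def hop_distances(adj, source, max_hops):
    """Breadth-first search that stops expanding once max_hops is reached."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def enclosing_subgraph(adj, u, v, h):
    """Return nodes, undirected edges, and illustrative role labels for edge (u, v)."""
    du = hop_distances(adj, u, h)
    dv = hop_distances(adj, v, h)
    nodes = set(du) | set(dv)
    # Assumed role label: (hops to u, hops to v); None means farther than h hops.
    labels = {n: (du.get(n), dv.get(n)) for n in nodes}
    # Keep each undirected edge once; node ids are assumed to be comparable.
    edges = {(a, b) for a in nodes for b in adj.get(a, ()) if b in nodes and a < b}
    return nodes, edges, labels


# Toy snapshot: a path graph 0-1-2-3-4-5 stored as an undirected adjacency list.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
nodes, edges, labels = enclosing_subgraph(adj, u=2, v=3, h=1)
```

In a dynamic-graph setting, a routine like this would be run once per snapshot in the time window, so that each snapshot contributes one labeled subgraph to the downstream model.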

Enhancing Explicit and Implicit Feature Interactions via Information Sharing for Parallel Deep CTR Models

Bo Chen
Yichao Wang
Zhirong Liu
Ruiming Tang
Wei Guo
Hongkun Zheng
Weiwei Yao
Muyu Zhang
Xiuqiang He

Effectively modeling feature interactions is crucial for CTR prediction in industrial recommender systems. The state-of-the-art deep CTR models with parallel structure (e.g., DCN) learn explicit and implicit feature interactions through independent parallel networks. However, these models suffer from trivial sharing issues, namely insufficient sharing in hidden layers and excessive sharing in network input, limiting the model's expressiveness and effectiveness. Therefore, to enhance information sharing between explicit and implicit feature interactions, we propose a novel deep CTR model, EDCN. EDCN introduces two advanced modules, namely a bridge module and a regulation module, which work collaboratively to capture the layer-wise interactive signals and learn discriminative feature distributions for each hidden layer of the parallel networks. Furthermore, the two modules are lightweight and model-agnostic, and generalize well to mainstream parallel deep CTR models. Extensive experiments and studies are conducted to demonstrate the effectiveness of EDCN on two public datasets and one industrial dataset. Moreover, the compatibility of the two modules with various parallel-structured models is verified, and they have been deployed on the online advertising platform at Huawei, where a one-month A/B test demonstrates improvements over the base parallel-structured model of 7.30% and 4.85% in terms of CTR and eCPM, respectively.

ETA Prediction with Graph Neural Networks in Google Maps

Austin Derrow-Pinion
Jennifer She
David Wong
Oliver Lange
Todd Hester
Luis Perez
Marc Nunkesser
Seongjae Lee
Xueying Guo
Brett Wiltshire
Peter W. Battaglia
Vishal Gupta
Ang Li
Zhongwen Xu
Alvaro Sanchez-Gonzalez
Yujia Li
Petar Velickovic

Travel-time prediction constitutes a task of high importance in transportation networks, with web mapping services like Google Maps regularly serving vast quantities of travel time queries from users and enterprises alike. Further, such a task requires accounting for complex spatiotemporal interactions (modelling both the topological properties of the road network and anticipating events, such as rush hours, that may occur in the future). Hence, it is an ideal target for graph representation learning at scale. Here we present a graph neural network estimator for estimated time of arrival (ETA) which we have deployed in production at Google Maps. While our main architecture consists of standard GNN building blocks, we further detail the usage of training schedule methods such as MetaGradients in order to make our model robust and production-ready. We also provide prescriptive studies: ablating on various architectural decisions and training regimes, and qualitative analyses on real-world situations where our model provides a competitive edge. Our GNN proved powerful when deployed, significantly reducing negative ETA outcomes in several regions compared to the previous production baseline (40+% in cities like Sydney).

AutoCombo: Automatic Malware Signature Generation Through Combination Rule Mining

Min Du
Wenjun Hu
William Hewlett

Malware detection is an essential step in building trustworthy computer systems. Signature-based detection detects a sample as malware if the sample data match or contain a pre-stored malware signature. Among all detection methods that malware experts are constantly exploring, signature-based malware detection is indispensable, due to its simplicity, explainability and efficiency. Malware signatures could have various formats, for example, a substring, a subsequence, or a combination rule. A combination rule signature could be viewed as a fixed set of properties, each of which describes some characteristic of an analyzed sample. Although security experts have dedicated many efforts to extract meaningful features from samples, the step of signature generation from the features has been rather ad hoc and time-consuming.

This paper focuses on the generation of combination rule signatures. We abstract and formally define the problem of combination rule malware signature generation, followed by a systematic study towards an effective and efficient implementation. Inspired by classic frequent itemsets mining solutions, the proposed AutoCombo approach is greedy but also complete. It generates higher quality signatures first, but is also able to traverse all possible property combinations for a complete generation. Further optimizations and future research potential are also discussed. The proposed approach is currently in use to assist the analysis for millions of files per day in a large security company. Our evaluation results using large-scale production data have also shown its efficacy. With the release of over 10 million real production records as well as our exploratory code, we hope this initial study could draw AI experts' attention and advance the research even further in this field.
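To make the notion of a combination rule signature concrete, the following is a minimal sketch that treats a signature as a set of properties and a sample as matching when it contains all of them. The mining routine is a brute-force enumeration by increasing signature size, not the AutoCombo algorithm; the function names, the `min_support` and `max_size` parameters, the zero-false-positive acceptance rule, and the toy property names are all assumptions made for illustration.

```python
from itertools import combinations


def matches(signature, sample_props):
    """A combination rule matches when every property in it is present."""
    return signature <= sample_props


def mine_signatures(malware, benign, max_size=2, min_support=2):
    """Enumerate property sets that hit enough malware and no benign samples."""
    vocabulary = sorted(set().union(*malware))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(vocabulary, size):
            sig = frozenset(combo)
            # A superset of an accepted signature adds nothing new, so skip it.
            if any(prev <= sig for prev, _ in found):
                continue
            support = sum(matches(sig, s) for s in malware)
            false_pos = sum(matches(sig, s) for s in benign)
            if support >= min_support and false_pos == 0:
                found.append((sig, support))
    # Assumed quality ranking: higher malware support first.
    return sorted(found, key=lambda item: -item[1])


# Hypothetical property sets extracted from analyzed samples.
malware = [{"packed", "net", "reg"}, {"packed", "net"}, {"net", "reg", "autorun"}]
benign = [{"net"}, {"packed", "reg"}, {"reg"}]
signatures = mine_signatures(malware, benign)
```

On this toy data no single property separates malware from benign samples, but two pairs of properties do, which is the situation where combination rules are useful.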

From Pixels to Words: A Scalable Journey of Text Information from Product Images to Retail Catalog

Pranay Dugar
Rajesh Shreedhar Bhat
Asit Sharad Tarsode
Uddipto Dutta
Kunal Banerjee
Anirban Chatterjee
Vijay Srinivas Agneeswaran

Extracting texts of various shapes, sizes, and orientations from images containing multiple objects is an important problem in many contexts, especially in e-commerce. At the scale at which Walmart operates, the text from an image can be a richer and more accurate source of data than human inputs and can be used in several applications such as Attribute Extraction, Offensive Text Classification, and Product Matching, among others. The motivation for this work comes from different business requirements, such as flagging products whose images contain words that are non-compliant with organizational policies, and building an efficient automated system to identify similar products by comparing the information contained in their respective product images. Existing methods fail to adequately address domain-specific challenges like high entropy, different orientations, and small texts in product images. In this work, we provide a solution that not only addresses these challenges but is also proven to work at a million-image scale for various retail business units within Walmart. Extensive experimentation revealed that our proposed solution has been able to save around 30% computational cost in both the training and the inference stages.

Beast: Scalable Exploratory Analytics on Spatio-temporal Data

Ahmed Eldawy
Vagelis Hristidis
Saheli Ghosh
Majid Saeedan
Akil Sevim
A.B. Siddique
Samriddhi Singla
Ganesh Sivaram
Tin Vu
Yaming Zhang

This paper introduces the open-source Beast system for scalable exploratory data science on big spatio-temporal data. Beast is based on well-established research and has been released to assist the research community with analyzing big spatio-temporal data. Beast provides a set of extensible components that naturally integrate with Spark to build exploratory data science pipelines. Beast can install in less than a minute on an existing Spark cluster and provides a wide array of features including loading vector and raster data represented in standard file formats, synthetic data generation for benchmarking, load-balanced spatial partitioning, data summarization, interactive visualization, and more. Beast builds on several research projects; its goal is to make all this research widely available to researchers in one integrative and coherent system.

SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detection

Shangbin Feng
Herun Wan
Ningnan Wang
Jundong Li
Minnan Luo

Twitter has become a major social media platform since its launch in 2006, while complaints about bot accounts have increased recently. Although extensive research efforts have been made, the state-of-the-art bot detection methods fall short of generalizability and adaptability. Specifically, previous bot detectors leverage only a small fraction of user information and are often trained on datasets that cover only a few types of bots. As a result, they fail to generalize to real-world scenarios on the Twittersphere where different types of bots co-exist. Additionally, bots on Twitter are constantly evolving to evade detection. Previous efforts, although effective in their original context, fail to adapt to new generations of Twitter bots. To address the two challenges of Twitter bot detection, we propose SATAR, a self-supervised representation learning framework of Twitter users, and apply it to the task of bot detection. In particular, SATAR generalizes by jointly leveraging the semantics, property and neighborhood information of a specific user. Meanwhile, SATAR adapts by pre-training on a massive number of self-supervised users and fine-tuning on detailed bot detection scenarios. Extensive experiments demonstrate that SATAR outperforms competitive baselines on different bot detection datasets of varying information completeness and collection time. SATAR is also shown to generalize in real-world scenarios and adapt to evolving generations of social media bots.

Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach

Martin Gerlach
Marshall Miller
Rita Ho
Kosta Harlan
Djellel Difallah

Hyperlinks constitute the backbone of the Web; they enable user navigation, information discovery, content ranking, and many other crucial services on the Internet. In particular, hyperlinks found within Wikipedia allow the readers to navigate from one page to another to expand their knowledge on a given subject of interest or to discover a new one. However, despite Wikipedia editors' efforts to add and maintain its content, the distribution of links remains sparse in many language editions. This paper introduces a machine-in-the-loop entity linking system that can comply with community guidelines for adding a link and aims at increasing link coverage in new pages and wiki-projects with low resources. To tackle these challenges, we build a context- and language-agnostic entity linking model that combines data collected from millions of anchors found across wiki-projects, as well as billions of users' reading sessions. We develop an interactive recommendation interface that proposes candidate links to editors who can confirm, reject, or adapt the recommendation with the overall aim of providing a more accessible editing experience for newcomers through structured tasks. Our system's design choices were made in collaboration with members of several language communities. When the system is implemented as part of Wikipedia, its usage by volunteer editors will help us build a continuous evaluation dataset with active feedback. Our experimental results show that our link recommender can achieve a precision of 74-90% while ensuring a recall of 30-66% across 6 languages covering different sizes, continents, and families.

Self-Supervised Learning on Users' Spontaneous Behaviors for Multi-Scenario Ranking in E-commerce

Yulong Gu
Wentian Bao
Dan Ou
Xiang Li
Baoliang Cui
Biyu Ma
Haikuan Huang
Qingwen Liu
Xiaoyi Zeng

Multi-scenario Learning to Rank is essential for Recommender Systems, Search Engines and Online Advertising in e-commerce portals where the ranking models are usually applied in many scenarios. However, existing works mainly focus on learning the ranking model for a single scenario, and pay less attention to learning ranking models for multiple scenarios. We identify two practical challenges in industrial multi-scenario ranking systems: (1) the Feedback Loop problem, in which the model is always trained on the items chosen by the ranker itself; (2) insufficient training data for small and new scenarios. To address the above issues, we present ZEUS, a novel framework that learns a Zoo of ranking modEls for mUltiple Scenarios based on pre-training on users' spontaneous behaviors (e.g., queries which are directly searched in the search box and not recommended by the ranking system). ZEUS decomposes the training process into two stages: self-supervised learning based pre-training and fine-tuning. Firstly, ZEUS performs self-supervised learning on users' spontaneous behaviors and generates a pre-trained model. Secondly, ZEUS fine-tunes the pre-trained model on users' implicit feedback in multiple scenarios. Extensive experiments on Alibaba's production dataset demonstrate the effectiveness of ZEUS, which significantly outperforms state-of-the-art methods. On average, ZEUS achieves improvements of 6.0%, 9.7%, and 11.7% in CTR, CVR, and GMV, respectively, over the state-of-the-art method.

Detection of Illicit Drug Trafficking Events on Instagram: A Deep Multimodal Multilabel Learning Approach

Chuanbo Hu
Minglei Yin
Bin Liu
Xin Li
Yanfang Ye

Social media such as Instagram and Twitter have become important platforms for marketing and selling illicit drugs. Detection of online illicit drug trafficking has become critical to combat the online trade of illicit drugs. However, the legal status often varies spatially and temporally; even for the same drug, federal and state legislation can have different regulations about its legality. Meanwhile, more drug trafficking events are disguised as a novel form of advertising - commenting leading to information heterogeneity. Accordingly, accurate detection of illicit drug trafficking events (IDTEs) from social media has become even more challenging. In this work, we conduct the first systematic study on fine-grained detection of IDTEs on Instagram. We propose to take a deep multimodal multilabel learning (DMML) approach to detect IDTEs and demonstrate its effectiveness on a newly constructed dataset called multimodal IDTE (MM-IDTE). Specifically, our model takes text and image data as the input and combines multimodal information to predict multiple labels of illicit drugs. Inspired by the success of BERT, we have developed a self-supervised multimodal bidirectional transformer by jointly fine-tuning pretrained text and image encoders. We have constructed a large-scale dataset MM-IDTE with manually annotated multiple drug labels to support fine-grained detection of illicit drugs. Extensive experimental results on the MM-IDTE dataset show that the proposed DMML methodology can accurately detect IDTEs even in the presence of special characters and style changes attempting to evade detection.

ARGH!: Automated Rumor Generation Hub

Larry Huynh
Thai Nguyen
Joshua Goh
Hyoungshick Kim
Jin B. Hong

It is still challenging to effectively identify rumors due to rapid changes in people's interests and perceptions. To enhance rumor detectors, we first need to better understand which rumors are effective (in terms of bypassing detection) and their characteristics. In this paper, we introduce ARGH, a novel framework to automatically generate rumors using recent advancements in natural language processing, customized to target and generate specific topics. To show the effectiveness of ARGH, we conducted a user study with 212 participants and analyzed how well humans can detect the rumors generated by ARGH, and we also tested its performance against the state-of-the-art rumor detection model PLAN [17]. Surprisingly, the experimental results demonstrate that the generated rumors are significantly harder to identify as rumors than hand-written rumors, degrading the detection accuracy by both humans and machines by 18.87% and 17.62%, respectively. We believe that ARGH will be a useful tool to obtain high quality and evasive rumor datasets quickly, which is often a tedious and time consuming task. Further, our analysis results provide valuable insight into how to characterize evasive rumors and how they can be generated, which will help to enhance the existing rumor detection techniques.

LightMove: A Lightweight Next-POI Recommendation for Taxicab Rooftop Advertising

Jinsung Jeon
Soyoung Kang
Minju Jo
Seunghyeon Cho
Noseong Park
Seonghoon Kim
Chiyoung Song

Mobile digital billboards are an effective way to augment brand awareness. Among various such mobile billboards, taxicab rooftop devices are emerging in the market as a brand new medium. Motov is a leading company in South Korea in the taxicab rooftop advertising market. In this work, we present a lightweight yet accurate deep learning-based method to predict taxicabs' next locations to better prepare for targeted advertising based on demographic information of locations. Considering the fact that next-POI recommendation datasets are frequently sparse, we design our model based on neural ordinary differential equations (NODEs), which are known to be robust to sparse/incorrect input, with several enhancements. Our model, which we call LightMove, has a higher prediction accuracy, a smaller number of parameters, and/or a shorter training/inference time than state-of-the-art models when evaluated on various datasets.

Prohibited Item Detection on Heterogeneous Risk Graphs

Yugang Ji
Chuan Shi
Xiao Wang

Prohibited item detection, which aims to detect illegal items hidden on e-commerce platforms, plays a significant role in evading risks and preventing crimes for online shopping. While traditional solutions usually focus on mining evidence from independent items, they cannot effectively utilize the rich structural relevance among different items. A naive idea is to directly deploy existing supervised graph neural networks to learn node representations for item classification. However, the very few manually labeled items with various risk patterns introduce two essential challenges: (1) How to enhance the representations of enormous unlabeled items? (2) How to enrich the supervised information in this few-labeled but multiple-pattern business scenario? In this paper, we construct item logs as a Heterogeneous Risk Graph (HRG), and propose the novel Heterogeneous Self-supervised Prohibited item Detection model (HSPD) to overcome these challenges. HSPD first designs the heterogeneous self-supervised learning model, which treats multiple semantics as the supervision to enhance item representations. Then, it presents the directed pairwise labeling to learn the distance from candidates to their most relevant prohibited seeds, which tackles the binary-labeled multi-patterned risks. Finally, HSPD integrates with self-training mechanisms to iteratively expand confident pseudo labels for enriching supervision. The extensive offline and online experimental results on three real-world HRGs demonstrate that HSPD consistently outperforms the state-of-the-art alternatives.

Unbiased Filtering of Accidental Clicks in Verizon Media Native Advertising

Yohay Kaplan
Naama Krasne
Alex Shtoff
Oren Somekh

Verizon Media (VZM) native advertising is one of VZM's largest and fastest growing businesses, reaching a run-rate of several hundred million USD in the past year. Driving the VZM native models that are used to predict event probabilities, such as click and conversion probabilities, is OFFSET, a feature-enhanced collaborative-filtering based event-prediction algorithm. In this work we focus on the challenge of predicting click-through rates (CTR) when we are aware that some of the clicks have short dwell-time and are defined as accidental clicks. An accidental click implies little affinity between the user and the ad, so predicting that similar users will click on the ad is inaccurate. Therefore, it may be beneficial to remove clicks with dwell-time lower than a predefined threshold from the training set. However, we cannot ignore these positive events, as filtering them will cause the model to under-predict. Previous approaches have tried to apply filtering and then add corrective biases to the CTR predictions, but did not yield revenue lifts and therefore were not adopted. In this work, we present a new approach where the positive weight of the accidental clicks is distributed among all of the negative events (skips), based on their likelihood of causing accidental clicks, as predicted by an auxiliary model. These likelihoods are taken as the correct labels of the negative events, shifting our training away from using only binary labels while adopting a binary cross-entropy loss function in our training process. After showing offline performance improvements, the modified model was tested online serving VZM native users, and provided a 1.18% revenue lift over the production model, which is agnostic to accidental clicks.
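The label redistribution idea can be sketched as follows. This is an illustration under stated assumptions, not the OFFSET training procedure: the function names are hypothetical, and the rule used here (each skip receives a share of the removed accidental-click mass proportional to its auxiliary-model likelihood, capped at 1) is one plausible reading of the abstract rather than the authors' formula.

```python
import math


def soft_labels_for_skips(aux_probs, n_accidental):
    """Spread the mass of removed accidental clicks over skips, proportionally."""
    total = sum(aux_probs)
    if total == 0:
        return [0.0 for _ in aux_probs]
    # Assumed rule: label_i = n_accidental * p_i / sum(p), capped at 1.
    return [min(1.0, n_accidental * p / total) for p in aux_probs]


def binary_cross_entropy(label, pred, eps=1e-12):
    """Binary cross-entropy that accepts fractional (soft) labels in [0, 1]."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(pred) + (1.0 - label) * math.log(1.0 - pred))


# Three skips with hypothetical auxiliary-model likelihoods, one removed click.
aux = [0.1, 0.3, 0.6]
skip_labels = soft_labels_for_skips(aux, n_accidental=1)
loss = sum(binary_cross_entropy(y, pred=0.2) for y in skip_labels)
```

Because the labels are fractional, the total positive mass in the training set is preserved while no single accidental click is attributed to a specific user and ad pair.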

Understanding Job Seeker Funnel for Search and Discovery Personalization

Nagaraj Kota
Venkatesh Duppada
Ashvini Jindal
Mohit Wadhwa

The search and discovery process of a job-seeker towards realizing the dream opportunity can be very complex. Given the dynamic nature of job postings, churn-rate of skills, and gaps in intent matches, professionals often find it hard to discover the right opportunity. Most often, they need guidance on the right search queries to issue or next steps in the job-seeking funnel to reach a target job-apply. In this work, we experiment with a job-sessions dataset from LinkedIn to understand and represent users' job-seeking behavior. In particular, using action sequences unified from various search and discovery channels, we pre-train language models, e.g., BERT (Bidirectional Encoder Representations from Transformers), to model users' activities. We further fine-tune the BERT-based contextual session embeddings towards predicting entities from target sessions, in an eXtreme Multi-Label (XML) classification setting. We hypothesize that the XML fine-tuning task enables the dense representations and predicted entities to be used in multiple downstream tasks of job-search query recommendation, job-search ranking, job recommendation retrieval, and job-notification expansion, as shown in experiments. We demonstrate significant improvements in accuracy across tasks, leading to reduced time to reach a given job-apply, as well as an increase in total job-applies in the system. We also share the lessons learned from deploying these models in production. To the best of our knowledge, this is the first work to efficiently model cross-channel activities at scale using self-attention mechanisms, leading to statistically significant improvement in job-seeker experience.

Learning to Bundle Proactively for On-Demand Meal Delivery

Chengbo Li
Lin Zhu
Guangyuan Fu
Longzhi Du
Canhua Zhao
Tianlun Ma
Chang Ye
Pei Lee

On-demand meal delivery (ODMD) platforms such as DoorDash and Ele.me have experienced explosive growth in recent years. Effective logistics optimization strategies that could guarantee high service standards with controlled costs are crucial for the long-term sustainability of these platforms, and yet are also non-trivial due to the nature of ODMD operations. In particular, most of the orders are not known until they are placed by the customers, and any dispatching policy that only considers known requests would risk making myopic decisions in such a setting.

In this paper, we propose a novel approach to address this problem. At the core of our method is a learning-based metric called Proactive Bundle Cost Vector (PBCV), which quantifies the ease of bundling a particular order with future orders. Based on PBCV, we build a proactive bundling policy that considers the viability of serving unknown requests. Extensive online A/B tests demonstrate that the resultant policy significantly improves key performance metrics over baseline policies. Our solution has been successfully deployed at one of the world's largest ODMD platforms, serving tens of millions of customers on a daily basis.

Elastic and Stable Compaction for LSM-tree: A FaaS-Based Approach on TerarkDB

Jianchuan Li
Peiquan Jin
Yuanjin Lin
Ming Zhao
Yi Wang
Kuankuan Guo

The LSM-tree is widely used as a write-optimized storage engine in many NoSQL systems. However, the periodic compaction operations in an LSM-tree consume substantial I/O bandwidth and CPU resources on the local server, resulting in throughput drops of the system. To address this issue, this paper proposes a new compaction scheme based on the FaaS (Functions as a Service) architecture, which is called FaaS Compaction. It utilizes the elastic computing capability of FaaS and always pushes compactions to a FaaS cluster. The FaaS cluster performs the actual compaction operations, which therefore do not affect processing on the local server. As a result, we can maintain stable performance even when periodic compactions are triggered. We also present a Parallel Slight Compaction method to solve the timeout problem caused by heavy compactions. We implement FaaS Compaction based on TerarkDB and a real FaaS cluster and experimentally compare it with RocksDB's local compaction scheme and the state-of-the-art offloading compaction policy. The results suggest the efficiency, stability, and elasticity of our proposal.

Causal-Aware Generative Imputation for Automated Underwriting

Qian Li
Tri Dung Duong
Zhichao Wang
Shaowu Liu
Dingxian Wang
Guandong Xu

Underwriting is an important process in insurance and is concerned with accepting individuals into an insurance policy with tolerable claim risk. It is a tedious process that relies on underwriters' domain knowledge and experience, and is thus labor intensive and prone to error. Machine learning models have recently been applied to automate the underwriting process, easing the burden on underwriters and improving underwriting accuracy. However, observational data used for underwriting modelling is high dimensional, sparse and incomplete, due to the dynamic evolving nature (e.g., upgrade) of business information systems. Simply applying traditional supervised learning methods, e.g., logistic regression or gradient boosting, on such highly incomplete data usually leads to unsatisfactory underwriting results, thus requiring practical data imputation to improve training quality. In this paper, rather than choosing off-the-shelf solutions to tackle the complex missing-data problem, we propose an innovative Generative Adversarial Nets (GAN) framework that can capture the missing pattern from a causal perspective. Specifically, we design a structural causal model to learn the causal relations underlying the missing pattern of data. Then, we devise a Causality-aware Generative network (CaGen) using the learned causal relationship prior to generating missing values, and correct the imputed values via adversarial learning. We also show that CaGen significantly improves the underwriting prediction in real-world insurance applications.

Grassland: A Rapid Algebraic Modeling System for Million-variable Optimization

Xihan Li
Xiongwei Han
Zhishuo Zhou
Mingxuan Yuan
Jia Zeng
Jun Wang

An algebraic modeling system (AMS) is a type of mathematical software for optimization problems, which allows users to define symbolic mathematical models in a specific language, instantiate them with a given source of data, and solve them with the aid of external solver engines. With the rapidly growing scale of business models and increasing need for timeliness, traditional AMSs are not sufficient to meet the following industry needs: 1) million-variable models need to be instantiated from raw data very efficiently; 2) strictly feasible solutions of million-variable models need to be delivered rapidly to make up-to-date decisions in highly dynamic environments. Grassland is a rapid AMS that provides an end-to-end solution to tackle these emerging challenges. It integrates a parallelized instantiation scheme for large-scale linear constraints, and a sequential decomposition method that accelerates model solving exponentially with an acceptable loss of optimality. Extensive benchmarks on both classical models and a real enterprise scenario demonstrate a 6-10x speedup of Grassland over state-of-the-art solutions on model instantiation. Our proposed system has been deployed in the large-scale real production planning scenario of Huawei. With the aid of our decomposition method, Grassland successfully accelerated Huawei's million-variable production planning simulation pipeline from hours to 3-5 minutes, supporting near-real-time production plan decision making in a highly dynamic supply-demand environment.

Unsupervised Categorical Representation Learning for Package Arrival Time Prediction

Yang Li
Xingyu Wu
Jinglong Wang
Yong Liu
Xiaoqing Wang
Yuming Deng
Chunyan Miao

Predicting the Estimated Time of package Arrival (ETA) is an essential task for Alibaba E-commerce platforms like Taobao and Tmall, and may influence the user experience of one billion customers. The main challenge in ETA prediction on Alibaba platforms is learning from high-dimensional categorical attributes, where it is equally important to obtain appropriate representations for each feature and to describe the proximity among them. Although recent supervised end-to-end methods have achieved great improvements, unsupervised embedding methods for categorical attributes have not yet been well studied, especially when dealing with large-scale sparse datasets.

In this paper, we propose Bayesian Graph Embedding (BGE) to learn dense representations for high-dimensional categorical attributes in an unsupervised way. Inspired by the ideas of Bayesian networks and graph embedding, we design an unsupervised algorithm to absorb the knowledge of prior dependencies and unobserved attributes, which is ignored by end-to-end methods. A joint optimization objective is proposed to mine the proximity between categorical attributes, which achieves consistency with the Bayesian network. Moreover, a multi-tower model architecture is put forward for the multi-task learning of the joint objective, based on which the dense representations of categorical attributes can be well exploited. The produced embeddings can be applied in downstream tasks such as regression and classification with further fine-tuning. Extensive experiments have been conducted on two datasets with more than two million samples, collected from Alibaba's real production environment. The experimental results demonstrate that the proposed approach outperforms both supervised and unsupervised baseline methods in the effectiveness and efficiency of the E-commerce ETA prediction task.

You Are What and Where You Are: Graph Enhanced Attention Network for Explainable POI Recommendation

Zeyu Li
Wei Cheng
Haiqi Xiao
Wenchao Yu
Haifeng Chen
Wei Wang

Point-of-interest (POI) recommendation is an emerging area of research on location-based social networks to analyze user behaviors and contextual check-in information. For this problem, existing approaches, with shallow or deep architectures, have two major drawbacks. First, for these approaches, the attributes of individuals have been largely ignored. Therefore, it would be hard, if not impossible, to gather sufficient user attribute features to have complete coverage of possible motivation factors. Second, most existing models preserve the information of users or POIs by latent representations without explicitly highlighting salient factors or signals. Consequently, the trained models with unjustifiable parameters provide few persuasive rationales to explain why users favor or dislike certain POIs and what really causes a visit. To overcome these drawbacks, we propose GEAPR, a POI recommender that is able to interpret the POI prediction in an end-to-end fashion. Specifically, GEAPR learns user representations by aggregating different factors, such as structural context, neighbor impact, user attributes, and geolocation influence. GEAPR takes advantage of a triple attention mechanism to quantify the influences of different factors for each resulting recommendation and performs a thorough analysis of the model interpretability. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed model. GEAPR is deployed and under test on an internal web server. An example interface is presented to showcase its application on explainable POI recommendation.

Failure Prediction for Large-scale Water Pipe Networks Using GNN and Temporal Failure Series

Shuming Liang
Zhidong Li
Bin Liang
Yu Ding
Yang Wang
Fang Chen

Pipe failure prediction in the water industry aims to prioritize the pipes that are at high risk of failure for proactive maintenance. However, existing statistical or machine learning models that rely on historical failures and asset attributes can hardly leverage the structure information of pipe networks. In this work, we develop a failure prediction framework for pipe networks by jointly considering the pipes' features, the network structure, the geographical neighboring effect, and the temporal failure series. We apply a multi-hop Graph Neural Network (GNN) to failure prediction. We propose a method of constructing a geographical graph structure depending on not only the physical connections but also geographical distances between pipes. To differentiate the pipes with diverse properties, we employ an attention mechanism in the neighborhood aggregation process of each GNN layer. Also, residual connections and layer-wise aggregation are used to avoid the over-smoothing issue in deep GNNs. The historical failures exhibit a strong temporal pattern. Inspired by point process, we develop a module to learn the pipes' evolutionary effect and the time-decayed excitement of historical failures on the current state of the pipe. The proposed framework is evaluated on two real-world large-scale pipe networks. It outperforms the existing statistical, machine learning, and state-of-the-art GNN baselines. Our framework provides the water utility with core data-driven support for proactive maintenance including regular pipe inspection, pipe renewal planning, and sensor system deployment. It can be extended to other infrastructure networks in the future.
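
As a rough illustration of the "time-decayed excitement" idea, the sketch below computes a self-exciting (Hawkes-style) failure intensity for a single pipe. The exponential kernel, the function name `failure_intensity`, and the parameters `base_rate`, `jump`, and `decay` are assumptions made for this example; the abstract does not specify the form of the authors' learned module.

```python
import math
from typing import Sequence


def failure_intensity(
    t: float,
    past_failures: Sequence[float],
    base_rate: float = 0.05,
    jump: float = 0.4,
    decay: float = 0.8,
) -> float:
    """Illustrative intensity at time t: a base rate plus decayed excitement.

    Each past failure at time s < t adds jump * exp(-decay * (t - s)),
    so recent failures raise the current risk more than old ones.
    All parameter values are hypothetical.
    """
    excitement = sum(
        jump * math.exp(-decay * (t - s)) for s in past_failures if s < t
    )
    return base_rate + excitement
```

Under these assumptions, a pipe that failed recently receives a higher intensity than one whose last failure is long past, and a pipe with no earlier failures falls back to the base rate.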

Distilling Knowledge from BERT into Simple Fully Connected Neural Networks for Efficient Vertical Retrieval

Peiyang Liu
Xi Wang
Lin Wang
Wei Ye
Xiangyu Xi
Shikun Zhang

Distilled BERT models are more suitable than BERT for efficient vertical retrieval in online sponsored vertical search with low-latency requirements, owing to fewer parameters and faster inference. Unfortunately, most of these models still fall far short of the ideal inference speed. This paper presents a novel and effective method to distill knowledge from BERT into simple fully connected neural networks (FNN). Results of extensive experiments on English and Chinese datasets demonstrate that our method achieves results comparable to existing distilled BERT models while inference is accelerated by more than ten times. We have successfully applied our method to our online sponsored vertical search engine and obtained remarkable improvements.

Heterogeneous Graph Neural Networks for Large-Scale Bid Keyword Matching

Zongtao Liu
Bin Ma
Quan Liu
Jian Xu
Bo Zheng

Digital advertising is a critical part of many e-commerce platforms such as Taobao and Amazon. While in recent years a lot of attention has been drawn to the consumer side, including canonical problems like CTR/CVR prediction, the advertiser side, which directly serves advertisers by providing them with marketing tools, is now playing an increasingly important role. In sponsored search, bid keyword recommendation is the fundamental service. This paper addresses the problem of keyword matching, the primary step of keyword recommendation. Existing methods for keyword matching merely model relevance based on a single type of relation among ads and keywords, such as query clicks or text similarity, which neglects the rich heterogeneous interactions hidden behind them. Filling this gap poses several challenges, including: 1) how to learn enriched and robust embeddings from complex interactions among various types of objects; 2) how to conduct high-quality matching for new ads that usually lack sufficient data.

To address these challenges, we develop a heterogeneous-graph-neural-network-based model for keyword matching named HetMatch, which has been deployed both online and offline at the core sponsored search platform of Alibaba Group. To extract enriched and robust embeddings among rich relations, we design a hierarchical structure to fuse and enhance the relevant neighborhood patterns both on the micro and the macro level. Moreover, by proposing a multi-view framework, the model is able to involve more positive samples for cold-start ads. Experimental results on a large-scale industrial dataset as well as online AB tests exhibit the effectiveness of HetMatch.

Mining Software Entities in Scientific Literature: Document-level NER for an Extremely Imbalance and Large-scale Task

Patrice Lopez
Caifan Du
Johanna Cohoon
Karthik Ram
James Howison

We present a comprehensive information extraction system dedicated to software entities in scientific literature. This task combines the complexity of automatic reading of scientific documents (PDF processing, document structuring, styled/rich text, scaling) with challenges specific to mining software entities: high heterogeneity and extreme sparsity of mentions, document-level cross-references, disambiguation of noisy software mentions and poor portability of Machine Learning approaches between highly specialized domains. While NER is a key component to recognize new and unseen software, considering this task as a simple NER application fails to address most of these issues.

In this paper, we propose a multi-model Machine Learning approach where raw documents are ingested by a cascade of document structuring processes applied not to text, but to layout token elements. The cascading process further enriches the relevant structures of the document with a Deep Learning software mention recognizer adapted to the high sparsity of mentions. The Machine Learning cascade culminates with entity disambiguation to alleviate false positives and to provide software entity linking. A bibliographical reference resolution is integrated to the process for attaching references cited alongside the software mentions.

Based on the first gold-standard annotated dataset developed for software mentions, this work establishes a new reference end-to-end performance for this task. Experiments with the CORD-19 publications have further demonstrated that our system provides practically usable performance and is scalable to the whole scientific corpus, enabling novel applications for crediting research software and for better understanding the impact of software in science.

Algorithmic Balancing of Familiarity, Similarity, & Discovery in Music Recommendations

Rishabh Mehrotra

Algorithmic recommendations shape music consumption at scale, and understanding the role different behavioral aspects play in how content is consumed is a central question for music streaming platforms. Focusing on the notions of familiarity, similarity and discovery, we identify the need for explicit consideration and optimization of such objectives, and establish the need to efficiently balance them when generating algorithmic recommendations for users. We posit that while familiarity helps drive short-term engagement, jointly optimizing for discovery enables the platform to influence and shape consumption across suppliers. We propose a multi-level ordered-weighted averaging based objective balancer to help maintain a healthy balance with respect to familiarity and discovery objectives, and conduct a series of offline evaluations and online AB tests to demonstrate that, despite the presence of strict trade-offs, we can achieve wins on both satisfaction and discovery-centric objectives. Our proposed methods and insights have implications for the design and deployment of practical approaches for music recommendations, and our findings demonstrate that they can lead to substantial improvements in recommendation quality on one of the world's largest music streaming platforms.
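
For readers unfamiliar with ordered weighted averaging (OWA), the following minimal sketch shows the basic single-level operator applied to per-objective scores of one candidate track. The function name `owa`, the example scores, and the weights are hypothetical; the multi-level balancer described in the abstract may be structured differently.

```python
from typing import Sequence


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average of objective scores.

    Unlike a plain weighted sum, the weights attach to rank positions
    (largest score first), not to particular objectives.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


# Hypothetical familiarity, similarity, and discovery scores for one track.
balanced_score = owa([0.2, 0.9, 0.5], [0.5, 0.3, 0.2])
```

Shifting weight toward the last positions makes the operator behave more like a minimum, which favors candidates that do acceptably on every objective over candidates that excel on only one.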

Sequential Search with Off-Policy Reinforcement Learning

Dadong Miao
Yanan Wang
Guoyu Tang
Lin Liu
Sulong Xu
Bo Long
Yun Xiao
Lingfei Wu
Yunjiang Jiang

Recent years have seen significant interest in Sequential Recommendation (SR), which aims to understand and model sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success Sequential Recommendation has achieved, there has been little study of Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior on historical query sessions. The SS learning task is even more important than its SR counterpart for most e-commerce companies due to its much larger online serving demands as well as traffic volume.

To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from long-term interactions. As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly. Moreover, we explore the use of off-policy reinforcement learning in multi-session personalized search ranking. Specifically, we design a pairwise Deep Deterministic Policy Gradient model that efficiently captures users' long-term reward in terms of pairwise classification error. Extensive ablation experiments demonstrate the significant improvement each component brings over its state-of-the-art baseline, on a variety of offline and online metrics.
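
The idea of fitting several short user sequences into one RNN pass can be sketched as a greedy packing problem. The example below uses a first-fit-decreasing heuristic over sequence lengths; the heuristic, the function name `pack_sequences`, and the `capacity` parameter are assumptions for illustration and are not taken from the paper.

```python
from typing import List


def pack_sequences(lengths: List[int], capacity: int) -> List[List[int]]:
    """Greedily group sequence indices into rows of total length <= capacity.

    Sequences are visited from longest to shortest and placed in the first
    row with enough free space (first-fit decreasing).
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows: List[List[int]] = []   # sequence indices assigned to each row
    free: List[int] = []         # remaining capacity of each row
    for i in order:
        if lengths[i] > capacity:
            raise ValueError(f"sequence {i} is longer than the row capacity")
        for r, space in enumerate(free):
            if lengths[i] <= space:
                rows[r].append(i)
                free[r] -= lengths[i]
                break
        else:
            # No existing row has room, so open a new one.
            rows.append([i])
            free.append(capacity - lengths[i])
    return rows


rows = pack_sequences([5, 3, 2, 4, 1], capacity=6)
```

In a real training loop each packed row would also need markers that reset the recurrent state between sequences; that detail is omitted here.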

Scalable Learning to Troubleshoot Query Performance Problems

Alexandar Mihaylov
Vincent Corvinelli
Parke Godfrey
Piotr Mierzejewski
Jaroslaw Szlichta
Calisto Zuzarte

Query optimization has long been fundamental for database systems. There are cracks in the edifice, however, as the complexity of modern query workloads outpaces what database systems can manage well. Automatic tools are needed for database vendors, such as IBM with Db2, to help customers troubleshoot their performance problems, as manual troubleshooting is painstaking. To manage complex and large workloads, we develop a distributed system called dGALO that learns recurring problem patterns in query plans over workloads. dGALO employs these problem patterns to build an RDF-based, SPARQL-queried knowledge base of plan-rewrite remedies. We illustrate a distributed implementation of dGALO on Apache Spark with efficient partitioning strategies for load balancing. The system employs additional pruning strategies via clustering, which yields a fine-grained trade-off between runtime and accuracy. dGALO uses its knowledge base to re-optimize queries, often to dramatic effect, and is a valuable tool for the development team to refine the optimizer with new techniques. We demonstrate the efficiency and effectiveness of our techniques through an experimental study over the TPC-DS benchmark.

Profiling Neural Blocks and Design Spaces for Mobile Neural Architecture Search

Keith G. Mills
Fred X. Han
Jialin Zhang
Seyed Saeed Changiz Rezaei
Fabian Chudak
Wei Lu
Shuo Lian
Shangling Jui
Di Niu

Neural architecture search automates neural network design and has achieved state-of-the-art results in many deep learning applications. While recent literature has focused on designing networks to maximize accuracy, little work has been conducted to understand the compatibility of architecture design spaces to varying hardware. In this paper, we analyze the neural blocks used to build Once-for-All (MobileNetV3), ProxylessNAS and ResNet families, in order to understand their predictive power and inference latency on various devices, including Huawei Kirin 9000 NPU, RTX 2080 Ti, AMD Threadripper 2990WX, and Samsung Note10. We introduce a methodology to quantify the friendliness of neural blocks to hardware and the impact of their placement in a macro network on overall network performance via only end-to-end measurements. Based on extensive profiling results, we derive design insights and apply them to hardware-specific search space reduction. We show that searching in the reduced search space generates better accuracy-latency Pareto frontiers than searching in the original search spaces, customizing architecture search according to the hardware. Moreover, insights derived from measurements lead to notably higher ImageNet top-1 scores on all search spaces investigated.

TSI: An Ad Text Strength Indicator using Text-to-CTR and Semantic-Ad-Similarity

Shaunak Mishra
Changwei Hu
Manisha Verma
Kevin Yen
Yifan Hu
Maxim Sviridenko

Coming up with effective ad text is a time-consuming process, and particularly challenging for small businesses with limited advertising experience. When an inexperienced advertiser onboards with a poorly written ad text, the ad platform has the opportunity to detect low-performing ad text and provide improvement suggestions. To realize this opportunity, we propose an ad text strength indicator (TSI) which: (i) predicts the click-through rate (CTR) for an input ad text, (ii) fetches similar existing ads to create a neighborhood around the input ad, and (iii) compares the predicted CTRs in the neighborhood to declare whether the input ad is strong or weak. In addition, as suggestions for ad text improvement, TSI shows anonymized versions of superior ads (higher predicted CTR) in the neighborhood. For (i), we propose a BERT-based text-to-CTR model trained on impressions and clicks associated with an ad text. For (ii), we propose a sentence-BERT-based semantic-ad-similarity model trained using weak labels from ad campaign setup data. Offline experiments demonstrate that our BERT-based text-to-CTR model achieves a significant lift in CTR prediction AUC for cold-start (new) advertisers compared to bag-of-words baselines. In addition, our semantic-textual-similarity model for similar ads retrieval achieves a precision@1 of 0.93 (for retrieving ads from the same product category); this is significantly higher than unsupervised TF-IDF, word2vec, and sentence-BERT baselines. Finally, we share promising online results from advertisers in the Yahoo (Verizon Media) ad platform where a variant of TSI was implemented with sub-second end-to-end latency.
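
Step (iii) and the suggestion step can be sketched as follows, assuming the predicted CTRs from step (i) and the neighborhood from step (ii) are already available. The decision rule (an ad is "strong" if it beats at least half of its neighbors), the function name `assess_ad`, and the parameter `min_rank_fraction` are assumptions for this example; the abstract does not state the actual threshold used by TSI.

```python
from typing import Dict, List, Tuple


def assess_ad(
    predicted_ctr: float,
    neighbor_ctrs: Dict[str, float],
    min_rank_fraction: float = 0.5,
) -> Tuple[str, List[str]]:
    """Label an ad relative to its neighborhood and list superior neighbors.

    neighbor_ctrs maps a (hypothetical) neighbor ad id to its predicted CTR.
    Returns the label and the ids of neighbors with a higher predicted CTR,
    best first.
    """
    if not neighbor_ctrs:
        return "unknown", []
    beaten = sum(1 for ctr in neighbor_ctrs.values() if ctr < predicted_ctr)
    is_strong = beaten / len(neighbor_ctrs) >= min_rank_fraction
    better = sorted(
        (ad for ad, ctr in neighbor_ctrs.items() if ctr > predicted_ctr),
        key=lambda ad: neighbor_ctrs[ad],
        reverse=True,
    )
    return ("strong" if is_strong else "weak"), better


label, suggestions = assess_ad(0.02, {"a": 0.01, "b": 0.03, "c": 0.05})
```

Here the input ad beats only one of three neighbors, so under the assumed rule it is labeled weak and the two better-performing neighbors are returned as suggestion candidates.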

Prioritizing Original News on Facebook

Xiuyan Ni
Shujian Bu
Lucas Adams
Igor L. Markov

This work outlines how we prioritize original news, a critical indicator of news quality. By examining the landscape and lifecycle of news posts on our social media platform, we identify challenges of building and deploying an originality score. We pursue an approach based on normalized PageRank values and three-step clustering, and refresh the score on an hourly basis to capture the dynamics of online news. We describe a near real-time system architecture, evaluate our methodology, and deploy it to production. Our empirical results validate individual components and show that prioritizing original news increases user engagement with news and improves proprietary cumulative metrics.

Learning to Expand: Reinforced Response Expansion for Information-seeking Conversations

Haojie Pan
Cen Chen
Chengyu Wang
Minghui Qiu
Liu Yang
Feng Ji
Jun Huang

Information-seeking conversation systems are increasingly popular in real-world applications, especially for e-commerce companies. To retrieve appropriate responses for users, it is necessary to compute the matching degrees between candidate responses and users' queries with historical dialogue utterances. As the contexts are usually much longer than the responses, it is necessary to expand the (usually short) responses with richer information. Recent studies on pseudo-relevance feedback (PRF) have demonstrated its effectiveness in query expansion for search engines; hence we consider expanding responses using PRF information. However, existing PRF approaches are either based on heuristic rules or require heavy manual labeling, neither of which suits our task. To alleviate this problem, we treat the PRF selection for response expansion as a learning task and propose a reinforcement learning method that can be trained in an end-to-end manner without any human annotations. More specifically, we propose a reinforced selector to extract useful PRF terms to enhance response candidates and a BERT-based response ranker to rank the PRF-enhanced responses. The performance of the ranker serves as a reward to guide the selector to extract useful PRF terms, which boosts the overall task performance. Extensive experiments on both standard benchmarks and commercial datasets demonstrate the superiority of our reinforced PRF term selector compared with other potential soft or hard selection methods. Both case studies and quantitative analysis show that our model is capable of selecting meaningful PRF terms to expand response candidates and also achieves the best results compared with all baselines on a variety of evaluation metrics. We have also deployed our method in online production at an e-commerce company, where it shows a significant improvement over the existing online ranking system.

Dual Learning for Query Generation and Query Selection in Query Feeds Recommendation

Kunxun Qi
Ruoxu Wang
Qikai Lu
Xuejiao Wang
Ning Jing
Di Niu
Haolan Chen

Query feeds recommendation is a new recommendation paradigm in mobile search applications, where a stream of queries needs to be recommended to improve user engagement. It requires a large quantity of attractive queries for recommendation. A conventional solution is to retrieve queries from a collection of past queries recorded in user search logs. However, these queries usually have poor readability and limited coverage of article content, and are thus not suitable for the query feeds recommendation scenario. Furthermore, to deploy the generated queries for recommendation, human validation, which is costly in practice, is required to filter unsuitable queries. In this paper, we propose TitIE, a query mining system to generate valuable queries using the titles of documents. We employ both an extractive text generator and an abstractive text generator to generate queries from titles. To improve the acceptance rate during human validation, we further propose a model-based scoring strategy to pre-select the queries that are more likely to be accepted during human validation. Finally, we propose a novel dual learning approach to jointly learn the generation model and the selection model by making full use of the unlabeled corpora under a semi-supervised scheme, thereby simultaneously improving the performance of both models. Results from both offline and online evaluations demonstrate the superiority of our approach.

EasyTransfer: A Simple and Scalable Deep Transfer Learning Platform for NLP Applications

Minghui Qiu
Peng Li
Chengyu Wang
Haojie Pan
Ang Wang
Cen Chen
Xianyan Jia
Yaliang Li
Jun Huang
Deng Cai
Wei Lin

The literature has witnessed the success of applying Pre-trained Language Models (PLMs) and Transfer Learning (TL) algorithms to a wide range of Natural Language Processing (NLP) applications, yet it is not easy to build an easy-to-use and scalable TL toolkit for this purpose. To bridge this gap, the EasyTransfer platform is designed to develop deep TL algorithms for NLP applications. EasyTransfer is backed by a high-performance and scalable engine for efficient training and inference, and also integrates comprehensive deep TL algorithms, to make the development of industrial-scale TL applications easier. In EasyTransfer, the built-in data and model parallelism strategies, combined with AI compiler optimization, are shown to be 4.0x faster than the community version of distributed training. EasyTransfer supports various NLP models in the ModelZoo, including mainstream PLMs and multi-modality models. It also features various in-house developed TL algorithms, together with the AppZoo for NLP applications. The toolkit is convenient for users to quickly start model training, evaluation, and online deployment. EasyTransfer is currently deployed at Alibaba to support a variety of business scenarios, including item recommendation, personalized search, conversational question answering, etc. Extensive experiments on real-world datasets and online applications show that EasyTransfer is suitable for online production with cutting-edge performance for various applications. The source code of EasyTransfer is released on GitHub.

From Closing Triangles to Higher-Order Motif Closures for Better Unsupervised Online Link Prediction

Ryan A. Rossi
Anup Rao
Sungchul Kim
Eunyee Koh
Nesreen K. Ahmed
Gang Wu

This paper introduces higher-order link prediction methods based on the notion of closing higher-order network motifs. The methods are fast and efficient for real-time ranking and link prediction-based applications such as online visitor stitching, web search, and online recommendation. In such applications, real-time performance is critical. The proposed methods do not require any explicit training data, nor do they derive an embedding from the graph data or perform any explicit learning. Most existing unsupervised methods with the above desired properties are based on closing triangles (common neighbors, Jaccard similarity, and the like). In this work, we develop unsupervised techniques based on the notion of closing higher-order motifs that generalize beyond closing simple triangles. Through extensive experiments, we find that these higher-order motif closures often outperform triangle-based methods, which are commonly used in practice. This result implies that one should consider other motif closures beyond simple triangles. We also find that the best motif closure depends highly on the underlying network and its structural properties. Furthermore, all methods described in this work are fast for link prediction-based applications requiring real-time performance. The experimental results indicate the importance of closing higher-order motifs for unsupervised link prediction. Finally, these new higher-order motif closures can serve as a basis for studying and developing better unsupervised real-time link prediction and ranking methods.
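
To make the contrast concrete, the sketch below scores a candidate link (u, v) with the two triangle-closing baselines named above and with one possible higher-order closure, the number of 4-cycles the new edge would close. The 4-cycle score is chosen only for illustration and is not necessarily one of the motifs studied in the paper; the adjacency-set graph representation and all function names are assumptions.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as node -> set of neighbors


def common_neighbors(g: Graph, u: int, v: int) -> int:
    """Number of triangles that adding edge (u, v) would close."""
    return len(g[u] & g[v])


def jaccard(g: Graph, u: int, v: int) -> float:
    """Common neighbors normalized by the size of the joint neighborhood."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def four_cycle_closures(g: Graph, u: int, v: int) -> int:
    """Number of simple paths u-a-b-v, i.e. 4-cycles closed by edge (u, v)."""
    return sum(
        1
        for a in g[u] if a != v
        for b in g[v] if b != u and b != a and b in g[a]
    )


square = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
```

On the path graph, nodes 1 and 4 share no neighbors, so both triangle-based scores are zero, while the 4-cycle score is nonzero. This is the kind of case where a higher-order closure can rank a pair that triangle-based methods cannot distinguish.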

SAR-Net: A Scenario-Aware Ranking Network for Personalized Fair Recommendation in Hundreds of Travel Scenarios

Qijie Shen
Wanjie Tao
Jing Zhang
Hong Wen
Zulong Chen
Quan Lu

The travel marketing platform of Alibaba serves an indispensable role for hundreds of different travel scenarios from Fliggy, Taobao, Alipay apps, etc. To provide personalized recommendation service for users visiting different scenarios, there are two critical issues to be carefully addressed. First, since the traffic characteristics of different scenarios, e.g., individual data scale or representative topic, are significantly different, it is very challenging to train a unified model to serve all. Second, during the promotion period, the exposure of some specific items will be re-weighted due to manual intervention, resulting in biased logs, which will degrade the ranking model trained using these biased data. In this paper, we propose a novel Scenario-Aware Ranking Network (SAR-Net) to address these issues. SAR-Net harvests the abundant data from different scenarios by learning users' cross-scenario interests via two specific attention modules, which leverage the scenario features and item features to modulate the user behavior features, respectively. Then, taking the encoded features of the previous module as input, a scenario-specific linear transformation layer is adopted to further extract scenario-specific features, followed by two groups of debias expert networks, i.e., scenario-specific experts and scenario-shared experts. They output intermediate results independently, which are further fused into the final result by a multi-scenario gating module. In addition, to mitigate the data fairness issue caused by manual intervention, we propose the concept of Fairness Coefficient (FC) to measure the importance of each individual sample and use it to reweigh the prediction in the debias expert networks. Experiments on an offline dataset covering over 80 million users and 1.55 million travel items and an online A/B test demonstrate the effectiveness of our SAR-Net and its superiority over state-of-the-art methods.
SAR-Net has also been deployed in the online travel marketing platform of Alibaba and is serving hundreds of travel scenarios.

One Model to Serve All: Star Topology Adaptive Recommender for Multi-Domain CTR Prediction

Xiang-Rong Sheng
Liqin Zhao
Guorui Zhou
Xinyao Ding
Binding Dai
Qiang Luo
Siran Yang
Jingshan Lv
Chi Zhang
Hongbo Deng
Xiaoqiang Zhu

Traditional industry recommendation systems usually use data in a single domain to train models and then serve the domain. However, a large-scale commercial platform often contains multiple domains, and its recommendation system often needs to make click-through rate (CTR) predictions for multiple domains. Generally, different domains may share some common user groups and items, and each domain may have its own unique user groups and items. Moreover, even the same user may have different behaviors in different domains. In order to leverage all the data from different domains, a single model can be trained to serve all domains. However, it is difficult for a single model to capture the characteristics of various domains and serve all domains well. On the other hand, training an individual model for each domain separately does not fully use the data from all domains. In this paper, we propose the Star Topology Adaptive Recommender (STAR) model to train a single model to serve all domains by leveraging data from all domains simultaneously, capturing the characteristics of each domain, and modeling the commonalities between different domains. Essentially, the network of each domain consists of two factorized networks: one centered network shared by all domains and the domain-specific network tailored for each domain. For each domain, we combine these two factorized networks and generate a unified network by element-wise multiplying the weights of the shared network and those of the domain-specific network, although these two factorized networks can be combined using other functions, which is open for further research. Most importantly, STAR can learn the shared network from all the data and adapt domain-specific parameters according to the characteristics of each domain. The experimental results from production data validate the superiority of the proposed STAR model.
Since late 2020, STAR has been deployed in the display advertising system of Alibaba, obtaining 8.0% improvement on CTR and 6.0% increase on RPM (Revenue Per Mille).
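
The weight combination described above can be illustrated with a single fully connected layer. In this minimal sketch the effective weight matrix for a domain is the element-wise product of a shared matrix and a domain-specific matrix of the same shape. The function names, the plain-list matrix representation, the example values, and the omission of bias terms and activations are simplifications assumed for this example.

```python
from typing import List

Matrix = List[List[float]]


def combine_weights(shared: Matrix, domain: Matrix) -> Matrix:
    """Element-wise product of the shared and domain-specific weights."""
    return [
        [s * d for s, d in zip(shared_row, domain_row)]
        for shared_row, domain_row in zip(shared, domain)
    ]


def forward(x: List[float], shared: Matrix, domain: Matrix) -> List[float]:
    """Linear layer output x @ (shared * domain) for one domain."""
    w = combine_weights(shared, domain)
    n_out = len(w[0])
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(n_out)]


# Hypothetical 2x2 weights: one shared matrix and one matrix per domain.
shared_w = [[1.0, 2.0], [3.0, 4.0]]
domain_a = [[1.0, 0.5], [2.0, 1.0]]
output_a = forward([1.0, 1.0], shared_w, domain_a)
```

Because the shared matrix appears in every domain's effective weights, it receives gradient signal from all domains, while each domain-specific matrix is updated only by examples from its own domain.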

From Limited Annotated Raw Material Data to Quality Production Data: A Case Study in the Milk Industry

Roee Shraga
Gil Katz
Yael Badian
Nitay Calderon
Avigdor Gal

Industry 4.0 offers opportunities to combine multiple sensor data sources using IoT technologies for better utilization of raw material in production lines. The common belief that data is readily available (the big data phenomenon) is oftentimes challenged by the need to effectively acquire quality data under severe constraints. In this paper we propose a design methodology, using active learning to enhance learning capabilities, for building a model of production outcome using a constrained amount of raw material training data. The proposed methodology extends existing active learning methods to effectively solve regression-based learning problems and may serve settings where data acquisition requires excessive resources in the physical world. We further suggest a set of qualitative measures to analyze learners' performance. The proposed methodology is demonstrated using an actual application in the milk industry, where milk is gathered from multiple small milk farms and brought to a dairy production plant to be processed into cottage cheese.

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Fu Sun
Feng-Lin Li
Ruize Wang
Qianglong Chen
Xingyi Cheng
Ji Zhang

Knowledge-enhanced pre-trained language models (K-PLMs) have been shown to be effective for many public tasks in the literature, but few of them have been successfully applied in practice. To address this problem, we propose K-AID, a systematic approach that includes a low-cost knowledge acquisition process for acquiring domain knowledge, an effective knowledge infusion module for improving model performance, and a knowledge distillation component for reducing the model size and deploying K-PLMs on resource-restricted devices (e.g., CPU) for real-world application. Importantly, instead of capturing entity knowledge like the majority of existing K-PLMs, our approach captures relational knowledge, which contributes more to improving the sentence-level text classification and text matching tasks that play a key role in question answering (QA). We conducted a set of experiments on five text classification tasks and three text matching tasks from three domains, namely E-commerce, Government, and Film&TV, and performed online A/B tests in E-commerce. Experimental results show that our approach is able to achieve substantial improvement on sentence-level question answering tasks and bring beneficial business value in industrial settings.

GEDIT: Geographic-Enhanced and Dependency-Guided Tagging for Joint POI and Accessibility Extraction at Baidu Maps

Yibo Sun
Jizhou Huang
Chunyuan Yuan
Miao Fan
Haifeng Wang
Ming Liu
Bing Qin

Providing timely accessibility reminders (such as closed and relocated) of a point-of-interest (POI) plays a vital role in improving user satisfaction of finding places and making visiting decisions. However, it is difficult to keep the POI database in sync with the real-world counterparts due to the dynamic nature of business changes and innovations. To alleviate this problem, we formulate and present a practical solution that jointly extracts POI mentions and identifies their coupled accessibility labels from unstructured text (hereafter referred to as joint POI and accessibility extraction). We approach this task as a sequence tagging problem, where the goal is to produce (POI name, accessibility label) pairs from unstructured text. This task is challenging because of two main issues: (1) POI names are often newly-coined words so as to successfully register new entities or brands and (2) there may exist multiple pairs in the text, which necessitates dealing with one-to-many or many-to-one mapping to make each POI coupled with its matching accessibility label. To this end, we propose a Geographic-Enhanced and Dependency-guIded sequence Tagging (GEDIT) model to concurrently address the two challenges. First, to alleviate challenge #1, we develop a geographic-enhanced pre-trained model to learn the text representations, which is able to significantly relieve the problem of newly-coined words. Second, to mitigate challenge #2, we apply a relational graph convolutional network to learn the tree node representations from the parsed dependency tree, which enables us to establish a correlation between a POI and its accessibility label. Finally, we construct a neural sequence tagging model by integrating and feeding the previously pre-learned representations into a CRF layer. Extensive experiments conducted on a real-world dataset demonstrate the superiority and effectiveness of GEDIT. 
In addition, it has already been deployed in production at Baidu Maps, and it successfully keeps processing hundreds of thousands of Web documents every week. Statistics show that the proposed solution can save significant human effort and labor costs to deal with the same amount of documents, which confirms that it is a practical way for POI accessibility maintenance.

AudiBERT: A Deep Transfer Learning Multimodal Classification Framework for Depression Screening

Ermal Toto
ML Tlachac
Elke A. Rundensteiner

Depression is a leading cause of disability with tremendous socioeconomic costs. In spite of early detection being crucial to improving prognosis, this mental illness remains largely undiagnosed. Depression classification from voice holds the promise to revolutionize diagnosis by ubiquitously integrating this screening capability into virtual assistants and smartphone technologies. Unfortunately, due to privacy concerns, audio datasets with depression labels have a small number of participants, causing current classification models to suffer from low performance. To tackle this challenge, we introduce Audio-Assisted BERT (AudiBERT), a novel deep learning framework that leverages the multimodal nature of human voice. To alleviate the small data problem, AudiBERT integrates pretrained audio and text representation models for the respective modalities augmented by a dual self-attention mechanism into a deep learning architecture. AudiBERT applied to depression classification consistently achieves promising performance with an increase in F1 scores between 6% and 30% compared to state-of-the-art audio and text models for 15 thematic question datasets. Using answers from medically targeted and general wellness questions, our framework achieves F1 scores of up to 0.92 and 0.86, respectively, demonstrating the feasibility of depression screening from informal dialogue using voice-enabled technologies.

WikiCheck: An End-to-end Open Source Automatic Fact-Checking API based on Wikipedia

Mykola Trokhymovych
Diego Saez-Trumper

With the growth of fake news and disinformation, the NLP community has been working to assist humans in fact-checking. However, most academic research has focused on model accuracy without paying attention to resource efficiency, which is crucial in real-life scenarios. In this work, we review the state-of-the-art datasets and solutions for automatic fact-checking and test their applicability in production environments. We discover overfitting issues in those models, and we propose a data filtering method that improves the models' performance and generalization. Then, we design an unsupervised fine-tuning of masked language models to improve their accuracy when working with Wikipedia. We also propose a novel query enhancing method to improve evidence discovery using the Wikipedia Search API. Finally, we present a new fact-checking system, the WikiCheck API, which automatically performs a fact validation process based on the Wikipedia knowledge base. It is comparable to SOTA solutions in terms of accuracy and can be used on low-memory CPU instances.

Toward an Effective Black-Box Adversarial Attack on Functional JavaScript Malware against Commercial Anti-Virus

Yun-Da Tsai
ChengKuan Chen
Shou-De Lin

Machine learning has been a rising technique in signatureless malware detection and is popular in the anti-virus industry. Despite the powerful ability of machine learning, it is known to be vulnerable to attack by injecting specially crafted input noise (adversarial example). In this paper, we develop a systematic attack method that is effective, general and also efficient which automatically generates functional malware. Experiment results showed that such adversarial malware could deceive commercial anti-virus and completely defeat learning-based malware detector provided by a well-known anti-virus vendor. We further examine the effectiveness of our approach on multiple anti-virus engines on VirusTotal and investigate the transferability of our proposed method between different features and classification algorithms. Finally, we show how our attack could resist JavaScript de-obfuscation techniques.

SAUCE: Truncated Sparse Document Signature Bit-Vectors for Fast Web-Scale Corpus Expansion

Muntasir Wahed
Daniel Gruhl
Alfredo Alba
Anna Lisa Gentile
Petar Ristoski
Chad DeLuca
Steve Welch
Ismini Lourentzou

Recent advances in text representation have shown that training on large amounts of text is crucial for natural language understanding. However, models trained without predefined notions of topical interest typically require careful fine-tuning when transferred to specialized domains. When a sufficient amount of within-domain text may not be available, expanding a seed corpus of relevant documents from large-scale web data poses several challenges. First, corpus expansion requires scoring and ranking each document in the collection, an operation that can quickly become computationally expensive as the web corpora size grows. Relying on dense vector spaces and pairwise similarity adds to the computational expense. Secondly, as the domain concept becomes more nuanced, capturing the long tail of domain-specific rare terms becomes non-trivial, especially under limited seed corpora scenarios.

In this paper, we consider the problem of fast approximate corpus expansion given a small seed corpus with a few relevant documents as a query, with the goal of capturing the long tail of a domain-specific set of concept terms. To efficiently collect large-scale domain-specific corpora with limited relevance feedback, we propose a novel truncated sparse document bit-vector representation, termed Signature Assisted Unsupervised Corpus Expansion (SAUCE). Experimental results show that SAUCE can reduce the computational burden while ensuring high within-domain lexical coverage.

Fulfillment-Time-Aware Personalized Ranking for On-Demand Food Recommendation

Haishuai Wang
Zhao Li
Xuanwu Liu
Donghui Ding
Zehong Hu
Peng Zhang
Chuan Zhou
Jiajun Bu

On-demand food delivery (OFD) platforms have greatly impacted the food service industry, where OFD recommendation systems play a central role in enhancing user experience and raising revenues. OFD recommendation, compared with existing online e-commerce recommendation systems, needs to put more emphasis on fulfillment-time-related variables, because the order fulfillment cycle time (OFCT), which refers to the time elapsed between a user placing a food order and receiving the food, significantly influences a user's choice from the recommended items. In this paper, we investigate the OFCT-related information and propose a Fulfillment-Time-Aware Personalized Ranking (FTAPR) method for recommendation. FTAPR mainly consists of three components. First, Transformers are used to estimate OFCT based on a large amount of user order sequences. Then, the predicted OFCT and other OFCT-related features are fused and encoded by a deep & cross network to learn a fulfillment-time-related feature representation. In the last step, the time bias representation from the deep & cross network is integrated into the ranking system to deliver final search results. Extensive offline and online experiments on real-world datasets collected from Ele.me, one of China's largest OFD platforms, show the superiority of our model; e.g., online A/B testing shows that FTAPR brings 1.3% and 2.5% gains in CTR and CVR compared with baselines.

Influence Maximization in Multi-Relational Social Networks

Wei Wang
Haili Yang
Yuanfu Lu
Yuanhang Zou
Xu Zhang
Shuting Guo
Leyu Lin

Influence maximization (IM) is a classic problem, which aims to find a set of k users (called seed set) in a social network such that the expected number of users influenced by the seed users is maximized. Existing IM algorithms mainly focus on one-by-one influence diffusion among users with friendships. However, in addition to 1-to-1 friendships, 1-to-N group relations usually exist in real social platforms, which are seldom fully exploited by conventional methods.

In this paper, with the real-world datasets in WeChat, the largest online social platform in China, we first study the IM problem in multi-relational social networks consisting of friendships and group relations, and propose a novel Generate&Extend framework to find influential seed users for product promotion. Specifically, to achieve a trade-off between effectiveness and efficiency, we present a truncated meta-seed generator to select a small number of users, which are influential with consideration of both friendships and group relations. More importantly, a structural seed extender is put forward to extend the meta-seed set, so as to encode the differentiated propagation structures between friendships and group relations. Extensive online/offline experiments on three real-world datasets demonstrate that Generate&Extend significantly outperforms the state of the art. Generate&Extend has been deployed at WeChat for mini-program promotion, serving more than 200 million users.
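To make the IM objective in the abstract above concrete, the following is a minimal sketch of the classic setting only, not of the Generate&Extend framework: it estimates the expected number of influenced users under an independent-cascade diffusion model by Monte Carlo simulation, then greedily selects k seed users. The diffusion model, the toy graph, the propagation probability, and all function names are assumptions made for illustration.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """Run one independent-cascade simulation and return the activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                # Each newly active node gets one chance to activate a neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, prob, runs=200, seed=0):
    """Monte Carlo estimate of the expected number of influenced users."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, prob, rng)) for _ in range(runs))
    return total / runs


def greedy_seeds(graph, k, prob, runs=200):
    """Greedily add the node that maximizes the estimated spread, k times."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    chosen = []
    for _ in range(k):
        best = max(
            (n for n in nodes if n not in chosen),
            key=lambda n: expected_spread(graph, chosen + [n], prob, runs),
        )
        chosen.append(best)
    return chosen


# Hypothetical directed friendship graph: user -> users they can influence.
TOY_GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "e": ["f"]}
```

Calling `greedy_seeds(TOY_GRAPH, 2, 0.5)` returns a two-user seed set for the toy graph. Group (1-to-N) relations, which the paper targets, are not modeled in this sketch.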

Efficient Learning to Learn a Robust CTR Model for Web-scale Online Sponsored Search Advertising

Xin Wang
Peng Yang
Shaopeng Chen
Lin Liu
Lian Zhao
Jiacheng Guo
Mingming Sun
Ping Li

Click-through rate (CTR) prediction is crucial for online sponsored search advertising. Several successful CTR models have been adopted in the industry, including the regularized logistic regression (LR). Nonetheless, the learning process suffers from two limitations: 1) Feature crosses for high-order information may generate trillions of features, which are sparse for online learning examples; 2) Rapid changing of data distribution brings challenges to the accurate learning since the model has to perform a fast adaptation on the new data. Moreover, existing adaptive optimizers are ineffective in handling the sparsity issue for high-dimensional features.

In this paper, we propose to learn an optimizer in a meta-learning scenario, where the optimizer is learned on prior data and can be easily adapted to the new data. We first build a low-dimensional feature embedding on prior data to encode the association among features. Then, the gradients on new data can be decomposed into the low-dimensional space, which smooths the parameter update and relieves the sparsity. Note that this technology can be deployed in a distributed system to ensure efficient online learning on trillions of parameters. We conduct extensive experiments to evaluate the algorithm in terms of prediction accuracy and actual revenue. Experimental results demonstrate that the proposed framework achieves promising prediction on the new data. The final online revenue is noticeably improved compared to the baseline. This framework was initially deployed in Baidu Search Ads (a.k.a. Phoenix Nest) in 2014 and is currently still being used in certain modules of Baidu's ads systems.

'Could You Describe the Reason for the Transfer?': A Reinforcement Learning Based Voice-Enabled Bot Protecting Customers from Financial Frauds

Zihao Wang
Fudong Wang
Haipeng Zhang
Minghui Yang
Shaosheng Cao
Zujie Wen
Zhe Zhang

With the booming of the Internet finance and e-payment business, telecom and online fraud has become a serious problem which grows rapidly. In China, 351 billion RMB (approximately 0.3% of China's GDP) was lost in 2018 due to telecommunication and online fraud, influencing tens of millions of individual customers. Anti-fraud algorithms have been widely adopted by major Internet finance companies to detect and block transactions induced by scam. However, due to limited contextual information, most systems would probably mistakenly block the normal transactions, leading to poor user experience. On the other hand, if the transactions induced by scam are detected yet not fully explained to the users, the users will continue to pay, suffering from direct financial losses.

To address these problems, we design a voice-enabled bot that interacts with the customers who are involved with potential telecommunication and online frauds decided by the back-end system. The bot seeks additional information from the customers through natural conversations to confirm whether the customers are scammed and identify the actual fraud types. The details about the frauds are then provided to convince the customers that they are on the edge of being scammed. Our bot adopts offline reinforcement learning (RL) to learn dialogue policies from real-world human-human chat logs. During the conversations, our bot also identifies fraud types every turn based on the dialogue state.

The proposed bot outperforms baseline dialogue strategies by 2.8% in terms of task success rate, and 5% in terms of dialogue accuracy in offline evaluations. Furthermore, in the 8 months of real-world deployment, our bot lowers the dissatisfaction rate by 25% and increases the fraud prevention rate by 135% relatively, indicating a significant improvement in user experience as well as anti-fraud effectiveness. More importantly, we help prevent millions of users from being deceived, and avoid trillions of financial losses.

Flexible O&M for Telecom Networks at Huawei: A Language Model-based Approach

Shuang Wu
Zhen Qin
Zhen Wang
Siwei Rao
Qiang Ye
Xingyue Quan
Guangjian Tian

Flexible operation and maintenance (O&M) is critical for telecommunication (telecom) service providers due to the ever-growing communication networks. Currently, most O&M operations still rely on rule-based strategies, which only cover limited scenarios and are costly to extend for novel applications as expert knowledge is intensively involved. To build a more flexible O&M system, we propose a language model to extract useful representations out of massive network signaling messages and use the representations to perform downstream O&M tasks. Given that a vanilla language model is not directly applicable for the structured signaling messages, we develop an expert-knowledge-inspired statistical approach to preprocess the messages and a hierarchical network architecture to extract message relations among different levels. Moreover, network messages in the real world are often contaminated, which can mislead the language model to learn incorrect message patterns. To mitigate data contamination, we propose a reverse training method that prevents the language model from learning the contaminated data. We collected hundreds of thousands of signaling message flows to train the proposed signaling language model and applied the trained model to O&M tasks. Offline experiments show that our proposed language model captures various signaling protocols and the extracted representations enable us to achieve expert-level performance in network anomaly detection and service recognition. Our language model has been deployed online at Huawei and significantly improved O&M efficiency.

Crawler Detection in Location-Based Services Using Attributed Action Net

Wei Xia
Fei Zhao
Haishuai Wang
Peng Zhang
Anhui Wang
Kang Li

Malicious Web crawlers threaten information systems by heavily consuming bandwidth resources and stealing private user data. Ele.me, a prevalent on-demand food delivery platform in China, suffers from the negative impact of crawlers. Crawler detection systems face two major challenges: spatial patterns of the crawler behaviors and limited labeled data for training. In this paper, we present efficient solutions to tackle these challenges. Specifically, we propose a new Attributed Action Net (AANet for short) model to detect Location-Based Services (LBS) crawlers and a three-stage learning framework to train the model. AANet consists of three different embedding modules, covering the action token sequence, temporal-spatial attributes of users, and the context information of the raw data. We have deployed the model at Ele.me, and both offline experiments and online A/B tests show that the proposed method is superior to the state-of-the-art models for sequence data classification on the food delivery platform.

Explore, Filter and Distill: Distilled Reinforcement Learning in Recommendation

Ruobing Xie
Shaoliang Zhang
Rui Wang
Feng Xia
Leyu Lin

Reinforcement learning (RL) has been verified in real-world list-wise recommendation. However, RL-based recommendation suffers from huge memory and computation costs due to its large-scale models. Knowledge distillation (KD) is an effective approach for model compression widely used in practice. However, RL-based models strongly rely on sufficient exploration of the enormous user-item space due to the data sparsity issue, which multiplies the challenges of KD with RL models. What the teacher should teach and how much the student should learn from each lesson need to be carefully designed. In this work, we propose a novel Distilled reinforcement learning framework for recommendation (DRL-Rec), which aims to improve both effectiveness and efficiency in list-wise recommendation. Specifically, we propose an Exploring and filtering module before the distillation, which decides what lessons the teacher should teach from both the teacher's and the student's aspects. We also conduct a Confidence-guided distillation at both output and intermediate levels with a list-wise KL divergence loss and a Hint loss, which aims to determine how much the student should learn from each lesson. We achieve significant improvements on both offline and online evaluations in a well-known recommendation system. DRL-Rec has been deployed on WeChat Top Stories for more than six months, affecting millions of users. The source code is released at https://github.com/modriczhang/DRL-Rec.
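As a rough illustration of the list-wise KL divergence loss mentioned in the abstract above, the sketch below turns teacher and student scores for one recommended list into softmax distributions and measures the KL divergence between them. The abstract does not give the exact loss, so the temperature parameter, the scalar confidence weight, and all function names are assumptions, not the DRL-Rec implementation.

```python
import math


def softmax(scores, temperature=1.0):
    """Convert raw list scores into a probability distribution over the list."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_kl(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over the items of one recommended list."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    # Terms with zero teacher probability contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def confidence_weighted_loss(teacher_scores, student_scores, confidence):
    """Hypothetical confidence guidance: scale the loss by a weight in [0, 1]."""
    return confidence * listwise_kl(teacher_scores, student_scores)
```

A student whose scores induce the same ranking distribution as the teacher incurs zero loss, and a lower confidence weight shrinks how much the student learns from that lesson.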

CausCF: Causal Collaborative Filtering for Recommendation Effect Estimation

Xu Xie
Zhaoyang Liu
Shiwen Wu
Fei Sun
Cihang Liu
Jiawei Chen
Jinyang Gao
Bin Cui
Bolin Ding

To improve user experience and profits of corporations, modern industrial recommender systems usually aim to select the items that are most likely to be interacted with (e.g., clicks and purchases). However, they overlook the fact that users may purchase the items even without recommendations. The truly effective items are the ones that can contribute to purchase probability uplift. To select these effective items, it is essential to estimate the causal effect of recommendations. Nevertheless, it is difficult to obtain the real causal effect since we can only recommend or not recommend an item to a user at one time. Furthermore, previous works usually rely on randomized controlled trial (RCT) experiments to evaluate their performance. However, this is usually not practicable in the recommendation scenario due to its expensive experimental cost. To tackle these problems, in this paper, we propose a causal collaborative filtering (CausCF) method inspired by the widely adopted collaborative filtering (CF) technique. It is based on the idea that similar users not only have similar tastes in items but also have similar treatment effects under recommendations. CausCF extends the classical matrix factorization to tensor factorization with three dimensions: user, item, and treatment. Furthermore, we also employ regression discontinuity design (RDD) to evaluate the precision of the estimated causal effects from different models. With testable assumptions, RDD analysis can provide an unbiased causal conclusion without RCT experiments. Through dedicated offline and online experiments, we demonstrate the effectiveness of our proposed CausCF on causal effect estimation and ranking performance improvement.
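The idea of adding a treatment dimension to matrix factorization, as described in the abstract above, can be sketched as follows. The abstract does not state the factorization form, so this example assumes a simple CP-style three-way product; the latent vectors, the treatment encoding (0 = not recommended, 1 = recommended), and the function names are hypothetical.

```python
def predict(user_vec, item_vec, treatment_vec):
    """CP-style three-way score: sum over latent factors of u_f * v_f * t_f."""
    return sum(u * v * t for u, v, t in zip(user_vec, item_vec, treatment_vec))


def uplift(user_vec, item_vec, treatment_factors):
    """Estimated causal effect: score if recommended minus score if not."""
    treated = predict(user_vec, item_vec, treatment_factors[1])
    control = predict(user_vec, item_vec, treatment_factors[0])
    return treated - control


def rank_by_uplift(user_vec, item_factors, treatment_factors):
    """Order item ids by estimated uplift, largest first."""
    return sorted(
        item_factors,
        key=lambda item: uplift(user_vec, item_factors[item], treatment_factors),
        reverse=True,
    )


# Hypothetical learned factors with two latent dimensions.
USER = [1.0, 0.5]
ITEMS = {"item_a": [0.2, 0.4], "item_b": [0.6, 0.1]}
TREATMENT = {0: [1.0, 1.0], 1: [2.0, 1.0]}
```

Ranking by `uplift` rather than by the treated score alone favors items whose purchase probability rises because of the recommendation, which is the selection criterion the abstract argues for.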

Jura: Towards Automatic Compliance Assessment for Annual Reports of Listed Companies

Zhengqi Xu
Yixuan Cao
Rongyu Cao
Guoxiang Li
Xuanqiang Liu
Yan Pang
Yangbin Wang
Jianfei Zhang
Allie Cheung
Matthew Tam
Lukas Petrikas
Ping Luo

The initial public offering (IPO) market in Hong Kong is consistently one of the largest in the world. As part of its regulatory responsibilities, Hong Kong Exchanges and Clearing Limited (HKEX) reviews annual reports published by listed companies (issuers). The number of issuers has grown at a fast pace, reaching 2,538 as of the end of 2020. This poses a challenge for manually reviewing these annual reports against the many diverse regulatory obligations (listing rules). We propose a system named Jura to improve the efficiency of annual report reviewing with the help of machine learning methods. This system checks the compliance of an issuer's published information against listing rules in four steps: panoptic document recognition, relevant passage location, fine-grained information extraction, and compliance assessment. This paper introduces in detail the passage location step, how it is critical for speeding up compliance assessment, and the various challenges faced. We argue that although a passage is a relatively independent unit, it needs to be combined with document structure and contextual information to accurately locate the relevant passages. With the help of Jura, HKEX reports saving 80% of the time on reviewing issuers' annual reports.

Contextual Skill Proficiency via Multi-task Learning at LinkedIn

Xiao Yan
Ada Ma
Jaewon Yang
Lin Zhu
How Jing
Jacob Bollinger
Qi He

The ability to infer an individual's expertise for a given skill has proven to be crucial in creating economic opportunity for every talent of the global workforce. Applications ranging from recommending relevant job opportunities to talents to providing better candidate suggestions to recruiters, all benefit from deep understanding of the skill "proficiency" of the talent pool.

LinkedIn's "Skill" profile section can be leveraged in this expert finding task. Whereas it is easy to incentivize members to put skills on their profile, estimating members' expertise is much more challenging for several reasons. First, the collection of ground-truth data at scale can be expensive and challenging. Second, "being proficient at a certain skill" can have very different meaning in different contexts -- a professor in machine learning having deep theoretical knowledge might lack the practical skill for implementing a large-scale recommendation system unlike experienced ML practitioners.

We present our proposed framework to infer a member's expertise in a certain skill based upon a multi-view, multi-task learning scheme that incorporates signals from multiple contexts. We show the efficacy of the proposed framework with offline evaluation results as well as online A/B testing in multiple products, from finding experts among friends, to recommending jobs to qualified members. We also show that our estimated proficiency can help alleviate the cold-start problem when applied to a new context (i.e., through transfer learning) where only a small amount of labeled data is needed to achieve reasonable performance. Finally, we share the insights that demonstrate the talent market is shocked disproportionately among members with different skill proficiency levels by COVID-19.

CADRE: A Cloud-Based Data Service for Big Bibliographic Data

Xiaoran Yan
Guangchen Ruan
Dimitar Nikolov
Matthew Hutchinson
Chathuri Peli Kankanamalage
Ben Serrette
James McCombs
Alan Walsh
Esen Tuna
Valentin Pentchev

Large bibliographic data sets hold the promise of revolutionizing the scientific enterprise when combined with state-of-the-science computational capabilities. Providing high-quality data services for large network datasets such as the Microsoft Academic Graph, which contains more than two billion citation links, poses significant difficulties for universities. Data systems based on the property graph model are capable of delivering efficient graph query services for large networks. However, real-life queries often combine multiple types of data models. To satisfy the needs of different user groups, we developed and deployed a cloud-based data system consisting of scalable graph and text-indexed query engines. For non-expert users, the property graph model also presents a technological barrier. To alleviate the steep learning curve, we designed an intuitive graphical user interface for query-building. For advanced users, a scalable notebook service in our platform provides a more flexible computing environment where the query results can be further analyzed. These systems form the data backbone of the Collaborative Archive and Data Research Environment (CADRE), which provides efficient and high-quality bibliographic data services to eleven large public universities in North America.

Seasonal Relevance in E-Commerce Search

Haode Yang
Parth Gupta
Roberto Fernández Galán
Dan Bu
Dongmei Jia

Seasonality is an important dimension for relevance in e-commerce search. For example, the query "jacket" has a different set of relevant documents in winter than in summer. For an optimal user experience, e-commerce search engines should incorporate seasonality in product search. In this paper, we formally introduce the concept of seasonal relevance, define it, and quantify it using data from a major e-commerce store. In our analyses, we find that 39% of queries are highly seasonally relevant to the time of search and would benefit from handling seasonality in ranking. We propose LogSR and VelSR features to capture product seasonality using state-of-the-art neural models based on self-attention. Comprehensive offline and online experiments over large datasets show the efficacy of our methods to model seasonal relevance. The online A/B test on 784 MM queries shows the treatment with seasonal relevance features results in 2.20% higher purchases and better customer experience overall.

On the Diversity and Explainability of Recommender Systems: A Practical Framework for Enterprise App Recommendation

Wenzhuo Yang
Jia Li
Chenxi Li
Latrice Barnett
Markus Anderle
Simo Arajarvi
Harshavardhan Utharavalli
Caiming Xiong
Steven HOI

This paper introduces an enterprise app recommendation problem with a new "to-business" use case, which aims to assist a sales team acting as the bridge connecting the applications and developers with the customers who apply these apps to solve their business problems. Our recommender system is an assistant to the sales team, helping recommend relevant apps to the customers for their businesses and increasing the likelihood of improving sales revenue. Besides recommendation accuracy, recommendation diversity and explainability are even more crucial since they provide more exposure opportunities for app developers and improve the transparency and trustworthiness of the recommender system. To allow the sales team to explore unpopular but relevant apps and understand why such apps are recommended, we propose a novel framework for improving aggregate recommendation diversity and generating recommendation explanations, which supports a wide variety of models for improving recommendation accuracy. The model in our framework is simple yet effective, which can be trained in an end-to-end manner and deployed as a recommendation service easily. Furthermore, our framework can also apply to other generic recommender systems for improving diversity and generating explanations. Experiments on public and private datasets demonstrate the effectiveness of our framework and solution.

Cascaded Deep Neural Ranking Models in LinkedIn People Search

Zimeng Yang
Song Yan
Abhimanyu Lad
Xiaowei Liu
Weiwei Guo

LinkedIn connects the world's professionals to make them more productive and successful. People Search plays an important role in fulfilling this goal by helping members find the most relevant and personalized results through a broad range of queries like names, job titles, skills, companies, locations, etc. It is one of the biggest search verticals at LinkedIn both in terms of engineering footprint and search traffic. In this paper, we present an overview of the People Search system, and discuss how we build and serve deep neural network (DNN) models, leveraging state-of-the-art deep natural language processing (NLP) techniques (e.g., convolutional neural networks (CNN) and Bidirectional Encoder Representations from Transformers (BERT)). We describe our journey of applying deep neural ranking models to a real-life product, including the modeling and system bottleneck challenges, crucial design choices, and lessons learned along the way. We hope a story of our endeavors and successes will provide meaningful insights to other similar systems.

Self-supervised Learning for Large-scale Item Recommendations

Tiansheng Yao
Xinyang Yi
Derek Zhiyuan Cheng
Felix Yu
Ting Chen
Aditya Menon
Lichan Hong
Ed H. Chi
Steve Tjoa
Jieqi (Jay) Kang
Evan Ettinger

Large scale recommender models find most relevant items from huge catalogs, and they play a critical role in modern search and recommendation systems. To model the input space with large-vocab categorical features, a typical recommender model learns a joint embedding space through neural networks for both queries and items from user feedback data. However, with millions to billions of items in the corpus, users tend to provide feedback for a very small set of them, causing a power-law distribution. This makes the feedback data for long-tail items extremely sparse.

Inspired by the recent success in self-supervised representation learning research in both computer vision and natural language understanding, we propose a multi-task self-supervised learning (SSL) framework for large-scale item recommendations. The framework is designed to tackle the label sparsity problem by learning better latent relationship of item features. Specifically, SSL improves item representation learning as well as serving as additional regularization to improve generalization. Furthermore, we propose a novel data augmentation method that utilizes feature correlations within the proposed framework.

We evaluate our framework using two real-world datasets with 500M and 1B training examples respectively. Our results demonstrate the effectiveness of SSL regularization and show its superior performance over the state-of-the-art regularization techniques. We have also launched the proposed techniques in a web-scale commercial app-to-app recommendation system, with significant improvements in top-tier business metrics demonstrated in A/B experiments on live traffic. Our online results also verify our hypothesis that our framework indeed improves model performance even more on slices that lack supervision.

Online Multi-horizon Transaction Metric Estimation with Multi-modal Learning in Payment Networks

Chin-Chia Michael Yeh
Zhongfang Zhuang
Junpeng Wang
Yan Zheng
Javid Ebrahimi
Ryan Mercer
Liang Wang
Wei Zhang

Predicting metrics associated with entities' transactional behavior within payment processing networks is essential for system monitoring. Multivariate time series, aggregated from the past transaction history, can provide valuable insights for such prediction. The general multivariate time series prediction problem has been well studied and applied across several domains, including manufacturing, medicine, and entomology. However, new domain-related challenges associated with the data such as concept drift and multi-modality have surfaced in addition to the real-time requirements of handling the payment transaction data at scale. In this work, we study the problem of multivariate time series prediction for estimating transaction metrics associated with entities in the payment transaction database. We propose a model with five unique components to estimate the transaction metrics from multi-modality data. Four of these components capture interaction, temporal, scale, and shape perspectives, and the fifth component fuses these perspectives together. We also propose a hybrid offline/online training scheme to address concept drift in the data and fulfill the real-time requirements. Combining the estimation model with a graphical user interface, the prototype transaction metric estimation system has demonstrated its potential benefit as a tool for improving a payment processing company's system monitoring capability.

Multi-modal Dictionary BERT for Cross-modal Video Search in Baidu Advertising

Tan Yu
Yi Yang
Yi Li
Lin Liu
Mingming Sun
Ping Li

Due to their attractiveness, video advertisements are popular with advertisers. Baidu, as one of the leading search advertisement platforms in China, is putting more and more effort into video advertisements for its advertisement customers. Search-based video advertisement display is, in essence, a cross-modal retrieval problem, which is normally tackled through joint embedding methods. Nevertheless, due to the lack of interactions between text features and image features, joint embedding methods cannot achieve as high accuracy as their attention-based counterparts. Inspired by the great success achieved by BERT in NLP tasks, many cross-modal BERT models have emerged and achieved excellent performance in cross-modal retrieval. Last year, Baidu also launched a cross-modal BERT, CAN, in its video advertisement platform, and achieved considerably better performance than the previous joint-embedding model. In this paper, we present our recent work for video advertisement retrieval, the Multi-modal Dictionary BERT (MDBERT) model. Compared with CAN and other cross-modal BERT models, MDBERT integrates a joint dictionary, which is shared among video features and word features. It maps the relevant word features and video features into the same codeword and thus fosters effective cross-modal attention. To support end-to-end training, we propose to soften the codeword assignment. Meanwhile, to enhance the inference efficiency, we adopt product quantization to achieve fine-level feature space partition at a low cost. After MDBERT was launched in the Baidu video advertising platform, the conversion ratio (CVR) increased by 3.34%, bringing a considerable revenue boost for advertisers in Baidu.

CHASE: Commonsense-Enriched Advertising on Search Engine with Explicit Knowledge

Chao Zhang
Jingbo Zhou
Xiaoling Zang
Qing Xu
Liang Yin
Xiang He
Lin Liu
Haoyi Xiong
Dejing Dou

While online advertising is one of the major sources of income for search engines, increasing the income from business advertisements while preserving the user experience has become a challenging, emerging area. Designing high-quality advertisements with persuasive content has proven to be a way to increase revenue by improving the Click-Through Rate (CTR). However, it is difficult to scale up the design of high-quality ads, due to the lack of automation in creativity. In this paper, we present Commonsense-Enriched Advertisement on Search Engine (CHASE), a system for the automatic generation of persuasive ads. CHASE adopts a specially designed language model that fuses the keywords, commonsense-related texts, and marketing contents to generate persuasive advertisements. Specifically, the language model has been pre-trained using massive contents of explicit knowledge and fine-tuned with well-constructed quasi-parallel corpora, with effective control of the proportion of commonsense in the generated ads and of their fitness to the ads' keywords. The effectiveness of the proposed method CHASE has been verified on real-world web search traffic and by manual evaluation. In A/B tests, the advertisements generated by CHASE bring an 11.13% CTR improvement. The proposed model has been deployed to cover three advertisement domains (kid education, psychological counseling, and beauty e-commerce) at Baidu, the world's largest Chinese search engine, adding revenue of about 1 million RMB (Chinese Yuan) per day.

QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction

Danqing Zhang
Zheng Li
Tianyu Cao
Chen Luo
Tony Wu
Hanqing Lu
Yiwei Song
Bing Yin
Tuo Zhao
Qiang Yang

We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms. Such a problem consists of two phases: named entity recognition (NER) and attribute value normalization (AVN). However, existing works focus only on the NER phase and neglect the equally important AVN phase. To bridge this gap, this paper proposes a unified query attribute value extraction system in e-commerce search named QUEACO, which covers both phases. Moreover, by leveraging large-scale weakly-labeled behavior data, we further improve the extraction performance with less supervision cost. Specifically, for the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network that is trained on the strongly-labeled data generates pseudo-labels to refine the weakly-labeled data for training a student network. Meanwhile, the teacher network can be dynamically adapted by the feedback of the student's performance on strongly-labeled data to maximally denoise the noisy supervision from the weak labels. For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface form attribute values from queries into canonical forms from products. Extensive experiments on a real-world large-scale e-commerce dataset demonstrate the effectiveness of QUEACO.

CloudRCA: A Root Cause Analysis Framework for Cloud Computing Platforms

Yingying Zhang
Zhengxiong Guan
Huajie Qian
Leili Xu
Hengbo Liu
Qingsong Wen
Liang Sun
Junwei Jiang
Lunting Fan
Min Ke

As Alibaba's business expands across the world and across various industries, higher standards are imposed on the service quality and reliability of the big data cloud computing platforms which constitute the infrastructure of Alibaba Cloud. However, root cause analysis in these platforms is non-trivial due to the complicated system architecture. In this paper, we propose a root cause analysis framework called CloudRCA which makes use of heterogeneous multi-source data including Key Performance Indicators (KPIs), logs, as well as topology, and extracts important features via state-of-the-art anomaly detection and log analysis techniques. The engineered features are then utilized in a Knowledge-informed Hierarchical Bayesian Network (KHBN) model to infer root causes with high accuracy and efficiency. An ablation study and comprehensive experimental comparisons demonstrate that, compared to existing frameworks, CloudRCA 1) consistently outperforms existing approaches in F1-score across different cloud systems; 2) can handle novel types of root causes thanks to the hierarchical structure of KHBN; 3) performs more robustly with respect to algorithmic configurations; and 4) scales more favorably in the data and feature sizes. Experiments also show that a cross-platform transfer learning mechanism can be adopted to further improve the accuracy by more than 10%. CloudRCA has been integrated into the diagnosis system of Alibaba Cloud and employed in three typical cloud computing platforms including MaxCompute, Realtime Compute and Hologres. Over the past twelve months, it has saved Site Reliability Engineers (SREs) more than 20% of the time spent on resolving failures and has improved service reliability significantly.

HierST: A Unified Hierarchical Spatial-temporal Framework for COVID-19 Trend Forecasting

Shun Zheng
Zhifeng Gao
Wei Cao
Jiang Bian
Tie-Yan Liu

The outbreak of the COVID-19 pandemic has largely influenced the world and our normal daily lives. To combat this pandemic efficiently, governments usually need to coordinate essential resources across multiple regions and adjust intervention policies at the right time, both of which call for accurate and robust forecasting of future epidemic trends. However, designing such a forecasting system is non-trivial, since we need to handle all kinds of locations at different administrative levels, which exhibit markedly different epidemic-evolving patterns. Moreover, there are dynamic and volatile correlations of pandemic conditions among these locations, which further increase the difficulty of forecasting. With these challenges in mind, we develop a novel spatial-temporal forecasting framework. First, to accommodate all kinds of locations at different administrative levels, we propose a unified hierarchical view, which mimics the aggregation procedure of pandemic statistics. Then, this view motivates us to facilitate joint learning across administrative levels and inspires us to design the cross-level consistency loss as an extra regularization to stabilize model training. Besides, to capture those dynamic and volatile spatial correlations, we design a customized spatial module with adaptive edge gates, which can both reinforce effective messages and disable irrelevant ones. We put this framework into production to help the battle against COVID-19 in the United States. A comprehensive online evaluation across three months demonstrates that our projections are the most competitive among all results produced by dozens of international groups and even surpass the official ensemble in many cases. We also visualize our unique edge gates to understand the evolution of spatial correlations and present intuitive case studies. Besides, we open source our implementation at https://github.com/dolphin-zs/HierST to facilitate future research towards better epidemic modeling.
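The cross-level consistency idea mentioned in the abstract above can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so the squared-error formulation, the function name `cross_level_consistency_loss`, and the toy state/county numbers below are all assumptions made purely for illustration: the forecast for a parent region is compared with the sum of the forecasts for its child regions.

```python
def cross_level_consistency_loss(parent_pred, child_preds):
    """Mean squared gap between a parent-level forecast and the sum of its
    child-level forecasts (an assumed form, not taken from the paper)."""
    horizon = len(parent_pred)
    total = 0.0
    for t in range(horizon):
        # Aggregate the child forecasts the way case counts would be aggregated.
        aggregated = sum(child[t] for child in child_preds)
        total += (parent_pred[t] - aggregated) ** 2
    return total / horizon


# Hypothetical three-day forecasts for one state and its two counties.
state_forecast = [100.0, 120.0, 150.0]
county_forecasts = [[60.0, 70.0, 90.0], [40.0, 50.0, 50.0]]

loss = cross_level_consistency_loss(state_forecast, county_forecasts)
print(loss)
```

In this toy case the first two days are consistent across levels and only the third day contributes to the penalty, which is the kind of disagreement such a regularizer would discourage during training.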

Learning to Pack: A Data-Driven Tree Search Algorithm for Large-Scale 3D Bin Packing Problem

Qianwen Zhu
Xihan Li
Zihan Zhang
Zhixing Luo
Xialiang Tong
Mingxuan Yuan
Jia Zeng

The 3-dimensional bin packing problem (3D-BPP) is not only fundamental in combinatorial optimization but also widely applied in real-world logistics. In the modern logistics industry, the complexity of constraints, heterogeneity of cargoes and scale of orders have increased dramatically, making it very challenging to devise packing plans that are up to standard. While the tree search algorithm has proved to be a successful paradigm to solve the 3D-BPP, it is too time-consuming to be applied in the aforementioned large-scale scenarios. To overcome this limitation, we propose a data-driven tree search algorithm (DDTS) to tackle the 3D-BPP. The solution space with complicated constraints is explored by a tree search algorithm, and a convolutional neural network trained with historical data guides pruning the tree so as to accelerate the search process. Computational experiments on real-world datasets show that our algorithm outperforms the state-of-the-art approach with a loading rate improvement of 2.47%. Moreover, the deep learning technique increases searching efficiency by 37.14% with only 0.04% performance loss. The algorithm has been deployed in Huawei Logistics System, where it increases the loading rate by 3% and could reduce the logistics cost by millions of dollars per year. To the best of our knowledge, we are the first to embed pruning networks into tree search for the large-scale 3D-BPP.

SESSION: Resource Paper Track

Matches Made in Heaven: Toolkit and Large-Scale Datasets for Supervised Query Reformulation

Negar Arabzadeh
Amin Bigdeli
Shirin Seyedsalehi
Morteza Zihayat
Ebrahim Bagheri

Researchers have already shown that it is possible to improve retrieval effectiveness through the systematic reformulation of users' queries. Traditionally, most query reformulation techniques relied on unsupervised approaches such as query expansion through pseudo-relevance feedback. More recently and with the increasing effectiveness of neural sequence-to-sequence architectures, the problem of query reformulation has been studied as a supervised query translation problem, which learns to rewrite a query into a more effective alternative. While quite effective in practice, such supervised query reformulation methods require a large number of training instances. In this paper, we present three large-scale query reformulation datasets, namely Diamond, Platinum and Gold datasets, based on the queries in the MS MARCO dataset. The Diamond dataset consists of over 188,000 query pairs where the original source query is matched with an alternative query that has a perfect retrieval effectiveness (an average precision of 1). To the best of our knowledge, this is the first set of datasets for supervised query reformulation that offers perfect query reformulations for a large number of queries. The implementation of our fully automated tool, which is based on a transformer architecture, and our three datasets are made publicly available. We also establish a neural query reformulation baseline performance on our datasets by reporting the performance of strong neural query reformulation baselines. It is our belief that our datasets will significantly impact the development of supervised query reformulation methods in the future.
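To make the "average precision of 1" criterion in the abstract above concrete, the following is a minimal sketch of the standard average precision computation for a single query. The function name and the toy document identifiers are illustrative assumptions and are not taken from the released toolkit.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Standard (non-interpolated) average precision for one ranked list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# A reformulation is "perfect" when every relevant document is ranked
# ahead of every non-relevant one.
perfect = average_precision(["d1", "d2", "d3"], ["d1", "d2"])
imperfect = average_precision(["d3", "d1", "d2"], ["d1", "d2"])
print(perfect, imperfect)
```

An average precision of 1 therefore means the reformulated query places all judged-relevant documents at the very top of the ranking.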

MS MARCO Chameleons: Challenging the MS MARCO Leaderboard with Extremely Obstinate Queries

Negar Arabzadeh
Bhaskar Mitra
Ebrahim Bagheri

In recent years, with the growing influence of neural architectures, tasks such as ad hoc retrieval have witnessed an impressive improvement in performance. For instance, the performance of rankers on the passage retrieval task on the MS MARCO dataset has improved by an order of magnitude in less than two years. In this paper, we go beyond the overall performance of the state-of-the-art rankers and empirically study their performance from a finer-grained perspective. We find that while neural rankers have been able to consistently improve performance, this has been in part thanks to a specific set of queries from within the larger query set. We systematically show that there are subsets of queries that are difficult for each and every one of the neural rankers, which we refer to as obstinate queries. We show that the obstinate queries are similar to easier queries in terms of their number of available judged relevant documents and the length of the query itself, but they are far more difficult for existing rankers to satisfy. Furthermore, we observe that query reformulation methods cannot help these queries. On this basis, we present three datasets derived from the MS MARCO Dev set, called the MS MARCO Chameleon datasets. We believe that the next breakthrough in performance will necessarily need to consider the queries in the MS MARCO Chameleons and, as such, we propose that a well-rounded evaluation strategy for any new ranker include performance measures on both the overall MS MARCO dataset and the proposed MS MARCO Chameleon datasets.

VidLife: A Dataset for Life Event Extraction from Videos

Tai-Te Chu
An-Zi Yen
Wei-Hong Ang
Hen-Hsen Huang
Hsin-Hsi Chen

Filming video blogs, or vlogs for short, has become a popular way for people to record their life experiences in recent years. In this work, we present a novel task that is aimed at extracting life events from videos and constructing personal knowledge bases of individuals. In contrast to most existing research in the field of computer vision, which focuses on identifying low-level script-like activities such as moving boxes, our goal is to extract life events where high-level activities like moving into a new house are recorded. The challenges to be tackled include: (1) identifying which objects in a given scene are related to the life events of the protagonist of interest, and (2) determining the association between an extracted visual concept and a higher-level description of a video clip. To address these research issues, we construct a video life event extraction dataset, VidLife, by exploiting videos from the TV series The Big Bang Theory, in which the plot revolves around the daily lives of several characters. A pilot multitask learning model is proposed to extract life events from video clips and subtitles for storage in the personal knowledge base.

GAKG: A Multimodal Geoscience Academic Knowledge Graph

Cheng Deng
Yuting Jia
Hui Xu
Chong Zhang
Jingyao Tang
Luoyi Fu
Weinan Zhang
Haisong Zhang
Xinbing Wang
Chenghu Zhou

Geoscience research plays a strong role in helping people gain a better understanding of the Earth. In the face of the enormous number of geoscience research papers, knowledge graphs (KGs) can be a powerful means to represent knowledge, manage the relationships among data, and integrate the knowledge extracted from them. However, the existing geoscience KGs mainly focus on the external connection between concepts, whereas the potentially abundant information contained in the internal multimodal data of the papers is largely overlooked for more fine-grained knowledge mining. To this end, we propose GAKG, a large-scale multimodal academic KG based on 1.12 million papers published in various geoscience-related journals. In addition to the bibliometric elements, we also extract the internal illustrations, tables, and text of the articles, and mine the knowledge entities of the papers and the era and spatial attributes of the articles, coupling multimodal academic data and features. Specifically, GAKG realizes knowledge entity extraction under our proposed Human-In-the-Loop framework, the novelty of which is to combine the techniques of machine reading and information retrieval with manual annotation by geoscientists in the loop. Considering the fact that geoscience literature often contains more abundant illustrations and time scale information than that of other disciplines, we extract all the geographical information and eras from the geoscience papers' text and illustrations, mapping papers to the atlas and chronology. Based on GAKG, we build several knowledge discovery benchmarks for finding geoscience communities and predicting potential links. GAKG and its services have been made publicly available and user-friendly.

CoST: An annotated Data Collection for Complex Search

Cheyenne Dosso
Jose G. Moreno
Aline Chevalier
Lynda Tamine

While great progress has been made in the area of information access, there are still open issues that involve designing intelligent systems supporting task-based search. Despite the importance of task-based search, the information retrieval and information science communities still lack open-ended and annotated datasets that enable the evaluation of a number of related facets of search tasks in downstream applications. Existing datasets are either sampled from large-scale logs but provide poor annotations, or sampled from smaller-scale user studies but focus on ranked list evaluation. In this work, we present CoST: a novel richly annotated dataset for evaluating complex search tasks, collaboratively designed by researchers from the computer science and cognitive psychology domains, and intended to answer a wide range of research questions dealing with task-based search. CoST includes 5667 queries recorded in 630 task-based sessions that result from a user study involving 70 native French-speaking participants, each an expert in one of 3 different domains of expertise (computer science, medicine, psychology). Each participant completed 15 tasks with 5 different types of cognitive complexity (fact-finding, exploratory learning, decision-making, problem-solving, multicriteria-inferential). In addition to search data (e.g., queries and clicks), CoST provides task and session-related data, task annotations and query annotations. We illustrate possible usages of CoST through the evaluation of query classification models and the understanding of the effect of task complexity and domain on users' search behavior.

DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs

Carsten Felix Draschner
Claus Stadler
Farshad Bakhshandegan Moghaddam
Jens Lehmann
Hajira Jabeen

This paper presents DistRDF2ML, a generic, scalable, and distributed framework for creating in-memory data preprocessing pipelines for Spark-based machine learning on RDF knowledge graphs. This framework introduces software modules that transform large-scale RDF data into ML-ready fixed-length numeric feature vectors. The developed modules are optimized for the multi-modal nature of knowledge graphs. DistRDF2ML follows the same software design and usage principles as common data science stacks, offering an easy-to-use package for creating machine learning pipelines. The modules used in the pipeline, the hyper-parameters and the results are exported as a semantic structure that can be used to enrich the original knowledge graph. The semantic representation of metadata and machine learning results offers the advantage of increasing the machine learning pipelines' reusability, explainability, and reproducibility. The entire framework of DistRDF2ML is open source, integrated into the holistic SANSA stack, documented in scala-docs, and covered by unit tests. DistRDF2ML demonstrates its scalable design across different processing power configurations and (hyper-)parameter setups within various experiments. The framework brings the three worlds of knowledge graph engineers, distributed computation developers, and data scientists closer together and enables all of them to create explainable ML pipelines using a few lines of code.

WorldKG: A World-Scale Geographic Knowledge Graph

Alishiba Dsouza
Nicolas Tempelmeier
Ran Yu
Simon Gottschalk
Elena Demidova

OpenStreetMap is a rich source of openly available geographic information. However, the representation of geographic entities, e.g., buildings, mountains, and cities, within OpenStreetMap is highly heterogeneous, diverse, and incomplete. As a result, this rich data source is hardly usable for real-world applications. This paper presents WorldKG - a new geographic knowledge graph aiming to provide a comprehensive semantic representation of geographic entities in OpenStreetMap. We describe the WorldKG knowledge graph, including its ontology that builds the semantic dataset backbone, the extraction procedure of the ontology and geographic entities from OpenStreetMap, and the methods to enhance entity annotation. We perform statistical and qualitative dataset assessment, demonstrating the large scale and high precision of the semantic geographic information in WorldKG.

TwiBot-20: A Comprehensive Twitter Bot Detection Benchmark

Shangbin Feng
Herun Wan
Ningnan Wang
Jundong Li
Minnan Luo

Twitter has become a vital social media platform, yet a large number of malicious Twitter bots exist and induce undesirable social effects. Successful Twitter bot detection proposals are generally supervised and rely heavily on large-scale datasets. However, existing benchmarks generally suffer from low levels of user diversity, limited user information and data scarcity. Therefore, these datasets are not sufficient to train and stably benchmark bot detection measures. To alleviate these problems, we present TwiBot-20, a massive Twitter bot detection benchmark, which contains 229,573 users, 33,488,192 tweets, 8,723,736 user property items and 455,958 follow relationships. TwiBot-20 covers diversified bots and genuine users to better represent the real-world Twittersphere. TwiBot-20 also includes three modalities of user information to support both binary classification of single users and community-aware approaches. To the best of our knowledge, TwiBot-20 is the largest Twitter bot detection benchmark to date. We reproduce competitive bot detection methods and conduct a thorough evaluation on TwiBot-20 and two other public datasets. Experiment results demonstrate that existing bot detection measures fail to match their previously claimed performance on TwiBot-20, which suggests that Twitter bot detection remains a challenging task and requires further research efforts.

Evaluating Graph Vulnerability and Robustness using TIGER

Scott Freitas
Diyi Yang
Srijan Kumar
Hanghang Tong
Duen Horng Chau

Network robustness plays a crucial role in our understanding of complex interconnected systems such as transportation, communication, and computer networks. While significant research has been conducted in the area of network robustness, no comprehensive open-source toolbox currently exists to assist researchers and practitioners in this important topic. This lack of available tools hinders reproducibility and examination of existing work, development of new research, and dissemination of new ideas. We contribute TIGER, an open-sourced Python toolbox to address these challenges. TIGER contains 22 graph robustness measures with both original and fast approximate versions; 17 failure and attack strategies; 15 heuristic and optimization-based defense techniques; and 4 simulation tools. By democratizing the tools required to study network robustness, our goal is to assist researchers and practitioners in analyzing their own networks; and facilitate the development of new research in the field. TIGER has been integrated into the Nvidia Data Science Teaching Kit available to educators across the world; and Georgia Tech's Data and Visual Analytics class with over 1,000 students. TIGER is open sourced at: https://github.com/safreita1/TIGER

LC: A Flexible, Extensible Open-Source Toolkit for Model Compression

Yerlan Idelbayev
Miguel Á. Carreira-Perpiñán

The continued increase in memory, runtime and energy consumption of deployed machine learning models on one side, and the trend to miniaturize intelligent devices and sensors on the other side, imply that model compression will remain a critical need for the foreseeable future. A scalable solution to this problem must be able to handle arbitrary choices of the reference model to be compressed (driven by the machine learning task), of the form of compression to use, and of the costs and constraints to obey (driven by the target device). We describe an open-source toolkit that is primarily designed to be flexible and extensible, but which is also efficient in compression time and achieves state-of-the-art accuracy-compression curves, as demonstrated empirically over a number of deep net architectures. Mathematically, this is achieved by formulating compression as a constrained optimization using auxiliary variables that facilitate separability, and solving it via a penalty method and alternating optimization, which results in a "learning-compression" (LC) algorithm. This alternates a "learning" step over the original model, independent of the compression, and a "compression" step over the compressed parameters, independent of the dataset and task. Each step can typically be solved by reusing well-known algorithms, such as SGD or EM in the learning step, or SVD or k-means in the compression step, and this makes the algorithm flexible and extensible. The toolkit is available at https://github.com/UCMerced-ML/LC-model-compression.
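The alternation between a "learning" step and a "compression" step described above can be sketched on a toy problem. This is only an illustration under stated assumptions and does not use the LC toolkit's API: the loss is a made-up quadratic around a hypothetical `target` weight vector, the compression is scalar quantization via a hand-written 1-D k-means, the penalty schedule is arbitrary, and all function names are invented for the example.

```python
def kmeans_1d(values, k, iters=20):
    """Compression step: quantize each value to one of k centroids."""
    ordered = sorted(values)
    # Initialize centroids at evenly spaced positions of the sorted values.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]

    def assign(v):
        return min(range(k), key=lambda j: (v - centroids[j]) ** 2)

    for _ in range(iters):
        labels = [assign(v) for v in values]
        for j in range(k):
            members = [v for v, a in zip(values, labels) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return [centroids[assign(v)] for v in values]


def learning_step(w, theta, mu, grad_loss, steps=200):
    """Learning step: gradient descent on loss(w) + mu/2 * ||w - theta||^2."""
    w = list(w)
    lr = 0.5 / (1.0 + mu)  # step size chosen for the toy loss (curvature 1)
    for _ in range(steps):
        g = grad_loss(w)
        w = [wi - lr * (gi + mu * (wi - ti)) for wi, gi, ti in zip(w, g, theta)]
    return w


# Toy quadratic loss 0.5 * ||w - target||^2, so its gradient is w - target.
target = [0.9, 1.1, 1.0, -1.0, -0.8, -1.2]


def grad_loss(w):
    return [wi - ti for wi, ti in zip(w, target)]


w = list(target)            # start from the uncompressed reference model
theta = kmeans_1d(w, k=2)   # initial compressed parameters
mu = 0.1
for _ in range(10):
    w = learning_step(w, theta, mu, grad_loss)  # independent of the compression type
    theta = kmeans_1d(w, k=2)                   # independent of the loss and data
    mu *= 2                                     # penalty pulls w toward theta

print(theta)
```

The point of the sketch is the separation of concerns: swapping `kmeans_1d` for another projection (for example a low-rank one) would leave `learning_step` untouched, which is the flexibility the abstract emphasizes.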

DL-Traff: Survey and Benchmark of Deep Learning Models for Urban Traffic Prediction

Renhe Jiang
Du Yin
Zhaonan Wang
Yizhuo Wang
Jiewen Deng
Hangchen Liu
Zekun Cai
Jinliang Deng
Xuan Song
Ryosuke Shibasaki

Nowadays, with the rapid development of IoT (Internet of Things) and CPS (Cyber-Physical Systems) technologies, big spatiotemporal data are being generated from mobile phones, car navigation systems, and traffic sensors. By leveraging state-of-the-art deep learning technologies on such data, urban traffic prediction has drawn a lot of attention in the AI and Intelligent Transportation Systems communities. The problem can be uniformly modeled with a 3D tensor (T, N, C), where T denotes the total time steps, N denotes the size of the spatial domain (i.e., mesh-grids or graph-nodes), and C denotes the channels of information. According to the specific modeling strategy, the state-of-the-art deep learning models can be divided into three categories: grid-based, graph-based, and multivariate time-series models. In this study, we first synthetically review the deep traffic models as well as the widely used datasets, then build a standard benchmark to comprehensively evaluate their performance with the same settings and metrics. Our study, named DL-Traff, is implemented with the two most popular deep learning frameworks, i.e., TensorFlow and PyTorch, and is already publicly available as two GitHub repositories: https://github.com/deepkashiwa20/DL-Traff-Grid and https://github.com/deepkashiwa20/DL-Traff-Graph. With DL-Traff, we hope to deliver a useful resource to researchers who are interested in spatiotemporal data analysis.
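The (T, N, C) formulation above can be illustrated with a small sketch that builds such a tensor as nested lists and cuts it into supervised samples. The sliding-window setup, the `history` and `horizon` parameters, and the synthetic values are assumptions chosen for illustration; they follow a common convention rather than anything stated in the abstract.

```python
def make_windows(series, history, horizon):
    """Cut a (T, N, C) sequence into (input, target) pairs along the time axis."""
    samples = []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        x = series[start:start + history]                      # (history, N, C)
        y = series[start + history:start + history + horizon]  # (horizon, N, C)
        samples.append((x, y))
    return samples


# Synthetic tensor: T time steps, N spatial units (grid cells or graph nodes),
# C channels (for example, a single flow or speed reading).
T, N, C = 6, 2, 1
traffic = [[[float(t * 10 + n) for _ in range(C)] for n in range(N)] for t in range(T)]

samples = make_windows(traffic, history=3, horizon=1)
print(len(samples))
```

Grid-based, graph-based, and multivariate time-series models would differ in how they interpret the N axis, while the time and channel axes keep the same meaning.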

PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval

Craig Macdonald
Nicola Tonellotto
Sean MacAvaney
Iadh Ounis

PyTerrier is a Python-based retrieval framework for expressing simple and complex information retrieval (IR) pipelines in a declarative manner. While making use of the long-established Terrier IR platform for basic text indexing and retrieval, its salient utility comes from its expressive Python operators, which allow for individual IR operations to be pipelined and combined in different flexible manners as requested by the search application. Each operation applies a transformation upon a dataframe, while operators are defined with clear semantics in relational algebra. Going further, we have recently expanded the PyTerrier framework to include additional support for state-of-the-art BERT-based text re-rankers (such as EPIC) and dense retrieval implementations (such as ANCE and ColBERT). Transformer pipelines can be tuned and evaluated in a declarative manner. To increase the reusability of this framework as a resource for the IR community, PyTerrier provides easy access to a variety of standard benchmark datasets, including pre-built indices. Finally, we highlight the advantages of such a framework for information retrieval researchers and educators.
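The idea of composing retrieval operations with Python operators can be sketched with a toy class. This is not PyTerrier's implementation or API: the `Stage` class, the two example functions, the tiny `DOCS` collection, and the use of lists of dictionaries instead of dataframes are all assumptions for illustration, and the rank cutoff here handles a single query only. The sketch borrows just the notion of `>>` for "then" and `%` for a rank cutoff.

```python
class Stage:
    """A toy pipeline stage: a function from a list of rows to a list of rows."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, rows):
        return self.fn(rows)

    def __rshift__(self, other):
        # a >> b: feed the output of a into b.
        return Stage(lambda rows: other(self(rows)))

    def __mod__(self, k):
        # a % k: keep only the k highest-scoring rows produced by a.
        return Stage(lambda rows: sorted(self(rows), key=lambda r: -r["score"])[:k])


# Hypothetical three-document collection.
DOCS = {
    "d1": "terrier retrieval",
    "d2": "python retrieval pipelines",
    "d3": "cooking recipes",
}


def term_overlap(rows):
    """First-stage retrieval: score documents by the number of shared terms."""
    out = []
    for row in rows:
        q_terms = set(row["query"].split())
        for docno, text in DOCS.items():
            score = len(q_terms & set(text.split()))
            if score > 0:
                out.append({"query": row["query"], "docno": docno, "score": float(score)})
    return out


def prefer_short(rows):
    """Re-scoring stage: normalize each score by document length."""
    return [dict(r, score=r["score"] / len(DOCS[r["docno"]].split())) for r in rows]


pipeline = (Stage(term_overlap) % 2) >> Stage(prefer_short)
results = pipeline([{"query": "python retrieval"}])
print([(r["docno"], round(r["score"], 3)) for r in results])
```

Because each stage maps a table of rows to another table of rows, stages can be recombined without changing their internals, which is the declarative property the abstract highlights.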

QuaPy: A Python-Based Framework for Quantification

Alejandro Moreo
Andrea Esuli
Fabrizio Sebastiani

QuaPy is an open-source framework for performing quantification (a.k.a. supervised prevalence estimation), written in Python. Quantification is the task of training quantifiers via supervised learning, where a quantifier is a predictor that estimates the relative frequencies (a.k.a. prevalence values) of the classes of interest in a sample of unlabelled data. While quantification can be trivially performed by applying a standard classifier to each unlabelled data item and counting how many data items have been assigned to each class, it has been shown that this "classify and count" method is outperformed by methods specifically designed for quantification. QuaPy provides implementations of a number of baseline methods and advanced quantification methods, of routines for quantification-oriented model selection, of several broadly accepted evaluation measures, and of robust evaluation protocols routinely used in the field. QuaPy also makes available datasets commonly used for testing quantifiers, and offers visualization tools for facilitating the analysis and interpretation of the results. The software is open-source and publicly available under a BSD-3 licence via https://github.com/HLT-ISTI/QuaPy, and can be installed via pip (https://pypi.org/project/QuaPy/).
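The difference between "classify and count" and a quantification-specific method can be shown with a short sketch for the binary case. The code below does not use QuaPy; the function names and the toy numbers are assumptions. The correction shown is the well-known "adjusted classify and count" estimate, which rescales the raw count using the classifier's true positive rate and false positive rate (assumed here to have been estimated on held-out labelled data).

```python
def classify_and_count(predictions):
    """Estimate positive prevalence as the fraction of items predicted positive."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """Correct the raw count for classifier bias, then clip to [0, 1]."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        # The classifier carries no signal; fall back to the raw estimate.
        return cc
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


# Hypothetical sample: the classifier labels 40 of 100 unlabelled items positive.
predictions = [1] * 40 + [0] * 60

cc_estimate = classify_and_count(predictions)
acc_estimate = adjusted_classify_and_count(predictions, tpr=0.8, fpr=0.2)
print(cc_estimate, acc_estimate)
```

With these assumed error rates the raw count overstates the positive prevalence, and the adjusted estimate is lower, which illustrates why counting classifier outputs alone can be biased.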

Pirá: A Bilingual Portuguese-English Dataset for Question-Answering about the Ocean

André F. A. Paschoal
Paulo Pirozelli
Valdinei Freire
Karina V. Delgado
Sarajane M. Peres
Marcos M. José
Flávio Nakasato
André S. Oliveira
Anarosa A. F. Brandão
Anna H. R. Costa
Fabio G. Cozman

Current research in natural language processing is highly dependent on carefully produced corpora. Most existing resources focus on English; some resources focus on languages such as Chinese and French; few resources deal with more than one language. This paper presents the Pirá dataset, a large set of questions and answers about the ocean and the Brazilian coast both in Portuguese and English. Pirá is, to the best of our knowledge, the first QA dataset with supporting texts in Portuguese, and, perhaps more importantly, the first bilingual QA dataset that includes this language. The Pirá dataset consists of 2261 properly curated question/answer (QA) sets in both languages. The QA sets were manually created based on two corpora: abstracts related to the Brazilian coast and excerpts of United Nations reports about the ocean. The QA sets were validated in a peer-review process with the dataset contributors. We discuss some of the advantages as well as limitations of Pirá, as this new resource can support a set of tasks in NLP such as question-answering, information retrieval, and machine translation.

VerbCL: A Dataset of Verbatim Quotes for Highlight Extraction in Case Law

Julien Rossi
Svitlana Vakulenko
Evangelos Kanoulas

Citing legal opinions is a key part of legal argumentation, an expert task that requires retrieval, extraction and summarization of information from court decisions. The identification of legally salient parts in an opinion for the purpose of citation may be seen as a domain-specific formulation of a highlight extraction or passage retrieval task. While similar tasks in other domains, such as web search, have received significant attention and seen improvement, progress in the legal domain is hindered by the lack of resources for training and evaluation. This paper presents a new dataset that consists of the citation graph of court opinions, which cite previously published court opinions in support of their arguments. In particular, we focus on the verbatim quotes, i.e., where the text of the original opinion is directly reused. With this approach, we explain the relative importance of different text spans of a court opinion by showcasing their usage in citations, and measuring their contribution to the relations between opinions in the citation graph. We release VerbCL, a large-scale dataset derived from CourtListener, and introduce the task of highlight extraction as a single-document summarization task based on the citation graph, establishing the first baseline results for this task on the VerbCL dataset.

PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models

Benedek Rozemberczki
Paul Scherer
Yixuan He
George Panagopoulos
Alexander Riedel
Maria Astefanoaei
Oliver Kiss
Ferenc Beres
Guzmán López
Nicolas Collignon
Rik Sarkar

We present PyTorch Geometric Temporal, a deep learning framework combining state-of-the-art machine learning algorithms for neural spatiotemporal signal processing. The main goal of the library is to make temporal geometric deep learning available for researchers and machine learning practitioners in a unified easy-to-use framework. PyTorch Geometric Temporal was created with foundations on existing libraries in the PyTorch eco-system, streamlined neural network layer definitions, temporal snapshot generators for batching, and integrated benchmark datasets. These features are illustrated with a tutorial-like case study. Experiments demonstrate the predictive performance of the models implemented in the library on real-world problems such as epidemiological forecasting, ride-hail demand prediction, and web traffic management. Our sensitivity analysis of runtime shows that the framework can potentially operate on web-scale datasets with rich temporal features and spatial structure.

SoMeSci: A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles

David Schindler
Felix Bensmann
Stefan Dietze
Frank Krüger

Knowledge about software used in scientific investigations is important for several reasons, for instance, to enable an understanding of provenance and methods involved in data handling. However, software is usually not formally cited, but rather mentioned informally within the scholarly description of the investigation, raising the need for automatic information extraction and disambiguation. Given the lack of reliable ground truth data, we present SoMeSci (Software Mentions in Science), a gold standard knowledge graph of software mentions in scientific articles. It contains high quality annotations (IRR: κ = .82) of 3756 software mentions in 1367 PubMed Central articles. Besides the plain mention of the software, we also provide relation labels for additional information, such as the version, the developer, a URL or citations. Moreover, we distinguish between different types, such as application, plugin or programming environment, as well as different types of mentions, such as usage or creation. To the best of our knowledge, SoMeSci is the most comprehensive corpus about software mentions in scientific articles, providing training samples for Named Entity Recognition, Relation Extraction, Entity Disambiguation, and Entity Linking. Finally, we sketch potential use cases and provide baseline results.

librec-auto: A Tool for Recommender Systems Experimentation

Nasim Sonboli
Masoud Mansoury
Ziyue Guo
Shreyas Kadekodi
Weiwen Liu
Zijun Liu
Andrew Schwartz
Robin Burke

Recommender systems are complex. They integrate the individual needs of users with the characteristics of particular domains of application which may span items from large and potentially heterogeneous collections. Extensive experimentation is required to understand the multidimensional properties of recommendation algorithms and the fit between algorithm and application. librec-auto is a tool that automates many aspects of off-line batch recommender system experimentation. It has a large library of state-of-the-art and historical recommendation algorithms and a wide variety of evaluation metrics. It further supports the study of diversity and fairness in recommendation through the integration of re-ranking algorithms and fairness-aware metrics. It supports declarative configuration for reproducible experiment management and supports multiple forms of hyper-parameter optimization.

TrUMAn: Trope Understanding in Movies and Animations

Hung-Ting Su
Po-Wei Shen
Bing-Chen Tsai
Wen-Feng Cheng
Ke-Jyun Wang
Winston H. Hsu

Understanding and comprehending video content is crucial for many real-world applications such as search and recommendation systems. While recent progress in deep learning has boosted performance on various tasks using visual cues, deep cognition to reason about intentions, motivation, or causality remains challenging. Existing datasets that aim to examine video reasoning capability focus on visual signals such as actions, objects, and relations, or can be answered by exploiting text bias. Observing this, we propose a novel task, along with a new dataset: Trope Understanding in Movies and Animations (TrUMAn), with 2423 videos associated with 132 tropes, intending to evaluate and develop learning systems beyond visual signals. Tropes are frequently used storytelling devices for creative works. By coping with the trope understanding task and enabling the deep cognition skills of machines, data mining applications and algorithms could be taken to the next level. To tackle the challenging TrUMAn dataset, we present a Trope Understanding and Storytelling (TrUSt) model with a new Conceptual Storyteller module, which guides the video encoder by performing video storytelling on a latent space. Experimental results demonstrate that state-of-the-art learning systems on existing tasks reach only 12.01% accuracy with raw input signals. Also, even in the oracle case with human-annotated descriptions, BERT contextual embedding achieves at most 28% accuracy. Our proposed TrUSt boosts the model performance and reaches 13.94% accuracy. We also provide detailed analysis to pave the way for future research. TrUMAn is publicly available at: https://www.cmlab.csie.ntu.edu.tw/project/trope

GeoVectors: A Linked Open Corpus of OpenStreetMap Embeddings on World Scale

Nicolas Tempelmeier
Simon Gottschalk
Elena Demidova

OpenStreetMap (OSM) is currently the richest publicly available information source on geographic entities (e.g., buildings and roads) worldwide. However, using OSM entities in machine learning models and other applications is challenging due to the large scale of OSM, the extreme heterogeneity of entity annotations, and a lack of a well-defined ontology to describe entity semantics and properties. This paper presents GeoVectors - a unique, comprehensive world-scale linked open corpus of OSM entity embeddings covering the entire OSM dataset and providing latent representations of over 980 million geographic entities in 180 countries. The GeoVectors corpus captures semantic and geographic dimensions of OSM entities and makes these entities directly accessible to machine learning algorithms and semantic applications. We create a semantic description of the GeoVectors corpus, including identity links to the Wikidata and DBpedia knowledge graphs to supply context information. Furthermore, we provide a SPARQL endpoint - a semantic interface that offers direct access to the semantic and latent representations of geographic entities in OSM.

ULTRA: An Unbiased Learning To Rank Algorithm Toolbox

Anh Tran
Tao Yang
Qingyao Ai

Learning-to-rank systems have become an important aspect of our daily life. However, the implicit user feedback that is used to train many learning-to-rank models is usually noisy and suffers from user bias (e.g., position bias). Thus, obtaining an unbiased model using biased feedback has become an important research field for IR. Existing studies on unbiased learning to rank (ULTR) can be generalized into two families: algorithms that attain unbiasedness with logged data, namely offline learning, and algorithms that achieve unbiasedness by estimating unbiased parameters with real-time user interactions, namely online learning. While there exist many algorithms from both families, a unified way to compare and benchmark them is lacking. As a result, it can be challenging for researchers to choose the right technique for their problems or for people who are new to the field to learn and understand existing algorithms. To solve this problem, we introduce ULTRA, which is a flexible, extensible, and easily configurable ULTR toolbox. Its key features include support for multiple ULTR algorithms with configurable hyperparameters, a variety of built-in click models that can be used separately to simulate clicks, different ranking model architectures and evaluation metrics, and simple learning-to-rank pipeline creation. In this paper, we discuss the general framework of ULTR, briefly describe the algorithms in ULTRA, and detail the structure and pipeline of the toolbox. We experimented on all the algorithms supported by ULTRA and showed that the toolbox performance is reasonable. Our toolbox is an important resource for researchers to conduct experiments on ULTR algorithms with different configurations as well as to test their own algorithms with the supported features.
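The effect of position bias on click feedback, and one common way to correct it, can be sketched in plain Python. This is not ULTRA's API: the function names are hypothetical, the click simulator is a simple position-based click model, and the correction is inverse propensity weighting with a known examination probability, which is an assumption made for illustration.

```python
import random


def simulate_clicks(relevance, propensity, n_sessions, rng):
    """Position-based click model: a click requires that the result is
    examined (probability `propensity`) and judged relevant (probability
    `relevance`)."""
    clicks = []
    for _ in range(n_sessions):
        examined = rng.random() < propensity
        clicked = examined and (rng.random() < relevance)
        clicks.append(1 if clicked else 0)
    return clicks


def naive_estimate(clicks):
    """Raw click-through rate, which confounds relevance with position."""
    return sum(clicks) / len(clicks)


def ipw_estimate(clicks, propensity):
    """Inverse-propensity-weighted estimate of relevance."""
    return sum(c / propensity for c in clicks) / len(clicks)


rng = random.Random(0)
true_relevance = 0.9      # a highly relevant document...
propensity_rank3 = 1 / 3  # ...shown at a rank that is examined a third of the time
clicks = simulate_clicks(true_relevance, propensity_rank3, 20000, rng)
naive = naive_estimate(clicks)
debiased = ipw_estimate(clicks, propensity_rank3)
```

The raw click-through rate stays far below the document's relevance because most sessions never examine the result, while the weighted estimate is close to the true value. In practice the propensities are unknown and must themselves be estimated.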

LiteratureQA: A Question Answering Corpus with Graph Knowledge on Academic Literature

Haiwen Wang
Le Zhou
Weinan Zhang
Xinbing Wang

In this paper, we introduce LiteratureQA, a large question answering (QA) corpus consisting of publicly available academic papers. Unlike other QA corpora, LiteratureQA poses unique challenges, such as how to leverage the structured knowledge of citation networks. We further examine some popular QA methods and present a benchmark approach for answering academic questions that combines semantic text and graph knowledge to improve the prevalent pre-training model. We hope this resource can help the research and development of tasks for machine reading over academic text.

Machamp: A Generalized Entity Matching Benchmark

Jin Wang
Yuliang Li
Wataru Hirota

Entity Matching (EM) refers to the problem of determining whether two different data representations refer to the same real-world entity. It has been a long-standing interest of the data management community. Much effort has been devoted to creating benchmark tasks as well as to developing advanced matching techniques for EM. However, existing benchmark tasks for EM are limited to the case where the two data collections of entities are structured tables with the same schema. Meanwhile, the tables in data collections for matching could be structured, semi-structured, or unstructured in real-world scenarios of data science. In this paper, we formulate a new research problem, Generalized Entity Matching, to satisfy this requirement and create a benchmark, Machamp, for it. Machamp consists of seven tasks having diverse characteristics and thus provides good coverage of use cases in real applications. We summarize existing EM benchmark tasks for structured tables and conduct a series of processing and cleaning efforts to transform them into matching tasks between tables with different structures. Based on that, we further conduct comprehensive profiling of the proposed tasks and evaluate several popular entity matching approaches on them. With the help of Machamp, researchers can for the first time evaluate EM techniques between data collections with different structures. It is publicly available at https://github.com/megagonlabs/machamp.

MOOCCubeX: A Large Knowledge-centered Repository for Adaptive Learning in MOOCs

Jifan Yu
Yuquan Wang
Qingyang Zhong
Gan Luo
Yiming Mao
Kai Sun
Wenzheng Feng
Wei Xu
Shulin Cao
Kaisheng Zeng
Zijun Yao
Lei Hou
Yankai Lin
Peng Li
Jie Zhou
Bin Xu
Juanzi Li
Jie Tang
Maosong Sun

The prosperity of massive open online courses provides fodder for plentiful research efforts on adaptive learning. However, current open-access educational datasets are still far from sufficient to meet the needs of various topics in adaptive learning. Existing released datasets often cover only small-scale data and lack fine-grained knowledge concepts. They are also difficult to curate and supplement due to platform limitations. In this work, we construct MOOCCubeX, a large, knowledge-centered repository consisting of 4,216 courses, 230,263 videos, 358,265 exercises, 637,572 fine-grained concepts and over 296 million behavioral records of 3,330,294 students, for supporting research on adaptive learning in MOOCs. Licensed by XuetangX, one of the largest MOOC websites in China, we obtain abundant and diverse course resources and student behavioral data and are permitted to make subsequent periodic updates. We propose a framework to accomplish data processing, weakly supervised fine-grained concept graph mining, and data curation to improve usability and richness. Based on the fine-grained concepts, we re-organize the data from the knowledge perspective and acquire more external learning resources from the web. Our repository is now available at https://github.com/THU-KEG/MOOCCubeX.

RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms

Wayne Xin Zhao
Shanlei Mu
Yupeng Hou
Zihan Lin
Yushuo Chen
Xingyu Pan
Kaiyuan Li
Yujie Lu
Hui Wang
Changxin Tian
Yingqian Min
Zhichao Feng
Xinyan Fan
Xu Chen
Pengfei Wang
Wendi Ji
Yaliang Li
Xiaoling Wang
Ji-Rong Wen

In recent years, a large number of recommendation algorithms have been proposed in the literature, from traditional collaborative filtering to deep learning algorithms. However, concerns about how to standardize open source implementations of recommendation algorithms continue to grow in the research community. In light of this challenge, we propose a unified, comprehensive and efficient recommender system library called RecBole (pronounced as [rɛk'boʊlər]), which provides a unified framework to develop and reproduce recommendation algorithms for research purposes. In this library, we implement 73 recommendation models on 28 benchmark datasets, covering the categories of general recommendation, sequential recommendation, context-aware recommendation and knowledge-based recommendation. We implement the RecBole library based on PyTorch, which is one of the most popular deep learning frameworks. Our library offers several features, including general and extensible data structures, comprehensive benchmark models and datasets, efficient GPU-accelerated execution, and extensive and standard evaluation protocols. We provide a series of auxiliary functions, tools, and scripts to facilitate the use of this library, such as automatic parameter tuning and break-point resume. Such a framework is useful to standardize the implementation and evaluation of recommender systems. The project and documents are released at https://recbole.io/.

SESSION: Demo Papers

FairCORELS, an Open-Source Library for Learning Fair Rule Lists

Ulrich Aïvodji
Julien Ferry
Sébastien Gambs
Marie-José Huguet
Mohamed Siala

FairCORELS is an open-source Python module for building fair rule lists. It is a multi-objective variant of CORELS, a branch-and-bound algorithm to learn certifiably optimal rule lists. FairCORELS supports six statistical fairness metrics, offers several exploration parameters and leverages the fairness constraints to prune the search space efficiently. It can easily generate sets of accuracy-fairness trade-offs. The learnt models are interpretable by design, and a sparsity parameter can be used to control their length.
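What a rule list is, and how a statistical fairness metric is evaluated on its predictions, can be illustrated with a minimal sketch. It does not use the FairCORELS API and does not learn anything: the rule list is written by hand, the records are made-up toy data, and statistical parity is chosen here only as a representative metric, since the abstract does not name the six it supports.

```python
def predict(rule_list, default, record):
    """Apply an ordered rule list: the first rule whose condition holds
    decides the label; otherwise the default label is returned."""
    for condition, label in rule_list:
        if condition(record):
            return label
    return default


def statistical_parity_gap(records, predictions, group_key="group"):
    """Difference between the highest and lowest positive-prediction rate
    across the groups defined by `group_key`."""
    totals, positives = {}, {}
    for record, pred in zip(records, predictions):
        g = record[group_key]
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + pred
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Hand-written rule list of length 2 (hypothetical features and thresholds).
rules = [
    (lambda r: r["debt"] > 20, 0),     # if debt > 20 then reject
    (lambda r: r["income"] >= 50, 1),  # else if income >= 50 then accept
]
records = [
    {"income": 60, "debt": 5, "group": "A"},
    {"income": 40, "debt": 10, "group": "A"},
    {"income": 70, "debt": 30, "group": "A"},
    {"income": 55, "debt": 0, "group": "A"},
    {"income": 30, "debt": 5, "group": "B"},
    {"income": 80, "debt": 10, "group": "B"},
    {"income": 45, "debt": 25, "group": "B"},
    {"income": 20, "debt": 0, "group": "B"},
]
predictions = [predict(rules, 0, r) for r in records]
gap = statistical_parity_gap(records, predictions)
```

A fairness-constrained learner would search for rule lists that keep such a gap below a chosen threshold while maximizing accuracy; varying the threshold yields a set of accuracy-fairness trade-offs.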

Discovering Conflicts of Interest across Heterogeneous Data Sources with ConnectionLens

Angelos-Christos Anadiotis
Oana Balalau
Théo Bouganim
Francesco Chimienti
Helena Galhardas
Mhd Yamen Haddad
Stéphane Horel
Ioana Manolescu
Youssr Youssef

Investigative Journalism (IJ, in short) requires combining highly heterogeneous digital datasets coming from a wide variety of sources. We have developed ConnectionLens, a system that integrates such sources into a single heterogeneous graph and enables users to query the graph using keywords. The first iteration of the system [7] followed a mediator architecture which severely constrained its query scalability. Thus, we fully re-engineered the system, moving it to a warehouse architecture, and replacing its core components (information extraction, data querying, and interactive interfaces), which allowed us to handle use cases orders of magnitude larger than the previous platform. In a consortium of computer scientists and investigative journalists, we propose to demonstrate ConnectionLens' capability to integrate arbitrary heterogeneous datasets and query them flexibly by means of keywords. Among several scenarios, our main focus will be on a real-world journalistic use case about situations which may lead to Conflicts of Interest between biomedical experts and various organizations, such as corporations, lobbies, etc. The demonstration will showcase the end-to-end data analysis pipeline and illustrate each system component and the different parameters governing graph creation and querying.

MVQAS: A Medical Visual Question Answering System

Haoyue Bai
Xiaoyan Shan
Yefan Huang
Xiaoli Wang

This paper demonstrates a medical visual question answering (VQA) system to address three challenges: 1) medical VQA often lacks large-scale labeled training data, which requires huge efforts to build; 2) it is costly to implement and thoroughly compare medical VQA models on self-created datasets; 3) applying general VQA models to the medical domain by transfer learning is challenging due to the different visual concepts in general images and medical images. Our system has three main components: data generation, model library, and model practice. To address the first challenge, we first allow users to upload self-collected clinical data such as electronic medical records (EMRs) to the data generation component and provide an annotation tool for labeling the data. Then, the system semi-automatically generates medical VQAs for users. Second, we develop a model library by implementing VQA models for users to evaluate on their datasets. Users can configure the system simply by selecting the models they are interested in. The system then automatically trains the models, conducts extensive experimental evaluation, and reports comprehensive findings. The reports provide new insights into the strengths and weaknesses of selected models. Third, we provide an online chat module for users to communicate with an AI robot for further evaluating the models. The source code is shared at https://github.com/shyanneshan/VQA-Demo.

Landmark Explanation: An Explainer for Entity Matching Models

Andrea Baraldi
Francesco Del Buono
Matteo Paganelli
Francesco Guerra

State-of-the-art approaches model Entity Matching (EM) as a binary classification problem, where Machine Learning (ML) or Deep Learning (DL) based techniques are applied to evaluate whether descriptions of pairs of entities refer to the same real-world instance. Although these approaches have been experimentally shown to achieve high effectiveness, their adoption in real scenarios is limited by the lack of interpretability of their behavior.

This paper showcases Landmark Explanation, a tool that makes generic post-hoc (model-agnostic) perturbation-based explanation systems able to explain the behavior of EM models. In particular, Landmark Explanation computes local interpretations, i.e., given a description of a pair of entities and an EM model, it computes the contribution of each term in generating the prediction. The demonstration shows that the explanations generated by Landmark Explanation are effective even for non-matching pairs of entities, a challenge for explanation systems.

Beamer: An End-to-End Deep Learning Framework for Unifying Data Cleaning in DNN Model Training and Inference

Nifei Bi
Xiansen Chen
Chen Xu
Aoying Zhou

Deep learning has made extraordinary progress in the last few years, focusing on improving the accuracy and speed of standard deep learning benchmarks. Nevertheless, datasets in production environments are often messy, which makes data cleaning crucial for DNN model training and inference. Existing solutions that combine big data processing systems and deep learning systems to accomplish data cleaning, DNN model training and inference are internally tied to either Spark or Flink. However, Spark and Flink usually show different performance under batch and stream processing workloads. In order to employ Spark in batch training and Flink in streaming inference, existing solutions incur the burden of maintaining two data cleaning programs. In this demonstration, we showcase Beamer: an end-to-end deep learning framework for unifying the data cleaning program when employing Spark in training and Flink in inference, respectively.

WhatTheWikiFact: Fact-Checking Claims Against Wikipedia

Anton Chernyavskiy
Dmitry Ilvovsky
Preslav Nakov

The rise of the Internet has made it a major source of information. Unfortunately, not all information online is true, and thus a number of fact-checking initiatives have been launched, both manual and automatic, to deal with the problem. Here, we present our contribution in this regard: WhatTheWikiFact, a system for automatic claim verification using Wikipedia. The system can predict the veracity of an input claim, and it further shows the evidence it has retrieved as part of the verification process. It shows confidence scores and a list of relevant Wikipedia articles, together with detailed information about each article, including the phrase used to retrieve it, the most relevant sentences extracted from it and their stance with respect to the input claim, as well as the associated probabilities. The system supports several languages: Bulgarian, English, and Russian.

DashBot: An ML-Guided Dashboard Generation System

Sandrine Da Col
Radu Ciucanu
Marta Soare
Nassim Bouarour
Sihem Amer-Yahia

Data summarization provides a bird's eye view of data and groupby queries have been the method of choice for data summarization. Such queries provide the ability to group by some attributes and aggregate by others, and their results can be coupled with a visualization to convey insights. The number of possible groupbys that can be computed over a dataset is quite large which naturally calls for developing approaches to aid users in choosing which groupbys best summarize data. We demonstrate DashBot, a system that leverages Machine Learning to guide users in generating data-driven and customized dashboards. A dashboard contains a set of panels, each of which is a groupby query. DashBot iteratively recommends the most relevant panel while ensuring coverage. Relevance is computed based on intrinsic measures of the dataset and coverage aims to provide comprehensive summaries. DashBot relies on a Multi-Armed Bandits (MABs) approach to balance exploitation of relevance and exploration of different regions of the data to achieve coverage. Users can provide feedback and explanations to customize recommended panels. We demonstrate the utility and features of DashBot on different datasets.
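The exploration-exploitation balance mentioned above can be sketched with a small multi-armed bandit in plain Python. The abstract does not say which bandit strategy DashBot uses, so UCB1 is an assumption made purely for illustration; the arms stand for hypothetical candidate panels, and the simulated binary reward stands in for whatever relevance signal the real system computes.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Pick an arm with UCB1: try every arm once, then choose the arm with
    the highest mean reward plus exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(mean_rewards, n_rounds, rng):
    """Simulate n_rounds of recommendations with Bernoulli rewards and
    return how often each arm was selected."""
    counts = [0] * len(mean_rewards)
    sums = [0.0] * len(mean_rewards)
    for t in range(1, n_rounds + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < mean_rewards[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


rng = random.Random(42)
panel_relevance = [0.2, 0.5, 0.8]  # hypothetical chance that each panel is useful
pull_counts = run_bandit(panel_relevance, 2000, rng)
```

Every arm is tried at least once, so less promising panels still get explored, but most selections concentrate on the arm with the highest expected reward.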

SearchEHR: A Family History Search System for Clinical Decision Support

Xiang Dai
Maciej Rybinski
Sarvnaz Karimi

Finding patients with specific clinical conditions, such as having a familial disease history of diabetes, is an important task for clinical decision support. Clinical notes in Electronic Health Records (EHR), which document the patient medical history and familial disease history, are valuable resources for patient cohort selection. However, such information is difficult to discover in clinical text, and full-text search techniques often fail due to the unique characteristics of clinical language. We describe a system, SearchEHR, that combines Natural Language Processing (NLP) and Information Retrieval (IR) techniques to facilitate utilising clinical notes to find cohorts of patients, with a special focus on family disease history.

OpenAttHetRL: An Open Source Toolkit for Attributed Heterogeneous Network Representation Learning

Roohollah Etemadi
Morteza Zihayat
Kuan Feng
Jason Adelman
Ebrahim Bagheri

Learning the latent representations of entities based on their relationships and the data associated with them is an essential task in many applications such as ranking, recommendation systems, graph-based team formation, keyword search, and many more. However, the majority of existing techniques learn the latent representations of either network or textual data. Structural embedding techniques suffer from the sparsity of real-world networks. Attributes of nodes are a rich source of information for improving network embedding vectors, yet they are overlooked in the literature. As a result, most existing network representation learning tools capture only structural information. This paper introduces an open-source toolkit called OpenAttHetRL to learn the latent representations of entities based on both their network and textual data in an end-to-end fashion. OpenAttHetRL is easy to employ and adapt for a variety of tasks including ranking, recommendation systems, and expert finding. OpenAttHetRL aims to provide a unified toolkit for data pre-processing, building and training models, and performing predictions for a downstream task. It employs a graph convolution network to capture the relationships among entities and a kernel pooling technique to preserve the similarity of their textual data in the embedding space. We use expert finding in community question answering systems to demonstrate how OpenAttHetRL can be trained to get latent representations of questions, their askers, tags, and answerers and find potential answerers of new questions.

An RDF Data Management System for Conflict Casualties

Yad Fatah
Mark Nourallah
Lynn Wahab
Fatima K. Abu Salem
Shady Elbassuoni

In a world embroiled in armed conflicts, documenting conflict casualties is an important goal for many NGOs. Most such documented records of casualties are, however, managed through internal databases, spreadsheets or Web forms. As a result, exploring and querying such data becomes extremely chaotic. In this paper, we demonstrate CasualtIS, an RDF data management system for conflict casualties. Our system models conflict casualties data as RDF graphs and allows users to query such data using a SPARQL endpoint. Our system also includes a template-based natural-language querying interface to support non-expert users. Our system can be used for various purposes by end users, such as fact-checking certain claims about conflict casualties, aggregating casualties over time and location, and finding contextual information about casualties, such as the cause of death, actors involved, and other similar critical information. We demonstrate our system using two case studies, one related to casualties in the Iraqi war and the other related to casualties in the Syrian war.

PyTFL: A Python-based Neural Team Formation Toolkit

Radin Hamidi Rad
Aabid Mitha
Hossein Fani
Mehdi Kargar
Jaroslaw Szlichta
Ebrahim Bagheri

We present PyTFL, a library written in Python for the team formation task. In the team formation task, the main objective is to form a team of experts given a set of skills. We demonstrate an efficient and well-structured open-source toolkit that can easily be imported into Python. Our toolkit incorporates state-of-the-art approaches for team formation, e.g., neural-based team formation, and supports team formation sub-tasks such as collaboration graph preparation, model training and validation, systematic evaluation based on qualitative and quantitative team metrics, and efficient team formation and prediction. While there are strong research papers on the team formation problem, PyTFL is the first toolkit to be publicly released for this purpose.

TagPick: A System for Bridging Micro-Video Hashtags and E-commerce Categories

Li He
Dingxian Wang
Hanzhang Wang
Hongxu Chen
Guandong Xu

Hashtags, a product of user tagging behavior, can describe the semantics of user-generated content on social network applications, e.g., the recently popular micro-videos. Hashtags have been widely used to facilitate various micro-video retrieval scenarios, such as search engines and categorization. In order to leverage hashtags on micro-media platforms for effective e-commerce marketing campaigns, there is a demand from the e-commerce industry to develop a mapping algorithm bridging its categories and micro-video hashtags. In this demo paper, we therefore propose a novel solution called TagPick that incorporates clues from all user behavior metadata (hashtags, interactions, multimedia information) as well as relational data (graph-based network) into a unified system to reveal the correlation between e-commerce categories and hashtags in industrial scenarios. In particular, we provide a tag-level popularity strategy to recommend the relevant hashtags for e-commerce platforms (e.g., eBay).

HAO Unity: A Graph-based System for Unifying Heterogeneous Data

Fei Jie
Yanxiang Huang
Qiangwei Bai
Xindong Wu

Many real-world applications have to face the problem of diversity in data formats and semantics. Currently, how to deal with heterogeneous data effectively is still a big challenge. With the rise of knowledge graphs, more and more applications are built upon graph-like data models, which benefit from flexible schemas and convenient support for relationship queries. We propose a graph-based unifying system for heterogeneous data unification, which helps to (1) transform data in many other formats into graphs, or conversely, from graph to other formats, (2) integrate graph data based on HAO intelligence, which achieves schema integration and entity consolidation, and (3) explore data at different levels via querying the integrated graphs. In this paper, we introduce the overall system architecture, explain in detail the implementation, and display the usage in two practical scenarios.

SPARQL-vision: A Platform for Querying, Visualising and Exploring SPARQL endpoints

Maria Krommyda
Verena Kantere

The wide adoption of the Semantic Web and the Resource Description Framework (RDF) has made available many important datasets. The SPARQL query language facilitates the exploration of this information, which is available in a semi-structured way that does not comply with relational data models, deviating from exploration techniques that most researchers are familiar with. Usually, only people with training and extensive knowledge of the RDF model can explore and understand it in depth. We present here a platform that supports users in querying, exploring and visualizing information available in SPARQL endpoints. A dedicated visualization module, built upon a knowledge database, allows us to provide case-specific visualization solutions for SPARQL query results. The selection is based exclusively on features extracted from the result, without any knowledge about the structure, content and characteristics of the underlying dataset.

CLC-RS: A Chinese Legal Case Retrieval System with Masked Language Ranking

Jieke Li
Min Yang
Chengming Li

With the ever-increasing size of legal cases in China, relevant legal case retrieval given a user query has attracted considerable attention. Conventional keyword-based retrieval systems look for matching cases that contain one or more words specified by the user. However, keyword search is sharply focused on finding the exact terms specified in the query, making the retrieval systems miss many relevant documents. In addition, it is difficult for new users to identify appropriate keywords for accurate legal case retrieval. In this paper, we develop a novel Chinese legal case retrieval system (called CLC-RS), which improves the quality of semantic search with natural language queries in the legal domain. CLC-RS performs legal case retrieval in a two-stage fashion. First, we employ a classic token-based ranking method to efficiently reduce the solution space, returning a subset of candidate legal cases. Then, we deploy a novel masked language ranking model to re-rank the candidate legal cases. The experimental results show that the proposed system is both efficient and effective, providing a practical information retrieval (IR) system for retrieving Chinese legal cases. The web site for the developed CLC-RS system is available at: https://www.delilegal.com/.
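The two-stage design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term-overlap scorer merely stands in for a classic token-based ranker, `rerank_score` is a hypothetical placeholder for the masked language ranking model (which the abstract does not specify), and the toy documents are English text split on whitespace rather than segmented Chinese.

```python
from collections import Counter


def first_stage(query, docs, k):
    """Cheap term-overlap scoring (stand-in for a classic token-based ranker)."""
    q_tokens = Counter(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        d_tokens = Counter(text.lower().split())
        overlap = sum(min(q_tokens[t], d_tokens[t]) for t in q_tokens)
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def rerank_score(query, text):
    """Hypothetical placeholder for a learned re-ranker: Jaccard similarity."""
    q, d = set(query.lower().split()), set(text.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def two_stage_search(query, docs, k=2):
    """Stage 1 narrows the candidate set; stage 2 re-orders only those candidates."""
    candidates = first_stage(query, docs, k)
    return sorted(candidates, key=lambda doc_id: -rerank_score(query, docs[doc_id]))


docs = {
    "a": "contract dispute over unpaid rent",
    "b": "rent contract dispute",
    "c": "traffic accident liability",
}
result = two_stage_search("rent contract dispute", docs)
```

The point of the split is cost: the more expensive scorer only ever sees the `k` candidates that survive the cheap first pass.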

RCES: Rapid Cues Exploratory Search Using Taxonomies For COVID-19

Wei Li
Rishi Choudhary
Arjumand Younus
Bruno Ohana
Nicole Baker
Brendan Leen
M. Atif Qureshi

To assist the COVID-19 focused researchers in life science and healthcare in understanding the pandemic, we present an exploratory information retrieval system called RCES. The system employs a previously developed EVE (Explainable Vector-based Embedding) model using DBpedia and an adopted model using MeSH taxonomies to exploit concept relations related to COVID-19. Various expansion methods are also developed, along with explanations and facets that collectively form rapid cues for a valuable navigational and informed user experience.

Health Claims Unpacked: A toolkit to Enhance the Communication of Health Claims for Food

Xiao Li
Huizhi Liang
Zehao Liu

Health claims are statements on food product packages that describe a product's nutritional content and the benefits of that nutrition. Consumers in different European contexts often have difficulties understanding health claims, leading to increased confusion about and decreased trust in the food they buy. Focusing on this problem, we develop a toolkit for improving the communication of health claims for consumers. The toolkit provides (1) interactive activities to disseminate knowledge about health claims to the public, and (2) an NLP-based analysis and prediction engine that food manufacturers can use to estimate how well consumers like the health claims that the manufacturers created. By using the AI-powered toolkit, consumers, manufacturers, and food safety regulators are engaged in determining the different linguistic and cultural barriers to the effective communication of health claims and formulating solutions that can be implemented on multiple levels, including regulation, enforcement, marketing, and consumer education.

From Community Search to Community Understanding: A Multimodal Community Query Engine

Zhao Li
Pengcheng Zou
Xia Chen
Shichang Hu
Peng Zhang
Yumou Zhang
Bingsheng He
Yuchen Li
Xing Tang

In this demo, we present an online multi-modal community query engine (MQE) on Alibaba's billion-scale heterogeneous network. MQE has two distinct features in comparison with existing community query engines. Firstly, MQE supports multimodal community search on heterogeneous graphs with keyword and image queries. Secondly, to facilitate community understanding in real business scenarios, MQE generates natural language descriptions for the retrieved community in combination with other useful demographic information. The distinct features of MQE benefit many downstream applications in Alibaba's e-commerce platform like recommendation. Our experiments confirm the effectiveness and efficiency of MQE on graphs with billions of edges.

IMAS++: An Intelligent Medical Analysis System Enhanced with Deep Graph Neural Networks

Feng Luo
Yue Zhang
Xiaoli Wang

This paper demonstrates an intelligent medical analysis system. We aim to address two main challenges: 1) medical data often contain heterogeneous information that is valuable but difficult to model; 2) medical data often lack large-scale labels, which usually require huge effort to build. To resolve the first challenge, we propose a novel multi-modal heterogeneous graph model to represent the medical data. Based on this model, graph neural networks can be directly applied to effective medical case clustering. This helps to resolve the second challenge by enabling label assignment within the same cluster. To further evaluate the practical use of the proposed model, the system also proposes an effective similar medical case retrieval framework based on a novel graph similarity learning model. We have implemented the system and the source code is published at https://github.com/emmali808/ADDS. With our system, users can easily pinpoint valuable historical medical information they are interested in and obtain closely relevant medical cases for further diagnosis.

RW-Team: Robust Team Formation using Random Walk

John Nemec
Heidar Davoudi
Lukasz Golab
Mehdi Kargar
Yuliya Lytvyn
Piotr Mierzejewski
Jaroslaw Szlichta
Morteza Zihayat

There is a growing need to find meaningful teams in expert networks such as DBLP and GitHub. However, existing team formation methods, such as those based on shortest paths between experts, may generate weakly-connected teams. We demonstrate RW-Team, a robust team formation framework based on a random walk with restart (RWR). We introduce a greedy algorithm to reduce the search space, and we use a Monte Carlo approximation of RWR to improve performance. To handle large graphs, we implement RW-Team in Apache Spark. The proposed demonstration will allow participants to form teams of researchers having various skill sets and explore connections among team members using several graph visualization techniques.
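As a rough illustration of approximating random walk with restart (RWR) by simulation, the sketch below estimates RWR scores from the visit frequencies of one long simulated walk. It is only an assumption about how such an approximation can look: the abstract does not say which Monte Carlo scheme RW-Team uses, and the function name, the restart probability of 0.15, and the toy expert graph are all hypothetical.

```python
import random


def monte_carlo_rwr(graph, source, restart_prob=0.15, num_steps=20000, seed=0):
    """Estimate RWR scores for `source` from visit counts of a simulated walk."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    visits = {node: 0 for node in graph}
    current = source
    for _ in range(num_steps):
        visits[current] += 1
        neighbors = graph[current]
        # Jump back to the source with probability restart_prob (or at a dead end).
        if rng.random() < restart_prob or not neighbors:
            current = source
        else:
            current = rng.choice(neighbors)
    return {node: count / num_steps for node, count in visits.items()}


# Hypothetical collaboration graph as adjacency lists.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob", "dave"],
    "dave": ["carol"],
}
scores = monte_carlo_rwr(graph, "alice")
```

Nodes that the walk visits often are strongly connected to the source, which is the property a team formation method can use to prefer tightly connected experts over ones linked only by a single path.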

DLQ: A System for Label-Constrained Reachability Queries on Dynamic Graphs

You Peng
Wenjie Zhao
Wenjie Zhang
Xuemin Lin
Ying Zhang

The Label-Constrained Reachability (LCR) query, which extracts reachability information from large edge-labeled graphs, has attracted tremendous interest. Various LCR algorithms have been proposed to solve this fundamental query, which has a wide range of applications in social networks, biological networks, economic networks, etc. In this paper, we implement the state-of-the-art P2H+ algorithm as well as functions to analyze its effectiveness. Moreover, our Dynamic LCR Query (DLQ) system also supports dynamic updates with the 2-hop labeling method. In this demonstration, we present the DLQ system for Label-Constrained Reachability Queries, which utilizes the 2-hop labeling algorithm with dynamic graph maintenance.
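To make the query semantics concrete, here is a minimal sketch of answering a label-constrained reachability query with a plain breadth-first search that only follows edges whose label is in the allowed set. This is the naive online baseline, not the P2H+ algorithm or the 2-hop labeling index used by DLQ; the function name and the toy edge list are assumptions for illustration.

```python
from collections import deque


def label_constrained_reachable(edges, source, target, allowed_labels):
    """True if `target` can be reached from `source` via allowed-label edges only."""
    adjacency = {}
    for u, v, label in edges:
        adjacency.setdefault(u, []).append((v, label))
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor, label in adjacency.get(node, []):
            # Skip edges whose label violates the constraint.
            if label in allowed_labels and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical directed, edge-labeled graph: (from, to, label).
edges = [
    ("a", "b", "follows"),
    ("b", "c", "likes"),
    ("a", "c", "blocks"),
]
```

An index-based method answers the same question without traversing the graph at query time, which is what makes it attractive on large graphs.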

DORA THE EXPLORER: Exploring Very Large Data With Interactive Deep Reinforcement Learning

Aurélien Personnaz
Sihem Amer-Yahia
Laure Berti-Equille
Maximilian Fabricius
Srividya Subramanian

We demonstrate DORA THE EXPLORER, a system that guides users in finding items of interest in a very large data set. DORA THE EXPLORER provides users with the full spectrum of exploration modes and is driven by Data Familiarity or Curiosity, as well as User Interventions. DORA THE EXPLORER is able to handle data and search scenario complexity, i.e., the difficulty of finding scattered/clustered individual records in the data set, and the user's ability to express what s/he needs. DORA THE EXPLORER relies on Deep Reinforcement Learning that combines intrinsic (curiosity) and extrinsic (familiarity) rewards. DORA's main goal is to support scientific discovery from data. We describe the system architecture and illustrate it with three demonstration scenarios on 2.6 million galaxies from SDSS, a large sky survey data set. A video of DORA THE EXPLORER is available at https://bit.ly/dora-demo, the code at https://github.com/apersonnaz/rl-guided-galaxy-exploration, and the application at https://bit.ly/dora-application.

A Semantic Data Marketplace for Easy Data Sharing within a Smart City

André Pomp
Alexander Paulus
Andreas Burgdorf
Tobias Meisen

Today, smart city applications are largely based on data collected from different stakeholders. This presupposes that the required data sources are publicly available. While open data platforms already provide a number of urban data sources, enterprises and citizens have few opportunities to make their data available. To complicate things further, if the data is published, the processing of this data is already extremely time-consuming today, as the data sources are heterogeneous and the corresponding homogenization has to be carried out by the data consumers themselves. In this paper, we present a data marketplace that enables different stakeholders (public institutions, enterprises, citizens) to easily provide data that can especially contribute to the further realization of smart cities. This marketplace is based on the principles of semantic data management, i.e., data providers annotate their added data with semantic models. With the help of these models, the data sources can be found and understood by data consumers and finally homogenized in a way that is suitable for their application.

PRASEMap: A Probabilistic Reasoning and Semantic Embedding based Knowledge Graph Alignment System

Zhiyuan Qi
Ziheng Zhang
Jiaoyan Chen
Xi Chen
Yefeng Zheng

Knowledge Graph (KG) alignment aims at finding equivalent entities and relations (i.e., mappings) between two KGs. The existing approaches utilize either reasoning-based or semantic embedding-based techniques, but few studies explore their combination. In this demonstration, we present PRASEMap, an unsupervised KG alignment system that iteratively computes the Mappings with both Probabilistic Reasoning (PR) And Semantic Embedding (SE) techniques. PRASEMap can support various embedding-based KG alignment approaches as the SE module, and it also enables easy human computer interaction that additionally provides an option for users to feed the mapping annotations back to the system for better results. The demonstration showcases these features via a stand-alone Web application with user friendly interfaces. The demo is available at https://prasemap.qizhy.com.

A Sentiment and Style Controllable Approach for Chinese Poetry Generation

Yizhan Shao
Tong Shao
Minghao Wang
Peng Wang
Jie Gao

Sentiment and style control are two vital aspects of automatic poetry generation. Excellent Chinese classical poetry should express a certain emotion and embody a specific style at the same time. Existing work still has deficiencies in controlling sentiment and style simultaneously. To address these issues, in this paper, we propose a novel approach for Chinese classical poetry generation, which can generate sentiment-controllable and style-controllable poems. First, it classifies hundreds of thousands of poems by style, sentiment, format, and primary keyword. Then, it utilizes a masking self-attention mechanism to associate multiple tags and verses. In addition, it can generate metrical rhyming verses with distinctive sentiment and style characteristics according to the tag-set and secondary keywords. Finally, this approach is applied in Chang Qing Yin, which can collaborate with users to polish generated poems, providing alternatives automatically. Experimental results show that our approach performs well in sentiment and style control, and that the quality of the generated poems outperforms several strong baselines.

CauseBox: A Causal Inference Toolbox for Benchmarking Treatment Effect Estimators with Machine Learning Methods

Paras Sheth
Ujun Jeong
Ruocheng Guo
Huan Liu
K. Selçuk Candan

Causal inference is a critical task in various fields such as healthcare, economics, marketing and education. Recently, there have been significant advances through the application of machine learning techniques, especially deep neural networks. Unfortunately, to date many of the proposed methods are evaluated on different (data, software/hardware, hyperparameter) setups, and consequently it is nearly impossible to compare the efficacy of the available methods or reproduce results presented in original research manuscripts. In this paper, we propose a causal inference toolbox (CauseBox) that addresses the aforementioned problems. At the time of publication, the toolbox includes seven state-of-the-art causal inference methods and two benchmark datasets. By providing convenient command-line and GUI-based interfaces, the CauseBox toolbox helps researchers fairly compare the state-of-the-art methods in their chosen application context against benchmark datasets. The code is made public at github.com/paras2612/CauseBox.

Videolytics: System for Data Analytics of Video Streams

Tomáš Skopal
Dominika Ďurišková
Petr Pechman
Marek Dobranský
Vladislav Khachaturian

We present Videolytics, a web-based system for advanced analytics over recorded video streams. Video cameras have become widely used for indoor and outdoor surveillance. Covering even more public space in cities, the cameras serve various purposes ranging from security to traffic monitoring, urban life, and marketing. The goal is to obtain effective and efficient models to process the video data automatically and produce the desired features for data analytics. Videolytics combines the best of deep learning and hand-designed analytical models to create a solution applicable in real-life situations. The architecture of the Videolytics framework is centered around a database of video features and detected objects, where new higher-level objects result from fusion of (lower-level) objects and features already stored in the database. The system provides a number of visualization options, an SQL-based analytics module as well as a real-time surveillance mode.

A Cohesive Structure Based Bipartite Graph Analytics System

Kai Wang
Yiheng Hu
Xuemin Lin
Wenjie Zhang
Lu Qin
Ying Zhang

Bipartite graphs arise naturally when modeling two different types of entities such as user-item, author-paper, and director-board. In recent years, driven by numerous real-world applications in these networks, mining cohesive structures in bipartite graphs has become a popular research topic. In this paper, we propose the first cohesive-structure-based bipartite graph analytics system, CohBGA. The key innovative features of our system are as follows. Firstly, we include several cohesive-structure-based models and statistics in our system to analyze bipartite graphs at different levels of granularity. Secondly, CohBGA has a user-friendly and interactive visual interface with various functional tools to meet users' diverse query requirements. Thirdly, we implement state-of-the-art algorithms in CohBGA to support efficient query processing. Furthermore, because CohBGA is designed as a generic framework, it is going to be an open-source bipartite graph analytics platform that allows researchers to evaluate the effectiveness of more cohesive-structure-based models and algorithms for bipartite graphs.

Jasmine: Exploring the Dependency-Aware Execution on Distributed Shared Memory

Xing Wei
Huiqi Hu
Xuan Zhou
Xuecheng Qi
Weining Qian
Jiang Wang
Aoying Zhou

A distributed shared memory abstraction can coordinate a cluster of machine nodes to empower performance-critical queries with scalable memory space and abundant parallelism. But to deploy a query under such an abstraction, the general execution model simply expresses operators as multiple subtasks and schedules them sequentially in parallel, while neglecting the vital dependencies between subtasks and data. In this paper, we conduct an in-depth study of the issues (i.e., low CPU utilization and poor data locality) caused by ignoring these dependencies, and then propose a dependency-aware query execution model called Jasmine, which can (i) help users explicitly declare the dependencies and (ii) take these declared dependencies into account during execution to address the issues. We invite our audience to use the rich graphical interfaces to interact with Jasmine and explore dependency-aware query execution on distributed shared memory.

AliMe MKG: A Multi-modal Knowledge Graph for Live-streaming E-commerce

Guohai Xu
Hehong Chen
Feng-Lin Li
Fu Sun
Yunzhou Shi
Zhixiong Zeng
Wei Zhou
Zhongzhou Zhao
Ji Zhang

Live streaming is becoming an increasingly popular sales channel in E-commerce. The core of live-streaming sales is to encourage customers to purchase in an online broadcasting room. To enable customers to better understand a product without leaving the room, we propose AliMe MKG, a multi-modal knowledge graph that aims at providing a cognitive profile for products, through which customers are able to seek information about and understand a product. Based on the MKG, we build an online live assistant that highlights product search, product exhibition and question answering, allowing customers to skim the item list, view item details, and ask item-related questions. Our system has been launched online in the Taobao app, and currently serves hundreds of thousands of customers per day.

A Chinese Knowledge Base Question Answering System

Xiaona Xue
Jinling Jiang
Wenjian Zhang
Yanxiang Huang
Xindong Wu

This paper presents HAO-Interaction, a question answering system that exploits knowledge based question answering (KBQA) technology to quickly obtain an answer path for the input question, and then a creative text generation mechanism to produce the final answer text. The system also displays the answer path on the user interface in order to facilitate user understanding. Unlike other KBQA systems, HAO-Interaction allows users to incorporate an organizational graph database while accessing all system functionalities. In addition, the answer generation solution implemented in the system does not require any training data. HAO-Interaction maintains low response latency while ensuring high user satisfaction. The effectiveness of HAO-Interaction has been verified by analyzing thousands of user reviews collected by the system.

Form 10-Q Itemization

Yanci Zhang
Tianming Du
Yujie Sun
Lawrence Donohue
Rui Dai

The quarterly financial statement, or Form 10-Q, is one of the most frequently required filings for US public companies to disclose financial and other important business information. Due to the massive volume of 10-Q filings and the enormous variations in the reporting format, it has been a long-standing challenge to retrieve item-specific information from 10-Q filings that lack machine-readable hierarchy. This paper presents a solution for itemizing 10-Q files by complementing a rule-based algorithm with a Convolutional Neural Network (CNN) image classifier. This solution demonstrates a pipeline that can be generalized to a rapid data retrieval solution among a large volume of textual data using only typographic items. The extracted textual data can be used as unlabeled content-specific data to train transformer models (e.g., BERT) or fit into various field-focus natural language processing (NLP) applications.

FaxPlainAC: A Fact-Checking Tool Based on EXPLAINable Models with HumAn Correction in the Loop

Zijian Zhang
Koustav Rudra
Avishek Anand

Fact-checking on the Web has become the main mechanism through which we assess the credibility of news or information. Existing fact-checkers verify the authenticity of the information (support or refute the claim) based on secondary sources of information. However, existing approaches do not consider the problem of model updates caused by training data that grows constantly with user feedback. It is therefore important to conduct user studies to correct models' inference biases and to improve the model in a life-long learning manner according to the user feedback. In this paper, we present FaxPlainAC, a tool that gathers user feedback on the output of explainable fact-checking models. FaxPlainAC outputs both the model decision, i.e., whether the input fact is true or not, and the supporting/refuting evidence considered by the model. Additionally, FaxPlainAC accepts user feedback on both the prediction and the explanation. Developed in Python, FaxPlainAC is designed as a modular and easily deployable tool. It can be integrated with other downstream tasks, allowing for the gathering of human fact-checking annotations and for life-long learning.

SaDes: An Interactive System for Sensitivity-aware Desensitization towards Tabular Data

Kechun Zhao
Hui Li
Zheng Gong
Jiangtao Cui

Before particular datasets are published, desensitization is required in order to protect private information while preserving usability as much as possible. Automatic identification and evaluation of sensitive attributes are prerequisites for targeted desensitization of datasets; sensitivity can in turn reflect the effect of desensitization. However, existing desensitization systems all rely on predefined desensitization models with respect to manually given sensitivity levels, which is subjective and cannot be applied end-to-end. Besides, there is no way for the user to tell whether the desensitization performed is sufficient or excessive. In this demonstration, we present an interactive system for sensitivity-aware desensitization towards tabular data (SaDes). It automatically evaluates the risks of re-identification for arbitrary columns according to a record-linkage attack, and performs desensitization accordingly. The risks of re-identification for the desensitized data can be immediately evaluated such that the user can iteratively execute desensitization in order to achieve a better balance between usability and privacy. To the best of our knowledge, SaDes is the first system that provides automatic sensitivity evaluation and interactive desensitization in a back-to-back manner.

AntOpt: A Multi-functional Large-scale Decision Optimization Platform

Jun Zhou
Yang Bao
Hua Wu
Zhigang Hua

The orderly operation and development of any system are inseparable from decision optimization. Many problems in everyday life can be framed and solved as optimization problems. In this digital age, the volume of information and data is growing and the demands on problem-solving efficiency are increasing. Though there are many solvers for specific optimization problems, in the face of large-scale scenarios, there is no single platform that concurrently addresses usability, solvers' uniformity, and computing efficiency. In this demo, we present AntOpt, a decision optimization platform that integrates large-scale distributed computing engines, optimization algorithm solvers and productized services.

SESSION: Tutorials

Online Advertising Incrementality Testing: Practical Lessons And Emerging Challenges

Joel Barajas
Narayan Bhamidipati
James G. Shanahan

Online advertising has historically been approached as an ad-to-user matching problem within sophisticated optimization algorithms. As the research and ad-tech industries have progressed, advertisers have increasingly emphasized the causal effect estimation of their ads (incrementality) using controlled experiments (A/B testing). With low lift effects and sparse conversion, the development of incrementality testing platforms at scale suggests tremendous engineering challenges in measurement precision. Similarly, the correct interpretation of results addressing a business goal requires significant data science and experimentation research expertise. We propose a practical tutorial in the incrementality testing landscape, including: The business need; Literature solutions and industry practices; Designs in the development of testing platforms; The testing cycle, case studies, and recommendations. We provide first-hand lessons based on the development of such a platform in a major combined DSP and ad network, and after running several tests for up to two months each over recent years.
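To show why low lift and sparse conversions make measurement precision hard, the sketch below computes the relative lift between the treatment and control groups of an A/B test, together with the standard error of the difference in conversion rates under the usual binomial approximation. The function name and the sample counts are hypothetical; this is a textbook calculation, not the estimator used by any particular testing platform.

```python
import math


def incrementality(treated_users, treated_conv, control_users, control_conv):
    """Return (relative lift, standard error of the conversion-rate difference)."""
    p_t = treated_conv / treated_users
    p_c = control_conv / control_users
    if p_c == 0:
        raise ValueError("control conversion rate is zero; relative lift undefined")
    lift = (p_t - p_c) / p_c
    # Binomial standard error of the difference between two independent rates.
    std_err = math.sqrt(
        p_t * (1 - p_t) / treated_users + p_c * (1 - p_c) / control_users
    )
    return lift, std_err


# Hypothetical experiment: 100,000 users per arm, roughly 1% conversion.
lift, std_err = incrementality(100_000, 1_100, 100_000, 1_000)
absolute_diff = 1_100 / 100_000 - 1_000 / 100_000
```

In this made-up example the relative lift is about 10%, yet the absolute difference in conversion rates is only around twice its standard error, which illustrates how large samples are needed before small effects can be separated from noise.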

Aggregation Techniques in Crowdsourcing: Multiple Choice Questions and Beyond

Djellel Difallah
Alessandro Checco

Crowdsourcing has been leveraged in various tasks and applications, primarily to gather information from human annotators in exchange for a monetary reward. The main challenge associated with crowdsourcing is the low quality of the results, which can stem from multiple causes, including bias, error, and adversarial behavior. Researchers and practitioners can apply quality control methods to prevent and detect low-quality responses. For example, worker selection methods utilize qualifications and attention check questions before assigning a task. Similarly, task routing identifies the workers who can provide a more accurate response to a given task type using recommender system techniques. In practice, posterior quality control methods are the most common approach to deal with noisy labels once they are obtained. Such methods require task repetition, i.e., assigning the task to multiple crowd-workers, followed by an aggregation mechanism (aka truth inference) to select the most likely answer or request an additional label. A large number of techniques have been proposed for crowdsourcing aggregation, covering several task types. This tutorial aims to present common and recent label aggregation techniques for multiple-choice questions, multi-class labels, ratings, pairwise comparison, and image/text annotation. We believe that the audience will benefit from the focus on this specific research area to learn about the best techniques to apply in their crowdsourcing projects.
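The simplest aggregation mechanism for repeated multiple-choice tasks is majority voting, sketched below as a baseline. The function name, the tuple layout, and the tie-breaking rule (pick the smallest answer) are assumptions made for illustration; the tutorial itself covers more sophisticated truth inference methods that this sketch does not attempt to reproduce.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Aggregate (task_id, worker_id, answer) triples into one answer per task."""
    answers_by_task = defaultdict(list)
    for task_id, _worker_id, answer in labels:
        answers_by_task[task_id].append(answer)
    aggregated = {}
    for task_id, answers in answers_by_task.items():
        counts = Counter(answers)
        top_count = max(counts.values())
        # Break ties deterministically by choosing the smallest answer.
        aggregated[task_id] = min(a for a, c in counts.items() if c == top_count)
    return aggregated


# Hypothetical worker responses.
labels = [
    ("t1", "w1", "A"),
    ("t1", "w2", "A"),
    ("t1", "w3", "B"),
    ("t2", "w1", "C"),
    ("t2", "w2", "B"),
]
aggregated = majority_vote(labels)
```

Majority voting treats every worker as equally reliable; a tie such as the one on task `t2` is exactly the situation in which requesting an additional label is useful.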

Large-Scale Information Extraction under Privacy-Aware Constraints

Rajeev Gupta
Ranganath Kondapally

In this digital age, people spend a significant portion of their lives online, and this has led to an explosion of personal data from users and their activities. Typically, this data is private and nobody else, except the user, is allowed to look at it. This poses interesting and complex challenges from a scalable information extraction point of view: extracting information under privacy-aware constraints, where there is little data to learn from but highly accurate models are needed to run on large amounts of data across different users. Anonymization of data is typically used to convert private data into publicly accessible data. But this may not always be feasible and may require complex differential privacy guarantees in order to be safe from any potential negative consequences. Other techniques involve building models on a small amount of seen (eyes-on) data and a large amount of unseen (eyes-off) data. In this tutorial, we use emails as representative private data to explain the concepts of scalable IE under privacy-aware constraints.

Fair Graph Mining

Jian Kang
Hanghang Tong

In today's increasingly connected world, graph mining plays a pivotal role in many real-world application domains, including social network analysis, recommendations, marketing and financial security. Tremendous efforts have been made to develop a wide range of computational models. However, recent studies have revealed that many widely-applied graph mining models could suffer from potential discrimination. Fairness on graph mining aims to develop strategies in order to mitigate bias introduced/amplified during the mining process. The unique challenges of enforcing fairness on graph mining include (1) theoretical challenge on non-IID nature of graph data, which may invalidate the basic assumption behind many existing studies in fair machine learning, and (2) algorithmic challenge on the dilemma of balancing model accuracy and fairness. This tutorial aims to (1) present a comprehensive review of state-of-the-art techniques in fairness on graph mining and (2) identify the open challenges and future trends. In particular, we start with reviewing the background, problem definitions, unique challenges and related problems; then we will focus on an in-depth overview of (1) recent techniques in enforcing group fairness, individual fairness and other fairness notions in the context of graph mining, and (2) future directions in studying algorithmic fairness on graphs. We believe this tutorial could be attractive to researchers and practitioners in areas including data mining, artificial intelligence, social science and beneficial to a plethora of real-world application domains.

AutoML: From Methodology to Application

Yaliang Li
Zhen Wang
Yuexiang Xie
Bolin Ding
Kai Zeng
Ce Zhang

Machine Learning methods have been adopted for a wide range of real-world applications, ranging from social networks, online image/video-sharing platforms, and e-commerce to education, healthcare, etc. However, in practice, a large amount of effort is required to tune several components of machine learning methods, including data representation, hyperparameters, and model architecture, in order to achieve good performance. To alleviate the required tuning effort, Automated Machine Learning (AutoML), which can automate the process of applying machine learning methods, has recently been studied in both academia and industry. In this tutorial, we will introduce the main research topics of AutoML, including Hyperparameter Optimization, Neural Architecture Search, and Meta-Learning. Two emerging topics of AutoML, Automatic Feature Generation and Machine Learning Guided Database, will also be discussed since they are important components for real-world applications. For each topic, we will motivate it with application examples from industry, illustrate the state-of-the-art methodologies, and discuss some future research directions based on our experience from industry and the trends in academia.

CIKM 2021 Tutorial on Fairness of Machine Learning in Recommender Systems

Yunqi Li
Yingqiang Ge
Yongfeng Zhang

Recently, there has been growing attention on fairness considerations in machine learning. As one of the most pervasive applications of machine learning, recommender systems are having an increasingly critical impact on people and society, since a growing number of users rely on them for information seeking and decision making. Therefore, it is crucial to address the potential unfairness problems in recommendation, which may hurt users' or providers' satisfaction in recommender systems as well as the interests of the platforms. The tutorial focuses on the foundations and algorithms for fairness in recommendation. It also presents a brief introduction to fairness in basic machine learning tasks such as classification and ranking. The tutorial will introduce the taxonomies of current fairness definitions and evaluation metrics for fairness concerns. We will introduce previous works on fairness in recommendation and also put forward future fairness research directions. The tutorial aims at introducing and communicating fairness in recommendation methods to the community, as well as gathering researchers and practitioners interested in this research direction for discussion, exchange of ideas, and promotion of research.

IR From Bag-of-words to BERT and Beyond through Practical Experiments

Craig Macdonald
Nicola Tonellotto
Sean MacAvaney

The task of adhoc search is undergoing a renaissance, sparked by advances in natural language processing. In particular, pre-trained contextualized language models (such as BERT and T5) have consistently been shown to be a highly effective foundation upon which to build ranking models. These models are equipped with a far deeper understanding of language than bag-of-words (BoW) models. Applying these techniques to new tasks can be tricky, however, as they require knowledge of deep learning frameworks, and significant scripting and data munging. In this full-day tutorial, we build up from foundational retrieval principles to the latest neural ranking techniques. We first provide foundational background on classical bag-of-words methods. We then show how feature-based Learning to Rank methods can be used to re-rank these results. Finally, we cover contemporary approaches, such as BERT, doc2query, and dense retrieval. Throughout the process, we demonstrate how these can be easily applied in experiments on new search tasks, using the declarative style of conducting experiments exemplified by the PyTerrier and OpenNIR search toolkits.

This tutorial is interactive in nature. Each session mixes explanatory presentation with hands-on activities using prepared Jupyter notebooks running on the Google Colab platform. These activities give participants experience applying the techniques covered in the tutorial to the TREC COVID benchmark test collection. The tutorial is broken into four sessions. In the first session, we cover foundational retrieval concepts, including inverted indexing, retrieval, and scoring. We also demonstrate how evaluation can be conducted in a declarative fashion within PyTerrier, encapsulating ideas such as significance testing and multiple testing correction, as promoted as IR best practices. In the second session, we build upon the core retrieval concepts to demonstrate how to re-write queries (e.g., using RM3) and re-rank documents (e.g., using learning-to-rank). In the third session, we introduce contextualized language models, such as BERT, and show how they can be utilized for document re-ranking (e.g., using Vanilla/monoBERT and EPIC). Finally, in session four, we move beyond re-ranking and cover approaches that modify documents (e.g., DeepCT) as well as efforts to replace the traditional inverted index with an embedding-based index (e.g., ANCE, ColBERT, and ColBERT-PRF). By the end of the tutorial, participants will have experience conducting IR experiments from classical bag-of-words models to contemporary BERT models and beyond.
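
The foundational concepts named for the first session (inverted indexing, retrieval, and scoring) can be illustrated with a minimal sketch in plain Python. This is not the PyTerrier or OpenNIR API and is not taken from the tutorial notebooks: the function names `build_index` and `bm25_search`, the whitespace tokenizer, the toy documents, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions made for illustration, and the scoring uses one common variant of the BM25 formula.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len


def bm25_search(query, index, doc_len, k1=1.2, b=0.75):
    """Score documents with a common BM25 variant and return them best first."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # query term does not occur in the collection
        df = len(postings)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalisation: longer documents need more matches.
            norm = tf + k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical three-document collection, for illustration only.
docs = {
    "d1": "covid vaccine trial results",
    "d2": "vaccine vaccine safety",
    "d3": "search engines rank documents",
}
index, doc_len = build_index(docs)
ranking = bm25_search("vaccine safety", index, doc_len)
print(ranking)
```

Only documents that share at least one term with the query receive a score, which is the property that lets an inverted index avoid scanning the whole collection.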

Fake News, Disinformation, Propaganda, and Media Bias

Preslav Nakov
Giovanni Da San Martino

The rise of the Internet and social media not only changed how we consume information, but also democratized the process of content creation and dissemination, thus making it easily available to anybody. Despite the hugely positive impact, this situation has the downside that the public was left unprotected against biased, deceptive, and disinformative content, which could now travel online at breaking-news speed and allegedly influence major events such as political elections, or disturb the efforts of governments and health officials to fight the ongoing COVID-19 pandemic. The research community responded to the issue, proposing a number of inter-connected research directions such as fact-checking, disinformation, misinformation, fake news, propaganda, and media bias detection. Below, we cover the mainstream research, and we also pay attention to less popular, but emerging research directions, such as propaganda detection, check-worthiness estimation, detecting previously fact-checked claims, and multimodality, which are of interest to human fact-checkers and journalists. We further cover relevant topics such as stance detection, source reliability estimation, detection of persuasion techniques in text and memes, and detecting malicious users in social media. Moreover, we discuss large-scale pre-trained language models, and the challenges and opportunities they offer for generating and for defending against neural fake news. Finally, we explore some recent efforts aiming at flattening the curve of the COVID-19 infodemic.

Adversarial Robustness of Deep Learning: Theory, Algorithms, and Applications

Wenjie Ruan
Xinping Yi
Xiaowei Huang

This tutorial aims to introduce the fundamentals of adversarial robustness of deep learning, presenting a well-structured review of up-to-date techniques to assess the vulnerability of various types of deep learning models to adversarial examples. This tutorial will particularly highlight state-of-the-art techniques in adversarial attacks and robustness verification of deep neural networks (DNNs). We will also introduce some effective countermeasures to improve the robustness of deep learning models, with a particular focus on adversarial training. We aim to provide a comprehensive overall picture of this emerging direction and make the community aware of the urgency and importance of designing robust deep learning models in safety-critical data analytical applications, ultimately enabling end-users to trust deep learning classifiers. We will also summarize potential research directions concerning the adversarial robustness of deep learning, and its potential benefits in enabling accountable and trustworthy deep learning-based data analytical systems and applications.

SESSION: Workshops

MODIMO: Workshop on Multi-Omics Data Integration for Modelling Biological Systems

Marco Beccuti
Vincenzo Bonnici
Rosalba Giugno

Multi-omics analysis aims at extracting previously undiscovered biological knowledge by integrating information across multiple single-omic sources. Past approaches have focused on the simultaneous analysis of a small number of omic data sets. Current challenges include integrating multiple omic sources into a unified complex model, or combining already available tools for two-by-two omics analyses and merging their outcomes. By doing so and leveraging integrated system-level knowledge, multi-omic approaches ought to enable the development of better qualitative and quantitative models for descriptive and predictive analyses. To move this area forward, new statistical and algorithmic frameworks are needed, for example for generalizing classical graph theory results to heterogeneous networks and applying them to diverse problems such as drug repurposing or understanding the immune response to infections. In short, this workshop aims at investigating novel methodologies for providing crucial insights into multi-omics data management, integration, and analysis in order to enable biological discoveries.

MUFin'21: First International Workshop on Modelling Uncertainty in the Financial World

Srikanta Bedathur
Tanmoy Bhowmik
Nitendra Rajput
Karamjit Singh
Maneet Singh

Among many things, Covid-19 has provided stark proof that uncertainty is real, and it is here to stay. Perhaps nothing is more sensitive to uncertainty than the financial world. Moreover, while Artificial Intelligence techniques are used to predict the future state of events, their performance is significantly impacted by disruptions not captured in the past. Unforeseen scenarios such as economic changes, variations in customer behaviour, pandemics, recessions, and fraudulent transactions often result in unexpected behaviour of financial models, thus associating a level of uncertainty with them. It is thus imperative for the research community to explore, identify, analyze, and address such uncertainties in order to develop robust models applicable in real-world scenarios. To this effect, the International Workshop on Modelling Uncertainty in the Financial World 2021 (MUFin21) aims to bring academics and industry experts together to discuss this important, timely, and as yet unsolved area of modelling uncertainties in the financial world.

Learning to Quantify: Methods and Applications (LQ 2021)

Juan José del Coz
Pablo González
Alejandro Moreo
Fabrizio Sebastiani

Learning to Quantify (LQ) is the task of training class prevalence estimators via supervised learning. The task of these estimators is to estimate, given an unlabelled set of data items D and a set of classes C = {c_1, ..., c_|C|}, the prevalence (i.e., relative frequency) of each class c_i in D. LQ is of interest in all applications of classification in which the final goal is not determining which class (or classes) individual unlabelled data items belong to, but estimating the distribution of the unlabelled data items across the classes of interest. Example disciplines whose interest in labelling data items is at the aggregate level (rather than at the individual level) are the social sciences, political science, market research, ecological modelling, and epidemiology. While LQ may in principle be solved by classifying each data item in D and counting how many such items have been labelled with c_i, it has been shown that this "classify and count" (CC) method yields suboptimal quantification accuracy. As a result, quantification is no longer considered a mere byproduct of classification and has evolved into a task of its own. The goal of this workshop is to bring together all researchers interested in methods, algorithms, and evaluation measures and methodologies for LQ, as well as practitioners interested in their practical application to managing large quantities of data.
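
The "classify and count" baseline described above can be sketched in a few lines of Python. To show why CC can be biased, the sketch also includes the well-known binary "adjusted classify and count" correction, which the abstract itself does not describe. It is included here only as an illustration and assumes that the classifier's true positive rate and false positive rate have been estimated on held-out labelled data. The function names and the toy numbers are hypothetical.

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """CC: estimate each class prevalence as the fraction of predicted labels."""
    counts = Counter(predicted_labels)
    n_items = len(predicted_labels)
    return {c: counts[c] / n_items for c in classes}


def adjusted_classify_and_count(cc_positive, tpr, fpr):
    """Binary adjusted count: invert cc = p * tpr + (1 - p) * fpr for p."""
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to exist")
    estimate = (cc_positive - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid prevalence


# Toy scenario (assumed numbers): 100 items, 20 truly positive, and a
# classifier with tpr = 0.8 and fpr = 0.1, so it predicts 16 + 8 = 24 positives.
predictions = ["pos"] * 24 + ["neg"] * 76
cc = classify_and_count(predictions, ["pos", "neg"])
acc = adjusted_classify_and_count(cc["pos"], tpr=0.8, fpr=0.1)
print(cc["pos"], acc)
```

In this toy scenario CC overestimates the positive prevalence because false positives outnumber false negatives, while the correction recovers the prevalence used to construct the example, up to floating-point error.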

THECOG - Transforms In Behavioral And Affective Computing

Georgios Drakopoulos
Eleanna Kafeza

Human decision making is central to many functions across a broad spectrum of fields such as marketing, investment, smart contract formulation, political campaigns, and organizational strategic management. Behavioral economics seeks to study the psychological, cultural, and social factors contributing to decision making alongside reasoning. It should be highlighted here that behavioral economics does not negate classical economic theory but rather extends it in two distinct directions. First, a finer granularity can be obtained by studying the decision making process not of massive populations but instead of individuals and groups, with signal estimation or deep learning techniques based on a wide array of attributes ranging from social media posts to physiological signs. Second, time becomes a critical parameter, and changes in the disposition towards alternative decisions can be tracked with input-output or state space models. The primary findings so far are concepts like bounded rationality and perceived risk, while results include optimal strategies for various levels of information awareness and action strategies based on perceived loss aversion principles. From the above it follows that behavioral economics relies on deep learning, signal processing, control theory, social media analysis, affective computing, natural language processing, and gamification, to name only a few fields. Therefore, it is directly tied to computer science in many ways. THECOG will be a central meeting point for researchers of various backgrounds in order to generate new interdisciplinary and groundbreaking results.

CDCEO'21 - First Workshop on Complex Data Challenges in Earth Observation

Aleksandra Gruca
Pedro Herruzo
Pilar Rípodas
Andrzej Kucik
Christian Briese
Michael K. Kopp
Sepp Hochreiter
Pedram Ghamisi
David P. Kreil

High-resolution remote sensing technology for Earth Observation (EO) has radically changed how we monitor the state of our planet around the clock. An effective interpretation of the resulting complex large-scale time series adopts the best machine learning techniques from signal processing, computer vision, pattern recognition, and artificial intelligence. The First Workshop on Complex Data Challenges in Earth Observation was open to both method development and advanced applications in a wide range of related topics, including image and signal processing, gap-filling, data fusion, feature extraction, prediction of spatio-temporal features, and the detection of rules underlying the observed state transitions and causal relationships. The full agenda, featuring keynotes and a selection of high-quality contributed talks, is available online at www.iarai.ac.at/cdceo21.

IWILDS'21: Second International Workshop on Learning During Web Search

Anett Hoppe
Ran Yu
Irina Brich
Jiqun Liu

Web search is one of the most ubiquitous online activities and is often used as a starting point to learn, i.e., to acquire or extend one's knowledge about certain topics or procedures. When learning by searching the Web, individuals are confronted with an unprecedented amount of information in various forms and of varying quality. Thus, successful learning on the Web requires high degrees of self-regulation and should be supported by the adequate design of search, recommendation, and training tools. This creates a highly interdisciplinary research area at the intersection of information retrieval, human-computer interaction, psychology, and educational sciences. Search as Learning (SAL) research examines the relationships between querying, navigation, media consumption behavior, and the learning outcomes during Web search, and how they can be measured, predicted, and supported.

Building on the growing SAL research community, IWILDS provides an interdisciplinary forum in a workshop that includes keynotes, paper presentations, and discussion. This year, IWILDS'21 specifically invites submissions focussing on challenges caused by the Covid-19 pandemic, investigating, for instance, health-related information acquisition on the Web and the pandemic-induced digitalization of formal learning.

First Workshop on Knowledge Injection in Neural Networks (KINN)

Vasudev Lal
Somak Aditya
Yezhou Yang
Pasquale Minervini
Sandya Mannarswamy

Deep learning (DL) has made rapid progress in the last decade, with neural network-based language and vision models achieving state-of-the-art performance in various tasks. Yet purely data-driven neural network models exhibit several issues that adversely affect their real-world deployment. These include reliance on large quantities of training data, poor robustness, lack of generalization, poor explainability, and glaring gaps in implicit and commonsense knowledge. The availability of rich structured (or semi-structured) knowledge sources has spurred the research community into exploring Knowledge Injection in Neural Networks (KINN) as a means of mitigating the above-mentioned challenges. This has led to the development of hybrid AI systems that combine the purely data-driven learning of neural network models with an infusion of knowledge from external sources. Such KINN systems include retrieval-augmented neural models, neuro-symbolic systems, and a plethora of combinations of NNs with knowledge graphs and structured knowledge bases.

Given the considerable promise of knowledge injection in overcoming the current challenges associated with purely data-driven NN models, we propose a workshop on Knowledge Injection in Neural Networks (KINN) at CIKM 2021. The goal of this workshop is to focus the attention of the CIKM research community on addressing the open challenges in this emerging research area. Given that the CIKM research community has a rich mix of experts in structured knowledge representation, IR, and DL, this workshop is intended to facilitate cross-disciplinary collaboration across the various CIKM research communities toward building efficient and scalable KINN systems.

Human-Centric Analytics and Systems Impacting Quality of Life: ECG Analytics and Beyond

Arijit Ukil
Leandro Marin
Antonio J. Jara
John Farserotu

With the proliferation of the Internet of Things (IoT) and sensor technologies, and with advances in analytics systems, mainly in the form of deep learning algorithms, the long quest for developing human-centric applications like automated disease diagnosis and privacy-enabled analytics is becoming a reality. Knowledge-Driven Analytics and Systems Impacting Human Quality of Life (KDAH) is indeed an attractive proposition. For example, phonocardiogram (heart sound) based detection of heart abnormality and Myocardial Infarction (MI) prediction using single-lead Electrocardiogram (ECG) signals demonstrate the capability of analytics algorithms to directly solve human-centric problems. We have proposed privacy-preserved analytics with phonocardiogram-based heart condition detection, established benchmark performance of MI detection using single-lead ECG signals, and demonstrated a deep residual learning-based algorithm to help carbon footprint management.

International Workshop on Privacy, Security and Trust in Computational Intelligence (PSTCI2021)

Xuyun Zhang
Deepak Puthal
Chi Yang
Guanfeng Liu
Kim-Kwang Raymond Choo
Hongzhi Yin

While being a lasting theme, privacy, security, and trust (PST) have become increasingly important recently due to the pervasive (but more vulnerable) computation infrastructure and deep (but more intrusive) data analytics, and are in great demand from governments, companies, and individuals. This workshop aims at providing a forum for researchers, practitioners, and developers from different background areas, such as computational intelligence, data privacy and cyber security, trust management, cloud computing, edge computing, Internet of Things, big data analytics, machine learning and data mining, and knowledge discovery, to exchange the latest experience, research ideas, and synergic research and development on fundamental issues and applications of privacy, security, and trust in computational intelligence.