CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

SESSION: Keynote Talks

Implicit User-Generated Content in the Service of Public Health

Every day millions of people use online products and services to satisfy their information needs. While doing so, they produce large volumes of user-generated content (UGC). In this talk, we will distinguish between "explicit" UGC, which is intended to be made public (such as product ratings or reviews), and "implicit" UGC, which can be responsibly anonymized and aggregated in a privacy-preserving way to improve public health. We will analyze implicit UGC as a positive consumption externality, and will discuss its beneficial uses across a range of public health applications.

The bulk of this talk will focus on methods for aggregating and analyzing the data to provide timely signals that help guide public health interventions and assess their efficacy. We will discuss applications such as estimating disease incidence, outbreak prediction, mitigating pandemic spread, and improving public health messaging.

Ensemble Learning Methods for Dirty Data

A neural network ensemble is a collaborative learning paradigm that utilizes multiple neural networks to solve a complex learning problem. Constructing predictive models with high generalization performance is an important yet highly challenging goal for robust intelligence systems in the presence of dirty data. Given a target learning task, popular approaches have been dedicated to finding the top-performing model. However, it is difficult in general to estimate the best model when available data is finite, possibly dirty, and insufficient for the problem. In this keynote, I will give an overview of a diversity-centric ensemble learning framework developed at Georgia Tech, including methodologies and algorithms for measuring, enforcing, and combining multiple neural networks by improving generalization performance of the overall system and maximizing ensemble utility and resilience to dirty data.

Exploring and Analyzing Change: The Janus Project

Data change, all the time. The Janus project seeks to address the Variability dimension of Big Data by modeling, exploring, and analyzing such change, providing valuable insights into the evolving real world and ways in which data about it are collected and used.

We start by identifying technical challenges that need to be addressed to realize the Janus vision. Towards this end, we have extracted and worked with the histories of various structured datasets, including DBLP, IMDB, open government data, and Wikipedia, for which a detailed history of every edit is available. Our DBChEx (Database Change Explorer) prototype enables interactive exploration of data and schema changes, and we show how DBChEx can help users gain valuable insights by exploring two real-world datasets, IMDB and Wikipedia infoboxes.

Based on an analysis of the history of 3.5M tables on the English Wikipedia for a total of 53.8M table versions, we then illustrate the rich history of structured Wikipedia data: we show that tables are created in certain locations, they change their shape, they move, they grow, they shrink, their data change, they vanish, and they re-appear; indeed, each table has a life of its own. Finally, to help automatically interpret the useful knowledge harbored in the history of Wikipedia tables, we present recent results on two technical problems: (i) identifying Natural Keys, a particularly important piece of metadata, which serves as a primary key in tables over time and consists of attributes inherent to an entity, and (ii) matching tables, infoboxes and lists within a Wikipedia page across page revisions. We solve these problems at scale and make the resulting curated datasets available to the community to facilitate future research.

How Hybrid Work Will Make Work More Intelligent

We are in the middle of the most significant change to work practices in generations. For hundreds of years, physical space was the most important technology people used to get things done. The coming Hybrid Work Era, however, will be shaped by digital technology. The recent rapid shift to remote work accelerated the digital transformation already underway at many organizations, and new types of work-related data are now being generated at an unprecedented rate. For example, the average Microsoft Teams user spends 252% more time in the application now than they did in February 2020.

During the early stages of the pandemic, we saw the direct impact of digital technology on work in its ability to help people sustain collaboration across time and space. But looking forward, the new digital knowledge captured in the Hybrid Work Era will allow us to reimagine work at an even more fundamental level. AI systems, for example, can now learn from the conversations people have to support knowledge re-use, and even learn how successful conversations happen to help drive more productive meetings.

Historically, AI systems have been hindered in a work context by a lack of data; the development of foundation models is changing that, creating an opportunity to combine general world knowledge with the knowledge and behaviors currently locked up and siloed as we work. The CIKM community can shape the new future of work, but first must address the challenges surrounding workplace knowledge management that arise as we have more data, more sophisticated AI, and more human engagement. In this talk I will give an overview of what research tells us about emerging work practices, and explore how the CIKM community can build on these findings to help create a new – and better – future of work.

SESSION: CIKM'22 Full Papers

AutoForecast: Automatic Time-Series Forecasting Model Selection

In this work, we develop techniques for fast automatic selection of the best forecasting model for a new unseen time-series dataset, without having to first train (or evaluate) all the models on the new time-series data to select the best one. In particular, we develop a forecasting meta-learning approach called AutoForecast that allows for the quick inference of the best time-series forecasting model for an unseen dataset. Our approach learns both the performance of forecasting models over the time horizon of the same dataset and the task similarity across different datasets. The experiments demonstrate the effectiveness of the approach over state-of-the-art (SOTA) single and ensemble methods and several SOTA meta-learners (adapted to our problem) in terms of selecting better forecasting models (i.e., 2X gain) for unseen tasks for univariate and multivariate testbeds.

On Smoothed Explanations: Quality and Robustness

Explanation methods highlight the importance of the input features in making a predictive decision, and offer a way to increase transparency and trustworthiness in machine learning and deep neural networks (DNNs). However, explanation methods can be easily manipulated into generating misleading explanations, particularly under visually imperceptible adversarial perturbations. Recent work has identified the decision surface geometry of DNNs as the main cause of this phenomenon. To make explanation methods more robust against adversarially crafted perturbations, recent research has promoted several smoothing approaches. These approaches smooth either the explanation map or the decision surface.

In this work, we initiate a thorough evaluation of the quality and robustness of the explanations offered by smoothing approaches, covering several properties. We present settings in which the smoothed explanations are better, and settings in which they are worse, than the explanations derived by the commonly used (non-smoothed) Gradient explanation method. By making the connection with the literature on adversarial attacks, we demonstrate that such smoothed explanations are robust primarily against additive attacks. However, a combination of additive and non-additive attacks can still manipulate these explanations, revealing important shortcomings in their robustness properties.

Generative Adversarial Zero-Shot Learning for Cold-Start News Recommendation

News recommendation models rely heavily on the interaction information between users and news articles to personalize recommendations. Therefore, one of their most serious challenges is the cold-start problem (CSP): their performance drops sharply for new users or new news articles. Zero-shot learning helps in synthesizing a virtual representation of the missing data in a variety of application tasks. Therefore, it can be a promising solution for CSP, generating virtual interaction behaviors for new users or new news articles. In this paper, we use generative adversarial zero-shot learning to build a framework, namely GAZRec, which is able to address the CSP caused by purely new users or new news articles. GAZRec can be flexibly applied to any neural news recommendation model. According to the experimental evaluations, applying the proposed framework to various news recommendation baselines attains a significant AUC improvement of 1% - 21% in different cold-start scenarios and 1.2% - 6.6% in the regular situation when both users and news have a few interactions.

UnCommonSense: Informative Negative Knowledge about Everyday Concepts

Commonsense knowledge about everyday concepts is an important asset for AI applications, such as question answering and chatbots. Recently, we have seen an increasing interest in the construction of structured commonsense knowledge bases (CSKBs). An important part of human commonsense is about properties that do not apply to concepts, yet existing CSKBs only store positive statements. Moreover, since CSKBs operate under the open-world assumption, absent statements are considered to have unknown truth rather than being invalid. This paper presents the UNCOMMONSENSE framework for materializing informative negative commonsense statements. Given a target concept, comparable concepts are identified in the CSKB, for which a local closed-world assumption is postulated. This way, positive statements about comparable concepts that are absent for the target concept become seeds for negative statement candidates. The large set of candidates is then scrutinized, pruned and ranked by informativeness. Intrinsic and extrinsic evaluations show that our method significantly outperforms the state-of-the-art. A large dataset of informative negations is released as a resource for future research.
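
The candidate-generation step described above can be illustrated with a minimal Python sketch. It is not the UNCOMMONSENSE implementation: the toy knowledge base, the function name `negative_candidates`, and the frequency-based score are all assumptions made for illustration, and the pruning and informativeness ranking from the paper are not modeled. The sketch only shows how a local closed-world assumption over comparable concepts turns their positive statements into negative candidates for a target concept.

```python
from collections import Counter

# Toy CSKB (hypothetical data): concept -> set of positive statements.
CSKB = {
    "gorilla": {"has fur", "eats plants", "lives in forest"},
    "lion": {"has fur", "has tail", "eats meat", "lives in savanna"},
    "zebra": {"has fur", "has tail", "eats plants", "lives in savanna"},
    "tiger": {"has fur", "has tail", "eats meat", "lives in forest"},
}


def negative_candidates(target, comparable, kb):
    """Return statements held by comparable concepts but absent for the target.

    Under a local closed-world assumption, each such statement becomes a
    candidate negation for the target. The score used here (fraction of
    comparable concepts asserting the statement) is a simple stand-in,
    not the informativeness measure of the paper.
    """
    known = kb[target]
    counts = Counter(
        stmt
        for concept in comparable
        for stmt in kb[concept]
        if stmt not in known
    )
    scored = [(stmt, n / len(comparable)) for stmt, n in counts.items()]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranked = negative_candidates("gorilla", ["lion", "zebra", "tiger"], CSKB)
for statement, score in ranked:
    print(f"gorilla: NOT '{statement}' (score {score:.2f})")
```

In this toy example, "has tail" ranks first because every comparable concept asserts it while the target does not.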

KRAF: A Flexible Advertising Framework using Knowledge Graph-Enriched Multi-Agent Reinforcement Learning

Bidding optimization is one of the most important problems in online advertising. Auto-bidding tools are designed to address this problem and are offered by most advertising platforms for advertisers to allocate their budgets. In this work, we present a Knowledge Graph-enriched Multi-Agent Reinforcement Learning Advertising Framework (KRAF). It combines Knowledge Graph (KG) techniques with a Multi-Agent Reinforcement Learning (MARL) algorithm for bidding optimization with the goal of maximizing advertisers' return on ad spend (ROAS) and user-ad interactions, which correlate with the ad platform revenue. In addition, this proposal is flexible enough to support different levels of user privacy and the advent of new advertising markets with more heterogeneous data. In contrast to most of the current advertising platforms, which are based on click-through rate models using a fixed input format and rely on user tracking, KRAF integrates the heterogeneous available data (e.g., contextual features, interest-based attributes, information about ads) as graph nodes to generate their dense representations (embeddings). Then, our MARL algorithm leverages the embeddings of the entities to learn efficient budget allocation strategies. To that end, we propose a novel mean-field-style coordination strategy to coordinate the learning agents and avoid the curse of dimensionality when the number of agents grows. Our proposal is evaluated on three real-world datasets to assess its performance and the contribution of each of its components, outperforming several baseline methods in terms of ROAS and number of ad clicks.

An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification

Sparsity regularized loss minimization problems play an important role in various fields including machine learning, data mining, and modern statistics. The proximal gradient descent method and the coordinate descent method are the most popular approaches to solving the minimization problem. Although existing methods can achieve implicit model identification, aka support set identification, in a finite number of iterations, these methods still suffer from huge computational costs and memory burdens in high-dimensional scenarios. The reason is that the support set identification in these methods is implicit and thus cannot explicitly identify the low-complexity structure in practice; namely, they cannot discard useless coefficients of the associated features to achieve algorithmic acceleration via dimension reduction. To address this challenge, we propose a novel accelerated doubly stochastic gradient descent (ADSGD) method for sparsity regularized loss minimization problems, which can reduce the number of block iterations by eliminating inactive coefficients during the optimization process, eventually achieving faster explicit model identification and improved algorithm efficiency. Theoretically, we first prove that ADSGD can achieve a linear convergence rate and lower overall computational complexity. More importantly, we prove that ADSGD can achieve a linear rate of explicit model identification. Numerically, experimental results on benchmark datasets confirm the efficiency of our proposed method.
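
To make "implicit" support set identification concrete, the following Python sketch runs plain proximal gradient descent on a tiny lasso problem. This is the baseline family of methods the abstract refers to, not ADSGD; the data, the function names, and the choice of the lasso objective (1/2n)·||Xw − y||² + λ·||w||₁ are assumptions made for illustration. The zero coefficients emerge from the soft-thresholding step, yet every coordinate is still updated in every iteration, which is the cost that explicit identification aims to remove.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient_lasso(X, y, lam, step, iters):
    """Minimize (1/2n) * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        residual = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) / n for j in range(d)]
        # All d coordinates are updated, including those that stay at zero.
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w


# Hypothetical toy data: orthogonal columns, so the minimizer is known in
# closed form as soft_threshold(X^T y / n, lam) per coordinate.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [2.05, 2.05, 1.95, 1.95]

w = proximal_gradient_lasso(X, y, lam=0.1, step=0.5, iters=100)
support = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", w)
print("support set:", support)
```

Here only the first feature ends up in the support set; the other two coefficients are exactly zero but were never removed from the computation.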

CASA-Net: A Context-Aware Correlation Convolutional Network for Scale-Adaptive Crack Detection

Surface cracks in infrastructure are a key indicator of structural safety and degradation. Visual-based crack detection is a critical task for the enormous application demands of infrastructure industries. Convolution operations have been widely deployed due to their strong feature learning abilities. However, global feature dependencies of multi-scale cracks are ignored due to the limited receptive field. In addition, the detection of cracks with low contrast suffers a serious performance loss. Therefore, to address the scale-adaptive crack detection problem, we propose a context-aware correlation convolutional network for scale-adaptive crack detection named CASA-Net. CASA-Net is capable of extracting multi-scale crack features for distinguishing between cracks and surface backgrounds, and evaluating feature correlations to capture global contexts. CASA-Net is composed of the multi-scale distinguishing feature extraction (MDFE) module and the context-aware feature correlation (CAFC) module. Specifically, the MDFE module consists of multiple cascaded convolutional layers and distinguishing feature extraction layers (DFLayers). The CAFC module consists of a mapping block and cascaded correlators to capture the context-aware features for long-range interactions. The performance of CASA-Net is evaluated on a benchmark crack dataset. The experimental results indicate that CASA-Net outperforms rival methods by achieving an F1-Score of 0.65 and an AP50 of 63.9%.

Collaborative Image Understanding

Automatically understanding the contents of an image is a highly relevant problem in practice. In e-commerce and social media settings, for example, a common problem is to automatically categorize user-provided pictures. Nowadays, a standard approach is to fine-tune pre-trained image models with application-specific data. Besides images, however, organizations often also collect collaborative signals in the context of their application, in particular how users interacted with the provided online content, e.g., in the form of viewing, rating, or tagging. Such signals are commonly used for item recommendation, typically by deriving latent user and item representations from the data. In this work, we show that such collaborative information can be leveraged to improve the classification process of new images. Specifically, we propose a multitask learning framework, where the auxiliary task is to reconstruct collaborative latent item representations. A series of experiments on datasets from e-commerce and social media demonstrates that considering collaborative signals helps to significantly improve the performance of the main task of image classification by up to 9.1%.

Samba: Identifying Inappropriate Videos for Young Children on YouTube

YouTube videos are one of the most effective means of disseminating creative material and ideas, and they appeal to a diverse audience. Along with adults and older children, young children are avid consumers of YouTube materials. Children often lack the means to evaluate whether given content is appropriate for their age, and parents have very limited options to enforce content restrictions on YouTube. Young children can thus become exposed to inappropriate content, such as violent, scary or disturbing videos on YouTube. Previous studies demonstrated that YouTube videos can be classified as appropriate or inappropriate for young viewers using video metadata, such as video thumbnails, title, comments, etc. Metadata-based approaches achieve high accuracy, but still produce significant misclassifications due to the limited reliability of the input features. In this paper, we propose a fusion model, called Samba, which uses both metadata and video subtitles for content classification. Using subtitles in the model helps better infer the true nature of a video, improving classification accuracy. On a large-scale, comprehensive dataset of 70K videos, we show that Samba achieves 95% accuracy, outperforming other state-of-the-art classifiers by at least 7%. We also publicly release our dataset.

DocSemMap 2.0: Semantic Labeling based on Textual Data Documentations Using Seq2Seq Context Learner

Methods for automated semantic labeling of data are an indispensable basis for increasing the usability of data. On the one hand, they contribute to the homogenization of the annotations and thus to the increase in quality; on the other hand, they reduce the modeling effort, provided that the quality of the methodology used is sufficient. In the past, research has focused primarily on data- and label-based methods. Another approach that has received recent attention is the incorporation of textual data documentations to support the automatic mapping of datasets to a knowledge graph. However, upon deeper analysis, our recent approach called DocSemMap leaves potential untapped in a number of places. In this paper, we extend the current state-of-the-art approach by uncovering existing shortcomings and presenting our own improvements. Using a sequence-to-sequence model (Seq2Seq), we exploit the context of datasets. An additionally introduced classifier provides the linkage of documentation and labels for prediction. Our extended approach achieves a sustainable improvement in comparison to the reference approach.

Memory Graph with Message Rehearsal for Multi-Turn Dialogue Generation

Multi-turn dialogue systems have attracted increasing attention in both the academic and industrial communities. Multi-turn dialogue generation is a challenging task, as the relations among words, utterances and external knowledge are extremely complex. However, the existing methods only focus on constructing the relations between the current utterance and historical utterances, and they even oversimplify the relation mining process. Moreover, as dialogue information accumulates, the deep semantic information becomes difficult to understand, so a mechanism is needed that can reason over and digest information repeatedly, which is ignored by previous methods. To solve the above problems, we propose a Memory Graph with Message Rehearsal (MGMR) for dialogue generation based on the cognitive process of human memory. MGMR contains three main modules: sensory memory, short-term memory and long-term memory. Sensory memory converts the current utterance into embeddings at both the word level and the sentence level. We design a message rehearsal module in short-term memory to extract valuable information from the current utterance deeply and repeatedly, combined with the relevant historical dialogue information and external knowledge stored in long-term memory. Furthermore, we design a memory graph in long-term memory to construct the relations among words, utterances and knowledge. The memory graph achieves three goals: extracting accurate relations between the current utterance and historical utterances, updating the historical dialogue information, and achieving knowledge precipitation by expanding the memory graph with the key words and relevant external knowledge of the current utterance. We evaluate our model on real-world datasets and achieve better performance compared with the existing state-of-the-art methods.

Hard Negatives or False Negatives: Correcting Pooling Bias in Training Neural Ranking Models

Neural ranking models (NRMs) have become one of the most important techniques in information retrieval (IR). Due to the limitation of relevance labels, the training of NRMs heavily relies on negative sampling over unlabeled data. In general machine learning scenarios, it has been shown that training with hard negatives (i.e., samples that are close to positives) could lead to better performance. Surprisingly, we find opposite results from our empirical studies in IR. When sampling top-ranked results (excluding the labeled positives) as negatives from a stronger retriever, the performance of the learned NRM becomes even worse. Based on our investigation, the superficial reason is that there are more false negatives (i.e., unlabeled positives) in the top-ranked results with a stronger retriever, which may hurt the training process; the root cause is the existence of pooling bias in the dataset construction process, where annotators only judge and label very few samples selected by some basic retrievers. Therefore, in principle, we can formulate the false negative issue in training NRMs as learning from labeled datasets with pooling bias. To solve this problem, we propose a novel Coupled Estimation Technique (CET) that learns both a relevance model and a selection model simultaneously to correct the pooling bias for training NRMs. Empirical results on three retrieval benchmarks show that NRMs trained with our technique can achieve significant gains in ranking effectiveness against other baseline strategies.

Imitation Learning to Outperform Demonstrators by Directly Extrapolating Demonstrations

We consider the problem of imitation learning from suboptimal demonstrations, which aims to learn a better policy than the demonstrators. Previous methods usually learn a reward function to encode the underlying intention of the demonstrators and use standard reinforcement learning to learn a policy based on this reward function. Such methods can fail to control the distribution shift between demonstrations and the learned policy, since the learned reward function may not generalize well on out-of-distribution samples and can mislead the agent to highly uncertain states, resulting in degenerated performance. To address this limitation, we propose a novel algorithm called Outperforming demonstrators by Directly Extrapolating Demonstrations (ODED). Instead of learning a reward function, ODED trains an ensemble of extrapolation networks that generate extrapolated demonstrations, i.e., demonstrations that may be induced by a good agent, based on provided demonstrations. With these extrapolated demonstrations, we can use an off-the-shelf imitation learning algorithm to learn a good policy. Guided by extrapolated demonstrations, the learned policy avoids visiting highly uncertain states and therefore controls the distribution shift. Empirically, we show that ODED outperforms suboptimal demonstrators and achieves better performance than state-of-the-art imitation learning algorithms on the MuJoCo and DeepMind Control Suite tasks.

Contrastive Cross-Domain Sequential Recommendation

Cross-Domain Sequential Recommendation (CDSR) aims to predict future interactions based on a user's historical sequential interactions from multiple domains. Generally, a key challenge of CDSR is how to mine precise cross-domain user preference based on the intra-sequence and inter-sequence item interactions. Existing works first learn single-domain user preference only with intra-sequence item interactions, and then build a transferring module to obtain cross-domain user preference. However, such a pipelined and implicit solution can be severely limited by the bottleneck of the designed transferring module, and fails to consider inter-sequence item relationships.

In this paper, we propose C2DSR to tackle the above problems and capture precise user preferences. The main idea is to simultaneously leverage the intra- and inter-sequence item relationships, and jointly learn the single- and cross-domain user preferences. Specifically, we first utilize a graph neural network to mine inter-sequence item collaborative relationships, and then exploit a sequential attentive encoder to capture intra-sequence item sequential relationships. Based on them, we devise two different sequential training objectives to obtain user single-domain and cross-domain representations. Furthermore, we present a novel contrastive cross-domain infomax objective to enhance the correlation between single- and cross-domain user representations by maximizing their mutual information. Additionally, we point out a serious information leak issue in prior datasets. We correct this issue and release the corrected datasets. Extensive experiments demonstrate the effectiveness of our approach C2DSR.

User Recommendation in Social Metaverse with VR

Social metaverse with VR has been viewed as a paradigm shift for social media. However, most traditional VR social platforms ignore emerging characteristics in a metaverse, thereby failing to boost user satisfaction. In this paper, we explore a scenario of socializing in a metaverse with VR, which brings major advantages over conventional social media: 1) leveraging flexible display of users' 360-degree viewports to satisfy individual user interests, 2) ensuring users' feeling of co-existence, 3) preventing view obstruction to help users find friends in crowds, and 4) supporting socializing with digital twins. Therefore, we formulate the Co-presence and Occlusion-aware Metaverse User Recommendation (COMUR) problem to recommend a set of rendered players for users in social metaverse with VR. We prove COMUR is an NP-hard optimization problem and design a dual-module deep graph learning framework (COMURNet) to recommend appropriate users for viewport display. Experimental results on real social metaverse datasets and a user study with Oculus Quest 2 show that the proposed model outperforms baseline approaches by at least 36.7% in solution quality.

Learning to Generalize in Heterogeneous Federated Networks

With the rapid development of the Internet of Things (IoT), the need to expand the amount of data through data-sharing to improve the model performance of edge devices has become increasingly compelling. To effectively protect data privacy while leveraging data across silos, federated learning has emerged. However, in real-world applications, federated learning inevitably faces both data and model heterogeneity challenges. To address the heterogeneity issues in federated networks, in this work, we seek to jointly learn a global feature representation that is robust across clients and potentially also generalizable to new clients. More specifically, we propose a personalized Federated optimization framework with Meta Critic (FedMC) that efficiently captures robust and generalizable domain-invariant knowledge across clients. Extensive experiments on four public datasets show that the proposed FedMC outperforms the competing state-of-the-art methods in heterogeneous federated learning settings. We have also performed detailed ablation analysis on the importance of different components of the proposed model.

Learn Basic Skills and Reuse: Modularized Adaptive Neural Architecture Search (MANAS)

Human intelligence is able to first learn some basic skills for solving basic problems and then assemble such basic skills into complex skills for solving complex or new problems. For example, the basic skills "dig hole," "put tree," "backfill" and "watering" compose a complex skill "plant a tree". Besides, some basic skills can be reused for solving other problems. For example, the basic skill "dig hole" can be used not only for planting a tree, but also for mining treasures, building a drain, or landfilling. The ability to learn basic skills and reuse them for various tasks is very important for humans because it helps to avoid learning too many skills for solving each individual task, and makes it possible to solve a compositional number of tasks by learning just a small number of basic skills, which saves a considerable amount of memory and computational power in the human brain. We believe that machine intelligence should also capture the ability to learn basic skills and reuse them by composing them into complex skills. In computer science language, each basic skill is a "module", which is a reusable network that has a concrete meaning and performs a concrete basic operation. The modules are assembled into a bigger "model" for doing a more complex task. The assembling procedure is adaptive to the input or task, i.e., for a given task, the modules should be assembled into the most suitable model for solving the given task. As a result, different inputs/tasks could have different assembled models.

In this work, we take recommender systems as an example and propose Modularized Adaptive Neural Architecture Search (MANAS) to demonstrate the above idea. Neural Architecture Search (NAS) has shown its power in discovering superior neural architectures. However, existing NAS methods mostly focus on searching for a global architecture regardless of the specific input, i.e., the architecture is not adaptive to the input. In this work, we borrow the idea from modularized neural logic reasoning and consider three basic logical operation modules: AND, OR, NOT. Meanwhile, making recommendations for each user is considered as a task. MANAS automatically assembles the logical operation modules into a network architecture tailored for the given user. As a result, a personalized neural architecture is assembled for each user to make recommendations for the user, which means that the resulting neural architecture is adaptive to the model's input (i.e., the user's past behaviors). Experiments on different datasets show that the adaptive architecture assembled by MANAS outperforms static global architectures. Further experiments and empirical analysis provide insights into the effectiveness of MANAS. The code is open-source at

Enhancing User Behavior Sequence Modeling by Generative Tasks for Session Search

Users' search tasks have become increasingly complicated, requiring multiple queries and interactions with the results. Recent studies have demonstrated that modeling the historical user behaviors in a session can help understand the current search intent. Existing context-aware ranking models primarily encode the current session sequence (from the first behavior to the current query) and compute the ranking score using the high-level representations. However, there is usually some noise in the current session sequence (useless behaviors for inferring the search intent) that may affect the quality of the encoded representations. To help the encoding of the current user behavior sequence, we propose to use a decoder and the information of future sequences and a supplemental query. Specifically, we design three generative tasks that can help the encoder to infer the actual search intent: (1) predicting future queries, (2) predicting future clicked documents, and (3) predicting a supplemental query. We jointly learn the ranking task with these generative tasks using an encoder-decoder structured approach. Extensive experiments on two public search logs demonstrate that our model outperforms all existing baselines, and the designed generative tasks can actually help the ranking task. Besides, additional experiments also show that our approach can be easily applied to various Transformer-based encoder-decoder models and improve their performance.

CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks

Knowledge-intensive language tasks (KILT) usually require a large body of information to provide correct answers. A popular paradigm to solve this problem is to combine a search system with a machine reader, where the former retrieves supporting evidence and the latter examines it to produce answers. Recently, the reader component has witnessed significant advances with the help of large-scale pre-trained generative models. Meanwhile, most existing solutions in the search component rely on the traditional "index-retrieve-then-rank" pipeline, which suffers from a large memory footprint and difficulty in end-to-end optimization. Inspired by recent efforts in constructing model-based IR models, we propose to replace the traditional multi-step search pipeline with a novel single-step generative model, which can dramatically simplify the search process and be optimized in an end-to-end manner. We show that a strong generative retrieval model can be learned with a set of adequately designed pre-training tasks, and be adopted to improve a variety of downstream KILT tasks with further fine-tuning. We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index. Empirical results show that CorpusBrain can significantly outperform strong baselines for the retrieval task on the KILT benchmark and establish new state-of-the-art downstream performances. We also show that CorpusBrain works well under zero- and low-resource settings.

Towards Self-supervised Learning on Graphs with Heterophily

Recently emerged heterophilous graph neural networks have significantly reduced the reliance on the assumption of graph homophily, where linked nodes have similar features and labels. These methods focus on a supervised setting that relies heavily on label information, which limits their applicability to general graph downstream tasks. In this work, we propose a self-supervised representation learning paradigm on graphs with heterophily (namely HGRL) for improving the generalizability of node representations, where node representations are optimized without any label guidance. Inspired by the designs of existing heterophilous graph neural networks, HGRL learns the node representations by preserving the original node features and capturing informative distant neighbors. These two properties are obtained through carefully designed pretext tasks that are optimized based on estimated high-order mutual information. Theoretical analysis interprets the connections between HGRL and existing advanced graph neural network designs. Extensive experiments on different downstream tasks demonstrate the effectiveness of the proposed framework.

Time Lag Aware Sequential Recommendation

Although a variety of methods have been proposed for sequential recommendation, it is still far from being well solved partly due to two challenges. First, the existing methods often lack the simultaneous consideration of the global stability and local fluctuation of user preference, which might degrade the learning of a user's current preference. Second, the existing methods often use a scalar based weighting schema to fuse the long-term and short-term preferences, which is too coarse to learn an expressive embedding of current preference. To address the two challenges, we propose a novel model called Time Lag aware Sequential Recommendation (TLSRec), which integrates a hierarchical modeling of user preference and a time lag sensitive fine-grained fusion of the long-term and short-term preferences. TLSRec employs a hierarchical self-attention network to learn users' preference at both global and local time scales, and a neural time gate to adaptively regulate the contributions of the long-term and short-term preferences for the learning of a user's current preference at the aspect level and based on the lag between the current time and the time of the last behavior of a user. The extensive experiments conducted on real datasets verify the effectiveness of TLSRec.

GCF-RD: A Graph-based Contrastive Framework for Semi-Supervised Learning on Relational Databases

Relational databases are the main storage model of structured data in most businesses, which usually involves multiple tables with key-foreign-key relationships. In practice, data analysts often want to pose predictive classification queries over relational databases. To answer such queries, many existing approaches perform supervised learning to train classification models, which heavily rely on the availability of sufficient labeled data. In this paper, we propose a novel graph-based contrastive framework for semi-supervised learning on relational databases, achieving promising predictive classification performance with only a handful of labeled data. Our framework utilizes contrastive learning to exploit additional supervision signals from massive unlabeled data. Specifically, we develop two contrastive graph views that are 1) advantageous for modeling complex relationships and correlations among structured data in a relational database, and 2) complementary to each other for learning robust representations of structured data to be classified. We also leverage label information in contrastive learning to mitigate its negative effect in knowledge transfer on the supervised counterpart. We conduct extensive experiments on three real-world relational databases and the results demonstrate that our framework is able to achieve the state-of-the-art predictive performance in limited labeled data settings, compared with various supervised and semi-supervised learning approaches.

Task Publication Time Recommendation in Spatial Crowdsourcing

The increasing proliferation of networked and geo-positioned mobile devices brings about increased opportunities for Spatial Crowdsourcing (SC), which aims to enable effective location-based task assignment. We propose and study a novel SC framework, namely Task Assignment with Task Publication Time Recommendation. The framework consists of two phases, task publication time recommendation and task assignment. More specifically, the task publication time recommendation phase combines different learning models to recommend a suitable publication time for each task, ensuring timely task assignment and completion while reducing the waiting time of the task requester at the SC platform. We use a cross-graph neural network to learn the representations of task requesters by integrating the obtained representations from two semantic spaces and utilize the self-attention mechanism to learn the representations of task-publishing sequences from multiple perspectives. Then a fully connected layer is used to predict a suitable task publication time based on the obtained representations. In the task assignment phase, we propose a greedy algorithm and a minimum cost maximum flow algorithm to achieve efficient and optimal task assignment, respectively. An extensive empirical study demonstrates the effectiveness and efficiency of our framework.

Efficient Second-Order Optimization for Neural Networks with Kernel Machines

Second-order optimization has recently been explored in neural network training. However, recomputing the Hessian matrix in second-order optimization imposes a substantial extra computation and memory burden during training. There have been some attempts to address this issue by approximating the Hessian matrix, which unfortunately degrades the performance of the neural models. To tackle this issue, we propose Kernel Stochastic Gradient Descent (Kernel SGD), which solves the optimization problem in a space transformed by the Hessian matrix of the kernel machine. Kernel SGD eliminates the Hessian matrix recomputation during training and requires a much smaller memory cost, which can be controlled via the mini-batch size. We show that Kernel SGD optimization is theoretically guaranteed to converge. Our experimental results on tabular, image and text data confirm that Kernel SGD converges up to 30 times faster than the existing second-order optimization techniques, and achieves the highest test accuracy on all the tasks tested. Kernel SGD even outperforms the first-order optimization baselines in some problems tested in our experiments.

ReLAX: Reinforcement Learning Agent Explainer for Arbitrary Predictive Models

Counterfactual examples (CFs) are one of the most popular methods for attaching post-hoc explanations to machine learning (ML) models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood, thus they are hard to generalize for complex models and inefficient for large datasets. This work aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs as a sequential decision-making task and then find the optimal CFs via deep reinforcement learning (DRL) with discrete-continuous hybrid action space. Extensive experiments conducted on several tabular datasets have shown that ReLAX outperforms existing CF generation baselines, as it produces sparser counterfactuals, is more scalable to complex target models to explain, and generalizes to both classification and regression tasks. Finally, to demonstrate the usefulness of our method in a real-world use case, we leverage CFs generated by ReLAX to suggest actions that a country should take to reduce the risk of mortality due to COVID-19. Interestingly enough, the actions recommended by our method correspond to the strategies that many countries have actually implemented to counter the COVID-19 pandemic.

Explainable Link Prediction in Knowledge Hypergraphs

Link prediction in knowledge hypergraphs has been recognized as a critical issue in various downstream tasks for knowledge-enabled applications, from question answering to recommender systems. However, most existing approaches operate in a black-box fashion: they learn low-dimensional embeddings for inference and thus cannot provide human-understandable interpretation. In this paper, we present HyperMLN, an n-ary, mixed, and explainable framework that interprets the path-reasoning process with first-order logic. It provides a knowledge-enhanced interpretable prediction framework in which domain knowledge in the logic rules improves the performance of embedding models, while semantic information in the embedding space in turn optimizes the weights of the logic rules. To provide benchmark rule sets for explainable link prediction methods, three types of meta-logic rules in each popular dataset are mined for interpreting results. While achieving explainability, our framework also realizes an average improvement of 3.2% on Hits@1 compared to the state-of-the-art knowledge hypergraph embedding method. Our code is available at

SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval

Sparse document representations have been widely used to retrieve relevant documents via exact lexical matching. Owing to the pre-computed inverted index, it supports fast ad-hoc search but incurs the vocabulary mismatch problem. Although recent neural ranking models using pre-trained language models can address this problem, they usually require expensive query inference costs, implying the trade-off between effectiveness and efficiency. Tackling the trade-off, we propose a novel uni-encoder ranking model, Sparse retriever using a Dual document Encoder (SpaDE), learning document representation via the dual encoder. Each encoder plays a central role in (i) adjusting the importance of terms to improve lexical matching and (ii) expanding additional terms to support semantic matching. Furthermore, our co-training strategy trains the dual encoder effectively and avoids unnecessary intervention in training each other. Experimental results on several benchmarks show that SpaDE outperforms existing uni-encoder ranking models.

Finding Heterophilic Neighbors via Confidence-based Subgraph Matching for Semi-supervised Node Classification

Graph Neural Networks (GNNs) have proven to be powerful in many graph-based applications. However, they fail to generalize well under heterophilic setups, where neighbor nodes have different labels. To address this challenge, we employ a confidence ratio as a hyper-parameter, assuming that some of the edges are disassortative (heterophilic). Here, we propose a two-phased algorithm. Firstly, we determine edge coefficients through subgraph matching using a supplementary module. Then, we apply GNNs with a modified label propagation mechanism to utilize the edge coefficients effectively. Specifically, our supplementary module identifies a certain proportion of task-irrelevant edges based on a given confidence ratio. Using the remaining edges, we employ the widely used optimal transport to measure the similarity between two nodes with their subgraphs. Finally, using the coefficients as supplementary information on GNNs, we improve the label propagation mechanism which can prevent two nodes with smaller weights from being closer. The experiments on benchmark datasets show that our model alleviates over-smoothing and improves performance.

Review-Based Domain Disentanglement without Duplicate Users or Contexts for Cross-Domain Recommendation

Cross-domain recommendation has shown promising results in solving data-sparsity and cold-start problems. Despite such progress, existing methods focus on domain-shareable information (overlapped users or same contexts) for knowledge transfer, and they fail to generalize well without such requirements. To deal with these problems, we suggest utilizing review texts that are general to most e-commerce systems. Our model (named SER) uses three text analysis modules, guided by a single domain discriminator for disentangled representation learning. Here, we suggest a novel optimization strategy that enhances the quality of domain disentanglement and also weakens detrimental information of a source domain. Also, we extend the encoding network from a single to multiple domains, which has proven to be powerful for review-based recommender systems. Extensive experiments and ablation studies demonstrate that our method is efficient, robust, and scalable compared to the state-of-the-art single and cross-domain recommendation methods.

An Empirical Study on How People Perceive AI-generated Music

Music creation is difficult because one must express one's creativity while following strict rules. The advancement of deep learning technologies has diversified the methods to automate complex processes and express creativity in music composition. However, prior research has not paid much attention to exploring the audiences' subjective satisfaction to improve music generation models. In this paper, we evaluate human satisfaction with the state-of-the-art automatic symbolic music generation models using deep learning. In doing so, we define a taxonomy for music generation models and suggest nine subjective evaluation metrics. Through an evaluation study, we obtained more than 700 evaluations from 100 participants, using the suggested metrics. Our evaluation study reveals that the token representation method and models' characteristics affect subjective satisfaction. Through our qualitative analysis, we deepen our understanding of AI-generated music and suggested evaluation metrics. Lastly, we present lessons learned and discuss future research directions of deep learning models for music creation.

AutoXAI: A Framework to Automatically Select the Most Adapted XAI Solution

A large number of XAI (eXplainable Artificial Intelligence) solutions have been proposed in recent years. Recently, thanks to new XAI evaluation metrics, it has become possible to compare these XAI solutions. However, selecting the most relevant XAI solution among all this diversity is still a tedious task, especially if a user has specific needs and constraints. In this paper, we propose AutoXAI, a framework that recommends the best XAI solution and its hyperparameters according to specified XAI evaluation metrics while considering the user's context (dataset, machine learning model, XAI needs and constraints). It adapts approaches from context-aware recommender systems on one side and strategies of optimization and evaluation from AutoML (Automated Machine Learning) on the other. Through two use cases, we show that AutoXAI recommends XAI solutions adapted to the user's needs with the best hyperparameters matching the user's constraints.

Meta-Path-based Fake News Detection Leveraging Multi-level Social Context Information

Fake news, false or misleading information presented as news, has a significant impact on many aspects of society, such as in politics or healthcare domains. Due to the deceiving nature of fake news, applying Natural Language Processing (NLP) techniques to the news content alone is insufficient. Therefore, more information is required to improve fake news detection, such as the multi-level social context (news publishers and engaged users in social media) information and the temporal information of user engagement. The proper usage of this information, however, introduces three chronic difficulties: 1) multi-level social context information is hard to be used without information loss, 2) temporal information of user engagement is hard to be used along with multi-level social context information, and 3) news representation with multi-level social context and temporal information is hard to be learned in an end-to-end manner. To overcome all three difficulties, we propose a novel fake news detection framework, Hetero-SCAN. We use Meta-Path, a composite relation connecting two node types, to extract meaningful multi-level social context information without loss. We then propose Meta-Path instance encoding and aggregation methods to capture the temporal information of user engagement and learn news representation end-to-end. According to our experiment, Hetero-SCAN yields significant performance improvement over state-of-the-art fake news detection methods.

Inductive Knowledge Graph Reasoning for Multi-batch Emerging Entities

Over the years, reasoning over knowledge graphs (KGs), which aims to infer new conclusions from known facts, has mostly focused on static KGs. The unceasing growth of knowledge in real life raises the necessity to enable the inductive reasoning ability on expanding KGs. Existing inductive work assumes that new entities all emerge once in a batch, which oversimplifies the real scenario that new entities continually appear. This study dives into a more realistic and challenging setting where new entities emerge in multiple batches. We propose a walk-based inductive reasoning model to tackle the new setting. Specifically, a graph convolutional network with adaptive relation aggregation is designed to encode and update entities using their neighboring relations. To capture the varying neighbor importance, we employ a query-aware feedback attention mechanism during the aggregation. Furthermore, to alleviate the sparse link problem of new entities, we propose a link augmentation strategy to add trustworthy facts into KGs. We construct three new datasets for simulating this multi-batch emergence scenario. The experimental results show that our proposed model outperforms state-of-the-art embedding-based, walk-based and rule-based models on inductive KG reasoning.

Scaling Up Maximal k-plex Enumeration

Finding all maximal k-plexes on networks is a fundamental research problem in graph analysis due to many important applications, such as community detection, biological graph analysis, and so on. A k-plex is a subgraph in which every vertex is adjacent to all but at most k vertices within the subgraph. In this paper, we study the problem of enumerating all large maximal k-plexes of a graph and develop several new and efficient techniques to solve the problem. Specifically, we first propose several novel upper-bounding techniques to prune unnecessary computations during the enumeration procedure. We show that the proposed upper bounds can be computed in linear time. Then, we develop a new branch-and-bound algorithm with a carefully-designed pivot re-selection strategy to enumerate all k-plexes, which outputs all k-plexes in O(n^2 γ_k^n) time theoretically, where n is the number of vertices of the graph and γ_k is strictly smaller than 2. In addition, a parallel version of the proposed algorithm is further developed to scale up to process large real-world graphs. Finally, extensive experimental results show that the proposed sequential algorithm can achieve a 2× to 100× speedup over the state-of-the-art sequential algorithms on most benchmark graphs. The results also demonstrate the high scalability of the proposed parallel algorithm. For example, on a large real-world graph with more than 200 million edges, our parallel algorithm can finish the computation within two minutes, while the state-of-the-art parallel algorithm cannot terminate within 24 hours.
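
The k-plex definition in the abstract above can be checked directly on a small graph. The following is a minimal sketch, not the paper's enumeration algorithm. It assumes the common convention that a vertex counts itself among the (at most k) vertices it is not adjacent to, so every member needs at least |S| - k neighbors inside the candidate set S. The function name `is_k_plex` and the adjacency-dictionary representation are illustrative choices.

```python
from typing import Dict, Iterable, Set


def is_k_plex(adj: Dict[str, Set[str]], vertices: Iterable[str], k: int) -> bool:
    """Return True if `vertices` induces a k-plex in the undirected graph `adj`."""
    members = set(vertices)
    size = len(members)
    for v in members:
        # Neighbors of v that also lie inside the candidate subgraph.
        inside_degree = len(adj.get(v, set()) & members)
        # Assumed convention: v counts itself among its non-neighbors,
        # so it needs at least size - k neighbors inside the subgraph.
        if inside_degree < size - k:
            return False
    return True


# A triangle (clique) and a three-vertex path a - b - c.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

print(is_k_plex(triangle, ["a", "b", "c"], 1))  # a clique is a 1-plex
print(is_k_plex(path, ["a", "b", "c"], 1))      # the path is not a 1-plex
print(is_k_plex(path, ["a", "b", "c"], 2))      # but it is a 2-plex
```

Under this convention a 1-plex is exactly a clique, which is why larger k relaxes the clique requirement.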

When Should We Use Linear Explanations?

The increasing interest in transparent and fair AI systems has propelled the research in explainable AI (XAI). One of the main research lines in XAI is post-hoc explainability, the task of explaining the logic of an already deployed black-box model. This is usually achieved by learning an interpretable surrogate function that approximates the black box. Among the existing explanation paradigms, local linear explanations are one of the most popular due to their simplicity and fidelity. Despite their advantages, linear surrogates may not always be the most adapted method to produce reliable, i.e., unambiguous and faithful explanations. Hence, this paper introduces Adapted Post-hoc Explanations (APE), a novel method that characterizes the decision boundary of a black-box classifier and identifies when a linear model constitutes a reliable explanation. Besides, characterizing the black-box frontier allows us to provide complementary counterfactual explanations. Our experimental evaluation shows that APE identifies accurately the situations where linear surrogates are suitable while also providing meaningful counterfactual explanations.

Efficient Trajectory Similarity Computation with Contrastive Learning

The ubiquity of mobile devices and the accompanying deployment of sensing technologies have resulted in a massive amount of trajectory data. One important fundamental task is trajectory similarity computation, which is to determine how similar two trajectories are. To enable effective and efficient trajectory similarity computation, we propose a novel robust model, namely Contrastive Learning based Trajectory Similarity Computation (CL-TSim). Specifically, we employ a contrastive learning mechanism to learn the latent representations of trajectories and then calculate the dissimilarity between trajectories based on these representations. Compared with sequential auto-encoders that are the mainstream deep learning architectures for trajectory similarity computation, CL-TSim does not require a decoder and step-by-step reconstruction, thus improving the training efficiency significantly. Moreover, considering the non-uniform sampling rate and noisy points in trajectories, we adopt two types of augmentations, i.e., point down-sampling and point distorting, to enhance the robustness of the proposed model. Extensive experiments are conducted on two widely-used real-world datasets, i.e., Porto and ChengDu, which demonstrate the superior effectiveness and efficiency of the proposed model.
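
The two augmentations named in the abstract above, point down-sampling and point distorting, can be illustrated with a short sketch. The abstract does not give their exact form, so everything below is an assumption made for illustration: the function names, the `drop_rate` and `sigma` parameters, keeping the endpoints, and using Gaussian noise.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def down_sample(traj: List[Point], drop_rate: float, rng: random.Random) -> List[Point]:
    """Randomly drop interior points; endpoints are always kept (an assumption)."""
    if len(traj) <= 2:
        return list(traj)
    # Each interior point survives with probability 1 - drop_rate.
    interior = [p for p in traj[1:-1] if rng.random() >= drop_rate]
    return [traj[0]] + interior + [traj[-1]]


def distort(traj: List[Point], sigma: float, rng: random.Random) -> List[Point]:
    """Add independent Gaussian noise (an assumed noise model) to each coordinate."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in traj]


rng = random.Random(7)
trajectory = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]
view_a = down_sample(trajectory, drop_rate=0.4, rng=rng)
view_b = distort(trajectory, sigma=0.05, rng=rng)
print(len(view_a), len(view_b))
```

In a contrastive setup, two such views of the same trajectory would typically be treated as a positive pair, while views of other trajectories serve as negatives.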

Weakly-Supervised Online Hashing with Refined Pseudo Tags

With the rapid development of social media, various types of tags uploaded by social users are attached to the images. Compared to clean labels marked by experts, although user-provided tags are imperfect, e.g., wrong tags, reduplicative tags, or missing tags, they are more diverse, fine-grained, and informative. Currently, there exist several weakly-supervised hashing methods attempting to learn hash codes using tags as supervision. Although they can benefit from the rich information contained in tags, most of them ignore the nature of social media data. In real scenarios, social media data appears in a streaming fashion, but most weakly-supervised hashing methods are batch-based and cannot effectively handle streaming data. So far, only one weakly-supervised online hashing method has been proposed, but it is still far from enough to alleviate the negative effects of tags.

In this paper, to address the above problems, we propose a new method, termed Weakly-Supervised Online Hashing with Refined Pseudo Tags (RPT-WOH). To improve the quality of weakly-supervised tags, we design the real-valued pseudo tag matrix and learn it by exploiting the correlation between the previous and new tags. Furthermore, we propose a memory-based similarity learning which could effectively maintain the semantic correlation between old and new data. In addition, we propose an effective and efficient discrete online optimization algorithm making RPT-WOH easily scalable to large-scale data. Extensive experiments conducted on two benchmark datasets demonstrate that RPT-WOH offers satisfactory performance.

GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning

Multi-task learning (MTL) aims at solving multiple related tasks simultaneously and has experienced rapid growth in recent years. However, MTL models often suffer from performance degeneration with negative transfer due to learning several tasks simultaneously. Some related work attributes the problem to conflicting gradients. In this case, useful gradient updates must be selected carefully for all tasks. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates gradients of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD decomposes gradients into task-shared and task-conflict components explicitly and adopts a general update rule for avoiding interference across all task gradients. This allows guiding the update directions depending on the task-shared components. Moreover, we prove the convergence of GDOD theoretically under both convex and non-convex assumptions. Experimental results on several multi-task datasets not only demonstrate the significant improvement that GDOD brings to existing MTL models but also show that our algorithm outperforms state-of-the-art optimization methods in terms of AUC and Logloss metrics.
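
To make the idea of splitting gradients into task-shared and task-conflict components concrete, here is a simplified sketch. It is not the GDOD update rule, which the abstract does not specify. It assumes a Gram-Schmidt basis over the task gradients and a sign-agreement heuristic: a basis direction is treated as conflicting when some task has a positive coefficient on it and another has a negative one. All names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(grads: List[Vector], tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt over the task gradients; spans the same subspace."""
    basis: List[Vector] = []
    for g in grads:
        residual = list(g)
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * bi for r, bi in zip(residual, b)]
        norm = dot(residual, residual) ** 0.5
        if norm > tol:  # skip gradients already inside the current span
            basis.append([r / norm for r in residual])
    return basis


def shared_update(grads: List[Vector]) -> Vector:
    """Sum task gradients, keeping only basis directions without sign conflict."""
    update = [0.0] * len(grads[0])
    for b in orthonormal_basis(grads):
        coeffs = [dot(g, b) for g in grads]
        # Assumed heuristic: opposite signs on a direction mean task conflict.
        conflicting = min(coeffs) < 0.0 < max(coeffs)
        if not conflicting:
            total = sum(coeffs)
            update = [u + total * bi for u, bi in zip(update, b)]
    return update


print(shared_update([[1.0, 0.0], [1.0, 1.0]]))   # no conflict: plain sum
print(shared_update([[1.0, 0.0], [-1.0, 1.0]]))  # first axis conflicts and is dropped
```

Note that a Gram-Schmidt basis depends on the order of the task gradients, so this heuristic is order-sensitive; it is meant only to show what a shared-versus-conflict decomposition looks like.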

Contrastive Learning with Bidirectional Transformers for Sequential Recommendation

Contrastive learning with Transformer-based sequence encoder has gained predominance for sequential recommendation. It maximizes the agreements between paired sequence augmentations that share similar semantics. However, existing contrastive learning approaches in sequential recommendation mainly center upon left-to-right unidirectional Transformers as base encoders, which are suboptimal for sequential recommendation because user behaviors may not be a rigid left-to-right sequence. To tackle that, we propose a novel framework named Contrastive learning with Bidirectional Transformers for sequential recommendation (CBiT). Specifically, we first apply the slide window technique for long user sequences in bidirectional Transformers, which allows for a more fine-grained division of user sequences. Then we combine the cloze task mask and the dropout mask to generate high-quality positive samples and perform multi-pair contrastive learning, which demonstrates better performance and adaptability compared with the normal one-pair contrastive learning. Moreover, we introduce a novel dynamic loss reweighting strategy to balance between the cloze task loss and the contrastive loss. Experiment results on three public benchmark datasets show that our model outperforms state-of-the-art models for sequential recommendation. Our code is available at this link:

Optimal Action Space Search: An Effective Deep Reinforcement Learning Method for Algorithmic Trading

Algorithmic trading is a crucial yet challenging task in the financial domain, where trading decisions are made sequentially from milliseconds to days based on the historical price movements and trading frequency. To model such a sequential decision making process in the dynamic financial markets, Deep Reinforcement Learning (DRL) based methods have been applied and demonstrated their success in finding trading strategies that achieve profitable returns. However, the financial markets are complex imperfect information games with a high level of noise and uncertainty, which usually makes the exploration policy of DRL less effective. In this paper, we propose an end-to-end DRL method that explores solutions on the whole graph via a probabilistic dynamic programming algorithm. Specifically, we separate the state into environment state and position state, and model the position state transition as a directed acyclic graph. To obtain reliable gradients for model training, we adopt a probabilistic dynamic programming algorithm to explore solutions over the whole graph instead of sampling a path. By avoiding the sampling procedure, we propose an efficient training algorithm and overcome the efficiency problem in most existing DRL methods. Furthermore, our method is compatible with most recurrent neural network architectures, which makes our method easy to implement and very effective in practice. Extensive experiments have been conducted on two real-world stock datasets. Experimental results demonstrate that our method can generate stable trading strategies for both high-frequency and low-frequency trading, significantly outperforming the baseline DRL methods on annualized return and Sharpe ratio.

Inferring Sensitive Attributes from Model Explanations

Model explanations provide transparency into a trained machine learning model's blackbox behavior to a model builder. They indicate the influence of different input attributes to its corresponding model prediction. The dependency of explanations on input raises privacy concerns for sensitive user data. However, current literature has limited discussion on privacy risks of model explanations.

We focus on the specific privacy risk of attribute inference attack wherein an adversary infers sensitive attributes of an input (e.g., Race and Sex) given its model explanations. We design the first attribute inference attack against model explanations in two threat models where model builder either (a) includes the sensitive attributes in training data and input or (b) censors the sensitive attributes by not including them in the training data and input.

We evaluate our proposed attack on four benchmark datasets and four state-of-the-art algorithms. We show that an adversary can successfully infer the value of sensitive attributes from explanations in both the threat models accurately. Moreover, the attack is successful even by exploiting only the explanations corresponding to sensitive attributes. These suggest that our attack is effective against explanations and poses a practical threat to data privacy.

On combining the model predictions (an attack surface exploited by prior attacks) with explanations, we note that the attack success does not improve. Additionally, the attack success on exploiting model explanations is better compared to exploiting only model predictions. These suggest that model explanations are a strong attack surface to exploit for an adversary.

Higher-order Clustering and Pooling for Graph Neural Networks

Graph Neural Networks achieve state-of-the-art performance on a plethora of graph classification tasks, especially due to pooling operators, which aggregate learned node embeddings hierarchically into a final graph representation. However, they are not only questioned by recent work showing on par performance with random pooling, but also ignore completely higher-order connectivity patterns. To tackle this issue, we propose HoscPool, a clustering-based graph pooling operator that captures higher-order information hierarchically, leading to richer graph representations. In fact, we learn a probabilistic cluster assignment matrix end-to-end by minimising relaxed formulations of motif spectral clustering in our objective function, and we then extend it to a pooling operator. We evaluate HoscPool on graph classification tasks and its clustering component on graphs with ground-truth community structure, achieving best performance. Lastly, we provide a deep empirical analysis of pooling operators' inner functioning.
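How a cluster assignment matrix becomes a pooling operator can be sketched as follows. The sketch assumes the convention common to clustering-based pooling operators, where a soft assignment matrix S coarsens features as S^T X and adjacency as S^T A S; the abstract does not spell out HoscPool's exact formulation, and the motif spectral clustering objective that learns S is not shown. The helper names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def pool(adj: Matrix, feats: Matrix, assign: Matrix) -> Tuple[Matrix, Matrix]:
    """Coarsen a graph with an (n x k) cluster assignment matrix.

    Returns the (k x k) pooled adjacency S^T A S and (k x d) features S^T X.
    """
    s_t = transpose(assign)
    pooled_feats = matmul(s_t, feats)
    pooled_adj = matmul(matmul(s_t, adj), assign)
    return pooled_adj, pooled_feats
```

With a hard assignment, each pooled feature row is the sum of its cluster's node features, and each pooled adjacency entry counts the edge endpoints between two clusters.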

Federated K-Private Set Intersection

Private set intersection (PSI) is a popular protocol that allows multiple parties to evaluate the intersection of their sets without revealing them to each other. PSI has numerous practical applications, including privacy preserving data mining and location-based services. In this work, we develop a new approach for the PSI problem within the federated analytics framework. In particular, we consider a setting where a server wants to determine (query) which among its local set of data identifiers appears coupled with the same value in at least K of the N parties. Applications for this framework include but are not limited to: double-filing insurance verification, credit scoring and password checkup on an institutional level. To address the proposed setting, we propose a new protocol, Fed-K-PSI, that allows the server to answer this query while being oblivious to the data of identifiers that do not satisfy the distributed query at the parties. In addition, Fed-K-PSI also maintains the anonymity of the parties by hiding which K parties satisfied the query and which value associated with the identifier caused the query to succeed. Our proposed setting does not lend itself directly to state-of-the-art approaches in PSI based on Oblivious Transfer, since the server does not have a complete representation of a datapoint (only the identifier, but no value). Our proposed approach tackles this problem by constructing a distributed function at the parties, which encodes the datapoints and returns a deterministic known property if and only if the value for a given identifier is the same in at least K of the N parties. We show that Fed-K-PSI achieves a strong information-theoretic privacy guarantee and is resilient to collusion scenarios among honest-but-curious parties. We also evaluate Fed-K-PSI via extensive experiments to study the effect of the different system parameters.
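To make the query itself concrete, the following is a plaintext reference for what the server should learn. It offers no privacy whatsoever and is not the Fed-K-PSI protocol; it only states the target output that a private protocol must reproduce. The function name and data layout are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set


def k_psi_reference(server_ids: Iterable[Hashable],
                    parties: List[Dict[Hashable, Hashable]],
                    k: int) -> Set[Hashable]:
    """Identifiers that appear with the same value in at least k parties.

    Non-private reference: it reads every party's data in the clear.
    """
    result = set()
    for ident in server_ids:
        # Count how many parties hold each value for this identifier
        counts = Counter(p[ident] for p in parties if ident in p)
        if counts and max(counts.values()) >= k:
            result.add(ident)
    return result
```

For example, with three parties holding `{"a": 1, "b": 2}`, `{"a": 1, "b": 3}`, and `{"a": 1, "c": 5}`, a server querying identifiers `a`, `b`, and `d` with K = 2 should learn only `a`.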

Detecting Significant Differences Between Information Retrieval Systems via Generalized Linear Models

Being able to compare Information Retrieval (IR) systems correctly is pivotal to improving their quality. Among the most popular tools for statistical significance testing are the t-test and ANOVA, both of which belong to the linear models family. Therefore, given the relevance of linear models for IR evaluation, a great effort has been devoted to studying how to improve them to better compare IR systems.

Linear models rely on assumptions that IR experimental observations rarely meet, e.g., about the normality of the data or the linearity itself. Even though linear models are, in general, resilient to violations of their assumptions, departing from them might reduce the effectiveness of the tests. Hence, we investigate the use of the Generalized Linear Models (GLM) framework, a generalization of traditional linear modelling that relaxes assumptions about the distribution and the shape of the models. To the best of our knowledge, there has been little or no investigation on the use of GLMs for comparing IR system performance. We discuss how GLMs work and how they can be applied in the context of IR evaluation. In particular, we focus on the link function used to build GLMs, which allows the model to have non-linear shapes.

We conduct thorough experimentation using two TREC collections and several evaluation measures. Overall, we show how the log and logit links are able to identify more and more consistent significant differences (up to 25% more with 50 topics) than the identity link used today and with a comparable, or slightly better, risk of publication bias.
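The role of the link function can be shown with a few lines of code. In a GLM, the link g connects the expected response to the linear predictor, g(E[y]) = eta, so the inverse link maps any real-valued eta back to a mean. The sketch below only contrasts the three links named in the abstract; the dictionary and function names are hypothetical, and no model fitting is performed.

```python
import math


def logit(mu: float) -> float:
    """Logit link: maps a mean in (0, 1) to the real line."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta: float) -> float:
    """Inverse logit: maps any real linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Inverse links turn a linear predictor eta into a predicted mean
INVERSE_LINKS = {
    "identity": lambda eta: eta,  # ordinary linear model
    "log": math.exp,              # mean is always positive
    "logit": inv_logit,           # mean is always in (0, 1)
}


def predicted_mean(link: str, eta: float) -> float:
    return INVERSE_LINKS[link](eta)
```

For a measure bounded in [0, 1], a linear predictor of 1.4 gives an impossible mean of 1.4 under the identity link, whereas the logit link keeps the prediction strictly inside (0, 1).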

Risk-Aware Bid Optimization for Online Display Advertisement

This research focuses on the bid optimization problem in the real-time bidding setting for online display advertisements, where an advertiser, or the advertiser's agent, has access to the features of the website visitor and the type of ad slots, to decide the optimal bid prices given a predetermined total advertisement budget. We propose a risk-aware data-driven bid optimization model that maximizes the expected profit for the advertiser by exploiting historical data to design upfront a bidding policy, mapping the type of advertisement opportunity to a bid price, and accounting for the risk of violating the budget constraint during a given period of time. After employing a Lagrangian relaxation, we derive a parametrized closed-form expression for the optimal bidding strategy. Using a real-world dataset, we demonstrate that our risk-averse method can effectively control the risk of overspending the budget while achieving a competitive level of profit compared with the risk-neutral model and a state-of-the-art data-driven risk-aware bidding approach.

Smart Contract Scams Detection with Topological Data Analysis on Account Interaction

The skyrocketing market value of cryptocurrencies has prompted more investors to pour funds into cryptocurrencies to seek asset hedging. However, the anonymity of blockchain makes cryptocurrency naturally a tool of choice for criminals to commit smart contract scams. Consequently, smart contract scam detection is particularly critical for investors to avoid economic loss. Previous methods mainly leverage specific code logic of smart contracts and/or design rules based on abnormal transaction behaviors for scam detection. Although these methods gain success at detecting particular scams, they perform worse when applied to scams with highly similar codes. Besides, well-designed decision rules rely on expert knowledge and tedious data collection steps, which causes poor flexibility. To combat these challenges, we consider the problem of smart contract scam detection via mining topological features of account interaction information that dynamically evolves. We adopt interactive features extracted from dynamic interaction information of accounts and propose a framework named TTG-SCSD to utilize the features and Topological Data Analysis for smart contract scams detection. The TTG-SCSD constructs discrete dynamic interaction graphs for each contract and designs interactive features that characterize account behaviors. The features are modeled combined with a topology quantification mechanism to capture contract intentions in transactions. Experimental results on real-world transaction datasets from Ethereum show that TTG-SCSD obtains better generalizability and improves the performance of the bare versions of the comparison methods.

MonitorLight: Reinforcement Learning-based Traffic Signal Control Using Mixed Pressure Monitoring

Although Reinforcement Learning (RL) has achieved significant success in Traffic Signal Control (TSC), most existing approaches focus on the design of RL elements while neglecting the impact of the phase duration. Because dynamic phase duration is left unexplored, the overall performance and convergence rate of RL-based TSC approaches cannot be guaranteed, which may result in poor adaptability of RL methods to different traffic conditions. To address these issues, in this paper, we formulate a novel phase-duration-aware TSC (PDA-TSC) problem and propose an effective RL-based TSC approach, named MonitorLight. Our approach adopts a new traffic indicator, mixed pressure, which enables RL agents to simultaneously analyze the impacts of stationary and moving vehicles on intersections. Based on the observed mixed pressure of intersections, RL agents can autonomously determine whether or not to change the current signals in real-time. In addition, MonitorLight can adjust the control method for scenarios with different real-time requirements and achieve excellent results in different situations. Extensive experiments on both real-world and synthetic datasets demonstrate that MonitorLight outperforms the current state-of-the-art IPDALight by up to 2.84% and 5.71% in average vehicle travel time, respectively. Moreover, our method significantly speeds up the convergence, leading IPDALight by 36.87% and 34.58% in the start to converge episode and jumpstart performance, respectively.

Few-Shot Relational Triple Extraction with Perspective Transfer Network

Few-shot Relational Triple Extraction (RTE) aims at detecting emerging relation types along with their entity pairs from unstructured text with the support of a few labeled samples. Prior approaches use conditional random fields or nearest-neighbor matching strategies to extract entities and use prototypical networks for extracting relations from sentences. Nevertheless, they fail to utilize the triple-level information to verify the plausibility of extracted relational triples, and ignore the proper transfer among the perspectives of entity, relation and triple. To fill in these gaps, in this work, we put forward a novel perspective transfer network (PTN) to address few-shot RTE. Specifically, PTN starts from the relation perspective by checking the existence of a given relation. Then, it transfers to the entity perspective to locate entity spans with relation-specific support sets. Next, it transfers to the triple perspective to validate the plausibility of extracted relational triples. Finally, it transfers back to the relation perspective to check the next relation, and repeats the aforementioned procedure. By transferring among the perspectives of relation, entity, and triple, PTN not only validates the extracted elements at both local and global levels, but also effectively handles more realistic and difficult few-shot RTE scenarios such as multiple triple extraction and nonexistence of triples. Extensive experimental results on an existing dataset and new datasets demonstrate that our approach can significantly improve performance over the state of the art.

Aries: Accurate Metric-based Representation Learning for Fast Top-k Trajectory Similarity Query

With the prevalence of location-based services (LBS), trajectories are being generated rapidly. Widely used in LBS, the top-k trajectory similarity query serves as a key operation, empowering applications such as travel route recommendation and carpooling. Given the rise of deep learning, trajectory representation has been well-proven to speed up this operator. However, existing representation-based computing modes leave two major problems understudied: the low quality of trajectory representation and insufficient support for various trajectory similarity metrics, which make them difficult to apply in practice. Therefore, we propose an Accurate metric-based representation learning approach for fast top-k trajectory similarity query, named Aries. Specifically, Aries has two sophisticated modules: (1) A novel trajectory embedding strategy enhanced by the bidirectional LSTM encoder and spatial attention mechanism, which can extract more precise and comprehensive knowledge. (2) A deep metric learning network aggregating multiple measures for better top-k query. Extensive experiments conducted on a real trajectory dataset show that Aries achieves both impressive accuracy and lower training time compared with state-of-the-art solutions. In particular, it achieves 5x-10x speedup and 10%-20% accuracy improvement over Euclidean, Hausdorff, DTW, and EDR measures. Besides, our method can maintain stable performance when handling various scenarios, without repeated training in order to adapt to diverse similarity metrics.

MGMAE: Molecular Representation Learning by Reconstructing Heterogeneous Graphs with A High Mask Ratio

Masked autoencoder (MAE), as an effective self-supervised learner for computer vision and natural language processing, has been recently applied to molecule representation learning. In this paper, we identify two issues in applying MAE to pre-train Transformer-based models on molecular graphs that existing works have ignored. (1) As only atoms are abstracted as tokens and then reconstructed, the chemical bonds are not decided in the decoded molecule, making molecules with different arrangements of the same atoms indistinguishable. (2) Although a high mask ratio that corresponds to a challenging reconstruction task has been proved beneficial in the vision domain, it cannot be trivially leveraged on molecular graphs as there is less redundancy of information in graph data. To resolve these issues, we propose a novel framework, Molecular Graph Mask AutoEncoder (MGMAE). As the first step in MGMAE, we transform each molecular graph into a heterogeneous atom-bond graph to fully use the bond attributes and design unidirectional position encoding for such graphs. Then we propose a hybrid masking mechanism that exploits the complementary nature between atoms' attributive and spatial features. Meanwhile, we compensate for the mask embedding by a dynamic aggregation representation that exploits the correlations between topologically adjacent tokens. As a result, MGMAE can reconstruct the masked atoms, the masked bonds, and the relative distance among atoms simultaneously, with a high mask ratio. We compare MGMAE with the state-of-the-art methods on various molecular benchmarks and show the competitiveness of MGMAE in both regression and classification tasks.

GraTO: Graph Neural Network Framework Tackling Over-smoothing with Neural Architecture Search

Current Graph Neural Networks (GNNs) suffer from the over-smoothing problem, which results in indistinguishable node representations and low model performance with more GNN layers. Many methods have been put forward to tackle this problem in recent years. However, existing methods for tackling over-smoothing emphasize model performance and neglect the over-smoothness of node representations. Additionally, different approaches are applied one at a time, and an overall framework to jointly leverage multiple solutions to the over-smoothing challenge is lacking. To solve these problems, we propose GraTO, a framework based on neural architecture search to automatically search for GNN architectures. GraTO adopts a novel loss function to facilitate striking a balance between model performance and representation smoothness. In addition to existing methods, our search space also includes DropAttribute, a novel scheme for alleviating the over-smoothing challenge, to fully leverage diverse solutions. We conduct extensive experiments on six real-world datasets to evaluate GraTO, which demonstrate that GraTO outperforms baselines in the over-smoothing metrics and achieves competitive performance in accuracy. GraTO is especially effective and robust with increasing numbers of GNN layers. Further experiments bear out the quality of node representations learned with GraTO and the effectiveness of model architecture. We make the code of GraTO available at GitHub (

DP-HORUS: Differentially Private Hierarchical Count Histograms under Untrusted Server

The hierarchical count histograms task publishes count statistics at different granularities according to the hierarchy defined on a dimension table in a data warehouse, and has wide applications in On-line Analytical Processing (OLAP) scenarios. In this paper, we systematically investigate this task subject to a rigorous privacy-preserving constraint under the untrusted server setting. Our study first reveals that the straightforward baseline approach of local differential privacy fails to achieve a satisfactory privacy and utility tradeoff. We are thus motivated to propose DP-HORUS, a novel crypto-assisted Differentially Private framework for Hierarchical cOunt histogRams under Untrusted Server. DP-HORUS consists of a series of novel designs, including 1) an Encrypted Hierarchical Tree (EHT) structure, which maintains the concept hierarchy in the input data; and 2) a Random Matrix (RM), which reduces communication and computational cost. To further boost utility, we propose DP-HORUS+, which adds two modules, Histograms Structure (HS) and Hierarchical Consistency (HC), introduced respectively to reduce the noise caused by data sparsity and to ensure hierarchy consistency. We provide both theoretical analysis and an extensive empirical study on both real-world and synthetic datasets, which demonstrate the superior utility of the proposed methods over the state-of-the-art solutions while ensuring a strict privacy guarantee.

KuaiRec: A Fully-observed Dataset and Insights for Evaluating Recommender Systems

The progress of recommender systems is hampered mainly by evaluation as it requires real-time interactions between humans and systems, which is too laborious and expensive. This issue is usually approached by utilizing the interaction history to conduct offline evaluation. However, existing datasets of user-item interactions are partially observed, leaving it unclear how and to what extent the missing interactions will influence the evaluation. To answer this question, we collect a fully-observed dataset from Kuaishou's online environment, where almost all 1,411 users have been exposed to all 3,327 items. To the best of our knowledge, this is the first real-world fully-observed data with millions of user-item interactions.

With this unique dataset, we conduct a preliminary analysis of how the two factors - data density and exposure bias - affect the evaluation results of multi-round conversational recommendation. Our main discoveries are that the performance ranking of different methods varies with the two factors, and this effect can only be alleviated in certain cases by estimating missing interactions for user simulation. This demonstrates the necessity of the fully-observed dataset. We release the dataset and the pipeline implementation for evaluation at

Consistent, Balanced, and Overlapping Label Trees for Extreme Multi-label Learning

The emerging eXtreme Multi-label Learning (XML) aims to induce multi-label predictive models from big datasets with extremely large numbers of instances, features, and especially labels. To meet the great efficiency challenge of XML, one flexible solution is the label tree methodology, which, as its name suggests, is technically defined as a tree hierarchy of label subsets, partitioning the original large-scale XML problem into a number of small-scale sub-problems (i.e., denoted by leaf nodes) and thereby reducing the complexity to logarithmic time. Notably, the expected label trees should accurately find the right leaf nodes for future instances (i.e., effectiveness) and generate balanced leaf nodes (i.e., efficiency). To achieve this, we propose a novel generic label tree method, namely Consistent, Balanced, and Overlapping Label Tree (CBOLT). To enhance the precision, we employ weighted clustering to partition non-leaf nodes and allow overlapping label subsets, which alleviates the inconsistent path and disjoint label subset issues. To improve the efficiency, we propose a new concept of a balanced problem scale and implement it with a balanced regularization for partitioning non-leaf nodes. We conduct extensive experiments on several benchmark XML datasets. Empirical results demonstrate that CBOLT is superior to the existing methods of label trees, and it can be applied to existing XML methods and achieve competitive performance with strong baselines.

PromptORE - A Novel Approach Towards Fully Unsupervised Relation Extraction

Unsupervised Relation Extraction (RE) aims to identify relations between entities in text, without having access to labeled data during training. This setting is particularly relevant for domain-specific RE where no annotated dataset is available and for open-domain RE where the types of relations are a priori unknown. Although recent approaches achieve promising results, they heavily depend on hyperparameters whose tuning would most often require labeled data. To mitigate the reliance on hyperparameters, we propose PromptORE, a "Prompt-based Open Relation Extraction" model. We adapt the novel prompt-tuning paradigm to work in an unsupervised setting, and use it to embed sentences expressing a relation. We then cluster these embeddings to discover candidate relations, and we experiment with different strategies to automatically estimate an adequate number of clusters. To the best of our knowledge, PromptORE is the first unsupervised RE model that does not need hyperparameter tuning. Results on three general and specific domain datasets show that PromptORE consistently outperforms state-of-the-art models with a relative gain of more than 40% in B3, V-measure and ARI. Qualitative analysis also indicates PromptORE's ability to identify semantically coherent clusters that are very close to true relations.

Modeling Dynamic Heterogeneous Graph and Node Importance for Future Citation Prediction

Accurate citation count prediction for newly published papers could help editors and readers rapidly identify the papers that will be influential in the future. Though many approaches have been proposed to predict a paper's future citations, most ignore the dynamic heterogeneous graph structure or node importance in academic networks. To cope with this problem, we propose a Dynamic heterogeneous Graph and Node Importance network (DGNI) learning framework, which fully leverages the dynamic heterogeneous graph and node importance information to predict future citation trends of newly published papers. First, a dynamic heterogeneous network embedding module is provided to capture the dynamic evolutionary trends of the whole academic network. Then, a node importance embedding module is proposed to capture the global consistency relationship to determine each paper's node importance. Finally, the dynamic evolutionary trend embeddings and node importance embeddings calculated above are combined to jointly predict the future citation counts of each paper, by a log-normal distribution model according to multi-faceted paper node representations. Extensive experiments on two large-scale datasets demonstrate that our model significantly improves all indicators compared to the SOTA models.

Robust Recurrent Classifier Chains for Multi-Label Learning with Missing Labels

Recurrent Classifier Chains (RCCs) are a leading approach for multi-label classification as they directly model the interdependencies between classes. Unfortunately, existing RCCs assume that every training instance is completely labeled with all its ground truth classes. In practice, often only a subset of an instance's labels are annotated, while the annotations for other classes are missing. RCCs fail in this missing label scenario, predicting many false negatives and potentially missing important classes. In this work, we propose Robust-RCC, the first strategy for tackling this open problem of RCCs failing for multi-label missing-label data. Robust-RCC is a new type of deep recurrent classifier chain empowered to model inter-class relationships essential for predicting the complete label set most likely to match the ground truth. The key to Robust-RCC is the design of the Multi Incomplete Label Risk (MILR) function, which we prove to be equal in expectation to the true risk of the ground truth full label set despite being computed from incompletely labeled data. Our experimental study demonstrates that Robust-RCC consistently beats six state-of-the-art methods by as much as 30% in predicting the true labels.

Spatio-temporal Trajectory Learning using Simulation Systems

Spatio-temporal trajectories are essential factors for systems used in public transport, social ecology, and many other disciplines where movement is a relevant dynamic process. Each trajectory describes multiple state changes over time, induced by individual decision-making, based on psychological and social factors with physical constraints. Since a crucial factor of such systems is to reason about the potential trajectories in a closed environment, the primary problem is the realistic replication of individual decision making. Mental factors are often uncertain, not available or cannot be observed in reality. Thus, models for data generation must be derived from abstract studies using probabilities. To solve these problems, we present Multi-Agent-Trajectory-Learning (MATL), a state transition model to learn and generate human-like spatio-temporal trajectory data. MATL combines Generative Adversarial Imitation Learning (GAIL) with a simulation system that uses constraints given by an agent-based model (ABM). We use GAIL to learn policies in conjunction with the ABM, resulting in a novel concept of individual decision making. Experiments with standard trajectory predictions show that our approach produces similar results to real-world observations.

Gromov-Wasserstein Multi-modal Alignment and Clustering

Multi-modal clustering aims at finding a clustering structure shared by the data of different modalities in an unsupervised way. Currently, solving this problem often relies on two assumptions: i) the multi-modal data own the same latent distribution, and ii) the observed multi-modal data are well-aligned and without any missing modalities. Unfortunately, these two assumptions are often questionable in practice and thus limit the feasibility of many multi-modal clustering methods. In this work, we develop a new multi-modal clustering method based on the Gromovization of optimal transport distance, which relaxes the dependence on the above two assumptions. In particular, given the data of different modalities, whose correspondence is unknown, our method learns the Gromov-Wasserstein (GW) barycenter of their kernel matrices. Driven by the modularity maximization principle, the GW barycenter helps to explore the clustering structure shared by different modalities. Moreover, the GW barycenter is associated with the GW distances between the different modalities to the clusters, and the optimal transport plans corresponding to the GW distances help to achieve the alignment and the clustering of the multi-modal data jointly. Experimental results show that our method outperforms state-of-the-art multi-modal clustering methods, especially when the data are (partially or completely) unaligned. The code is available at

ITSM-GCN: Informative Training Sample Mining for Graph Convolutional Network-based Collaborative Filtering

Recently, graph convolutional network (GCN) has become one of the most popular and state-of-the-art collaborative filtering (CF) methods. Existing GCN-based CF studies have made many meaningful and excellent efforts at loss function design and embedding propagation improvement. Despite their successes, we argue that existing methods have not yet properly explored more effective sampling strategy, including both positive sampling and negative sampling. To tackle this limitation, a novel framework named ITSM-GCN is proposed to carry out our designed Informative Training Sample Mining (ITSM) sampling strategy for the learning of GCN-based CF models. Specifically, we first adopt and improve the dynamic negative sampling (DNS) strategy, which achieves considerable improvements in both training efficiency and recommendation performance. More importantly, we design two potentially positive training sample mining strategies, namely a similarity-based sampler and score-based sampler, to further enhance GCN-based CF. Extensive experiments show that ITSM-GCN significantly outperforms state-of-the-art GCN-based CF models, including LightGCN, SGL-ED and SimpleX. For example, ITSM-GCN improves on SimpleX by 12.0%, 3.0%, and 1.2% on Recall@20 for Amazon-Books, Yelp2018 and Gowalla, respectively.
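The basic dynamic negative sampling (DNS) step that the abstract builds on can be sketched as below: draw a few candidate items the user has not interacted with, then keep the one the current model scores highest, since it is the most informative negative. This shows plain DNS only; the paper's improvements and its positive sample mining strategies are not described in enough detail to reproduce here, and the function and parameter names are hypothetical.

```python
import random
from typing import Sequence, Set


def dynamic_negative_sample(user_scores: Sequence[float],
                            positives: Set[int],
                            num_candidates: int,
                            rng: random.Random) -> int:
    """Pick a hard negative item for one user.

    user_scores[i] is the current model's score for item i.
    """
    # Items the user has not interacted with are eligible negatives
    pool = [i for i in range(len(user_scores)) if i not in positives]
    # Draw a small candidate set uniformly at random, without replacement
    candidates = rng.sample(pool, min(num_candidates, len(pool)))
    # Keep the candidate the model currently ranks highest
    return max(candidates, key=lambda i: user_scores[i])
```

A larger `num_candidates` yields harder negatives but raises the chance of picking an unobserved item the user would actually like, so it is usually kept small.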

Evolutionary Preference Learning via Graph Nested GRU ODE for Session-based Recommendation

Session-based recommendation (SBR) aims to predict the user's next action based on the ongoing sessions. Recently, there has been an increasing interest in modeling the user preference evolution to capture the fine-grained user interests. While latent user preferences behind the sessions drift continuously over time, most existing approaches still model the temporal session data in discrete state spaces, which are incapable of capturing the fine-grained preference evolution and result in sub-optimal solutions. To this end, we propose Graph Nested GRU ordinary differential equation (ODE), namely GNG-ODE, a novel continuum model that extends the idea of neural ODEs to continuous-time temporal session graphs. The proposed model preserves the continuous nature of dynamic user preferences, encoding both temporal and structural patterns of item transitions into continuous-time dynamic embeddings. As the existing ODE solvers do not consider graph structure change and thus cannot be directly applied to the dynamic graph, we propose a time alignment technique, called t-Alignment, to align the updating time steps of the temporal session graphs within a batch. Empirical results on three benchmark datasets show that GNG-ODE significantly outperforms other baselines.

Learning Hypersphere for Few-shot Anomaly Detection on Attributed Networks

The existence of anomalies is quite common, but they are hidden within the complex structure and high-dimensional node attributes of the attributed networks. As a latent hazard in existing systems, anomalies can be transformed into important instruction information once we detect them, e.g., computer network admins can react to the leakage of sensitive data if network traffic anomalies are identified. Extensive research in anomaly detection on attributed networks has proposed various techniques, which do improve the quality of data in networks, while they rarely cope with the few-shot anomaly detection problem. Few-shot anomaly detection task with only a few dozen labeled anomalies is more practical since anomalies are rare in number for real-world systems.

We propose a few-shot anomaly detection approach for detecting the anomaly nodes that significantly deviate from the vast majority. Our approach, based on an extension of model-agnostic meta-learning (MAML), is a Learnable Hypersphere Meta-Learning method running on local subgraphs, named LHML. LHML learns on a single subgraph, conducts meta-learning on a set of subgraphs, and maintains the radius of a learnable hypersphere across subgraphs to detect anomalies efficiently. The learnable hypersphere is a changing boundary that can be used to identify anomalies in an unbalanced binary-classification setting and quickly adapt to a new subgraph with a few gradient update steps of MAML. Furthermore, our model runs across subgraphs, making it possible to identify an anomaly using only a handful of nodes around it rather than all nodes of the graph, as is usually required, which means LHML can scale to large networks. Experimental results show the effective performance of LHML on benchmark datasets.
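A hypersphere boundary of this kind can be turned into a decision rule in a few lines. The sketch assumes the common convention in which a node is flagged when its embedding lies outside the sphere, that is, when its distance to the center exceeds the radius; the abstract does not state LHML's exact scoring rule, and the meta-learning of the center, radius, and embeddings is not shown. All names are hypothetical.

```python
import math
from typing import Sequence


def anomaly_score(embedding: Sequence[float],
                  center: Sequence[float],
                  radius: float) -> float:
    """Signed distance to the hypersphere boundary (positive means outside)."""
    return math.dist(embedding, center) - radius


def is_anomaly(embedding: Sequence[float],
               center: Sequence[float],
               radius: float) -> bool:
    # Assumed rule: nodes embedded outside the sphere are anomalies
    return anomaly_score(embedding, center, radius) > 0.0
```

Since only the embedding of the node in question is needed at decision time, such a rule fits the stated goal of classifying a node from its local subgraph.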

KiCi: A Knowledge Importance Based Class Incremental Learning Method for Wearable Activity Recognition

Wearable-based human activity recognition (HAR) is commonly employed in real-world scenarios such as health monitoring and auxiliary diagnosis. As implementing activity recognition is a daunting challenge in an open dynamic environment, incremental learning has become a common method to adapt to variable behavior patterns of users and create dynamic modeling in activity recognition. However, catastrophic forgetting is a significant challenge with incremental learning. This is contrary to our expectations of identifying new activity classes while remembering existing ones. To address this problem, we propose a knowledge importance-based class incremental learning method called KiCi and construct an incremental learning model based on the framework of self-iterative knowledge distillation for dynamic activity recognition. To eliminate the prediction bias of the teacher model on the old knowledge, we utilize the trained weights of previous incremental steps generated by the teacher model as the prior knowledge to obtain knowledge importance. We then use it to give the student model a reasonable trade-off between old and new knowledge and to mitigate catastrophic forgetting by avoiding negative transfer. We conduct extensive experiments on four public HAR datasets, and our method consistently outperforms the existing state-of-the-art methods by a large margin.

Bootstrap-based Causal Structure Learning

Learning a causal structure from observational data is crucial for data scientists. Recent advances in causal structure learning (CSL) have focused on local-to-global learning, since local-to-global CSL can be scaled to high-dimensional data. Local-to-global CSL algorithms first learn the local skeletons, then construct the global skeleton, and finally orient edges. In practice, the performance of local-to-global CSL mainly depends on the accuracy of the global skeleton. However, in many real-world settings, owing to inevitable data quality issues (e.g., noise and small sample sizes), existing local-to-global CSL methods often yield many asymmetric edges (e.g., given an asymmetric edge containing variables A and B, the learned skeleton of A contains B, but the learned skeleton of B does not contain A), which makes it difficult to construct a high-quality global skeleton. To tackle this problem, this paper proposes a Bootstrap sampling based Causal Structure Learning (BCSL) algorithm. The novel contribution of BCSL is an integrated global skeleton learning strategy that can construct more accurate global skeletons. Specifically, this strategy first utilizes the Bootstrap method to generate multiple sub-datasets, then learns the local skeleton of the variables on each asymmetric edge on those sub-datasets, and finally designs a novel scoring function to estimate the learning results on all sub-datasets for correcting the asymmetric edge. Extensive experiments on both benchmark and real datasets verify the effectiveness of the proposed method.
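
The bootstrap step for resolving one asymmetric edge can be sketched as follows. This is a sketch under stated assumptions, not the BCSL algorithm itself: the abstract does not define its scoring function, so a plain vote fraction stands in for it, the 0.5 cut-off is arbitrary, and `learn_neighbors` is a hypothetical callable that returns the learned local skeleton (a set of neighbor names) of one variable on a given sample.

```python
import random


def bootstrap_resample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sub-dataset)."""
    return [rows[rng.randrange(len(rows))] for _ in rows]


def resolve_asymmetric_edge(rows, a, b, learn_neighbors, n_rounds=20, seed=0):
    """Decide whether to keep the edge a-b by voting over bootstrap samples.

    learn_neighbors(sample, variable) is a hypothetical local-skeleton learner.
    The vote-fraction score is a stand-in for the paper's scoring function.
    """
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_rounds):
        sample = bootstrap_resample(rows, rng)
        # Each direction of the edge contributes one vote per round.
        votes += int(b in learn_neighbors(sample, a))
        votes += int(a in learn_neighbors(sample, b))
    score = votes / (2 * n_rounds)
    return score >= 0.5, score
```

The function returns both the keep/drop decision and the score, so a caller can apply a different threshold if the fixed 0.5 cut-off is too coarse.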

RAGUEL: Recourse-Aware Group Unfairness Elimination

While machine learning and ranking-based systems are in widespread use for sensitive decision-making processes (e.g., determining job candidates, assigning credit scores), they are rife with concerns over unintended biases in their outcomes, which makes algorithmic fairness (e.g., demographic parity, equal opportunity) an objective of interest. 'Algorithmic recourse' offers feasible recovery actions to change unwanted outcomes through the modification of attributes. We introduce the notion of ranked group-level recourse fairness, and develop a 'recourse-aware ranking' solution that satisfies ranked recourse fairness constraints while minimizing the cost of suggested modifications. Our solution suggests interventions that can reorder the ranked list of database records and mitigate group-level unfairness; specifically, disproportionate representation of sub-groups and recourse cost imbalance. This re-ranking identifies the minimum modifications to data points, with these attribute modifications weighted according to their ease of recourse. We then present an efficient block-based extension that enables re-ranking at any granularity (e.g., multiple brackets of bank loan interest rates, multiple pages of search engine results). Evaluation on real datasets shows that, while existing methods may even exacerbate recourse unfairness, our solution – RAGUEL – significantly improves recourse-aware fairness. RAGUEL outperforms alternatives at improving recourse fairness, through a combined process of counterfactual generation and re-ranking, whilst remaining efficient for large-scale datasets.

Multi-Aggregator Time-Warping Heterogeneous Graph Neural Network for Personalized Micro-Video Recommendation

Micro-video recommendation is attracting global attention and becoming a popular daily service for people of all ages. Recently, Graph Neural Network-based micro-video recommendation has shown performance improvements on many kinds of recommendation tasks. However, existing works fail to fully consider the characteristics of micro-videos, such as the high timeliness of news-type micro-video recommendation and the sequential interactions that reflect frequently changing interests. In this paper, a novel Multi-aggregator Time-warping Heterogeneous Graph Neural Network (MTHGNN) is proposed for personalized news-type micro-video recommendation based on sequential sessions, where the characteristics of micro-videos are comprehensively studied, users' preferences are mined via multiple aggregators, the temporal and dynamic changes of users' preferences are captured, and timeliness is considered. Through comparison with state-of-the-art methods, the experimental results validate the superiority of our MTHGNN model.

Rethinking Conversational Recommendations: Is Decision Tree All You Need?

Conversational recommender systems (CRS) dynamically obtain the users' preferences via multi-turn questions and answers. The existing CRS solutions are widely dominated by deep reinforcement learning algorithms. However, deep reinforcement learning methods are often criticized for lacking interpretability and requiring a large amount of training data to perform.

In this paper, we explore a simpler alternative and propose a decision tree based solution to CRS. The underlying challenge in CRS is that the same item can be described differently by different users. We show that decision trees are sufficient to characterize the interactions between users and items, and solve the key challenges in multi-turn CRS: namely which questions to ask, how to rank the candidate items, when to recommend, and how to handle user's negative feedback on the recommendations. Firstly, the training of decision trees enables us to find questions which effectively narrow down the search space. Secondly, by learning embeddings for each item and tree nodes, the candidate items can be ranked based on their similarity to the conversation context encoded by the tree nodes. Thirdly, the diversity of items associated with each tree node allows us to develop an early stopping strategy to decide when to make recommendations. Fourthly, when the user rejects a recommendation, we adaptively choose the next decision tree to improve subsequent questions and recommendations. Extensive experiments on three publicly available benchmark CRS datasets show that our approach provides significant improvement to the state of the art CRS methods.

Stop&Hop: Early Classification of Irregular Time Series

Early classification algorithms help users react faster to their machine learning model's predictions. Early warning systems in hospitals, for example, let clinicians improve their patients' outcomes by accurately predicting infections. While early classification systems are advancing rapidly, a major gap remains: existing systems do not consider irregular time series, which have uneven and often-long gaps between their observations. Such series are notoriously pervasive in impactful domains like healthcare. We bridge this gap and study early classification of irregular time series, a new setting for early classifiers that opens doors to more real-world problems. Our solution, Stop&Hop, uses a continuous-time recurrent network to model ongoing irregular time series in real time, while an irregularity-aware halting policy, trained with reinforcement learning, predicts when to stop and classify the streaming series. By taking real-valued step sizes, the halting policy flexibly decides exactly when to stop ongoing series in real time. This way, Stop&Hop seamlessly integrates information contained in the timing of observations, a new and vital source for early classification in this setting, with the time series values to provide early classifications for irregular time series. Using four synthetic and three real-world datasets, we demonstrate that Stop&Hop consistently makes earlier and more-accurate predictions than state-of-the-art alternatives adapted to this new problem. Our code is publicly available at

Change Detection for Local Explainability in Evolving Data Streams

As complex machine learning models are increasingly used in sensitive applications like banking, trading or credit scoring, there is a growing demand for reliable explanation mechanisms. Local feature attribution methods have become a popular technique for post-hoc and model-agnostic explanations. However, attribution methods typically assume a stationary environment in which the predictive model has been trained and remains stable. As a result, it is often unclear how local attributions behave in realistic, constantly evolving settings such as streaming and online applications. In this paper, we discuss the impact of temporal change on local feature attributions. In particular, we show that local attributions can become obsolete each time the predictive model is updated or concept drift alters the data generating distribution. Consequently, local feature attributions in data streams provide high explanatory power only when combined with a mechanism that allows us to detect and respond to local changes over time. To this end, we present CDLEEDS, a flexible and model-agnostic framework for detecting local change and concept drift. CDLEEDS serves as an intuitive extension of attribution-based explanation techniques to identify outdated local attributions and enable more targeted recalculations. In experiments, we also show that the proposed framework can reliably detect both local and global concept drift. Accordingly, our work contributes to a more meaningful and robust explainability in online machine learning.

Modeling Diverse Chemical Reactions for Single-step Retrosynthesis via Discrete Latent Variables

Single-step retrosynthesis is the cornerstone of retrosynthesis planning, which is a crucial task for computer-aided drug discovery. The goal of single-step retrosynthesis is to identify the possible reactants that lead to the synthesis of the target product in one reaction. By representing organic molecules as canonical strings, existing sequence-based retrosynthetic methods treat the product-to-reactant retrosynthesis as a sequence-to-sequence translation problem. However, most of them struggle to identify diverse chemical reactions for a desired product due to the deterministic inference, which contradicts the fact that many compounds can be synthesized through various reaction types with different sets of reactants. In this work, we aim to increase reaction diversity and generate various reactants using discrete latent variables. We propose a novel sequence-based approach, namely RetroDVCAE, which incorporates conditional variational autoencoders into single-step retrosynthesis and associates discrete latent variables with the generation process. Specifically, RetroDVCAE uses the Gumbel-Softmax distribution to approximate the categorical distribution over potential reactions and generates multiple sets of reactants with the variational decoder. Experiments demonstrate that RetroDVCAE outperforms state-of-the-art baselines on both benchmark dataset and homemade dataset. Both quantitative and qualitative results show that RetroDVCAE can model the multi-modal distribution over reaction types and produce diverse reactant candidates.
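
The Gumbel-Softmax relaxation mentioned above can be sketched in a few lines. This shows only the generic sampling trick (add Gumbel noise to the logits, then apply a temperature-scaled softmax); how RetroDVCAE wires it into its encoder and decoder is not described in the abstract, and the function name and parameters here are illustrative.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed categorical sample from unnormalized logits.

    Lower temperatures push the output toward a one-hot vector;
    higher temperatures make it closer to uniform.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw so neither logarithm sees zero.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Each call returns a probability vector over the categories, which stands in for a discrete latent choice while remaining differentiable in frameworks that support automatic differentiation.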

AutoMARS: Searching to Compress Multi-Modality Recommendation Systems

Web applications utilize Recommendation Systems (RS) to address the problem of consumer over-choice. Recent works have taken advantage of multi-modality, or multi-view, input information (such as user interactions, images, texts, and rating scores) to boost recommendation system performance compared with using single-modality information. However, the use of multi-modality input demands much higher computational cost and storage capacity. On the other hand, real-world RS services usually have strict budgets on both time and space to ensure a good customer experience. As a result, the model efficiency of multi-modality recommendation systems has gained increasing importance. Unfortunately, to the best of our knowledge, there is no existing study of a generic compression framework for multi-modality RS. In this paper, we investigate, for the first time, how to compress a multi-modality recommendation system with a fixed budget. Assuming that input information from different modalities is of unequal importance, a good compression algorithm should learn to automatically allocate different resource budgets to each input, based on its importance in maximally preserving recommendation efficacy. To this end, we leverage the tools of neural architecture search (NAS) and distillation and propose Auto Multi-modAlity Recommendation System (AutoMARS), a unified modality-aware model compression framework dedicated to multi-modality recommendation systems. We demonstrate the effectiveness and generality of AutoMARS by testing it on three different Amazon datasets of various sparsity. AutoMARS demonstrates better multi-modality compression performance than previous state-of-the-art compression methods. For example, on the Amazon Beauty dataset, we achieve on average a 20% higher accuracy over previous state-of-the-art methods, while enjoying 65% reduction over baselines. Codes are available at:

Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction

Recent progress in neural information retrieval has demonstrated large gains in quality, while often sacrificing efficiency and interpretability compared to classical approaches. We propose ColBERTer, a neural retrieval model using contextualized late interaction (ColBERT) with enhanced reduction. Along the effectiveness Pareto frontier, ColBERTer dramatically lowers ColBERT's storage requirements while simultaneously improving the interpretability of its token-matching scores. To this end, ColBERTer fuses single-vector retrieval, multi-vector refinement, and optional lexical matching components into one model. For its multi-vector component, ColBERTer reduces the number of stored vectors by learning unique whole-word representations and learning to identify and remove word representations that are not essential to effective scoring. We employ an explicit multi-task, multi-stage training to facilitate using very small vector dimensions. Results on the MS MARCO and TREC-DL collection show that ColBERTer reduces the storage footprint by up to 2.5x, while maintaining effectiveness. With just one dimension per token in its smallest setting, ColBERTer achieves index storage parity with the plaintext size, with very strong effectiveness results. Finally, we demonstrate ColBERTer's robustness on seven high-quality out-of-domain collections, yielding statistically significant gains over traditional retrieval baselines.
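
The late-interaction scoring that ColBERTer builds on can be sketched as follows: each query token vector is matched to its most similar document vector, and the per-token maxima are summed. This shows only the basic ColBERT-style MaxSim operator on plain Python lists. ColBERTer's whole-word aggregation, vector pruning, and single-vector stage are not modeled, and the function names are illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_vecs, doc_vecs):
    """Sum, over query vectors, of the best-matching document vector score."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)
```

Because each query vector contributes exactly one maximum, the per-token matches can be inspected directly, which is the property the abstract relies on for interpretability. Storing fewer document vectors shrinks the index without changing this scoring rule.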

Prediction-based One-shot Dynamic Parking Pricing

Many U.S. metropolitan cities are notorious for their severe shortage of parking spots. To this end, we present a proactive prediction-driven optimization framework to dynamically adjust parking prices. We use state-of-the-art deep learning technologies such as neural ordinary differential equations (NODEs) to design our future parking occupancy rate prediction model given historical occupancy rates and price information. Owing to the continuous and bijective characteristics of NODEs, in addition, we design a one-shot price optimization method given a pre-trained prediction model, which requires only one iteration to find the optimal solution. In other words, we optimize the price input to the pre-trained prediction model to achieve targeted occupancy rates in the parking blocks. We conduct experiments with the data collected in San Francisco and Seattle for years. Our prediction model shows the best accuracy in comparison with various temporal or spatio-temporal forecasting models. Our one-shot optimization method greatly outperforms other black-box and white-box search methods in terms of the search time and always returns the optimal price solution.

Can We Have Both Fish and Bear's Paw?: Improving Performance, Reliability, and both of them for Relation Extraction under Label Shift

Neural Relation Extraction (RE) models need large amounts of labeled data for effective training, which mainly come from automatic labeling by Distant Supervision (DS). Though DS is fast and easy, the label shift problem inevitably arises, i.e., the label distribution of the DS-generated training set is quite different from that of the real world (i.e., the test set). According to our observations, label shift not only degrades performance, but also undermines the reliability of DS-RE models by causing poor confidence estimation. In this paper, we make contributions by answering the following three questions: 1) How can the performance of DS-RE models be improved under label shift? 2) How can their reliability be ensured under label shift? 3) How can both performance and reliability be improved for DS-RE models under label shift? To the best of our knowledge, this is the first paper to study the performance as well as the reliability of DS-RE models under label shift. Experimental results show significant improvements on two real-world datasets and six popular neural RE models, taking a step further towards high-performance and reliable RE systems under real-world label-shift conditions.

One Rating to Rule Them All?: Evidence of Multidimensionality in Human Assessment of Topic Labeling Quality

Two general approaches are common for evaluating automatically generated labels in topic modeling: direct human assessment; or performance metrics that can be calculated without, but still correlate with, human assessment. However, both approaches implicitly assume that the quality of a topic label is single-dimensional. In contrast, this paper provides evidence that human assessments about the quality of topic labels consist of multiple latent dimensions. This evidence comes from human assessments of four simple labeling techniques. For each label, study participants responded to several items asking them to assess each label according to a variety of different criteria. Exploratory factor analysis shows that these human assessments of labeling quality have a two-factor latent structure. Subsequent analysis demonstrates that this multi-item, two-factor assessment can reveal nuances that would be missed using either a single-item human assessment of perceived label quality or established performance metrics. The paper concludes by suggesting future directions for the development of human-centered approaches to evaluating NLP and ML systems more broadly.

Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs

The extraction of aspect terms is a critical step in fine-grained sentiment analysis of text. Existing approaches for this task have yielded impressive results when the training and testing data are from the same domain. However, these methods show a drastic decrease in performance when applied to cross-domain settings where the domain of the testing data differs from that of the training data. To address this lack of extensibility and robustness, we propose a novel approach for automatically constructing domain-specific knowledge graphs that contain information relevant to the identification of aspect terms. We introduce a methodology for injecting information from these knowledge graphs into Transformer models, including two alternative mechanisms for knowledge insertion: via query enrichment and via manipulation of attention patterns. We demonstrate state-of-the-art performance on benchmark datasets for cross-domain aspect term extraction using our approach and investigate how the amount of external knowledge available to the Transformer impacts model performance.

Memory Bank Augmented Long-tail Sequential Recommendation

The goal of sequential recommendation is to predict the next item that a user would like to interact with, by capturing her dynamic historical behaviors. However, most existing sequential recommendation methods do not focus on solving the long-tail item recommendation problem that is caused by the imbalanced distribution of item data. To solve this problem, we propose a novel sequential recommendation framework, named MASR (i.e., Memory Bank Augmented Long-tail Sequential Recommendation). MASR is an "open-book" model that combines novel types of memory banks and a retriever-copy network to alleviate the long-tail problem. During inference, the designed retriever-copy network retrieves related sequences from the training samples and copies the useful information as a cue to improve the recommendation performance on tail items. Two designed memory banks provide reference samples to the retriever-copy network by memorizing the historical samples appearing in the training phase. Extensive experiments have been performed on five real-world datasets to demonstrate the effectiveness of the proposed MASR model. The experimental results indicate that MASR consistently outperforms baseline methods in terms of recommendation performance on tail items.

An Uncertainty-Aware Imputation Framework for Alleviating the Sparsity Problem in Collaborative Filtering

Collaborative Filtering (CF) methods for recommender systems commonly suffer from the data sparsity issue. Data imputation has been widely adopted to deal with this issue. However, existing studies are limited in that neither the uncertainty nor the robustness of imputation has been taken into account, so there is a high risk that the imputed values are far from the true values. This paper explores a novel imputation framework, named Uncertainty-Aware Multiple Imputation (UA-MI), which can effectively address the sparsity issue. Given a (sparse) user-item interaction matrix, our key idea is to quantify the uncertainty of each missing entry and then selectively impute the cells with the lowest uncertainty. Here, we suggest three strategies for measuring uncertainty in missing user-item interactions, based on sampling, dropout, and ensembling, respectively. Each strategy obtains the element-wise mean and variance of the missing entries, where the variance helps determine which cells of the matrix should be imputed and the corresponding mean supplies the imputed value. Experiments show that our UA-MI framework significantly outperformed the existing imputation strategies.
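
The selective imputation idea can be illustrated with a small sketch in the ensemble setting. It assumes that several full-matrix predictions are already available (one per ensemble member) and that "lowest uncertainty" is decided by a fixed variance threshold. Both the threshold rule and the names `impute_low_uncertainty` and `max_variance` are assumptions for illustration; the abstract does not specify how UA-MI selects cells.

```python
from statistics import mean, pvariance


def impute_low_uncertainty(matrix, predictions, max_variance):
    """Fill missing cells (None) whose ensemble variance is small enough.

    matrix:       user-item matrix as a list of rows, None marks a missing entry
    predictions:  list of fully predicted matrices, one per ensemble member
    max_variance: hypothetical cut-off below which a cell is trusted
    """
    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue  # observed entries are left untouched
            samples = [p[i][j] for p in predictions]
            if pvariance(samples) <= max_variance:
                imputed[i][j] = mean(samples)
    return imputed
```

Cells on which the ensemble members disagree stay missing, so the downstream CF model is not trained on unreliable imputed values.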

Beyond Learning from Next Item: Sequential Recommendation via Personalized Interest Sustainability

Sequential recommender systems have produced effective suggestions by capturing users' interest drift. There are two groups of existing sequential models: user-centric and item-centric models. The user-centric models capture personalized interest drift based on each user's sequential consumption history, but do not explicitly consider whether users' interest in items is sustained beyond the training time, i.e., interest sustainability. On the other hand, the item-centric models consider whether users' general interest is sustained after the training time, but this is not personalized. In this work, we propose a recommender system that takes advantage of the models in both categories. Our proposed model captures personalized interest sustainability, indicating whether each user's interest in items will be sustained beyond the training time or not. We first formulate a task that requires predicting which items each user will consume in the recent period of the training time based on users' consumption history. We then propose simple yet effective schemes to augment users' sparse consumption history. Extensive experiments show that the proposed model outperforms 10 baseline models on 11 real-world datasets. The codes are available at:

Discovering Fine-Grained Semantics in Knowledge Graph Relations

Knowledge graphs (KGs) provide structured representation of data in the form of relations between different entities. The semantics of relations between words and entities are often ambiguous, where it is common to find polysemous relations that represent multiple semantics based on the context. This ambiguity in relation semantics also proliferates KG triples. While the guidance from custom-designed ontologies addresses this issue to some extent, our analysis shows that the heterogeneity and complexity of real-world data still results in substantial relation polysemy within popular KGs. The correct semantic interpretation of KG relations is necessary for many downstream applications such as entity classification and question answering. We present the problem of fine-grained relation discovery and a data-driven method towards this task that leverages the vector representations of the knowledge graph entities and relations available from relational learning models. We show that by performing clustering over these vectors, our method is able to not only identify the polysemous relations in knowledge graphs, but also discover the different semantics associated with them. Extensive empirical evaluation shows that fine-grained relations discovered by the proposed approach lead to substantial improvement in the semantics in the Yago and NELL datasets, as compared to baselines. Additional insights from qualitative analyses convey that fine-grained relation discovery is an important yet complex task, especially in the presence of complex ontologies and noisy data.

Accurate Action Recommendation for Smart Home via Two-Level Encoders and Commonsense Knowledge

How can we accurately recommend actions for users to control their devices at home? Action recommendation for smart home has attracted increasing attention due to its potential impact on the markets of Internet of Things (IoT). However, designing an effective action recommender system is challenging because it requires handling context correlations, considering both queried contexts and previous histories of users, and dealing with capricious intentions in history. In this work, we propose SmartSense, an accurate action recommendation method for smart home. For individual action, SmartSense summarizes its device control and temporal contexts in a self-attentive manner, to reflect the importance of the correlation between them. SmartSense then summarizes sequences considering queried contexts in a query-attentive manner to extract the query-related patterns from the sequential actions. SmartSense also transfers the commonsense knowledge from routine data to better handle intentions in action sequences. As a result, SmartSense addresses all three main challenges of action recommendation for smart home, and achieves the state-of-the-art performance giving up to 9.8% higher mAP@1 than the best competitor.

Diverse Effective Relationship Exploration for Cooperative Multi-Agent Reinforcement Learning

In some complex multi-agent environments, the types of relationships between agents are diverse and their intensity changes during the policy learning process. Theoretically, some of these relationships can facilitate cooperative policy learning. However, acquiring these relationships is an intractable problem. To tackle the problem, we propose a diverse effective relationship exploration based multi-agent reinforcement learning (DERE) method. Specifically, a potential fields model is first designed to represent relationships between agents. Then, to encourage the exploration of effective relationships, we define an information-theoretic objective function. Finally, an intrinsic reward function is designed to optimize the information-theoretic objective while guiding agents to learn more effective collaborative policies. Experimental results show that our method outperforms state-of-the-art methods on both super hard StarCraft II micromanagement tasks (SMAC) and Google Research Football (GRF).

Estimating Causal Effects on Networked Observational Data via Representation Learning

In this paper, we study the causal effects estimation problem on networked observational data. We theoretically prove that standard graph machine learning (ML) models, e.g., graph neural networks (GNNs), fail in estimating the causal effects on networks. We show that graph ML models exhibit two distribution mismatches of their objective functions compared to causal effects estimation, leading to the failure of traditional ML models. Motivated by this, we first formulate the networked causal effects estimation as a data-driven multi-task learning problem, and then propose a novel framework NetEst to conduct causal inference in the network setting. NetEst uses GNNs to learn representations for confounders, which are from both a unit's own characteristics and the network effects. The embeddings are then used to sufficiently bridge the distribution gaps via adversarial learning and estimate the observed outcomes simultaneously. Extensive experimental studies on two real-world networks with semi-synthetic data demonstrate the effectiveness of NetEst. We also provide analyses on why and when NetEst works.

Towards Federated Learning against Noisy Labels via Local Self-Regularization

Federated learning (FL) aims to learn joint knowledge from a large number of decentralized devices with labeled data in a privacy-preserving manner. However, data with noisy labels are ubiquitous in reality, since high-quality labeled data require expensive human effort, and such noise causes severe performance degradation. Although many methods have been proposed to deal directly with noisy labels, these methods either require excessive computation overhead or violate the privacy protection principle of FL. To this end, we focus on this issue in FL with the purpose of alleviating the performance degradation caused by noisy labels while guaranteeing data privacy. Specifically, we propose a Local Self-Regularization method, which effectively regularizes the local training process by implicitly hindering the model from memorizing noisy labels and explicitly narrowing the model output discrepancy between original and augmented instances using self-distillation. Experimental results demonstrate that our proposed method can achieve notable resistance against noisy labels at various noise levels on three benchmark datasets. In addition, we integrate our method with existing state-of-the-art methods and achieve superior performance on the real-world dataset Clothing1M. The code is available at

Multi-Scale User Behavior Network for Entire Space Multi-Task Learning

Modelling the user's multiple behaviors is an essential part of modern e-commerce, whose widely adopted application is to jointly optimize click-through rate (CTR) and conversion rate (CVR) predictions. Most existing methods overlook the effect of two key characteristics of the user's behaviors: for each item list, (i) contextual dependence means that the user's behaviors on any item are not purely determined by the item itself but are also influenced by the user's previous behaviors (e.g., clicks, purchases) on other items in the same sequence; (ii) multiple time scales means that users are likely to click frequently but purchase periodically. To this end, we develop a new multi-scale user behavior network named Hierarchical rEcurrent Ranking On the Entire Space (HEROES), which incorporates the contextual information to estimate the user's multiple behaviors in a multi-scale fashion. Concretely, we introduce a hierarchical framework, where the lower layer models the user's engagement behaviors while the upper layer estimates the user's satisfaction behaviors. The proposed architecture can automatically learn a suitable time scale for each layer to capture the user's dynamic behavioral patterns. Besides the architecture, we also introduce the Hawkes process to form a novel recurrent unit which can not only encode the items' features in the context but also formulate the excitation or discouragement from the user's previous behaviors. We further show that HEROES can be extended to build unbiased ranking systems through combinations with the survival analysis technique. Extensive experiments over three large-scale industrial datasets demonstrate the superiority of our model compared with the state-of-the-art methods.
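
The Hawkes process referred to above models how past events raise (or, with a negative weight, lower) the rate of future events. The sketch below evaluates a textbook exponential-kernel intensity, base rate plus a decaying contribution from each earlier event. It is not the HEROES recurrent unit, whose parameterization is not given in the abstract. The clamp at zero is an added assumption so that a negative `alpha`, standing in for discouragement, cannot produce a negative rate.

```python
import math


def hawkes_intensity(t, event_times, base_rate, alpha, beta):
    """Exponential-kernel Hawkes intensity at time t.

    base_rate: background event rate
    alpha:     jump added by each past event (negative values model discouragement)
    beta:      decay rate of each event's influence
    """
    influence = sum(
        alpha * math.exp(-beta * (t - ti))
        for ti in event_times
        if ti < t  # only events strictly before t contribute
    )
    return max(0.0, base_rate + influence)
```

For instance, a click one time unit ago with `alpha=0.5` and `beta=1.0` raises a base rate of 0.2 by `0.5 * exp(-1)`, and the boost fades as more time passes without further events.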
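As a rough illustration of the Hawkes-process idea mentioned in the abstract above, the sketch below evaluates the textbook conditional intensity of a univariate Hawkes process with an exponential kernel. It is not the HEROES recurrent unit: the function name, the parameters `mu`, `alpha`, `beta`, and the clamping at zero (used here so that a negative `alpha` can stand in for "discouragement") are all assumptions made for illustration.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Intensity at time t given past events (hypothetical parameter names).

    mu    : baseline rate
    alpha : jump added by each past event (negative = discouragement, an assumption)
    beta  : exponential decay rate of each event's influence
    """
    influence = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    # Clamp at zero so a negative alpha cannot produce a negative intensity.
    return max(0.0, mu + influence)


clicks = [1.0, 1.5, 4.0]  # made-up click timestamps
print(hawkes_intensity(0.5, clicks))  # before any click: baseline only
print(hawkes_intensity(1.6, clicks))  # shortly after two clicks: elevated
print(hawkes_intensity(3.9, clicks))  # long after them: decayed toward baseline
```

The intensity rises right after a burst of events and decays back to the baseline, which is the qualitative behavior the abstract describes as excitation from previous behaviors.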

Extracting Drug-drug Interactions from Biomedical Texts using Knowledge Graph Embeddings and Multi-focal Loss

The field of Drug-drug interaction (DDI) aims to detect descriptions of interactions between drugs from biomedical texts. Currently, researchers have extracted DDIs using pre-trained language models such as BERT, which often misclassify two kinds of DDI types, "Effect" and "Int", on the DDIExtraction 2013 corpus because of highly similar expressions. The use of knowledge graphs can alleviate this problem by incorporating different relationships for each, thus allowing them to be distinguished. Thus, we propose a novel framework to integrate the neural network with a knowledge graph, where the features from these components are complementary. Specifically, we take text features at different levels into account in the neural network part. This is done by firstly obtaining a word-level position feature using PubMedBERT together with a convolution neural network, secondly, getting a phrase-level key path feature using a dependency parsing tree, thirdly, using PubMedBERT with an attention mechanism to obtain a sentence-level language feature, and finally, fusing these three kinds of representation into a synthesized feature. We also extract a knowledge feature from a drug knowledge graph which takes just a few minutes to construct, then concatenate the synthesized feature with the knowledge feature, feed the result into a multi-layer perceptron and obtain the result by a softmax classifier. In order to achieve a good integration of the synthesized feature and the knowledge feature, we train the model using a novel multifocal loss function, KGE-MFL, which is based on a knowledge graph embedding. Finally we attain state-of-the-art results on the DDIExtraction 2013 dataset (micro F-score 86.24%) and on the ChemProt dataset (micro F-score 77.75%), which proves our framework to be effective for biomedical relation extraction tasks. 
In particular, we fill the performance gap (more than 5.57%) between methods that rely on and do not rely on knowledge graph embedding on the DDIExtraction 2013 corpus, when predicting the "Int" type. The implementation code is available at

X-GOAL: Multiplex Heterogeneous Graph Prototypical Contrastive Learning

Graphs are powerful representations for relations among objects, which have attracted plenty of attention in both academia and industry. A fundamental challenge for graph learning is how to train an effective Graph Neural Network (GNN) encoder without labels, which are expensive and time consuming to obtain. Contrastive Learning (CL) is one of the most popular paradigms to address this challenge, which trains GNNs by discriminating positive and negative node pairs. Despite the success of recent CL methods, there are still two under-explored problems. Firstly, how to reduce the semantic error introduced by random topology based data augmentations. Traditional CL defines positive and negative node pairs via the node-level topological proximity, which is solely based on the graph topology regardless of the semantic information of node attributes, and thus some semantically similar nodes could be wrongly treated as negative pairs. Secondly, how to effectively model the multiplexity of the real-world graphs, where nodes are connected by various relations and each relation could form a homogeneous graph layer. To solve these problems, we propose a novel multiplex heterogeneous graph prototypical contrastive learning (X-GOAL) framework to extract node embeddings. X-GOAL is comprised of two components: the GOAL framework, which learns node embeddings for each homogeneous graph layer, and an alignment regularization, which jointly models different layers by aligning layer-specific node embeddings. Specifically, the GOAL framework captures the node-level information by a succinct graph transformation technique, and captures the cluster-level information by pulling nodes within the same semantic cluster closer in the embedding space. The alignment regularization aligns embeddings across layers at both node level and cluster level.
We evaluate the proposed X-GOAL on a variety of real-world datasets and downstream tasks to demonstrate the effectiveness of the X-GOAL framework.

Can Adversarial Training benefit Trajectory Representation?: An Investigation on Robustness for Trajectory Similarity Computation

Trajectory similarity computation, as a fundamental problem for various downstream analytic tasks such as trajectory classification and clustering, has been extensively studied in recent years. However, inferring an accurate and robust similarity between two trajectories is difficult due to some trajectory characteristics in practice, e.g., non-uniform sampling rate, nonmalignant fluctuation, and noise points. To circumvent such challenges, in this paper we introduce the adversarial training idea into trajectory representation learning for the first time to enhance robustness and accuracy. Specifically, our proposed method AdvTraj2Vec has two novelties: i) it perturbs the weight parameters of embedding layers to learn a robust model to infer an accurate pairwise similarity over each two trajectories; and ii) it employs the GAN momentum to harness the perturbation extent to which an appropriate trajectory representation can be learned for the similarity computation. Extensive experiments using two real-world trajectory datasets, Porto and Beijing, validate our proposed AdvTraj2Vec on the robustness and accuracy aspects. The multi-facet results show that our AdvTraj2Vec significantly outperforms the state-of-the-art methods in terms of different distortions, such as trajectory-point addition, deletion, disturbance, and outlier injection.

Efficient Optimization of Dominant Set Clustering with Frank-Wolfe Algorithms

We study Frank-Wolfe algorithms – standard, pairwise, and away-steps – for efficient optimization of Dominant Set Clustering. We present a unified and computationally efficient framework to employ the different variants of Frank-Wolfe methods, and we investigate its effectiveness via several experimental studies. In addition, we provide explicit convergence rates for the algorithms in terms of the so-called Frank-Wolfe gap. The theoretical analysis has been specialized to Dominant Set Clustering and covers consistently the different variants.
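For readers unfamiliar with the setting, the following is a minimal sketch of the standard Frank-Wolfe variant only, assuming the usual Dominant Set formulation of maximizing x^T A x over the probability simplex for a symmetric similarity matrix A with zero diagonal. The function names, the toy matrix, and the exact line search are illustrative assumptions, not the paper's unified framework; the pairwise and away-step variants are not shown.

```python
def matvec(A, x):
    """Plain-Python matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def objective(A, x):
    """Dominant Set objective x^T A x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))


def frank_wolfe_dominant_set(A, iters=200, tol=1e-9):
    """Standard Frank-Wolfe ascent on the simplex (illustrative sketch)."""
    n = len(A)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iters):
        grad = [2.0 * g for g in matvec(A, x)]  # gradient of x^T A x, A symmetric
        i = max(range(n), key=lambda j: grad[j])  # best simplex vertex e_i
        d = [-v for v in x]
        d[i] += 1.0  # direction d = e_i - x
        gap = sum(g * di for g, di in zip(grad, d))  # Frank-Wolfe gap
        if gap <= tol:
            break
        curv = sum(di * v for di, v in zip(d, matvec(A, d)))  # d^T A d
        # Exact line search for the quadratic along d, clipped to [0, 1].
        step = 1.0 if curv >= 0 else min(1.0, -gap / (2.0 * curv))
        x = [v + step * di for v, di in zip(x, d)]
    return x


# Toy similarities: items 0-2 form a tight group, item 3 is loosely attached.
A = [
    [0.0, 0.9, 0.9, 0.1],
    [0.9, 0.0, 0.9, 0.1],
    [0.9, 0.9, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
x = frank_wolfe_dominant_set(A)
print([round(v, 3) for v in x])
```

Each update is a convex combination of the current point and a simplex vertex, so the iterate stays feasible without a projection step; the weight on the loosely attached item shrinks at every iteration.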

Contrastive Representation Learning for Conversational Question Answering over Knowledge Graphs

This paper addresses the task of conversational question answering (ConvQA) over knowledge graphs (KGs). The majority of existing ConvQA methods rely on full supervision signals with a strict assumption of the availability of gold logical forms of queries to extract answers from the KG. However, creating such a gold logical form is not viable for each potential question in a real-world scenario. Hence, in the case of missing gold logical forms, the existing information retrieval-based approaches use weak supervision via heuristics or reinforcement learning, formulating ConvQA as a KG path ranking problem. Despite missing gold logical forms, an abundance of conversational contexts, such as entire dialog history with fluent responses and domain information, can be incorporated to effectively reach the correct KG path. This work proposes a contrastive representation learning-based approach to rank KG paths effectively. Our approach solves two key challenges. Firstly, it allows weak supervision-based learning that omits the necessity of gold annotations. Second, it incorporates the conversational context (entire dialog history and domain information) to jointly learn its homogeneous representation with KG paths to improve contrastive representations for effective path ranking. We evaluate our approach on standard datasets for ConvQA, on which it significantly outperforms existing baselines on all domains and overall. Specifically, in some cases, the Mean Reciprocal Rank (MRR) and Hit@5 ranking metrics improve by absolute 10 and 18 points, respectively, compared to the state-of-the-art performance.
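The two ranking metrics quoted above can be computed as in the short sketch below. It assumes each question contributes the 1-based rank of its correct KG path in the model's ranked list; the function names and the sample ranks are made up for illustration.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all questions (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hit_at_k(ranks, k=5):
    """Fraction of questions whose correct path is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [1, 2, 10, 4]  # hypothetical ranks of the correct path for four questions
print(round(mean_reciprocal_rank(ranks), 4))  # 0.4625
print(hit_at_k(ranks, k=5))                   # 0.75
```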

Sharper Utility Bounds for Differentially Private Models: Smooth and Non-smooth

In this paper, by introducing the Generalized Bernstein condition, we propose the first $O\left(\frac{\sqrt{p}}{n\epsilon}\right)$ high probability excess population risk bound for differentially private algorithms under the assumptions of $G$-Lipschitz, $L$-smooth, and the Polyak-Łojasiewicz condition, based on the gradient perturbation method. If we replace the properties $G$-Lipschitz and $L$-smooth by $\alpha$-Hölder smoothness (which can be used in the non-smooth setting), the high probability bound becomes $O\left(n^{-\frac{\alpha}{1+2\alpha}}\right)$ w.r.t. $n$, which cannot achieve $O(1/n)$ when $\alpha \in (0,1]$. To solve this problem, we propose a variant of the gradient perturbation method, $\max\{1,g\}$-Normalized Gradient Perturbation (m-NGP). We further show that by normalization, the high probability excess population risk bound under the assumptions of $\alpha$-Hölder smoothness and the Polyak-Łojasiewicz condition can achieve $O\left(\frac{\sqrt{p}}{n\epsilon}\right)$, which is the first $O(1/n)$ high probability excess population risk bound w.r.t. $n$ for differentially private algorithms under non-smooth conditions. Moreover, experimental results show that m-NGP improves the performance of the differentially private model over real datasets.
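The claim above that the Hölder-smooth rate cannot reach 1/n follows from a one-line bound on the exponent, written out here as a worked step (this is elementary algebra on the quoted rate, not a result taken from the paper):

```latex
\[
\frac{\alpha}{1+2\alpha} \;=\; \frac{1}{\tfrac{1}{\alpha}+2} \;\le\; \frac{1}{3}
\quad\text{for } \alpha\in(0,1],
\qquad\text{so}\qquad
n^{-\frac{\alpha}{1+2\alpha}} \;\ge\; n^{-1/3} \;\gg\; \frac{1}{n}
\quad\text{as } n\to\infty .
\]
```

Even in the best case, alpha = 1, the unnormalized bound decays only as n to the power -1/3, which is the gap the normalized variant is said to close.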

Residual Correction in Real-Time Traffic Forecasting

Predicting traffic conditions is tremendously challenging since every road is highly dependent on each other, both spatially and temporally. Recently, to capture this spatial and temporal dependency, specially designed architectures such as graph convolutional networks and temporal convolutional networks have been introduced. While there has been remarkable progress in traffic forecasting, we found that deep-learning-based traffic forecasting models still fail in certain patterns, mainly in event situations (e.g., rapid speed drops). Although it is commonly accepted that these failures are due to unpredictable noise, we found that these failures can be corrected by considering previous failures. Specifically, we observe autocorrelated errors in these failures, which indicates that some predictable information remains. In this study, to capture the correlation of errors, we introduce ResCAL, a residual estimation module for traffic forecasting, as a widely applicable add-on module to existing traffic forecasting models. Our ResCAL calibrates the prediction of the existing models in real time by estimating future errors using previous errors and graph signals. Extensive experiments on METR-LA and PEMS-BAY demonstrate that our ResCAL can correctly capture the correlation of errors and correct the failures of various traffic forecasting models in event situations.
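To make the "autocorrelated errors" observation concrete, the sketch below corrects a base forecast with a first-order autoregressive (AR(1)) estimate of the next error. This is a deliberately simple stand-in, not ResCAL itself, which according to the abstract also uses graph signals; the function names and the sample numbers are assumptions for illustration.

```python
def fit_ar1(residuals):
    """Least-squares AR(1) coefficient phi for e_t ~ phi * e_{t-1}."""
    num = sum(cur * prev for cur, prev in zip(residuals[1:], residuals[:-1]))
    den = sum(prev * prev for prev in residuals[:-1])
    return num / den if den else 0.0


def corrected_forecast(base_prediction, residuals):
    """Add the predicted next residual to the base model's forecast."""
    phi = fit_ar1(residuals)
    return base_prediction + phi * residuals[-1]


# Residuals are (actual - predicted) speeds during a hypothetical speed drop:
# the base model has been overestimating, by a shrinking margin.
past_residuals = [-8.0, -4.0, -2.0]
print(fit_ar1(past_residuals))                    # 0.5
print(corrected_forecast(50.0, past_residuals))   # 49.0
```

If the errors were pure noise, the fitted coefficient would be near zero and the correction would vanish; a clearly non-zero coefficient is the "predictable information" the abstract refers to.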

FedRN: Exploiting k-Reliable Neighbors Towards Robust Federated Learning

Robustness is becoming another important challenge of federated learning in that the data collection process in each client is naturally accompanied by noisy labels. However, it is far more complex and challenging owing to varying levels of data heterogeneity and noise over clients, which exacerbates the client-to-client performance discrepancy. In this work, we propose a robust federated learning method called FedRN, which exploits k-reliable neighbors with high data expertise or similarity. Our method helps mitigate the gap between low- and high-performance clients by training only with a selected set of clean examples, identified by a collaborative model that is built based on the reliability score over clients. We demonstrate the superiority of FedRN via extensive evaluations on three real-world or synthetic benchmark datasets. Compared with existing robust methods, the results show that FedRN significantly improves the test accuracy in the presence of noisy labels.

SWAG-Net: Semantic Word-Aware Graph Network for Temporal Video Grounding

In this paper, to effectively capture non-sequential dependencies among semantic words for temporal video grounding, we propose a novel framework called Semantic Word-Aware Graph Network (SWAG-Net), which adopts graph-guided semantic word embedding in an end-to-end manner. Specifically, we define semantic word features as node features of semantic word-aware graphs and word-to-word correlations as three edge types (i.e., intrinsic, extrinsic, and relative edges) for diverse graph structures. We then apply Semantic Word-aware Graph Convolutional Networks (SW-GCNs) to the graphs for semantic word embedding. For modality fusion and context modeling, the embedded features and video segment features are merged into bi-modal features, and the bi-modal features are aggregated by incorporating local and global contextual information. Leveraging the aggregated features, the proposed method effectively finds a temporal boundary semantically corresponding to a sentence query in an untrimmed video. We verify that our SWAG-Net outperforms state-of-the-art methods on Charades-STA and ActivityNet Captions datasets.

MARIO: Modality-Aware Attention and Modality-Preserving Decoders for Multimedia Recommendation

We address the multimedia recommendation problem, which utilizes items' multimodal features, such as visual and textual modalities, in addition to interaction information. While a number of existing multimedia recommender systems have been developed for this problem, we point out that none of these methods individually capture the influence of each modality at the interaction level. More importantly, we experimentally observe that the learning procedures of existing works fail to preserve the intrinsic modality-specific properties of items. To address above limitations, we propose an accurate multimedia recommendation framework, named MARIO, based on modality-aware attention and modality-preserving decoders. MARIO predicts users' preferences by considering the individual influence of each modality on each interaction while obtaining item embeddings that preserve the intrinsic modality-specific properties. The experiments on four real-life datasets demonstrate that MARIO consistently and significantly outperforms seven competitors in terms of the recommendation accuracy: MARIO yields up to 14.61% higher accuracy, compared to the best competitor.

Semorph: A Morphology Semantic Enhanced Pre-trained Model for Chinese Spam Text Detection

Chinese spam text detection is essential for social media since these texts affect the user experience of Chinese speakers and pollute the community. The underlying text classification method is employed to explore the unique combinations of characters that represent clues of spam information from annotated or further augmented data. However, based on the diversity of Chinese characters in glyphs, the spammers frequently wrap the spam content in another visually close text to fool the model but make sure people understand. This paper proposes to adopt the essence of human cognition of these adversarial texts into spam text detection models, by designing a pre-trained model to learn the morphology semantics of Chinese characters and represent their contextual meanings from scratch. The model pre-trains on self-supervised Chinese corpus and fine-tunes on spam-annotated community texts. Besides, cooperating with the pre-trained model that can capture the morphological features of Chinese, a new data perturbation method is introduced to guide the optimization towards the direction of recognizing the actual meaning of a text after spammers tamper with partial characters by visually close ones. The experimental results have shown that our proposed methodology can notably improve the performance of spam text detection as well as maintain robustness against adversarial samples.

Loyalty-based Task Assignment in Spatial Crowdsourcing

With the fast-paced development of mobile networks and the widespread usage of mobile devices, Spatial Crowdsourcing (SC) has drawn increasing attention in recent years. SC has the potential for collecting information for a broad range of applications such as on-demand local delivery and on-demand transportation. One of the critical issues in SC is task assignment, which allocates location-based tasks (e.g., delivering food and packages) to appropriate moving workers (i.e., intelligent device carriers). In this paper, we study a loyalty-based task assignment problem, which aims to maximize the overall rewards of workers while considering worker loyalty. We propose a two-phase framework to solve the problem, consisting of a worker loyalty prediction phase and a task assignment phase. In the first phase, we use a model based on an efficient time series prediction method called Prophet and an Entropy Weighting method to extract workers' short-term and long-term loyalty and then predict workers' current loyalty scores. In the task assignment phase, we design a Kuhn-Munkres-based algorithm that achieves the optimal task assignment and an efficient Degree-Reduction-based algorithm with a minority-first scheme. Extensive experiments offer insight into the effectiveness and efficiency of the proposed solutions.

Legal Charge Prediction via Bilinear Attention Network

The legal charge prediction task aims to judge appropriate charges according to the given fact description in cases. Most existing methods formulate it as a multi-class text classification problem and have achieved tremendous progress. However, the performance on low-frequency charges is still unsatisfactory. Previous studies indicate leveraging the charge label information can facilitate this task, but the approaches to utilizing the label information are not fully explored. In this paper, inspired by the vision-language information fusion techniques in the multi-modal field, we propose a novel model (denoted as LeapBank) by fusing the representations of text and labels to enhance the legal charge prediction task. Specifically, we devise a representation fusion block based on the bilinear attention network to interact the labels and text tokens seamlessly. Extensive experiments are conducted on three real-world datasets to compare our proposed method with state-of-the-art models. Experimental results show that LeapBank obtains up to 8.5% Macro-F1 improvements on the low-frequency charges, demonstrating our model's superiority and competitiveness.

Accelerating CNN via Dynamic Pattern-based Pruning Network

Recently, dynamic pruning methods have been actively researched, as they have shown very effective and remarkable performance in reducing computation complexity of deep neural networks. Nevertheless, most dynamic pruning methods fail to achieve actual acceleration due to the extra overheads caused by indexing and weight-copying to implement the dynamic sparse patterns for every input sample. To address this issue, we propose Dynamic Pattern-based Pruning Network (DPPNet), which preserves the advantages of both static and dynamic networks. First, our method statically prunes the weight kernel into various sparse patterns. Then, the dynamic convolution kernel is generated via aggregating input-dependent attention weights and static kernels. Unlike previous dynamic pruning methods, our novel method dynamically fuses static kernel patterns, enhancing the kernel's representational power without additional overhead. Moreover, our dynamic sparse pattern enables an efficient process using BLAS libraries, accomplishing actual acceleration. We demonstrate the effectiveness of the proposed DPPNet on CIFAR and ImageNet, outperforming the state-of-the-art methods achieving better accuracy with lower computational cost. For example, on ImageNet classification, ResNet34 utilizing DPP module achieves state-of-the-art performance with 65.6% FLOPs reduction and the inference speed increased by 35.9% without loss in accuracy. Code is available at

Sliding Cross Entropy for Self-Knowledge Distillation

Knowledge distillation (KD) is a powerful technique for improving the performance of a small model by leveraging the knowledge of a larger model. Despite its remarkable performance boost, KD has a drawback with the substantial computational cost of pre-training larger models in advance. Recently, a method called self-knowledge distillation has emerged to improve the model's performance without any supervision. In this paper, we present a novel plug-in approach called Sliding Cross Entropy (SCE) method, which can be combined with existing self-knowledge distillation to significantly improve the performance. Specifically, to minimize the difference between the output of the model and the soft target obtained by self-distillation, we split each softmax representation by a certain window size, and reduce the distance between sliced parts. Through this approach, the model evenly considers all the inter-class relationships of a soft target during optimization. The extensive experiments show that our approach is effective in various tasks, including classification, object detection, and semantic segmentation. We also demonstrate SCE consistently outperforms existing baseline methods.

Relational Self-Supervised Learning on Graphs

Over the past few years, graph representation learning (GRL) has been a powerful strategy for analyzing graph-structured data. Recently, GRL methods have shown promising results by adopting self-supervised learning methods developed for learning representations of images. Despite their success, existing GRL methods tend to overlook an inherent distinction between images and graphs, i.e., images are assumed to be independently and identically distributed, whereas graphs exhibit relational information among data instances, i.e., nodes. To fully benefit from the relational information inherent in the graph-structured data, we propose a novel GRL method, called RGRL, that learns from the relational information generated from the graph itself. RGRL learns node representations such that the relationship among nodes is invariant to augmentations, i.e., augmentation-invariant relationship, which allows the node representations to vary as long as the relationship among the nodes is preserved. By considering the relationship among nodes in both global and local perspectives, RGRL overcomes limitations of previous contrastive and non-contrastive methods, and achieves the best of both worlds. Extensive experiments on fourteen benchmark datasets over various downstream tasks demonstrate the superiority of RGRL over state-of-the-art baselines. The source code for RGRL is available at

Maximum Norm Minimization: A Single-Policy Multi-Objective Reinforcement Learning to Expansion of the Pareto Front

In this paper, we propose Maximum Norm Minimization (MNM), a single-policy Multi-Objective Reinforcement Learning (MORL) algorithm to solve the multi-objective RL problem. The main objective of our MNM is to provide the Pareto optimal points constituting the Pareto front in the multi-objective space. First, MNM measures distances among the Pareto optimal points in the current Pareto front and then normalizes the distances based on maximum and minimum reward values for each objective in the multi-objective space. Second, MNM identifies the maximum norm, i.e., the maximum value of the normalized Pareto optimal distances. Then MNM seeks to find a new Pareto optimal point, which corresponds to the middle of the two Pareto optimal points constituting the maximum norm. By iterating these two processes, MNM is able to expand and densify the Pareto front with increasing summation of the Pareto front volumes and decreasing mean-squared distance of the Pareto optimal points. To validate the performance of MNM, we provide the experimental results of five complex robotic multi-objective environments. In particular, we compare the performance of MNM with those of other state-of-the-art methods in terms of the summation of volumes and the mean-squared distance of the Pareto optimal points.
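The gap-finding step described above can be sketched for the two-objective case as follows. The sketch assumes that only neighboring points (after sorting by the first objective) are compared and that the target is the plain midpoint in reward space; the function name, arguments, and sample front are hypothetical, and the reinforcement learning step that actually reaches the target is not shown.

```python
import math


def largest_gap_midpoint(front, r_min, r_max):
    """Midpoint of the neighboring Pareto points with the largest normalized gap.

    front        : list of (reward_1, reward_2) Pareto optimal points
    r_min, r_max : per-objective minimum and maximum rewards used to normalize
    """
    pts = sorted(front)  # neighbors along the front, sorted by the first objective

    def normalize(p):
        return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, r_min, r_max))

    # The "maximum norm": the pair of neighbors farthest apart after normalization.
    a, b = max(
        zip(pts, pts[1:]),
        key=lambda pair: math.dist(normalize(pair[0]), normalize(pair[1])),
    )
    return tuple((x + y) / 2 for x, y in zip(a, b))


front = [(0.0, 10.0), (2.0, 9.0), (8.0, 3.0), (10.0, 0.0)]
target = largest_gap_midpoint(front, r_min=(0.0, 0.0), r_max=(10.0, 10.0))
print(target)  # (5.0, 6.0)
```

Repeatedly filling the widest gap is what makes the front denser over iterations, in line with the decreasing mean-squared distance reported in the abstract.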

Parallel Skyline Processing Using Space Pruning on GPU

Skyline computation is an essential database operation that has many applications in multi-criteria decision making scenarios such as recommender systems. Existing algorithms have focused on checking point domination, which lack efficiency over large datasets. We propose a grid-based structure that enables grid cell domination checks. We show that only a small constant number of cells need to be checked which is independent from the number of data points. Our structure also enables parallel processing. We thus obtain a highly efficient parallel skyline algorithm named SkyCell, taking advantage of the parallelization power of graphics processing units. Experimental results confirm the effectiveness and efficiency of SkyCell -- it outperforms state-of-the-art algorithms consistently and by up to over two orders of magnitude in the computation time.
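For context, the point-domination check that the abstract describes as the focus of existing algorithms looks like the naive sketch below, which assumes smaller values are better in every dimension. It is the quadratic-time baseline, not SkyCell's grid-based or GPU method; the function names and sample data are illustrative.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Naive skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to venue); lower is better for both.
hotels = [(50, 8), (60, 3), (70, 9), (90, 1), (65, 3)]
print(skyline(hotels))  # [(50, 8), (60, 3), (90, 1)]
```

Every point is compared against every other point here, which is the cost that cell-level domination checks are meant to avoid on large datasets.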

Automated Spatio-Temporal Synchronous Modeling with Multiple Graphs for Traffic Prediction

Traffic prediction plays an important role in many intelligent transportation systems. Many existing works design static neural network architecture to capture complex spatio-temporal correlations, which is hard to adapt to different datasets. Although recent neural architecture search approaches have addressed this problem, it still adopts a coarse-grained search with pre-defined and fixed components in the search space for spatio-temporal modeling. In this paper, we propose a novel neural architecture search framework, entitled AutoSTS, for automated spatio-temporal synchronous modeling in traffic prediction. To be specific, we design a graph neural network (GNN) based architecture search module to capture localized spatio-temporal correlations, where multiple graphs built from different perspectives are jointly utilized to find a better message passing way for mining such correlations. Further, we propose a convolutional neural network (CNN) based architecture search module to capture temporal dependencies with various ranges, where gated temporal convolutions with different kernel sizes and convolution types are designed in search space. Extensive experiments on six public datasets demonstrate that our model can achieve 4%-10% improvements compared with other methods.

MDGCF: Multi-Dependency Graph Collaborative Filtering with Neighborhood- and Homogeneous-level Dependencies

Due to the success of graph convolutional networks (GCNs) in effectively extracting features in non-Euclidean spaces, GCNs have become the rising star in implicit collaborative filtering. Existing works, while encouraging, typically adopt a simple aggregation operation on the user-item bipartite graph to model user and item representations, but neglect to sufficiently mine the dependencies between nodes, e.g., the relationships between users/items and their neighbors (or congeners), resulting in inadequate graph representation learning. To address these problems, we propose a novel Multi-Dependency Graph Collaborative Filtering (MDGCF) model, which mines the neighborhood- and homogeneous-level dependencies to enhance the representation power of graph-based CF models. Specifically, for neighborhood-level dependencies, we explicitly consider both popularity score and preference correlation by designing a joint neighborhood-level dependency weight, based on which we construct a neighborhood-level dependency graph to capture higher-order interaction features. Besides, by adaptively mining the homogeneous-level dependencies among users and items, we construct two homogeneous graphs, based on which we further aggregate features from homogeneous users and items to supplement their representations, respectively. Extensive experiments on three real-world benchmark datasets demonstrate the effectiveness of the proposed MDGCF. Further experiments reveal that our model can capture rich dependencies between nodes for explaining user behaviors.

CoPatE: A Novel Contrastive Learning Framework for Patent Embeddings

Patents are legal rights issued to inventors to protect their inventions for a certain period and play an important role in today's artificial innovation. With the ever-increasing number of patents each year, an effective and efficient patent management and search system is indispensable for determining, from the vast amount of patent data, how different an invention is from prior works. However, the approach technologists are using now is still based on the traditional keyword-based Boolean search strategy, which requires complex Boolean expressions. This type of strategy leads to poor performance and requires too much manual labor for filtering in post-processing. To address these issues, we propose CoPatE, a novel Contrastive Learning Framework for Patent Embeddings that captures the high-level semantics of large-scale patents, where a patent semantic compression module learns the informative claims to reduce the computational complexity, and a tags auxiliary learning module enhances the semantics of a patent from the structure to learn high-quality patent embeddings. CoPatE is trained with the patents from USPTO from 2013 to 2020 and tested on the patents from 2021 with the CPC scheme. The experimental results demonstrate that our model achieves a 17.7% increase at Recall@100 compared to the second-best method on the patent retrieval task and achieves 64.5% at Micro-F1 in the patent classification task.

SK2: Integrating Implicit Sentiment Knowledge and Explicit Syntax Knowledge for Aspect-Based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) plays an indispensable role in web mining and retrieval system as it involves a wide range of tasks, including aspect term extraction, opinion term extraction, aspect sentiment classification, etc. Early works are merely applicable to a part of these tasks, leading to computation-unfriendly models and a pipeline framework. Recently, a unified framework has been proposed to learn all these ABSA tasks in an end-to-end fashion. Despite its versatility, its performance is still sub-optimal since ABSA tasks depend heavily on both sentiment and syntax knowledge, but existing task-specific knowledge integration methods are hardly applicable to such a unified framework. Therefore, we propose a brand-new unified framework for ABSA in this work, which incorporates both implicit sentiment knowledge and explicit syntax knowledge to better complete all ABSA tasks. To effectively incorporate implicit sentiment knowledge, we first design a self-supervised pre-training procedure that is general enough to all ABSA tasks. It consists of conjunctive words prediction (CWP) task, sentiment-word polarity prediction (SPP) task, attribute nouns prediction (ANP) task, and sentiment-oriented masked language modeling (SMLM) task. Empowered by the pre-training procedure, our framework acquires strong abilities in sentiment representation and sentiment understanding. Meantime, considering a subtle syntax variation can significantly affect ABSA, we further explore a sparse relational graph attention network (SR-GAT) to introduce explicit aspect-oriented syntax knowledge. By combining both worlds of knowledge, our unified model can better represent and understand the input texts towards all ABSA tasks. Extensive experiments show that our proposed framework achieves consistent and significant improvements on all ABSA tasks.

SPOT: Knowledge-Enhanced Language Representations for Information Extraction

Knowledge-enhanced pre-trained models for language representation have been shown to be more effective in knowledge base construction tasks (i.e., relation extraction) than language models such as BERT. These knowledge-enhanced language models incorporate knowledge into pre-training to generate representations of entities or relationships. However, existing methods typically represent each entity with a separate embedding. As a result, these methods struggle to represent out-of-vocabulary entities, they require a large number of parameters on top of their underlying token models (i.e., the transformer), and the number of entities that can be handled is limited in practice due to memory constraints. Moreover, existing models still struggle to represent entities and relationships simultaneously. To address these problems, we propose a new pre-trained model that learns representations of both entities and relationships from token spans and span pairs in the text respectively. By encoding spans efficiently with span modules, our model can represent both entities and their relationships but requires fewer parameters than existing models. We pre-train our model with the knowledge graph extracted from Wikipedia and test it on a broad range of supervised and unsupervised information extraction tasks. Results show that our model learns better representations for both entities and relationships than baselines, while in supervised settings, fine-tuning our model outperforms RoBERTa consistently and achieves competitive results on information extraction tasks.

Multi-agent Transformer Networks for Multimodal Human Activity Recognition

Human activity recognition has for years been an important unresolved challenge with promising benefits in various applications. Existing approaches have made great progress by applying deep-learning and attention-based methods. However, the deep learning-based approaches may not fully exploit the features to resolve multimodal human activity recognition tasks. Also, the potential of attention-based methods still has not been fully explored to better extract the multimodal spatial-temporal relationship and produce robust results. In this work, we propose Multi-agent Transformer Network (MATN), a multi-agent attention-based deep learning algorithm, to address the above issues in multimodal human activity recognition. We first design a unified representation learning layer to encode the multimodal data, which preprocesses the data in a generalized and efficient way. Then we develop a multimodal spatial-temporal transformer module that applies the attention mechanism to extract the salient spatial-temporal features. Finally, we use a multi-agent training module to collaboratively select the informative modalities and predict the activity labels. We have conducted extensive experiments to evaluate MATN's performance on two public multimodal human activity recognition datasets. The results show that our model achieves competitive performance compared to the state-of-the-art approaches, and also demonstrate its scalability, effectiveness, and robustness.

Frequent Itemset Mining with Local Differential Privacy

With the development of the Internet, a large amount of transaction data (e.g., shopping records, web browsing history), which represents user data, has been generated. By collecting user transaction data and learning specific patterns and association rules from it, service providers can provide better services. However, because of the increasing privacy awareness and the formulation of laws on data protection, collecting data directly from users will raise privacy concerns. The concept of local differential privacy (LDP), which provides strict data privacy protection on the user side and allows effective statistical analysis on the server side, is able to protect user privacy and perform statistics on sensitive issues at the same time. This paper adopts the padding-and-sampling-based frequency oracle (PSFO), combined with an interactive query-response method satisfying local differential privacy, to identify frequent itemsets in an efficient and accurate way. On this basis, this paper proposes FIML, an improved algorithm for finding frequent itemsets in the LDP setting of transaction data. The data collector generates frequent candidate sets based on the results of the previous stage and uses them for querying, and users randomize their responses in a reduced domain to achieve local differential privacy. Extensive experiments on real-world and synthetic datasets show that the FIML algorithm can find frequent itemsets more efficiently with the same privacy protection and computational cost.
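
The idea of users randomizing their responses over a small domain can be illustrated with generalized randomized response, a standard LDP building block. The sketch below is not PSFO or FIML; the function names, the toy domain, and the privacy budget are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """User side: keep the true value with probability p, else report another value uniformly."""
    k = len(domain)  # assumes k >= 2
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])


def grr_estimate(reports, domain, epsilon):
    """Server side: unbiased frequency estimates computed from the perturbed reports."""
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting any specific other value
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


# Toy run: 20,000 hypothetical users, each holding one item from a 4-item domain.
rng = random.Random(0)
domain = ["a", "b", "c", "d"]
truth = ["a"] * 10000 + ["b"] * 6000 + ["c"] * 3000 + ["d"] * 1000
reports = [grr_perturb(v, domain, 2.0, rng) for v in truth]
estimates = grr_estimate(reports, domain, 2.0)
```

Because the probability of reporting the true value shrinks as the domain grows, the estimates become noisier for large domains, which is one reason for querying in a reduced candidate domain as the abstract describes.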

AdaDebunk: An Efficient and Reliable Deep State Space Model for Adaptive Fake News Early Detection

Automatically detecting fake news as early as possible becomes increasingly necessary. Conventional approaches of fake news early detection (FNED) verify news' veracity with a predefined and indiscriminate detection position, which depends on domain experience and leads to unstable performance. More advanced methods address this problem with a proposed concept of adaptive detection position (ADP), i.e. the position where the veracity of the news record can be concluded. Yet these methods either lack theoretical reliability or weaken complex dependencies among multi-aspect clues, thus failing to provide practical and reasonable detection. This work focuses on the adaptive FNED problem and proposes a novel efficient and reliable deep state space model, namely AdaDebunk, which models the complex probabilistic dependencies. Specifically, a Bayes' theorem-based dynamic inference algorithm is designed to infer the ADPs and veracity, supporting the accumulation of multi-aspect clues. Besides, a training mechanism with hybrid loss is also designed to solve the over-/under-fitting problems, which further trades off the performance and generalization ability. Experiments on two real-world fake news datasets are conducted to evaluate the effectiveness and superiority of AdaDebunk. Compared with the state-of-the-art baselines, AdaDebunk achieves a 10% increase in F1 performance. Meanwhile, a case study is provided to demonstrate the reliability of AdaDebunk as well as our research motivation.

Heterogeneous Graph Attention Network for Drug-Target Interaction Prediction

Identification of drug-target interactions (DTIs) is crucial for drug discovery and drug repositioning. Existing graph neural network (GNN) based methods only aggregate information from directly connected nodes restricted to a drug-related or a target-related network, and are incapable of capturing long-range dependencies in the biological heterogeneous graph. In this paper, we propose the heterogeneous graph attention network (HGAN) to capture the complex structures and rich semantics in the biological heterogeneous graph for DTI prediction. HGAN enhances heterogeneous graph structure learning from both the intra-layer perspective and the inter-layer perspective. Concretely, we develop an enhanced graph attention diffusion layer (EGADL), which efficiently builds connections between node pairs that may not be directly connected, enabling information passing from important nodes multiple hops away. By stacking multiple EGADLs, we further enlarge the receptive field from the inter-layer perspective. HGAN outperforms 15 state-of-the-art methods on two heterogeneous biological datasets, achieving results close to 1 in terms of AUC and AUPR. We also find that enlarging receptive fields from the inter-layer perspective (stacking layers) is more effective than doing so from the intra-layer perspective (attention diffusion) for HGAN to achieve promising DTI prediction performance. The code is available at

℘-MinHash Algorithm for Continuous Probability Measures: Theory and Application to Machine Learning

This paper studies the scale-invariant "probability Jaccard" (ProbJ), denoted as ℐ, which is another variant of weighted Jaccard similarity. The standard and commonly used Jaccard index is not invariant to data scaling. Thus, the probability Jaccard can be a potentially useful extension to probability distributions. Before our paper, hashing ℐ for continuous probability measures was an open problem, for which rigorous definitions and analysis were still absent from the literature. In our work, we solve this problem systematically and completely. Specifically, we formalize the definition of ℐ in continuous measure space, and propose a general ℘-MinHash sampling algorithm which generates samples following any target distribution, and preserves ℐ between two distributions by the hash collision. In addition, a refined early stopping rule is proposed under a practical boundedness assumption. We validate the theory through simulation and experiments, and demonstrate the application of our method in machine learning problems.
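
The scale-invariance claim can be checked numerically in the discrete case. The sketch below assumes the commonly used discrete form of the probability Jaccard for non-negative vectors, namely the sum over coordinates i with x_i, y_i > 0 of 1 / Σ_j max(x_j / x_i, y_j / y_i); it does not implement the paper's continuous-measure definition or the ℘-MinHash sampler, and the function names are illustrative.

```python
def weighted_jaccard(x, y):
    """Standard weighted Jaccard: sum of coordinate-wise minima over sum of maxima."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def prob_jaccard(x, y):
    """Discrete probability Jaccard for non-negative vectors of equal length."""
    total = 0.0
    for xi, yi in zip(x, y):
        if xi > 0 and yi > 0:
            denom = sum(max(xj / xi, yj / yi) for xj, yj in zip(x, y))
            total += 1.0 / denom
    return total


x = [1.0, 2.0, 0.0]
y = [2.0, 1.0, 1.0]
y_scaled = [3.0 * v for v in y]

# Rescaling y changes the weighted Jaccard but leaves the probability Jaccard unchanged.
wj_before, wj_after = weighted_jaccard(x, y), weighted_jaccard(x, y_scaled)
pj_before, pj_after = prob_jaccard(x, y), prob_jaccard(x, y_scaled)
```

Only ratios within each vector enter `prob_jaccard`, so multiplying either input by a positive constant cancels out, which is the invariance the abstract refers to.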

GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks

We propose using "powered generalized min-max" (pGMM) hashed (linearized) via the "generalized consistent weighted sampling" (GCWS) for training (deep) neural networks (hence the name "GCWSNet"). The pGMM and several related kernels were proposed in 2017. We demonstrate that pGMM hashed by GCWS provides a numerically stable scheme for applying power transformation on the original data, regardless of the magnitude of p and the data. Our experiments show that GCWSNet often improves the accuracy. It is also evident that GCWSNet converges substantially faster, reaching reasonable accuracy with merely one epoch of the training process. This property is much desired because many applications, such as advertisement click-through rate (CTR) prediction models, or data streams (i.e., data seen only once), often train just one epoch. Another beneficial side effect is that the computations of the first layer of the neural networks become additions instead of multiplications because the input data become binary and highly sparse.
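
As a point of reference, the pGMM kernel itself (before any hashing) can be sketched in a few lines. The form used here is an assumption based on the usual statement of the generalized min-max kernel: each real vector is split into its positive and negative parts, and the kernel is the sum of p-th powers of coordinate-wise minima divided by the sum of p-th powers of coordinate-wise maxima. The GCWS hashing step is not shown, and the function names are illustrative.

```python
def split_signs(u):
    """Map a real vector to a non-negative vector of twice the length (positive part, negative part)."""
    out = []
    for v in u:
        out.append(v if v > 0 else 0.0)
        out.append(-v if v < 0 else 0.0)
    return out


def pgmm(u, v, p=1.0):
    """Powered generalized min-max kernel between two real vectors, assuming p > 0."""
    a, b = split_signs(u), split_signs(v)
    num = sum(min(s, t) ** p for s, t in zip(a, b))
    den = sum(max(s, t) ** p for s, t in zip(a, b))
    return num / den if den > 0 else 0.0


k1 = pgmm([1.0, -2.0], [2.0, -1.0], p=1.0)  # (1 + 1) / (2 + 2)
k2 = pgmm([1.0, -2.0], [2.0, -1.0], p=2.0)  # (1 + 1) / (4 + 4)
```

Computing the powers directly as above can overflow or underflow for large p or large data values, which is the practical difficulty that the hashed scheme in the abstract is said to avoid.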

Empirical comparisons with (normalized) random Fourier features (NRFF) are provided. We also propose to reduce the model size of GCWSNet by count-sketch and develop the theory for analyzing the impact of using count-sketch on the accuracy of GCWS. Our analysis shows that an "8-bit" strategy should provide a good trade-off between accuracy and model size. There are other ways to take advantage of GCWS. For example, one can apply GCWS on the last layer to boost the accuracy of trained deep neural nets.
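
The count-sketch mentioned above compresses a high-dimensional vector by hashing each coordinate to one of a small number of buckets with a random sign. The following is a generic sketch of that idea, not the paper's construction or analysis; pre-drawn random tables stand in for hash functions, and all names and sizes are assumptions.

```python
import random


def make_count_sketch(dim, width, seed=0):
    """Return a function that compresses length-`dim` vectors into `width` buckets."""
    rng = random.Random(seed)
    buckets = [rng.randrange(width) for _ in range(dim)]  # bucket index per coordinate
    signs = [rng.choice((-1, 1)) for _ in range(dim)]     # random sign per coordinate

    def sketch(vec):
        out = [0.0] * width
        for i, v in enumerate(vec):
            out[buckets[i]] += signs[i] * v
        return out

    return sketch


sketch = make_count_sketch(dim=8, width=4, seed=1)
compressed = sketch([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # one bucket holds +5 or -5
```

The map is linear, so sketches of sparse binary inputs can be accumulated by additions, and a smaller `width` trades accuracy for model size.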

Gromov-Wasserstein Guided Representation Learning for Cross-Domain Recommendation

Cross-Domain Recommendation (CDR) has attracted increasing attention in recent years as a solution to the data sparsity issue. The fundamental paradigm of prior efforts is to train a mapping function based on the overlapping users/items and then apply it to the knowledge transfer. However, due to the commercial privacy policy and the sensitivity of user data, it is unrealistic to explicitly share the user mapping relations and behavior data. Therefore, in this paper, we consider a more practical cross-domain scenario, where there is no explicit overlap between the source and target domains in terms of users/items. Since the user sets of both domains are drawn from the entire population, there may be commonalities between their user characteristics, resulting in comparable user preference distributions. Thus, without the mapping relations at user level, it is feasible to model this distribution-level relation to transfer knowledge between domains. To this end, we propose a novel framework, GWCDR, that improves the effect of representation learning on the target domain by aligning the representation distributions between the source and target domains. In addition, GWCDR can be easily integrated with existing single-domain collaborative filtering methods to achieve cross-domain recommendation. Extensive experiments on two pairs of public bidirectional datasets demonstrate the effectiveness of our proposed framework in enhancing the recommendation performance.

Spatiotemporal-aware Session-based Recommendation with Graph Neural Networks

Session-based recommendation (SBR) aims to recommend items based on user behaviors in a session. For the online life service platforms, such as Meituan, both the user's location and the current time primarily cause the different patterns and intents in user behaviors. Hence, spatiotemporal context plays a significant role in the recommendation on those platforms, which motivates an important problem of spatiotemporal-aware session-based recommendation (STSBR). Since the spatiotemporal context is introduced, there are two critical challenges: 1) how to capture session-level relations of spatiotemporal context (inter-session view), and 2) how to model the complex user decision-making process at a specific location and time (intra-session view). To address them, we propose a novel solution named STAGE in this paper. Specifically, STAGE first constructs a global information graph to model the multi-level relations among all sessions, and a session decision graph to capture the complex user decision process for each session. STAGE then performs inter-session and intra-session embedding propagation on the constructed graphs with the proposed graph attentive convolution (GAC) to learn representations from the above two perspectives. Finally, the learned representations are combined with spatiotemporal-aware soft-attention for final recommendation. Extensive experiments on two datasets from Meituan demonstrate the superiority of STAGE over state-of-the-art methods. Further studies also verify that each component is effective.

Dynamic Network Embedding via Temporal Path Adjacency Matrix Factorization

Network embedding has been widely investigated to learn low-dimensional node representations of networks, and serves many downstream machine learning tasks. Previous network embedding studies mainly focus on static networks, and cannot adapt well to the characteristics of dynamic networks, which evolve over time. Some works on dynamic network embedding have tried to improve the computation efficiency for incremental updates of embedding vectors, while others have made efforts to utilize temporal information to enhance the quality of embedding vectors. However, few existing works can fulfill both efficiency and quality requirements. In this article, a novel dynamic network embedding model named TPANE (Temporal Path Adjacency Matrix based Network Embedding) is proposed. It employs a new network proximity measure, Temporal Path Adjacency, which is capable of capturing the temporal dependency between edges as well as being incrementally computed in an efficient way. It evaluates the similarity between nodes via the count of temporal paths between them, rather than making a random sampling approximation, and adopts matrix factorization to obtain embedding vectors. Link prediction experiments on various real-world dynamic networks have been conducted to show the superior performance of TPANE against other state-of-the-art methods. Time consumption analysis also shows that TPANE is more efficient in incremental updates.

TrajFormer: Efficient Trajectory Classification with Transformers

Transformers have been an efficient alternative to recurrent neural networks in many sequential learning tasks. When adapting transformers to modeling trajectories, we encounter two major issues. First, being originally designed for language modeling, transformers assume regular intervals between input tokens, which contradicts the irregularity of trajectories. Second, transformers often suffer high computational costs, especially for long trajectories. In this paper, we address these challenges by presenting a novel transformer architecture entitled TrajFormer. Our model first generates continuous point embeddings by jointly considering the input features and the information of spatio-temporal intervals, and then adopts a squeeze function to speed up the representation learning. Moreover, we introduce an auxiliary loss to ease the training of transformers using the supervision signals provided by all output tokens. Extensive experiments verify that our TrajFormer achieves a preferable speed-accuracy balance compared to existing approaches.

Quantifying and Mitigating Popularity Bias in Conversational Recommender Systems

Conversational recommender systems (CRS) have shown great success in accurately capturing a user's current and detailed preference through the multi-round interaction cycle while effectively guiding users to a more personalized recommendation. Perhaps surprisingly, conversational recommender systems can be plagued by popularity bias, much like traditional recommender systems. In this paper, we systematically study the problem of popularity bias in CRSs. We demonstrate the existence of popularity bias in existing state-of-the-art CRSs from an exposure rate, a success rate, and a conversational utility perspective, and propose a suite of popularity bias metrics designed specifically for the CRS setting. We then introduce a debiasing framework with three unique features: (i) Popularity-Aware Focused Learning, to reduce the popularity-distorting impact on preference prediction; (ii) Cold-Start Item Embedding Reconstruction via Attribute Mapping, to improve the modeling of cold-start items; and (iii) Dual-Policy Learning, to better guide the CRS when dealing with either popular or unpopular items. Through extensive experiments on two frequently used CRS datasets, we find the proposed model-agnostic debiasing framework not only mitigates the popularity bias in state-of-the-art CRSs but also improves the overall recommendation performance.

Cascade Variational Auto-Encoder for Hierarchical Disentanglement

While deep generative models pave the way for many emerging applications, decreased interpretability for larger model sizes and complexities hinders their generalizability to wide domains such as economy, security, healthcare, etc. Considering this obstacle, a common practice is to learn interpretable representations through latent feature disentanglement, aiming to expose a set of mutually independent factors of data variations. However, existing methods either fail to catch the trade-off between the synthetic data quality and model interpretability, or consider the first-order feature disentangling only, overlooking the fact that a subset of salient features can carry decomposable semantic meanings and hence be of high-order in nature. Hence, in this paper we propose a novel generative modeling paradigm by introducing a Bayesian network-based regularizer on a cascade Variational Auto-Encoder (VAE). Specifically, this regularizer guides the learner to discover a representation space that comprises both first-order disentangled features and high-order salient features, with the feature interplay captured by the Bayesian structure. Experiments demonstrate that this regularizer gives us free control over the representation space and can guide the learner to discover decomposable semantic meanings by capturing the interplay among independent factors. Meanwhile, we conduct extensive benchmark experiments on six widely-used vision datasets, and the results show that our approach outperforms the state-of-the-art VAE competitors in terms of the trade-off between the synthetic data quality and model interpretability. Although our design is framed in the VAE regime, it is in effect generic and amenable to both GANs and VAEs, letting them concurrently enjoy both high model interpretability and high synthesis quality.

High-quality Task Division for Large-scale Entity Alignment

Entity Alignment (EA) aims to match equivalent entities that refer to the same real-world objects and is a key step for Knowledge Graph (KG) fusion. Most neural EA models cannot be applied to large-scale real-life KGs due to their excessive consumption of GPU memory and time. One promising solution is to divide a large EA task into several subtasks such that each subtask only needs to match two small subgraphs of the original KGs. However, it is challenging to divide the EA task without losing effectiveness. Existing methods display low coverage of potential mappings, insufficient evidence in context graphs, and largely differing subtask sizes.

In this work, we design the DivEA framework for large-scale EA with high-quality task division. To include in the EA subtasks a high proportion of the potential mappings originally present in the large EA task, we devise a counterpart discovery method that exploits the locality principle of the EA task and the power of trained EA models. Unique to our counterpart discovery method is the explicit modelling of the chance of a potential mapping. We also introduce an evidence passing mechanism to quantify the informativeness of context entities and find the most informative context graphs with flexible control of the subtask size. Extensive experiments show that DivEA achieves higher EA performance than alternative state-of-the-art solutions.

Predicting Intraoperative Hypoxemia with Hybrid Inference Sequence Autoencoder Networks

We present an end-to-end model using streaming physiological time series to predict near-term risk for hypoxemia, a rare but life-threatening condition known to cause serious patient harm during surgery. Inspired by the fact that a hypoxemia event is defined based on a future sequence of low SpO2 (i.e., blood oxygen saturation) instances, we propose the hybrid inference network (hiNet) that makes hybrid inference on both future low SpO2 instances and hypoxemia outcomes. hiNet integrates 1) a joint sequence autoencoder that simultaneously optimizes a discriminative decoder for label prediction, and 2) two auxiliary decoders trained for data reconstruction and forecast, which seamlessly learn contextual latent representations that capture the transition from present states to future states. All decoders share a memory-based encoder that helps capture the global dynamics of patient measurement. For a large surgical cohort of 72,081 surgeries at a major academic medical center, our model outperforms strong baselines including the model used by the state-of-the-art hypoxemia prediction system. With its capability to make real-time predictions of near-term hypoxemia at clinically acceptable alarm rates, hiNet shows promise in improving clinical decision making and easing the burden of perioperative care.

Task Assignment with Federated Preference Learning in Spatial Crowdsourcing

Spatial Crowdsourcing (SC) is ubiquitous in the online world today. As we have transitioned from crowdsourcing applications (e.g., Wikipedia) to SC applications (e.g., Uber), there is substantial precedent that SC systems have a responsibility not only for effective task assignment but also for privacy protection. To address these often-conflicting responsibilities, we propose a framework, Task Assignment with Federated Preference Learning, which performs task assignment based on worker preferences while keeping the data decentralized and private in each platform center (e.g., each delivery center of an SC company). The framework includes two phases, i.e., a federated preference learning phase and a task assignment phase. Specifically, in the first phase, we design a local preference model for each platform center based on historical data. Meanwhile, horizontal federated learning with a client-server structure is introduced to collaboratively train these local preference models under the orchestration of a central server. The task assignment phase aims to achieve effective and efficient task assignment by considering workers' preferences. Extensive evaluations over real data show the effectiveness and efficiency of the paper's proposals.
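
The client-server training described above can be illustrated with the standard federated averaging step, in which a central server combines locally trained parameters weighted by local data size. This is a generic sketch and not necessarily the aggregation rule used in the paper; the function name and the toy numbers are assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Server-side aggregation: average client parameter vectors, weighted by local data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical platform centers with locally trained 2-parameter models.
local_models = [[1.0, 2.0], [3.0, 4.0]]
local_sizes = [1, 3]  # number of local training records at each center
global_model = federated_average(local_models, local_sizes)  # [2.5, 3.5]
```

Only model parameters leave each center in this scheme; the raw historical data stays local, which matches the decentralization goal stated in the abstract.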

DA-Net: Distributed Attention Network for Temporal Knowledge Graph Reasoning

Predicting future events in dynamic knowledge graphs has attracted significant attention. Existing work models the historical information in a holistic way, which achieves satisfactory performance. However, in real-world scenarios, the influence of historical information on future events is changing over time. Therefore, it is difficult to distinguish the historical information of different roles by invariably embedding historical entities with simple vector stacking. Furthermore, it is laborious to explicitly learn a distributed representation of each historical repetitive fact at different timestamps. This poses a challenge to the widely adopted codec-based architectures. In this paper, we propose a novel model for predicting future events, namely Distributed Attention Network (DA-Net). Rather than obtaining the fixed representations of historical events, DA-Net attempts to learn the distributed attention of future events on repetitive facts at different historical timestamps inspired by human cognitive theory. In human cognitive theory, when humans make a decision, similar historical events are replayed during memory recall. Based on memory, the original intention is adjusted according to their recent knowledge developments, making the action more reasonable to the context. Experiments on four benchmark datasets demonstrate a substantial improvement of DA-Net on multiple evaluation metrics.

Unsupervised Hierarchical Graph Pooling via Substructure-Sensitive Mutual Information Maximization

Graph pooling plays a vital role in learning graph embeddings. Due to the lack of label information, unsupervised graph pooling has received much attention, primarily via mutual information (MI). However, most existing MI-based pooling methods only preserve node features while overlooking the hierarchical substructural information. In this paper, we propose SMIP, a novel unsupervised hierarchical graph pooling method based on substructure-sensitive MI maximization. SMIP reconstructs a hard-style substructure encoder based on the cluster-based pooling paradigm, and trains it with two substructure-sensitive MI-based objectives, i.e., node-substructure MI and node-node MI. The node-substructure MI guides the transfer of maximum node feature information into the corresponding substructures, and the node-node MI guarantees a more accurate node allocation. Moreover, to avoid extra computation of augmented graphs and prevent noise information during MI estimation, we propose a local-scope contrastive MI estimation method, making SMIP more potent in capturing intrinsic features of the input graph. Experiments on six benchmark graph classification datasets demonstrate that our hierarchical deep learning approach outperforms all state-of-the-art unsupervised GNN-based methods and even surpasses the performance of nine supervised ones. A generalization study shows that the proposed substructure-sensitive MI objective can be successfully embedded into other cluster-based pooling methods to improve their performance.

Efficient Learning with Pseudo Labels for Query Cost Estimation

Query cost estimation, which is to estimate the query plan cost and query execution cost, is of utmost importance to query optimizers. Query plan cost estimation heavily relies on accurate cardinality estimation, and query execution cost estimation gives good hints on query latency, both of which are challenging in database management systems. Despite decades of research, existing studies either over-simplify the models only using histograms and polynomial calculation that leads to inaccurate estimates, or over-complicate them by using cumbersome neural networks with the requirements for large amounts of training data hence poor computational efficiency. Besides, most of the studies ignore the diversity of query plan structures. In this work, we propose a plan-based query cost estimation framework, called Saturn, which can estimate cardinality and latency accurately and efficiently, for any query plan structures. Saturn first encodes each query plan tree into a compressed vector by using a traversal-based query plan autoencoder to cope with diverse plan structures. The compressed vectors can be leveraged to distinguish different query types, which is highly useful for downstream tasks. Then a pseudo label generator is designed to acquire all cardinality and latency labels with the execution part of the query plans in the training workload, which can significantly reduce the overhead of collecting the real cardinality and latency labels. Finally, a chain-wise transfer learning module is proposed to estimate the cardinality and latency of the query plan in a pipeline paradigm, which further enhances the efficiency. An extensive empirical study on benchmark data offers evidence that Saturn outperforms the state-of-the-art proposals in terms of accuracy, efficiency, and generalizability for query cost estimation.

HeGA: Heterogeneous Graph Aggregation Network for Trajectory Prediction in High-Density Traffic

Trajectory prediction enables the fast and accurate response of autonomous driving navigation in complex and dense traffic. In this paper, we present a novel trajectory prediction network called Heterogeneous Graph Aggregation (HeGA) for high-density heterogeneous traffic, where the traffic agents of various categories interact densely with each other. To predict the trajectory of a target agent, HeGA first automatically selects neighbors that interact with it by our proposed adaptive neighbor selector, and then aggregates their interactions based on a novel two-phase aggregation transformer block. Finally, the historical residual connection LSTM enhances the historical information awareness and decodes the spatial coordinates as the prediction results. Extensive experiments on real data demonstrate that the proposed network significantly outperforms the existing state-of-the-art competitors by over 27% on average displacement error (ADE) and over 31% on final displacement error (FDE). We also deploy HeGA in a state-of-the-art framework for autonomous driving, demonstrating its superior applicability based on three simulated environments with different densities and complexities.
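
The two metrics quoted above have simple standard definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is that distance at the final step. The sketch below computes both for one trajectory; the function name and the toy coordinates are illustrative assumptions.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for one trajectory given as equal-length lists of (x, y) points."""
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(distances) / len(distances)  # average over all time steps
    fde = distances[-1]                    # error at the last time step only
    return ade, fde


pred = [(0.0, 0.0), (3.0, 4.0)]
truth = [(0.0, 0.0), (0.0, 0.0)]
ade, fde = displacement_errors(pred, truth)  # ADE = 2.5, FDE = 5.0
```

In evaluations such as the one described, these per-trajectory values would typically be averaged over all agents in the test set.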

I Know What You Do Not Know: Knowledge Graph Embedding via Co-distillation Learning

Knowledge graph (KG) embedding seeks to learn vector representations for entities and relations. Conventional models reason over graph structures, but they suffer from the issues of graph incompleteness and long-tail entities. Recent studies have used pre-trained language models to learn embeddings based on the textual information of entities and relations, but they cannot take advantage of graph structures. In the paper, we show empirically that these two kinds of features are complementary for KG embedding. To this end, we propose CoLE, a Co-distillation Learning method for KG Embedding that exploits the complementarity of graph structures and text information. Its graph embedding model employs Transformer to reconstruct the representation of an entity from its neighborhood subgraph. Its text embedding model uses a pre-trained language model to generate entity representations from the soft prompts of their names, descriptions and relational neighbors. To let the two models promote each other, we propose co-distillation learning that allows them to distill selective knowledge from each other's prediction logits. In our co-distillation learning, each model serves as both a teacher and a student. Experiments on benchmark datasets demonstrate that the two models outperform their related baselines, and the ensemble method CoLE with co-distillation learning advances the state-of-the-art of KG embedding.

Social Graph Transformer Networks for Pedestrian Trajectory Prediction in Complex Social Scenarios

Pedestrian trajectory prediction is essential for many modern applications, such as abnormal motion analysis and collision avoidance for improved traffic safety. Previous studies still face challenges in embracing high social interaction, dynamics, and multi-modality for achieving high accuracy with long-time predictions. We propose Social Graph Transformer Networks for multi-modal prediction of pedestrian trajectories, where we combine Graph Convolutional Network and Transformer Network by generating stable resolution pseudo-images from spatio-temporal graphs through a designed stacking and interception method. Specifically, we adopt adjacency matrices to obtain spatio-temporal features and Transformer for long-time trajectory predictions. As such, we retain the advantages of both, i.e., the ability to aggregate information over an arbitrary number of neighbors and to conduct complex time-dependent data processing. Our experimental results show that our model reduces the final displacement error and achieves state-of-the-art results on multiple metrics. The module's effectiveness is demonstrated through ablation experiments.

Improving Personality Consistency in Conversation by Persona Extending

Endowing chatbots with a consistent personality plays a vital role in enabling agents to deliver human-like interactions. However, existing personalized approaches commonly generate responses in light of static predefined personas depicted with textual descriptions, which may severely restrict the interactivity between the human and the chatbot, especially when the agent needs to answer a query that is not covered by the predefined personas, the so-called out-of-predefined persona problem (named OOP for simplicity). To alleviate the problem, in this paper we propose a novel retrieval-to-prediction paradigm consisting of two subcomponents: (1) a Persona Retrieval Model (PRM), which retrieves a persona from a global collection based on a Natural Language Inference (NLI) model, so that the inferred persona is consistent with the predefined personas; and (2) a Posterior-scored Transformer (PS-Transformer), which adopts a persona posterior distribution that further considers the actual personas used in the ground response, maximally mitigating the gap between training and inference. Furthermore, we present a dataset called IT-ConvAI2 that is the first to highlight the OOP problem in personalized dialogue. Extensive experiments on both IT-ConvAI2 and ConvAI2 demonstrate that our proposed model yields considerable improvements in both automatic metrics and human evaluations.

Are Gradients on Graph Structure Reliable in Gray-box Attacks?

Graph edge perturbations are dedicated to damaging the prediction of graph neural networks by modifying the graph structure. Previous gray-box attackers employ gradients from the surrogate model to locate the vulnerable edges to perturb the graph structure. However, unreliability exists in gradients on graph structures, which is rarely studied by previous works. In this paper, we discuss and analyze the errors caused by the unreliability of the structural gradients. These errors arise from rough gradient usage due to the discreteness of the graph structure and from the unreliability in the meta-gradient on the graph structure. In order to address these problems, we propose a novel attack model with methods to reduce the errors inside the structural gradients. We propose edge discrete sampling to select the edge perturbations associated with hierarchical candidate selection to ensure computational efficiency. In addition, semantic invariance and momentum gradient ensemble are proposed to address the gradient fluctuation on semantic-augmented graphs and the instability of the surrogate model. Experiments are conducted in untargeted gray-box poisoning scenarios and demonstrate the improvement in the performance of our approach.

Learning Chinese Word Embeddings By Discovering Inherent Semantic Relevance in Sub-characters

Learning Chinese word embeddings is important in many tasks of Chinese language information processing, such as entity linking, entity extraction, and knowledge graph. A Chinese word consists of Chinese characters, which can be decomposed into sub-characters (radical, component, stroke, etc.). Similar to roots in English words, sub-characters also indicate the origins and basic semantics of Chinese characters. So, many studies follow the approaches designed for learning embeddings of English words to improve Chinese word embeddings. However, some Chinese characters sharing the same sub-characters have different meanings. Furthermore, with more cultural interaction and the popularization of the Internet and web, many neologisms, such as transliterated loanwords and network terms, are emerging, which are only close to the pronunciation of their characters, but far from their semantics. Here, a tripartite weighted graph is proposed to model the semantic relationship among words, characters, and sub-characters, in which the semantic relationship is evaluated according to the Chinese linguistic information. So, the semantic relevance hidden in lower components (sub-characters, characters) can be used to further distinguish the semantics of corresponding higher components (characters, words). Then, the tripartite weighted graph is fed into our Chinese word embedding model insideCC to reveal the semantic relationship among different language components, and learn the embeddings of words. Extensive experimental results on multiple corpora and datasets verify that our proposed methods outperform the state-of-the-art counterparts by a significant margin.

Dual-Task Learning for Multi-Behavior Sequential Recommendation

Recently, sequential recommendation has become a research hotspot, while multi-behavior sequential recommendation (MBSR), which exploits users' heterogeneous interactions in sequences, has received relatively little attention. Existing works often overlook the complementary effect of different perspectives when addressing the MBSR problem. In addition, two specific challenges remain to be addressed. One is the heterogeneity of a user's intention and the context information; the other is the sparsity of the interactions of the target behavior. To release the potential of multi-behavior interaction sequences, we propose a novel framework named NextIP that adopts a dual-task learning strategy to convert the problem to two specific tasks, i.e., <u>next</u>-<u>i</u>tem prediction and <u>p</u>urchase prediction. For next-item prediction, we design a target-behavior aware context aggregator (TBCG), which utilizes the next behavior to guide all kinds of behavior-specific item sub-sequences to jointly predict the next item. For purchase prediction, we design a behavior-aware self-attention (BSA) mechanism to extract a user's behavior-specific interests and treat them as negative samples to learn the user's purchase preferences. Extensive experimental results on two public datasets show that our NextIP performs significantly better than the state-of-the-art methods.

HySAGE: A Hybrid Static and Adaptive Graph Embedding Network for Context-Drifting Recommendations

The recent popularity of edge devices and Artificial Intelligence of Things (AIoT) has driven a new wave of contextual recommendations, such as location-based Point of Interest (PoI) recommendations and computing resource-aware mobile app recommendations. In many such recommendation scenarios, contexts drift over time. For example, in mobile game recommendation, contextual features like the location, battery, and storage levels of mobile devices drift frequently. However, most existing graph-based collaborative filtering methods are designed under the assumption of static features. Therefore, they would require frequent retraining and/or yield graphical models burgeoning in size, impeding their suitability for context-drifting recommendations.

In this work, we propose a tailor-made Hybrid Static and Adaptive Graph Embedding (HySAGE) network for context-drifting recommendations. Our key idea is to disentangle the relatively static user-item interaction and rapidly drifting contextual features. Specifically, our proposed HySAGE network learns a relatively static graph embedding from user-item interaction and an adaptive embedding from drifting contextual features. These embeddings are incorporated into an interest network to generate the user interest in a given context. We adopt an interactive attention module to learn the interactions among static graph embeddings, adaptive contextual embeddings, and user interest, helping to achieve a better final representation. Extensive experiments on real-world datasets demonstrate that HySAGE significantly improves the performance of the existing state-of-the-art recommendation algorithms.

OptEmbed: Learning Optimal Embedding Table for Click-through Rate Prediction

A click-through rate (CTR) prediction model usually consists of three components: an embedding table, a feature interaction layer, and a classifier. Learning the embedding table plays a fundamental role in CTR prediction from the view of model performance and memory usage. The embedding table is a two-dimensional tensor, with its axes indicating the number of feature values and the embedding dimension, respectively. To learn an efficient and effective embedding table, recent works either assign various embedding dimensions to feature fields, reduce the number of embeddings, or mask the embedding table parameters. However, none of these existing works obtains an optimal embedding table. On the one hand, various embedding dimensions still require a large amount of memory due to the vast number of features in the dataset. On the other hand, decreasing the number of embeddings usually suffers from performance degradation, which is intolerable in CTR prediction. Finally, pruning embedding parameters will lead to a sparse embedding table, which is hard to deploy. To this end, we propose an optimal embedding table learning framework, OptEmbed, which provides a practical and general method to find an optimal embedding table for various base CTR models. Specifically, we propose pruning the redundant embeddings according to the corresponding features' importance, using learnable pruning thresholds. Furthermore, we consider assigning various embedding dimensions as one single candidate architecture. To efficiently search the optimal embedding dimensions, we design a uniform embedding dimension sampling scheme to equally train all candidate architectures, meaning architecture-related parameters and learnable thresholds are trained simultaneously in one supernet. We then propose an evolution search method based on the supernet to find the optimal embedding dimensions for each field. Experiments on public datasets show that OptEmbed can learn a compact embedding table which can further improve the model performance.
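
The threshold-based pruning step can be pictured with a small sketch. It is an illustration under stated assumptions rather than the OptEmbed implementation: the importance score is taken to be the L1 norm of each embedding row, the per-field thresholds are fixed numbers instead of learned parameters, and all names and values are hypothetical.

```python
def prune_embedding_rows(table, field_of_row, thresholds):
    """Zero out embedding rows whose importance falls below their field's threshold.

    table: list of embedding vectors, one per feature value.
    field_of_row: field name for each row of the table.
    thresholds: dict mapping field name to a pruning threshold (fixed here,
    learnable in the framework the abstract describes).
    """
    pruned, kept = [], []
    for row, field in zip(table, field_of_row):
        importance = sum(abs(v) for v in row)  # assumed importance score: L1 norm
        keep = importance > thresholds[field]
        kept.append(keep)
        pruned.append(list(row) if keep else [0.0] * len(row))
    return pruned, kept


# Hypothetical table with two "user" rows and one "item" row.
table = [[0.5, -0.5], [0.01, 0.02], [1.0, 0.0]]
pruned, kept = prune_embedding_rows(
    table, ["user", "user", "item"], {"user": 0.1, "item": 2.0}
)
print(kept)
```

Rows that are masked out as a whole can be dropped from storage entirely, which is what distinguishes row-level pruning from the parameter-level masking that leads to a sparse, hard-to-deploy table.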

Faithful Abstractive Summarization via Fact-aware Consistency-constrained Transformer

Abstractive summarization is a classic task in Natural Language Generation (NLG), which aims to produce a concise summary of the original document. Recently, great efforts have been made on sequence-to-sequence neural networks to generate abstractive summaries with a high level of fluency. However, prior work mainly focuses on the optimization of token-level likelihood, while the rich semantic information in documents has been largely ignored. As a result, the summarization results can be vulnerable to hallucinations, i.e., semantic-level inconsistency between a summary and the corresponding original document. To deal with this challenge, in this paper, we propose a novel fact-aware abstractive summarization model, named Entity-Relation Pointer Generator Network (ERPGN). Specifically, we attempt to formalize the facts in the original document as a factual knowledge graph, and then generate a high-quality summary by directly modeling consistency between the summary and the factual knowledge graph. To that end, we first leverage two pointer network structures to capture the facts in original documents. Then, to enhance the traditional token-level likelihood loss, we design two extra semantic-level losses to measure the disagreement between a summary and facts from its original document. Extensive experiments on public datasets demonstrate that our ERPGN framework could outperform both classic abstractive summarization models and the state-of-the-art fact-aware baseline methods, with significant improvement in terms of faithfulness.

DEMO: Disentangled Molecular Graph Generation via an Invertible Flow Model

Molecular graph generation via deep generative models has attracted increasing attention. This is a challenging problem because it requires optimizing a given objective under a huge search space while obeying the chemical valence rules. Although recently developed molecular generation models have achieved promising results on generating novel, valid and unique molecules, few efforts have been made toward interpretable molecular graph generation. In this work, we propose DEMO, a flow-based model for <u>D</u>is<u>E</u>ntangled <u>M</u>olecular graph generati<u>O</u>n in a completely unsupervised manner, which is able to generate molecular graphs w.r.t. the learned disentangled latent factors that are relevant to molecular semantic features and interpretable structural patterns. Specifically, DEMO is composed of a VAE-encoder and a flow-generator. The VAE-encoder focuses on extracting global features of molecular graphs, and the flow-generator aims at disentangling these features so that they correspond to certain types of understandable molecular structure features while learning data distributions. To generate molecular graphs, DEMO simply runs the flow-generator in the reverse order, exploiting the reversibility of flow-based models. Extensive experimental results on two benchmark datasets demonstrate that DEMO outperforms the state-of-the-art methods in molecular generation, and takes the first step in interpretable molecular graph generation.

NEST: Simulating Pandemic-like Events for Collaborative Filtering by Modeling User Needs Evolution

We outline a simulation-based study of the effect rapid population-scale concept drifts have on Collaborative Filtering (CF) models. We create a framework for analyzing the effects of macro-trends in population dynamics on the behavior of such models. Our framework characterizes population-scale concept drifts in item preferences and provides a lens to understand the influence events, such as a pandemic, have on CF models. Our experimental results show the initial impact on CF performance at the initial stage of such events, followed by an aggravated population herding effect during the event. The herding introduces a popularity bias that may benefit affected users, but which comes at the expense of a normal user experience. We propose an adaptive ensemble method that can effectively apply optimal algorithms to cope with the change brought about by different stages of the event.

Towards Robust False Information Detection on Social Networks with Contrastive Learning

Constructing a robust conversation graph based false information detection model is crucial for real social platforms. Recently, graph neural network (GNN) methods for false information detection have achieved significant advances. However, we empirically find that slight perturbations in the conversation graph can cause the predictions of existing models to collapse. To address this problem, we present RDCL, a contrastive learning framework for false information detection on social networks, to obtain robust detection results. RDCL leverages contrastive learning to maximize the consistency between perturbed graphs from the same original graph and minimize the distance between perturbed and original graphs from the same class, forcing the model to improve resistance to data perturbations. Moreover, we prove the importance of hard positive samples for contrastive learning and propose a hard positive sample pairs generation method (HPG) for conversation graphs, which can generate stronger gradient signals to improve the contrastive learning effect and make the model more robust. Experiments on various GNN encoders and datasets show that RDCL outperforms the current state-of-the-art models.

Knowledge-Sensed Cognitive Diagnosis for Intelligent Education Platforms

Cognitive diagnosis is a fundamental issue of intelligent education platforms, whose goal is to reveal the mastery of students on knowledge concepts. Recently, certain efforts have been made to improve the diagnosis precision, by designing deep neural network-based diagnostic functions or incorporating richer context features to enhance the representation of students and exercises. However, how to interpretably infer the student's mastery over non-interactive knowledge concepts (i.e., knowledge concepts not related to his/her exercising records) still remains challenging, especially when relations between knowledge concepts are not given. To this end, we propose a Knowledge-Sensed Cognitive Diagnosis (KSCD) framework, aiming at learning intrinsic relations among knowledge concepts from student response logs and incorporating them for inferring students' mastery over all knowledge concepts in an end-to-end manner. Specifically, we first project students, exercises and knowledge concepts into embedding representation matrices, where the intrinsic relations among knowledge concepts are reflected in the knowledge embedding representation matrix. Then, the knowledge-sensed student knowledge mastery vector and exercise factor vectors are obtained by the product of their embedding representations and the knowledge embedding representation matrix, which allows the student's mastery of non-interactive knowledge concepts to be interpretably inferred. Finally, we can utilize classical student-exercise interaction functions to predict the student's exercising performance and jointly train the model. In addition, we also design a new function to better model the student-exercise interactions. Extensive experimental results on two real-world datasets clearly show the significant performance gain of our KSCD framework, especially in predicting students' mastery over non-interactive knowledge concepts, compared to state-of-the-art cognitive diagnosis models (CDMs).
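
The product between an entity embedding and the knowledge embedding matrix can be sketched as follows. This is a minimal illustration, not the KSCD code: the sigmoid squashing, the embedding dimension, and all names and values are assumptions made for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def knowledge_sensed_vector(entity_embedding, knowledge_embeddings):
    """Project one student (or exercise) embedding onto every knowledge concept.

    entity_embedding: list of d floats.
    knowledge_embeddings: list of K concept embeddings, each a list of d floats.
    Returns K values in (0, 1), one per knowledge concept.
    """
    return [
        sigmoid(sum(e * k for e, k in zip(entity_embedding, concept)))
        for concept in knowledge_embeddings
    ]


# Hypothetical 2-dimensional embeddings for one student and three concepts.
student = [1.0, 0.0]
concepts = [[2.0, 0.0], [0.0, 3.0], [-2.0, 1.0]]
print(knowledge_sensed_vector(student, concepts))
```

Because every concept lives in the same embedding space, the student receives a value for all concepts, including those that never appear in the student's exercising records, which is the property the abstract relies on for non-interactive concepts.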

MORN: Molecular Property Prediction Based on Textual-Topological-Spatial Multi-View Learning

Predicting molecular properties has significant implications for the discovery and generation of drugs and further research in the domain of medicinal chemistry. Learning representations of molecules plays a central role in deep learning-driven property prediction. However, the diversity of molecular features (e.g., chemical system languages, structure notations) brings inconsistency in molecular representation. Moreover, the scarcity of labeled molecular data limits the accuracy of the molecular property prediction model. To address the above issues, we proposed a two-stage method, named MORN, for learning molecular representations for molecular property prediction from a multi-view perspective. In the first stage, textual-topological-spatial multi-views were proposed to learn the molecular representations, so as to capture both chemical system language and structure notation features simultaneously. In the second stage, an adaptive strategy was used to fuse molecular representations learned from multi-views to predict molecular properties. To alleviate the limitation of the scarcity of labeled molecular data, the label restriction was introduced in both multi-view representation learning and fusion stages. The performance of MORN was assessed by seven benchmark molecular datasets and one self-built molecular dataset. Experimental results demonstrated that MORN is effective in molecular property prediction.

Scattered or Connected? An Optimized Parameter-efficient Tuning Approach for Information Retrieval

Pre-training and fine-tuning have achieved significant advances in information retrieval (IR). A typical approach is to fine-tune all the parameters of large-scale pre-trained models (PTMs) on downstream tasks. As the model size and the number of tasks increase greatly, such an approach becomes less feasible and prohibitively expensive. Recently, a variety of parameter-efficient tuning methods have been proposed in natural language processing (NLP) that only fine-tune a small number of parameters while still attaining strong performance. Yet there has been little effort to explore parameter-efficient tuning for IR.

In this work, we first conduct a comprehensive study of existing parameter-efficient tuning methods at both the retrieval and re-ranking stages. Unlike the promising results in NLP, we find that these methods cannot achieve comparable performance to full fine-tuning at both stages when updating less than 1% of the original model parameters. More importantly, we find that the existing methods are just parameter-efficient, but not learning-efficient as they suffer from unstable training and slow convergence. To analyze the underlying reason, we conduct a theoretical analysis and show that the separation of the inserted trainable modules makes the optimization difficult. To alleviate this issue, we propose to inject additional modules alongside the pre-trained models (PTMs) to make the original scattered modules connected. In this way, all the trainable modules can form a pathway to smooth the loss surface and thus help stabilize the training process. Experiments at both retrieval and re-ranking stages show that our method outperforms existing parameter-efficient methods significantly, and achieves comparable or even better performance over full fine-tuning.

Hierarchical Spatio-Temporal Graph Neural Networks for Pandemic Forecasting

The spread of COVID-19 throughout the world has led to cataclysmic consequences on the global community, which poses an urgent need to accurately understand and predict the trajectories of the pandemic. Existing research has relied on graph-structured human mobility data for the task of pandemic forecasting. To perform pandemic forecasting of COVID-19 in the United States, we curate Large-MG, a large-scale mobility dataset that contains 66 dynamic mobility graphs, with each graph having over 3k nodes and an average of 540k edges. One drawback with existing Graph Neural Networks (GNNs) for pandemic forecasting is that they generally perform information propagation in a flat way and thus ignore the inherent community structure in a mobility graph. To bridge this gap, we propose a Hierarchical Spatio-Temporal Graph Neural Network (HiSTGNN) to perform pandemic forecasting, which learns both spatial and temporal information from a sequence of dynamic mobility graphs. HiSTGNN consists of two network architectures. One is a hierarchical graph neural network (HiGNN) that constructs a two-level neural architecture: county-level and region-level, and performs information propagation in a hierarchical way. The other network architecture is a Transformer-based model that captures the temporal dynamics among the sequence of learned node representations from HiGNN. Additionally, we introduce a joint learning objective to further optimize HiSTGNN. Extensive experiments have demonstrated HiSTGNN's superior predictive power of COVID-19 new case/death counts compared with state-of-the-art baselines.

Adaptive Re-Ranking with a Corpus Graph

Search systems often employ a re-ranking pipeline, wherein documents (or passages) from an initial pool of candidates are assigned new ranking scores. The process enables the use of highly-effective but expensive scoring functions that are not suitable for use directly in structures like inverted indices or approximate nearest neighbour indices. However, re-ranking pipelines are inherently limited by the recall of the initial candidate pool; documents that are not identified as candidates for re-ranking by the initial retrieval function cannot be identified. We propose a novel approach for overcoming the recall limitation based on the well-established clustering hypothesis. Throughout the re-ranking process, our approach adds documents to the pool that are most similar to the highest-scoring documents up to that point. This feedback process adapts the pool of candidates to those that may also yield high ranking scores, even if they were not present in the initial pool. It can also increase the score of documents that appear deeper in the pool that would have otherwise been skipped due to a limited re-ranking budget. We find that our Graph-based Adaptive Re-ranking (GAR) approach significantly improves the performance of re-ranking pipelines in terms of precision- and recall-oriented measures, is complementary to a variety of existing techniques (e.g., dense retrieval), is robust to its hyperparameters, and contributes minimally to computational and storage costs. For instance, on the MS MARCO passage ranking dataset, GAR can improve the nDCG of a BM25 candidate pool by up to 8% when applying a monoT5 ranker.
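
The feedback process described in this abstract can be sketched as a loop that alternates between the initial candidate pool and the graph neighbours of documents scored so far. This is a simplified illustration and not the authors' GAR implementation: the alternation rule, the way the frontier is prioritised, and every name and toy value below are assumptions.

```python
def adaptive_rerank(initial_pool, score, neighbours, budget, batch_size=2):
    """Re-rank with a fixed scoring budget, pulling in neighbours of scored documents.

    initial_pool: doc ids in first-stage retrieval order.
    score: callable doc id -> float (stands in for the expensive ranker).
    neighbours: callable doc id -> list of similar doc ids (the corpus graph).
    """
    scored = {}                 # doc id -> ranker score
    pool = list(initial_pool)   # remaining first-stage candidates
    frontier = {}               # unscored neighbour -> best score of a doc that led to it
    use_frontier = False
    while len(scored) < budget and (pool or frontier):
        if (use_frontier and frontier) or not pool:
            source = sorted(frontier, key=frontier.get, reverse=True)
        else:
            source = pool
        batch = source[: min(batch_size, budget - len(scored))]
        for doc in batch:
            scored[doc] = score(doc)
        # Remove newly scored documents from both candidate sources.
        pool = [d for d in pool if d not in scored]
        for doc in batch:
            frontier.pop(doc, None)
        # Add unscored neighbours of the batch to the frontier.
        for doc in batch:
            for n in neighbours(doc):
                if n not in scored:
                    frontier[n] = max(frontier.get(n, float("-inf")), scored[doc])
        use_frontier = not use_frontier
    return sorted(scored, key=scored.get, reverse=True)


# Toy corpus: "d" is relevant but missing from the initial pool; it is a neighbour of "a".
scores = {"a": 0.9, "b": 0.1, "c": 0.2, "d": 0.95, "e": 0.05}
graph = {"a": ["d"]}
ranking = adaptive_rerank(
    ["a", "b", "c"], scores.__getitem__, lambda d: graph.get(d, []), budget=3, batch_size=1
)
print(ranking)  # ['d', 'a', 'b']
```

In the toy run, document "d" is scored and ranked first even though the first-stage retriever never returned it, which is the recall limitation the abstract sets out to overcome. The budget still caps the number of calls to the expensive scorer at three.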

Jointly Contrastive Representation Learning on Road Network and Trajectory

Road network and trajectory representation learning are essential for traffic systems since the learned representation can be directly used in various downstream tasks (e.g., traffic speed inference, travel time estimation). However, most existing methods only contrast within the same scale, i.e., treating road network and trajectory separately, which ignores valuable inter-relations. In this paper, we aim to propose a unified framework that jointly learns the road network and trajectory representations end-to-end. We design domain-specific augmentations for road-road contrast and trajectory-trajectory contrast separately, i.e., road segment with its contextual neighbors and trajectory with its detour replaced and dropped alternatives, respectively. On top of that, we further introduce the road-trajectory cross-scale contrast to bridge the two scales by maximizing the total mutual information. Unlike the existing cross-scale contrastive learning methods on graphs that only contrast a graph and its belonging nodes, the contrast between road segment and trajectory is elaborately tailored via novel positive sampling and adaptive weighting strategies. We conduct prudent experiments based on two real-world datasets with four downstream tasks, demonstrating improved performance and effectiveness.

Cascade-based Echo Chamber Detection

Although echo chambers in social media have been under considerable scrutiny, general models for their detection and analysis are missing. In this work, we aim to fill this gap by proposing a probabilistic generative model that explains social media footprints (i.e., social network structure and propagations of information) through a set of latent communities, characterized by a degree of echo-chamber behavior and by an opinion polarity. Specifically, echo chambers are modeled as communities that are permeable to pieces of information with similar ideological polarity, and impermeable to information of opposed leaning: this allows discriminating echo chambers from communities that lack a clear ideological alignment.

To learn the model parameters we propose a scalable, stochastic adaptation of the Generalized Expectation Maximization algorithm, that optimizes the joint likelihood of observing social connections and information propagation. Experiments on synthetic data show that our algorithm is able to correctly reconstruct ground-truth latent communities with their degree of echo-chamber behavior and opinion polarity. Experiments on real-world data about polarized social and political debates, such as the Brexit referendum or the COVID-19 vaccine campaign, confirm the effectiveness of our proposal in detecting echo chambers. Finally, we show how our model can improve accuracy in auxiliary predictive tasks, such as stance detection and prediction of future propagations.

Mining Reaction and Diffusion Dynamics in Social Activities

Large quantities of online user activity data, such as weekly web search volumes, which co-evolve with the mutual influence of several queries and locations, serve as an important social sensor. It is an important task to accurately forecast the future activity by discovering latent interactions from such data, i.e., the ecosystems between each query and the flow of influences between each area. However, this is a difficult problem in terms of data quantity and complex patterns covering the dynamics. To tackle the problem, we propose FluxCube, which is an effective mining method that forecasts large collections of co-evolving online user activity and provides good interpretability. Our model is the expansion of a combination of two mathematical models: a reaction-diffusion system provides a framework for modeling the flow of influences between local area groups and an ecological system models the latent interactions between each query. Also, by leveraging the concept of physics-informed neural networks, FluxCube achieves both high interpretability obtained from the parameters and high forecasting performance. Extensive experiments on real datasets showed that FluxCube outperforms comparable models in terms of forecasting accuracy, and that each component in FluxCube contributes to the enhanced performance. We then show some case studies in which FluxCube extracts useful latent interactions between queries and area groups.

Network Aware Forecasting for eCommerce Supply Planning

Real-world supply chain planning starts with demand forecasting as a key input. In most scenarios, especially in fields like e-commerce where demand patterns are complex and large scale, demand forecasting is done independently of supply chain constraints. There have been a plethora of methods, old and recent, for generating accurate forecasts. However, to the best of our knowledge, none of the methods take supply chain constraints into account during forecasting. In this paper, we are primarily interested in supply chain aware forecasting methods that do not impose any restrictions on the demand forecasting process. We assume that the base forecasts follow a distribution from the exponential family and are provided as input to supply chain planning by specifying the distribution form and parameters. With this in mind, the contributions of our paper are as follows. First, we formulate the supply chain aware forecast improvement of a base forecast as finding the game theoretically optimal parameters satisfying the supply chain constraints. Second, for regular distributions from the exponential family, we show that this translates to projecting the base forecast onto the (convex) set defined by supply constraints, which is at least as accurate as the base forecasts. Third, we note that using off-the-shelf convex solvers does not scale for large instances of supply chain, which are typical in e-commerce settings. We propose algorithms that scale better with problem size. We propose a general gradient descent based approach that works across different distributions from the exponential family. We also propose a network flow based exact algorithm for the Laplace distribution (which relates to mean absolute error, the most commonly used metric in forecasting). Finally, we substantiate the theoretical results with extensive experiments on a real-life e-commerce data set as well as a range of synthetic data sets.

Domain-Agnostic Contrastive Representations for Learning from Label Proportions

We study the weak supervision learning problem of Learning from Label Proportions (LLP), where the goal is to learn an instance-level classifier using proportions of various class labels in a bag, i.e., a collection of input instances that can often be highly correlated. While representation learning for weakly-supervised tasks has been found to be effective, it often requires domain knowledge. To the best of our knowledge, representation learning for tabular data (unstructured data containing both continuous and categorical features) has not been studied. In this paper, we propose to learn diverse representations of instances within the same bags to effectively utilize the weak bag-level supervision. We propose a domain-agnostic LLP method, called "Self Contrastive Representation Learning for LLP" (SelfCLR-LLP), that incorporates a novel self-contrastive function as an auxiliary loss to learn representations on tabular data for LLP. We show that diverse representations for instances within the same bags aid efficient usage of the weak bag-level LLP supervision. We evaluate the proposed method through extensive experiments on real-world LLP datasets from e-commerce applications to demonstrate the effectiveness of our proposed SelfCLR-LLP.
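
To make the bag-level supervision concrete, the sketch below shows one common way of training from label proportions: a cross-entropy between the known class proportions of a bag and the mean of the instance-level predicted probabilities. This is a generic LLP loss chosen for illustration, not the SelfCLR-LLP objective; the self-contrastive auxiliary term is not shown, and the function name and toy values are hypothetical.

```python
import math


def bag_proportion_loss(instance_probs, bag_proportions, eps=1e-12):
    """Cross-entropy between a bag's label proportions and its mean prediction.

    instance_probs: per-instance class probability vectors for one bag.
    bag_proportions: known fraction of each class in the bag.
    """
    n = len(instance_probs)
    num_classes = len(bag_proportions)
    # Average the instance-level predictions into one bag-level distribution.
    mean_probs = [sum(p[c] for p in instance_probs) / n for c in range(num_classes)]
    return -sum(t * math.log(m + eps) for t, m in zip(bag_proportions, mean_probs))


# Hypothetical bag of two instances with known proportions 50% / 50%.
balanced = bag_proportion_loss([[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5])
collapsed = bag_proportion_loss([[0.9, 0.1], [0.9, 0.1]], [0.5, 0.5])
print(balanced, collapsed)
```

The loss only sees the bag average, so predictions that differ across instances of a bag can match the proportions while identical predictions cannot, which gives some intuition for why diverse instance representations within a bag are helpful.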

Rationale Aware Contrastive Learning Based Approach to Classify and Summarize Crisis-Related Microblogs

The way information now propagates on Twitter makes the platform a crucial conduit for tactical data and emergency responses during disasters. However, real-time information about crises is buried in a large volume of emotional and irrelevant posts. This makes it necessary to develop an automatic tool to identify disaster-related messages and summarize the information for data consumption and situation planning. Besides, explainability of the methods is crucial in determining their applicability in real-life scenarios. Recent studies also highlight the importance of learning a good latent representation of tweets for several downstream tasks. In this paper, we take advantage of state-of-the-art methods, such as transformers and contrastive learning, to build an interpretable classifier. Our proposed model classifies Twitter messages into different humanitarian categories and also extracts rationale snippets as supporting evidence for output decisions. The contrastive learning framework helps to learn better representations of tweets by bringing related tweets closer in the embedding space. Furthermore, we employ classification labels and rationales to efficiently generate summaries of crisis events. Extensive experiments over different crisis datasets show that (i) our classifier obtains the best performance-interpretability trade-off, and (ii) the proposed summarizer shows superior performance (1.4%-22% improvement) with significantly less computation cost than baseline models.

Automatic Meta-Path Discovery for Effective Graph-Based Recommendation

Heterogeneous Information Networks (HINs) are labeled graphs that depict relationships among different types of entities (e.g., users, movies and directors). For HINs, meta-path-based recommenders (MPRs) utilize meta-paths (i.e., abstract paths consisting of node and link types) to predict user preference, and have attracted a lot of attention due to their explainability and performance. We observe that the performance of MPRs is highly sensitive to the meta-paths they use, but existing works manually select the meta-paths from many possible ones. Thus, to discover effective meta-paths automatically, we propose the Reinforcement learning-based Meta-path Selection (RMS) framework. Specifically, we define a vector encoding for meta-paths and design a policy network to extend meta-paths. The policy network is trained based on the results of downstream recommendation tasks and an early stopping approximation strategy is proposed to speed up training. RMS is a general model, and it can work with all existing MPRs. We also propose a new MPR called RMS-HRec, which uses an attention mechanism to aggregate information from the meta-paths. We conduct extensive experiments on real datasets. Compared with the manually selected meta-paths, the meta-paths identified by RMS consistently improve recommendation quality. Moreover, RMS-HRec outperforms state-of-the-art recommender systems by an average of 7% in hit ratio. The codes and datasets are available on

MetaTrader: An Reinforcement Learning Approach Integrating Diverse Policies for Portfolio Optimization

Portfolio management is a fundamental problem in finance. It involves periodic reallocations of assets to maximize the expected returns within an appropriate level of risk exposure. Deep reinforcement learning (RL) has been considered a promising approach to solving this problem owing to its strong capability in sequential decision making. However, due to the non-stationary nature of financial markets, applying RL techniques to portfolio optimization remains a challenging problem. Extracting trading knowledge from various expert strategies could be helpful for agents to accommodate the changing markets. In this paper, we propose MetaTrader, a novel two-stage RL-based approach for portfolio management, which learns to integrate diverse trading policies to adapt to various market conditions. In the first stage, MetaTrader incorporates an imitation learning objective into the reinforcement learning framework. Through imitating different expert demonstrations, MetaTrader acquires a set of trading policies with great diversity. In the second stage, MetaTrader learns a meta-policy to recognize the market conditions and decide on the most proper learned policy to follow. We evaluate the proposed approach on three real-world index datasets and compare it to state-of-the-art baselines. The empirical results demonstrate that MetaTrader significantly outperforms those baselines in balancing profits and risks. Furthermore, thorough ablation studies validate the effectiveness of the components in the proposed approach.

Rank List Sensitivity of Recommender Systems to Interaction Perturbations

Prediction models can exhibit sensitivity with respect to training data: small changes in the training data can produce models that assign conflicting predictions to individual data points during test time. In this work, we study this sensitivity in recommender systems, where users' recommendations are drastically altered by minor perturbations in other unrelated users' interactions. We introduce a measure of stability for recommender systems, called Rank List Sensitivity (RLS), which measures how rank lists generated by a given recommender system at test time change as a result of a perturbation in the training data. We develop a method, CASPER, which uses the cascading effect to identify the minimal and systematic perturbation that induces higher instability in a recommender system. Experiments on four datasets show that recommender models are overly sensitive to minor perturbations introduced randomly or via CASPER - even perturbing one random interaction of one user drastically changes the recommendation lists of all users. Importantly, with CASPER perturbation, the models generate more unstable recommendations for low-accuracy users (i.e., those who receive low-quality recommendations) than high-accuracy ones.
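To make the idea of comparing rank lists before and after a perturbation concrete, here is a minimal sketch. The abstract does not say which list-similarity function RLS uses, so the top-k Jaccard overlap below is an assumption, and the function names and toy data are hypothetical.

```python
def jaccard_at_k(list_a, list_b, k):
    """Jaccard overlap of the top-k items of two rank lists."""
    top_a, top_b = set(list_a[:k]), set(list_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0

def mean_rank_list_similarity(lists_before, lists_after, k):
    """Average top-k similarity over users; lower values mean less stability."""
    total = sum(
        jaccard_at_k(lists_before[user], lists_after[user], k)
        for user in lists_before
    )
    return total / len(lists_before)

# Hypothetical rank lists from models trained on original vs. perturbed data.
before = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z", "w"]}
after = {"u1": ["a", "b", "c", "d"], "u2": ["x", "q", "r", "w"]}
stability = mean_rank_list_similarity(before, after, k=3)
```

In this toy case user `u1` keeps the same top-3 list while `u2` retains only one of three items, so the averaged score drops well below 1.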

Asymmetrical Context-aware Modulation for Collaborative Filtering Recommendation

Modern learnable collaborative filtering recommendation models generate user and item representations by deep learning methods (e.g., graph neural networks) for modeling user-item interactions. However, most of them may still have unsatisfactory performance due to two issues. Firstly, some models assume that the representations of users or items are fixed when modeling interactions with different objects. However, a user may have different interests in different items, and an item may also have different attractions to different users. Thus the representations of users and items should depend on their contexts to some extent. Secondly, existing models learn representations for users and items by symmetrical dual methods which have identical or similar operations. Symmetrical methods may fail to sufficiently and reasonably extract the features of users and items as their interaction data have diverse semantic properties. To address the above issues, a novel model called Asymmetrical context-awaRe modulation for collaBorative filtering REcommendation (ARBRE) is proposed. It adopts simplified GNNs on collaborative graphs to capture homogeneous user preferences and item attributes, then designs two asymmetrical context-aware modulation models to learn dynamic user interests and item attractions, respectively. The learned representations from the user domain and item domain are input pair-wise into four Multi-Layer Perceptrons in different combinations to model user-item interactions. Experimental results on three real-world datasets demonstrate the superiority of ARBRE over various state-of-the-art methods.

Sequence Prediction under Missing Data: An RNN Approach without Imputation

Missing data scenarios are very common in ML applications in general, and time-series/sequence applications are no exception. This paper pertains to a novel Recurrent Neural Network (RNN) based solution for sequence prediction under missing data. Our method is distinct from all existing approaches. It tries to encode the missingness patterns in the data directly without trying to impute data either before or during model building. Our encoding is lossless and achieves compression. It can be employed for both sequence classification and forecasting. We focus on forecasting here in a general context of multi-step prediction in the presence of possible exogenous inputs. In particular, we propose novel variants of Encoder-Decoder (Seq2Seq) RNNs for this. The encoder here adopts the above-mentioned pattern encoding, while at the decoder, which has a different structure, multiple variants are feasible. We demonstrate the utility of our proposed architecture via multiple experiments on both single and multiple sequence (real) data-sets. We consider both scenarios where (i) data is naturally missing and (ii) data is synthetically masked.

Analysis of Knowledge Transfer in Kernel Regime

Knowledge transfer is shown to be a very successful technique for training neural classifiers: together with the ground truth data, it uses the "privileged information" (PI) obtained by a "teacher" network to train a "student" network. It has been observed that classifiers learn much faster and more reliably via knowledge transfer. However, there has been little or no theoretical analysis of this phenomenon. To bridge this gap, we propose to approach the problem of knowledge transfer by regularizing the fit between the teacher and the student with PI provided by the teacher. Using tools from dynamical systems theory, we show that when the student is an extremely wide two layer network, we can analyze it in the kernel regime and show that it is able to interpolate between PI and the given data. This characterization sheds new light on the relation between the training error and capacity of the student relative to the teacher. Another contribution of the paper is a quantitative statement on the convergence of the student network. We prove that the teacher reduces the number of required iterations for a student to learn, and consequently improves the generalization power of the student. We give a corresponding experimental analysis that validates the theoretical results and yields additional insights.

SVD-GCN: A Simplified Graph Convolution Paradigm for Recommendation

With the tremendous success of Graph Convolutional Networks (GCNs), they have been widely applied to recommender systems and have shown promising performance. However, most GCN-based methods rigorously stick to a common GCN learning paradigm and suffer from two limitations: (1) the limited scalability due to the high computational cost and slow training convergence; (2) the notorious over-smoothing issue which reduces performance as stacking graph convolution layers. We argue that the above limitations are due to the lack of a deep understanding of GCN-based methods. To this end, we first investigate what design makes GCN effective for recommendation. By simplifying LightGCN, we show the close connection between GCN-based and low-rank methods such as Singular Value Decomposition (SVD) and Matrix Factorization (MF), where stacking graph convolution layers is to learn a low-rank representation by emphasizing (suppressing) components with larger (smaller) singular values. Based on this observation, we replace the core design of GCN-based methods with a flexible truncated SVD and propose a simplified GCN learning paradigm dubbed SVD-GCN, which only exploits K-largest singular vectors for recommendation. To alleviate the over-smoothing issue, we propose a renormalization trick to adjust the singular value gap, resulting in significant improvement. Extensive experiments on three real-world datasets show that our proposed SVD-GCN not only significantly outperforms state-of-the-arts but also achieves over 100x and 10x speedups over LightGCN and MF, respectively.

X-MOL: Explainable AI for Molecular Analysis

Prediction on graphs is an important task used in domains such as social networks, recommender systems and molecular analysis. However, most machine learning models for graph prediction and classification---especially recent deep learning models on graphs---are hardly explainable, making it difficult to understand why certain results are generated and why the model believes they present certain desired properties. Understanding the why behind the machine learning models is extremely important because scientific research not only cares about the know how, but also (or even more) cares about the know why. Good explanations can greatly help users or scientists to understand how the AI model works and thus to derive insights and build trust over the model. In this paper, we introduce an Explainable Counterfactual Reinforcement Learning (XCRL) framework, which is a reinforcement learning (RL) framework that uses counterfactual reasoning to augment graph prediction models and create human understandable explanations. The framework is able to generate pre-hoc explanations and is applicable to explain any graph prediction model including recent graph neural network models. The exploration and step-by-step process of the RL model naturally lends itself towards explainability, meanwhile, counterfactuals have been proven to be a powerful tool to explore datasets and also generate explanations. Thus, our framework is able to integrate the explainability of RL with the reasoning and understandability of counterfactuals to increase both accuracy and understandability of the graph classification and prediction results.

Malicious Repositories Detection with Adversarial Heterogeneous Graph Contrastive Learning

GitHub, as the largest social coding platform, has attracted an increasing number of cybercriminals to disseminate malware by posting malicious code repositories. To address this imminent problem, some tools were developed to detect malicious repositories based on the code content. However, most of them ignore the rich relational information among repositories and usually require abundant labeled data to train the model. To this end, one effective way is to exploit unlabeled data to pre-train a model which considers both structural relation and code content of repositories, and further transfer the pre-trained model to the downstream tasks with labeled repository data. In this paper, we propose a novel model, adversarial contrastive learning on heterogeneous graph (CLA-HG), to detect malicious repositories on GitHub. First of all, CLA-HG builds a heterogeneous graph (HG) to comprehensively model repository data. Afterwards, to exploit unlabeled information in HG, CLA-HG introduces a dual-stream graph contrastive learning mechanism that distinguishes both adversarial subgraph pairs and standard subgraph pairs to pre-train graph neural networks using unlabeled data. Finally, the pre-trained model is fine-tuned to the downstream malicious repository detection task enhanced by a knowledge distillation (KD) module. Extensive experiments on two collected datasets from GitHub demonstrate the effectiveness of CLA-HG in comparison with state-of-the-art methods and popular commercial anti-malware products.

A Multi-Interest Evolution Story: Applying Psychology in Query-based Recommendation for Inferring Customer Intention

The query-based recommendation now is becoming a basic research topic in the e-commerce scenario. Generally, given a query that a user typed, it aims to provide a set of items that the user may be interested in. In this task, the customer intention (i.e., browsing or purchase) is an important factor to configure the corresponding recommendation strategy for better shopping experiences (i.e., providing diverse items when the user prefers to browse or recommending specific items when detecting the user is willing to purchase). Though necessary, this is usually overlooked in previous works. In addition, the diversity and evolution of user interests also bring challenges to inferring user intentions correctly.

In this paper, we propose a predecessor task to infer two important customer intentions, namely purchasing and browsing, and we introduce a novel Psychological Intention Prediction Model (PIPM for short) to address this issue. Inspired by cognitive psychology, we first devise a multi-interest extraction module to adaptively extract interests from the user-item interaction sequence. After this, we design an interest evolution layer to model the evolution of the mined multiple interests. Finally, we aggregate all evolved multiple interests to infer the user's intention in their next visit. Extensive experiments are conducted on a large-scale Taobao industrial dataset. The results demonstrate that PIPM gains a significant improvement in AUC and GAUC over state-of-the-art baselines. Notably, PIPM has been deployed on the Taobao e-commerce platform and obtained over 10% improvement on PCTR.

Reinforced Continual Learning for Graphs

Graph Neural Networks (GNNs) have become the backbone for a myriad of tasks pertaining to graphs and similar topological data structures. While many works have been established in domains related to node and graph classification/regression tasks, they mostly deal with a single task. Continual learning on graphs is largely unexplored and existing graph continual learning approaches are limited to the task-incremental learning scenarios. This paper proposes a graph continual learning strategy that combines the architecture-based and memory-based approaches. The structural learning strategy is driven by reinforcement learning, where a controller network is trained in such a way to determine an optimal number of nodes to be added/pruned from the base network when new tasks are observed, thus assuring sufficient network capacities. The parameter learning strategy is underpinned by the concept of Dark Experience replay method to cope with the catastrophic forgetting problem. Our approach is numerically validated with several graph continual learning benchmark problems in both task-incremental learning and class-incremental learning settings. Compared to recently published works, our approach demonstrates improved performance in both the settings. The implementation code can be found at

RSD: A Reinforced Siamese Network with Domain Knowledge for Early Diagnosis

The availability of electronic health record data makes it possible to develop automatic disease diagnosis approaches. In this paper, we study the early diagnosis of diseases. As being a difficult task (even for experienced doctors), early diagnosis of diseases poses several challenges that are not well solved by prior studies, including insufficient training data, dynamic and complex signs of complications and trade-off between earliness and accuracy.

To address these challenges, we propose a Reinforced Siamese network with Domain knowledge regularization approach, namely RSD, to achieve high performance for early diagnosis. The RSD approach consists of a diagnosis module and a control module. The diagnosis module adopts any EHR Encoder as a basic framework to extract representations, and introduces two improved training strategies. To overcome the insufficient sample problem, we design a Siamese network architecture to enhance the model learning. Furthermore, we propose a domain knowledge regularization strategy to guide the model learning with domain knowledge. Based on the diagnosis module, our control module learns to automatically determine whether to issue a disease alert to the patient based on the diagnosis results. Through carefully designed architecture, rewards and policies, it is able to effectively balance earliness and accuracy for diagnosis. Experimental results have demonstrated the effectiveness of our approach on both diagnosis prediction and early diagnosis. We also perform extensive analysis experiments to verify the robustness of the proposed approach.

Cross-Network Social User Embedding with Hybrid Differential Privacy Guarantees

Integrating multiple online social networks (OSNs) has important implications for many downstream social mining tasks, such as user preference modelling, recommendation, and link prediction. However, it is unfortunately accompanied by growing privacy concerns about leaking sensitive user information. How to fully utilize the data from different online social networks while preserving user privacy remains largely unsolved. To this end, we propose a Cross-network Social User Embedding framework, namely DP-CroSUE, to learn the comprehensive representations of users in a privacy-preserving way. We jointly consider information from partially aligned social networks with differential privacy guarantees. In particular, for each heterogeneous social network, we first introduce a hybrid differential privacy notion to capture the variation of privacy expectations for heterogeneous data types. Next, to find user linkages across social networks, we make unsupervised user embedding-based alignment in which the user embeddings are achieved by the heterogeneous network embedding technology. To further enhance user embeddings, a novel cross-network GCN embedding model is designed to transfer knowledge across networks through those aligned users. Extensive experiments on three real-world datasets demonstrate that our approach makes a significant improvement on user interest prediction tasks as well as defending user attribute inference attacks from embedding.

From Known to Unknown: Quality-aware Self-improving Graph Neural Network For Open Set Social Event Detection

State-of-the-art Graph Neural Networks (GNNs) have achieved tremendous success in social event detection tasks when restricted to a closed set of events. However, considering the large amount of data needed for training and the limited ability of a neural network in handling previously unknown data, it is hard for existing GNN-based methods to operate in an open set setting. To address this problem, we design a Quality-aware Self-improving Graph Neural Network (QSGNN) which extends the knowledge from known to unknown by leveraging the best of known samples and reliable knowledge transfer. Specifically, to fully exploit the labeled data, we propose a novel supervised pairwise loss with an additional orthogonal inter-class relation constraint to train the backbone GNN encoder. The learnt, already-known events further serve as strong reference bases for the unknown ones, which greatly prompts knowledge acquisition and transfer. When the model is generalized to unknown data, to ensure the effectiveness and reliability, we further leverage the reference similarity distribution vectors for pseudo pairwise label generation, selection and quality assessment. Following the diversity principle of active learning, our method selects diverse pair samples with the generated pseudo labels to fine-tune the GNN encoder. Besides, we propose a novel quality-guided optimization in which the contributions of pseudo labels are weighted based on consistency. Experimental results validate that our model achieves state-of-the-art results and extends well to unknown events.

Flow-based Perturbation for Cause-effect Inference

A new causal discovery method is introduced to solve the bivariate causal discovery problem. The proposed algorithm leverages the expressive power of flow-based models and tries to learn the complex relationship between two variables. Algorithms have been developed to infer the causal direction according to empirical perturbation errors obtained from an invertible flow-based function. Theoretical results as well as experimental studies are presented to verify the proposed approach. Empirical evaluations demonstrate that our proposed method could outperform baseline methods on both synthetic and real-world datasets.

Unbiased Learning to Rank with Biased Continuous Feedback

It is a well-known challenge to learn an unbiased ranker with biased feedback. Unbiased learning-to-rank (LTR) algorithms, which are verified to model the relative relevance accurately based on noisy feedback, are appealing candidates and have already been applied in many applications with single categorical labels, such as user click signals. Nevertheless, the existing unbiased LTR methods cannot properly handle continuous feedback, which is essential for many industrial applications, such as content recommender systems.

To provide personalized high-quality recommendation results, recommender systems need to model both categorical and continuous biased feedback, such as click and dwell time. As unbiased LTR methods cannot handle such continuous feedback, and pair-wise learning without debiasing often performs worse than point-wise learning on biased feedback (which is also verified in our experiments), training multiple point-wise rankers to predict the absolute value of multiple objectives and leveraging a distinct shallow tower to estimate and alleviate the impact of position bias has been the mainstream approach in major industrial recommendation applications. However, with such a training paradigm, the optimization target differs a lot from the ranking metrics, which value the relative order of top-ranked items rather than the prediction precision of each item. Moreover, as the existing system tends to recommend more relevant items at higher positions, it is difficult for the shallow tower based methods to precisely attribute the user feedback to the impact of position or relevance. Therefore, there exists an exciting opportunity for us to get enhanced performance if we manage to solve the aforementioned issues.

Accordingly, we design a novel unbiased LTR algorithm to tackle the challenges, which innovatively models position bias in the pairwise fashion and introduces the pairwise trust bias to separate the position bias, trust bias, and user relevance explicitly and can work for both continuous and categorical feedback. Experiment results on public benchmark datasets and internal live traffic of a large-scale recommender system at Tencent News show superior results for continuous labels and also competitive performance for categorical labels of the proposed method.

Deep Extreme Mixture Model for Time Series Forecasting

Time Series Forecasting (TSF) has been a topic of extensive research and has many real-world applications such as weather prediction, stock market value prediction, and traffic control. Many machine learning models have been developed to address TSF, yet predicting extreme values remains a challenge to be effectively addressed. Extreme events occur rarely, but tend to cause a huge impact, which makes extreme event prediction important. Assuming light-tailed distributions, such as the Gaussian distribution, on time series data does not do justice to the modeling of extreme points. To tackle this issue, we develop a novel approach towards improving attention to extreme event prediction. Within our work, we model the time series data distribution as a mixture of a Gaussian distribution and a Generalized Pareto distribution (GPD). In particular, we develop a novel Deep eXtreme Mixture Model (DXtreMM) for univariate time series forecasting, which addresses extreme events in time series. The model consists of two modules: 1) a Variational Disentangled Auto-encoder (VD-AE) based classifier and 2) Multi Layer Perceptron (MLP) based forecaster units combined with Generalized Pareto Distribution (GPD) estimators for lower and upper extreme values separately. The VD-AE classifier predicts the possibility of occurrence of an extreme event given a time segment, and the forecaster module predicts the exact value. Through an extensive set of experiments on real-world datasets, we show that our model performs well for extreme events and is comparable with the existing baseline methods for normal time step forecasting.
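The point about light-tailed assumptions can be illustrated with the standard survival function of the Generalized Pareto distribution for exceedances over a threshold. The sketch below is a textbook formula under assumed parameter values (shape `xi`, scale `sigma`); it is not the DXtreMM model, and the function name is hypothetical.

```python
import math

def gpd_survival(y, xi, sigma):
    """P(Y > y) for an exceedance Y >= 0 under a Generalized Pareto distribution."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if y <= 0:
        return 1.0
    if abs(xi) < 1e-12:
        # The shape-zero limit is the exponential distribution.
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0:
        # For xi < 0 the support ends at -sigma / xi.
        return 0.0
    return base ** (-1.0 / xi)

# Probability that an exceedance is larger than 5 scale units.
heavy_tail = gpd_survival(5.0, xi=0.5, sigma=1.0)  # polynomial decay
light_tail = gpd_survival(5.0, xi=0.0, sigma=1.0)  # exponential decay
```

With a positive shape parameter the tail decays polynomially rather than exponentially, so large exceedances keep a non-negligible probability, which is why a GPD component is a natural fit for the extreme part of a mixture.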

Crowdsourced Fact-Checking at Twitter: How Does the Crowd Compare With Experts?

Fact-checking is one of the effective solutions in fighting online misinformation. However, traditional fact-checking is a process requiring scarce expert human resources, and thus does not scale well on social media because of the continuous flow of new content to be checked. Methods based on crowdsourcing have been proposed to tackle this challenge, as they can scale with a smaller cost, but, while they have shown to be feasible, have always been studied in controlled environments. In this work, we study the first large-scale effort of crowdsourced fact-checking deployed in practice, started by Twitter with the Birdwatch program. Our analysis shows that crowdsourcing may be an effective fact-checking strategy in some settings, even comparable to results obtained by human experts, but does not lead to consistent, actionable results in others. We processed 11.9k tweets verified by the Birdwatch program and report empirical evidence of i) differences in how the crowd and experts select content to be fact-checked, ii) how the crowd and the experts retrieve different resources to fact-check, and iii) the edge the crowd shows in fact-checking scalability and efficiency as compared to expert checkers.

PLAID: An Efficient Engine for Late Interaction Retrieval

Pre-trained language models are increasingly important components across multiple information retrieval (IR) paradigms. Late interaction, introduced with the ColBERT model and recently refined in ColBERTv2, is a popular paradigm that holds state-of-the-art status across many benchmarks. To dramatically reduce the search latency of late interaction, we introduce the Performance-optimized Late Interaction Driver (PLAID) engine. Without impacting quality, PLAID swiftly eliminates low-scoring passages using a novel centroid interaction mechanism that treats every passage as a lightweight bag of centroids. PLAID uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly-optimized engine to reduce late interaction search latency by up to 7x on a GPU and 45x on a CPU against vanilla ColBERTv2, while continuing to deliver state-of-the-art retrieval quality. This allows the PLAID engine with ColBERTv2 to achieve latency of tens of milliseconds on a GPU and tens or just a few hundred milliseconds on a CPU at large scale, even at the largest scales we evaluate with 140M passages.
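For readers unfamiliar with late interaction, the following minimal sketch shows the ColBERT-style scoring that PLAID accelerates: each query token embedding is matched to its most similar passage token embedding, and the maxima are summed. It shows only this baseline scoring, not PLAID's centroid interaction or pruning; the toy two-dimensional vectors and function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def late_interaction_score(query_vecs, passage_vecs):
    """Sum over query tokens of the best-matching passage token similarity."""
    return sum(
        max(dot(q, d) for d in passage_vecs)
        for q in query_vecs
    )

# Toy token embeddings (real systems use normalized, higher-dimensional vectors).
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[0.9, 0.1], [0.2, 0.8]]
passage_b = [[0.5, 0.5]]
score_a = late_interaction_score(query, passage_a)
score_b = late_interaction_score(query, passage_b)
```

Because every query token is compared against every passage token, exhaustive scoring over a large collection is expensive, which is the cost that cheaper candidate filtering aims to avoid.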

Towards Principled User-side Recommender Systems

Traditionally, recommendation algorithms have been designed for service developers. However, recently, a new paradigm called user-side recommender systems has been proposed and they enable web service users to construct their own recommender systems without access to trade-secret data. This approach opens the door to user-defined fair systems even if the official recommender system of the service is not fair. While existing methods for user-side recommender systems have addressed the challenging problem of building recommender systems without using log data, they rely on heuristic approaches, and it is still unclear whether constructing user-side recommender systems is a well-defined problem from theoretical point of view. In this paper, we provide theoretical justification of user-side recommender systems. Specifically, we see that hidden item features can be recovered from the information available to the user, making the construction of user-side recommender system well-defined. However, this theoretically grounded approach is not efficient. To realize practical yet theoretically sound recommender systems, we propose three desirable properties of user-side recommender systems and propose an effective and efficient user-side recommender system, Consul, based on these foundations. We prove that Consul satisfies all three properties, whereas existing user-side recommender systems lack at least one of them. In the experiments, we empirically validate the theory of feature recovery via numerical experiments. We also show that our proposed method achieves an excellent trade-off between effectiveness and efficiency and demonstrate via case studies that the proposed method can retrieve information that the provider's official recommender system cannot.

Hierarchically Fusing Long and Short-Term User Interests for Click-Through Rate Prediction in Product Search

Estimating Click-Through Rate (CTR) is a vital yet challenging task in personalized product search. However, existing CTR methods still struggle in product search settings because of three challenges: how to extract users' short-term interests more effectively with respect to multiple aspects, how to extract users' long-term interests and fuse them with short-term interests, and how to address the entanglement of long- and short-term interests. To resolve these challenges, we propose a new approach named Hierarchical Interests Fusing Network (HIFN), which consists of four basic modules: the Short-term Interest Extractor (SIE), the Long-term Interest Extractor (LIE), the Interest Fusion Module (IFM), and the Interest Disentanglement Module (IDM). Specifically, SIE extracts a user's short-term interest by integrating three fundamental interest encoders: query-dependent, target-dependent, and causal-dependent. The resulting representation is passed to LIE, which captures the user's long-term interest through an attention mechanism conditioned on the short-term interest from SIE. In IFM, the long- and short-term interests are fused adaptively, and the result is concatenated with the raw context features for the final prediction. Finally, considering the entanglement of long- and short-term interests, IDM uses a self-supervised framework to disentangle them. Extensive offline and online evaluations on a real-world e-commerce platform demonstrate the superiority of HIFN over state-of-the-art methods.

A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS

Recently, spoken dialogue systems have been widely deployed in a variety of applications, serving a huge number of end-users. A common issue is that the errors resulting from noisy utterances, semantic misunderstandings, or lack of knowledge make it hard for a real system to respond properly, possibly leading to an unsatisfactory user experience. To avoid such a case, we consider a proactive interaction mechanism where the system predicts the user satisfaction with the candidate response before giving it to the user. If the user is not likely to be satisfied according to the prediction, the system will ask the user a suitable question to determine the real intent of the user instead of providing the response directly. With such an interaction with the user, the system can give a better response to the user. Previous models that predict the user satisfaction are not applicable to DuerOS which is a large-scale commercial dialogue system. They are based on hand-crafted features and thus can hardly learn the complex patterns lying behind millions of conversations and temporal dependency in multiple turns of the conversation. Moreover, they are trained and evaluated on the benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system. To face these challenges, we propose a pipeline to predict the user satisfaction to help DuerOS decide whether to ask for clarification in each turn. Specifically, we propose to first generate a large number of weak labels and then train a transformer-based model to predict the user satisfaction with these weak labels. Moreover, we propose a metric, contextual user satisfaction, to evaluate the experience under the proactive interaction mechanism. At last, we deploy and evaluate our model on DuerOS, and observe a 19% relative improvement on the accuracy of user satisfaction prediction and 2.3% relative improvement on user experience.

Personalizing Task-oriented Dialog Systems via Zero-shot Generalizable Reward Function

Task-oriented dialog systems enable users to accomplish tasks using natural language. State-of-the-art systems respond to users in the same way regardless of their personalities, although personalizing dialogues can lead to higher levels of adoption and better user experiences. Building personalized dialog systems is an important, yet challenging endeavor, and only a handful of works took on the challenge. Most existing works rely on supervised learning approaches and require laborious and expensive labeled training data for each user profile. Additionally, collecting and labeling data for each user profile is virtually impossible. In this work, we propose a novel framework, P-ToD, to personalize task-oriented dialog systems capable of adapting to a wide range of user profiles in an unsupervised fashion using a zero-shot generalizable reward function. P-ToD uses a pre-trained GPT-2 as a backbone model and works in three phases. Phase one performs task-specific training. Phase two kicks off unsupervised personalization by leveraging the proximal policy optimization algorithm that performs policy gradients guided by the zero-shot generalizable reward function. Our novel reward function can quantify the quality of the generated responses even for unseen profiles. The optional final phase fine-tunes the personalized model using a few labeled training examples. We conduct extensive experimental analysis using the personalized bAbI dialogue benchmark for five tasks and up to 180 diverse user profiles. The experimental results demonstrate that P-ToD, even when it had access to zero labeled examples, outperforms state-of-the-art supervised personalization models and achieves competitive performance on BLEU and ROUGE metrics when compared to a strong fully-supervised GPT-2 baseline.

Perturbation Effect: A Metric to Counter Misleading Validation of Feature Attribution

This paper provides evidence indicating that the most commonly used metric for validating feature attribution methods in eXplainable AI (XAI) is misleading when applied to time series data. To evaluate whether an XAI method attributes importance to relevant features, these are systematically perturbed while measuring the impact on the performance of the classifier. The assumption is that a drastic performance reduction with increasing perturbation of relevant features indicates that these are indeed relevant. We demonstrate empirically that this assumption is incomplete without considering low relevance features in the used metrics. We introduce a novel metric, the Perturbation Effect Size, and demonstrate how it complements existing metrics to offer a more faithful assessment of importance attribution. Finally, we contribute a comprehensive evaluation of attribution methods on time series data, considering the influence of perturbation methods and region size selection.

Cross-domain Recommendation via Adversarial Adaptation

Data scarcity, e.g., labeled data being either unavailable or too expensive, is a perpetual challenge of recommendation systems. Cross-domain recommendation leverages the label information in the source domain to facilitate the task in the target domain. However, in many real-world cross-domain recommendation systems, the source domain and the target domain are sampled from different data distributions, which obstructs the cross-domain knowledge transfer. In this paper, we propose to specifically align the data distributions between the source domain and the target domain to alleviate imbalanced sample distribution and thus address the data scarcity issue in the target domain. Technically, our proposed approach builds an adversarial adaptation (AA) framework to adversarially train the target model together with a pre-trained source model. A domain discriminator plays a two-player minimax game with the target model and guides the target model to learn domain-invariant features that can be transferred across domains. At the same time, the target model is calibrated to learn domain-specific information of the target domain. With such a formulation, the target model not only learns domain-invariant features for knowledge transfer, but also preserves domain-specific information for target recommendation. We apply the proposed method to address the issues of insufficient data and imbalanced sample distribution in real-world Click-Through Rate (CTR)/Conversion Rate (CVR) predictions on a large-scale dataset. Specifically, we formulate our approach as a plug-and-play module to boost existing recommendation systems. Extensive experiments verify that the proposed method is able to significantly improve the prediction performance on the target domain. For instance, our method can boost PLE with a performance improvement of 13.88% in terms of Area Under Curve (AUC) compared with single-domain PLE.

Graph Based Long-Term And Short-Term Interest Model for Click-Through Rate Prediction

Click-through rate (CTR) prediction aims to predict the probability that a user will click an item, and it is one of the key tasks in online recommender and advertising systems. In such systems, rich user behavior (viz. long- and short-term) has proved to be of great value in capturing user interests. Both industry and academia have paid much attention to this topic and have proposed different approaches to modeling long-term and short-term user behavior data. However, some issues remain unresolved. More specifically, (1) rule- and truncation-based methods for extracting information from long-term behavior tend to cause information loss, and (2) using a single feedback behavior, regardless of scenario, to extract information from short-term behavior leads to information confusion and noise. To fill this gap, we propose a Graph based Long-term and Short-term interest Model, termed GLSM. It consists of a multi-interest graph structure for capturing long-term user behavior, a multi-scenario heterogeneous sequence model for modeling short-term information, and an adaptive fusion mechanism to fuse information from long-term and short-term behaviors. In comprehensive experiments on real-world datasets, GLSM achieves SOTA scores on offline metrics. The GLSM algorithm has also been deployed in our industrial application, bringing a 4.9% CTR lift and a 4.3% GMV lift, which is significant to the business.

A Self-supervised Riemannian GNN with Time Varying Curvature for Temporal Graph Learning

Representation learning on temporal graphs has drawn considerable research attention owing to its fundamental importance in a wide spectrum of real-world applications. Though a number of studies succeed in obtaining time-dependent representations, significant challenges remain. On the one hand, most of the existing methods restrict the embedding space to a certain curvature. However, the underlying geometry in fact shifts among the positive curvature hyperspherical, zero curvature Euclidean and negative curvature hyperbolic spaces as the graph evolves over time. On the other hand, these methods usually require abundant labels to learn temporal representations, which notably limits their use on the unlabeled graphs of real applications. To bridge this gap, we make the first attempt to study the problem of self-supervised temporal graph representation learning in the general Riemannian space, supporting the time-varying curvature to shift among hyperspherical, Euclidean and hyperbolic spaces. In this paper, we present a novel self-supervised Riemannian graph neural network (SelfℛGNN). Specifically, we design a curvature-varying Riemannian GNN with a theoretically grounded time encoding, and formulate a functional curvature over time to model the evolution shifting among the positive, zero and negative curvature spaces. To enable self-supervised learning, we propose a novel reweighting self-contrastive approach, exploring the Riemannian space itself without augmentation, and propose an edge-based self-supervised curvature learning with the Ricci curvature. Extensive experiments show the superiority of SelfℛGNN, and moreover, the case study shows the time-varying curvature of temporal graphs in reality.

Serpens: Privacy-Preserving Inference through Conditional Separable of Convolutional Neural Networks

With the extensive usage of convolutional neural networks (CNNs), privacy issues within practical applications have attracted much attention, especially when deep learning services are provided by third-party clouds. Many private inference schemes have been proposed, but their overheads are still too large. In this work, we find that the inference procedure of CNNs can be separated and performed synergistically by many parties. Following this observation, we present a pair of novel notions, namely separable and conditional separable, to tell whether a layer in CNNs can be exactly computed over multiple parties or not. Besides, we also prove that CNNs are conditionally separable. Accordingly, we propose Serpens, a private inference framework under multi-server settings. Serpens reduces the overhead of linear layers to almost zero, and now the computing bottleneck is ReLU. To address that, we design two secure ReLU protocols based on homomorphic encryption and random masks for two- and three-server settings. Experimental results show that Serpens is 78x-105x faster than the state-of-the-art private inference scheme in the two-server setting, and the superiority of Serpens is even larger in the three-server setting, only 11x-64x slower than performing the same inference over plaintext images.

Position-aware Structure Learning for Graph Topology-imbalance by Relieving Under-reaching and Over-squashing

Topology-imbalance is a graph-specific imbalance problem caused by the uneven topology positions of labeled nodes, which significantly damages the performance of GNNs. What topology-imbalance means and how to measure its impact on graph learning remain under-explored. In this paper, we provide a new understanding of topology-imbalance from a global view of the supervision information distribution in terms of under-reaching and over-squashing, which motivates two quantitative metrics as measurements. In light of our analysis, we propose a novel position-aware graph structure learning framework named PASTEL, which directly optimizes the information propagation path and solves the topology-imbalance issue in essence. Our key insight is to enhance the connectivity of nodes within the same class for more supervision information, thereby relieving the under-reaching and over-squashing phenomena. Specifically, we design an anchor-based position encoding mechanism, which better incorporates relative topology position and enhances the intra-class inductive bias by maximizing the label influence. We further propose a class-wise conflict measure as the edge weights, which benefits the separation of different node classes. Extensive experiments demonstrate the superior potential and adaptability of PASTEL in enhancing GNNs' power in different data annotation scenarios.

DeepScalper: A Risk-Aware Reinforcement Learning Framework to Capture Fleeting Intraday Trading Opportunities

Reinforcement learning (RL) techniques have shown great success in many challenging quantitative trading tasks, such as portfolio management and algorithmic trading. In particular, intraday trading is one of the most profitable and risky tasks because the intraday behaviors of the financial market reflect billions in rapidly fluctuating capital. However, a vast majority of existing RL methods focus on relatively low-frequency trading scenarios (e.g., day-level) and fail to capture the fleeting intraday investment opportunities due to two major challenges: 1) how to effectively train profitable RL agents for intraday investment decision-making, which involves a high-dimensional fine-grained action space; 2) how to learn a meaningful multi-modality market representation to understand the intraday behaviors of the financial market at tick-level.

Motivated by the efficient workflow of professional human intraday traders, we propose DeepScalper, a deep reinforcement learning framework for intraday trading to tackle the above challenges. Specifically, DeepScalper includes four components: 1) a dueling Q-network with action branching to deal with the large action space of intraday trading for efficient RL optimization; 2) a novel reward function with a hindsight bonus to encourage RL agents to make trading decisions with a long-term horizon of the entire trading day; 3) an encoder-decoder architecture to learn multi-modality temporal market embedding, which incorporates both macro-level and micro-level market information; 4) a risk-aware auxiliary task to maintain a striking balance between maximizing profit and minimizing risk. Through extensive experiments on real-world market data spanning over three years on six financial futures (two stock index futures and four treasury bond futures), we demonstrate that DeepScalper significantly outperforms many state-of-the-art baselines in terms of four financial criteria. Furthermore, we conduct a series of exploratory and ablative studies to analyze the contributions of each component in DeepScalper.

RobustFed: A Truth Inference Approach for Robust Federated Learning

Federated learning is a prominent framework that enables clients (e.g., mobile devices or organizations) to collaboratively train a global model under a central server's orchestration while keeping local data private. However, the aggregation step in federated learning is vulnerable to adversarial attacks as the central server cannot enforce clients' behavior. As a result, the performance of the global model and convergence of the training process can be affected under such attacks. To mitigate this vulnerability, existing works have proposed robust aggregation methods such as median based aggregation instead of averaging. While they ensure some robustness against Byzantine attacks, they are still vulnerable to label flipping and Gaussian noise attacks. In this paper, we propose a novel robust aggregation algorithm inspired by the truth inference methods in crowdsourcing by incorporating the clients' reliability into aggregation. We evaluate our solution on three real-world datasets with a variety of machine learning models. Experimental results show that our solution ensures robust federated learning and is resilient to various types of attacks, including noisy data attacks, Byzantine attacks, and label flipping attacks.
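To make the contrast between averaging and median-based aggregation concrete, here is a minimal sketch of both rules applied coordinate-wise to client updates. It illustrates only the baseline robust aggregation mentioned above, not the truth-inference algorithm proposed in the paper; the function name and toy updates are hypothetical.

```python
from statistics import mean, median
from typing import List


def aggregate(updates: List[List[float]], rule: str = "mean") -> List[float]:
    """Combine per-client update vectors coordinate by coordinate.

    rule="mean" is plain averaging; rule="median" is the coordinate-wise
    median, which a single extreme client cannot drag arbitrarily far.
    """
    if rule not in ("mean", "median"):
        raise ValueError(f"unknown aggregation rule: {rule}")
    reducer = mean if rule == "mean" else median
    # zip(*updates) groups the i-th coordinate of every client together.
    return [reducer(coords) for coords in zip(*updates)]


if __name__ == "__main__":
    # Two honest clients and one client sending an extreme update.
    client_updates = [[1.0, 2.0], [1.2, 1.8], [100.0, -50.0]]
    print(aggregate(client_updates, "mean"))
    print(aggregate(client_updates, "median"))
```

In this toy example the outlier dominates the mean, while the median stays near the honest clients' values.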

Temporality- and Frequency-aware Graph Contrastive Learning for Temporal Network

Graph contrastive learning (GCL) methods aim to learn more distinguishable representations by contrasting positive and negative samples. They have received increasing attention in recent years due to their wide application in recommender systems and knowledge graphs. However, almost all GCL methods are applied to static networks and cannot be extended to temporal networks directly. Furthermore, recent GCL models treat low- and high-frequency nodes equally in overall training objectives, which hinders the prediction precision. To solve the aforementioned problems, in this paper, we propose a Temporality- and Frequency-aware Graph Contrastive Learning for temporal networks (TF-GCL). Specifically, to learn more diverse representations for infrequent nodes and fully explore temporal information, we first generate two augmented views from the input graph based on topological and temporal perspectives. We then design a temporality and frequency-aware objective function to maximize the agreement between node representations of the two views. Experimental results demonstrate that TF-GCL remarkably achieves more robust node representations and significantly outperforms the state-of-the-art methods on six temporal link prediction benchmark datasets. Considering the reproducibility, we release our code on GitHub.

Domain Adversarial Spatial-Temporal Network: A Transferable Framework for Short-term Traffic Forecasting across Cities

Accurate real-time traffic forecasting is critical for intelligent transportation systems (ITS) and serves as the cornerstone of various smart mobility applications. Though this research area is dominated by deep learning, recent studies indicate that the accuracy improvement from developing new model structures is becoming marginal. Instead, we envision that the improvement can be achieved by transferring the "forecasting-related knowledge" across cities with different data distributions and network topologies. To this end, this paper proposes a novel transferable traffic forecasting framework: Domain Adversarial Spatial-Temporal Network (DASTNet). DASTNet is pre-trained on multiple source networks and fine-tuned with the target network's traffic data. Specifically, we leverage graph representation learning and adversarial domain adaptation techniques to learn the domain-invariant node embeddings, which are further incorporated to model the temporal traffic data. To the best of our knowledge, we are the first to employ adversarial multi-domain adaptation for network-wide traffic forecasting problems. DASTNet consistently outperforms all state-of-the-art baseline methods on three benchmark datasets. The trained DASTNet is applied to Hong Kong's new traffic detectors, and accurate traffic predictions can be delivered immediately (within one day) when the detector is available. Overall, this study suggests an alternative way to enhance traffic forecasting methods and provides practical implications for cities lacking historical traffic data. Source codes of DASTNet are available at

CROLoss: Towards a Customizable Loss for Retrieval Models in Recommender Systems

In large-scale recommender systems, retrieving the top N relevant candidates accurately under resource constraints is crucial. To evaluate the performance of such retrieval models, Recall@N, the frequency of positive samples being retrieved in the top N ranking, is widely used. However, most of the conventional loss functions for retrieval models such as softmax cross-entropy and pairwise comparison methods do not directly optimize Recall@N. Moreover, those conventional loss functions cannot be customized for the specific retrieval size N required by each application and thus may lead to sub-optimal performance. In this paper, we propose the Customizable Recall@N Optimization Loss (CROLoss), a loss function that can directly optimize the Recall@N metrics and is customizable for different choices of N. The proposed CROLoss formulation defines a more generalized loss function space, covering most of the conventional loss functions as special cases. Furthermore, we develop the Lambda method, a gradient-based method that invites more flexibility and can further boost the system performance. We evaluate the proposed CROLoss on two public benchmark datasets. The results show that CROLoss achieves SOTA results over conventional loss functions for both datasets with various choices of retrieval size N. CROLoss has been deployed onto our online E-commerce advertising platform, where a fourteen-day online A/B test demonstrated that CROLoss contributes to a significant business revenue growth of 4.75%.
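For reference, the Recall@N metric described above can be computed as in the following minimal sketch. It shows the evaluation metric only, not CROLoss itself; the function name and the convention of returning 0.0 when there are no positives are assumptions.

```python
from typing import Hashable, Iterable, Sequence


def recall_at_n(ranked_items: Sequence[Hashable],
                positives: Iterable[Hashable],
                n: int) -> float:
    """Fraction of the positive items that appear in the top-n of the ranking."""
    positive_set = set(positives)
    if not positive_set:
        return 0.0  # assumed convention when a user has no positives
    retrieved = set(ranked_items[:n])
    return len(retrieved & positive_set) / len(positive_set)


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d"]
    print(recall_at_n(ranking, {"b", "d"}, n=2))
    print(recall_at_n(ranking, {"b", "d"}, n=4))
```

Because the metric depends on the cut-off n, a loss that can be tuned to a specific n targets exactly the quantity an application cares about.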

Temporal Contrastive Pre-Training for Sequential Recommendation

Recently, pre-training based approaches have been proposed to leverage self-supervised signals for improving the performance of sequential recommendation. However, most existing pre-training recommender systems simply model the historical behavior of a user as a sequence and give insufficient consideration to temporal interaction patterns that are useful for modeling user behavior.

In order to better model temporal characteristics of user behavior sequences, we propose a Temporal Contrastive Pre-training method for Sequential Recommendation (TCPSRec for short). Based on the temporal intervals, we consider dividing the interaction sequence into more coherent subsequences, and design temporal pre-training objectives accordingly. Specifically, TCPSRec models two important temporal properties of user behavior, i.e., invariance and periodicity. For invariance, we consider both global invariance and local invariance to capture the long-term preference and short-term intention, respectively. For periodicity, TCPSRec models coarse-grained periodicity and fine-grained periodicity at the subsequence level, which is more stable than modeling periodicity at the item level. By integrating the above strategies, we develop a unified contrastive learning framework with four specially designed pre-training objectives for fusing temporal information into sequential representations. We conduct extensive experiments on six real-world datasets, and the results demonstrate the effectiveness and generalization of our proposed method.

Dr. Can See: Towards a Multi-modal Disease Diagnosis Virtual Assistant

Artificial Intelligence-based clinical decision support is gaining ever-growing popularity and demand in both the research and industry communities. One such manifestation is automatic disease diagnosis, which aims to assist clinicians in conducting symptom investigations and disease diagnoses. When we consult with doctors, we often report and describe our health conditions with visual aids. Moreover, many people are unacquainted with several symptoms and medical terms, such as mouth ulcer and skin growth. Therefore, a visual form of symptom reporting is a necessity. Motivated by the efficacy of visual symptom reporting, we propose and build a novel end-to-end Multi-modal Disease Diagnosis Virtual Assistant (MDD-VA) using a reinforcement learning technique. In conversation, users' responses are heavily influenced by the ongoing dialogue context, and multi-modal responses appear to be no different. We also propose and incorporate a Context-aware Symptom Image Identification module that leverages discourse context in addition to the symptom image for identifying symptoms effectively. Furthermore, we first curate a multi-modal conversational medical dialogue corpus in English that is annotated with intent, symptoms, and visual information. The proposed MDD-VA outperforms multiple uni-modal baselines in both automatic and human evaluation, which firmly establishes the critical role of symptom information provided by visuals. The dataset and code are available at

A Context-Enhanced Generate-then-Evaluate Framework for Chinese Abbreviation Prediction

As a popular form of lexicalization, abbreviation is widely used in both oral and written language and plays an important role in various Natural Language Processing applications. However, current approaches cannot ensure that the predicted abbreviation preserves the meaning of its full form and maintains fluency. In this paper, we introduce a fresh perspective to evaluate the quality of abbreviations within their textual contexts with a pre-trained language model. To this end, we propose a novel two-stage generate-then-evaluate framework enhanced by context, which consists of a generation model to generate multiple candidate abbreviations and an evaluation model to evaluate their quality within their contexts. Experimental results show that our framework consistently outperforms all the existing approaches, achieving 53.2% Hit@1 performance, a 5.6-point improvement over the previous best result. Our code and data are publicly available at

Dense Retrieval with Entity Views

Pre-trained language models like BERT have been demonstrated to be both effective and efficient ranking methods when combined with approximate nearest neighbor search, which can quickly match dense representations of queries and documents. However, pretrained language models alone do not fully capture information about uncommon entities. In this work, we investigate methods for enriching dense query and document representations with entity information from an external source. Our proposed method identifies groups of entities in a text and encodes them into a dense vector representation, which is then used to enrich BERT's vector representation of the text. To handle documents that contain many loosely-related entities, we devise a strategy for creating multiple entity representations that reflect different views of a document. For example, a document about a scientist may cover aspects of her personal life and recent work, which correspond to different views of the entity. In an evaluation on MS MARCO benchmarks, we find that enriching query and document representations in this way yields substantial increases in effectiveness.

Intersection of Parallels as an Early Stopping Criterion

A common way to avoid overfitting in supervised learning is early stopping, where a held-out set is used for iterative evaluation during training to find a sweet spot in the number of training steps that gives maximum generalization. However, such a method requires a disjoint validation set, thus part of the labeled data from the training set is usually left out for this purpose, which is not ideal when training data is scarce. Furthermore, when the training labels are noisy, the performance of the model over a validation set may not be an accurate proxy for generalization. In this paper, we propose a method to spot an early stopping point in the training iterations of an overparameterized neural network (NN) without the need for a validation set. We first show that in the overparameterized regime the randomly initialized weights of a linear model converge to the same direction during training. Using this result, we propose to train two parallel instances of a linear model, initialized with different random seeds, and use their intersection as a signal to detect overfitting. In order to detect intersection, we use the cosine distance between the weights of the parallel models during training iterations. Noticing that the final layer of an NN is a linear map of pre-last layer activations to output logits, we build on our criterion for linear models and propose an extension to multi-layer networks, using the new notion of counterfactual weights. We conduct experiments on two areas where early stopping has a noticeable impact on preventing overfitting of an NN: (i) learning from noisy labels; and (ii) learning to rank in information retrieval. Our experiments on four widely used datasets confirm the effectiveness of our method for generalization. For a wide range of learning rates, our method, called Cosine-Distance Criterion (CDC), leads to better generalization on average than all the methods that we compare against in almost all of the tested cases.
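The quantity being monitored here can be sketched as follows: the cosine distance between the weight vectors of two parallel models, recorded at each training step. The abstract does not state the exact stopping rule, so the helper that picks the step of minimum distance is a hypothetical stand-in for detecting the "intersection", and the toy trajectories are made up for illustration.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def step_of_min_distance(traj_a: List[Sequence[float]],
                         traj_b: List[Sequence[float]]) -> int:
    """Hypothetical rule: return the training step at which the two
    weight trajectories are closest in direction."""
    distances = [cosine_distance(a, b) for a, b in zip(traj_a, traj_b)]
    return min(range(len(distances)), key=distances.__getitem__)


if __name__ == "__main__":
    # Toy weight trajectories of two models started from different seeds.
    weights_a = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    weights_b = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    print(step_of_min_distance(weights_a, weights_b))
```

No validation data is touched: the signal comes entirely from the weights of the two parallel models.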

Adaptive Multi-Source Causal Inference from Observational Data

We propose a new approach to estimate causal effects from observational data. We leverage multiple data sources which share similar causal mechanisms with the scarce target observations to help infer causal effects in the target domain. The data sources may be available in sequence or some unplanned order. Causal inference can be carried out without prior knowledge of the data discrepancy between the source and target observations. We introduce three levels of knowledge transfer through modelling the outcomes, treatments, and confounders to achieve consistent positive transfer. We incorporate parametric transfer factors to adaptively control the transfer strength, thus achieving a fair and balanced knowledge transfer between the sources and the target. We also empirically show the effectiveness of the proposed method as compared with recent baselines.

Modeling Inter-Dependence Between Time and Mark in Multivariate Temporal Point Processes

Temporal Point Processes (TPP) are probabilistic generative frameworks. They model discrete event sequences localized in continuous time. Generally, real-life events reveal descriptive information, known as marks. Marked TPPs model time and marks of the event together for practical relevance. Conditioned on past events, marked TPPs aim to learn the joint distribution of the time and the mark of the next event. For simplicity, conditionally independent TPP models assume time and marks are independent given event history. They factorize the conditional joint distribution of time and mark into the product of individual conditional distributions. This structural limitation in the design of TPP models hurts the predictive performance on entangled time and mark interactions. In this work, we model the conditional inter-dependence of time and mark to overcome the limitations of conditionally independent models. We construct a multivariate TPP conditioning the time distribution on the current event mark in addition to past events. Besides the conventional intensity-based models for conditional joint distribution, we also draw on flexible intensity-free TPP models from the literature. The proposed TPP models outperform conditionally independent and dependent models in standard prediction tasks. Our experimentation on various datasets with multiple evaluation metrics highlights the merit of the proposed approach.
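The two factorizations contrasted above can be written out explicitly. In this sketch, $t$ and $m$ denote the time and mark of the next event and $\mathcal{H}$ the event history; the symbols are illustrative notation, not taken from the paper.

```latex
% Conditionally independent models: time and mark factorize given the history.
\[
  p(t, m \mid \mathcal{H}) \;=\; p(t \mid \mathcal{H}) \, p(m \mid \mathcal{H})
\]

% Inter-dependent models: the time distribution is additionally
% conditioned on the mark of the current event.
\[
  p(t, m \mid \mathcal{H}) \;=\; p(m \mid \mathcal{H}) \, p(t \mid m, \mathcal{H})
\]
```

The second form is the chain rule and therefore holds for any joint distribution, whereas the first imposes the independence assumption that the abstract identifies as a structural limitation.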

ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Visual question answering is an important task in both natural language and vision understanding. However, in most public visual question answering datasets, such as VQA and CLEVR, the questions are human-generated and specific to the given image, such as 'What color are her eyes?'. These crowdsourced, human-generated questions are relatively simple and are sometimes biased toward certain entities or attributes [1, 55].

In this paper, we introduce ChiQA, a new image-based question answering dataset. It contains real-world queries issued by internet users, each combined with several related open-domain images. The system should determine whether an image can answer the question. Unlike previous VQA datasets, the questions are real-world, image-independent queries that are more varied and unbiased. Compared with previous image-retrieval or image-caption datasets, ChiQA measures not only relatedness but also answerability, which demands more fine-grained vision and language reasoning.

ChiQA contains more than 40K questions and more than 200K question-image pairs. Each pair is assigned a three-level label (2/1/0) indicating a perfect answer, a partial answer, or an irrelevant image. Data analysis shows that ChiQA requires a deep understanding of both language and vision, including grounding, comparisons, and reading. We evaluate several state-of-the-art visual-language models such as ALBEF, demonstrating that there is still substantial room for improvement on ChiQA.

Target Interest Distillation for Multi-Interest Recommendation

Sequential recommendation aims at predicting the next item that the user may be interested in given the historical interaction sequence. Typical neural models derive a single history embedding to represent the user's interests. Moving one step forward, recent studies point out that multiple sequence embeddings can help to better capture multi-faceted user interests. However, when ranking candidate items, these methods usually adopt the greedy inference strategy. This approach uses the best matching interest for each candidate item to calculate the ranking score, neglecting the target interest distribution in different contexts, which might lead to incompatibility with the current user intent. In this paper, we propose to enhance multi-interest recommendation by predicting the target user interest with a separate interest predictor and a specifically designed distillation loss. The proposed framework consists of two modules: the 1) multi-interest extractor to generate multiple embeddings regarding different user interests; and the 2) target-interest predictor to predict the interest distribution in the current context, which will be further utilized to dynamically aggregate multi-interest embeddings. To provide explicit supervision signals to the target-interest predictor, we devise a target-interest distillation loss that uses the similarity between the target item and multi-interest embeddings as the soft label of the target interest. This helps the target-interest predictor to accurately predict the user interest at the inference stage and enhances its generalization ability. Extensive experiments on three real-world datasets show the effectiveness and flexibility of the proposed framework.
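
The contrast between greedy inference and distribution-based aggregation, and the soft-label idea behind the distillation loss, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: dot-product similarity, a softmax soft label with a `temperature` parameter, a cross-entropy loss, and all function names are hypothetical choices.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def greedy_score(interests, item):
    # Greedy inference: score the candidate with its best-matching interest.
    return float(np.max(interests @ item))

def aggregated_score(interests, item, interest_probs):
    # Aggregate the interest embeddings with a predicted interest
    # distribution, then score the candidate against the aggregate.
    user_vec = interest_probs @ interests
    return float(user_vec @ item)

def distillation_loss(interests, target_item, predicted_probs, temperature=1.0):
    # Soft label (assumed form): softmax of the similarity between the
    # target item and each interest embedding.
    soft_label = softmax(interests @ target_item / temperature)
    # Cross-entropy between the soft label and the predicted distribution.
    return float(-np.sum(soft_label * np.log(predicted_probs + 1e-12)))

interests = np.array([[1.0, 0.0], [0.0, 1.0]])  # K = 2 interest embeddings
item = np.array([0.2, 0.9])                     # candidate / target item
probs = np.array([0.9, 0.1])                    # predictor favours interest 0

g = greedy_score(interests, item)
a = aggregated_score(interests, item, probs)
```

In this toy case the greedy score ignores that the current context favours the first interest, while the aggregated score is pulled down accordingly.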

Explanation Guided Contrastive Learning for Sequential Recommendation

Recently, contrastive learning has been applied to the sequential recommendation task to address data sparsity caused by users with few item interactions and items with few user adoptions. Nevertheless, existing contrastive learning-based methods fail to ensure that the positive (or negative) sequence obtained by random augmentation (or sequence sampling) of a given anchor user sequence remains semantically similar (or different). When the positive and negative sequences turn out to be false positives and false negatives, respectively, recommendation performance may degrade. In this work, we address this problem by proposing Explanation Guided Augmentations (EGA) and the Explanation Guided Contrastive Learning for Sequential Recommendation (EC4SRec) model framework. The key idea behind EGA is to utilize explanation method(s) to determine items' importance in a user sequence and derive the positive and negative sequences accordingly. EC4SRec then combines both self-supervised and supervised contrastive learning over the positive and negative sequences generated by EGA operations to improve sequence representation learning for more accurate recommendation results. Extensive experiments on four real-world benchmark datasets demonstrate that EC4SRec outperforms the state-of-the-art sequential recommendation methods and two recent contrastive learning-based sequential recommendation methods, CL4SRec and DuoRec. Our experiments also show that EC4SRec can be easily adapted to different sequence encoder backbones (e.g., GRU4Rec and Caser) and improves their recommendation performance.

Generative-Free Urban Flow Imputation

Urban flow imputation, which aims to infer the missing flows of some locations based on the available flows of surrounding areas, is critically important to various smart city applications such as urban planning and public safety. Although many methods have been proposed to impute time series data, they may not be directly applicable to urban flow data for the following reasons. First, urban flows have complex spatial and temporal correlations, which are much harder to capture than those in ordinary time series data. Second, urban flow data can be randomly missing (i.e., missing randomly in terms of times and locations) or block missing (i.e., missing for all locations in a particular time slot). It is thus difficult for existing methods to work well in both scenarios. In this paper, we study the urban flow imputation problem for the first time and propose a generative-free Attention-based Spatial-Temporal Combine and Mix Completion Network model (AST-CMCN for short) to address it effectively. Specifically, AST-CMCN consists of a Spatial and Temporal Completion Network (SATCNet for short) and a Spatial-Temporal Mix Completion Network (STMCNet for short). SATCNet is composed of stacked GRUAtt modules that capture the geographical and temporal correlations of the urban flows separately. STMCNet is designed to jointly capture the complex spatial-temporal associations between historical urban flows and current data. A Message Passing module is also proposed to capture new spatial-temporal patterns that never appear in the historical data. Extensive experiments on two large real-world datasets validate the effectiveness and efficiency of our method compared with state-of-the-art baselines.

Interpretable Emotion Analysis Based on Knowledge Graph and OCC Model

Sentiment analysis, or opinion mining, has been significant for information extraction from text. At the same time, emotion psychology has proposed many appraisal theories for emotional evaluation and concrete prediction. While sentiment analysis focuses on identifying polarity, appraisal theories of emotion can define different emotions and view emotions as processes rather than states. In real life, the mechanisms of emotion generation and interaction are complicated, and polarity alone cannot sufficiently explain them. Hence, an explainable model is needed for emotion inference and dynamic analysis. In this paper, an analysis framework is constructed for interpreting causal associations based on emotional logic. A knowledge graph is introduced into the appraisal theories for inferring emotions and predicting action tendencies. The emotion knowledge graph has two levels: a concept level and a case level. The concept level can be built manually as an abstraction based on the appraisal model of Ortony, Clore & Collins (the OCC model). Inference and prediction can be implemented at this level. The case level includes entities, objects, events, and the cognitive relations between them, which are extracted from the text through modular functions. The elements in the case level can be linked to the abstract types in the concept level for emotional inference. We test this emotion analysis framework on several datasets from appraisal theory and on the text of drama works. The results demonstrate that our framework makes better inferences about emotions and offers good interpretability for humans.

AdaGCL: Adaptive Subgraph Contrastive Learning to Generalize Large-scale Graph Training

Training graph neural networks (GNNs) with good generalizability on large-scale graphs is a challenging problem. Existing methods mainly divide the input graph into multiple subgraphs and train on them in different batches to improve training scalability. However, the local batches obtained by such a strategy can contain topological bias compared with the complete graph structure. Prior studies have shown that this topological bias results in larger gaps between training and testing performance, i.e., worse generalization robustness. A straightforward solution is to utilize contrastive learning and train node embeddings to be robust and invariant across the augmented, imperfect graphs. However, most existing methods are inefficient because they contrast extensive node pairs on large-scale graphs. With random data augmentation, they may also deteriorate the embedding process by transforming well-sampled batches into meaningless graph structures.

To bridge the gap between large-scale graph training and contrastive learning, we propose adaptive subgraph contrastive learning (AdaGCL). Given a batch of sampled subgraphs, we propose a subgraph-granularity contrastive loss that compares the anchor node with a limited number of subgraphs, which reduces the computation cost. AdaGCL tailors two key components for batch training: (1) batch-aware view generation, which keeps the intrinsic structure of each individual subgraph in the batch in order to learn informative node embeddings; and (2) batch-aware pair sampling, which constructs the positive and negative contrasting subgraphs based on the anchor node label. Experiments show that AdaGCL can scale up to graphs with millions of nodes and delivers consistent improvements over existing methods on various benchmark datasets. Furthermore, AdaGCL has a running time comparable to that of state-of-the-art contrastive learning methods that focus on improving efficiency. Finally, ablation studies of the two components of AdaGCL demonstrate their effectiveness in generalizing batch training. The code is in:

ContrastVAE: Contrastive Variational AutoEncoder for Sequential Recommendation

Aiming at exploiting the rich information in user behaviour sequences, sequential recommendation has been widely adopted in real-world recommender systems. However, current methods suffer from the following issues: 1) sparsity of user-item interactions, 2) uncertainty of sequential records, 3) long-tail items. In this paper, we propose to incorporate contrastive learning into the framework of Variational AutoEncoders to address these challenges simultaneously. Firstly, we introduce ContrastELBO, a novel training objective that extends the conventional single-view ELBO to two-view case and theoretically builds a connection between VAE and contrastive learning from a two-view perspective. Then we propose Contrastive Variational AutoEncoder (ContrastVAE in short), a two-branched VAE model with contrastive regularization as an embodiment of ContrastELBO for sequential recommendation. We further introduce two simple yet effective augmentation strategies named model augmentation and variational augmentation to create a second view of a sequence and thus making contrastive learning possible. Experiments on four benchmark datasets demonstrate the effectiveness of ContrastVAE and the proposed augmentation methods. Codes are available at

Imbalanced Graph Classification via Graph-of-Graph Neural Networks

Graph Neural Networks (GNNs) have achieved unprecedented success in identifying categorical labels of graphs. However, most existing graph classification problems with GNNs follow the protocol of balanced data splitting, which misaligns with many real-world scenarios in which some classes have much fewer labels than others. Directly training GNNs under this imbalanced scenario may lead to uninformative representations of graphs in minority classes, and compromise the overall classification performance, which signifies the importance of developing effective GNNs towards handling imbalanced graph classification. Existing methods are either tailored for non-graph structured data or designed specifically for imbalanced node classification while few focus on imbalanced graph classification. To this end, we introduce a novel framework, Graph-of-Graph Neural Networks (G2GNN), which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from stochastic augmentations of graphs. Globally, we construct a graph of graphs (GoG) based on kernel similarity and perform GoG propagation to aggregate neighboring graph representations. Locally, we employ topological augmentation via masking node features or dropping edges with self-consistency regularization to generate stochastic augmentations of each graph that improve the model generalizability. Extensive graph classification experiments conducted on seven benchmark datasets demonstrate our proposed G2GNN outperforms numerous baselines by roughly 5% in both F1-macro and F1-micro scores.
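
The global step described above (build a graph of graphs from pairwise similarity, then propagate graph representations over it) can be sketched as follows. This is a rough illustration only: the top-k neighbour rule, the symmetrisation, the row-normalised propagation with self-loops, and the function names `build_gog` and `propagate` are assumptions, and the precomputed `similarity` matrix stands in for whatever graph kernel is actually used.

```python
import numpy as np

def build_gog(similarity, k):
    # Connect each graph to its k most similar other graphs, then
    # symmetrise so the graph of graphs is undirected.
    n = similarity.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        sim = similarity[i].astype(float).copy()
        sim[i] = -np.inf                  # never pick the graph itself
        neighbours = np.argsort(-sim)[:k]
        adj[i, neighbours] = 1.0
    return np.maximum(adj, adj.T)

def propagate(adj, reps, steps=1):
    # Average each graph's representation with those of its GoG
    # neighbours (self-loop included), repeated `steps` times.
    a_hat = adj + np.eye(adj.shape[0])
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)
    for _ in range(steps):
        reps = a_hat @ reps
    return reps

# Four graphs forming two similar pairs: (0, 1) and (2, 3).
similarity = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])
reps = np.array([[1.0], [3.0], [10.0], [20.0]])  # toy 1-D graph embeddings

adj = build_gog(similarity, k=1)
smoothed = propagate(adj, reps)
```

With `k=1`, each graph is linked only to its closest partner, so a graph from a minority class borrows signal from its most similar neighbour.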

Latent Coreset Sampling based Data-Free Continual Learning

Catastrophic forgetting poses a major challenge in continual learning, where old knowledge is forgotten when the model is updated on new tasks. Existing solutions tend to address this challenge through generative models or exemplar-replay strategies. However, such methods may generate or select low-quality samples for replay, which directly reduces the effectiveness of the model, especially in scenarios with class imbalance, noise, or redundancy. Accordingly, how to select a suitable coreset during continual learning becomes significant in such settings. In this work, we propose a novel approach that leverages continual coreset sampling (CCS) to address these challenges. We aim to select the most representative subsets during each iteration. When the model is trained on new tasks, it closely approximates/matches the gradient of both the previous and current tasks with respect to the model parameters. This way, adaptation of the model to new datasets can be more efficient. Furthermore, instead of storing old data to maintain old knowledge, our approach preserves it in the latent space. We augment the previous classes in the embedding space as pseudo sample vectors from the old encoder output, strengthened by joint training with selected new data. This avoids data privacy invasions in real-world applications when the model is updated. Our experiments validate the effectiveness of the proposed approach against current baselines on various CV/NLP datasets, and show clear improvements in model adaptation and forgetting reduction in a data-free manner.

Bandit Learning in Many-to-One Matching Markets

The problem of two-sided matching markets is well-studied in social science and economics. Some recent works study how to match while learning the unknown preferences of agents in one-to-one matching markets. However, in many cases like the online recruitment platform for short-term workers, a company can select more than one agent while an agent can only select one company at a time. These short-term workers try many times in different companies to find the most suitable jobs for them. Thus we consider a more general bandit learning problem in many-to-one matching markets where each arm has a fixed capacity and agents make choices with multiple rounds of iterations. We develop algorithms in both centralized and decentralized settings and prove regret bounds of order O(log T) and O(log^2 T), respectively. Extensive experiments show the convergence and effectiveness of our algorithms.

Multi-level Contrastive Learning Framework for Sequential Recommendation

Sequential recommendation (SR) aims to predict the subsequent behaviors of users by understanding their successive historical behaviors. Recently, some SR methods have been devoted to alleviating the data sparsity problem (i.e., limited supervised signals for training) by using contrastive learning to incorporate self-supervised signals into SR. Despite their achievements, these methods are far from sufficient for learning informative user/item embeddings, because they inadequately model complex collaborative information and co-action information, such as user-item, user-user, and item-item relations. In this paper, we study the problem of SR and propose a novel multi-level contrastive learning framework for sequential recommendation, named MCLSR. Unlike previous contrastive learning-based methods for SR, MCLSR learns the representations of users and items through a cross-view contrastive learning paradigm from four specific views at two different levels (i.e., interest- and feature-level). Specifically, the interest-level contrastive mechanism jointly learns the collaborative information with the sequential transition patterns, and the feature-level contrastive mechanism re-observes the relation between users and items by capturing the co-action information (i.e., co-occurrence). Extensive experiments on four real-world datasets show that the proposed MCLSR consistently outperforms the state-of-the-art methods.

Dynamic Hypergraph Learning for Collaborative Filtering

Hypergraph-based collaborative filtering for recommendations has emerged as an important research topic due to its ability to model complex relations among users and items. However, most existing methods typically construct the hypergraph structures using heuristics (e.g., motifs and jump connections) based on existing graphs (e.g., user-item bipartite graphs and social networks). From a learning perspective, we argue that the fixed heuristic topology of hypergraph may become a limitation and thus potentially compromise the recommendation performance. To tackle this issue, we propose a novel dynamic hypergraph learning framework for collaborative filtering (DHLCF), which learns hypergraph structures and makes recommendations collectively in a unified framework. In the hypergraph learning process, we solve two main challenges, i.e., 1) optimization issue and 2) regularization issue. Firstly, we propose a differentiable hypergraph learner to adaptively learn the optimized hypergraph structures dynamically for the hypergraph convolutions during the training process. Secondly, to better regularize dynamic hypergraph learning, we introduce a novel hypergraph learning objective, which forces the learned hypergraphs to retain the original graph topology. Extensive experiments on public datasets from different domains are provided to show that our proposed model significantly outperforms strong baselines.

Dynamic Transfer Gaussian Process Regression

In this paper, we work on a challenging dynamic transfer regression problem where domains come in a streaming manner. At each time stage, a new domain emerges and is taken as the target domain while all the domains in previous time stages are taken as source domains. We propose a transfer Gaussian process model GPdk with a novel dynamic transfer kernel DyTK to handle the dynamic transfer regression problem. Specifically, DyTK is with a sequential form to fit the domain stream. To adaptively control the knowledge transfer strength, DyTK is designed to be capable of modeling the inter-domain relatedness of every inter-domain pair. A theorem that ensures DyTK to be positive semi-definite is then proposed. We also theoretically analyze the transfer performance of GPdk by deriving its generalization error bounds. The error bounds further motivate us to propose a parameter reuse strategy to alleviate the scalability issue of GPdk along time. Extensive experiments on both synthetic and real-world datasets show the effectiveness of GPdk in handling dynamic transfer regression problems.

Certified Robustness to Word Substitution Ranking Attack for Neural Ranking Models

Neural ranking models (NRMs) have achieved promising results in information retrieval. NRMs have also been shown to be vulnerable to adversarial examples. A typical Word Substitution Ranking Attack (WSRA) against NRMs was proposed recently, in which an attacker promotes a target document in rankings by adding human-imperceptible perturbations to its text. This raises concerns when deploying NRMs in real-world applications. Therefore, it is important to develop techniques that defend against such attacks for NRMs. In empirical defenses adversarial examples are found during training and used to augment the training set. However, such methods offer no theoretical guarantee on the models' robustness and may eventually be broken by other sophisticated WSRAs. To escape this arms race, rigorous and provable certified defense methods for NRMs are needed.

To this end, we first define the Certified Top-K Robustness for ranking models since users mainly care about the top ranked results in real-world scenarios. A ranking model is said to be Certified Top-K Robust on a ranked list when it is guaranteed to keep documents that are out of the top K away from the top K under any attack. Then, we introduce a Certified Defense method, named CertDR, to achieve certified top-K robustness against WSRA, based on the idea of randomized smoothing. Specifically, we first construct a smoothed ranker by applying random word substitutions on the documents, and then leverage the ranking property jointly with the statistical property of the ensemble to provably certify top-K robustness. Extensive experiments on two representative web search datasets demonstrate that CertDR can significantly outperform state-of-the-art empirical defense methods for ranking models.
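
The smoothed ranker mentioned above can be illustrated with a toy example that averages a relevance score over random word substitutions applied to the document. This sketch covers only the smoothing step, not the certification procedure. `toy_score` is a hypothetical stand-in for a neural ranking model, and the `substitutions` table (where each word's list is assumed to include the word itself) and the sample count are illustrative assumptions.

```python
import random

def toy_score(query_tokens, doc_tokens):
    # Hypothetical relevance score: fraction of document tokens that
    # also occur in the query.
    query = set(query_tokens)
    matched = sum(1 for tok in doc_tokens if tok in query)
    return matched / max(len(doc_tokens), 1)

def smoothed_score(score_fn, query_tokens, doc_tokens, substitutions,
                   n_samples=200, seed=0):
    # Monte Carlo estimate of the expected score when every document
    # token is independently replaced by a random allowed substitute.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perturbed = [rng.choice(substitutions.get(tok, [tok]))
                     for tok in doc_tokens]
        total += score_fn(query_tokens, perturbed)
    return total / n_samples

query = ["cheap", "flights"]
doc = ["cheap", "flights", "today"]
subs = {"cheap": ["cheap", "inexpensive"]}  # assumed substitution table

plain = toy_score(query, doc)
smooth = smoothed_score(toy_score, query, doc, subs)
```

Because the smoothed score is an average over many perturbed copies, a single adversarial substitution shifts it far less than it can shift the score of the unsmoothed model, which is the property a certification argument builds on.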

RelpNet: Relation-based Link Prediction Neural Network

Node-based link prediction methods have occupied a dominant position in the graph link prediction task. These methods commonly aggregate node features from the subgraph to generate the potential link representation. However, in constructing subgraphs, these methods extract each node's local neighborhood from the target node pair separately without considering the correlation between them and the whole node pair. As a result, many nodes in the subgraph may have little contribution to predicting the potential edge. Aggregating these node features will reduce the model's accuracy and efficiency. In addition, these methods indirectly represent the potential link by the node embeddings in the subgraph. We argue that this formalism is not the best choice for link prediction. In this paper, we propose a relation-based link prediction neural network named RelpNet, which aggregates edge features along the structural interactions between two target nodes and directly represents their relationship. RelpNet first extracts paths between the target node pair as structural interactions, which have strong correlations with the whole node pair and fewer nodes and edges than node-based methods' subgraph. To aggregate edge embeddings along the links between edges, we propose transforming the paths into a line graph. Then, the Tree-LSTM model is adopted to transfer and aggregate the node embeddings in the line graph as a comprehensive representation of the target node pair. We evaluate RelpNet on 7 benchmark datasets against 15 popular and state-of-the-art approaches, and the results demonstrate its significant superiority and high training efficiency.

Adapting Triplet Importance of Implicit Feedback for Personalized Recommendation

Implicit feedback is frequently used for developing personalized recommendation services due to its ubiquity and accessibility in real-world systems. In order to effectively utilize such information, most research adopts the pairwise ranking method on constructed training triplets (user, positive item, negative item) and aims to distinguish between positive items and negative items for each user. However, most of these methods treat all the training triplets equally, which ignores the subtle difference between different positive or negative items. On the other hand, even though some other works make use of the auxiliary information (e.g., dwell time) of user behaviors to capture this subtle difference, such auxiliary information is hard to obtain. To mitigate the aforementioned problems, we propose a novel training framework named Triplet Importance Learning (TIL), which adaptively learns the importance score of training triplets. We devise two strategies for the importance score generation and formulate the whole procedure as a bilevel optimization, which does not require any rule-based design. We integrate the proposed training procedure with several Matrix Factorization (MF)- and Graph Neural Network (GNN)-based recommendation models, demonstrating the compatibility of our framework. Via a comparison using three real-world datasets with many state-of-the-art methods, we show that our proposed method outperforms the best existing models by 3-21% in terms of Recall@k for the top-k recommendation.
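
To make the idea of per-triplet importance concrete, the sketch below weights a standard pairwise ranking loss by an importance score for each (user, positive item, negative item) triplet. It assumes a BPR-style loss with dot-product scores and takes the weights as given inputs; in the abstract's framework the weights are learned through bilevel optimization, which is not reproduced here. The function name is hypothetical.

```python
import numpy as np

def weighted_pairwise_loss(user_emb, pos_emb, neg_emb, weights):
    # Dot-product preference scores for the positive and negative items.
    pos_scores = np.sum(user_emb * pos_emb, axis=1)
    neg_scores = np.sum(user_emb * neg_emb, axis=1)
    diff = pos_scores - neg_scores
    # -log(sigmoid(diff)), computed stably as log(1 + exp(-diff)).
    per_triplet = np.logaddexp(0.0, -diff)
    # Importance-weighted average over the triplets.
    return float(np.sum(weights * per_triplet) / np.sum(weights))

user_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
pos_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
neg_emb = np.array([[0.0, 1.0], [0.0, 1.0]])  # second negative ties with its positive

uniform = weighted_pairwise_loss(user_emb, pos_emb, neg_emb, np.array([1.0, 1.0]))
focused = weighted_pairwise_loss(user_emb, pos_emb, neg_emb, np.array([0.0, 1.0]))
```

Uniform weights recover the usual setting in which all triplets are treated equally; shifting weight onto the second triplet makes the loss concentrate on the harder pair.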

Contrastive Label Correlation Enhanced Unified Hashing Encoder for Cross-modal Retrieval

Cross-modal hashing (CMH) has been widely used in multimedia retrieval applications for its low storage cost and fast indexing speed. Thanks to the success of deep learning, cross-modal hashing has made significant progress with high-quality deep features. However, the modal gap is still a crucial bottleneck for existing cross-modal hashing methods: the commonly used convolutional neural network and bag-of-words encoders are customized for single modal prior, limiting the models to learn semantics representation in a cross-modal space. To overcome modality heterogeneity, we propose a shared transformer encoder (UniHash) to unify the cross-modal hashing into the same semantic space. A contrastive label correlation learning (CLC) loss using the category labels as modality bridge is designed together to improve the representation quality. Moreover, we take advantage of the multi-hot label space and propose a negative label generation (NegLG) strategy to get richer and uniformly distributed negative labels for contrast. Extensive experiments on three benchmarks verify the advantage of our proposed method. Besides, the proposed UniHash outperforms state-of-the-art cross-modal hashing methods significantly, establishing a new important baseline for the cross-modal hashing research. Codes are released

Leveraging Multiple Types of Domain Knowledge for Safe and Effective Drug Recommendation

Predicting drug combinations according to patients' electronic health records is an essential task in intelligent healthcare systems, which can assist clinicians in ordering safe and effective prescriptions. However, existing work either misses or underutilizes the important information in drug molecule structures during drug encoding, or has insufficient control over Drug-Drug Interaction (DDI) rates within the predictions. To address these limitations, we propose CSEDrug, which enhances drug encoding and DDI control by leveraging multi-faceted drug knowledge, including molecule structures of drugs, Synergistic DDIs (SDDIs), and Antagonistic DDIs (ADDIs). We integrate these types of knowledge into CSEDrug through a graph-based drug encoder and multiple loss functions, including a novel triplet learning loss and a comprehensive DDI controllable loss. We evaluate the performance of CSEDrug in terms of accuracy, effectiveness, and safety on the public MIMIC-III dataset. The experimental results demonstrate that CSEDrug outperforms several state-of-the-art methods and achieves a 2.93% and a 2.77% increase in the Jaccard similarity scores and F1 scores, respectively, along with a 0.68% reduction in the ADDI rate (safer drug combinations) and a 0.69% improvement in the SDDI rate (more effective drug combinations).

FedCDR: Federated Cross-Domain Recommendation for Privacy-Preserving Rating Prediction

The cold-start problem, which arises when providing recommendations to newly joined users who have no historical interaction records on the platform, is one of the most critical problems that negatively impact the performance of a recommendation system. Fortunately, cross-domain recommendation (CDR) is a promising approach to this problem, as it can exploit knowledge about these users from source domains to provide recommendations in the target domain. However, this method requires the central server to hold the interaction behaviour data of all users in both domains, which discourages users from participating due to privacy concerns.

In this work, we propose FedCDR, a federated learning based cross-domain recommendation system that effectively trains the recommendation model while keeping users' raw data and private user-specific parameters located on their own devices. Unlike existing CDR models, a personal module and a transfer module are designed to adapt to the extremely heterogeneous data on the participating devices. Specifically, the personal module extracts private user features for each user, while the transfer module is responsible for transferring the knowledge between the two domains. Moreover, in order to provide personalized recommendations with less storage and communication costs while effectively protecting privacy, we design a personalized update strategy for each client and a personalized aggregation strategy for the server. In addition, we conduct comprehensive experiments on the representative Amazon 5-cores datasets for three popular rating prediction tasks to evaluate the effectiveness of FedCDR. The results show that FedCDR outperforms the state-of-the-art methods in mean absolute error (MAE) and root mean squared error (RMSE). For example, in task Movie&Music, FedCDR can effectively improve the performance up to 65.83% and 55.45% on MAE and RMSE, respectively, when the new users are in the movie domain.

Incorporating Peer Reviews and Rebuttal Counter-Arguments for Meta-Review Generation

Peer review is an essential part of the scientific process in which the research papers are assessed by several reviewers. The author rebuttal phase, which is held at most top conferences, provides an opportunity for the authors to defend their work against the arguments made by the reviewers. The strengths and the weaknesses pointed out by the reviewers, as well as the authors' responses, will be evaluated by the area chair. The final decisions generally accompany meta-reviews regarding the reason for acceptance/rejection. Previous research has studied the generation of meta-review using transformer-based summarization models. However, few of them consider the rebuttals' content and the interaction between reviews and rebuttals' arguments, where the argumentation persuasiveness plays an important role in affecting the final decision. To generate a comprehensive meta-review that well organizes reviewers' opinions and authors' responses, we present a novel generation model that is capable of explicitly modeling the complicated argumentation structure from not only arguments between the reviewers and the authors but also the inter-reviewer discussions. Experimental results show that our model outperforms baselines in terms of both automatic evaluation and human evaluation, demonstrating the effectiveness of our approach.

A Gumbel-based Rating Prediction Framework for Imbalanced Recommendation

Rating prediction is a core problem in recommender systems to quantify users' preferences towards items. However, rating imbalance is inherent in real-world user ratings; it causes biased predictions and leads to poor performance on tail ratings. While existing approaches to the rating prediction task deploy weighted cross-entropy to re-weight training samples, such approaches commonly assume a normal distribution, a symmetrical and balanced space. In contrast to the normal assumption, we propose a novel Gumbel-based Variational Network framework (GVN) to model rating imbalance and augment feature representations by the Gumbel distributions. First, we propose a Gumbel-based variational encoder to transform features into a non-normal vector space. Second, we deploy a multi-scale convolutional fusion network to integrate comprehensive views of users and items from the rating matrix and user reviews. Third, we adopt a skip connection module to personalize final rating predictions. We conduct extensive experiments on five datasets with both error- and ranking-based metrics. Experiments on ranking and regression evaluation tasks show that the GVN can effectively achieve state-of-the-art performance across the datasets and reduce the biased predictions of tail ratings. We compare with various distributions (e.g., normal and Poisson) and demonstrate the effectiveness of Gumbel-based methods on class-imbalance modeling. The code is available at
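
To illustrate why a Gumbel distribution differs from the symmetric normal assumption mentioned in the abstract above, the following minimal sketch draws Gumbel samples by inverse-transform sampling. It is not the GVN encoder; the function name `sample_gumbel` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random

# Euler-Mascheroni constant: the mean of Gumbel(mu, beta) is mu + beta * gamma.
EULER_GAMMA = 0.5772156649015329


def sample_gumbel(mu, beta, rng):
    """Draw one sample from Gumbel(mu, beta) by inverting its CDF.

    Illustrative helper only (an assumption, not code from the paper).
    """
    u = rng.random()          # uniform in [0, 1)
    while u <= 0.0:           # avoid log(0)
        u = rng.random()
    return mu - beta * math.log(-math.log(u))
```

Unlike a normal distribution, the samples are right-skewed: the median, mu - beta * ln(ln 2), lies below the mean, mu + beta * gamma.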

RISE: A Velocity Control Framework with Minimal Impacts based on Reinforcement Learning

Velocity control in autonomous driving is an emerging technology that has achieved rapid progress over the last decade. However, existing velocity control models are developed in single-lane scenarios and ignore the negative impacts caused by harsh velocity changes. In this work, we propose a velocity control framework based on reinforcement learning, called RISE (contRol velocIty for autonomouS vEhicle). In multi-lane circumstances, RISE improves velocity decisions regarding the autonomous vehicle itself, while minimizing impacts on rear vehicles. To achieve multiple objectives, we propose a hybrid reward function to rate each velocity decision from four aspects: safety, efficiency, comfort, and negative impact to guide the autonomous vehicle. Among these reward factors, the negative impact is used to penalize the harsh actions of the autonomous vehicle, thus prompting it to reduce the negative impacts on its rear vehicles. To detect the latent perturbations among surrounding vehicles in multiple lanes, we propose an attention-based encoder to learn the positions and interactions from an impact graph. Extensive experiments evidence that RISE enables safe driving, and outperforms state-of-the-art methods in efficiency, comfort, and alleviating negative impacts.

Representation Matters When Learning From Biased Feedback in Recommendation

The logged feedback used to train recommender systems is usually subject to selection bias and therefore may not reflect real user preference. Thus, many efforts have been made to learn de-biased recommender systems from biased feedback. However, existing methods for dealing with selection bias are usually affected by errors in propensity weight estimation, have high variance, or assume access to uniform data, which is expensive to collect in practice. In this work, we address these issues by proposing Learning De-biased Representations (LDR), a framework derived from the representation learning perspective. LDR bridges the gap between propensity weight estimation (WE) and unbiased weighted learning (WL) and provides an end-to-end solution that iteratively conducts WE and WL. We show LDR can effectively alleviate selection bias with bounded variance. We also perform theoretical analysis on the statistical properties of LDR, such as its bias, variance, and generalization performance. Extensive experiments on both semi-synthetic and real-world datasets demonstrate the effectiveness of LDR.

MARINA: An MLP-Attention Model for Multivariate Time-Series Analysis

The proliferation of real-time monitoring applications such as Artificial Intelligence for IT Operations (AIOps) and the Internet of Things (IoT) has led to the generation of a vast amount of time-series data. To extract the underlying value of the data, both industry and academia are in dire need of efficient and effective methods for time-series analysis. To this end, in this paper, we propose MARINA, a multi-layer perceptron (MLP)-attention based multivariate time-series analysis model. MARINA is designed to simultaneously learn the temporal and spatial correlations among multivariate time-series. Also, the model is versatile in that it is suitable for major time-series analysis tasks such as forecasting and anomaly detection. Through extensive comparisons with the representative multivariate time-series forecasting and anomaly detection algorithms, MARINA is shown to achieve state-of-the-art (SOTA) performance in both forecasting and anomaly detection tasks.

Large-scale Entity Alignment via Knowledge Graph Merging, Partitioning and Embedding

Entity alignment is a crucial task in knowledge graph fusion. However, most entity alignment approaches have the scalability problem. Recent methods address this issue by dividing large KGs into small blocks for embedding and alignment learning in each. However, such a partitioning and learning process results in an excessive loss of structure and alignment. Therefore, in this work, we propose a scalable GNN-based entity alignment approach to reduce the structure and alignment loss from three perspectives. First, we propose a centrality-based subgraph generation algorithm to recall some landmark entities serving as the bridges between different subgraphs. Second, we introduce self-supervised entity reconstruction to recover entity representations from incomplete neighborhood subgraphs, and design cross-subgraph negative sampling to incorporate entities from other subgraphs in alignment learning. Third, during the inference process, we merge the embeddings of subgraphs to make a single space for alignment search. Experimental results on the benchmark OpenEA dataset and the proposed large DBpedia1M dataset verify the effectiveness of our approach.

AutoQGS: Auto-Prompt for Low-Resource Knowledge-based Question Generation from SPARQL

This study investigates the task of knowledge-based question generation (KBQG). Conventional KBQG works generate questions from fact triples in the knowledge graph, which cannot express complex operations like aggregation and comparison in SPARQL. Moreover, due to the costly annotation of large-scale SPARQL-question pairs, KBQG from SPARQL under low-resource scenarios urgently needs to be explored. Generative pre-trained language models (PLMs), e.g., T5 and BART, have recently proven effective for low-resource generation, but they are typically trained in a natural language (NL)-to-NL paradigm, so using them effectively to generate NL questions from non-NL SPARQL is challenging. To address these challenges, AutoQGS, an auto-prompt approach for low-resource KBQG from SPARQL, is proposed. Firstly, we propose generating questions directly from SPARQL for the KBQG task to handle complex operations. Secondly, we propose an auto-prompter trained on large-scale unsupervised data to rephrase SPARQL into NL descriptions, smoothing the low-resource transformation from non-NL SPARQL to NL questions with PLMs. Experimental results on WebQuestionsSP, ComplexWebQuestions 1.1, and PathQuestions show that our model achieves state-of-the-art performance, especially in low-resource settings. Furthermore, a corpus of 330k factoid complex question-SPARQL pairs is generated for further KBQG research.

Dually Enhanced Propensity Score Estimation in Sequential Recommendation

Sequential recommender systems train their models based on a large amount of implicit user feedback data and may be subject to biases when users are systematically under/over-exposed to certain items. Unbiased learning based on inverse propensity scores (IPS), which estimate the probability of observing a user-item pair given the historical information, has been proposed to address the issue. In these methods, propensity score estimation is usually limited to the view of item, that is, treating the feedback data as sequences of items that interacted with the users. However, the feedback data can also be treated from the view of user, as the sequences of users that interact with the items. Moreover, the two views can jointly enhance the propensity score estimation. Inspired by the observation, we propose to estimate the propensity scores from the views of user and item, called Dually Enhanced Propensity Score Estimation (DEPS). Specifically, given a target user-item pair and the corresponding item and user interaction sequences, DEPS first constructs a time-aware causal graph to represent the user-item observational probability. According to the graph, two complementary propensity scores are estimated from the views of item and user, respectively, based on the same set of user feedback data. Finally, two transformers are designed to make use of the two propensity scores and make the final preference prediction. Theoretical analysis showed the unbiasedness and variance of DEPS. Experimental results on three publicly available benchmarks and a proprietary industrial dataset demonstrated that DEPS can significantly outperform the state-of-the-art baselines.
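
The inverse propensity score (IPS) weighting that the abstract above builds on can be sketched as follows. This is a generic IPS-weighted loss, not the DEPS model; the function name `ips_loss`, its arguments, and the clipping threshold are assumptions made for illustration.

```python
def ips_loss(losses, observed, propensities, clip=1e-3):
    """IPS-weighted average loss over all user-item pairs.

    losses[i]       -- per-pair loss (only meaningful where observed)
    observed[i]     -- 1 if the pair was observed, 0 otherwise
    propensities[i] -- estimated probability of observing the pair
    clip            -- lower bound on propensities to limit variance
                       (a common practical choice, assumed here)
    """
    total = 0.0
    for loss, o, p in zip(losses, observed, propensities):
        if o:
            # Rarely observed pairs get proportionally larger weights.
            total += loss / max(p, clip)
    return total / len(losses)
```

For example, with losses `[1.0, 2.0, 3.0]`, observation flags `[1, 0, 1]`, and propensities `[0.5, 0.2, 0.25]`, the observed pairs contribute 1.0/0.5 and 3.0/0.25, giving (2 + 12) / 3.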

Taxonomy-Enhanced Graph Neural Networks

Despite the recent success of Graph Neural Networks (GNNs), their learning pipeline is guided only by the input graph and the desired output of certain tasks, failing to capture useful patterns when not enough data are presented. Existing attempts incorporate auxiliary knowledge to mitigate this issue, but most of this knowledge is not in a unified structure or is hard to obtain. Noticing that nodes in graphs usually form implicit hierarchical structures, we propose to integrate category taxonomies into the learning process of GNNs. A category taxonomy is a form of domain knowledge with a hierarchical tree structure, which is widely adopted in real-world scenarios. In this paper, we introduce Taxonomy-Enhanced Graph Neural Networks (Taxo-GNN). Specifically, we jointly optimize the taxonomy representation and node representation tasks, where categories in the taxonomy are mapped to Gaussian distributions and nodes are embedded with the GNN framework. To characterize the bidirectional interaction between the taxonomy and the graph, the model comprises two modules, namely information distillation for taxonomy and knowledge fusion to graph. Information is first distilled from the graph and aligned with the hierarchical structure of the taxonomy in a bottom-to-top mechanism. After that, knowledge brought by the taxonomy is in turn fused into the graph convolution process, in the form of taxonomy-aware aggregation weights and taxonomy-augmented contexts. Extensive experiments on real-world datasets in multiple downstream tasks verify the effectiveness of our model.

Traffic Speed Imputation with Spatio-Temporal Attentions and Cycle-Perceptual Training

Missing data is common in the traffic domain, yet existing solutions for data imputation are insufficient due to the challenges of data sparsity, complex traffic situations, and the lack of complete ground truths. In this paper, we propose a novel solution called STCPA for the speed imputation problem. STCPA captures complex traffic correlations among the spatial and temporal dimensions via the attention mechanism, which helps mitigate the data sparsity issue. In addition, STCPA adopts an imputation cycle consistency constraint to provide reliable supervision on unobserved entries, which improves the training. Furthermore, it incorporates an extra Road-aware Perceptual Loss, which encourages the preservation of more meaningful semantics for imputation. Extensive experiments are conducted on two real-world datasets, namely, Chengdu and New York, to demonstrate the effectiveness of STCPA, e.g., it outperforms the best baseline by 7.64% and 5.00% on the Chengdu and New York datasets, respectively. The code is available at

Match-Prompt: Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning

Text matching is a fundamental technique in both information retrieval and natural language processing. Text matching tasks share the same paradigm: determining the relationship between two given texts. The relationships vary from task to task, e.g., relevance in document retrieval, semantic alignment in paraphrase identification, and answerability judgment in question answering. However, the essential signals for text matching remain in a finite scope, i.e., exact matching, semantic matching, and inference matching. Ideally, a good text matching model can learn to capture and aggregate these signals for different matching tasks to achieve competitive performance, yet recent state-of-the-art text matching models, e.g., Pre-trained Language Models (PLMs), are hard to generalize. This is because end-to-end supervised learning on a task-specific dataset makes the model overemphasize data sample bias and task-specific signals instead of the essential matching signals, which ruins the model's generalization to different tasks. To overcome this problem, we adopt a specialization-generalization training strategy and refer to it as Match-Prompt. In the specialization stage, descriptions of different matching tasks are mapped to only a few prompt tokens. In the generalization stage, the text matching model explores the essential matching signals by being trained on diverse matching tasks. Highly diverse matching tasks keep the model from fitting the data sample bias of a specific task, so that the model can focus on learning the essential matching signals. Meanwhile, the prompt tokens obtained in the first stage are added to the corresponding tasks to help the model distinguish different task-specific matching signals, as well as to form the basis prompt tokens for a new matching task. In this paper, we consider five common text matching tasks: document retrieval, open-domain question answering, retrieval-based dialogue, paraphrase identification, and natural language inference. Experimental results on eighteen public datasets show that Match-Prompt can improve the multi-task generalization capability of PLMs in text matching and yield better in-domain multi-task, out-of-domain multi-task, and new task adaptation performance than multi-task and task-specific models trained by the previous fine-tuning paradigm.

Dynamic Causal Collaborative Filtering

The causal graph, an effective and powerful tool for causal modeling, is usually assumed to be a Directed Acyclic Graph (DAG). However, recommender systems usually involve feedback loops, defined as the cyclic process of recommending items, incorporating user feedback in model updates, and repeating the procedure. As a result, it is important to incorporate loops into the causal graphs to accurately model the dynamic and iterative data generation process for recommender systems. However, feedback loops are not always beneficial, since over time they may encourage increasingly narrow content exposure, which, if left unattended, may result in echo chambers. It is therefore important to understand when recommendations will lead to echo chambers and how to mitigate echo chambers without hurting the recommendation performance.

In this paper, we design a causal graph with loops to describe the dynamic process of recommendation. We then use a Markov process to analyze the mathematical properties of echo chambers, such as the conditions that lead to them. Inspired by the theoretical analysis, we propose a Dynamic Causal Collaborative Filtering ($\partial$CCF) model, which estimates users' post-intervention preference on items based on back-door adjustment and mitigates echo chambers with counterfactual reasoning. Multiple experiments are conducted on real-world datasets, and the results show that our framework can mitigate echo chambers better than other state-of-the-art frameworks while achieving recommendation performance comparable to the base recommendation models.

Evidence-aware Document-level Relation Extraction

Document-level Relation Extraction (RE) is a promising task aiming at identifying relations of multiple entity pairs in a document. However, in most cases, a relational fact can be sufficiently expressed by a small subset of sentences from the document, namely the evidence sentences. Moreover, there often exist strong semantic correlations between evidence sentences that work together to describe a specific relation. To address these challenges, we propose a novel evidence-aware model for document-level RE. In particular, we formulate evidence sentence selection as a sequential decision problem through a crafted reinforcement learning mechanism. Considering the explosive search space of our agent, an efficient path searching strategy is executed on the converted document graph to heuristically obtain promising sentences and feed them to reinforcement learning. Finally, each entity pair has its own customized, filtered document for further inferring the relation between the entities. We conduct various experiments on two document-level RE benchmarks and achieve a remarkable improvement over previous competitive baselines, verifying the effectiveness of our method.

Effects of Stubbornness on Opinion Dynamics

As an important factor governing opinion dynamics, stubbornness strongly affects various aspects of opinion formation. However, a systematic theoretical study of the influences of heterogeneous stubbornness on opinion dynamics is still lacking. In this paper, we study a popular opinion model in the presence of inhomogeneous stubbornness. We show analytically that heterogeneous stubbornness has a great impact on convergence time, the expressed opinion of every node, and the overall expressed opinion. We provide an explanation of the expressed opinion in terms of stubbornness-dependent spanning diverging forests. We propose quantitative indicators for several social concepts, including conflict, disagreement, and polarization, by incorporating heterogeneous stubbornness, and develop a nearly linear time algorithm to approximate these quantities with a proven theoretical guarantee for the error of each quantity. To demonstrate the performance of our algorithm, we perform extensive experiments on a large set of real networks, which indicate that our algorithm is both efficient and effective, and scalable to large networks with millions of nodes.

Drive Less but Finish More: Food Delivery based on Multi-Level Workers in Spatial Crowdsourcing

In this paper, we study the problem of on-demand food delivery in a new setting where two groups of workers, riders and taxi drivers (drivers for short), cooperate with each other for better service. The riders are responsible for the first and the last mile, and the drivers are in charge of the cross-community transportation. We show this problem is generally NP-hard by a reduction from the well-known 3-dimensional matching (3DM). To tackle this problem, we first reduce it to the maximum independent set problem and use a simple greedy strategy to design an approximation algorithm that runs in polynomial time. Considering that the exponents in the polynomial are not very small, we then transform the 3DM into two rounds of 2-dimensional matching and propose a fast algorithm to solve it. Though the 3DM problem is NP-hard, we find that the cooperation between riders and drivers forms a special tripartite graph, based on which we construct a flow network and employ the min-cost max-flow algorithm to efficiently compute the exact solution. We conduct extensive experiments to show the efficiency and the effectiveness of our proposed algorithms.

Dissecting Cross-Layer Dependency Inference on Multi-Layered Inter-Dependent Networks

Multi-layered inter-dependent networks have emerged in a wealth of high-impact application domains. Cross-layer dependency inference, which aims to predict the dependencies between nodes across different layers, plays a pivotal role in such multi-layered network systems. Most, if not all, of existing methods exclusively follow a coupling principle of design and can be categorized into the following two groups, including (1) heterogeneous network embedding based methods (data coupling), and (2) collaborative filtering based methods (module coupling). Despite the favorable achievement, methods of both types are faced with two intricate challenges, including (1) the sparsity challenge where very limited observations of cross-layer dependencies are available, resulting in a deteriorated prediction of missing dependencies, and (2) the dynamic challenge given that the multi-layered network system is constantly evolving over time.

In this paper, we first demonstrate that the inability of existing methods to resolve the sparsity challenge is rooted in the coupling principle, from the perspectives of both data coupling and module coupling. Armed with this theoretical analysis, we pursue a new principle where the key idea is to decouple the within-layer connectivity from the observed cross-layer dependencies. Specifically, to tackle the sparsity challenge for static networks, we propose FITO-S, which incorporates a position embedding matrix generated by random walk with restart and an embedding space transformation function. More essentially, the decoupling principle ameliorates the dynamic challenge, which naturally leads to FITO-D, capable of tracking the inference results in the dynamic setting by incrementally updating the position embedding matrix and fine-tuning the space transformation function. Extensive evaluations on real-world datasets demonstrate the superiority of the proposed framework FITO for cross-layer dependency inference.
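
Random walk with restart, which the abstract above uses to generate position embeddings, can be sketched with plain power iteration as follows. This is a textbook version under stated assumptions, not the FITO implementation; the function name, the dense adjacency-list input, and the handling of dangling nodes (their mass returns to the seed) are choices made for illustration.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100):
    """Approximate RWR proximity scores from `seed` to every node.

    adj     -- dense adjacency matrix as a list of lists (weights >= 0)
    seed    -- index of the node the walker restarts at
    restart -- probability of jumping back to the seed at each step
    """
    n = len(adj)
    out_weight = [sum(row) for row in adj]
    scores = [0.0] * n
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            walk_mass = (1.0 - restart) * scores[i]
            if out_weight[i] == 0:
                # Dangling node: send its mass back to the seed (assumption).
                nxt[seed] += walk_mass
                continue
            for j in range(n):
                if adj[i][j]:
                    nxt[j] += walk_mass * adj[i][j] / out_weight[i]
        nxt[seed] += restart  # total restart mass, since scores sum to 1
        scores = nxt
    return scores
```

Stacking the score vectors obtained from several seed nodes gives one row per seed, which is one simple way such scores could be arranged into a position matrix.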

Semi-supervised Hypergraph Node Classification on Hypergraph Line Expansion

Previous hypergraph expansions are carried out solely on either the vertex level or the hyperedge level, thereby missing the symmetric nature of data co-occurrence and resulting in information loss. To address the problem, this paper treats vertices and hyperedges equally and proposes a new hypergraph expansion named the line expansion (LE) for hypergraph learning. The new expansion bijectively induces a homogeneous structure from the hypergraph by modeling vertex-hyperedge pairs. Our proposal essentially reduces the hypergraph to a simple graph, which enables existing graph learning algorithms to work seamlessly with the higher-order structure. We further prove that our line expansion is a unifying framework over various hypergraph expansions. We evaluate the proposed LE on five hypergraph datasets in terms of the hypergraph node classification task. The results show that our method consistently achieves at least a 2% accuracy improvement over the best baseline.
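
A minimal sketch of an expansion built from vertex-hyperedge pairs, as described in the abstract above, is given below. The adjacency rule used here (two pairs are linked when they share a vertex or a hyperedge) is an assumption for illustration, as are the function name and the input format; the abstract itself does not spell out these details.

```python
from itertools import combinations


def line_expansion(hyperedges):
    """Turn a hypergraph into a simple graph over (vertex, hyperedge) pairs.

    hyperedges -- list of sets of vertex ids; hyperedge ids are list indices
    Returns (nodes, edges), where each node is a (vertex, hyperedge) pair.
    """
    nodes = [(v, e) for e, members in enumerate(hyperedges)
             for v in sorted(members)]
    edges = set()
    for a, b in combinations(nodes, 2):
        # Assumed rule: connect pairs sharing a vertex or a hyperedge.
        if a[0] == b[0] or a[1] == b[1]:
            edges.add((a, b))
    return nodes, edges
```

For the hypergraph with hyperedges {0, 1, 2} and {2, 3}, this yields five nodes, one per incidence, and vertex 2 links the two hyperedges through the edge between (2, 0) and (2, 1).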

Hierarchical Representation for Multi-view Clustering: From Intra-sample to Intra-view to Inter-view

Multi-view clustering (MVC) aims at exploiting the consistent features within different views to divide samples into different clusters. Existing subspace-based MVC algorithms usually assume linear subspace structures and two-stage similarity matrix construction strategies, thereby posing challenges in imprecise low-dimensional subspace representation and inadequacy of exploring consistency. This paper presents a novel hierarchical representation for MVC method via the integration of intra-sample, intra-view, and inter-view representation learning models. In particular, we first adopt the deep autoencoder to adaptively map the original high-dimensional data into the latent low-dimensional representation of each sample. Second, we use the self-expression of the latent representation to explore the global similarity between samples of each view and obtain the subspace representation coefficients. Third, we construct the third-order tensor by arranging multiple subspace representation matrices and impose the tensor low-rank constraint to sufficiently explore the consistency among views. Being incorporated into a unified framework, these three models boost each other to achieve a satisfactory clustering result. Moreover, an alternating direction method of multipliers algorithm is developed to solve the challenging optimization problem. Extensive experiments on both simulated and real-world multi-view datasets show the superiority of the proposed method over eight state-of-the-art baselines.

GROWN+UP: A "Graph Representation Of a Webpage" Network Utilizing Pre-training

Large pre-trained neural networks are ubiquitous and critical to the success of many downstream tasks in natural language processing and computer vision. However, within the field of web information retrieval, there is a stark contrast in the lack of similarly flexible and powerful pre-trained models that can properly parse webpages. Consequently, we believe that common machine learning tasks like content extraction and information mining from webpages have low-hanging gains that yet remain untapped.

We aim to close the gap by introducing an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train self-supervised on massive unlabeled data, and fine-tune to arbitrary tasks on webpages effectually.

Finally, we show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage boilerplate removal and genre classification, thus lending support to its potential application in diverse downstream tasks.

Scalable Graph Sampling on GPUs with Compressed Graph

The GPU is a powerful accelerator for parallel computation. Graph sampling is a fundamental technology for large-scale graph analysis and learning. To accelerate graph sampling using GPUs, solutions such as NextDoor and C-SAW have recently been proposed. However, these solutions cannot handle large graphs efficiently because of the massive memory footprint and the expensive transfer cost between CPU and GPU. In this work, we introduce a Chunk-wise Graph Compression format (CGC) to effectively reduce the graph size and the graph transfer cost. Meanwhile, CGC supports fast access to any single neighbor of a vertex and is friendly to the graph sampling task. Specifically, CGC first balances the graph compression ratio and decompression efficiency by dividing a neighbor vertex list into chunks. Then it applies a new compression strategy called linear estimation to compress each chunk and allows users to visit a single vertex in O(1) time. Finally, based on CGC, we develop a scalable GPU-based graph sampling framework, GraSS, and evaluate the efficiency and scalability of GraSS on both real-world and synthetic graphs. The empirical results demonstrate that GraSS can support various graph sampling methods on large graphs with high efficiency where the state-of-the-art solutions run out of memory or exceed the time limit.

A Biased Sampling Method for Imbalanced Personalized Ranking

Pairwise ranking models have been widely used to address recommendation problems. The basic idea is to learn the rank of users' preferred items by separating items into positive samples if user-item interactions exist, and negative samples otherwise. Due to the limited number of observable interactions, pairwise ranking models face serious class-imbalance issues. Our theoretical analysis shows that current sampling-based methods cause a vertex-level imbalance problem, which makes the norm of learned item embeddings tend toward infinity after a certain number of training iterations, and consequently results in vanishing gradients and affects the model inference results. We thus propose an efficient Vital Negative Sampler (VINS) to alleviate the class-imbalance issue for pairwise ranking models, in particular for deep learning models optimized by gradient methods. The core of VINS is a biased sampler with a reject probability that tends to accept a negative candidate with a larger degree weight than the given positive item. Evaluation results on several real datasets demonstrate that the proposed sampling method speeds up the training procedure by 30% to 50% for ranking models ranging from shallow to deep, while maintaining and even improving the quality of ranking results in top-N item recommendations.
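
The idea of a degree-biased negative sampler with a reject probability, described in the abstract above, can be sketched as follows. This is not the VINS algorithm: the acceptance rule min(1, degree of candidate / degree of positive item), the function name, and the arguments are all hypothetical, chosen only to show how rejection can favor high-degree negatives.

```python
import random


def sample_negative(pos_item, user_items, degree, rng, max_tries=100):
    """Rejection-sample a negative item, favoring high-degree candidates.

    pos_item   -- the positive item in the current training pair
    user_items -- set of items the user has interacted with
    degree     -- dict mapping item -> positive interaction count (> 0)
    rng        -- a random.Random instance
    """
    candidates = [i for i in degree if i not in user_items]
    cand = None
    for _ in range(max_tries):
        cand = rng.choice(candidates)
        # Hypothetical rule: always accept candidates at least as popular
        # as the positive item; accept others in proportion to their degree.
        accept_prob = min(1.0, degree[cand] / degree[pos_item])
        if rng.random() < accept_prob:
            return cand
    return cand  # fall back to the last proposal if nothing was accepted
```

With item degrees `{"a": 10, "b": 1, "c": 20}` and positive item `"a"`, candidate `"c"` is always accepted when proposed, while `"b"` is accepted only one time in ten.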

The Interaction Graph Auto-encoder Network Based on Topology-aware for Transferable Recommendation

Deep learning-based recommendation systems have made significant strides in recent years. However, the problem of recommendation systems' generalizability has not been solved. After the training phase, most current models can only solve problems on a particular dataset and are not as generalizable as NLP and CV models. Therefore, a large amount of computing power is required to make conventional recommendation models available to different trades. In real-world scenarios, offline retailers often opt out of recommendation algorithms due to a lack of computing capacity, which puts them at a competitive disadvantage. To address the transferable recommendation problem, we propose a topology-aware Interaction Graph Auto-encoder Network (IGA). IGA is composed primarily of the following components: Interaction Feature Subgraph Extraction, Subgraph Node Labeling, Subgraph Interaction Auto-encoder, and Interaction Preference Attention Network. IGA can transfer knowledge from the training dataset to a new dataset without fine-tuning and give users reliable, personalized recommendation results. Experiments on the MovieLens, Douban, LastFM, and Book-Crossing datasets demonstrate that IGA outperforms state-of-the-art approaches in transferable scenarios. Additionally, IGA requires less computing power and is highly adaptable across datasets.

Cognize Yourself: Graph Pre-Training via Core Graph Cognizing and Differentiating

While Graph Neural Networks (GNNs) have become the de facto standard in graph representation learning, they still suffer from label scarcity and poor generalization. To alleviate these issues, graph pre-training has been proposed to learn universal patterns from unlabeled data by applying self-supervised tasks. Most existing graph pre-training methods use only a single self-supervised task, which leads to insufficient knowledge mining. Some recent works try to use multiple self-supervised tasks; however, we argue that these methods still suffer from a serious problem, which we call graph structure impairment. That is, structural gaps exist among the tasks due to the divergence of their optimization objectives, which means customized graph structures should be provided for different self-supervised tasks. Graph structure impairment not only significantly hurts the generalizability of pre-trained GNNs but also leads to suboptimal solutions, and no study so far has addressed it well. Motivated by Meta-Cognitive theory, we propose a novel model named Core Graph Cognizing and Differentiating (CORE) to deal with the problem effectively. Specifically, CORE consists of a cognizing network and a differentiating process: the former cognizes a core graph that stands for the essential structure of the graph, and the latter allows it to differentiate into several task-specific graphs for different tasks. Besides, this is also the first study to combine graph pre-training with cognitive theory to build a cognition-aware model. Several experiments have been conducted to demonstrate the effectiveness of CORE.

Contrastive Domain Adaptation for Early Misinformation Detection: A Case Study on COVID-19

Despite recent progress in improving the performance of misinformation detection systems, classifying misinformation in an unseen domain remains an elusive challenge. To address this issue, a common approach is to introduce a domain critic and encourage domain-invariant input features. However, early misinformation often demonstrates both conditional and label shifts against existing misinformation data (e.g., class imbalance in COVID-19 datasets), rendering such methods less effective for detecting early misinformation. In this paper, we propose a contrastive adaptation network for early misinformation detection (CANMD). Specifically, we leverage pseudo labeling to generate high-confidence target examples for joint training with source data. We additionally design a label correction component to estimate and correct the label shifts (i.e., class priors) between the source and target domains. Moreover, a contrastive adaptation loss is integrated into the objective function to reduce the intra-class discrepancy and enlarge the inter-class discrepancy. As such, the adapted model learns corrected class priors and an invariant conditional distribution across both domains for improved estimation of the target data distribution. To demonstrate the effectiveness of the proposed CANMD, we study the case of COVID-19 early misinformation detection and perform extensive experiments using multiple real-world datasets. The results suggest that CANMD can effectively adapt misinformation detection systems to the unseen COVID-19 target domain with significant improvements compared to the state-of-the-art baselines.

LTE4G: Long-Tail Experts for Graph Neural Networks

Existing Graph Neural Networks (GNNs) usually assume a balanced situation where both the class distribution and the node degree distribution are balanced. However, in real-world situations, we often encounter cases where a few classes (i.e., head classes) dominate the other classes (i.e., tail classes) and where the node degree distribution is similarly skewed, and thus naively applying existing GNNs eventually falls short of generalizing to the tail cases. Although recent studies have proposed methods to handle long-tail situations on graphs, they focus only on either the class long-tailedness or the degree long-tailedness. In this paper, we propose a novel framework for training GNNs, called Long-Tail Experts for Graphs (LTE4G), which jointly considers the class long-tailedness and the degree long-tailedness for node classification. The core idea is to assign an expert GNN model to each subset of nodes that are split in a balanced manner considering both the class and degree long-tailedness. After having trained an expert for each balanced subset, we adopt knowledge distillation to obtain two class-wise students, i.e., Head class student and Tail class student, which are responsible for classifying nodes in the head classes and tail classes, respectively. We demonstrate that LTE4G outperforms a wide range of state-of-the-art methods in node classification evaluated on both manual and natural imbalanced graphs. The source code of LTE4G can be found at

Joint Clothes Detection and Attribution Prediction via Anchor-free Framework with Decoupled Representation Transformer

Clothes attribution prediction is the key technology for automatically describing clothing characteristics to users. Most current methods first detect the multiple clothes, then crop out the clothes and feed them to a separate network for clothes attribution prediction. This two-stage approach is time- and resource-consuming. In contrast, a one-stage approach can provide an effective and efficient solution by integrating clothes detection and attribution prediction into an end-to-end framework. However, one-stage approaches tend to rely on anchor-based detectors, causing high sensitivity to the hyperparameters and high computational complexity from dense anchors. In addition, they may also face an optimization contradiction problem in the training procedure, as the clothes detection and attribution prediction branches demand different optimization. In this work, to handle the above problems, we aim to develop an end-to-end anchor-free framework with an additional branch for joint clothes detection and attribution prediction. To handle the optimization contradiction between the two branches, we encode the backbone feature map as pixel-level dense queries and decode them via a deformable transformer into output features that are fed into the detection and prediction branches, respectively. In this way, the features of the detection and prediction branches can be decoupled and the optimization contradiction can be naturally solved. To further enhance the prediction accuracy, we also develop a special attention strategy and loss function in the prediction branch to adaptively integrate the peer attribution relationships into feature learning as well as to avoid mutual suppression for hierarchical attributions. Extensive simulation results verify the effectiveness of the proposed work.

Causal Learning Empowered OD Prediction for Urban Planning

Predicting future origin-destination (OD) flow is essential for urban planning since it provides feedback for planning adjustment and reference for road planning. However, OD prediction for urban planning scenarios is unique as it typically lacks training data. A common practice is to refer to data from other cities, which causes the out-of-distribution (OOD) problem. A promising solution is to leverage causal information in the data. However, there are two challenges in utilizing causal information in urban planning scenarios: (a) An urban system has numerous factors, and only some of them carry causal information. (b) The planned city development correlates with the original city characteristics, thereby introducing confounding bias into the causal modelling process. In this paper, we propose designs to solve both challenges. Specifically, we first design a causal disentangled representation module to identify causal factors in attributes. Second, we adopt a variational sample re-weighting module to reduce the confounding bias. Our proposed model outperforms seven state-of-the-art baselines on three real-world datasets, achieving an average improvement of 9.59% in the MAE metric. Further in-depth analysis shows our method's robustness across different urban planning scenarios and outstanding performance in predicting extremely large OD flows, which corroborates the contribution of our designs to the urban planning field.

Interactive Contrastive Learning for Self-Supervised Entity Alignment

Self-supervised entity alignment (EA) aims to link equivalent entities across different knowledge graphs (KGs) without the use of pre-aligned entity pairs. The current state-of-the-art (SOTA) self-supervised EA approach draws inspiration from contrastive learning, originally designed in computer vision based on instance discrimination and contrastive loss, and suffers from two shortcomings. Firstly, it puts unidirectional emphasis on pushing sampled negative entities far away rather than pulling positively aligned pairs close, as is done in the well-established supervised EA. Secondly, it advocates the minimum information requirement for self-supervised EA, while we argue that self-described KG's side information (e.g., entity name, relation name, entity description) shall preferably be explored to the maximum extent for the self-supervised EA task. In this work, we propose an interactive contrastive learning model for self-supervised EA. It conducts bidirectional contrastive learning via building pseudo-aligned entity pairs as pivots to achieve direct cross-KG information interaction. It further exploits the integration of entity textual and structural information and elaborately designs encoders for better utilization in the self-supervised setting. Experimental results show that our approach outperforms the previous best self-supervised method by a large margin (over 9% Hits@1 absolute improvement on average) and performs on par with previous SOTA supervised counterparts, demonstrating the effectiveness of the interactive contrastive learning for self-supervised EA. The code and data are available at

Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning

Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class. While numerous over-sampling algorithms have been proposed, they heavily rely on heuristics, which could be sub-optimal since we may need different sampling strategies for different datasets and base classifiers, and they cannot directly optimize the performance metric. Motivated by this, we investigate developing a learning-based over-sampling algorithm to optimize the classification performance, which is a challenging task because of the huge and hierarchical decision space. At the high level, we need to decide how many synthetic samples to generate. At the low level, we need to determine where the synthetic samples should be located, which depends on the high-level decision since the optimal locations of the samples may differ for different numbers of samples. To address the challenges, we propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions. Motivated by the success of SMOTE and its extensions, we formulate the generation process as a Markov decision process (MDP) consisting of three levels of policies to generate synthetic samples within the SMOTE search space. Then we leverage deep hierarchical reinforcement learning to optimize the performance metric on the validation data. Extensive experiments on six real-world datasets demonstrate that AutoSMOTE significantly outperforms the state-of-the-art resampling algorithms. The code is at
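
The two decision levels described above can be made concrete with plain SMOTE-style interpolation. The following is a minimal sketch, not AutoSMOTE itself: the function names `smote_sample` and `oversample`, the parameter `k`, and the uniform random choices are illustrative assumptions, whereas the abstract's method learns these decisions with hierarchical reinforcement learning.

```python
import math
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point between a minority sample and a near neighbour."""
    base = rng.choice(minority)
    # Rank the other minority points by Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbour = rng.choice(others[:k])
    # Low-level decision: where on the segment base -> neighbour to place the sample.
    gap = rng.random()
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


def oversample(minority, n_new, k=3, seed=0):
    """High-level decision: n_new fixes how many synthetic samples are generated."""
    rng = random.Random(seed)
    return [smote_sample(minority, k, rng) for _ in range(n_new)]
```

Here `n_new` plays the role of the high-level choice (how many samples) and the neighbour and interpolation position play the role of the low-level choice (where they lie); in this sketch both are set by hand or at random rather than optimized.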

Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models

A retrieval model should not only interpolate the training data but also extrapolate well to the queries that are different from the training data. While neural retrieval models have demonstrated impressive performance on ad-hoc search benchmarks, we still know little about how they perform in terms of interpolation and extrapolation. In this paper, we demonstrate the importance of separately evaluating the two capabilities of neural retrieval models. Firstly, we examine existing ad-hoc search benchmarks from the two perspectives. We investigate the distribution of training and test data and find a considerable overlap in query entities, query intent, and relevance labels. This finding implies that the evaluation on these test sets is biased toward interpolation and cannot accurately reflect the extrapolation capacity. Secondly, we propose a novel evaluation protocol to separately evaluate the interpolation and extrapolation performance on existing benchmark datasets. It resamples the training and test data based on query similarity and utilizes the resampled dataset for training and evaluation. Finally, we leverage the proposed evaluation protocol to comprehensively revisit a number of widely-adopted neural retrieval models. Results show models perform differently when moving from interpolation to extrapolation. For example, representation-based retrieval models perform almost as well as interaction-based retrieval models in terms of interpolation but not extrapolation. Therefore, it is necessary to separately evaluate both interpolation and extrapolation performance and the proposed resampling method serves as a simple yet effective evaluation tool for future IR studies.
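
As a rough illustration of resampling by query similarity, the sketch below partitions test queries according to how close they are to any training query. The token-level Jaccard similarity, the `threshold` value, and the function names are assumptions made for this example; the abstract does not specify the similarity measure used in the protocol.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two query strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def split_test_queries(train_queries, test_queries, threshold=0.5):
    """Split test queries into interpolation (similar to training) and extrapolation sets."""
    interpolation, extrapolation = [], []
    for query in test_queries:
        # Similarity to the closest training query decides the bucket.
        nearest = max((jaccard(query, t) for t in train_queries), default=0.0)
        if nearest >= threshold:
            interpolation.append(query)
        else:
            extrapolation.append(query)
    return interpolation, extrapolation
```

Evaluating a model separately on the two returned lists gives one interpolation score and one extrapolation score instead of a single blended number.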

TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis

Time series anomaly detection is a challenging problem due to complex temporal dependencies and limited labeled data. Although some algorithms, including both traditional and deep models, have been proposed, most of them focus mainly on time-domain modeling and do not fully utilize the information in the frequency domain of the time series data. In this paper, we propose a Time-Frequency analysis based time series Anomaly Detection model, or TFAD for short, to exploit both time and frequency domains for performance improvement. In addition, we incorporate time series decomposition and data augmentation mechanisms into the designed time-frequency architecture to further improve performance and interpretability. Empirical studies on widely used benchmark datasets show that our approach obtains state-of-the-art performance in univariate and multivariate time series anomaly detection tasks.
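
To show what combining the two domains can look like for a single window, here is a small sketch that computes time-domain statistics alongside a discrete Fourier transform magnitude spectrum. It is not the TFAD architecture; the feature set and the function names are assumptions chosen for illustration.

```python
import cmath
import math


def dft_magnitudes(window):
    """Normalized magnitude of each non-negative frequency bin of a real-valued window."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))) / n
        for k in range(n // 2 + 1)
    ]


def time_frequency_features(window):
    """Combine time-domain statistics with the dominant frequency component."""
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    mags = dft_magnitudes(window)
    # Skip bin 0 (the mean) when looking for the dominant oscillation.
    dominant = max(range(1, len(mags)), key=mags.__getitem__)
    return {
        "mean": mean,
        "variance": variance,
        "dominant_bin": dominant,
        "dominant_magnitude": mags[dominant],
    }
```

A window whose dominant bin or magnitude departs from that of its neighbours can look unremarkable in mean and variance alone, which is the kind of signal a purely time-domain model may miss.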

Hierarchical Item Inconsistency Signal Learning for Sequence Denoising in Sequential Recommendation

Sequential recommender systems aim to recommend the next items in which target users are most interested based on their historical interaction sequences. In practice, historical sequences typically contain some inherent noise (e.g., accidental interactions), which hinders the learning of accurate sequence representations and thus misleads the next-item recommendation. However, the absence of supervised signals (i.e., labels indicating noisy items) makes the problem of sequence denoising rather challenging. To this end, we propose a novel sequence denoising paradigm for sequential recommendation by learning hierarchical item inconsistency signals. More specifically, we design a hierarchical sequence denoising (HSD) model, which first learns two levels of inconsistency signals in input sequences, and then generates noiseless subsequences (i.e., dropping inherent noisy items) for subsequent sequential recommenders. It is noteworthy that HSD is flexible enough to accommodate supervised item signals, if any, and can be seamlessly integrated with most existing sequential recommendation models to boost their performance. Extensive experiments on five public benchmark datasets demonstrate the superiority of HSD over state-of-the-art denoising methods and its applicability to a wide variety of mainstream sequential recommendation models. The implementation code is available at

Look Twice as Much as You Say: Scene Graph Contrastive Learning for Self-Supervised Image Caption Generation

Images are commonly used for various information and knowledge applications, such as advertising and recommendation. Automating image caption generation will significantly improve image accessibility. However, this cross-modal task, which takes an image as input and produces text as output, is difficult to learn. Though prior methods achieve good performance for image caption generation, they rely on either supervised learning, which requires sufficient labeled data, or unsupervised learning, which needs an external dataset as a language pivot. In this paper, we propose SGCL, a novel Scene Graph Contrastive Learning model for self-supervised image caption generation. SGCL adopts the pre-training and fine-tuning pipeline. Specifically, we first apply scene graph generation and object detection methods to encode the scene graph and visual information in the image as a feature representation. Then, a decoder network based on a graph attention network and a recurrent neural network is designed to generate sequential text as the caption. To enable contrastive learning in SGCL, we design scene graph augmentations as contrastive views of images and train the model effectively without ground-truth labels through contrastive learning. Additionally, we introduce a pre-trained word embedding and a context projector to enrich the text representation in the decoder network, which benefits model pre-training. Once the pre-training phase is finished, we further fine-tune the model for the image caption generation task with limited labeled data. Extensive experiments on a benchmark dataset demonstrate that SGCL outperforms state-of-the-art models (both supervised and unsupervised).

Along the Time: Timeline-traced Embedding for Temporal Knowledge Graph Completion

Recent years have witnessed remarkable progress on knowledge graph embedding (KGE) methods to learn the representations of entities and relations in static knowledge graphs (SKGs). However, knowledge changes over time. In order to represent the facts happening at a specific time, temporal knowledge graph (TKG) embedding approaches have been put forward. However, most existing models ignore the independence of semantic and temporal information. We empirically find that current models have difficulty distinguishing representations of the same entity or relation at different timestamps. In this regard, we propose a TimeLine-Traced Knowledge Graph Embedding method (TLT-KGE) for temporal knowledge graph completion. TLT-KGE aims to embed the entities and relations with timestamps as a complex vector or a quaternion vector. Specifically, TLT-KGE models semantic information and temporal information as different axes of complex number space or quaternion space. Meanwhile, two specific components that capture the relationship between semantic and temporal information are devised to support the modeling. In this way, the proposed method can not only distinguish the independence of the semantic and temporal information, but also establish a connection between them. Experimental results on the link prediction task demonstrate that TLT-KGE achieves substantial improvements over state-of-the-art competitors. The source code will be available on
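
The idea of placing semantic and temporal information on different axes of complex space can be illustrated with Python's built-in complex numbers. This is only a sketch under stated assumptions: putting the semantic part on the real axis and the temporal part on the imaginary axis, and scoring with a ComplEx-style product, are choices made for this example and are not taken from the abstract.

```python
def timed_embedding(semantic, temporal):
    """Combine a semantic vector (real axis) and a temporal vector (imaginary axis)."""
    return [complex(s, t) for s, t in zip(semantic, temporal)]


def complex_score(head, relation, tail):
    """ComplEx-style score: real part of sum(h * r * conj(t)); assumed for illustration."""
    return sum(h * r * t.conjugate() for h, r, t in zip(head, relation, tail)).real


# The same entity at two timestamps shares its semantic axis but differs on the temporal axis.
entity_t1 = timed_embedding([1.0, 0.0], [0.5, 0.0])
entity_t2 = timed_embedding([1.0, 0.0], [-0.5, 0.0])
```

Because the two embeddings differ only in their imaginary parts, a model can tell the timestamps apart without altering the shared semantic component.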

Control-based Bidding for Mobile Livestreaming Ads with Exposure Guarantee

Mobile livestreaming ads are becoming a popular approach for brand promotion and product marketing. However, a large number of advertisers fail to achieve their desired advertising performance due to the lack of an ad exposure guarantee in the dynamic advertising environment. In this work, we propose a bidding-based ad delivery algorithm for mobile livestreaming ads that can provide advertisers with bidding strategies for optimizing diverse marketing objectives under general ad performance guarantee constraints, such as ad exposure and cost-efficiency constraints. By modeling the problem as an online integer programming problem and applying primal-dual theory, we can derive the bidding strategy by solving for the optimal dual variables. The initialization of the dual variables is realized through a deep neural network that captures the complex relation between dual variables and dynamic advertising environments. We further propose a control-based bidding algorithm to adjust the dual variables in an online manner based on the real-time advertising performance feedback and constraints. Experiments on a real-world industrial dataset demonstrate the effectiveness of our bidding algorithm in terms of optimizing marketing objectives and guaranteeing ad constraints.
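
The online adjustment of a dual variable from performance feedback can be sketched as a proportional feedback step. Everything specific here is an assumption for illustration: the additive bid formula, the `gain` parameter, and the use of a single exposure constraint are not taken from the abstract, which does not give the controller's form.

```python
def update_dual(lam, target_exposure, actual_exposure, gain=0.1):
    """Proportional feedback: raise the dual variable when exposure lags its target."""
    error = (target_exposure - actual_exposure) / max(target_exposure, 1e-9)
    # Dual variables of inequality constraints stay non-negative.
    return max(0.0, lam + gain * error)


def bid(predicted_value, lam):
    """Hypothetical Lagrangian-style bid: value plus a bonus for the exposure constraint."""
    return predicted_value + lam
```

For example, with a target of 100 exposures and only 80 delivered, `update_dual(0.5, 100, 80)` nudges the dual variable upward, so subsequent bids rise and exposure catches up; when delivery overshoots, the same rule lowers it.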

Disentangling Past-Future Modeling in Sequential Recommendation via Dual Networks

Sequential recommendation (SR) plays an important role in personalized recommender systems because it captures dynamic and diverse preferences from users' continuously growing behaviors. Unlike the standard autoregressive training strategy, future data (also available during training) has been used to facilitate model training as it provides richer signals about users' current interests and can be used to improve the recommendation quality. However, existing methods suffer from a severe training-inference gap, i.e., both past and future contexts are modeled by the same encoder during training, while only historical behaviors are available during inference. This discrepancy leads to potential performance degradation. To alleviate the training-inference gap, we propose a new framework, DualRec, which achieves past-future disentanglement and past-future mutual enhancement through a novel dual network. Specifically, a dual network structure is exploited to model the past and future contexts separately, and a bi-directional knowledge transfer mechanism enhances the knowledge learnt by the dual network. Extensive experiments on four real-world datasets demonstrate the superiority of our approach over baseline methods. In addition, we demonstrate the compatibility of DualRec by instantiating it with different backbones. Further empirical analysis verifies the high utility of modeling future contexts under our DualRec framework.

Dismantling Complex Networks by a Neural Model Trained from Tiny Networks

Can we employ one neural model to efficiently dismantle many complex yet unique networks? This article provides an affirmative answer. Diverse real-world systems can be abstracted as complex networks, each consisting of many functional nodes and edges. Percolation theory has indicated that removing only a few vital nodes can cause the collapse of the whole network. However, finding the smallest set of such vital nodes is a rather challenging task for large networks due to its NP-hardness. Previous studies have proposed many centrality measures and heuristic algorithms to tackle this network dismantling (ND) problem. In contrast, this article approaches the ND task by designing a neural model that can be trained on tiny synthetic networks but applied to various real-world networks. This seems a discouraging mission at first sight, as network sizes and topologies differ considerably across real-world networks. Nonetheless, this article makes an initial effort at designing and training a neural influence ranking model (NIRM). Experiments on fifteen real-world networks validate its effectiveness: in most cases it requires fewer vital nodes to dismantle a network than the state-of-the-art competitors. The key to its success is that NIRM can efficiently encode both local structural and global topological signals for ranking nodes, together with our innovative labelling method for constructing the training dataset.
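
For readers unfamiliar with the ND problem, the sketch below shows one of the simple centrality heuristics that such work is typically compared against: repeatedly remove the node with the highest remaining degree until the largest connected component falls below a target size. It is not NIRM; the stopping criterion `target_fraction` and the function names are assumptions for this example.

```python
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting the removed nodes."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def dismantle_by_degree(adj, target_fraction=0.5):
    """Greedy adaptive-degree heuristic: returns the nodes removed, in order."""
    removed, removed_set = [], set()
    while largest_component_size(adj, removed_set) > target_fraction * len(adj):
        # Pick the surviving node with the most surviving neighbours.
        node = max(
            (v for v in adj if v not in removed_set),
            key=lambda v: sum(1 for u in adj[v] if u not in removed_set),
        )
        removed.append(node)
        removed_set.add(node)
    return removed
```

The length of the returned list is the cost that a dismantling method tries to minimize; a learned ranking model replaces the degree-based choice of `node` with its own scores.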

Disentangled Representation for Long-tail Senses of Word Sense Disambiguation

The long-tailed distribution, also called the heavy-tailed distribution, is common in nature. Since both words and their senses in natural language follow a long-tailed distribution in usage frequency, the Word Sense Disambiguation (WSD) task faces serious data imbalance. Existing learning strategies and data augmentation methods struggle to deal with the lack of training samples caused by the single application scenario of long-tail senses, and with the word sense representations caused by unique word sense definitions. Because the features extracted by Disentangled Representation (DR) independently describe the essential properties of things, and DR does not require deep feature extraction and fusion processes, it alleviates the dependence of representation learning on the training samples. We propose a novel DR that constrains the covariance matrix of a multivariate Gaussian distribution, which enforces stronger independence among features than β-VAE. The WSD model implemented with the reinforced DR outperforms the baselines on the English all-words WSD evaluation framework, the constructed long-tail word sense datasets, and the latest cross-lingual datasets.

Handling RDF Streams: Harmonizing Subgraph Matching, Adaptive Incremental Maintenance, and Matching-free Updates Together

RDF stream processing (RSP) has become a vibrant area of research in the Semantic Web community, which guarantees interoperability and opens up important applications. There have been efforts to extend RDF data and SPARQL queries to represent streaming information and continuous querying functionalities. However, existing solutions suffer from significantly low throughput because they recompute the results from scratch as the window slides. In this paper, we propose a novel graph-based framework, referred to as IncTreeRDF, for continuous SPARQL query evaluation over RDF data streams. Under the framework, the RDF data streams are modeled as streaming graphs; the SPARQL queries are translated into graph patterns and evaluated via continuous subgraph pattern matching over streaming RDF graphs. IncTreeRDF employs a query-centric auxiliary data structure called TStore to store some intermediate results, which supports fast incremental maintenance. Based on TStore, we can not only avoid re-computing matches of the query but also prune invalid updates. In addition, we define the matching-free update, a scenario in which the subgraph matching calculation can be avoided entirely. Extensive experimental results show that IncTreeRDF significantly outperforms existing competitors.

Contrastive Knowledge Graph Error Detection

Knowledge Graph (KG) errors introduce non-negligible noise, severely affecting KG-related downstream tasks. Detecting errors in KGs is challenging since the patterns of errors are unknown and diverse, while ground-truth labels are rare or even unavailable. A traditional solution is to construct logical rules to verify triples, but it is not generalizable since different KGs have distinct rules with domain knowledge involved. Recent studies focus on designing tailored detectors or ranking triples based on KG embedding loss. However, they all rely on negative samples for training, which are generated by randomly replacing the head or tail entity of existing triples. Such a negative sampling strategy is not enough for prototyping practical KG errors, e.g., (Bruce_Lee, place_of_birth, China), in which the three elements are often relevant, although mismatched. We desire a more effective unsupervised learning mechanism tailored for KG error detection. To this end, we propose a novel framework - ContrAstive knowledge Graph Error Detection (CAGED). It introduces contrastive learning into KG learning and provides a novel way of modeling KG. Instead of following the traditional setting, i.e., considering entities as nodes and relations as semantic edges, CAGED augments a KG into different hyper-views, by regarding each relational triple as a node. After joint training with KG embedding and contrastive learning loss, CAGED assesses the trustworthiness of each triple based on two learning signals, i.e., the consistency of triple representations across multi-views and the self-consistency within the triple. Extensive experiments on three real-world KGs show that CAGED outperforms state-of-the-art methods in KG error detection. Our codes and datasets are available at
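
The negative sampling strategy criticized above, randomly replacing the head or tail entity of an existing triple, can be written in a few lines. This sketch illustrates that baseline strategy only, not CAGED; the function name, the 50/50 choice between head and tail, and the `max_tries` guard are assumptions for the example.

```python
import random


def corrupt_triple(triple, entities, known, rng, max_tries=100):
    """Replace the head or the tail with a random entity, avoiding known triples."""
    head, relation, tail = triple
    for _ in range(max_tries):
        entity = rng.choice(entities)
        if rng.random() < 0.5:
            candidate = (entity, relation, tail)
        else:
            candidate = (head, relation, entity)
        if candidate not in known:
            return candidate
    raise ValueError("could not find an unseen corrupted triple")
```

Because the replacement entity is drawn uniformly, most corrupted triples pair unrelated elements, whereas the erroneous triple quoted in the abstract has three closely related elements; random corruption rarely produces such hard cases.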

A Simple Meta-path-free Framework for Heterogeneous Network Embedding

Network embedding has recently attracted a lot of attention since networks are widely used in various data mining applications. To overcome the limitations of pre-set meta-paths and non-global node learning in existing models, we propose a simple but effective framework for heterogeneous network embedding learning that encodes the original multi-type nodes and relations directly in a self-supervised way. More specifically, we first learn the relation-based embeddings for global nodes from the neighbor properties under each relation type and exploit an attentive fusion module to combine them. Then we design a multi-hop contrast to optimize the regional structure information by utilizing the strong correlation between nodes and their neighbor-graphs, where we take multiple relationships into consideration through multi-hop message passing instead of pre-set meta-paths. Finally, we evaluate our proposed method on various downstream tasks such as node clustering, node classification, and link prediction between two types of nodes. The experimental results show that our proposed approach significantly outperforms state-of-the-art baselines on these tasks.

Unsupervised Representation Learning on Attributed Multiplex Network

Embedding learning in multiplex networks has drawn increasing attention in recent years and achieved outstanding performance in many downstream tasks. However, most existing network embedding methods focus only on the structural information of graphs, rely on human-annotated data, or rely mainly on multi-layer GCNs to encode graphs at the risk of learning ill-posed spectral filters. Moreover, it is also challenging in multiplex network embedding to learn consensus embeddings for nodes across the multiple views using the inter-relationships among graphs. In this study, we propose a novel and flexible unsupervised network embedding method for attributed multiplex networks that generates more precise node embeddings using simplified Bernstein encoders and alternating contrastive learning between the local and global levels. Specifically, we design a graph encoder based on simplified Bernstein polynomials to learn node embeddings of a specific graph view. During the learning of each specific view, local and global contrastive learning are alternately applied to update the view-specific embedding and the consensus embedding simultaneously. Furthermore, the proposed model can be easily extended to a semi-supervised model by adding a semi-supervised cost, or to an attention-based model that attentively integrates embeddings from multiple graphs. Experiments on three publicly available real-world datasets show that the proposed method achieves significant improvements on downstream tasks over state-of-the-art baselines, while being faster or competitive in terms of runtime compared to previous studies.

Automating DBSCAN via Deep Reinforcement Learning

DBSCAN is widely used in many scientific and engineering fields because of its simplicity and practicality. However, because it is highly sensitive to its parameters, the accuracy of the clustering result depends heavily on practical experience. In this paper, we propose a novel Deep Reinforcement Learning guided automatic DBSCAN parameter search framework, namely DRL-DBSCAN. The framework models the process of adjusting the parameter search direction by perceiving the clustering environment as a Markov decision process, which aims to find the best clustering parameters without manual assistance. DRL-DBSCAN learns the optimal clustering parameter search policy for different feature distributions via interacting with the clusters, using a weakly-supervised reward training policy network. In addition, we present a recursive search mechanism driven by the scale of the data to efficiently and controllably process large parameter spaces. Extensive experiments are conducted on five artificial and real-world datasets based on the proposed four working modes. The results of offline and online tasks show that DRL-DBSCAN not only consistently improves DBSCAN clustering accuracy by up to 26% and 25% respectively, but also can stably find the dominant parameters with high computational efficiency. The code is available at
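
The parameter sensitivity that motivates this work is easy to reproduce with a compact DBSCAN. The following is a minimal standard-library sketch of the textbook algorithm, not DRL-DBSCAN; the parameter names `eps` and `min_pts` follow common usage and the toy data is made up for the example.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # Provisionally noise; may become a border point later.
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # Border point: joins the cluster but is not expanded.
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # Core point: keep growing the cluster.
    return labels


def count_clusters(labels):
    return len(set(labels) - {-1})
```

On two well-separated groups of three points, a radius that is too small labels everything as noise, a suitable radius finds both groups, and a radius that is too large merges them into one cluster, which is why the search over `eps` and `min_pts` matters.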

GBERT: Pre-training User representations for Ephemeral Group Recommendation

Due to the prevalence of group activities on social networks, group recommendation has received increasing attention. Most group recommendation methods concentrate on dealing with persistent groups, while little attention has been paid to ephemeral groups. Ephemeral groups are formed ad hoc for one-time activities, and therefore they suffer severely from data sparsity and cold-start problems. To deal with such problems, we propose a pre-training and fine-tuning method called GBERT for improved group recommendations, which employs BERT to enhance the expressivity and capture group-specific preferences of members. In the pre-training stage, GBERT employs three pre-training tasks to alleviate the data sparsity and cold-start problems and to learn better user representations. In the fine-tuning stage, an influence-based regulation objective is designed to regulate user and group representations by allocating weights according to each member's influence. Extensive experiments on three public datasets demonstrate its superiority over the state-of-the-art methods for ephemeral group recommendations.

DeepVT: Deep View-Temporal Interaction Network for News Recommendation

Personalized news recommendation aims to provide people with customized content, which can effectively improve the reading experience. Because user interests in news are diverse and changeable, learning accurate user representations is the core challenge in news recommendation. However, most previous works apply news-level representations to user modeling directly: the views of a news item, such as title, abstract, and category, are only implied and compressed into a single news vector, which makes it impossible for different views in different news items to interact with each other. In this paper, we first focus on the view-level information for user modeling and propose the Deep View-Temporal Interaction Network (DeepVT) for news recommendation. It mainly contains two components, i.e., a 2D semi-causal convolutional neural network (SC-CNN) and multi-operator attention (MoA). SC-CNN can synthesize interaction information at the view level and temporal information at the news level simultaneously and efficiently. MoA integrates different similarity operators in self-attention functions to avoid attention bias and enhance robustness. In collaboration with SC-CNN, the global interaction at the view level becomes more sufficient. Experiments on a large-scale real-world dataset, Microsoft News Dataset (MIND), show that our model significantly outperforms previous models in terms of all metrics.

RuDi: Explaining Behavior Sequence Models by Automatic Statistics Generation and Rule Distillation

Risk scoring systems, which assign risk scores to users according to their behavior sequences, have been widely deployed in many applications. Though many deep learning methods with sophisticated designs have achieved promising results, their black-box nature hinders their application due to fairness, explainability, and compliance considerations. Rule-based systems are considered reliable in these sensitive scenarios. However, building a rule system is labor-intensive. Experts need to find informative statistics from user behavior sequences, design rules based on statistics, and assign weights to each rule. In this paper, we bridge the gap between effective but black-box models and transparent rule models. We propose a two-stage method, RuDi, that distills the knowledge of black-box teacher models into rule-based student models. In the first stage, we design a Monte Carlo tree search-based statistics generation method that can provide a set of informative statistics. Then statistics are composed into logical rules with our proposed neural logical networks by mimicking the outputs of teacher models. We evaluate RuDi on three real-world public datasets and an industrial dataset to demonstrate its effectiveness.

Cross-domain Cross-architecture Black-box Attacks on Fine-tuned Models with Transferred Evolutionary Strategies

Fine-tuning can be vulnerable to adversarial attacks. Existing works about black-box attacks on fine-tuned models (BAFT) are limited by strong assumptions. To fill the gap, we propose two novel BAFT settings, cross-domain and cross-domain cross-architecture BAFT, which only assume that (1) the target model for attacking is a fine-tuned model, and (2) the source domain data is known and accessible. To successfully attack fine-tuned models under both settings, we propose to first train an adversarial generator against the source model, which adopts an encoder-decoder architecture and maps a clean input to an adversarial example. Then we search in the low-dimensional latent space produced by the encoder of the adversarial generator. The search is conducted under the guidance of the surrogate gradient obtained from the source model. Experimental results on different domains and different network architectures demonstrate that the proposed attack method can effectively and efficiently attack the fine-tuned models.

Towards Understanding the Overfitting Phenomenon of Deep Click-Through Rate Models

Deep learning techniques have been applied widely in industrial recommendation systems. However, far less attention has been paid to the overfitting problem of models in recommendation systems, which, on the contrary, is recognized as a critical issue for deep neural networks. In the context of Click-Through Rate (CTR) prediction, we observe an interesting one-epoch overfitting problem: the model performance exhibits a dramatic degradation at the beginning of the second epoch. Such a phenomenon has been witnessed widely in real-world applications of CTR models. Therefore, the best performance is usually achieved by training with only one epoch. To understand the underlying factors behind the one-epoch phenomenon, we conduct extensive experiments on the production data set collected from the display advertising system of Alibaba. The results show that the model structure, the optimization algorithm with a fast convergence rate, and the feature sparsity are closely related to the one-epoch phenomenon. We also provide a likely hypothesis for explaining such a phenomenon and conduct a set of proof-of-concept experiments. We hope this work can shed light on future research on training more epochs for better performance.

MAE4Rec: Storage-saving Transformer for Sequential Recommendations

Sequential recommender systems (SRS) aim to infer the users' preferences from their interaction history and predict items that will be of interest to the users. The majority of SRS models typically incorporate all historical interactions for next-item recommendations. Despite their success, feeding all interactions into the model without filtering may lead to severe practical issues: (i) redundant interactions hinder the SRS model from capturing the users' intentions; (ii) the computational cost is huge, as the computational complexity is proportional to the length of the interaction sequence; (iii) more memory space is required to store all interaction records from all users. To this end, we propose a novel storage-saving SRS framework, MAE4Rec, based on a unidirectional self-attentive mechanism and masked autoencoder. Specifically, in order to lower the storage consumption, MAE4Rec first masks and discards a large percentage of historical interactions, and then infers the next interacted item solely based on the latent representation of the unmasked ones. Experiments on two real-world datasets demonstrate that the proposed model achieves competitive performance against state-of-the-art SRS models with more than 40% compression of storage.

CPEE: Civil Case Judgment Prediction centering on the Trial Mode of Essential Elements

Civil Case Judgment Prediction (CCJP) is a fundamental task in the legal intelligence of the civil law system, which aims to automatically predict the judgment results on each plea of the plaintiff. Existing studies mainly focus on making judgment predictions only on a certain civil cause (e.g., the divorce dispute) by utilizing the fact descriptions and pleas of the plaintiff, which still suffer from the various causes and complicated legal essential elements in the real court. Thus, in this paper, we formalize CCJP as a multi-task learning problem and propose a CCJP method centering on the trial mode of essential elements, CPEE, which explores the practical judicial process and analyzes comprehensive legal essential elements to make judgment predictions. Specifically, we first construct three tasks (i.e., the predictions on the civil causes, law articles, and the final judgment on each plea) necessary for CCJP, that follow the judgment process and exploit the results of intermediate subtasks to make judgment predictions. Then we design a logic-enhanced network to predict the results of three tasks and conduct a comprehensive study of civil cases. Finally, owing to the interlinked and dependent relationships among each task, we adopt the cause prediction result to help predict law articles and incorporate them into final judgment prediction through a gate mechanism. Furthermore, since the existing dataset fails to provide sufficient case information, we construct a real-world CCJP dataset that contains various causes and comprehensive legal elements. Extensive experimental results on the dataset validate the effectiveness of our method.

Two-Level Graph Path Reasoning for Conversational Recommendation with User Realistic Preference

Conversational recommender systems model user dynamic preferences and recommend items based on multi-turn interactions. Though conversational recommender systems have achieved good performance, they have two limitations. On the one hand, researchers usually randomly select an anchor item from a user's historical interactions to simulate the interaction with the real user, but some items in the historical interactions do not fit the user's realistic preferences (item noise). On the other hand, these systems pay too much attention to user dynamic preferences but neglect static preferences that are difficult to change over a short period. In fact, when there is no explicit attribute preference in a user's conversation, the user's static preferences can also be used to make recommendations. To address the aforementioned issues, a novel method that combines graph path reasoning with multi-turn conversation is proposed, called Graph Path reasoning for conversational Recommendation (GPR). In GPR, a soft-clustering method is designed to classify items, and then set operations are utilized to filter the noise in the user's historical interactions. To capture user dynamic preferences and take account of the user's inherent static preferences, GPR asks questions about attributes in the attribute-level reasoning and asks whether the items fit the user's static preferences in the item-level reasoning on a heterogeneous graph. Across the multiple turns of two-level graph path reasoning, reinforcement learning is used to obtain the optimal path and accurately recommend items to users. Extensive experiments conducted on two benchmark datasets verify that GPR can significantly improve recommendation performance and reduce the number of turns of path reasoning.

End-to-end Modularity-based Community Co-partition in Bipartite Networks

Resolving community structure in networks is of significant benefit for both scientific inquiries and practical applications. Recently, deep neural networks have demonstrated excellent performance on various graph mining tasks, including community detection. However, some challenges still urgently need to be addressed. First, being frequently formulated in an unsupervised setting, community detection has proved to be more resistant to the advantages of end-to-end learning. Many deep methods carry out clustering algorithms after the acquisition of node representations. Second, very few studies consider the heterogeneity of a large number of real-world networks in end-to-end community detection. For instance, the building block of general heterogeneous networks is the bipartite model, which is a ubiquitous structure where two types of nodes co-exist. In view of these challenges, we study the end-to-end community co-partition of two types of nodes in bipartite networks. Specifically, we extend both spectral and spatial graph convolution operators to bipartite structures for node feature encoding. Then we formulate a novel loss function with a modularity-based objective, as well as two collapsed regularizations for producing more informative community assignment matrices. Co-partitions of nodes can be directly achieved by optimization with stochastic gradient descent under the proposed framework. Comprehensive empirical analysis, compared with various types of classic and deep methods, demonstrates the efficacy and the scalability of the proposed method.
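
As background for the modularity-based objective mentioned in this abstract, the following is a minimal sketch of the classical Newman modularity of a fixed partition of an undirected, unweighted graph. It is not the paper's bipartite loss or its regularizers; the function name `modularity` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Classical Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2].

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    intra = defaultdict(int)  # L_c: edges with both endpoints in community c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for node, k in degree.items():
        degree_sum[community[node]] += k
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single edge, split into their natural communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, labels)
```

A higher value indicates that more edges fall inside communities than a degree-preserving random graph would predict; placing every node in one community gives a value of zero.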

MentorGNN: Deriving Curriculum for Pre-Training GNNs

Graph pre-training strategies have been attracting a surge of attention in the graph mining community, due to their flexibility in parameterizing graph neural networks (GNNs) without any label information. The key idea lies in encoding valuable information into the backbone GNNs, by predicting the masked graph signals extracted from the input graphs. In order to balance the importance of diverse graph signals (e.g., nodes, edges, subgraphs), the existing approaches are mostly hand-engineered by introducing hyperparameters to re-weight the importance of graph signals. However, human interventions with sub-optimal hyperparameters often inject additional bias and deteriorate the generalization performance in the downstream applications. This paper addresses these limitations from a new perspective, i.e., deriving curriculum for pre-training GNNs. We propose an end-to-end model named MentorGNN that aims to supervise the pre-training process of GNNs across graphs with diverse structures and disparate feature spaces. To comprehend heterogeneous graph signals at different granularities, we propose a curriculum learning paradigm that automatically re-weights graph signals in order to ensure a good generalization in the target domain. Moreover, we shed new light on the problem of domain adaptation on relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs. Extensive experiments on a wealth of real graphs validate and verify the performance of MentorGNN.

D-HYPR: Harnessing Neighborhood Modeling and Asymmetry Preservation for Digraph Representation Learning

Digraph Representation Learning (DRL) aims to learn representations for directed homogeneous graphs (digraphs). Prior work in DRL is largely constrained (e.g., limited to directed acyclic graphs), or has poor generalizability across tasks (e.g., evaluated solely on one task). Most Graph Neural Networks (GNNs) exhibit poor performance on digraphs due to the neglect of modeling neighborhoods and preserving asymmetry. In this paper, we address these notable challenges by leveraging hyperbolic collaborative learning from multi-ordered and partitioned neighborhoods, and regularizers inspired by socio-psychological factors. Our resulting formalism, Digraph Hyperbolic Networks (D-HYPR) -- albeit conceptually simple -- generalizes to digraphs where cycles and non-transitive relations are common, and is applicable to multiple downstream tasks including node classification, link presence prediction, and link property prediction. In order to assess the effectiveness of D-HYPR, extensive evaluations were performed across 8 real-world digraph datasets involving 21 prior techniques. D-HYPR statistically significantly outperforms the current state of the art. We release our code at

Multi-task Learning with Adaptive Global Temporal Structure for Predicting Alzheimer's Disease Progression

In this paper, we propose a multi-task learning approach for predicting the progression of Alzheimer's disease (AD), known as the most common form of dementia. The vital challenge is to identify how the tasks are related and build learning models to capture such task relatedness. Unlike previous methods that assume low-rank structure, chase the predefined local temporal relatedness, or utilize local approximation, we propose a novel penalty termed Longitudinal Stability Adjustment (LSA) to adaptively capture the intrinsic global temporal correlation among multiple time points and thus utilize the accumulated disease progression information. We combine LSA with sparse group Lasso to present a novel multi-task learning formulation to identify biomarkers closely related to cognitive measurement and predict AD progression. Two efficient algorithms are designed for large-scale datasets. Experimental results on two AD data sets demonstrate that our framework outperforms competing methods in terms of both overall and per-task performance. We also perform stability selection to identify stable biomarkers from the MRI feature set and analyze their temporal patterns in disease progression.

Adversarial Robustness through Bias Variance Decomposition: A New Perspective for Federated Learning

Federated learning learns a neural network model by aggregating the knowledge from a group of distributed clients under the privacy-preserving constraint. In this work, we show that this paradigm might inherit the adversarial vulnerability of the centralized neural network, i.e., it has deteriorated performance on adversarial examples when the model is deployed. This is even more alarming when the federated learning paradigm is designed to approximate the updating behavior of a centralized neural network. To solve this problem, we propose an adversarially robust federated learning framework, named Fed_BVA, with improved server and client update mechanisms. This is motivated by our observation that the generalization error in federated learning can be naturally decomposed into the bias and variance triggered by multiple clients' predictions. Thus, we propose to generate the adversarial examples via maximizing the bias and variance during server update, and learn the adversarially robust model updates with those examples during client update. As a result, an adversarially robust neural network can be aggregated from these improved local clients' model updates. The experiments are conducted on multiple benchmark data sets using several prevalent neural network models, and the empirical results show that our framework is robust against white-box and black-box adversarial corruptions under both IID and non-IID settings.

Decoupled Hyperbolic Graph Attention Network for Modeling Substitutable and Complementary Item Relationships

Modeling substitutable and complementary item relationships is a fundamental and important topic for recommendation in e-commerce online scenarios. In the real world, item relationships are usually coupled and heterogeneous, and they also have abundant side information and hierarchical data structures. Recently, to take full advantage of both side information and topological structure, graph neural networks have been widely explored in relationship modeling. However, the existing methods are crude in decoupling heterogeneous relationships. Their model designs lack deep insight into the coupling mode of relationships, i.e., they neglect the prior knowledge of how relationships affect each other. In addition, many existing graph methods, regardless of how they handle coupled relationships, are deployed in Euclidean spaces, which distorts the hierarchical data structure and limits the expressive power due to the non-power-law characteristic of Euclidean topology. In this paper, we propose a novel Decoupled Hyperbolic Graph Attention Network (DHGAN). The innovations of DHGAN can be highlighted in two aspects. Firstly, we design metapaths in an adequate way following an algebraic perspective of the relationships' coupling mode, which helps achieve better model interpretability. Secondly, DHGAN maps heterogeneous relationships into separate hyperbolic spaces, which can better capture the hierarchical information of graph nodes and helps improve the model's representational capacity. We conduct extensive experiments on three public real-world datasets, demonstrating that DHGAN is superior to the state-of-the-art graph baselines. We release the code at

Personalized Query Suggestion with Searching Dynamic Flow for Online Recruitment

Employing query suggestion techniques to assist users in articulating their needs during online search has become increasingly vital for search engines in an age of exponential information growth. The success of a query suggestion system lies in accurately understanding and modeling the user search intent behind each query, which can hardly be achieved without personalization efforts that take advantage of dynamic user feedback behaviors and rich contextual information. This valuable area, however, remains largely untapped by current query suggestion systems. In this work, we propose the Dynamic Searching Flow Model (DSFM), a query suggestion framework that is capable of modeling and refining user search intent progressively in recruitment scenarios by leveraging a dynamic flow mechanism. Here the concepts of local flow and global flow are introduced to capture the real-time intention of users and the overall influence of a session, respectively. By utilizing rich semantic information contained in resumes and job requirements, DSFM enables the personalization of query suggestions. In addition, weighted contrast learning is introduced into the training process to produce more extensive targeted query samples and partially alleviate the exposure bias. The adoption of an attention mechanism allows the selection of the most relevant information to compose the final intention representation. Extensive experimental results on different categories of real-world datasets demonstrate the effectiveness of our proposed approach on the task of query suggestion for online recruitment platforms.

From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking

Contextual information in search sessions is important for capturing users' search intents. Various approaches have been proposed to model user behavior sequences to improve document ranking in a session. Typically, training samples of (search context, document) pairs are sampled randomly in each training epoch. In reality, the difficulty of understanding a user's search intent and judging a document's relevance varies greatly from one search context to another. Mixing up training samples of different difficulties may confuse the model's optimization process. In this work, we propose a curriculum learning framework for context-aware document ranking, in which the ranking model learns matching signals between the search context and the candidate document in an easy-to-hard manner. In so doing, we aim to guide the model gradually toward a global optimum. To leverage both positive and negative examples, two curricula are designed. Experiments on two real query log datasets show that our proposed framework can improve the performance of several existing methods significantly, demonstrating the effectiveness of curriculum learning for context-aware document ranking.

Robust Node Classification on Graphs: Jointly from Bayesian Label Transition and Topology-based Label Propagation

Node classification using Graph Neural Networks (GNNs) has been widely applied in various real-world scenarios. However, in recent years, compelling evidence has emerged that the performance of GNN-based node classification may deteriorate substantially under topological perturbations, such as random connections or adversarial attacks. Various solutions, such as topological denoising methods and mechanism design methods, have been proposed to develop robust GNN-based node classifiers, but none of these works can fully address the problems related to topological perturbations. Recently, the Bayesian label transition model was proposed to tackle this issue, but its slow convergence may lead to inferior performance. In this work, we propose a new label inference model, namely LInDT, which integrates both Bayesian label transition and topology-based label propagation for improving the robustness of GNNs against topological perturbations. LInDT is superior to existing label transition methods as it improves the label prediction of uncertain nodes by utilizing neighborhood-based label propagation, leading to better convergence of label inference. Besides, LInDT adopts an asymmetric Dirichlet distribution as a prior, which also helps it to improve label inference. Extensive experiments on five graph datasets demonstrate the superiority of LInDT for GNN-based node classification under three scenarios of topological perturbations.

Tiger: Transferable Interest Graph Embedding for Domain-Level Zero-Shot Recommendation

Recommender systems play a significant role in online services and have attracted wide attention from both academia and industry. In this paper, we focus on an important, practical, but often overlooked task: domain-level zero-shot recommendation (DZSR). The challenge of DZSR mainly lies in the absence of collaborative behaviors in the target domain, which may be caused by various reasons, such as the domain being newly launched without existing user-item interactions, or users' behaviors being too sensitive to collect for training. To address this challenge, we propose a Transferable Interest Graph Embedding technique for Recommendations (Tiger). The key idea is to connect isolated collaborative filtering datasets with a knowledge graph tailored to recommendations, then propagate collaborative signals from public domains to the zero-shot target domain. The backbone of Tiger is the transferable interest extractor, which is a simple yet effective graph convolutional network (GCN) aggregating multiple hops of neighbors on a shared interest graph. We find that the bottom layers of GCN preserve more domain-specific information while the upper layers represent universal interest better. Thus, in Tiger, we discard the bottom layers of GCN to reconstruct user interest so that collaborative signals can be successfully propagated to other domains, and retain the bottom layers of GCN to include domain-specific information for items. Extensive experiments with four public datasets demonstrate that Tiger can effectively make recommendations for a zero-shot domain and outperform several alternative baselines.

Improving Knowledge-aware Recommendation with Multi-level Interactive Contrastive Learning

Incorporating Knowledge Graphs (KG) into recommender systems as side information has attracted considerable attention. Recently, the technical trend of Knowledge-aware Recommendation (KGR) is to develop end-to-end models based on graph neural networks (GNNs). However, the extremely sparse user-item interactions significantly degrade the performance of the GNN-based models in the following respects: 1) the sparse interaction itself means inadequate supervision signals and limits the supervised GNN-based models; 2) the combination of sparse interactions (CF part) and redundant KG facts (KG part) further results in unbalanced information utilization. Besides, the GNN paradigm aggregates local neighbors for node representation learning while ignoring the non-local KG facts, which makes the knowledge extraction insufficient. Inspired by the recent success of contrastive learning in mining supervised signals from the data itself, in this paper we focus on exploring contrastive learning in KGR and propose a novel multi-level interactive contrastive learning mechanism to alleviate the aforementioned challenges. Different from traditional contrastive learning methods, which contrast nodes of two generated graph views, the interactive contrastive mechanism conducts layer-wise self-supervised learning by contrasting layers of different parts within graphs, which is also an "interaction" action. Specifically, we first construct local and non-local graphs for user/item in KG, exploring more KG facts for KGR. Then an intra-graph level interactive contrastive learning is performed within each local/non-local graph, which contrasts layers of the CF and KG parts, for more consistent information leveraging. Besides, an inter-graph level interactive contrastive learning is performed between the local and non-local graphs, for sufficiently and coherently extracting non-local KG signals. Extensive experiments conducted on three benchmark datasets show the superior performance of our proposed method over state-of-the-art methods. The implementations are available at:

Hierarchical Conversational Preference Elicitation with Bandit Feedback

Recent advances in conversational recommendation provide a promising way to efficiently elicit users' preferences via conversational interactions. To achieve this, the recommender system conducts conversations with users, asking about their preferences for different items or item categories. Most existing conversational recommender systems for cold-start users utilize a multi-armed bandit framework to learn users' preferences in an online manner. However, they rely on a pre-defined conversation frequency for asking about item categories instead of individual items, which may incur excessive conversational interactions that hurt user experience. To enable more flexible questioning about key-terms, we formulate a new conversational bandit problem that allows the recommender system to choose either a key-term or an item to recommend at each round and explicitly models the rewards of these actions. This motivates us to handle a new exploration-exploitation (EE) trade-off between key-term asking and item recommendation, which requires us to accurately model the relationship between key-term and item rewards. We conduct a survey and analyze a real-world dataset to find that, unlike assumptions made in prior works, key-term rewards are mainly affected by rewards of representative items. We propose two bandit algorithms, Hier-UCB and Hier-LinUCB, that leverage this observed relationship and the hierarchical structure between key-terms and items to efficiently learn which items to recommend. We theoretically prove that our algorithm can reduce the regret bound's dependency on the total number of items compared with previous work. We validate our proposed algorithms and regret bound on both synthetic and real-world data.
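
For readers unfamiliar with the exploration-exploitation trade-off in the multi-armed bandit framework that this abstract builds on, the following is a minimal sketch of the classical UCB1 rule. It is not Hier-UCB or Hier-LinUCB, whose details are not given here; the helper names `ucb1_select` and `run_ucb1` and the deterministic toy reward function are assumptions made for illustration.

```python
import math

def ucb1_select(counts, sums, t):
    """Return the arm chosen by classical UCB1 at round t (1-indexed)."""
    for arm, n in enumerate(counts):
        if n == 0:  # play every arm once before using the confidence bound
            return arm
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)

def run_ucb1(reward_fn, n_arms, rounds):
    """Play `rounds` rounds and return how often each arm was pulled."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        sums[arm] += reward_fn(arm)
        counts[arm] += 1
    return counts

# Toy environment (assumption): each arm pays its mean reward deterministically.
means = [0.2, 0.8]
pulls = run_ucb1(lambda arm: means[arm], n_arms=2, rounds=200)
```

The bonus term shrinks as an arm is pulled more often, so the rule keeps sampling weaker arms occasionally while concentrating most pulls on the arm with the highest estimated reward.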

SESSION: CIKM'22 Applied Research Papers

A Case Study in Educational Recommenders: Recommending Music Partitures at Tomplay

Recommendation technologies have been playing an instrumental role in promoting both physical and digital content across several global platforms (Amazon, Apple, Netflix). Here we provide a study on the benefits of recommendation technologies in an educational platform with a focus on music learning. There are several characteristics present in this educational platform that make this recommendation problem particularly interesting, namely: a) the few but highly repetitive interactions, b) the existence of multiple versions of the same content across many difficulty levels, orchestrations, and musical instruments, and c) the user's expertise in a musical instrument, which is essential for making appropriate recommendations. We highlight the unique dataset characteristics and compare them to those of other widely-used recommendation datasets. To alleviate the very high data sparsity due to the multi-instantiation of songs, we use entity resolution principles to embed songs in a new space. Using this lightweight entity resolution step on song data, in combination with neural recommendation architectures, we can double the predictive accuracy compared to techniques based on matrix factorization.

E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling

Promotions also affect revenue and may incur a monetary loss that is often limited by a dedicated promotional budget. We propose an Online Constrained Multiple-Choice Promotions Personalization framework, driven by causal incremental estimations achieved by uplift modeling. Our work formalizes the problem as an Online Multiple-Choice Knapsack Problem and extends the existing literature by addressing cases with negative weights and values resulting from causal estimations. Our real-time adaptive method guarantees compliance with budget constraints while achieving above 99.7% of the potential optimal impact on various datasets. It was deployed in a large-scale experimental study at one of the leading online travel platforms in the world. The application resulted in a 162% improvement in sales while complying with a zero-budget constraint, enabling long-term self-sponsored promotional campaigns.
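
To make the knapsack formulation concrete, the following is a minimal sketch of the classical offline Multiple-Choice Knapsack Problem solved by dynamic programming. It assumes non-negative integer costs and a known set of customers, so it does not cover the online setting or the negative weights and values that the abstract addresses; the function name `mckp` and the toy numbers are hypothetical.

```python
def mckp(classes, budget):
    """Offline multiple-choice knapsack with non-negative integer costs.

    classes: one list of (cost, value) options per customer; exactly one
             option is chosen per customer (a (0, 0) option means no promotion).
    budget:  non-negative integer budget.
    Returns the maximum total value, or -inf if no choice fits the budget.
    """
    neg_inf = float("-inf")
    best = [0.0] * (budget + 1)  # best[b]: max value with total cost <= b
    for options in classes:
        nxt = [neg_inf] * (budget + 1)
        for b in range(budget + 1):
            for cost, value in options:
                if cost <= b and best[b - cost] != neg_inf:
                    nxt[b] = max(nxt[b], best[b - cost] + value)
        best = nxt
    return best[budget]

# Two customers, each with "no promotion" plus two promotion options (toy data).
customers = [
    [(0, 0.0), (2, 3.0), (3, 5.0)],
    [(0, 0.0), (1, 2.0), (4, 6.0)],
]
total = mckp(customers, budget=4)
```

With a budget of 4, the best choice in this toy instance gives the first customer the option costing 3 and the second the option costing 1, for a total value of 7.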

Will This Online Shopping Session Succeed? Predicting Customer's Purchase Intention Using Embeddings

Customers are increasingly using online channels to buy products. For e-commerce companies, this offers new opportunities to tailor the shopping experience to customers' needs. Therefore, it is of great importance for a company to know their customers' intentions while browsing their webpage. A major challenge is the real-time analysis of a customer's intention during browsing sessions. To this end, a representation of the customer's browsing behavior must be retrieved from their live interactions on the webpage. Typically, characteristic behavioral features are extracted manually based on the knowledge of marketing experts. In this paper, we propose a customer embedding representation that is based on the customer's click-events recorded during browsing sessions. Thus, our approach does not use manually extracted features and is not based on marketing expert domain knowledge, which makes it transferable to different webpages and different online markets. We demonstrate our approach using three different e-commerce datasets to successfully predict whether a customer is going to purchase a specific product. For the prediction, we utilize the customer embedding representations as input for different machine learning models. We compare our approach with existing state-of-the-art approaches for real-time purchase prediction and show that our proposed customer representation with an LSTM predictor outperforms the state-of-the-art approach on all three datasets. Additionally, the creation process of our customers' representation is on average 235 times faster than the creation process of the baseline.

Improving Text-based Similar Product Recommendation for Dynamic Product Advertising at Yahoo

Retrieving similar products is a critical functionality required by many e-commerce websites as well as dynamic product advertising systems. Retargeting and Prospecting are two major forms of dynamic product advertising. Typically, after a user interacts with a product on an advertiser website (e.g., Macy's), when the user later visits a website (e.g., supported by a dynamic product advertising system, the same product may be shown to the user as a Retargeting product ad, while some similar products may be shown to the user as Prospecting product ads on the web page. Similar products can enrich users' ad experience based on users' intent on the Prospecting product ads through which the users interacted. These product ads can also serve as substitutes when Retargeting ad candidates are out of stock. However, it is challenging to retrieve similar products among billions of products in a product catalog efficiently. Deep Siamese models allow efficient retrieval but do not put enough emphasis on key product attributes. To improve the quality of the similar products, we propose to first use a Siamese Transformer-based model to retrieve similar products and then refine them with the attribute "product name", which indicates the type of a product (e.g., running shoes, engagement ring, etc.), for post-filtering. We propose a novel product name generation model that fine-tunes a pre-trained Transformer-based language model with a sequence-to-sequence objective. To the best of our knowledge, this is the first work using a generative approach for identifying product attributes. We introduce two applications of the proposed approach for the dynamic product advertising system of Yahoo for Retargeting and Prospecting respectively. Offline evaluation and online A/B testing show that the proposed approach retrieves high-quality similar products, leading to an increase in ad clicks and ad revenue.

Efficient and Effective SPARQL Autocompletion on Very Large Knowledge Graphs

We show how to achieve fast autocompletion for SPARQL queries on very large knowledge graphs. At any position in the body of a SPARQL query, the autocompletion suggests matching subjects, predicates, or objects. The suggestions are context-sensitive and ranked by their relevance to the part of the query already typed. The suggestions can be narrowed down by prefix search on the names and aliases of the desired subject, predicate, or object. All suggestions are themselves obtained via SPARQL queries. For existing SPARQL engines, these queries are impractically slow on large knowledge graphs. We present various algorithmic and engineering improvements of an open-source SPARQL engine such that these queries are executed efficiently. We evaluate a variety of suggestion methods on three large knowledge graphs, including the complete Wikidata. We compare our results with two widely used SPARQL engines, Virtuoso and Blazegraph. Our code, benchmarks, and complete reproducibility materials are available on

Graph Neural Networks Pretraining Through Inherent Supervision for Molecular Property Prediction

Recent global events have emphasized the importance of accelerating the drug discovery process. A way to deal with the issue is to use machine learning to increase the rate at which drugs are made available to the public. However, chemical labeled data for real-world applications is extremely scarce making traditional approaches less effective. A fruitful course of action for this challenge is to pretrain a model using related tasks with large enough datasets, with the next step being finetuning it for the desired task. This is challenging as creating these datasets requires labeled data or expert knowledge. To aid in solving this pressing issue, we introduce MISU - Molecular Inherent SUpervision, a unique method for pretraining graph neural networks for molecular property prediction. Our method leapfrogs past the need for labeled data or any expert knowledge by introducing three innovative components that utilize inherent properties of molecular graphs to induce information extraction at different scales, from the local neighborhood of an atom to substructures in the entire molecule. Our empirical results for six chemical-property-prediction tasks show that our method reaches state-of-the-art results compared to numerous baselines.

Debiased Balanced Interleaving at Amazon Search

Interleaving is an online evaluation technique that has been shown to be orders of magnitude more sensitive than traditional A/B tests. It presents users with a single merged result of the compared rankings and then attributes user actions back to the evaluated rankers. Different interleaving methods in the literature have their advantages and limitations with respect to unbiasedness, sensitivity, preservation of user experience, and implementation and computation complexity. We propose a new interleaving method that utilizes a counterfactual evaluation framework for credit attribution while sticking to the simple ranking merge policy of balanced interleaving, and formally derive an unbiased estimator for comparing rankers with theoretical guarantees. We then confirm the effectiveness of our method with both synthetic and real experiments. We also discuss practical considerations of bringing different interleaving methods from the literature into a large-scale experiment, and show that our method achieves a favorable tradeoff in implementation and computation complexity while preserving statistical power and reliability. We have successfully implemented our method and produced consistent conclusions at the scale of billions of search queries. We report 10 online experiments that apply our method to e-commerce search, and observe a 60x sensitivity gain over A/B tests. We also find high correlations between our proposed estimator and corresponding A/B metrics, which helps interpret interleaving results in the magnitude of A/B measurements.
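
The merge policy of balanced interleaving mentioned in this abstract can be illustrated with a minimal sketch. It follows the classic textbook formulation of balanced interleaving only; it does not implement the debiased credit attribution or the estimator proposed in the paper. The function name `balanced_interleave` and its parameters are hypothetical, chosen for illustration.

```python
def balanced_interleave(ranking_a, ranking_b, a_first):
    """Merge two rankings with the classic balanced interleaving policy.

    a_first is the outcome of a fair coin flip deciding which ranker
    contributes first whenever both pointers are level.
    """
    merged = []
    seen = set()
    ka = kb = 0  # read positions in ranking_a and ranking_b
    limit = len(set(ranking_a) | set(ranking_b))
    while len(merged) < limit and (ka < len(ranking_a) or kb < len(ranking_b)):
        # Take from A when B is exhausted, when A's pointer lags behind,
        # or when the pointers are level and A won the coin flip.
        take_a = kb >= len(ranking_b) or (
            ka < len(ranking_a) and (ka < kb or (ka == kb and a_first))
        )
        if take_a:
            item = ranking_a[ka]
            ka += 1
        else:
            item = ranking_b[kb]
            kb += 1
        if item not in seen:  # skip items already shown
            seen.add(item)
            merged.append(item)
    return merged
```

In practice the coin flip would be drawn once per query (for example with `random.random() < 0.5`), so that neither ranker is systematically favored at the top position.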

A Relevant and Diverse Retrieval-enhanced Data Augmentation Framework for Sequential Recommendation

Within online platforms, it is critical to capture the semantics of sequential user behaviors for accurately predicting user interests. Recently, significant progress has been made in sequential recommendation with deep learning. However, existing neural sequential recommendation models may not perform well in practice due to the sparsity of the real-world data especially in cold-start scenarios. To tackle this problem, we propose the model ReDA, which stands for Retrieval-enhanced Data Augmentation for modeling sequential user behaviors. The main idea of our approach is to leverage the related information from similar users for generating both relevant and diverse augmentation. First, we train a neural retriever to retrieve the augmentation users according to the semantic similarity between user representations, and then conduct two types of data augmentation to generate augmented user representations. Furthermore, these augmented data are incorporated in a contrastive learning framework for learning more capable representations. Extensive experiments conducted on both public and industry datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods, especially when only limited training data is available.

Fooling MOSS Detection with Pretrained Language Models

As artificial intelligence (AI) technologies become increasingly powerful and prominent in society, their misuse is a growing concern. In educational settings, AI technologies could be used by students to cheat on assignments and exams. In this paper we explore whether transformers can be used to solve introductory level programming assignments while bypassing commonly used AI tools to detect similarities between pieces of software. We find that a student using GPT-J [60] can complete introductory level programming assignments without triggering suspicion from MOSS [2], a widely used software similarity and plagiarism detection tool. This holds despite the fact that GPT-J was not trained on the problems in question and is not provided with any examples to work from. We further find that the code written by GPT-J is diverse in structure, lacking any particular tells that future plagiarism detection techniques may use to try to identify algorithmically generated code. We conclude with a discussion of the ethical and educational implications of large language models and directions for future research.

A Context-Enhanced Transformer with Abbr-Recover Policy for Chinese Abbreviation Prediction

Chinese abbreviation prediction is very important for various natural language processing tasks such as query understanding and entity linking, since people tend to use the concise abbreviation rather than the full form (name) to mention an entity. The existing models achieve their predictions through sequence labeling, i.e., the binary classification for each character (token) of the full form. However, they only leverage the semantics of the entity itself, overlooking the label dependencies between the tokens, and the rich information of the entity-related texts. In this paper we propose a Context-Enhanced Transformer with Abbr-Recover policy, namely CETAR, for Chinese abbreviation prediction. CETAR predicts the abbreviation sequence mainly through an iterative decoding process, in which each round consists of an abbreviation and a recovery operation. Our extensive experiments on both general-field and specific-domain datasets show that CETAR outperforms the state-of-the-art baselines including sequence labeling models and sequence generation models. Moreover, we have successfully constructed a Chinese abbreviation dataset from the famous tour website Fliggy, and we also shared it at The online A/B test on the Fliggy search system shows that 2.03% of conversion rate improvement has been achieved with the predicted abbreviations.

Simulation-Informed Revenue Extrapolation with Confidence Estimate for Scaleup Companies Using Scarce Time-Series Data

Investment professionals rely on extrapolating company revenue into the future (i.e. revenue forecast) to approximate the valuation of scaleups (private companies in a high-growth stage) and inform their investment decision. This task is manual and empirical, leaving the forecast quality heavily dependent on the investment professionals' experiences and insights. Furthermore, financial data on scaleups is typically proprietary, costly and scarce, ruling out the wide adoption of data-driven approaches. To this end, we propose a simulation-informed revenue extrapolation (SiRE) algorithm that generates fine-grained long-term revenue predictions on small datasets and short time-series. SiRE models the revenue dynamics as a linear dynamical system (LDS), which is solved using the EM algorithm. The main innovation lies in how the noisy revenue measurements are obtained during training and inference. SiRE works for scaleups that operate in various sectors and provides confidence estimates. The quantitative experiments on two practical tasks show that SiRE surpasses the baseline methods by a large margin. We also observe high performance when SiRE extrapolates long-term predictions from short time-series. The performance-efficiency balance and result explainability of SiRE are also validated empirically. Evaluated from the perspective of investment professionals, SiRE can precisely locate the scaleups that have a great potential return in 2 to 5 years. Furthermore, our qualitative inspection illustrates some advantageous attributes of the SiRE revenue forecasts.

GIFT: Graph-guIded Feature Transfer for Cold-Start Video Click-Through Rate Prediction

Short video has witnessed rapid growth in the past few years in e-commerce platforms like Taobao. To ensure the freshness of the content, platforms need to release a large number of new videos every day, making conventional click-through rate (CTR) prediction methods suffer from the item cold-start problem. In this paper, we propose GIFT, an efficient Graph-guIded Feature Transfer system, to take full advantage of the rich information of warmed-up videos to compensate for the cold-start ones. Specifically, we establish a heterogeneous graph that contains physical and semantic linkages to guide the feature transfer process from warmed-up videos to cold-start videos. The physical linkages consist of the explicit relationships (e.g., produced by the same author, or showcasing the same product), and the semantic linkages measure the proximity of multi-modal representations of two videos. We carefully design the feature transfer function to be aware of the different parts of transferred features (e.g., id representations and historical statistics) from different types of nodes and edges along the metapath on the graph. We conduct extensive experiments on a large real-world dataset, and the results show that our GIFT system outperforms SOTA methods significantly and brings a 6.82% lift on CTR in the homepage of Taobao App.

Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction

Rich user behavior data has been proven to be of great value for Click-Through Rate (CTR) prediction applications, especially in industrial recommender, search, or advertising systems. However, it's non-trivial for real-world systems to make full use of long-term user behaviors due to the strict requirements of online serving time. Most previous works adopt the retrieval-based strategy, where a small number of user behaviors are retrieved first for subsequent attention. However, the retrieval-based methods are sub-optimal and would cause information losses, and it's difficult to balance the effectiveness and efficiency of the retrieval algorithm. In this paper, we propose SDIM (Sampling-based Deep Interest Modeling), a simple yet effective sampling-based end-to-end approach for modeling long-term user behaviors. We sample from multiple hash functions to generate hash signatures of the candidate item and each item in the user behavior sequence, and obtain the user interest by directly gathering behavior items associated with the candidate item with the same hash signature. We show theoretically and experimentally that the proposed method performs on par with standard attention-based models on modeling long-term user behaviors, while being substantially faster. We also introduce the deployment of SDIM in our system. Specifically, we decouple the behavior sequence hashing, which is the most time-consuming part, from the CTR model by designing a separate module named BSE (Behavior Sequence Encoding). BSE is latency-free for the CTR server, enabling us to model extremely long user behaviors. Both offline and online experiments are conducted to demonstrate the effectiveness of SDIM. SDIM has now been deployed online in the search system of Meituan APP.
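
The hash-signature gathering step described in this abstract can be sketched as follows. This is a simplified illustration that assumes SimHash-style random hyperplanes as the hash family and an L2-normalized sum as the pooling rule; the abstract does not specify either, and all names (`simhash_signatures`, `sampled_interest`, `num_hashes`, `width`) are hypothetical.

```python
import math
import random


def simhash_signatures(vec, planes, width):
    """Hash a vector with random hyperplanes and group the sign bits
    into signatures of `width` bits each."""
    bits = [
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    ]
    return [tuple(bits[i:i + width]) for i in range(0, len(bits), width)]


def sampled_interest(candidate, behaviors, num_hashes=8, width=2, seed=0):
    """Pool the behavior vectors whose signature collides with the candidate's.

    For each signature group, behaviors sharing the candidate's signature are
    summed and L2-normalized; the result is averaged over all groups.
    """
    rng = random.Random(seed)
    dim = len(candidate)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]
    cand_sigs = simhash_signatures(candidate, planes, width)
    behavior_sigs = [simhash_signatures(b, planes, width) for b in behaviors]

    interest = [0.0] * dim
    for group, cand_sig in enumerate(cand_sigs):
        pooled = [0.0] * dim
        for behavior, sigs in zip(behaviors, behavior_sigs):
            if sigs[group] == cand_sig:  # hash collision with the candidate
                pooled = [x + y for x, y in zip(pooled, behavior)]
        norm = math.sqrt(sum(x * x for x in pooled))
        if norm > 0:
            interest = [x + y / norm for x, y in zip(interest, pooled)]
    return [x / len(cand_sigs) for x in interest]
```

Because the behavior signatures depend only on the behavior sequence and the hyperplanes, they can be computed ahead of time, which is in the spirit of the decoupled behavior sequence encoding the abstract mentions.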

Numerical Feature Representation with Hybrid N-ary Encoding

Numerical features (e.g., statistical features) are widely used in recommender systems and online advertising. Existing approaches for numerical feature representation in industry are primarily based on discretization. However, hard-discretization based methods (e.g., Equal Distance Discretization) are deficient in continuity while soft-discretization based methods (e.g., AutoDis) lack discriminability. To emphasize both continuity and discriminability for numerical features, we propose an end-to-end representation learning framework named NaryDis. Specifically, NaryDis first leverages hybrid n-ary encoding as an automatic discretization module to generate hybrid-grained discretization results (multiple encoded sequences). Each position of the encoded sequence is assigned a positional embedding, and an intra-ary attention network is leveraged to aggregate the positional embeddings for obtaining ary-wise representations. Then an inter-ary attention is adopted to assemble these representations, which are further constrained by a self-supervised regularization module. Comprehensive experiments on two public datasets are conducted to show the superiority and compatibility of NaryDis. In addition, we investigate the properties of continuity and discriminability in depth. Moreover, we further verify the effectiveness of NaryDis on a large-scale industrial advertisement dataset.
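
To make the idea of "multiple encoded sequences" concrete, the sketch below encodes one numerical value as digit sequences in several bases. It is an illustration under assumptions: the abstract does not describe how NaryDis quantizes values or which bases it uses, so the equal-width bucketing, the base set `(2, 3, 4)`, and the function names are all hypothetical.

```python
def nary_digits(bucket, base, width):
    """Return the `width` most-significant-first digits of `bucket` in `base`."""
    if bucket < 0 or base < 2:
        raise ValueError("bucket must be non-negative and base at least 2")
    digits = []
    for _ in range(width):
        digits.append(bucket % base)
        bucket //= base
    return digits[::-1]


def hybrid_nary_encode(value, low, high, num_buckets=16, bases=(2, 3, 4)):
    """Quantize `value` into one of `num_buckets` equal-width buckets, then
    write the bucket index as one digit sequence per base."""
    clipped = min(max(value, low), high)
    bucket = min(int((clipped - low) / (high - low) * num_buckets), num_buckets - 1)
    encoded = {}
    for base in bases:
        width = 1
        while base ** width < num_buckets:  # digits needed to cover all buckets
            width += 1
        encoded[base] = nary_digits(bucket, base, width)
    return encoded
```

Each digit position could then index its own embedding table, with smaller bases giving longer, finer-grained sequences and larger bases giving shorter, coarser ones.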

Generating Persuasive Responses to Customer Reviews with Multi-Source Prior Knowledge in E-commerce

Customer reviews usually contain much information about one's online shopping experience. While positive reviews are beneficial to the stores, negative ones will largely influence consumers' decision and may lead to a decline in sales. Therefore, it is of vital importance to carefully and persuasively reply to each negative review and minimize its disadvantageous effect. Recent studies consider leveraging generation models to help the sellers respond. However, this problem is not well-addressed as the reviews may contain multiple aspects of issues which should be resolved accordingly and persuasively. In this work, we propose a Multi-Source Multi-Aspect Attentive Generation model for persuasive response generation. Various sources of information are appropriately obtained and leveraged by the proposed model for generating more informative and persuasive responses. A multi-aspect attentive network is proposed to automatically attend to different aspects in a review and ensure most of the issues are tackled. Extensive experiments on two real-world datasets demonstrate that our approach outperforms the state-of-the-art methods, and online tests show that our deployed system significantly enhances the efficiency with which stores handle negative reviews.

Hierarchically Constrained Adaptive Ad Exposure in Feeds

A contemporary feed application usually provides blended results of organic items and sponsored items (ads) to users. Conventionally, ads are exposed at fixed positions. Such a fixed ad exposure strategy is inefficient due to ignoring users' personalized preferences towards ads. To this end, adaptive ad exposure is becoming an appealing strategy to boost the overall performance of the feed. However, existing approaches to implement the adaptive ad exposure strategy suffer from several limitations: 1) they usually fall into sub-optimal solutions because of only focusing on request-level optimization without consideration of the application-level performance and constraints, 2) they neglect the necessity of keeping the game-theoretical properties of ad auctions, and 3) they can hardly be deployed in large-scale applications due to high computational complexity. In this paper, we focus on the application-level performance optimization under hierarchical constraints in feeds and formulate adaptive ad exposure as a Dynamic Knapsack Problem. We propose Hierarchically Constrained Adaptive Ad Exposure (HCA2E) that possesses the desirable game-theoretical properties, computational efficiency, and performance robustness. Comprehensive offline and online experiments on a leading e-commerce application demonstrate the performance superiority of HCA2E.

Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation

Model-based methods for recommender systems have been studied extensively for years. Modern recommender systems usually resort to 1) representation learning models which define user-item preference as the distance between their embedding representations, and 2) embedding-based Approximate Nearest Neighbor (ANN) search to tackle the efficiency problem introduced by large-scale corpus. While providing efficient retrieval, the embedding-based retrieval pattern also limits the model capacity since the form of user-item preference measure is restricted to the distance between their embedding representations. However, for other more precise user-item preference measures, e.g., preference scores directly derived from a deep neural network, they are computationally intractable because of the lack of an efficient retrieval method, and an exhaustive search for all user-item pairs is impractical.

In this paper, we propose a novel method to extend ANN search to arbitrary matching functions, e.g., a deep neural network. Our main idea is to perform a greedy walk with a matching function in a similarity graph constructed from all items. To solve the problem that the similarity measures of graph construction and user-item matching function are heterogeneous, we propose a pluggable adversarial training task to ensure the graph search with arbitrary matching function can achieve fairly high precision. Experimental results in both open source and industry datasets demonstrate the effectiveness of our method. The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable advertising revenue increase. We also summarize our detailed experiences in deployment in this paper.
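
The greedy walk over a similarity graph described above can be sketched in a few lines. This is a minimal hill-climbing illustration with a single start node and no beam; the paper's graph construction and adversarial training are not reproduced here, and `greedy_graph_search`, `graph`, and `score` are hypothetical names. Any callable can serve as `score`, which is the point of supporting arbitrary matching functions.

```python
def greedy_graph_search(graph, score, start, max_steps=100):
    """Walk the item graph, always moving to the best-scoring neighbor,
    and stop when no neighbor improves on the current item.

    graph: dict mapping an item to a list of neighboring items.
    score: callable returning the user-item matching score for an item
           (e.g. the output of a neural network for a fixed user).
    """
    current = start
    current_score = score(current)
    for _ in range(max_steps):
        neighbors = graph.get(current, [])
        if not neighbors:
            break
        best = max(neighbors, key=score)
        best_score = score(best)
        if best_score <= current_score:  # local optimum reached
            break
        current, current_score = best, best_score
    return current, current_score
```

The walk evaluates the matching function only on the neighbors it visits rather than on every item, which is what makes an expensive scoring model usable at retrieval time; the trade-off is that it can stop at a local optimum when the graph's similarity measure disagrees with the matching function.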

ReLiable: Offline Reinforcement Learning for Tactical Strategies in Professional Basketball Games

Professional basketball provides an intriguing example of a dynamic spatio-temporal game that incorporates both hidden strategy policies and situational decision making. During a game, the coaches and players are assumed to follow a general game plan, but players are also forced to make spur-of-the-moment decisions based on immediate conditions on the court. However, because it is challenging to process heterogeneous signals on the court and the space of potential actions and outcomes is massive, it is hard for players to find an optimal strategy on the fly given a short amount of time to observe conditions and take action. In this work, we present ReLiable (ReinforcemEnt Learning In bAsketBaLl gamEs). Specifically, we investigate the possibility of using reinforcement learning (RL) to guide player decisions. We train an offline deep Q-network (DQN) on historical National Basketball Association (NBA) game data from 2015-2016. The data include play-by-play and player movement sensor data. We apply our trained agent to games that it has not seen. Our method is able to propose potentially smarter tactical strategies, compared with replay gameplay data, producing expected final game scores comparable to elite NBA teams. Our approach can be useful for learning strategy policies from other game-like domains characterized by competing groups and sequential spatio-temporal event data.

Mitigating Biases in Student Performance Prediction via Attention-Based Personalized Federated Learning

Traditional learning-based approaches to student modeling generalize poorly to underrepresented student groups due to biases in data availability. In this paper, we propose a methodology for predicting student performance from their online learning activities that optimizes inference accuracy over different demographic groups such as race and gender. Building upon recent foundations in federated learning, in our approach, personalized models for individual student subgroups are derived from a global model aggregated across all student models via meta-gradient updates that account for subgroup heterogeneity. To learn better representations of student activity, we augment our approach with a self-supervised behavioral pretraining methodology that leverages multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums), and include a neural network attention mechanism in the model aggregation stage. Through experiments on three real-world datasets from online courses, we demonstrate that our approach obtains substantial improvements over existing student modeling baselines in predicting student learning outcomes for all subgroups. Visual analysis of the resulting student embeddings confirms that our personalization methodology indeed identifies different activity patterns within different subgroups, consistent with its stronger inference ability compared with the baselines.

Hierarchical Capsule Prediction Network for Marketing Campaigns Effect

Marketing campaigns are a set of strategic activities that can promote a business's goal. The effect prediction for marketing campaigns in a real industrial scenario is very complex and challenging due to the fact that prior knowledge is often learned from observation data, without any intervention for the marketing campaign. Furthermore, each subject is always under the interference of several marketing campaigns simultaneously. Therefore, we cannot easily parse and evaluate the effect of a single marketing campaign. To the best of our knowledge, there are currently no effective methodologies to solve such a problem, i.e., modeling an individual-level prediction task based on a hierarchical structure with multiple intertwined events. In this paper, we provide an in-depth analysis of the underlying parse tree-like structure involved in the effect prediction task and we further establish a Hierarchical Capsule Prediction Network (HapNet) for predicting the effects of marketing campaigns. Extensive results based on both the synthetic data and real data demonstrate the superiority of our model over the state-of-the-art methods and show remarkable practicability in real industrial applications.

Detecting Environmental Violations with Satellite Imagery in Near Real Time: Land Application under the Clean Water Act

This paper introduces a new, highly consequential setting for the use of computer vision for environmental sustainability. Concentrated Animal Feeding Operations (CAFOs) (aka intensive livestock farms or "factory farms") produce significant manure and pollution. Dumping manure in the winter months poses significant environmental risks and violates environmental law in many states. Yet the federal Environmental Protection Agency (EPA) and state agencies have relied primarily on self-reporting to monitor such instances of "land application." Our paper makes four contributions. First, we introduce the environmental, policy, and agricultural setting of CAFOs and land application. Second, we provide a new dataset of high-cadence (daily to weekly) 3m/pixel satellite imagery from 2018-20 for 330 CAFOs in Wisconsin with hand-labeled instances of land application (n=57,697). Third, we develop an object detection model to predict land application and a system to perform inference in near real-time. We show that this system appears to detect land application effectively (PR AUC = 0.93) and we uncover several outlier facilities which appear to apply regularly and excessively. Last, we estimate the population prevalence of land application events in Winter 2021/22. We show that the prevalence of land application is much higher than what is self-reported by facilities. The system can be used by environmental regulators and interest groups, one of which piloted field visits based on this system this past winter. Overall, our application demonstrates the potential for AI-based computer vision systems to solve major problems in environmental compliance with near-daily imagery.

DuMapper: Towards Automatic Verification of Large-Scale POIs with Street Views at Baidu Maps

With the increased popularity of mobile devices, Web mapping services have become an indispensable tool in our daily lives. To provide user-satisfied services, such as location searches, the point of interest (POI) database is the fundamental infrastructure, as it archives multimodal information on billions of geographic locations closely related to people's lives, such as a shop or a bank. Therefore, verifying the correctness of a large-scale POI database is vital. To achieve this goal, many industrial companies adopt volunteered geographic information (VGI) platforms that enable thousands of crowdworkers and expert mappers to verify POIs seamlessly; but to do so, they have to spend millions of dollars every year. To save the tremendous labor costs, we devised DuMapper, an automatic system for large-scale POI verification with the multimodal street-view data at Baidu Maps. This paper presents not only DuMapper I, which imitates the process of POI verification conducted by expert mappers, but also proposes DuMapper II, a highly efficient framework to accelerate POI verification by means of deep multimodal embedding and approximate nearest neighbor (ANN) search. DuMapper II takes the signboard image and the coordinates of a real-world place as input to generate a low-dimensional vector, which can be leveraged by ANN algorithms to conduct a more accurate search through billions of archived POIs in the database for verification within milliseconds. Compared with DuMapper I, experimental results demonstrate that DuMapper II can significantly increase the throughput of POI verification by 50 times. DuMapper has already been deployed in production since June 2018, which dramatically improves the productivity and efficiency of POI verification at Baidu Maps. As of December 31, 2021, it has enacted over 405 million iterations of POI verification within a 3.5-year period, representing an approximate workload of 800 high-performance expert mappers.

Towards Practical Large Scale Non-Linear Semi-Supervised Learning with Balancing Constraints

Semi-Supervised Support Vector Machine (S3VM) is one of the most popular methods for semi-supervised learning, which can make full use of plentiful, easily accessible unlabeled data. A balancing constraint is normally enforced in S3VM (denoted as BCS3VM) to avoid the harmful solution that assigns all or most of the unlabeled examples to the same label. Traditionally, non-linear BCS3VM is solved by the sequential minimal optimization algorithm. Recently, a novel incremental learning algorithm (IL-BCS3VM) was proposed to scale up BCS3VM further. However, IL-BCS3VM needs to calculate the inverse of the linear system related to the support matrix, making the algorithm not scalable enough. To make BCS3VM more practical in large-scale problems, in this paper, we propose a new scalable BCS3VM with accelerated triply stochastic gradients (denoted as TSG-BCS3VM). Specifically, to make the balancing constraint handle different proportions of positive and negative samples among labeled and unlabeled data, we propose a soft balancing constraint for S3VM. To make the algorithm scalable, we generate triply stochastic gradients by sampling labeled and unlabeled samples as well as the random features to update the solutions, where Quasi-Monte Carlo (QMC) sampling is utilized on random features to accelerate TSG-BCS3VM further. Our theoretical analysis shows that the convergence rate is O(1/√T) for both diminishing and constant learning rates where T is the number of iterations, which is much better than previous results thanks to the QMC method. Empirical results on a variety of benchmark datasets show that our algorithm not only has a good generalization performance but also enjoys better scalability than existing BCS3VM algorithms.

Cascaded Debiasing: Studying the Cumulative Effect of Multiple Fairness-Enhancing Interventions

Understanding the cumulative effect of multiple fairness-enhancing interventions at different stages of the machine learning (ML) pipeline is a critical and underexplored facet of the fairness literature. Such knowledge can be valuable to data scientists/ML practitioners in designing fair ML pipelines. This paper takes the first step in exploring this area by undertaking an extensive empirical study comprising 60 combinations of interventions, 9 fairness metrics, 2 utility metrics (Accuracy and F1 Score) across 4 benchmark datasets. We quantitatively analyze the experimental data to measure the impact of multiple interventions on fairness, utility and population groups. We found that applying multiple interventions results in better fairness and lower utility than individual interventions on aggregate. However, adding more interventions does not always result in better fairness or worse utility. The likelihood of achieving high performance (F1 Score) along with high fairness increases with a larger number of interventions. On the downside, we found that fairness-enhancing interventions can negatively impact different population groups, especially the privileged group. This study highlights the need for new fairness metrics that account for the impact on different population groups apart from just the disparity between groups. Lastly, we offer a list of combinations of interventions that perform best for different fairness and utility metrics to aid the design of fair ML pipelines.

RecipeMind: Guiding Ingredient Choices from Food Pairing to Recipe Completion using Cascaded Set Transformer

We propose a computational approach for recipe ideation, a downstream task that helps users select and gather ingredients for creating dishes. To perform this task, we developed RecipeMind, a food affinity score prediction model that quantifies the suitability of adding an ingredient to a set of other ingredients. We constructed a large-scale dataset containing ingredient co-occurrence based scores to train and evaluate RecipeMind on food affinity score prediction. Deployed in recipe ideation, RecipeMind helps the user expand an initial set of ingredients by suggesting additional ingredients. Experiments and qualitative analysis show RecipeMind's potential in fulfilling its assistive role in the cuisine domain.

Real-time Short Video Recommendation on Mobile Devices

Short video applications have attracted billions of users in recent years, fulfilling their various needs with diverse content. Users usually watch short videos on many topics on mobile devices in a short period of time, and give explicit or implicit feedback very quickly to the short videos they watch. The recommender system needs to perceive users' preferences in real time in order to satisfy their changing interests. Traditionally, recommender systems deployed at the server side return a ranked list of videos for each request from the client. Thus they cannot adjust the recommendation results according to the user's real-time feedback before the next request. Due to client-server transmission latency, they are also unable to make immediate use of users' real-time feedback. As users continue to watch videos and give feedback, the changing context makes the ranking of the server-side recommendation system inaccurate. In this paper, we propose to deploy a short video recommendation framework on mobile devices to solve these problems. Specifically, we design and deploy a tiny on-device ranking model to enable real-time re-ranking of server-side recommendation results. We improve its prediction accuracy by exploiting users' real-time feedback on watched videos and client-specific real-time features.

With more accurate predictions, we further consider interactions among candidate videos, and propose a context-aware re-ranking method based on adaptive beam search. The framework has been deployed on Kuaishou, a billion-user scale short video application, and improved effective view, like and follow by 1.28%, 8.22% and 13.6% respectively.

Towards Fairer Classifier via True Fairness Score Path

Fair classification, which enforces a fairness constraint on the original learning problem, is an emerging topic in machine learning. Due to its non-convexity and discontinuity, the original (true) fairness constraint is normally relaxed to a convex and smooth surrogate, which could lead to slightly deviated solutions and could violate the original fairness constraint. To re-calibrate with the original constraint, existing methods usually hand-tune a hyper-parameter of the convex surrogate. Such a method is time-consuming and cannot guarantee finding a fairer classifier (i.e., one whose original fairness constraint falls below a smaller threshold). To address this challenging problem, we propose a novel true fairness score path algorithm which guarantees to find fairer classifiers efficiently. Specifically, we first give a new formulation of fair classification which treats the surrogate fairness constraint as an additional regularization term, with a fairness hyper-parameter controlling the degree of surrogate fairness. Then, we propose a solution path algorithm which tracks the solutions of fair classification with respect to the fairness hyper-parameter. Based on the solution path, we further propose a true fairness score path algorithm which derives the curve of fairness score with respect to the fairness hyper-parameter and allows us to find the fairer classifiers. Finally, extensive experimental results not only verify the effectiveness of our algorithm, but also show that we can find the fairer classifiers efficiently.

UDM: A Unified Deep Matching Framework in Recommender Systems

Due to the large-scale users and items, industrial recommender systems usually consist of two stages, the matching stage and the ranking stage. The matching stage is responsible for retrieving a small fraction of relevant items from the large-scale item pool which are further selected by the ranking stage. Most of the existing deep learning-based matching models focus on the problem of modeling user interest representation by using inner product between user representation and item representation to obtain the user-to-item relevance. However, the item-to-item relevance between user interacted item and target item is not considered in the deep matching models which is computationally prohibitive for large-scale applications. In this paper, we propose a unified deep matching framework called UDM for the matching stage to mitigate this issue. UDM can model the user-to-item relevance and item-to-item relevance simultaneously with the help of an interest extraction module and interest interaction module, respectively. Specifically, the interest extraction module is used as the main network to extract users' multiple interests with multiple vectors based on users' behavior sequences, while the interest interaction module is used as an auxiliary network to supervise the learning of the interest extraction module, which can model the interaction between user interacted items and target item. In the experiments conducted on two public datasets and a large-scale industrial dataset, UDM achieves consistent improvements over state-of-the-art models. Moreover, UDM has been deployed in the operational system of Alibaba. Online A/B testing results further reveal the effectiveness of UDM. To the best of our knowledge, UDM is the first deep matching framework which combines the user-to-item relevance modeling and item-to-item relevance modeling in the same model.
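
The user-to-item relevance described above (an inner product between user and item representations, with several interest vectors per user) can be sketched as follows. This is a minimal illustration and not the UDM model itself: the function names are hypothetical, the vectors are hand-written rather than learned, and taking the maximum over interest vectors is an assumption borrowed from common multi-interest retrieval practice.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def relevance(interests, item_vector):
    """Score an item by its best-matching user interest vector.

    Assumption: max-aggregation over interests; the abstract does not
    say how UDM combines the multiple interest vectors.
    """
    return max(dot(interest, item_vector) for interest in interests)


def retrieve_top_k(interests, items, k):
    """Return the ids of the k items with the highest relevance."""
    ranked = sorted(
        items.items(),
        key=lambda pair: relevance(interests, pair[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]
```

In a production matching stage the exhaustive sort would typically be replaced by an approximate nearest neighbor index; the brute-force version is kept here only to make the scoring rule explicit.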

Sentaur: Sensor Observable Data Model for Smart Spaces

This paper presents Sentaur, a middleware designed, built, and deployed to support sensor-based smart space analytical applications. Sentaur supports a powerful data model that decouples semantic data (about the application domain) from sensor data (from which the semantic data is derived). By supporting mechanisms to map/translate data, concepts, and queries between the two levels, Sentaur relieves application developers from having to know or reason about the capabilities of sensors or to write sensor-specific code. This paper describes Sentaur's data model and its translation strategy, and highlights its benefits through real-world case studies.

Addressing Cold Start in Product Search via Empirical Bayes

Cold start is a challenge in product search. Profuse literature addresses related problems such as bias and diversity in search, and cold start is a classic topic in recommender systems research. While search cold start might be seen conceptually as a particular case in such areas, we find that available solutions fail to specifically and practically solve the cold-start problem in product search. The problem is complex as exposing new products may come at the expense of primary business metrics (e.g. revenue), and involves a complex balance between customer satisfaction, seller satisfaction, business performance, short-term gains and long-term value.

In this paper, we propose a principled approach to deal with cold start in a large-scale e-commerce search system. We discuss how product ranking is affected by non-behavioral topical relevance and behavioral popularity, and their role in introducing biases that result in cold-start for ranking new products. Our approach applies Empirical Bayes to model behavioral information via non-behavioral signals in terms of priors, and effectively estimate true engagement posterior updates. We report comprehensive offline and online experiments over large datasets that show the effectiveness of our methods to address cold start, and provide further insights. An online A/B test on 50 million queries shows a significant improvement in new product impressions by 13.53% and a significant increase in new product purchase by 11.14%, with overall purchases up by 0.08%, highlighting the empirical effectiveness of the approach.
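
The idea of using non-behavioral signals as a prior and updating it with observed engagement can be illustrated with a small Beta-Binomial sketch. The abstract does not specify the paper's actual model, so everything here is an assumption made for illustration: the Beta-Binomial form, the `strength` pseudo-count, and the names `BetaPrior`, `fit_prior`, and `posterior_mean` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaPrior:
    alpha: float  # pseudo-count of engagements
    beta: float   # pseudo-count of non-engagements


def fit_prior(comparable_rates, strength=20.0):
    """Center a Beta prior on the mean engagement rate of comparable products.

    `comparable_rates` stands in for a non-behavioral signal, e.g. the
    rates of established products in the same category (an assumption).
    `strength` is the total pseudo-count, i.e. how much the prior weighs.
    """
    mean = sum(comparable_rates) / len(comparable_rates)
    return BetaPrior(alpha=mean * strength, beta=(1.0 - mean) * strength)


def posterior_mean(prior, engagements, impressions):
    """Posterior mean engagement rate after observing behavioral data."""
    return (prior.alpha + engagements) / (prior.alpha + prior.beta + impressions)
```

With no impressions, a new product is scored at the prior mean instead of zero, which is what lets it compete in ranking; as impressions accumulate, the estimate moves toward the observed rate.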

PROPN: Personalized Probabilistic Strategic Parameter Optimization in Recommendations

Real-world recommender systems usually consist of two phases. Predictive models in Phase I provide accurate predictions of users' actions on items, and Phase II aggregates the predictions with strategic parameters to make final recommendations, which aim to meet multiple business goals, such as maximizing users' like rate and average engagement time. Though it is important to generate accurate predictions in Phase I, it is also crucial to optimize the strategic parameters in Phase II. Conventional solutions include manual tuning, Bayesian optimization, contextual multi-armed bandit optimization, etc. However, these methods either produce universal strategic parameters for all the users or focus on a deterministic solution, which leads to undesirable performance. In this paper, we propose a personalized probabilistic solution for strategic parameter optimization. We first formulate the personalized probabilistic optimizing problem and compare its solution with deterministic and context-free solutions theoretically to show its superiority. We then introduce a novel Personalized pRObabilistic strategic parameter optimizing Policy Network (PROPN) to solve the problem. PROPN follows a reinforcement learning architecture where a neural network serves as an agent that dynamically adjusts the distributions of strategic parameters for each user. We evaluate our model under the streaming recommendation setting on two public real-world datasets. The results show that our framework outperforms representative baseline methods.

BLUTune: Query-informed Multi-stage IBM Db2 Tuning via ML

Modern data systems such as IBM Db2 have hundreds of system configuration parameters, "knobs", which heavily influence the performance of business queries. Manual configuration, "tuning", by experts is painstaking and time consuming. We propose a query informed tuning system called BLUTune which uses machine learning (ML), specifically deep reinforcement learning based on advantage actor-critic neural networks, to tune configurations within defined resource constraints. We translate high-dimensional query execution plans (QEPs) into a low-dimensional embedding space (QEP2Vec) for input into the ML models. To scale to complex and large workloads, we bootstrap the training process through transfer learning. We first train our model based on the estimated cost of queries; we then fine-tune it based on actual query execution times. We demonstrate BLUTune's efficiency and effectiveness through an experimental study over various synthetic and real-world workloads.

DuETA: Traffic Congestion Propagation Pattern Modeling via Efficient Graph Learning for ETA Prediction at Baidu Maps

Estimated time of arrival (ETA) prediction, also known as travel time estimation, is a fundamental task for a wide range of intelligent transportation applications, such as navigation, route planning, and ride-hailing services. To accurately predict the travel time of a route, it is essential to take into account both contextual and predictive factors, such as spatial-temporal interaction, driving behavior, and traffic congestion propagation inference. The ETA prediction models previously deployed at Baidu Maps have addressed the factors of spatial-temporal interaction (ConSTGAT) and driving behavior (SSML). In this work, we believe that modeling traffic congestion propagation patterns is of great importance toward accurately performing ETA prediction, and we focus on this factor to improve ETA performance. Traffic congestion propagation pattern modeling is challenging, and it requires accounting for impact regions over time and cumulative effect of delay variations over time caused by traffic events on the road network. In this paper, we present a practical industrial-grade ETA prediction framework named DuETA. Specifically, we construct a congestion-sensitive graph based on the correlations of traffic patterns, and we develop a route-aware graph transformer to directly learn the long-distance correlations of the road segments. This design enables DuETA to capture the interactions between the road segment pairs that are spatially distant but highly correlated with traffic conditions. Extensive experiments are conducted on large-scale, real-world datasets collected from Baidu Maps. Experimental results show that ETA prediction can significantly benefit from the learned traffic congestion propagation patterns, which demonstrates the effectiveness and practical applicability of DuETA. In addition, DuETA has already been deployed in production at Baidu Maps, serving billions of requests every day. This demonstrates that DuETA is an industrial-grade and robust solution for large-scale ETA prediction services.

DuIVRS: A Telephonic Interactive Voice Response System for Large-Scale POI Attribute Acquisition at Baidu Maps

The task of POI attribute acquisition, which aims at completing missing attributes (e.g., POI name, address, status, phone, and open/close time) for a point of interest (POI) or updating existing attribute values of a POI, plays an essential role in enabling users to entertain location-based services using commercial map applications, such as Baidu Maps. Existing solutions have adopted street views or web documents to acquire POI attributes, which have a major limitation in applying for large-scale production due to the labor-intensive and time-consuming nature of collecting data, error accumulation in processing textual/visual data in unstructured or free format, and necessitating post-processing steps with manual efforts. In this paper, we present our efforts and findings from a 3-year longitudinal study on designing and implementing DuIVRS, which is an alternative, fully automatic, and production-proven solution for large-scale POI attribute acquisition via completely machine-directed dialogues. Specifically, DuIVRS is designed to proactively acquire POI attributes via a telephonic interactive voice response system, whose tasks are to generate machine-initiative directed dialogues, make scripted telephone calls to businesses, and interact with people who answered the phone to achieve predefined goals through multi-turn dialogues. DuIVRS has already been deployed in production at Baidu Maps since December 2018, which greatly improves productivity and reduces production cost of POI attribute acquisition. As of December 31, 2021, DuIVRS has made 140 million calls and 42 million POI attribute updates within a 3-year period, which represents an approximately 3-year workload for a high-performance team of 1,000 call center workers. This demonstrates that DuIVRS is an industrial-grade and robust solution for cost-effective, large-scale acquisition of POI attributes.

Incorporating Fairness in Large-scale Evacuation Planning

Evacuation planning is an essential part of disaster management where the goal is to relocate people in a safe and orderly manner. Existing research has shown that such problems are hard to approximate and current methods are difficult to scale to real-life applications. We introduce a notion of fairness and two related objectives while studying evacuation planning, namely: minimizing maximum inconvenience and minimizing average inconvenience. We show that both problems are not just NP-hard to solve exactly, but in fact are NP-hard to approximate. On the positive side, we present a heuristic optimization method MIP-LNS, based on the well-known Large Neighborhood Search framework, that can find good approximate solutions in reasonable amount of time. We also consider a multi-objective problem where the goal is to minimize both objectives and solve it using MIP-LNS. We use real-world road network and population data from Harris County in Houston, Texas (a region that needed large-scale evacuations in the past), and apply MIP-LNS to calculate evacuation plans for the area. We compare the quality of the plans in terms of evacuation efficiency and fairness. We find that the solutions to the multi-objective problem are superior in both of these aspects. We also perform statistical tests to show that the solutions are significantly different.

Bridging Self-Attention and Time Series Decomposition for Periodic Forecasting

In this paper, we study how to capture explicit periodicity to boost the accuracy of deep models in univariate time series forecasting. Recent advanced deep learning models such as recurrent neural networks (RNNs) and transformers have reached new heights in terms of modeling sequential data, such as natural languages, due to their powerful expressiveness. However, real-world time series are often more periodic than general sequential data, while recent studies confirm that standard neural networks are not capable of capturing the periodicity sufficiently because they have no modules that can represent periodicity explicitly. In this paper, we alleviate this challenge by bridging the self-attention network with time series decomposition and propose a novel framework called DeepFS. DeepFS equips Deep models with Fourier Series to preserve the periodicity of time series. Specifically, our model first uses self-attention to encode temporal patterns, from which to predict the periodic and non-periodic components for reconstructing the forecast outputs. The Fourier series is injected as an inductive bias in the periodic component. Capturing periodicity not only boosts the forecasting accuracy but also offers interpretable insights for real-world time series. Extensive empirical analyses on both synthetic and real-world datasets demonstrate the effectiveness of DeepFS. Studies about why and when DeepFS works provide further understanding of our model.
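
To make the role of the Fourier series concrete, the sketch below represents a periodic component as a truncated Fourier series and extrapolates it beyond the observed window. It is not the DeepFS model: DeepFS predicts the components from a self-attention encoder, whereas this sketch assumes the period is known and estimates the coefficients by direct projection. The function names are hypothetical.

```python
import math


def fourier_coefficients(y, period, n_harmonics):
    """Estimate a0 and (a_k, b_k) for k = 1..n_harmonics by projection.

    Assumes len(y) is a whole number of periods and 2 * n_harmonics < period,
    so the sampled sines and cosines are mutually orthogonal.
    """
    n = len(y)
    if n % period != 0:
        raise ValueError("series length must be a multiple of the period")
    if 2 * n_harmonics >= period:
        raise ValueError("too many harmonics for this period")
    a0 = sum(y) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        a_k = 2.0 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
        b_k = 2.0 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
        coeffs.append((a_k, b_k))
    return a0, coeffs


def periodic_component(a0, coeffs, period, t):
    """Evaluate the truncated Fourier series at time step t (may exceed len(y))."""
    total = a0
    for k, (a_k, b_k) in enumerate(coeffs, start=1):
        w = 2.0 * math.pi * k / period
        total += a_k * math.cos(w * t) + b_k * math.sin(w * t)
    return total
```

Subtracting `periodic_component` from the observed series leaves a residual that plays the role of the non-periodic component, and the fitted amplitudes are directly readable, which is the kind of interpretability the abstract refers to.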

Adaptive Domain Interest Network for Multi-domain Recommendation

Industrial recommender systems usually hold data from multiple business scenarios and are expected to provide recommendation services for these scenarios simultaneously. In the retrieval step, the topK high-quality items selected from a large corpus usually need to differ across scenarios. Take the Alibaba display advertising system as an example: not only are the behavior patterns of Taobao users diverse, but the bid prices that advertisers assign to different scenarios also vary significantly. Traditional methods either train models for each scenario separately, ignoring the cross-domain overlapping of user groups and items, or simply mix all samples and maintain a shared model, which makes it difficult to capture significant diversities between scenarios. In this paper, we present Adaptive Domain Interest Network (ADIN), which adaptively handles the commonalities and diversities across scenarios, making full use of multi-scenario data during training. The proposed method is then able to improve the performance of each business domain by giving different topK candidates for different scenarios during online inference. Specifically, our proposed ADIN models the commonalities and diversities for different domains by shared networks and domain-specific networks, respectively. In addition, we apply domain-specific batch normalization and design the domain interest adaptation layer for feature-level domain adaptation. A self-training strategy is also incorporated to capture label-level connections across domains. ADIN has been deployed in the display advertising system of Alibaba, and obtains a 1.8% improvement in advertising revenue.

RaDaR: A Real-Word Dataset for AI powered Run-time Detection of Cyber-Attacks

Artificial Intelligence techniques on malware run-time behavior have emerged as a promising tool in the arms race against sophisticated and stealthy cyber-attacks. While data of malware run-time features are critical for research and benchmark comparisons, unfortunately, there is a dearth of real-world datasets due to multiple challenges to their collection. The evasive nature of malware, its dependence on connected real-world conditions to execute, and its potential repercussions pose significant challenges for executing malware in laboratory settings. Consequently, prior open datasets rely on isolated virtual sandboxes to run malware, resulting in data that is not representative of malware behavior in the wild.

This paper presents RaDaR, an open real-world dataset for run-time behavioral analysis of Windows malware. RaDaR is collected by executing malware on a real-world testbed with Internet connectivity and in a timely manner, thus providing a close-to-real-world representation of malware behavior. To enable an unbiased comparison of different solutions and foster multiple verticals in malware research, RaDaR provides a multi-perspective data collection and labeling of malware activity. The multi-perspective collection provides a comprehensive view of malware activity across the network, operating system (OS), and hardware. On the other hand, the multi-perspective labeling provides four independent perspectives to analyze the same malware, including its methodology, objective, capabilities, and the information it exfiltrates. To date, RaDaR includes 7 million network packets, 11.3 million OS system call traces, and 3.3 million hardware events of 10,434 malware samples having different methodologies (3 classes) and objectives (9 classes), spread across 30 well-known malware families.

PAVE: Lazy-MDP based Ensemble to Improve Recall of Product Attribute Extraction Models

E-commerce stores face the challenge of missing and inconsistent attribute values in the product detail pages and have to impute them on behalf of their vendors. Traditional approaches formulate the problem of attribute extraction (AE) from product profiles as natural language tasks such as information extraction or text classification. Such models typically operate at high precision but may yield low recall, especially on attributes with an open vocabulary, due to 1) missing or incorrect information in product profiles, 2) generalization errors due to lack of contextual understanding, and 3) confidence thresholding to operate at high precision. In this work, we present PAVE: Product Attribute Value Ensemble, a novel reinforcement learning model that uses Lazy-MDP formalism to solve for low recall by aggregating information from a sequence of product neighbors. We train a policy network using Proximal Policy Optimization that learns to choose the correct value from the sequence. We observe consistent improvement in recall across all open attributes compared to traditional AE models with an average lift of 10.3% with no drop in precision. Our method surpasses simple aggregation methods like nearest neighbor, majority vote and binary classifier ensembles and even outperforms AE models for closed attributes. Our approach is scalable, robust to noisy product neighbors and generalizes well on unseen attributes.

Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou

Customer Life Time Value (LTV) is the expected total revenue that a single user can bring to a business. It is widely used in a variety of business scenarios to make operational decisions when acquiring new customers. Modeling LTV is a challenging problem, due to its complex and mutable data distribution. Existing approaches either directly learn from posterior feature distributions or leverage statistical models that make strong assumption on prior distributions, both of which fail to capture those mutable distributions. In this paper, we propose a complete set of industrial-level LTV modeling solutions. Specifically, we introduce an Order Dependency Monotonic Network (ODMN) that models the ordered dependencies between LTVs of different time spans, which greatly improves model performance. We further introduce a Multi Distribution Multi Experts (MDME) module based on the Divide-and-Conquer idea, which transforms the severely imbalanced distribution modeling problem into a series of relatively balanced sub-distribution modeling problems hence greatly reduces the modeling complexity. In addition, a novel evaluation metric Mutual Gini is introduced to better measure the distribution difference between the estimated value and the ground-truth label based on the Lorenz Curve. The ODMN framework has been successfully deployed in many business scenarios of Kuaishou, and achieved great performance. Extensive experiments on real-world industrial data demonstrate the superiority of the proposed methods compared to state-of-the-art baselines including ZILN and Two-Stage XGBoost models.

An Adaptive Framework for Confidence-constraint Rule Set Learning Algorithm in Large Dataset

Decision rules have been successfully used in various classification applications because of their interpretability and efficiency. In many real-world scenarios, especially in industrial applications, it is necessary to generate rule sets under certain constraints, such as confidence constraints. However, most previous rule mining methods only emphasize the accuracy of the rule set but take no consideration of these constraints. In this paper, we propose a Confidence-constraint Rule Set Learning (CRSL) framework consisting of three main components, i.e. rule miner, rule ranker, and rule subset selector. Our method not only considers the trade-off between confidence and coverage of the rule set but also considers the trade-off between interpretability and performance. Experiments on benchmark data and large-scale industrial data demonstrate that the proposed method is able to achieve better performance (6.7% and 8.8% improvements) and competitive interpretability when compared with other rule set learning methods.

Query Rewriting in TaoBao Search

In e-commerce search engines, query rewriting (QR) is a crucial technique that improves shopping experience by reducing the vocabulary gap between user queries and product catalog. Recent works have mainly adopted the generative paradigm. However, they hardly ensure high-quality generated rewrites and do not consider personalization, which leads to degraded search relevance. In this work, we present Contrastive Learning Enhanced Query Rewriting (CLE-QR), the solution used in Taobao product search. It uses a novel contrastive learning enhanced architecture based on "query retrieval-semantic relevance ranking-online ranking". It finds the rewrites from hundreds of millions of historical queries while considering relevance and personalization. Specifically, we first alleviate the representation degeneration problem during the query retrieval stage by using an unsupervised contrastive loss, and then further propose an interaction-aware matching method to find the beneficial and incremental candidates, thus improving the quality and relevance of candidate queries. We then present a relevance-oriented contrastive pre-training paradigm on the noisy user feedback data to improve semantic ranking performance. Finally, we rank these candidates online with the user profile to model personalization for the retrieval of more relevant products. We evaluate CLE-QR on Taobao Product Search, one of the largest e-commerce platforms in China. Significant metrics gains are observed in online A/B tests. CLE-QR has been deployed to our large-scale commercial retrieval system and serviced hundreds of millions of users since December 2021. We also introduce its online deployment scheme, and share practical lessons and optimization tricks of our lexical match system.

Cognitive Diagnosis Focusing on Knowledge Concepts

Cognitive diagnosis is a crucial task in the field of educational measurement and psychology, which aims to diagnose the strengths and weaknesses of participants. Existing cognitive diagnosis methods only consider which knowledge concepts are involved in the knowledge components of exercises, but ignore the fact that different knowledge concepts have different effects on practice scores in actual learning situations. Therefore, researchers need to reshape the learning scene by combining the multi-factor relationships between knowledge components. In this paper, in order to more comprehensively simulate the interaction between students and exercises, we developed a neural network-based CDMFKC model for cognitive diagnosis. Our method not only captures the nonlinear interaction between exercise characteristics, student performance, and their mastery of each knowledge concept, but also further considers the impact of knowledge concepts by designing the difficulty and discrimination of knowledge concepts, and uses multiple neural layers to model their interaction so as to obtain accurate and interpretable diagnostic results. In addition, we propose an improved CDMFKC model with a guessing parameter and a slipping parameter designed from knowledge concept proficiency and student proficiency vectors. We validate the performance of these two diagnostic models on six real datasets. The experimental results show that the two models perform better in terms of accuracy, rationality and interpretability.

Predicting Multi-level Socioeconomic Indicators from Structural Urban Imagery

Understanding economic development and designing government policies requires accurate and timely measurements of socioeconomic activities. In this paper, we show how to leverage city structural information and urban imagery like satellite images and street view images to accurately predict multi-level socioeconomic indicators. Our framework consists of four steps. First, we extract structural information from cities by transforming real-world street networks into city graphs (GeoStruct). Second, we design a contrastive learning-based model to refine urban image features by looking at geographic similarity between images, with images that are geographically close together having similar features (GeoCLR). Third, we propose using street segments as containers to adaptively fuse the features of multi-view urban images, including satellite images and street view images (GeoFuse). Finally, given the city graph with a street segment as a node and a neighborhood area as a subgraph, we jointly model street- and neighborhood-level socioeconomic indicator predictions as node and subgraph classification tasks. The novelty of our method is that we introduce city structure to organize multi-view urban images and model the relationships between socioeconomic indicators at different levels. We evaluate our framework on the basis of real-world datasets collected in multiple cities. Our proposed framework improves performance by over 10% when compared to state-of-the-art baselines in terms of prediction accuracy and recall.

IntTower: The Next Generation of Two-Tower Model for Pre-Ranking System

Scoring a large number of candidates precisely in several milliseconds is vital for industrial pre-ranking systems. Existing pre-ranking systems primarily adopt the two-tower model since the "user-item decoupling architecture" paradigm is able to balance efficiency and effectiveness. However, the cost of high efficiency is the neglect of potential information interaction between the user and item towers, which critically hinders prediction accuracy. In this paper, we show it is possible to design a two-tower model that emphasizes both information interactions and inference efficiency. The proposed model, IntTower (short for Interaction enhanced Two-Tower), consists of Light-SE, FE-Block and CIR modules. Specifically, the lightweight Light-SE module is used to identify the importance of different features and obtain refined feature representations in each tower. The FE-Block module performs fine-grained and early feature interactions to capture the interactive signals between user and item towers explicitly, and the CIR module leverages a contrastive interaction regularization to further enhance the interactions implicitly. Experimental results on three public datasets show that IntTower outperforms the SOTA pre-ranking models significantly and even achieves performance comparable to that of ranking models. Moreover, we further verify the effectiveness of IntTower on a large-scale advertisement pre-ranking system. The code of IntTower is publicly available

PlatoGL: Effective and Scalable Deep Graph Learning System for Graph-enhanced Real-Time Recommendation

Recently, graph neural network (GNN) approaches have received great interest in recommendation tasks due to their ability to learn more effective user and item representations. However, existing GNN-based recommendation models cannot support real-time recommendation, where the model keeps its freshness by continuously training on the streaming data that users produce, which has a negative impact on recommendation performance. To fully support graph-enhanced large-scale recommendation in real-time scenarios, a deep graph learning system is required to dynamically store the streaming data as a graph structure and enable the development of any GNN model with the capabilities of real-time training and online inference. However, such requirements rule out existing deep graph learning solutions. In this paper, we propose a new deep graph learning system called PlatoGL, where (1) an effective block-based graph storage is designed with a non-trivial insertion/deletion mechanism for updating the graph topology in milliseconds, (2) a non-trivial multi-block neighbour sampling method is proposed for efficient graph query, and (3) a cache technique is exploited to improve the storage stability. We have deployed PlatoGL in WeChat, and leveraged its capability in various content recommendation scenarios including live-streaming, article and micro-video. Comprehensive experiments on both deployment performance and benchmark performance (w.r.t. its key features) demonstrate its effectiveness and scalability. One real-time GNN-based model, developed with PlatoGL, now serves the major online traffic in the WeChat live-streaming recommendation scenario.

Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences

Sequential recommendation predicts users' next behaviors from their historical interactions. Recommending with longer sequences improves recommendation accuracy and increases the degree of personalization. As sequences get longer, existing works have not yet addressed the following two main challenges. Firstly, modeling long-range intra-sequence dependency is difficult with increasing sequence lengths. Secondly, it requires efficient memory usage and computational speed. In this paper, we propose a Sparse Attentive Memory (SAM) network for long sequential user behavior modeling. SAM supports efficient training and real-time inference for user behavior sequences with lengths on the scale of thousands. In SAM, we model the target item as the query and the long sequence as the knowledge database, where the former continuously elicits relevant information from the latter. SAM simultaneously models target-sequence dependencies and long-range intra-sequence dependencies with O(L) complexity and O(1) number of sequential updates, which can only be achieved by the self-attention mechanism with O(L^2) complexity. Extensive empirical results demonstrate that our proposed solution is effective not only in long user behavior modeling but also in short sequence modeling. Implemented on sequences of length 1000, SAM is successfully deployed on one of the largest international E-commerce platforms. The inference time is within 30ms, with a substantial 7.30% click-through rate improvement in the online A/B test. To the best of our knowledge, it is the first end-to-end long user sequence modeling framework that models intra-sequence and target-sequence dependencies with the aforementioned degree of efficiency and is successfully deployed on a large-scale real-time industrial recommender system.

Knowledge Enhanced Multi-Interest Network for the Generation of Recommendation Candidates

The candidate generation task requires that candidates related to user interests be extracted in real time. Previous works usually transform a user's behavior sequence into a unified embedding, which cannot reflect the user's multiple interests. Some recent works like Comirec and Octopus use multi-channel structures to capture users' diverse interests. They cluster users' historical behaviors into several groups, claiming that one group represents one interest. However, these methods have some limitations. First, an item may correspond to multiple interests of users, so simply allocating it to just one interest group makes the modeling of users' interests coarse-grained and inaccurate. Second, explaining user interests at the level of items is rather vague and not convincing. In this paper, we propose a Knowledge Enhanced Multi-Interest Network, KEMI, which exploits knowledge graphs to help learn users' diverse interest representations via heterogeneous graph neural networks (HGNNs) and a novel dual memory network. Specifically, we use HGNNs to capture the semantic representation of knowledge entities and a novel dual memory network to learn a user's diverse interests from their behavior sequence. Through memory slots of the user memory network and the item memory network, we can learn multiple interests for each user and each item. Meanwhile, by binding the entities to the channels of memory networks, we enable it to be explained from the perspective of the knowledge graph, which enhances the interpretability and understanding of user interests. We conduct extensive experiments on two industrial and publicly available datasets. Experimental results demonstrate that our model achieves significant improvements over state-of-the-art baseline models.

Multi-Faceted Hierarchical Multi-Task Learning for Recommender Systems

There have been many studies on improving the efficiency of shared learning in Multi-Task Learning (MTL). Previous works focused on the "micro" sharing perspective for a small number of tasks, while in Recommender Systems (RS) and many other AI applications, we often need to model a large number of tasks. For example, when using MTL to model various user behaviors in RS, if we differentiate new users and new items from old ones, the number of tasks will increase exponentially with multidimensional relations. This work proposes a Multi-Faceted Hierarchical MTL model (MFH) that exploits the multidimensional task relations in large scale MTLs with a nested hierarchical tree structure. MFH maximizes the shared learning through multi-facets of sharing and improves the performance with heterogeneous task tower design. For the first time, MFH addresses the "macro" perspective of shared learning and defines a "switcher" structure to conceptualize the structures of macro shared learning. We evaluate MFH and SOTA models in a large industry video platform of 10 billion samples and hundreds of millions of monthly active users. Results show that MFH outperforms SOTA MTL models significantly in both offline and online evaluations across all user groups, especially remarkable for new users with an online increase of 9.1% in app time per user and 1.85% in next-day retention rate. MFH currently has been deployed in WeSee, Tencent News, QQ Little World and Tencent Video, several products of Tencent. MFH is especially beneficial to the cold-start problems in RS where new users and new items often suffer from a "local overfitting" phenomenon that we first formalize in this paper.

BRIGHT - Graph Neural Networks in Real-time Fraud Detection

Detecting fraudulent transactions is an essential component to control risk in e-commerce marketplaces. Apart from rule-based and machine learning filters that are already deployed in production, we want to enable efficient real-time inference with graph neural networks (GNNs), which is useful to catch multihop risk propagation in a transaction graph. However, two challenges arise in the implementation of GNNs in production. First, future information in a dynamic graph should not be considered in message passing to predict the past. Second, the latency of graph query and GNN model inference is usually up to hundreds of milliseconds, which is costly for some critical online services. To tackle these challenges, we propose a Batch and Real-time Inception GrapH Topology (BRIGHT) framework to conduct an end-to-end GNN learning that allows efficient online real-time inference.

The BRIGHT framework consists of a graph transformation module (Two-Stage Directed Graph) and a corresponding GNN architecture (Lambda Neural Network). The Two-Stage Directed Graph guarantees that the information passed through neighbors comes only from historical payment transactions. It consists of two subgraphs representing historical relationships and real-time links, respectively. The Lambda Neural Network decouples inference into two stages: batch inference of entity embeddings and real-time inference of transaction prediction. Our experiments show that BRIGHT outperforms the baseline models by >2% on average w.r.t. precision. Furthermore, BRIGHT is computationally efficient for real-time fraud detection. Regarding end-to-end performance (including neighbor query and inference), BRIGHT can reduce the P99 latency by >75%. For the inference stage, our speedup is on average 7.8× compared to the traditional GNN.
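
The temporal constraint described above (no future information in message passing) can be illustrated with a small sketch. This is not the Two-Stage Directed Graph itself; the function name, the tuple layout `(txn_id, entity_id, timestamp)`, and the sample data are assumptions made only for illustration.

```python
from collections import defaultdict

def build_historical_neighbors(transactions):
    """Map each transaction id to the ids of strictly earlier transactions
    that share an entity with it (hypothetical layout: txn_id, entity_id, ts)."""
    by_entity = defaultdict(list)
    for txn_id, entity_id, ts in transactions:
        by_entity[entity_id].append((ts, txn_id))

    neighbors = {txn_id: [] for txn_id, _, _ in transactions}
    for items in by_entity.values():
        items.sort()  # order each entity's transactions by timestamp
        for i, (ts, txn_id) in enumerate(items):
            # Only transactions with a strictly smaller timestamp may send messages.
            neighbors[txn_id].extend(
                other for other_ts, other in items[:i] if other_ts < ts
            )
    return neighbors

# Illustrative data: three payments on one card, one on another.
transactions = [
    ("t1", "cardA", 1),
    ("t2", "cardA", 5),
    ("t3", "cardA", 9),
    ("t4", "cardB", 5),
]
neighbors = build_historical_neighbors(transactions)
```

In this sketch a transaction never sees a later one, so a prediction for `t2` could use `t1` but not `t3`.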

STARDOM: Semantic Aware Deep Hierarchical Forecasting Model for Search Traffic Prediction

We study the search traffic forecasting problem for the guaranteed search advertising (GSA) application in e-commerce platforms. Consumers express their purchase intents by posing queries to the e-commerce search engine. GSA is a type of guaranteed delivery (GD) advertising strategy, which forecasts the traffic of search queries and charges the advertisers according to the predicted volumes of the search queries the advertisers are willing to buy. We employ a time series forecasting method to make the search traffic prediction. Unlike the series handled by existing time series prediction methods, search queries are semantically meaningful, with semantically similar queries possessing similar time series. They can also be grouped according to the brands or categories they belong to, exhibiting hierarchical structures. To take full advantage of these characteristics, we design a SemanTic AwaRe Deep hierarchical fOrecasting Model (STARDOM for short) which explores the queries' semantic information and the hierarchical structures formed by the queries. Specifically, to exploit hierarchical structure, we propose a reconciliation learning module. It leverages a deep learning model to learn the reconciliation relation between the hierarchical series in the latent space automatically, and enforces the coherence constraints through a distill reconciliation loss. To exploit semantic information, we propose a semantic representation module and generate semantic aware series embeddings for queries. Extensive experiments are conducted to confirm the effectiveness of the proposed method.
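
To make the notion of coherence in a forecast hierarchy concrete, here is a minimal sketch that measures how far a parent-level forecast (for example, a brand) is from the sum of its child forecasts (the queries under that brand). It is not the distill reconciliation loss from the abstract; the function name and the numbers are assumptions for illustration.

```python
def coherence_penalty(parent_forecast, child_forecasts):
    """Mean squared gap, per time step, between the parent forecast and the
    sum of the child forecasts. Zero means the hierarchy is coherent."""
    horizon = len(parent_forecast)
    if any(len(child) != horizon for child in child_forecasts):
        raise ValueError("all forecasts must cover the same horizon")
    total = 0.0
    for t in range(horizon):
        child_sum = sum(child[t] for child in child_forecasts)
        total += (parent_forecast[t] - child_sum) ** 2
    return total / horizon

# Illustrative two-step forecasts: one brand and two queries under it.
brand = [100.0, 120.0]
queries = [[60.0, 70.0], [40.0, 45.0]]
penalty = coherence_penalty(brand, queries)
```

A penalty of this kind could be added to a forecasting loss to discourage incoherent predictions; how STARDOM does this in latent space is described only at a high level in the abstract.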

Towards Fair Workload Assessment via Homogeneous Order Grouping in Last-mile Delivery

The popularity of e-commerce has promoted the rapid development of the logistics industry in recent years. As an important step in logistics, last-mile delivery from delivery stations to customers' addresses is now mainly finished by couriers, which requires accurate workload assessment based on actual efforts. However, the state-of-the-practice assessment methods neglect a vital factor: orders with the same customer's address (i.e., homogeneous orders) can be delivered in a group (i.e., in a single trip) or separately (i.e., in multiple trips). Applying the same rule to both cases would cause unfair assessment among couriers. Thus, grouping homogeneous orders accurately is significant for achieving fair courier workload assessment. To this end, we design, implement, and deploy a nationwide homogeneous order grouping system called FHOG for improving the accuracy of homogeneous order grouping in last-mile delivery for fair courier workload assessment. FHOG utilizes the courier's reporting behavior for order inspection, collection, and delivery to identify homogeneous orders in the delivery station simultaneously for homogeneous order grouping. Compared with the state-of-the-practice method, our evaluation shows FHOG can effectively reduce the number of orders for which the courier's workload is assessed too high or too low. We further deploy FHOG online in 8336 delivery stations to provide homogeneous order grouping service for more than 120 thousand couriers and 12 million daily orders. The results of the two surveys show that the couriers' acceptance rate is improved by 67% with FHOG after the promotion.

Efficient Compression Method for Roadside LiDAR Data

Roadside LiDAR (Light Detection and Ranging) sensors are recently being explored for intelligent transportation systems aiming at safer and faster traffic management and vehicular operations. A key challenge in such systems is to efficiently transfer massive point-cloud data from the roadside LiDAR devices to the edge connected through a 5G network for real-time processing. In this paper, we consider the problem of compressing roadside (i.e. static) LiDAR data in real-time that provides a unique condition unexplored by current methods. Existing point-cloud compression methods assume moving LiDARs (that are mounted on vehicles) and do not exploit spatial consistency across frames over time.

To this end, we develop a novel grouped wavelet technique for static roadside LiDAR data compression (SLiC). Our method compresses LiDAR data both spatially and temporally using a kd-tree data structure based on Haar wavelet coefficients. Experimental results show that SLiC compresses up to 1.9× more effectively than the state-of-the-art compression method. Moreover, SLiC is computationally more efficient and achieves a 2× improvement in bandwidth usage over the best alternative. Even with this gain in communication and storage efficiency, SLiC retains the accuracy of down-the-pipeline applications.
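
For readers unfamiliar with Haar wavelet coefficients, the following minimal sketch shows one level of an unnormalized Haar transform on a short sequence of range readings, with small detail coefficients zeroed out before reconstruction. It is not the SLiC algorithm (no kd-tree, grouping, or temporal step); the function names, the tolerance, and the sample readings are assumptions for illustration.

```python
def haar_forward(values):
    """One Haar level: pairwise averages and pairwise half-differences."""
    if len(values) % 2 != 0:
        raise ValueError("length must be even")
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages, details

def haar_inverse(averages, details):
    """Rebuild the sequence from averages and half-differences."""
    out = []
    for a, d in zip(averages, details):
        out.extend([a + d, a - d])
    return out

def drop_small_details(details, tol):
    """Lossy step: zero out detail coefficients with magnitude at most tol."""
    return [d if abs(d) > tol else 0.0 for d in details]

# Illustrative range readings (metres) along one scan line.
ranges = [10.0, 10.0, 12.0, 12.5, 30.0, 30.0, 8.0, 8.0]
averages, details = haar_forward(ranges)
lossy = haar_inverse(averages, drop_small_details(details, tol=0.5))
max_error = max(abs(a - b) for a, b in zip(ranges, lossy))
```

Smooth regions yield near-zero detail coefficients, which is what makes them cheap to encode; the tolerance bounds the per-point reconstruction error at this level.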

MEMENTO: Neural Model for Estimating Individual Treatment Effects for Multiple Treatments

Learning individual-level treatment effects from observational data is a problem of growing interest. Examples include inferring the effect of delivery promises on product purchases on an e-commerce site and selecting the most effective treatment for a specific patient. Although scenarios that require estimating treatment effects in the presence of multiple treatments are quite common in real life, most existing works on individual treatment effect (ITE) focus primarily on binary treatments and do not extend naturally to multi-treatment scenarios. In this paper we present MEMENTO, a methodology and a framework to estimate individual treatment effect for multi-treatment scenarios, where the treatments are discrete and finite. Our approach is based on obtaining matching representations of the confounders for the various treatment types. This is achieved through minimization of an upper bound on the sum of factual and counterfactual losses. Experiments on real and semi-synthetic datasets show that MEMENTO is able to outperform known techniques for multi-treatment scenarios by close to 10% in certain use-cases. The proposed framework has been deployed for the problem of identifying the minimum order quantity of a product at Amazon in an emerging marketplace and has resulted in a 4.7% reduction in shipping costs, as shown by an A/B experiment.

Ensure A/B Test Quality at Scale with Automated Randomization Validation and Sample Ratio Mismatch Detection

eBay's experimentation platform runs hundreds of A/B tests on any given day. The platform integrates with the tracking infrastructure and customer experience servers, provides the sampling service for experiments, and is responsible for monitoring the progress of each A/B test. There are many challenges, especially when experiment quality must be ensured at large scale. We discuss two automated test quality monitoring processes and methodologies, namely randomization validation using population stability index (PSI) and sample ratio mismatch (a.k.a. sample delta) detection using sequential analysis. The automated processes help the experimentation platform run high-quality and trustworthy tests not only effectively on a large scale, but also efficiently by minimizing false positive monitoring alarms to experimenters.
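
As a rough illustration of the population stability index mentioned above, the sketch below compares how a bucketed attribute is distributed across two experiment groups, using the common textbook definition PSI = Σ (a_i − e_i) · ln(a_i / e_i) over bucket shares. The function name, the epsilon floor for empty buckets, and the sample counts are assumptions, not details of eBay's platform.

```python
import math

def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between two bucketed distributions given as lists of counts."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("bucket lists must have the same length")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the shares so an empty bucket does not produce log(0).
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi

# Illustrative counts of users per device type in control and treatment.
control = [500, 300, 200]
treatment = [510, 290, 200]
psi_value = population_stability_index(control, treatment)
```

Every term in the sum is non-negative, so a value near zero suggests the two groups look alike on that attribute, while larger values suggest the randomization may be unbalanced; any alerting threshold is a policy choice not specified in the abstract.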

MIC: Model-agnostic Integrated Cross-channel Recommender

Semantically connecting users and items is a fundamental problem for the matching stage of an industrial recommender system. Recent advances in this topic are based on multi-channel retrieval to efficiently measure users' interest in items from the massive candidate pool. However, existing studies are primarily built upon pre-defined retrieval channels, including User-CF (U2U), Item-CF (I2I), and Embedding-based Retrieval (U2I), and thus capture only a limited correlation between users and items, derived solely from partial information about latent interactions. In this paper, we propose a model-agnostic integrated cross-channel (MIC) approach for large-scale recommendation, which maximally leverages the inherent multi-channel mutual information to enhance the matching performance. Specifically, MIC robustly models correlation within user-item, user-user, and item-item from latent interactions in a universal schema. For each channel, MIC naturally aligns pairs with semantic similarity and distinguishes them otherwise with more uniform anisotropic representation space. While state-of-the-art methods require specific architectural design, MIC intuitively considers them as a whole by enabling the complete information flow among users and items. Thus MIC can be easily plugged into other retrieval recommender systems. Extensive experiments show that MIC helps several state-of-the-art models boost their performance on four real-world benchmarks. The satisfactory deployment of the proposed MIC on industrial online services empirically demonstrates its scalability and flexibility.

Guided Text-based Item Exploration

Exploratory Data Analysis (EDA) provides guidance to users to help them refine their needs and find items of interest in large volumes of structured data. In this paper, we develop GUIDES, a framework for guided Text-based Item Exploration (TIE). TIE raises new challenges: (i) the need to abstract and query textual data and (ii) the need to combine queries on both structured and unstructured content. GUIDES represents text dimensions such as sentiment and topics, and introduces new text-based operators that are seamlessly integrated with traditional EDA operators. To train TIE policies, it relies on a multi-reward function that captures different textual dimensions, and extends the Deep Q-Networks (DQN) architecture with multi-objective optimization. Our experiments on Amazon and IMDb, two real-world datasets, demonstrate the necessity of capturing fine-grained text dimensions, the superiority of using both text-based and attribute-based operators over attribute-based operators only, and the need for multi-objective optimization.

Multimodal Meta-Learning for Cold-Start Sequential Recommendation

In this paper, we study the task of cold-start sequential recommendation, where new users with very short interaction sequences arrive over time. We cast this problem as a few-shot learning problem and adopt a meta-learning approach to develop our solution. For our task, a major obstacle to effective knowledge transfer is the significant characteristic divergence between old and new interaction sequences for meta-learning. To address this issue, we propose a Multimodal Meta-Learning (denoted as MML) approach that incorporates multimodal side information of items (e.g., text and image) into the meta-learning process, to stabilize and improve the meta-learning process for cold-start sequential recommendation. Specifically, we design a group of multimodal meta-learners, one for each kind of modality, where ID features are used to develop the main meta-learner and the remaining text and image features are used to develop auxiliary meta-learners. Instead of simply combining the predictions from different meta-learners, we design an adaptive, learnable fusion layer to integrate the predictions based on different modalities. Meanwhile, we design a cold-start item embedding generator, which utilizes multimodal side information to warm up the ID embeddings of new items. Extensive offline and online experiments demonstrate that MML can significantly improve the recommendation performance for cold-start users compared with baseline models. Our code is released at

Learning-to-Spell: Weak Supervision based Query Correction in E-Commerce Search with Small Strong Labels

For an E-commerce search engine, users finding the right product critically depend on spell correction. A misspelled query can fetch totally unrelated results which in turn leads to a bad customer experience. Around 32% of queries have spelling mistakes on our e-commerce search engine. The spell problem becomes more challenging when most spell errors arise from customers with little or no exposure to the English language besides the usual source of accidental mistyping on keyboard. These spell errors are heavily influenced by the colloquial and spoken accents of the customers. This limits the benefit from using generic spell correction systems which are learnt from cleaner English sources like Brown Corpus and Wikipedia with a very low focus on phonetic/vernacular spell errors. In this work, we present a novel approach towards spell correction that effectively solves a very diverse set of spell errors and outperforms several state-of-the-art systems in the domain of E-commerce search. Our strategy combines Learning-to-Rank on a small strongly labelled data with multiple learners trained with weakly labelled data. We report the effectiveness of our solution WellSpell (Weak and strong Labels for Learning to Spell) with both the offline evaluations and online A/B experiment.

Observability of SQL Hints in Oracle

Observability is a critical requirement of increasingly complex and cloud-first data management systems. In most commercial databases, this relies on telemetry like logs, traces, and metrics, which helps to identify, mitigate, and resolve issues expeditiously. SQL monitoring tools, for example, can show how a query is performing. One area that has received comparatively less attention is the observability of the query optimizer whose inner workings are often shrouded in mystery. Optimizer traces can illuminate the plan selection process for a query, but they are comprehensible only to human experts and are not easily machine-parsable to remediate sub-optimal plans. Hints are directives that guide the optimizer toward specific directions. While hints can be used manually, they are often used by automatic SQL plan management tools that can quickly identify and resolve regressions by selecting alternate plans. It is important to know when input hints are inapplicable so that the tools can try other strategies. For example, a manual hint may have syntax errors, or an index in an automatic hint may have been accidentally dropped. In this paper, we describe the design and implementation of Oracle's hint observability framework which provides a comprehensive usage report of all hints, manual or otherwise, used to compile a query. The report, which is available directly in the execution plan in a human-understandable and machine-readable format, can be used to automate any necessary corrective actions. This feature is available in Oracle Autonomous Database 19c.

High Availability Framework and Query Fault Tolerance for Hybrid Distributed Database Systems

Modern commercial database systems are increasingly evolving into a hybrid distributed system model where a primary database host system enlists the services of a loosely coupled secondary system that acts as an accelerator. Often the secondary system is a distributed system that can perform specific tasks massively parallelized with results fed back to the host database. Similar models can also be seen in architectures that separate compute from storage. As the scale of the system grows, failures of nodes become common, and the architectural goal is to recover the system with minimal disruption to the workload as seen by the user. This paper introduces a new framework that allows a host database to efficiently manage the availability of a massive secondary distributed system and describes a mechanism to achieve query fault tolerance at the primary database by transparently re-executing query (sub)plans on the secondary distributed system. The focus is on improving two important aspects of disruption: downtime and transparency to the user. The proposed mechanisms achieve quick recovery, reduced duration of downtime and isolation of errors during query execution, thus improving execution transparency for the users.

Sub-Task Imputation via Self-Labelling to Train Image Moderation Models on Sparse Noisy Data

E-commerce marketplaces protect shopper experience and trust at scale by deploying deep learning models trained on human annotated moderation data, for the identification and removal of advert imagery that does not comply with moderation policies (a.k.a. defective images). However, human moderation labels can be hard to source for smaller advert programs that target specific device types with separate formats or for recently launched locales with unique moderation policies. Additionally, the sourced labels can be noisy due to annotator biases or policy rules clubbing multiple types of transgressions into a single category. Therefore, training advert image moderation models necessitates an approach that can effectively improve the sample efficiency of training, weed out noise and discover latent moderation sub-labels in one go.

Our work demonstrates the merits of automated sub-label discovery using self-labelling. We show that self-labelling approaches can be used to decompose an image moderation task into its hidden sub-tasks (corresponding to intercepting a single sub-label) in an unsupervised manner, thus helping with cases where the granularity of labels is inadequate. This enables us to bootstrap useful representations quickly, via low-capacity but fast-learning teacher models that each specialize in a single distinct sub-task of the main classification task. These sub-task specialists then distil their logits to a high-capacity but slow-learning generalist student model, thus allowing it to perform well on complex moderation tasks with relatively fewer labels than vanilla supervised training. We conduct all our experiments on the moderation of sexually explicit advert images (though this method can be utilized for any defect type) and show a sizeable improvement in NPV (+30.2% absolute gain) vis-à-vis regular supervised baselines at a 1% FPR level. A long-term A/B test of our deployed model shows a significant relative reduction (-45.6%) in the prevalence of such advertisements compared to the previously deployed model.

MOOMIN: Deep Molecular Omics Network for Anti-Cancer Drug Combination Therapy

We propose the molecular omics network (MOOMIN), a multimodal graph neural network used by AstraZeneca oncologists to predict the synergy of drug combinations for cancer treatment. Our model learns drug representations at multiple scales based on a drug-protein interaction network and metadata. Structural properties of compounds and proteins are encoded to create vertex features for a message-passing scheme that operates on the bipartite interaction graph. Propagated messages form multi-resolution drug representations, which we utilize to create drug pair descriptors. By conditioning the drug combination representations on the cancer cell type, we define a synergy scoring function that can inductively score unseen pairs of drugs. Experimental results on the synergy scoring task demonstrate that MOOMIN outperforms state-of-the-art graph fingerprinting, proximity preserving node embedding, and existing deep learning approaches. Further results establish that the predictive performance of our model is robust to hyperparameter changes. We demonstrate that the model makes high-quality predictions over a wide range of cancer cell line tissues, that out-of-sample predictions can be validated with external synergy databases, and that the proposed model is data efficient at learning.

e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms and inspired by the recent success in representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges. We study the performance using our pre-trained model as backbones for diverse downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult product recognition. Experimental results show that our proposed method outperforms the baseline in each downstream task regarding both single modality and multiple modalities.

Selective Tensorized Multi-layer LSTM for Orbit Prediction

Although the collision of space objects not only incurs a high cost but also threatens human life, the risk of collision between satellites has increased, as the number of satellites has rapidly grown due to the significant interests in many space applications. However, it is not trivial to monitor the behavior of the satellite in real-time since the communication between the ground station and spacecraft is dynamic and sparse, and there is an increased latency due to the long distance. Accordingly, it is strongly required to predict the orbit of a satellite to prevent unexpected contingencies such as a collision. Therefore, the real-time monitoring and accurate orbit prediction are required. Furthermore, it is necessary to compress the prediction model, while achieving a high prediction performance in order to be deployable in the real systems. Although several machine learning and deep learning-based prediction approaches have been studied to address such issues, most of them have applied only basic machine learning models for orbit prediction without considering the size, running time, and complexity of the prediction model. In this research, we propose Selective Tensorized multi-layer LSTM (ST-LSTM) for orbit prediction, which not only improves the orbit prediction performance but also compresses the size of the model that can be applied in practical deployable scenarios. To evaluate our model, we use the real orbit dataset collected from the Korea Multi-Purpose Satellites (KOMPSAT-3 and KOMPSAT-3A) of the Korea Aerospace Research Institute (KARI) for 5 years. In addition, we compare our ST-LSTM to other machine learning-based regression models, LSTM, and basic tensorized LSTM models with regard to the prediction performance, model compression rate, and running time.

PEMP: Leveraging Physics Properties to Enhance Molecular Property Prediction

Molecular property prediction is essential for drug discovery. In recent years, deep learning methods have been introduced to this area and achieved state-of-the-art performances. However, most of existing methods ignore the intrinsic relations between molecular properties which can be utilized to improve the performances of corresponding prediction tasks. In this paper, we propose a new approach, namely Physics properties Enhanced Molecular Property prediction (PEMP), to utilize relations between molecular properties revealed by previous physics theory and physical chemistry studies. Specifically, we enhance the training of the chemical and physiological property predictors with related physics property prediction tasks. We design two different methods for PEMP, respectively based on multi-task learning and transfer learning. Both methods include a model-agnostic molecule representation module and a property prediction module. In our implementation, we adopt both the state-of-the-art molecule embedding models under the supervised learning paradigm and the pretraining paradigm as the molecule representation module of PEMP, respectively. Experimental results on public benchmark MoleculeNet show that the proposed methods have the ability to outperform corresponding state-of-the-art models.

WARNER: Weakly-Supervised Neural Network to Identify Eviction Filing Hotspots in the Absence of Court Records

The widespread eviction of tenants across the United States has metamorphosed into a challenging public-policy problem. In particular, eviction exacerbates several income-based, educational, and health inequities in society, e.g., eviction disproportionately affects low-income renting families, many of whom belong to underrepresented minority groups. Despite growing interest in understanding and mitigating the eviction crisis, there are several legal and infrastructural obstacles to data acquisition at scale that limit our understanding of the distribution of eviction across the United States. To circumvent existing challenges in data acquisition, we propose WARNER, a novel Machine Learning (ML) framework that predicts eviction filing hotspots in US counties from an unlabeled satellite imagery dataset. We account for the lack of labeled training data in this domain by leveraging sociological insights to propose a novel approach to generate probabilistic labels for a subset of an unlabeled dataset of satellite imagery, which is then used to train a neural network model to identify eviction filing hotspots. Our experimental results show that WARNER achieves a higher predictive performance than several strong baselines. Further, the superiority of WARNER generalizes to different counties across the United States. Our proposed framework has the potential to assist NGOs and policymakers in designing well-informed (data-driven) resource allocation plans to improve nationwide housing stability. This work is conducted in collaboration with The Child Poverty Action Lab (a leading non-profit leveraging data-driven approaches to inform actions for relieving poverty and related problems in Dallas County, TX). The code can be accessed via

A Dual Channel Intent Evolution Network for Predicting Period-Aware Travel Intentions at Fliggy

Fliggy of Alibaba Group is one of the largest online travel platforms (OTPs) in China, providing travel products and travel experiences for tens of millions of online users through a personalized recommendation system (RS). Predicting a user's future travel intent is a key problem in the travel scenario, since it decides where and what to recommend, e.g., traveling to a surrounding city or a distant city. Such travel intent prediction has many important applications, e.g., pushing a notification that recommends surrounding scenic spots to a user who intends to travel nearby, or enabling personalized promotion strategies for users with different intents. Existing studies on user intent are largely sub-optimal for travel intent prediction at OTPs, since they rarely pay attention to the characteristics of the travel industry, namely, user behavior sparsity due to the low frequency of travel, spatial-temporal periodicity patterns, and the correlations between users' online and offline behaviors. In this paper, to address these challenges, we propose DCIEN, a dual channel intent evolution network that is aware of online-offline periodicity, for predicting users' future travel intent. In particular, it consists of two basic components: 1) the Spatial-temporal Intent Patterns Network (ST-IPN), which exploits users' periodic intent patterns from offline data based on convolutional neural networks; and 2) the Periodicity-aware Intent Evolution Network (PA-IEN), which captures users' instant intent from online behavior data and the interactions between online and offline intents. Extensive offline and online experiments on a real-world OTP demonstrate the superior performance of DCIEN over state-of-the-art methods.

Towards an Awareness of Time Series Anomaly Detection Models' Adversarial Vulnerability

Time series anomaly detection is extensively studied in statistics, economics, and computer science. Over the years, numerous methods have been proposed for time series anomaly detection using deep learning-based methods. Many of these methods demonstrate state-of-the-art performance on benchmark datasets, giving the false impression that these systems are robust and deployable in many practical and industrial real-world scenarios. In this paper, we demonstrate that the performance of state-of-the-art anomaly detection methods is degraded substantially by adding only small adversarial perturbations to the sensor data. We use different scoring metrics such as prediction errors, anomaly, and classification scores over several public and private datasets ranging from aerospace applications, server machines, to cyber-physical systems in power plants. Under well-known adversarial attacks from Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) methods, we demonstrate that state-of-the-art deep neural networks (DNNs) and graph neural networks (GNNs) methods, which claim to be robust against anomalies and have been possibly integrated in real-life systems, have their performance drop to as low as 0%. To the best of our understanding, we demonstrate, for the first time, the vulnerabilities of anomaly detection systems against adversarial attacks. The overarching goal of this research is to raise awareness towards the adversarial vulnerabilities of time series anomaly detectors.
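
To make the attack setting concrete, here is a minimal sketch of a single FGSM step against a forecasting-based detector. The paper targets deep and graph neural networks; the toy linear forecaster, the squared-error loss, and all names below are assumptions chosen so that the gradient can be written by hand.

```python
def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def predict(weights, window):
    """Toy linear forecaster (a stand-in for a deep model): weighted sum of the window."""
    return sum(w * x for w, x in zip(weights, window))


def fgsm_perturb(weights, window, target, epsilon):
    """One FGSM step: move each input by epsilon in the direction that increases the loss.

    The loss is the squared prediction error (predict - target) ** 2, so its
    gradient with respect to input x_i is 2 * (predict - target) * w_i.
    """
    residual = predict(weights, window) - target
    gradient = [2.0 * residual * w for w in weights]
    return [x + epsilon * sign(g) for x, g in zip(window, gradient)]


weights = [0.5, 0.3, 0.2]    # hypothetical model parameters
window = [1.0, 1.2, 0.9]     # hypothetical sensor readings
target = 1.1                 # hypothetical true next value
adversarial = fgsm_perturb(weights, window, target, epsilon=0.1)

error_before = abs(predict(weights, window) - target)
error_after = abs(predict(weights, adversarial) - target)
```

Each reading changes by at most `epsilon`, yet the prediction error grows. PGD, the second attack named in the abstract, repeats steps of this kind while projecting the perturbed input back into an epsilon-ball around the original.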

CTRL: Cooperative Traffic Tolling via Reinforcement Learning

People have long worked to tackle the traffic congestion problem. Among the different measures, traffic tolling has been recognized as an effective way to mitigate citywide congestion. However, traditional tolling methods cannot deal with the dynamic traffic flow in cities. Meanwhile, thanks to the development of traffic sensing technology, how to set appropriate dynamic tolls according to real-time traffic observations has attracted research attention in recent years.

In this paper, we put the dynamic tolling problem in a reinforcement learning setting and try to tackle the three key challenges of complex state representation, pricing action credit assignment, and route price relative competition. We propose a soft actor-critic method with (1) a route-level state attention, (2) an interpretable and provable reward design, and (3) a competition-aware Q attention. Extensive experiments on real datasets show the superior performance of our proposed method. In addition, analyses of pricing actions and vehicle routes demonstrate why the proposed method can outperform baselines.

Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks

With the recent prevalence of reinforcement learning (RL), there have been tremendous interests in utilizing RL for ads allocation in recommendation platforms (e.g., e-commerce and news feed sites). To achieve better allocation, the input of recent RL-based ads allocation methods is upgraded from point-wise single item to list-wise item arrangement. However, this also results in a high-dimensional space of state-action pairs, making it difficult to learn list-wise representations with good generalization ability. This further hinders the exploration of RL agents and causes poor sample efficiency. To address this problem, we propose a novel RL-based approach for ads allocation which learns better list-wise representations by leveraging task-specific signals on Meituan food delivery platform. Specifically, we propose three different auxiliary tasks based on reconstruction, prediction, and contrastive learning respectively according to prior domain knowledge on ads allocation. We conduct extensive experiments on Meituan food delivery platform to evaluate the effectiveness of the proposed auxiliary tasks. Both offline and online experimental results show that the proposed method can learn better list-wise representations and achieve higher revenue for the platform compared to the state-of-the-art baselines.

DuARUS: Automatic Geo-object Change Detection with Street-view Imagery for Updating Road Database at Baidu Maps

As the core foundation of web mapping, each geographic object (geo-object), such as a traffic sign, plays a vital role in navigation and intelligent driving. Determining how to obtain the latest high-precision geo-object information is a classic topic in updating road databases. Benefiting from the cost-effectiveness and availability of positioning equipment and cameras, the vision-based update pattern is becoming increasingly popular in the industry. Generally speaking, the road database update mainly includes three phases: geo-object recognition, localization, and change detection. Previous change detection strategies are mainly performed by comparing the historical road information (i.e., geo-object type and position) with the new geographic data of geo-objects collected from the street-view imagery. However, limited by the localization precision of the positioning equipment and the discriminative power of the vanilla differential-based method, the accuracy, recall, and efficiency of previous systems for geo-object change detection are greatly impaired. In addition, artificially prescribed production standards make the geo-object position in the map data deviate from its position in the real world, and some geo-objects do not need to be updated (e.g., temporary speed limits), which further yields many false-positive detections and significantly increases the labor costs of existing systems. To address these challenges, we propose a novel framework called DuARUS for automatic geo-object change detection with street-view imagery. In this paper, we mainly focus on automatic geo-object localization and change detection. Specifically, for geo-object localization, we propose a two-stage, integrated localization algorithm based on image matching and monocular depth estimation. Furthermore, to achieve automatic change detection, vision-based representation learning and scene understanding strategies are introduced to build a large-scale geo-object semantic map, which can provide sufficient multimodal information support for change detection. Based on such artful modeling, we recast the complicated, labor-based change detection problem as a vanilla binary classification task, which is a robust and efficient strategy that contributes to resolving this problem. By combining these operations, we construct an industrial-grade, fully automatic production system for road database updates. Extensive experiments conducted on large-scale, real-world datasets from Baidu Maps demonstrate the superiority and effectiveness of the system. Moreover, this system has already been deployed in production at Baidu Maps since July 2020, handling 96% of automatic road database updates. DuARUS improves the annual update mileage from millions to tens of millions, and it achieves weekly updates.

DuTraffic: Live Traffic Condition Prediction with Trajectory Data and Street Views at Baidu Maps

The task of live traffic condition prediction, which aims at predicting live traffic conditions (i.e., fast, slow, and congested) based on traffic information on roads, plays a vital role in intelligent transportation systems, such as navigation, route planning, and ride-hailing services. Existing solutions have adopted aggregated trajectory data to generate traffic estimates, which inevitably suffer from GPS drift caused by cluttered urban road scenarios. In addition, the trajectory information alone is insufficient to provide evidence for sudden traffic situations and perception of street-wise elements. To alleviate these problems, in this paper, we present DuTraffic, which is a robust and production-ready solution for live traffic condition prediction by taking both trajectory data and street views into account. Specifically, the vision-based detection and segmentation modules are developed to forecast traffic flow by using street views. Then, we propose a spatial-temporal-based module, TRST-Net, to learn the latent trajectory representation. Finally, a bilinear model is introduced to mix these two representations and then predicts live traffic conditions with trajectory data and street views in a mutually complementary manner. The task is recast as a multi-task learning problem, which could benefit from the strong representation of latent space manifold modeling. Extensive experiments conducted on large-scale, real-world datasets from Baidu Maps demonstrate the superiority and effectiveness of DuTraffic. In addition, DuTraffic has already been deployed in production at Baidu Maps since December 2020, handling tens of millions of requests every day. This demonstrates that DuTraffic is a practical and robust industrial solution for live traffic condition prediction.

Temporal and Heterogeneous Graph Neural Network for Financial Time Series Prediction

The price movement prediction of the stock market has been a classical yet challenging problem that attracts the attention of both economists and computer scientists. In recent years, graph neural networks have significantly improved prediction performance by employing deep learning on company relations. However, existing relation graphs are usually constructed by handcrafted human labeling or natural language processing, which suffer from heavy resource requirements and low accuracy. Besides, they cannot effectively respond to dynamic changes in relation graphs. Therefore, in this paper, we propose a temporal and heterogeneous graph neural network-based (THGNN) approach to learn the dynamic relations among price movements in financial time series. In particular, we first generate the company relation graph for each trading day according to their historic prices. Then we leverage a transformer encoder to encode the price movement information into temporal representations. Afterward, we propose a heterogeneous graph attention network to jointly optimize the embeddings of the financial time series data by transformer encoder and infer the probability of target movements. Finally, we conduct extensive experiments on the stock markets in the United States and China. The results demonstrate the effectiveness and superior performance of our proposed methods compared with state-of-the-art baselines. Moreover, we also deploy the proposed THGNN in a real-world quantitative algorithm trading system, where the accumulated portfolio return obtained by our method significantly outperforms other baselines.
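
One plausible way to build a daily relation graph from historic prices is to connect companies whose recent returns are strongly correlated. The abstract does not give the construction rule, so the Pearson correlation, the `threshold` value, the two edge types, and the ticker names below are all assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length return series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def relation_edges(returns, threshold=0.6):
    """Build typed edges for one trading day from a lookback window of returns.

    returns maps a ticker to its list of past returns. Strongly correlated
    pairs get a "positive" edge and strongly anti-correlated pairs a
    "negative" edge; the two edge types are what makes this toy graph
    heterogeneous (an assumed design, not taken from the paper).
    """
    tickers = sorted(returns)
    edges = []
    for i, first in enumerate(tickers):
        for second in tickers[i + 1:]:
            corr = pearson(returns[first], returns[second])
            if corr >= threshold:
                edges.append((first, second, "positive"))
            elif corr <= -threshold:
                edges.append((first, second, "negative"))
    return edges


window = {
    "A": [0.01, 0.02, -0.01, 0.03],    # hypothetical daily returns
    "B": [0.02, 0.04, -0.02, 0.06],
    "C": [-0.01, -0.02, 0.01, -0.03],
}
edges = relation_edges(window)
```

Recomputing the edges over a sliding window for every trading day would yield a sequence of graphs, which is the kind of temporal input a dynamic graph model could consume.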

Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center

This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Conventional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible to the changing workload distributions and arrival rates, with a poor balance among multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all models are directly trained and evaluated on a real-world system from moderate- to large-scale setups. Experimental evaluations show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has a superior performance over different realistic settings. Additionally, the potential difficulties of the application and deployment of MARL methods for network load balancing are analysed, which helps draw the attention of the learning and network communities to such challenges.

An Actor-critic Reinforcement Learning Model for Optimal Bidding in Online Display Advertising

The real-time bidding (RTB) paradigm allows advertisers to submit a bid for each impression in online display advertising. A usual demand of advertisers is to maximize the total value of winning impressions under constraints on some key performance indicators. Unfortunately, existing RTB research in industrial applications can hardly achieve the optimum due to stochastic decision scenarios and complex consumer behaviors. In this study, we address the application of RTB to mobile gaming, where the in-app purchase action is highly uncertain, making it challenging to evaluate individual impression opportunities. We first formulate the bidding process as a constrained optimization problem and then propose an actor-critic reinforcement learning (ACRL) model for obtaining the optimal policy under a dynamic decision environment. To avoid feeding too many samples with zero labels to the model, we provide a new way to quantify impression opportunities by integrating the in-app actions, such as conversion and purchase, and the characteristics of the candidate ad inventories. Moreover, the proposed ACRL learns a Gaussian distribution to simulate the audience's decision in a more realistic bidding scenario by taking additional contextual side information about both the media and the audience. We also describe how to deploy the learned model online to help adjust the final bid. Finally, we conduct comprehensive offline experiments to demonstrate the effectiveness of ACRL and carefully set up an online A/B testing experiment. The online experimental results verify the efficacy of the proposed ACRL in terms of multiple critical commercial indicators. ACRL has been deployed in the Tencent online display advertising platform and affects traffic in the billions every day. We believe the proposed modifications for optimal bidding problems in RTB are practically innovative and can inspire related work in this field.

Offline Reinforcement Learning for Mobile Notifications

Mobile notification systems have taken a major role in driving and maintaining user engagement for online platforms. They are interesting recommender systems to machine learning practitioners with more sequential and long-term feedback considerations. Most machine learning applications in notification systems are built around response-prediction models, trying to attribute both short-term impact and long-term impact to a notification decision. However, a user's experience depends on a sequence of notifications and attributing impact to a single notification is not always accurate, if not impossible. In this paper, we argue that reinforcement learning is a better framework for notification systems in terms of performance and iteration speed. We propose an offline reinforcement learning framework to optimize sequential notification decisions for driving user engagement. We describe a state-marginalized importance sampling policy evaluation approach, which can be used to evaluate the policy offline and tune learning hyperparameters. Through simulations that approximate the notifications ecosystem, we demonstrate the performance and benefits of the offline evaluation approach as a part of the reinforcement learning modeling approach. Finally, we collect data through online exploration in the production system, train an offline Double Deep Q-Network and launch a successful policy online. We also discuss the practical considerations and results obtained by deploying these policies for a large-scale recommendation system use-case.
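
The abstract mentions training a Double Deep Q-Network. As a small illustration of what distinguishes it from a plain DQN, the sketch below computes the standard Double DQN bootstrap target for a single logged transition. The two-action setup ("hold" versus "send"), the Q-values, and the discount factor are hypothetical.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard Double DQN target for one transition.

    The online network chooses the next action and the target network
    evaluates it, which reduces the over-estimation that occurs when a
    single network does both.
    """
    if done:
        # No bootstrapping after a terminal transition.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the next state; index 0 = "hold", index 1 = "send".
online_q = [0.2, 0.9]
target_q = [0.5, 0.3]
target_value = double_dqn_target(1.0, online_q, target_q, gamma=0.9)
terminal_value = double_dqn_target(1.0, online_q, target_q, gamma=0.9, done=True)
```

Here the online network prefers action 1, so the target uses the target network's value for action 1 (0.3) rather than the target network's own maximum (0.5). In an offline setting the transitions would come from logged exploration data rather than live interaction.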

Hierarchical Reinforcement Learning using Gaussian Random Trajectory Generation in Autonomous Furniture Assembly

In this paper, we propose a Gaussian Random Trajectory guided Hierarchical Reinforcement Learning (GRT-HL) method for autonomous furniture assembly. The furniture assembly problem is formulated as a comprehensive human-like long-horizon manipulation task that requires long-term planning and sophisticated control. Our proposed model, GRT-HL, draws inspiration from semi-supervised adversarial autoencoders, and learns latent representations of the position trajectories of the end-effector. The high-level policy generates an optimal trajectory for furniture assembly, considering the structural limitations of the robotic agents. Given the trajectory drawn from the high-level policy, the low-level policy makes a plan and controls the end-effector. We first evaluate the performance of GRT-HL compared to the state-of-the-art reinforcement learning methods in furniture assembly tasks. We demonstrate that GRT-HL successfully solves the long-horizon problem with extremely sparse rewards by generating the trajectory for planning.

Graph-based Weakly Supervised Framework for Semantic Relevance Learning in E-commerce

Product search is fundamental in online e-commerce systems: it needs to quickly and accurately find the products that users require. Relevance is essential for e-commerce search; its role is to avoid displaying products that do not match the search intent and to optimize the user experience. Measuring semantic relevance is necessary because distributional biases between search queries and product titles may lead to large lexical differences between relevant textual expressions. Several problems limit the performance of semantic relevance learning, including extremely long-tail product distribution and low-quality labeled data. Recent works attempt to conduct relevance learning through user behaviors. However, noisy user behavior can easily lead to inadequate semantic modeling. Therefore, it is valuable but challenging to utilize user behavior in relevance learning. In this paper, we first propose a weakly supervised contrastive learning framework that focuses on how to provide effective semantic supervision and generate reasonable representations. We utilize topology structure information contained in a user behavior heterogeneous graph to design a semantically aware data construction strategy. Besides, we propose a contrastive learning framework suitable for e-commerce scenarios with targeted improvements in data augmentation and training objectives. For relevance calculation, we propose a novel hybrid method that combines fine-tuning and transfer learning. It eliminates the negative impacts caused by distributional bias and guarantees semantic matching capabilities. Extensive experiments and analyses show the promising performance of the proposed methods in relevance learning.

QuickSkill: Novice Skill Estimation in Online Multiplayer Games

Matchmaking systems are vital for creating fair matches in online multiplayer games, which directly affects players' satisfaction and game experience. Most matchmaking systems rely largely on precise estimation of players' game skills to construct equitable games. However, the skill rating of a novice is usually inaccurate, as current matchmaking rating algorithms require a considerable number of games to learn the true skill of a new player. Using these unreliable skill scores at early stages for matchmaking usually leads to disparities in team performance, which causes a negative game experience. This is known as the "cold-start" problem for matchmaking rating algorithms.

To overcome this conundrum, this paper proposes QuickSkill, a deep learning based novice skill estimation framework to quickly probe the abilities of new players in online multiplayer games. QuickSkill extracts sequential performance features from the initial few games of a player to predict his/her future skill rating with a dedicated neural network, thus delivering accurate skill estimation at the player's early game stage. By employing QuickSkill for matchmaking, game fairness can be dramatically improved in the initial cold-start period. We conduct experiments in a popular mobile multiplayer game in both offline and online scenarios. Results obtained with two real-world anonymized gaming datasets demonstrate that the proposed QuickSkill delivers precise estimation of game skills for novices, leading to significantly lower team skill disparities and better player game experience. To the best of our knowledge, the proposed QuickSkill is the first framework that tackles the cold-start problem for traditional skill rating algorithms.

SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance

Ad relevance modeling plays a critical role in online advertising systems including Microsoft Bing. To leverage powerful transformers like BERT in this low-latency setting, many existing approaches perform ad-side computations offline. While efficient, these approaches are unable to serve cold start ads, resulting in poor relevance predictions for such ads. This work aims to design a new, low-latency BERT via structured pruning to empower real-time online inference for cold start ads relevance on a CPU platform. Our challenge is that previous methods typically prune all layers of the transformer to a high, uniform sparsity, thereby producing models which cannot achieve satisfactory inference speed with an acceptable accuracy.

In this paper, we propose SwiftPruner - an efficient framework that leverages evolution-based search to automatically find the best-performing layer-wise sparse BERT model under the desired latency constraint. Different from existing evolution algorithms that conduct random mutations, we propose a reinforced mutator with a latency-aware multi-objective reward to conduct better mutations for efficiently searching the large space of layer-wise sparse models. Extensive experiments demonstrate that our method consistently achieves higher ROC AUC and lower latency than the uniform sparse baseline and state-of-the-art search methods. Remarkably, under our latency requirement of 1900us on CPU, SwiftPruner achieves a 0.86% higher AUC than the state-of-the-art uniform sparse baseline for BERT-Mini on a large scale real-world dataset. Online A/B testing shows that our model also achieves a significant 11.7% cut in the ratio of defective cold start ads with satisfactory real-time serving latency.

Measuring Friendship Closeness: A Perspective of Social Identity Theory

Measuring the closeness of friendships is an important problem that finds numerous applications in practice. For example, online gaming platforms often host friendship-enhancing events in which a user (called the source) only invites his/her friend (called the target) to play together. In this scenario, the measure of friendship closeness is the backbone for understanding source invitation and target adoption behaviors, and underpins the recommendation of promising targets for the sources. However, most existing measures for friendship closeness only consider the information between the source and target but ignore the information of groups where they are located, which renders inferior results. To address this issue, we present new measures for friendship closeness based on the social identity theory (SIT), which describes the inclination that a target endorses behaviors of users inside the same group. The core of SIT is the process that a target assesses groups of users as them or us. Unfortunately, this process is difficult to be captured due to perceptual factors. To this end, we seamlessly reify the factors of SIT into quantitative measures, which consider local and global information of a target's group. We conduct extensive experiments to evaluate the effectiveness of our proposal against 8 state-of-the-art methods on 3 online gaming datasets. In particular, we demonstrate that our solution can outperform the best competitor on the behavior prediction (resp. online target recommendation) by up to 23.2% (resp. 34.2%) in the corresponding evaluation metric.

Scenario-Adaptive and Self-Supervised Model for Multi-Scenario Personalized Recommendation

Multi-scenario recommendation is dedicated to retrieving relevant items for users in multiple scenarios, which is ubiquitous in industrial recommendation systems. These scenarios partially overlap in users and items, while their data distributions differ. The key point of multi-scenario modeling is to efficiently maximize the use of whole-scenario information and to generate fine-grained, adaptive representations for both users and items among multiple scenarios. We summarize three practical challenges that are not well solved for multi-scenario modeling: (1) the lack of fine-grained and decoupled information transfer controls among multiple scenarios; (2) insufficient exploitation of entire space samples; and (3) the disentanglement of an item's multi-scenario representations. In this paper, we propose a Scenario-Adaptive and Self-Supervised (SASS) model to solve the three challenges mentioned above. Specifically, we design a Multi-Layer Scenario Adaptive Transfer (ML-SAT) module with scenario-adaptive gate units to select and fuse effective transfer information from the whole scenario to each individual scenario in a fine-grained and decoupled way. To sufficiently exploit the power of entire space samples, a two-stage training process including pre-training and fine-tuning is introduced. The pre-training stage is based on a scenario-supervised contrastive learning task with the training samples drawn from labeled and unlabeled data spaces. The model is built symmetrically on both the user side and the item side, so that we can obtain distinguishing representations of items in different scenarios. Extensive experimental results on public and industrial datasets demonstrate the superiority of the SASS model over state-of-the-art methods. This model also achieves more than 8.0% improvement on Average Watching Time Per User in online A/B tests. SASS has been successfully deployed on the multi-scenario short video recommendation platform of Taobao in Alibaba.

KEEP: An Industrial Pre-Training Framework for Online Recommendation via Knowledge Extraction and Plugging

An industrial recommender system generally presents a hybrid list that contains results from multiple subsystems. In practice, each subsystem is optimized with its own feedback data to avoid the disturbance among different subsystems. However, we argue that such data usage may lead to sub-optimal online performance because of the data sparsity. To alleviate this issue, we propose to extract knowledge from the super-domain that contains web-scale and long-time impression data, and further assist the online recommendation task (downstream task). To this end, we propose a novel industrial KnowlEdge Extraction and Plugging (KEEP) framework, which is a two-stage framework that consists of 1) a supervised pre-training knowledge extraction module on super-domain, and 2) a plug-in network that incorporates the extracted knowledge into the downstream model. This makes it friendly for incremental training of online recommendation. Moreover, we design an efficient empirical approach for KEEP and introduce our hands-on experience during the implementation of KEEP in a large-scale industrial system. Experiments conducted on two real-world datasets demonstrate that KEEP can achieve promising results. It is notable that KEEP has also been deployed on the display advertising system in Alibaba, bringing a lift of +5.4% CTR and +4.7% RPM.

Network Report: A Structured Description for Network Datasets

The rapid development of network science and technologies depends on shareable datasets. Currently, there is no standard practice for reporting and sharing network datasets. Some network dataset providers only share links, while others provide some context or basic statistics. As a result, critical information may be unintentionally dropped, and network dataset consumers may misunderstand or overlook critical aspects. Inappropriately using a network dataset can lead to severe consequences (e.g., discrimination), especially when machine learning models on networks are deployed in high-stakes domains. Challenges arise as networks are often used across different domains (e.g., network science, physics, etc.) and have complex structures. To facilitate the communication between network dataset providers and consumers, we propose the network report. A network report is a structured description that summarizes and contextualizes a network dataset. The network report extends the idea of dataset reports (e.g., Datasheets for Datasets) from prior work with network-specific descriptions of the non-i.i.d. nature, demographic information, network characteristics, etc. We hope network reports encourage transparency and accountability in network research and development across different fields.

Towards Edge-Cloud Collaborative Machine Learning: A Quality-aware Task Partition Framework

Edge-cloud collaborative tasks with real-world services have emerged in recent years and attracted worldwide attention. Unfortunately, state-of-the-art edge-cloud collaborative machine-learning services are still not reliable enough because of data heterogeneity on the edge, where we usually have access only to a mixed-up training set that is intrinsically collected from various distributions of underlying tasks. Revealing such hidden tasks from given datasets is called the Task Partition problem. Manual task partition is usually expensive, unscalable, and biased. Accordingly, we propose the Quality-aware Task Partition (QTP) problem, in which final tasks are partitioned by the performance of task models. To the best of our knowledge, this work is the first to study the QTP problem with an emphasis on task quality. We also implement a public service, HiLens on Huawei Cloud, to support the whole process. We develop a polynomial-time algorithm, the Task-Forest algorithm (TForest). TForest shows its superiority in a case study with 57 real-world cameras. Compared with SOTA baselines, TForest has on average 9.2% higher F1-scores and requires 43.1% fewer samples when deploying new cameras. Partial code of the framework has been adopted and released to KubeEdge-Sedna.

A Practical Distributed ADMM Solver for Billion-Scale Generalized Assignment Problems

Assigning items to owners is a common problem found in various real-world applications, for example, audience-channel matching in marketing campaigns, borrower-lender matching in loan management, and shopper-merchant matching in e-commerce. Given an objective and multiple constraints, an assignment problem can be formulated as a constrained optimization problem. Such assignment problems are usually NP-hard [21], so when the number of items or the number of owners is large, solving for exact solutions becomes challenging. In this paper, we are interested in solving constrained assignment problems with hundreds of millions of items. Thus, with just tens of owners, the number of decision variables is at billion-scale. This scale is usually seen in the internet industry, which makes decisions for large groups of users. We relax the possible integer constraint, and formulate a general optimization problem that covers commonly seen assignment problems. Its objective function is convex. Its constraints are either linear, or convex and separable by items. We study how to solve these generalized assignment problems in the Bregman Alternating Direction Method of Multipliers (BADMM) framework, where we exploit Bregman divergence to transform the Augmented Lagrangian into a separable form and solve many subproblems in parallel. The entire solution can thus be implemented using a MapReduce-style distributed computation framework. We present experimental results on both synthetic and real-world datasets to verify its accuracy and scalability.

SASNet: Stage-aware Sequential Matching for Online Travel Recommendation

Sequential matching, which aims to predict the item a user will next interact with in the sequential context of the user's historical behaviors, is widely adopted in recommender systems. Existing works mainly characterize the sequential context as the dependencies of user interactions, which is less effective for online travel recommendation, where users' behaviors are highly correlated with their stages in the travel life cycle. Specifically, users on an online travel platform (OTP) usually go through different stages (e.g., exploring a destination, planning an itinerary), and make several correlated interactions (e.g., booking a flight, reserving a hotel, renting a car) at each stage. In this paper, we propose to capture the deep sequential context by modeling the evolution of user stages, and develop a novel stage-aware deep sequential matching network (SASNet) that incorporates inter-stage and intra-stage dependencies over the stage-augmented interaction sequence for more accurate and interpretable recommendation. Extensive experiments on real-world datasets validate the superiority of our model for both online travel recommendation and general next-item recommendation. Our model has been successfully deployed at Fliggy, one of the most popular OTPs in China, and shows good performance in serving online traffic.

Breast Cancer Early Detection with Time Series Classification

Breast cancer has become the leading cause of cancer death among women worldwide. Despite the consensus that early detection of breast cancer can significantly reduce treatment difficulty and cancer mortality, people are still reluctant to go to the hospital for regular checkups because of the high costs incurred. A timely, private, affordable, and effective household solution for early breast cancer detection is badly needed. In this paper, we propose a household solution that utilizes pairs of sensors embedded in the bra to measure the thermal and moisture time series data (BTMTSD) of the breast surface and conduct time series classification (TSC) to diagnose breast cancer. Three main challenges are encountered in BTMTSD classification: (1) a small supervised dataset, which is a common limitation of medical research, (2) noisy time series with unique noise patterns, and (3) complex interplay patterns across multiple time series dimensions. To mitigate these problems, we incorporate multiple data augmentation and transformation techniques with various deep learning TSC approaches and compare their performance on the BTMTSD classification task. Experimental results validate the effectiveness of our framework in providing reliable early detection of breast cancer.

Cross-Domain Product Search with Knowledge Graph

The notion of personalization lies at the core of a real-world product search system, whose aim is to understand the user's search intent at a fine-grained level. The existing solutions mainly achieve this purpose through coarse-grained semantic matching between the query and the item's description, or through collective click correlations. Besides the issued query, the historical search behaviors of a user cover many of her personalized interests, which offers a promising avenue to alleviate the semantic gap between users, items and queries. However, within a specific domain, a user's search behaviors are generally sparse or even unavailable (i.e., cold-start users). How to exploit the search behaviors from another relevant domain and enable effective fine-grained intent understanding remains largely unexplored for product search. Moreover, the semantic gap could be further aggravated since the properties of an item could evolve over time (e.g., the price adjustment for a mobile phone or the business plan update for a financial item), which is also largely overlooked by the existing solutions.

To this end, we are interested in bridging the semantic gap via a marriage between cross-domain transfer learning and knowledge graph. Specifically, we propose a simple yet effective knowledge graph based information propagation framework for cross-domain product search (named KIPS). In KIPS, we firstly utilize a shared knowledge graph relevant to both source and target domains as a semantic backbone to facilitate the information propagation across domains. Then, we build individual collaborative knowledge graphs to model both long-term interests/characteristics and short-term interests/characteristics of a user/item respectively. In order to harness cross-domain interest correlations, two unsupervised strategies to guide the interest learning and alignment are introduced: maximum mean discrepancy (MMD) and kg-aware contrastive learning. In detail, the MMD is utilized to support a coarse-grained domain alignment over the user's long-term interests across two domains. Then, the kg-aware contrastive learning process conducts a fine-grained interest alignment based on the shared knowledge graph. Experiments over two real-world large-scale datasets demonstrate the effectiveness of KIPS over a series of strong baselines. Our online A/B test also shows substantial performance gain on multiple metrics. Currently, KIPS has been deployed in AliPay for financial product search. Both the code implementation and the two datasets used for evaluation will be released online publicly.
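As a rough illustration of the coarse-grained alignment signal mentioned above, the following minimal sketch computes a biased estimate of the squared maximum mean discrepancy between two samples of interest vectors. The RBF kernel, the `gamma` value, and the toy vectors are assumptions made for illustration; they are not taken from KIPS.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two samples of vectors."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Hypothetical long-term interest vectors of users in two domains.
source_interests = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]]
near_target = [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1]]
far_target = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]

print(mmd_squared(source_interests, near_target))
print(mmd_squared(source_interests, far_target))
```

A smaller value indicates that the two samples are distributed more similarly, so a term of this kind can serve as a loss that pulls the two domains together.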

Approximated Doubly Robust Search Relevance Estimation

Extracting query-document relevance from the sparse, biased clickthrough log is among the most fundamental tasks in the web search system. Prior art mainly learns a relevance judgment model with semantic features of the query and document, and ignores direct counterfactual relevance evaluation from the click log. The learned semantic matching models can provide relevance signals for tail queries as long as the semantic feature is available. However, such a paradigm lacks the capability to introspectively adjust the biased relevance estimation whenever it conflicts with massive implicit user feedback. The counterfactual evaluation methods, on the contrary, ensure unbiased relevance estimation given sufficient click information. However, they suffer from the sparse or even missing clicks caused by the long-tailed query distribution.

In this paper, we propose to unify the counterfactual evaluating and learning approaches for unbiased relevance estimation on search queries with various popularities. Specifically, we theoretically develop a doubly robust estimator with low bias and variance, which intentionally combines the benefits of existing relevance evaluating and learning approaches. We further instantiate the proposed unbiased relevance estimation framework in Baidu search, with comprehensive practical solutions designed regarding the data pipeline for click behavior tracking and online relevance estimation with an approximated deep neural network. Finally, we present extensive empirical evaluations to verify the effectiveness of our proposed framework, finding that it is robust in practice and manages to improve online ranking performance substantially.
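To make the general idea of a doubly robust estimator concrete, the sketch below implements the textbook form for estimating a mean outcome under partial observation: a model-based (imputed) prediction plus a propensity-weighted correction on the observed samples. This is an assumed generic formulation, not the estimator derived in the paper, and all names and numbers are hypothetical.

```python
def doubly_robust_mean(imputed, observed, outcomes, propensities):
    """Textbook doubly robust estimate of a mean outcome.

    imputed      -- model predictions for every sample
    observed     -- 1 if the outcome was observed, else 0
    outcomes     -- observed outcome (ignored where observed == 0)
    propensities -- probability that each sample is observed (> 0)
    """
    total = 0.0
    for m, o, y, p in zip(imputed, observed, outcomes, propensities):
        # The correction term is applied only where feedback was observed.
        correction = o * (y - m) / p if o else 0.0
        total += m + correction
    return total / len(imputed)

# With full observation and propensity 1, the estimate equals the outcome mean.
print(doubly_robust_mean([0.2, 0.9], [1, 1], [1.0, 0.0], [1.0, 1.0]))
# With no observations, the estimate falls back to the imputed mean.
print(doubly_robust_mean([0.2, 0.8], [0, 0], [0.0, 0.0], [0.5, 0.5]))
```

The two calls show the behaviour that motivates such estimators: observed feedback corrects the model where it exists, and the model fills in where feedback is missing.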

SESSION: CIKM'22 Short Papers

Scaling Up Mass-Based Clustering

This paper addresses the problem of scaling up the mass-based clustering paradigm to handle large datasets. The existing algorithm MBScan computes and stores all pairwise distances, resulting in quadratic time and space complexity. However, we observe that mass-based clustering requires information about only a tiny fraction of all possible data point pairs. We propose three optimizations to MBScan for quickly finding such pairs and computing their distances. We empirically evaluate our work on ten real-world and synthetic datasets. Our experiments show that our approach results in fast and memory-efficient clustering with no loss in the quality of clusters.

Probing the Robustness of Pre-trained Language Models for Entity Matching

The paradigm of fine-tuning Pre-trained Language Models (PLMs) has been successful in Entity Matching (EM). Despite their remarkable performance, PLMs exhibit a tendency to learn spurious correlations from training data. In this work, we investigate whether PLM-based entity matching models can be trusted in real-world applications where the data distribution differs from that of training. To this end, we design an evaluation benchmark to assess the robustness of EM models and to facilitate their deployment in real-world settings. Our assessments reveal that data imbalance in the training data is a key problem for robustness. We also find that data augmentation alone is not sufficient to make a model robust. As a remedy, we prescribe simple modifications that can improve the robustness of PLM-based EM models. Our experiments show that while yielding superior results for in-domain generalization, our proposed model significantly improves robustness compared to state-of-the-art EM models.

SERF: Interpretable Sleep Staging using Embeddings, Rules, and Features

The accuracy of recent deep learning based clinical decision support systems is promising. However, lack of model interpretability remains an obstacle to widespread adoption of artificial intelligence in healthcare. Using sleep as a case study, we propose a generalizable method to combine clinical interpretability with high accuracy derived from black-box deep learning.

Clinician-determined sleep stages from polysomnogram (PSG) remain the gold standard for evaluating sleep quality. However, PSG manual annotation by experts is expensive and time-prohibitive. We propose SERF, interpretable Sleep staging using Embeddings, Rules, and Features to read PSG. SERF provides interpretation of classified sleep stages through meaningful features derived from the AASM Manual for the Scoring of Sleep and Associated Events.

In SERF, the embeddings obtained from a hybrid of convolutional and recurrent neural networks are transposed to the interpretable feature space. These representative interpretable features are used to train simple models like a shallow decision tree for classification. Model results are validated on two publicly available datasets. SERF surpasses the current state-of-the-art for interpretable sleep staging by 2%. Using Gradient Boosted Trees as the classifier, SERF obtains 0.766 κ and 0.870 AUC-ROC, within 2% of the current state-of-the-art black-box models.

Improving Imitation Learning by Merging Experts Trajectories

This paper proposes an original approach based on the combination of expert trajectories and Deep Reinforcement Learning to provide a better MineCraft player. The combination is based on the idea that the problem is naturally decomposable and the search space presents large plateaus. We use a two-step approach to build a better trajectory from all existing expert trajectories and consequently to extract an optimal policy. The first step uses the Birch clustering approach and image cosine similarity to obtain a compact representation and a substantial reduction of the state and action spaces. To reduce the overall complexity, the image distances are computed in the image latent space learned by an encoder-decoder model. In the second step, we first eliminate plateaus to keep only the nodes with non-zero rewards, then we compare trajectories using the Bellman equation and an appropriate value function. By checking the incremental compatibility of the trajectory of compact representations, we build the solution by combining the best compatible sub-trajectories of the experts. The experimental results on the NeurIPS MineRL 2020 challenge show that training the actors model on the most rewarding extracted subset of trajectories achieves state-of-the-art performance on the MineCraft environment. The paper's source code is available here:

TripJudge: A Relevance Judgement Test Collection for TripClick Health Retrieval

Robust test collections are crucial for Information Retrieval research. Recently there has been a growing interest in evaluating retrieval systems for domain-specific retrieval tasks; however, these tasks often lack a reliable test collection with human-annotated relevance assessments following the Cranfield paradigm. In the medical domain, the TripClick collection was recently proposed, which contains click log data from the Trip search engine and includes two click-based test sets. However, the clicks are biased toward the retrieval model used, which remains unknown, and a previous study shows that the test sets have a low judgement coverage for the Top-10 results of lexical and neural retrieval models. In this paper we present the novel relevance judgement test collection TripJudge for TripClick health retrieval. We collect relevance judgements in an annotation campaign and ensure the quality and reusability of TripJudge by a variety of ranking methods for pool creation, by multiple judgements per query-document pair and by an at least moderate inter-annotator agreement. We compare system evaluation with TripJudge and TripClick and find that click-based and judgement-based evaluation can lead to substantially different system rankings.

Interpretability of BERT Latent Space through Knowledge Graphs

The advent of pretrained language models has transformed the ways of handling natural languages, improving the quality of systems that rely on them. BERT played a crucial role in revolutionizing the Natural Language Processing (NLP) area. However, the deep learning framework it implements lacks interpretability. Thus, recent research efforts have aimed to explain what BERT learns from the text sources exploited to pre-train its linguistic model. In this paper, we analyze the latent vector space resulting from the BERT context-aware word embeddings. We focus on assessing whether regions of the BERT vector space hold an explicit meaning attributable to a Knowledge Graph (KG). First, we prove the existence of explicitly meaningful areas through the Link Prediction (LP) task. Then, we demonstrate that these regions are linked to explicit ontology concepts of a KG by learning classification patterns. To the best of our knowledge, this is the first attempt at interpreting the linguistic knowledge learned by BERT through a KG, relying on its pretrained context-aware word embeddings.

Unsupervised Question Clarity Prediction through Retrieved Item Coherency

Despite recent progress on conversational systems, they still do not perform smoothly when faced with ambiguous requests. When questions are unclear, conversational systems should have the ability to ask clarifying questions, rather than assuming a particular interpretation or simply responding that they do not understand. While the research community has paid substantial attention to the problem of predicting query ambiguity in traditional search contexts, researchers have paid relatively little attention to predicting when this ambiguity is sufficient to warrant clarification in the context of conversational systems. In this paper, we propose an unsupervised method for predicting the need for clarification. This method is based on the measured coherency of results from an initial answer retrieval step, under the assumption that a less ambiguous query is more likely to retrieve more coherent results when compared to an ambiguous query. We build a graph from retrieved items based on their context similarity, treating measures of graph connectivity as indicators of ambiguity. We evaluate our approach on two open-domain conversational question answering datasets, ClariQ and AmbigNQ, comparing it with neural and non-neural baselines. Our unsupervised approach performs as well as supervised approaches while providing better generalization.
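The intuition of treating graph connectivity as an ambiguity indicator can be sketched as follows. This is a minimal illustration that assumes retrieved items are already embedded as vectors, links items whose cosine similarity exceeds a hypothetical threshold, and uses edge density as one possible connectivity measure; the actual similarity function and connectivity measures used in the paper are not specified here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def graph_density(vectors, threshold=0.8):
    """Fraction of item pairs linked by similarity >= threshold."""
    n = len(vectors)
    if n < 2:
        return 0.0
    edges = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if cosine(vectors[i], vectors[j]) >= threshold
    )
    return edges / (n * (n - 1) / 2)

# Hypothetical embeddings of retrieved answers for two questions.
coherent_results = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05]]
scattered_results = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print(graph_density(coherent_results))
print(graph_density(scattered_results))
```

Under the stated assumption, a low density suggests that the retrieved items disagree with one another, which would flag the question as a candidate for clarification.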

IEEE13-AdvAttack A Novel Dataset for Benchmarking the Power of Adversarial Attacks against Fault Prediction Systems in Smart Electrical Grid

Fault detection tasks in intelligent electrical grids are vital because of their significant economic importance. Although numerous smart grid (SG) applications, such as fault detection and load forecasting, have adopted data-driven approaches, the robustness and security of these data-driven algorithms have not been widely examined. One of the greatest obstacles in research on the security of smart grids is the lack of publicly accessible datasets that permit testing the system's resilience against various types of attack. In this paper, we present IEEE13-AdvAttack, a large-scale simulated dataset based on the IEEE-13 test node feeder suitable for supervised tasks under SG. The dataset includes both conventional and renewable energy resources. We examine the robustness of fault type classification and fault zone classification systems to adversarial attacks. Through the release of datasets, benchmarking, and assessment of smart grid failure prediction systems against adversarial attacks, we seek to encourage the implementation of machine-learned security models in the context of smart grids. The benchmarking data and code for fault prediction are made publicly available on

A Multi-Domain Benchmark for Personalized Search Evaluation

Personalization in Information Retrieval has been a hot topic in both academia and industry for the past two decades. However, there is still a lack of high-quality standard benchmark datasets for conducting offline comparative evaluations in this context. To mitigate this problem, in the past few years, approaches to derive synthetic datasets suited for evaluating Personalized Search models have been proposed. In this paper, we put forward a novel evaluation benchmark for Personalized Search with more than 18 million documents and 1.9 million queries across four domains. We present a detailed description of the benchmark construction procedure, highlighting its characteristics and challenges. We provide baseline performance including pre-trained neural models, opening room for the evaluation of personalized approaches, as well as domain adaptation and transfer learning scenarios. We make both datasets and models available for future research.

CS-MLGCN: Multiplex Graph Convolutional Networks for Community Search in Multiplex Networks

Community Search (CS) is one of the fundamental tasks in network science and has attracted much attention due to its ability to discover personalized communities with a wide range of applications. Given any query nodes, CS seeks to find a densely connected subgraph containing query nodes. Most existing approaches usually study networks with a single type of proximity between nodes, which defines a single view of a network. However, in many applications such as biological, social, and transportation networks, interactions between objects span multiple aspects, yielding networks with multiple views, called multiplex networks. Existing CS approaches in multiplex networks adopt pre-defined subgraph patterns to model the communities, which cannot find communities that do not have such pre-defined patterns in real-world networks. In this paper, we propose a query-driven graph convolutional network in multiplex networks, CS-MLGCN, that can capture flexible community structures by learning from the ground-truth communities in a data-driven fashion. CS-MLGCN first combines the local query-dependent structure and global graph embedding in each type of proximity and then uses an attention mechanism to incorporate information on different types of relations. Experiments on real-world graphs with ground-truth communities validate the quality of the solutions we obtain and the efficiency of our model.

A Mask-based Output Layer for Multi-level Hierarchical Classification

This paper proposes a novel mask-based output layer for multi-level hierarchical classification, addressing the limitations of existing methods which (i) often do not embed the taxonomy structure being used, (ii) use a complex backbone neural network with n disjoint output layers that do not constrain each other, (iii) may output predictions that are inconsistent with the taxonomy in place, and (iv) often have a fixed value of n. Specifically, we propose a model-agnostic output layer that embeds the taxonomy and that can be combined with any model. Our proposed output layer implements a top-down divide-and-conquer strategy through a masking mechanism to enforce that predictions comply with the embedded hierarchy structure. Focusing on image classification, we evaluate the performance of our proposed output layer on three different datasets, each with a three-level hierarchical structure. Experiments on these datasets show that our proposed mask-based output layer improves several multi-level hierarchical classification models across various performance metrics.
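The top-down masking idea can be sketched for a two-level taxonomy as follows. This is a minimal illustration under assumptions: the `children_of` mapping, the logits, and the hard argmax choice of parent are hypothetical and do not reproduce the layer proposed in the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax; -inf logits receive probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def masked_child_probs(parent_logits, child_logits, children_of):
    """Pick the parent class, then mask out children outside its subtree."""
    parent = max(range(len(parent_logits)), key=parent_logits.__getitem__)
    allowed = children_of[parent]
    masked = [
        x if i in allowed else float("-inf")
        for i, x in enumerate(child_logits)
    ]
    return parent, softmax(masked)

# Hypothetical taxonomy: parent 0 owns children {0, 1}, parent 1 owns {2, 3}.
children_of = {0: {0, 1}, 1: {2, 3}}
parent, probs = masked_child_probs(
    parent_logits=[2.0, 0.5],
    child_logits=[0.1, 0.3, 5.0, 0.2],
    children_of=children_of,
)
print(parent, probs)
```

In this example child 2 has the largest raw logit, but it is masked because it does not belong to the predicted parent, so the child-level prediction stays consistent with the taxonomy.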

Marine-tree: A Large-scale Marine Organisms Dataset for Hierarchical Image Classification

This paper presents Marine-tree, a large-scale hierarchically annotated dataset for marine organism classification. Marine-tree contains more than 160k annotated images divided into 60 classes organised in a hierarchy-tree structure using an adapted CATAMI (Collaborative and Automated Tools for the Analysis of Marine Imagery and video) classification scheme. Images were meticulously collected by scuba divers using the RLS (Reef Life Survey) methodology and later annotated by experts in the field. We also propose a hierarchical loss function that can be applied to any multi-level hierarchical classification model, which takes into account the parent-child relationship between predictions and uses it to penalize inconsistent predictions. Experimental results demonstrate that Marine-tree and the proposed hierarchical loss function are valuable contributions to research in both underwater imagery and hierarchical classification.

Deep Ordinal Neural Network for Length of Stay Estimation in the Intensive Care Units

Length of Stay (LoS) estimation is important for efficient healthcare resource management. Since the distribution of LoS is highly skewed, some previous works frame the LoS estimation as a multi-class classification problem by dividing the range of LoS into buckets. However, they ignore the ordinal relationship between labels. The distribution of bucketed LoS, with a heavy head and a heavy tail, is still imbalanced since the long tail is grouped into the last bucket. This paper proposes a Deep Ordinal neural network for Length of stay Estimation in the intensive care units (DOSE). DOSE can exploit the ordinal relationship and mitigate the skewness. The ordinal classification problem is decomposed into a series of binary classification sub-problems by using multiple binary classifiers. To maintain consistency among binary classifiers, the monotonicity constraint penalty is proposed. The number of samples whose labels are higher or lower than a given threshold is at the same level due to the heavy head and tail of the distribution. Therefore, the training data of each binary classifier are balanced. Experiments are conducted on the real-world healthcare dataset. DOSE outperforms all baseline methods in all metrics. The distribution of the prediction of DOSE is more aligned with the ground truth.
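The decomposition of ordinal classification into binary sub-problems, together with a penalty for non-monotonic outputs, can be sketched as below. This is a generic illustration under assumptions: the "label > k" encoding, the hinge-style penalty, and the 0.5 decoding threshold are common choices, not necessarily those used in DOSE.

```python
def ordinal_targets(label, num_classes):
    """Binary targets for the questions 'label > k', k = 0 .. num_classes - 2."""
    return [1 if label > k else 0 for k in range(num_classes - 1)]

def monotonicity_penalty(probs):
    """Sum of violations where P(y > k + 1) exceeds P(y > k)."""
    return sum(
        max(0.0, later - earlier)
        for earlier, later in zip(probs, probs[1:])
    )

def decode_label(probs, threshold=0.5):
    """Predicted bucket: number of binary classifiers that fire."""
    return sum(1 for p in probs if p > threshold)

# Hypothetical example with 4 length-of-stay buckets (labels 0..3).
print(ordinal_targets(2, 4))
print(monotonicity_penalty([0.9, 0.6, 0.2]))
print(monotonicity_penalty([0.5, 0.7, 0.2]))
print(decode_label([0.9, 0.6, 0.2]))
```

Consistent outputs should be non-increasing in k, so the penalty is zero for the first probability vector and positive for the second, where a later classifier is more confident than an earlier one.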

Predicting Guiding Entities for Entity Aspect Linking

Entity linking can disambiguate mentions of an entity in text. However, there are many different aspects of an entity that could be discussed but are not differentiable by entity links, for example, the entity "oyster" in the context of "food" or "ecosystems". Entity aspect linking provides such fine-grained explicit semantics for entity links by identifying the most relevant aspect of an entity in the given context. We propose a novel entity aspect linking approach that outperforms several neural and non-neural baselines on a large-scale entity aspect linking test collection. Our approach uses a supervised neural entity ranking system to predict relevant entities for the context. These entities are then used to guide the system to the correct aspect.

DialogID: A Dialogic Instruction Dataset for Improving Teaching Effectiveness in Online Environments

Online dialogic instructions are a set of pedagogical instructions used in real-world online educational contexts to motivate students, help understand learning materials, and build effective study habits. In spite of the popularity and advantages of online learning, the education technology and educational data mining communities still suffer from the lack of large-scale, high-quality, and well-annotated teaching instruction datasets to study computational approaches to automatically detect online dialogic instructions and further improve the online teaching effectiveness. Therefore, in this paper, we present a dataset of online dialogic instruction detection, DialogID, which contains 30,431 effective dialogic instructions. These teaching instructions are well annotated into 8 categories. Furthermore, we utilize the prevalent pre-trained language models (PLMs) and propose a simple yet effective adversarial training learning paradigm to improve the quality and generalization of dialogic instruction detection. Extensive experiments demonstrate that our approach outperforms a wide range of baseline methods. The data and our code are available for research purposes from:

Discriminative Language Model via Self-Teaching for Dense Retrieval

Dense retrieval (DR) has shown promising results in many information retrieval (IR) related tasks, whose foundation is high-quality text representations for effective search. Taking the pre-trained language models (PLMs) as the text encoders has become a popular choice in DR. However, the learned representations based on these PLMs often lose the discriminative power, and thus hurt the recall performance, particularly as PLMs consider too much content of the input texts. Therefore, in this work, we propose to pre-train a discriminative language representation model, called DiscBERT, for DR. The key idea is that a good text representation should be able to automatically keep those discriminative features that could well distinguish different texts from each other in the semantic space. Specifically, inspired by knowledge distillation, we employ a simple yet effective training method, called self-teaching, to distill the model's knowledge constructed when training on the sampled representative tokens of a text sequence into the model's knowledge for the entire text sequence. By further fine-tuning on publicly available retrieval benchmark datasets, DiscBERT can outperform the state-of-the-art retrieval methods.

Knowledge Tracing Model with Learning and Forgetting Behavior

The Knowledge Tracing (KT) task aims to trace the changes of students' knowledge state in real time according to students' historical learning behavior, and predict students' future learning performance. The modern KT models have two problems. One is that these KT models can't reflect students' actual knowledge level. Most KT models only judge students' knowledge state based on their performance in exercises, and poor performance will lead to a decline in knowledge state. However, the essence of students' learning process is the process of acquiring knowledge, which is also a manifestation of learning behavior. Even if they answer the exercises incorrectly, they will still gain knowledge. The other problem is that many KT models don't pay enough attention to the impact of students' forgetting behavior on the knowledge state in the learning process. In fact, learning and forgetting behavior run through students' learning process, and their effects on students' knowledge state shouldn't be ignored. In this paper, based on educational psychology theory, we propose a knowledge tracing model with learning and forgetting behavior (LFBKT). LFBKT comprehensively considers the factors that affect learning and forgetting behavior to build the knowledge acquisition layer, knowledge absorption layer and knowledge forgetting layer. In addition, LFBKT introduces difficulty information to enrich the information of the exercise itself, while taking into account other answering performances besides the answer. Experimental results on two public datasets show that LFBKT can better trace students' knowledge state and outperforms existing models in terms of ACC and AUC.

An Empirical Cross Domain-Specific Entity Recognition with Domain Vector

Recognizing terminology entities across domains from professional texts is an important but challenging task in NLP. Most existing methods focus on recognizing generic entities, and few can recognize domain-specific entities across domains because entity representations differ so widely between the source and target domains. To address this issue, we introduce domain vectors and context vectors to represent domain-specific semantics of entities and domain-irrelevant semantics of the context words, respectively. Based on the two types of vectors, we present a simple yet effective cross-domain named entity recognition approach, which aligns entity distributions between domains and separates entity distributions from context distributions so that entities are easier to identify. Experimental results demonstrate that the proposed approach obtains significant improvement over existing cross-domain NER methods.

Trusted Media Challenge Dataset and User Study

The emergence of fake media that can be easily created by technology has the potential to generate potent misinformation causing harm to both society and individuals. To tackle the issue, we have organized the Trusted Media Challenge (TMC) to explore how Artificial Intelligence (AI) technologies could be leveraged to combat fake media. To enable further research, we are releasing the dataset from the TMC, which consists of 4,380 fake and 2,563 real videos, with various video and audio manipulation methods employed to produce different types of fake media. We have also carried out a user study to demonstrate the quality of the TMC dataset and to compare the performance of humans and AI models. The results show that the TMC dataset can fool human participants in many cases. The TMC dataset is available for research purposes upon request via

Scalable Graph Representation Learning via Locality-Sensitive Hashing

A massive amount of research on graph representation learning has been carried out to learn dense features as graph embedding for information networks, thereby capturing the semantics in complex networks and benefiting a variety of downstream tasks. Most of the existing studies focus on structural properties, such as distances and neighborhood proximity between nodes. However, real-world information networks are dominated by low-degree nodes because they are not only sparse but also follow a power-law degree distribution. Due to the sparsity, proximity-based methods are incapable of deriving satisfactory representations for these tail nodes. To address this challenge, we propose a novel approach, Content-Preserving Locality-Sensitive Hashing (CP-LSH), by incorporating the content information for representation learning. Specifically, we aim at preserving LSH-based content similarity between nodes to transfer knowledge from popular nodes to long-tail nodes. We also propose a novel hashing trick to reduce the redundant space consumption so that CP-LSH is capable of tackling industry-scale data. Extensive offline experiments have been conducted on three large-scale public datasets. We also deploy CP-LSH to real-world recommendation systems in one of the largest e-commerce platforms for online experiments. Experimental results demonstrate that CP-LSH outperforms competitive baseline methods in node classification and link prediction tasks. Besides, the results of online experiments also indicate that CP-LSH is practical and robust for real-world production systems.
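As a rough illustration of what "LSH-based content similarity" can mean, the sketch below uses generic random-hyperplane hashing on content vectors. It is not the CP-LSH method or its hashing trick, which the abstract does not specify. The function names, the 16-bit signature length, and the toy content vectors are all assumptions made for this example.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0.0 else 0
        for plane in hyperplanes
    )


def signature_similarity(sig_a, sig_b):
    """Fraction of matching bits, a proxy for angular similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical content vectors for three nodes.
planes = make_hyperplanes(dim=4, n_bits=16)
popular_node = [0.9, 0.1, 0.0, 0.3]
tail_node = [1.8, 0.2, 0.0, 0.6]         # same direction, different scale
unrelated_node = [-0.9, -0.1, 0.0, -0.3]  # opposite direction

print(signature_similarity(lsh_signature(popular_node, planes),
                           lsh_signature(tail_node, planes)))
print(signature_similarity(lsh_signature(popular_node, planes),
                           lsh_signature(unrelated_node, planes)))
```

Nodes with similar content collide on most bits regardless of their degree, which is one way a sparsely connected tail node could borrow signal from a popular node with similar content.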

CFS-MTL: A Causal Feature Selection Mechanism for Multi-task Learning via Pseudo-intervention

Multi-task learning (MTL) has been successfully applied to a wide range of real-world applications. However, MTL models often suffer from performance degradation with negative transfer due to sharing all features without distinguishing their helpfulness for all tasks. To this end, many works on feature selection for multi-task learning (FS-MTL) have been proposed to alleviate negative transfer between tasks by learning features selectively for each specific task. However, due to latent confounders between features and task targets, the correlations captured by the feature selection modules proposed in these works may fail to reflect the actual effect of the features on the targets. This paper explains negative transfer in FS-MTL from a causal perspective and presents a novel architecture called Causal Feature Selection for Multi-task Learning (CFS-MTL). This method incorporates the idea of causal inference into feature selection for multi-task learning via pseudo-intervention. It aims to select features with more stable causal effects rather than spurious correlations for each task by regularizing the distance between feature ITEs and feature importance. We conduct extensive experiments based on three real-world datasets to demonstrate that our proposed CFS-MTL outperforms state-of-the-art MTL models significantly in the AUC metric.

Dynamic Explicit Embedding Representation for Numerical Features in Deep CTR Prediction

Click-Through Rate (CTR) prediction is a key problem in web search, recommendation systems, and online advertising display. Deep CTR models have achieved good performance due to the adoption of feature embedding and interaction. However, most research has focused on learning better feature interactions, with little attention to embedding representation. In this work, we propose a Dynamic Explicit Embedding Representation (DEER) for numerical features in deep CTR prediction, which can provide explicit and dynamic embedding representation for numerical features. The DEER framework is able to discretize numerical features automatically and dynamically, which can overcome the discontinuity problem in the representation of numeric information. Our methods are tested on two public datasets, and the experimental results show that DEER can be applied to various deep CTR models and effectively improves their performance.

A Dataset for Burned Area Delineation and Severity Estimation from Satellite Imagery

The ability to correctly identify areas damaged by forest wildfires is essential to plan and monitor the restoration process and estimate the environmental damages after such catastrophic events. The wide availability of satellite data, combined with the recent development of machine learning and deep learning methodologies applied to the computer vision field, makes it extremely interesting to apply the aforementioned techniques to the field of automatic burned area detection. One of the main issues in such a context is the limited amount of labeled data, especially in the context of semantic segmentation. In this paper, we introduce a publicly available dataset for the burned area detection problem for semantic segmentation. The dataset contains 73 satellite images of different forests damaged by wildfires across Europe with a resolution of up to 10m per pixel. Data were collected from the Sentinel-2 L2A satellite mission and the target labels were generated from the Copernicus Emergency Management Service (EMS) annotations, with five different severity levels, ranging from undamaged to completely destroyed. Finally, we report the benchmark values obtained by applying a Convolutional Neural Network on the proposed dataset to address the burned area identification problem.

On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs

Graph neural networks (GNNs) have been widely used in various graph-related problems such as node classification and graph classification, where the superior performance is mainly established when natural node features are available. However, it is not well understood how GNNs work without natural node features, especially regarding the various ways to construct artificial ones. In this paper, we point out the two types of artificial node features, i.e., positional and structural node features, and provide insights on why each of them is more appropriate for certain tasks, i.e., positional node classification, structural node classification, and graph classification. Extensive experimental results on 10 benchmark datasets validate our insights, thus leading to a practical guideline on the choices between different artificial node features for GNNs on non-attributed graphs. The code is available at

LCD: Adaptive Label Correction for Denoising Music Recommendation

Music recommendation is usually modeled as a Click-Through Rate (CTR) prediction problem, which estimates the probability of a user listening to a recommended song. CTR prediction can be formulated as a binary classification problem where the played songs are labeled as positive samples and the skipped songs are labeled as negative samples. However, such naively defined labels are noisy and biased in practice, causing inaccurate model predictions. In this work, we first identify serious label noise issues in an industrial music App, and then propose an adaptive Label Correction method for Denoising (LCD) music recommendation by ensembling the noisy labels and the model outputs to encourage a consensus prediction. Extensive offline experiments are conducted to evaluate the effectiveness of LCD on both industrial and public datasets. Furthermore, in a one-week online AB test, LCD also significantly increases both the music play count and time per user by 1% to 5%.
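To make "ensembling the noisy labels and the model outputs" concrete, the sketch below blends a logged play/skip label with the model's predicted probability and trains against the resulting soft target. The abstract does not say how LCD adapts the blend, so the fixed `trust_in_label` weight and both function names are assumptions for this example, not the authors' formulation.

```python
import math


def corrected_label(noisy_label, model_prob, trust_in_label):
    """Convex blend of the logged label and the model's own prediction.

    trust_in_label = 1.0 keeps the logged label unchanged;
    trust_in_label = 0.0 replaces it with the model output.
    """
    return trust_in_label * noisy_label + (1.0 - trust_in_label) * model_prob


def soft_cross_entropy(target, prob, eps=1e-7):
    """Binary cross-entropy against a soft target in [0, 1]."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


# A skipped song (label 0) that the model scores as likely to be played.
target = corrected_label(noisy_label=0.0, model_prob=0.8, trust_in_label=0.5)
print(target)
print(soft_cross_entropy(target, prob=0.8))
```

With a soft target, a confident model prediction that disagrees with a suspicious label is penalized less than it would be under the hard label.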

Effective Neural Team Formation via Negative Samples

Forming teams of experts who collectively hold a set of required skills and can successfully cooperate is challenging due to the vast pool of feasible candidates with diverse backgrounds, skills, and personalities. Neural models have been proposed to address scalability while maintaining efficacy by learning the distributions of experts and skills from successful teams in the past in order to recommend future teams. However, such models are prone to overfitting when training data suffers from a long-tailed distribution, i.e., few experts have most of the successful collaborations, and the majority has participated sparingly. In this paper, we present an optimization objective that leverages both successful and virtually unsuccessful teams to overcome the long-tailed distribution problem. We propose three negative sampling heuristics that can be seamlessly employed during the training of neural models. We study the synergistic effects of negative samples on the performance of neural models compared to lack thereof on two large-scale benchmark datasets of computer science publications and movies, respectively. Our experiments show that neural models that take unsuccessful teams (negative samples) into account are more efficient and effective in training and inference, respectively.

OpeNTF: A Benchmark Library for Neural Team Formation

We contribute OpeNTF, an open-source Python-based benchmark library to support neural team formation research. Team formation falls under social information retrieval (Social IR), where the right group of experts should be retrieved to solve a task, which is intractable due to the vast pool of feasible candidates with diverse skills. Even though neural networks could successfully address efficiency while maintaining efficacy, they lack standard implementations and experimental details, which calls for excessive effort in repeating or reproducing the results in new domains. OpeNTF provides a standard and reproducible platform for neural team formation. It incorporates a host of canonical neural models along with three large-scale training datasets from varying domains. Leveraging an object-oriented structure, OpeNTF readily accommodates the addition of new neural models and training datasets. The first of its kind in neural team formation, OpeNTF also offers negative sampling heuristics that can be seamlessly integrated during model training to boost efficiency and to improve the effectiveness of inference.

GFlow-FT: Pick a Child Network via Gradient Flow for Efficient Fine-Tuning in Recommendation Systems

Conversion Rate (CVR) prediction is a crucial task in online advertising systems. Existing single-domain CVR prediction models suffer from the data sparsity problem since few users purchase items after clicking. In recent years, a robust and effective technique called fine-tuning can transfer knowledge from a data-rich source domain to enhance the CVR prediction performance in a data-sparse target domain. However, since most CVR prediction models have a large number of parameters, fine-tuning all the parameters on a data-sparse domain may lead to over-fitting. In this paper, we propose a general and efficient transfer learning method called Gradient-Flow based Fine-Tuning (GFlow-FT), which only needs to update a subset of parameters (called child network) via pruning the gradients to restrain gradient norm against over-fitting. In addition, our method employs the gradient-flow based measure via calculating the Hessian-gradient product as the criteria for picking the child network, which is superior to the magnitude-based and loss-based measure from empirical results. Extensive experimental results on three real-world datasets from recommendation systems show that GFlow-FT can significantly improve the performance of CVR prediction compared with state-of-the-art fine-tuning approaches.

MASR: A Model-Agnostic Sparse Routing Architecture for Arbitrary Order Feature Sharing in Multi-Task Learning

Multi-task learning (MTL) has experienced rapid growth in recent years. A typical way of conducting MTL with deep neural networks (DNNs) is either establishing a sort of global feature sharing mechanism across all tasks or assigning each task an individual set of parameters with cross-connections. However, these existing approaches leverage DNNs only to share features of a certain order. Several models demonstrated that explicitly modeling feature sharing with both low-order and high-order features can boost performance. To this end, we propose a model-agnostic sparse routing architecture called MASR, which emphasizes arbitrary order feature sharing for multi-task learning. It is able to choose specific orders of features to route for a given task through learnable latent variables. Moreover, MASR is model-agnostic and can be combined with existing MTL models to share both low-order and high-order features. Extensive experimental results on several real-world datasets not only confirm that MASR significantly improves existing MTL models but also show that it outperforms existing hybrid architectures in terms of the AUC metric.

Semi-Supervised Learning with Data Augmentation for Tabular Data

Data augmentation-based semi-supervised learning (SSL) methods have made great progress in computer vision and natural language processing areas. One of the most important factors is that the semantic structure invariance of these data allows the augmentation procedure (e.g., rotating images or masking words) to thoroughly utilize the enormous amount of unlabeled data. However, tabular data does not possess an obvious invariant structure, and therefore similar data augmentation methods do not apply to it. To fill this gap, we present a simple yet efficient data augmentation method particularly designed for tabular data and apply it to the SSL algorithm: SDAT (Semi-supervised learning with Data Augmentation for Tabular data). We adopt a multi-task learning framework that consists of two components: the data augmentation procedure and the consistency training procedure. The data augmentation procedure, which perturbs in latent space, employs a variational auto-encoder (VAE) to generate the reconstructed samples as augmented samples. The consistency training procedure constrains the predictions to be invariant between the augmented samples and the corresponding original samples. By sharing a representation network (encoder), we jointly train the two components to improve effectiveness and efficiency. Extensive experimental studies validate the effectiveness of the proposed method on the tabular datasets.

Adaptive Graph Spatial-Temporal Transformer Network for Traffic Forecasting

Traffic forecasting can be highly challenging due to complex spatial-temporal correlations and non-linear traffic patterns. Existing works mostly model such spatial-temporal dependencies by considering spatial correlations and temporal correlations separately, or within a sliding temporal window, and fail to model the direct spatial-temporal correlations. Inspired by the recent success of transformers in the graph domain, in this paper, we propose to directly model the cross-spatial-temporal correlations on the adaptive spatial-temporal graph using local multi-head self-attentions. We then propose a novel Adaptive Graph Spatial-Temporal Transformer Network (ASTTN), which stacks multiple spatial-temporal attention layers to apply self-attention on the input graph, followed by linear layers for predictions. Experimental results on public traffic network datasets, METR-LA, PEMS-BAY, PeMSD4, and PeMSD7, demonstrate the superior performance of our model.

Subspace Co-clustering with Two-Way Graph Convolution

Subspace clustering aims to cluster high dimensional data lying in a union of low-dimensional subspaces. It has shown good results on the task of image clustering but text clustering, using document-term matrices, proved more impervious to advances based on this approach. We hypothesize that this is because, compared to image data, text data is generally higher dimensional and sparser. This renders subspace clustering impractical in such a context. Here, we leverage subspace clustering for text by addressing these issues. We first extend the concept of subspace clustering to co-clustering, which has been extensively used on document-term matrices due to the resulting interplay between the document and term representations. We then address the sparsity problem through a two-way graph convolution, which promotes the grouping effect that has been credited for the effectiveness of some subspace clustering models. The proposed formulation results in an algorithm that is efficient both in terms of computational and spatial complexity. We show the competitiveness of our model w.r.t the state-of-the-art on document-term attributed graph datasets in terms of performance and efficiency.

On the Mining of Time Series Data Counterfactual Explanations using Barycenters

EXplainable Artificial Intelligence (XAI) methods are increasingly accepted as effective tools to trace complex machine learning models' decision-making processes. There are two underlying XAI paradigms: (1) traditional factual methods and (2) emerging counterfactual models. The first family of methods uses feature attribution techniques that alter the feature space and observe the impact on the decision function. Counterfactual models aim at providing the smallest possible change to the feature vector that can change the prediction outcome. In this paper, we propose TimeX, a new model-agnostic time series counterfactual explanation algorithm that provides sparse, interpretable, and contiguous explanations. We validate our model using real-world time series datasets and show that our approach can generate explanations with up to 20% fewer outliers in comparison with other state-of-the-art competing baselines.

MalNet: A Large-Scale Image Database of Malicious Software

Computer vision is playing an increasingly important role in automated malware detection with the rise of the image-based binary representation. These binary images are fast to generate, require no feature engineering, and are resilient to popular obfuscation methods. Significant research has been conducted in this area; however, it has been restricted to small-scale or private datasets that only a few industry labs and research teams have access to. This lack of availability hinders examination of existing work, development of new research, and dissemination of ideas. We release MalNet-Image, the largest public cybersecurity image database, offering 24x more images and 70x more classes than existing databases (available at). MalNet-Image contains over 1.2 million malware images, across 47 types and 696 families, democratizing image-based malware capabilities by enabling researchers and practitioners to evaluate techniques that were previously reported in proprietary settings. We report the first million-scale malware detection results on binary images. MalNet-Image unlocks new and unique opportunities to advance the frontiers of machine learning, enabling new research directions into vision-based cyber defenses, multi-class imbalanced classification, and interpretable security.

KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos

Recommender systems deployed in real-world applications can have inherent exposure bias, which leads to biased logged data that plagues researchers. A fundamental way to address this thorny problem is to collect users' interactions on randomly exposed items, i.e., the missing-at-random data. A few works have asked certain users to rate or select randomly recommended items, e.g., Yahoo!, Coat, and OpenBandit. However, these datasets are either too small in size or lack key information, such as unique user ID or the features of users/items. In this work, we present KuaiRand, an unbiased sequential recommendation dataset containing millions of intervened interactions on randomly exposed videos, collected from the video-sharing mobile App, Kuaishou. Different from existing datasets, KuaiRand records 12 kinds of user feedback signals (e.g., click, like, and view time) on randomly exposed videos inserted in the recommendation feeds over two weeks. To facilitate model learning, we further collect rich features of users and items as well as users' behavior history. By releasing this dataset, we enable research on advanced debiasing in large-scale recommendation scenarios for the first time. Also, with its distinctive features, KuaiRand can support various other research directions such as interactive recommendation, long sequential behavior modeling, and multi-task learning. The dataset is available at

End-to-end Multi-task Learning Framework for Spatio-Temporal Grounding in Video Corpus

In this paper, we consider a novel task, Video Corpus Spatio-Temporal Grounding (VCSTG), for material selection and spatio-temporal adaption in intelligent video editing. Given a text query depicting an object and a corpus of untrimmed and unsegmented videos, VCSTG aims to localize a sequence of spatio-temporal object tubes from the video corpus. Existing methods tackle the VCSTG task in a multi-stage approach, which encodes the query and video representation independently for each task, leading to a local optimum. In this paper, we propose a novel one-stage multi-task learning based framework named MTSTG for the VCSTG task. MTSTG learns unified query and video representation for video retrieval, temporal grounding and spatial grounding tasks. Video-level, frame-level and object-level contrastive learning are introduced to measure the mutual information between query and video at different granularity. Comprehensive experiments demonstrate that our newly proposed framework outperforms the state-of-the-art multi-stage methods on the VidSTG dataset.

Local Contrastive Feature Learning for Tabular Data

Contrastive self-supervised learning has been successfully used in many domains, such as images, texts, graphs, etc., to learn features without requiring label information. In this paper, we propose a new local contrastive feature learning (LoCL) framework, and our theme is to learn local patterns/features from tabular data. In order to create a niche for local learning, we use feature correlations to create a maximum-spanning tree, and break the tree into feature subsets, with strongly correlated features being assigned next to each other. Convolutional learning of the features is used to learn latent feature space, regulated by contrastive and reconstruction losses. Experiments on public tabular datasets show the effectiveness of the proposed method versus state-of-the-art baseline methods.
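The step of turning feature correlations into a maximum-spanning tree can be sketched as follows. This is a minimal pure-Python illustration using Prim's algorithm on a hypothetical absolute-correlation matrix. The depth-first ordering that places strongly correlated features next to each other is one plausible reading of the abstract, not the authors' exact procedure, and the function names are invented for this example.

```python
def maximum_spanning_tree(weights):
    """Prim's algorithm on a dense symmetric weight matrix, keeping heaviest edges."""
    n = len(weights)
    in_tree = [False] * n
    best_weight = [float("-inf")] * n
    parent = [-1] * n
    if n:
        best_weight[0] = 0.0  # start growing the tree from feature 0
    edges = []
    for _ in range(n):
        # Pick the unvisited feature with the strongest link into the tree.
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best_weight[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and weights[u][v] > best_weight[v]:
                best_weight[v] = weights[u][v]
                parent[v] = u
    return edges


def feature_order(weights):
    """Depth-first walk of the tree, visiting the strongest neighbour first."""
    neighbours = {i: [] for i in range(len(weights))}
    for u, v in maximum_spanning_tree(weights):
        neighbours[u].append(v)
        neighbours[v].append(u)
    order, seen = [], set()
    stack = [0] if weights else []
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        # Push weakest links first so the strongest neighbour is popped next.
        for v in sorted(neighbours[u], key=lambda v: weights[u][v]):
            if v not in seen:
                stack.append(v)
    return order


# Hypothetical absolute correlations between three features.
abs_corr = [
    [1.0, 0.1, 0.9],
    [0.1, 1.0, 0.5],
    [0.9, 0.5, 1.0],
]
print(maximum_spanning_tree(abs_corr))
print(feature_order(abs_corr))
```

Reordering columns this way gives a one-dimensional layout in which neighbouring positions carry related features, which is what makes a convolution over the feature axis meaningful.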

Binary Transformation Method for Multi-Label Stream Classification

Data streams produce extensive data with high throughput from various domains and require copious amounts of computational resources and energy. Many data streams are multi-labeled, and classifying this data is computationally demanding. Some of the most well-known methods for Multi-Label Stream Classification are Problem Transformation schemes; however, previous work in this area does not satisfy the efficiency demands of multi-label data streams. In this study, we propose a novel Problem Transformation method for Multi-Label Stream Classification called Binary Transformation, which utilizes regression algorithms by transforming the labels into a continuous value. We compare our method against three of the leading problem transformation methods using eight datasets. Our results show that Binary Transformation achieves statistically similar effectiveness and provides a much higher level of efficiency.
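One way to read "transforming the labels into a continuous value" is to treat the binary label vector as the digits of a binary number, so that a single regressor can stand in for one classifier per label. The abstract does not give the encoding, so the scheme below, including the function names and the rounding and clamping rule used when decoding, is an assumption made purely for illustration.

```python
def encode_labels(label_bits):
    """Read a list of 0/1 labels as a binary number and return it as a float."""
    value = 0
    for bit in label_bits:
        value = value * 2 + bit
    return float(value)


def decode_labels(predicted_value, n_labels):
    """Round a regression output to the nearest valid code and unpack its bits."""
    max_value = 2 ** n_labels - 1
    code = int(round(predicted_value))
    code = min(max(code, 0), max_value)  # clamp out-of-range predictions
    return [(code >> shift) & 1 for shift in range(n_labels - 1, -1, -1)]


target = encode_labels([1, 0, 1])   # regression target for this instance
print(target)
print(decode_labels(4.6, n_labels=3))  # a noisy regression output
```

Under this reading, an error in a high-order bit costs far more than one in a low-order bit, so the order in which labels are placed in the code would matter in practice.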

SpCQL: A Semantic Parsing Dataset for Converting Natural Language into Cypher

The Neo4j query language Cypher enables efficient querying for graphs and has become the most popular graph database language. Due to its complexities, semantic parsing (similar to Text-to-SQL) that translates natural language queries to Cypher becomes highly desirable. We propose the first Text-to-CQL dataset, SpCQL, which contains one Neo4j graph database, 10,000 manually annotated natural language queries and the matching Cypher queries (CQL). Correspondingly, based on this dataset, we define a new semantic parsing task, Text-to-CQL. The Text-to-CQL task differs from the traditional Text-to-SQL task due to CQL being more flexible and versatile, especially for schema queries, which brings unprecedented challenges for the translation process. Although current SOTA Text-to-SQL models utilize SQL schema and contents, they do not scale up to large-scale graph databases. Besides, due to the absence of the primary and foreign keys in Cypher, which are essential for the multi-table Text-to-SQL task, existing Text-to-SQL models are rendered ineffective in this new task and have to be adapted to work. We propose three baselines based on the Seq2Seq framework and conduct experiments on the SpCQL dataset. The experiments yield undesirable results for existing models, hence pressing for subsequent research that considers the characteristics of CQL. The dataset is available at

Fusing Geometric and Scene Information for Cross-View Geo-Localization

Cross-view geo-localization aims to match scene images (e.g., ground-view images) with geo-tagged aerial images, which is crucial to a wide range of applications such as autonomous driving and street view navigation. Existing methods can neither address the perspective difference well nor effectively capture the scene information. In this work, we propose a Geometric and Scene Information Fusion (GSIF) model for more accurate cross-view geo-localization. GSIF first learns the geometric information of scene images and aerial images via log-polar transformation and spatial-attention aggregation to alleviate the perspective difference. Then, it mines the scene information of scene images via Sky View Factor (SVF) extraction. Finally, both geometric information and scene information are fused for image matching, and a balanced loss function is introduced to boost the matching accuracy. Experimental results on two real datasets show that our model significantly outperforms the existing methods.

Calibrated Conversion Rate Prediction via Knowledge Distillation under Delayed Feedback in Online Advertising

Prevailing calibration methods may fail to generalize well due to the pervasive delayed feedback issue in online advertising. That is, the labels of recent samples are more likely to be inaccurate because of the delayed feedback by users, while the old samples with complete feedback may suffer from data shift compared to the recent ones. In this paper, we propose to calibrate conversion rate prediction models considering delayed feedback via the knowledge distillation technique. Specifically, we deploy a teacher model trained on the samples with complete feedback to learn long-term conversion patterns and a student model trained on the recent data to reduce the impact of data shift. We also devise a distillation loss to encourage the student model to learn from the teacher. Experimental results on two real-world advertising conversion rate prediction datasets demonstrate that our method can provide more calibrated predictions compared with the existing ones. We also show that our method can be extended to different base models.

SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse

Scientific topics, claims and resources are increasingly debated as part of online discourse, where prominent examples include discourse related to COVID-19 or climate change. This has led to both significant societal impact and increased interest in scientific online discourse from various disciplines. For instance, communication studies aim at a deeper understanding of biases, quality or spreading patterns of scientific information, whereas computational methods have been proposed to extract, classify or verify scientific claims using NLP and IR techniques. However, research across disciplines currently suffers from both a lack of robust definitions of the various forms of science-relatedness as well as appropriate ground truth data for distinguishing them. In this work, we contribute (a) an annotation framework and corresponding definitions for different forms of scientific relatedness of online discourse in tweets, (b) an expert-annotated dataset of 1261 tweets obtained through our labeling framework reaching an average Fleiss Kappa κ of 0.63, (c) a multi-label classifier trained on our data able to detect science-relatedness with 89% F1 and also able to detect distinct forms of scientific knowledge (claims, references). With this work, we aim to lay the foundation for developing and evaluating robust methods for analysing science as part of large-scale online discourse.
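For readers unfamiliar with the agreement statistic reported above, the following sketch computes Fleiss' kappa from a table of per-item category counts using the standard formula. The function name and the toy rating tables are illustrative only and are not drawn from the SciTweets data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same rater count."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in shares)

    # Undefined (division by zero) if all ratings fall into a single category.
    return (observed - expected) / (1.0 - expected)


# Three raters, two categories, two items.
print(fleiss_kappa([[3, 0], [0, 3]]))  # full agreement on both items
print(fleiss_kappa([[2, 1], [1, 2]]))  # a 2-vs-1 split on both items
```

Kappa is 1 under perfect agreement and near or below 0 when raters agree no more often than chance would predict.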

OpenHGNN: An Open Source Toolkit for Heterogeneous Graph Neural Network

Heterogeneous Graph Neural Networks (HGNNs), a powerful family of graph representation learning methods for heterogeneous graphs, have attracted increasing attention from many researchers. Although several existing libraries support HGNNs, they provide only the most basic models and operators. Building and benchmarking various downstream tasks on HGNNs is still painful and time-consuming with them. In this paper, we introduce OpenHGNN, an open-source toolkit for HGNNs. OpenHGNN defines a unified and standard pipeline for training and testing, which allows users to run a model on a specific dataset with just one command line. OpenHGNN has integrated 20+ mainstream HGNNs and 20+ heterogeneous graph datasets, which can be used for various advanced tasks, such as node classification, link prediction, and recommendation. In addition, thanks to the modularized design of OpenHGNN, it can be extended to meet users' customized needs. We also release several novel and useful tools and features, including leaderboard, autoML, design space, and visualization, to provide users with better usage experiences. OpenHGNN is an open-source project, and the source code is available at

Long-tail Mixup for Extreme Multi-label Classification

Extreme multi-label classification (XMC) aims at finding multiple relevant labels for a given sample from a huge label set at the industrial scale. The XMC problem inherently poses two challenges: scalability and label sparsity - the number of labels is too large, and labels follow the long-tail distribution. To resolve these problems, we propose a novel Mixup-based augmentation method for long-tail labels, called TailMix. Building upon the partition-based model, TailMix utilizes the context vectors generated from the label attention layer. It first selectively chooses two context vectors using the inverse propensity score of labels and the label proximity graph representing the co-occurrence of labels. Using two context vectors, it augments new samples with the long-tail label to improve the accuracy of long-tail labels. Despite its simplicity, experimental results show that TailMix consistently outperforms other augmentation methods on three benchmark datasets, especially for long-tail labels in terms of two metrics, PSP@k and PSN@k.

Stochastic Optimization of Text Set Generation for Learning Multiple Query Intent Representations

Learning multiple intent representations for queries has potential applications in facet generation, document ranking, search result diversification, and search explanation. The state-of-the-art model for this task assumes that there is a sequence of intent representations. In this paper, we argue that the model should not be penalized as long as it generates an accurate and complete set of intent representations. Based on this intuition, we propose a stochastic permutation invariant approach for optimizing such networks. We extrinsically evaluate the proposed approach on a facet generation task and demonstrate significant improvements compared to competitive baselines. Our analysis shows that the proposed permutation invariant approach has the highest impact on queries with more potential intents.

Unified Knowledge Prompt Pre-training for Customer Service Dialogues

Dialogue bots have been widely applied in customer service scenarios to provide a timely and user-friendly experience. These bots must classify the appropriate domain of a dialogue, understand the intent of users, and generate proper responses. Existing dialogue pre-training models are designed only for several dialogue tasks and ignore weakly-supervised expert knowledge in customer service dialogues. In this paper, we propose a novel unified knowledge prompt pre-training framework, UFA (Unified Model For All Tasks), for customer service dialogues. We formulate all the tasks of customer service dialogues as a unified text-to-text generation task and introduce a knowledge-driven prompt strategy to jointly learn from a mixture of distinct dialogue tasks. We pre-train UFA on a large-scale Chinese customer service corpus collected from practical scenarios and obtain significant improvements on both natural language understanding (NLU) and natural language generation (NLG) benchmarks.

Causal Intervention for Sentiment De-biasing in Recommendation

Biases and de-biasing in recommender systems have received increasing attention recently. This study focuses on a newly identified bias, i.e., sentiment bias, which is defined as the divergence in recommendation performance between positive users/items and negative users/items. Existing methods typically employ a regularization strategy to eliminate the bias. However, blindly fitting the data without modifying the training procedure would result in a biased model, sacrificing recommendation performance.

In this study, we resolve the sentiment bias with causal reasoning. We develop a causal graph to model the cause-effect relationships in recommender systems, in which the sentiment polarity presented by review text acts as a confounder between user/item representations and observed ratings. The existence of confounders inspires us to go beyond conditional probability and embrace causal inference. To that aim, we use causal intervention in model training to remove the negative effect of sentiment bias. Furthermore, during model inference, we adjust the prediction score to produce personalized recommendations. Extensive experiments on five benchmark datasets validate that the deconfounded training can remove the sentiment bias and the inference adjustment is helpful to improve recommendation accuracy.

Query-Aware Sequential Recommendation

Sequential recommenders aim to capture users' dynamic interests from their historical action sequences, but the task remains challenging due to data sparsity issues, as well as the noisy and complex relationships among items in a sequence. Several approaches have sought to alleviate these issues using side information, such as item content (e.g., images) and action types (e.g., click, purchase). While useful, we argue that one of the main contextual signals is largely ignored, namely users' queries. When users browse and consume products (e.g., music, movies), their sequential interactions are usually a combination of queries, clicks, etc. Most interaction datasets discard queries, and corresponding methods simply model sequential behaviors over items and thus ignore this critical context of user interactions.

In this work, we argue that user queries should be an important contextual cue for sequential recommendation. First, we propose a new query-aware sequential recommendation setting, i.e., incorporating explicit user queries to model users' intent. Next, we propose a model, namely Query-SeqRec, to (1) incorporate query information into user behavior sequences; and (2) improve model generalization ability using query-item co-occurrence information. Last, we demonstrate the effectiveness of incorporating query features in sequential recommendation on three datasets.

Semi-supervised Continual Learning with Meta Self-training

Continual learning (CL) aims to enhance sequential learning by alleviating the forgetting of previously acquired knowledge. Recent advances in CL lack consideration of the real-world scenarios, where labeled data are scarce and unlabeled data are abundant. To narrow this gap, we focus on semi-supervised continual learning (SSCL). We exploit unlabeled data under limited supervision in the CL setting and demonstrate the feasibility of semi-supervised learning in CL. In this work, we propose a novel method, namely Meta-SSCL, which combines meta-learning with pseudo-labeling and data augmentations to learn a sequence of semi-supervised tasks without catastrophic forgetting. Extensive experiments on CL benchmark text classification datasets show that our method achieves promising results in SSCL.

Extreme Systematic Reviews: A Large Literature Screening Dataset to Support Environmental Policymaking

The United States Environmental Protection Agency (EPA) periodically releases Integrated Science Assessments (ISAs) that synthesize the latest research on each of six air pollutants to inform environmental policymaking. To guarantee the best possible coverage of relevant literature, EPA scientists spend months manually screening hundreds of thousands of references to identify a small proportion to be cited in an ISA. The challenge of extreme scale and the pursuit of maximum recall call for effective machine-assisted approaches to reducing the time and effort required by the screening process. This work introduces the ISA literature screening dataset and the associated research challenges to the information and knowledge management community. Our pilot experiments show that combining multiple approaches in tackling this challenge is both promising and necessary. The dataset is available at

META-CODE: Community Detection via Exploratory Learning in Topologically Unknown Networks

The discovery of community structures in social networks has gained considerable attention as a fundamental problem for various network analysis tasks. However, due to privacy concerns or access restrictions, the network structure is often unknown, thereby rendering established community detection approaches ineffective without costly data acquisition. To tackle this challenge, we present META-CODE, a novel end-to-end solution for detecting overlapping communities in networks with unknown topology via exploratory learning aided by easy-to-collect node metadata. Specifically, META-CODE consists of three steps: 1) initial network inference, 2) node-level community-affiliation embedding based on graph neural networks (GNNs) trained by our new reconstruction loss, and 3) network exploration via community-affiliation-based node queries, where Steps 2 and 3 are performed iteratively. Experimental results demonstrate that META-CODE exhibits (a) superiority over benchmark methods for overlapping community detection, (b) the effectiveness of our training model, and (c) fast network exploration.

AMinerGNN: Heterogeneous Graph Neural Network for Paper Click-through Rate Prediction with Fusion Query

Paper recommendation with user-generated keywords aims to suggest papers that simultaneously meet a user's interests and are relevant to the input keyword. This is a recommendation task with two queries, namely user ID and keyword. However, existing methods focus on recommendation according to one query, namely user ID, and are not applicable to this problem. In this paper, we propose a novel click-through rate (CTR) prediction model with a heterogeneous graph neural network, called AMinerGNN, to recommend papers with two queries. Specifically, AMinerGNN constructs a heterogeneous graph to project user, paper, and keyword into the same embedding space by graph representation learning. To process two queries, a novel query attentive fusion layer is designed to recognize their importance dynamically and then fuse them into one query to build a unified and end-to-end recommender system. Experimental results on our proposed dataset and online A/B tests demonstrate the superiority of AMinerGNN.

Pattern Adaptive Specialist Network for Learning Trading Patterns in Stock Market

Stock prediction is a challenging task due to the uncertainty of stock markets. Despite the success of previous works, most of them rely on the assumption that stock data are independent and identically distributed, while the existence of multiple trading patterns in the stock market violates it. Ignoring these multiple patterns inevitably leads to a performance decline, and the lack of prior knowledge about the patterns further hinders learning them. In this paper, we propose a novel training process, Pattern Adaptive Training, based on Optimal Transport (OT), to train a set of predictors specializing in diverse patterns without any prior pattern knowledge or inconsistent assumption. Based on this process, we further mine the potential fitness rank among specialists and design the Pattern Adaptive Specialist Network (PASN) with the proposed ranking-based selector to choose an appropriate specialist predictor for each sample. Extensive experimental results show that our method achieves the best IC and other metrics on real-world stock datasets.

Deep Presentation Bias Integrated Framework for CTR Prediction

In online advertising, click-through rate (CTR) prediction typically utilizes click data to train models for estimating the probability of a user clicking on an item. However, the different presentations of an item, such as its position and contextual items, affect the user's attention and lead to different click propensities, which gives rise to presentation bias. Most previous works consider only position bias and pay less attention to overall presentation bias including context. At the same time, since the final presentation list is unavailable during online inference, the bias independence assumption is adopted so that the debiased relevance can be directly used for ranking. But this assumption rarely holds, because the click propensity for an item presentation varies with user intent. Therefore, predicted CTR with personalized click propensity, rather than debiased relevance, should be closer to real CTR. In this work, we propose a Deep Presentation Bias Integrated Framework (DPBIF). With DPBIF, the presentation block containing the item and contextual items on the same screen is introduced into the user behavior sequence and the predicted target item, personalizing the integration of presentation bias caused by different click propensities into the CTR prediction network. While avoiding modeling with the independence assumption, the network is capable of estimating multiple integrated CTRs under different presentations for each item. The multiple CTRs are used to transform the ranking problem into an item-to-position assignment problem, so that the Kuhn-Munkres (KM) algorithm can be employed to optimize the global benefit of the presentation list. Extensive offline experiments and online A/B tests are performed in a real-world system to demonstrate the effectiveness of the proposed framework.
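
The item-to-position assignment view described above can be illustrated on a toy instance. The sketch below is not the Kuhn-Munkres algorithm itself: it enumerates all assignments exhaustively, which returns the same optimum KM would on a small square matrix but does not scale. The function name and the CTR values are assumptions made for illustration.

```python
from itertools import permutations


def best_assignment(ctr):
    """Return (positions, total) maximizing the summed CTR.

    ctr[i][p] is the predicted CTR of item i when shown at position p
    (square matrix). positions[i] is the position assigned to item i.
    Exhaustive search: O(n!) time, suitable only for tiny examples.
    """
    n = len(ctr)
    best_total, best_perm = float("-inf"), ()
    for perm in permutations(range(n)):
        total = sum(ctr[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


# Hypothetical CTRs: item 0 is strong everywhere, item 1 only at position 0.
ctr = [
    [0.30, 0.25],
    [0.28, 0.10],
]
positions, total = best_assignment(ctr)
```

Here a greedy ranking that gives position 0 to the item with the highest CTR there would collect 0.30 + 0.10, whereas the global assignment places item 1 first and item 0 second for 0.28 + 0.25.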

GDA-HIN: A Generalized Domain Adaptive Model across Heterogeneous Information Networks

Domain adaptation using graph-structured networks learns label-discriminative and network-invariant node embeddings by sharing graph parameters. Most existing works focus on domain adaptation of homogeneous networks. The few works that study heterogeneous cases only consider shared node types but ignore private node types in individual networks. However, source and target heterogeneous networks generally contain both shared and private node types, and the private types bring an extra challenge for graph domain adaptation. In this paper, we investigate Heterogeneous Information Networks (HINs) with both shared and private node types and propose a Generalized Domain Adaptive model across HINs (GDA-HIN) to handle the domain shift between them. GDA-HIN can not only align the distribution of identical-type nodes and edges in two HINs but also make full use of different-type nodes and edges to improve the performance of knowledge transfer. Extensive experiments on several datasets demonstrate that GDA-HIN can outperform state-of-the-art methods in various domain adaptation tasks across heterogeneous networks.

LGP: Few-Shot Class-Evolutionary Learning on Dynamic Graphs

Graph few-shot learning aims to learn how to quickly adapt to new tasks using only a few labeled data, transferring learned knowledge of base classes to novel classes. Existing methods are mainly designed for static graphs, while many real-world graphs are dynamic and evolve over time, resulting in a phenomenon of structure and class evolution. To address the challenges caused by this phenomenon, in this paper, we propose a novel algorithm named Learning to Generate Parameters (LGP) to deal with few-shot class-evolutionary learning on dynamic graphs. Specifically, for the structure evolution, LGP integrates ensemble learning into a backbone network to effectively learn invariant representations across different snapshots within a dynamic graph. For the class evolution, LGP adopts a meta-learning strategy that can learn to generate the classified parameters of novel classes via the parameters of the base classes. Therefore, LGP can quickly adapt to new tasks on a combination of base and novel classes. Besides, LGP utilizes an attention mechanism to capture the evolutionary pattern between the novel and base classes. Extensive experiments on a real-world dataset demonstrate the effectiveness of LGP.

An Empirical Study on the Membership Inference Attack against Tabular Data Synthesis Models

Tabular data typically contain private and important information; thus, precautions must be taken before they are shared with others. Although several methods (e.g., differential privacy and k-anonymity) have been proposed to prevent information leakage, in recent years, tabular data synthesis models have become popular because they offer a good trade-off between data utility and privacy. However, recent research has shown that generative models for image data are susceptible to the membership inference attack, which can determine whether a given record was used to train a victim synthesis model. In this paper, we investigate the membership inference attack in the context of tabular data synthesis. We conduct experiments on 4 state-of-the-art tabular data synthesis models under two attack scenarios (i.e., one black-box and one white-box attack), and find that the membership inference attack can seriously jeopardize these models. We next conduct experiments to evaluate how well two popular differentially-private deep learning training algorithms, DP-SGD and DP-GAN, can protect the models against the attack. Our key finding is that both algorithms can largely alleviate this threat by sacrificing the generation quality.

NILK: Entity Linking Dataset Targeting NIL-linking Cases

The NIL-linking task in Entity Linking deals with cases where the text mentions do not have a corresponding entity in the associated knowledge base. NIL-linking has two sub-tasks: NIL-detection and NIL-disambiguation. NIL-detection identifies NIL-mentions in the text. Then, NIL-disambiguation determines if some NIL-mentions refer to the same out-of-knowledge base entity. Although multiple existing datasets can be adapted for NIL-detection, none of them address the problem of NIL-disambiguation. This paper presents NILK, a new dataset for NIL-linking processing, constructed from WikiData and Wikipedia dumps from two different timestamps. The NILK dataset has two main features: 1) It marks NIL-mentions for NIL-detection by extracting mentions which belong to newly added entities in Wikipedia text. 2) It provides an entity label for NIL-disambiguation by marking NIL-mentions with WikiData IDs from the newer dump. We make available the annotated dataset along with the code. The NILK dataset is available at:

RealGraphGPU: A High-Performance GPU-Based Graph Engine toward Large-Scale Real-World Network Analysis

A graph, consisting of vertices and edges, has been widely adopted for network analysis. Recently, with the increasing size of real-world networks, many graph engines have been studied to efficiently process large-scale real-world graphs. RealGraph, one of the state-of-the-art single-machine-based graph engines, efficiently processes storage-to-memory I/Os by considering unique characteristics of real-world graphs. Via an in-depth analysis of RealGraph, however, we found that there is still a chance for more performance improvement in the computation part of RealGraph despite its great I/O processing ability. Motivated by this, in this paper, we propose RealGraphGPU, a GPU-based single-machine graph engine. We design the core components required for GPU-based graph processing and incorporate them into the architecture of RealGraph. Further, we propose two optimizations that successfully address the technical issues that could cause the performance degradation in the GPU-based graph engine: buffer pre-checking and edge-based workload allocation strategies. Through extensive evaluation with 6 real-world datasets, we demonstrate that (1) RealGraphGPU improves RealGraph by up to 546%, (2) RealGraphGPU outperforms existing state-of-the-art graph engines dramatically, and (3) the optimizations are all effective in large-scale graph processing.

Intra-session Context-aware Feed Recommendation in Live Systems

Feed recommendation allows users to constantly browse items until they feel uninterested and leave the session, which differs from traditional recommendation scenarios. Within a session, a user's decision to continue browsing or not substantially affects occurrences of later clicks. However, this type of exposure bias is generally ignored or not explicitly modeled in most feed recommendation studies. In this paper, we model this effect as part of intra-session context, and propose a novel intra-session Context-aware Feed Recommendation (INSCAFER) framework to maximize the total views and total clicks simultaneously. User click and browsing decisions are jointly learned in a multi-task setting, and the intra-session context is encoded by the session-wise exposed item sequence. We deploy our model on Alipay with all key business benchmarks improved. Our method sheds some light on feed recommendation studies which aim to optimize session-level click and view metrics.

MCSCSet: A Specialist-annotated Dataset for Medical-domain Chinese Spelling Correction

Chinese Spelling Correction (CSC) has gained increasing attention in recent years. Despite its extensive use in many applications, such as search engines and optical character recognition systems, little has been explored in medical scenarios in which complex and uncommon medical entities are easily misspelled. Correcting the misspellings of medical entities is arguably more difficult than correcting those in the open domain because it requires specific domain knowledge. In this work, we define the task of Medical-domain Chinese Spelling Correction (MCSC) and propose MCSCSet, a large-scale specialist-annotated dataset that contains about 200k samples. In contrast to existing open-domain CSC datasets, MCSCSet involves: i) extensive real-world medical queries collected from Tencent Yidian, ii) corresponding misspelled sentences manually annotated by medical specialists. Our work further offers a medical-domain confusion set consisting of the common error-prone characters in medicine and their corresponding misspellings. Extensive empirical studies have shown significant gaps between the open-domain and medical-domain spelling correction, highlighting the need to develop high-quality datasets that allow for CSC in specific domains. Moreover, our work benchmarks several representative methods, establishing baselines for future work.

AI-Augmented Art Psychotherapy through a Hierarchical Co-Attention Mechanism

One of the significant social problems emerging in modern society is mental illness, and a growing number of people are seeking psychological help. Art therapy is a technique that can alleviate psychological and emotional conflicts through creation. However, the expression of a drawing varies by individual, and the subjective judgments made by art therapists raise the need to secure an objective assessment. In this paper, we present M2C (Multimodal classification with 2-stage Co-attention), a deep learning model that predicts stress from art therapy psychological test data. M2C employs a co-attention mechanism that combines two modalities (drawings and post-questionnaire answers) to complement the weaknesses of each, which corresponds to therapists' psychometric diagnostic processes. The results of the experiment show that M2C yielded higher performance than other state-of-the-art single- or multi-modal models, demonstrating the effectiveness of the co-attention approach that reflects the diagnosis process.

GReS: Graphical Cross-domain Recommendation for Supply Chain Platform

Supply Chain Platforms (SCPs) provide downstream industries with raw materials. Compared with traditional e-commerce platforms, data in SCPs is more sparse due to limited user interests. To tackle the data sparsity problem, one can apply Cross-Domain Recommendation (CDR) to improve the recommendation performance of the target domain with the source domain information. However, applying CDR to SCPs directly ignores the hierarchical structures of commodities in SCPs, which reduces recommendation performance. In this paper, we take the catering platform as an example and propose GReS, a graphical CDR model. The model first constructs a tree-shaped graph to represent the hierarchy of different nodes of dishes and ingredients, and then applies our proposed Tree2vec method combining GCN and BERT models to embed the graph for recommendations. Experimental results show that GReS significantly outperforms state-of-the-art methods in CDR for SCPs.

Personal Entity, Concept, and Named Entity Linking in Conversations

Building conversational agents that can have natural and knowledge-grounded interactions with humans requires understanding user utterances. Entity Linking (EL) is an effective technique for understanding natural language text and connecting it to external knowledge. It has been shown, however, that the existing EL methods developed for annotating documents are suboptimal for conversations, where concepts and personal entities (e.g., "my cars") are essential for understanding user utterances. In this paper, we introduce a collection and a tool for entity linking in conversations. We provide EL annotations for 1,327 conversational utterances, consisting of links to named entities, concepts, and personal entities. The dataset is used for training our toolkit for conversational entity linking, CREL. Unlike existing EL methods, CREL is developed to identify both named entities and concepts. It also utilizes coreference resolution techniques to identify personal entities and their references to the explicit entity mentions in the conversations. We compare CREL with state-of-the-art techniques and show that it outperforms all existing baselines.

Commonsense Knowledge Base Completion with Relational Graph Attention Network and Pre-trained Language Model

Many commonsense knowledge graphs (CKGs) still suffer from incompleteness, although they have been applied successfully in many natural language processing tasks. Due to the scale and sparsity of CKGs, existing knowledge base completion models are still not competent for CKGs. In this paper, we propose a commonsense knowledge base completion (CKBC) model which learns the structural representations and contextual representations of CKG nodes and relations by a relational graph attention network and a pre-trained language model, respectively. Based on these two types of representations, the scoring decoder in our model achieves a more accurate prediction for a given triple. Our empirical studies on the representative CKG ConceptNet demonstrate our model's superiority over the state-of-the-art CKBC models.

Convolutional Transformer Networks for Epileptic Seizure Detection

Epilepsy is a chronic neurological disease that affects many people in the world. Automatic epileptic seizure detection based on electroencephalogram (EEG) signals is of great significance and has been widely studied. The current deep learning epilepsy detection algorithms are often designed to be relatively simple and seldom consider the characteristics of EEG signals. In this paper, we propose a promising epilepsy detection model based on convolutional transformer networks. We demonstrate that integrating convolution and transformer modules can achieve higher detection performance. Our convolutional transformer model is composed of two branches: one extracts time-domain features from multiple inputs of channel-exchanged EEG signals, and the other handles frequency-domain representations. Experiments on two EEG datasets show that our model offers state-of-the-art performance. In particular, on the CHB-MIT dataset, our model achieves 96.02% in average sensitivity and 97.94% in average specificity, outperforming other existing methods by clear margins.

Mining Entry Gates for Points of Interest

In this paper, we propose two algorithms for identifying entry gates for Points of Interest (PoIs) using polygon representations of the PoIs (PoI polygons) and the Global Positioning System (GPS) trajectories of the Delivery Partners (DPs) obtained from their smartphones in the context of online food delivery platforms. PoIs include residential complexes, office complexes, and educational institutes where customers can order from. Identifying entry gates of PoIs helps avoid delivery hassles by routing the DPs to the nearest entry gates for customers within the PoIs. The DPs mark 'reached' on their smartphone applications when they reach the entry gate or the parking spot of the PoI. However, it is not possible to ensure compliance, and the 'reached' locations are dispersed throughout the PoI. The first algorithm is based on density-based clustering of GPS traces where the DPs mark 'reached'. The clusters that overlap with the PoI polygon as measured by a metric that we propose, namely Cluster Fraction in Polygon (CFIP), are declared as entry gate clusters. The second algorithm obtains the entry gate clusters as density-based clustering of intersections of GPS trajectories of the DPs with the PoI polygon edges. The entry gates are obtained as median centroids of the entry gate clusters for both the algorithms which are then snapped to the nearest polygon edge in the case of the first algorithm. We evaluate the algorithms for a few thousand PoIs across 9 large cities in India using appropriately defined precision and recall metrics. For single-gate PoIs, we obtain a mean precision of 84%, a mean recall of 77%, and an average haversine distance error of 14.7 meters for the first algorithm. For the second algorithm, the mean precision is the same as the first algorithm while the recall obtained is 78% and the average haversine distance error is 14.3 meters. The algorithmically identified gates were evaluated by manual validation. 
To the best of our knowledge, this is the first published work with metrics that solves for a "last-last mile" entity of digital maps for India, i.e., the entry gates.
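
For readers unfamiliar with the haversine distance used as the error measure above, a minimal sketch follows. The Earth radius constant and the function name are assumptions; the abstract does not state which radius or implementation the authors used.

```python
import math

# Mean Earth radius in meters; an assumed constant, not taken from the paper.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


# Error between a predicted gate and a hand-validated gate (hypothetical coordinates).
error_m = haversine_m(12.9716, 77.5946, 12.9717, 77.5947)
```

Averaging this quantity over matched predicted and validated gates gives a distance error of the kind reported in the abstract.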

Models and Benchmarks for Representation Learning of Partially Observed Subgraphs

Subgraphs are rich substructures in graphs, and their nodes and edges can be partially observed in real-world tasks. Under partial observation, existing node- or subgraph-level message-passing produces suboptimal representations. In this paper, we formulate a novel task of learning representations of partially observed subgraphs. To solve this problem, we propose the Partial Subgraph InfoMax (PSI) framework and generalize existing InfoMax models, including DGI, InfoGraph, MVGRL, and GraphCL, into our framework. These models maximize the mutual information between the partial subgraph's summary and various substructures from nodes to full subgraphs. In addition, we suggest a novel two-stage model with k-hop PSI, which reconstructs the representation of the full subgraph and improves its expressiveness from different local-global structures. Under training and evaluation protocols designed for this problem, we conduct experiments on three real-world datasets and demonstrate that PSI models outperform baselines.

Bootstrapped Knowledge Graph Embedding based on Neighbor Expansion

Most Knowledge Graph (KG) embedding models require negative sampling to learn the representations of a KG by discriminating between positive and negative triples. Knowledge representation learning tasks such as link prediction are heavily influenced by the quality of negative samples. Despite many attempts, generating high-quality negative samples remains a challenge. In this paper, we propose a novel framework, Bootstrapped Knowledge graph Embedding based on Neighbor Expansion (BKENE), which learns representations of a KG without using negative samples. Our model avoids using augmentation methods that can alter the semantic information when creating the two semantically similar views of the KG. In particular, we generate an alternative view of the KG by aggregating the information of the expanded neighbors of each node with multi-hop relations. Experimental results show that our BKENE outperforms the state-of-the-art methods for link prediction tasks.

Debiasing Neighbor Aggregation for Graph Neural Network in Recommender Systems

Graph neural networks (GNNs) have achieved remarkable success in recommender systems by representing users and items based on their historical interactions. However, little attention has been paid to GNNs' vulnerability to exposure bias: users are exposed to a limited number of items, so a system learns only a biased view of user preference, resulting in suboptimal recommendation quality. Although inverse propensity weighting is known to recognize and alleviate exposure bias, it usually works on the final objective with the model outputs, whereas a GNN can also be biased during neighbor aggregation. In this paper, we propose a simple but effective approach, neighbor aggregation via inverse propensity (NAVIP), for GNNs. Specifically, given a user-item bipartite graph, we first derive the propensity score of each user-item interaction in the graph. Then, the inverse of the propensity score with Laplacian normalization is applied to debias neighbor aggregation from exposure bias. We validate the effectiveness of our approach through extensive experiments on two public and Amazon Alexa datasets, where performance improves by up to 14.2%.
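
One possible reading of "inverse propensity with Laplacian normalization" is sketched below for a single aggregation step on the user side. This is an assumption-laden illustration, not the authors' formulation: the edge weight 1/p, the symmetric normalization by weighted degrees, the function name, and the toy propensities are all hypothetical.

```python
import math
from collections import defaultdict


def ipw_aggregate(interactions, item_emb):
    """Aggregate item embeddings into user embeddings with inverse-propensity edge weights.

    interactions: iterable of (user, item, propensity) with propensity in (0, 1].
    item_emb: dict mapping item -> list of floats.
    Each edge gets weight w = 1 / propensity, then is scaled by
    1 / sqrt(deg(user) * deg(item)), where deg sums the edge weights
    (an assumed symmetric, Laplacian-style normalization).
    """
    weight = {}
    for user, item, propensity in interactions:
        if not 0.0 < propensity <= 1.0:
            raise ValueError("propensity must lie in (0, 1]")
        weight[(user, item)] = 1.0 / propensity

    deg_user, deg_item = defaultdict(float), defaultdict(float)
    for (user, item), w in weight.items():
        deg_user[user] += w
        deg_item[item] += w

    user_emb = {}
    for (user, item), w in weight.items():
        coeff = w / math.sqrt(deg_user[user] * deg_item[item])
        vec = user_emb.setdefault(user, [0.0] * len(item_emb[item]))
        for k, x in enumerate(item_emb[item]):
            vec[k] += coeff * x
    return user_emb


# Toy graph: item "b" was less likely to be exposed than item "a".
emb = ipw_aggregate(
    [("u", "a", 0.5), ("u", "b", 0.25)],
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
)
```

In this toy graph the rarely exposed item "b" contributes more to the user's aggregated representation than the frequently exposed item "a", which is the intended effect of weighting by inverse propensity.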

Context-aware Traffic Flow Forecasting in New Roads

This paper focuses on the problem of forecasting the daily traffic of new roads, where very little data is available for prediction. We propose a novel prediction model based on Generative Adversarial Networks (GANs) that learns the subtle patterns of changes in traffic flow according to various contextual factors. The trained generator then makes a prediction by generating realistic traffic flow data for a target new road given its weather and day type. Both the quantitative and qualitative results of our extensive experiments indicate the effectiveness of our method.

Is It Enough Just Looking at the Title?: Leveraging Body Text To Enrich Title Words Towards Accurate News Recommendation

In a news recommender system, a user tends to click on a news article if she is interested in its topic understood by looking at its title. Such a behavior is possible since, when viewing the title, humans naturally think of the contextual meaning of each title word by leveraging their own background knowledge. Motivated by this, we propose a novel personalized news recommendation framework CAST (Context-aware Attention network with a Selection module for Title word representation), which is capable of enriching title words by leveraging body text that fully provides the whole content of a given article as the context. Through extensive experiments, we demonstrate (1) the effectiveness of core modules in CAST, (2) the superiority of CAST over 9 state-of-the-art news recommendation methods, and (3) the interpretability with CAST.

EEG-Oriented Self-Supervised Learning and Cluster-Aware Adaptation

Recently, deep learning-based electroencephalogram (EEG) analysis and decoding have gained widespread attention to monitor a user's clinical condition or identify his/her intention/emotion. Nevertheless, the existing methods mostly model EEG signals with limited viewpoints or restricted concerns about the characteristics of the EEG signals, thus suffering from representing complex spatio-spectro-temporal patterns as well as inter-subject variability. In this work, we propose novel EEG-oriented self-supervised learning methods to discover complex and diverse patterns of spatio-spectral characteristics and spatio-temporal dynamics. Combined with the proposed self-supervised representation learning, we also devise a feature normalization strategy to resolve an inter-subject variability problem via clustering. We demonstrated the validity of the proposed framework on three publicly available datasets by comparing with state-of-the-art methods.

Neuron Specific Pruning for Communication Efficient Federated Learning

Federated Learning (FL) is a distributed training framework where a model is collaboratively trained over a set of clients without communicating their private data to the central server. However, each client shares the parameters of its local model. The first challenge faced by the FL is high communication cost due to the size of Deep Neural Network (DNN) models. Pruning is an efficient technique to reduce the number of parameters in DNN models, in which insignificant neurons are removed from the model. This paper introduces a federated pruning method based on Neuron Importance Scope Propagation (NISP) algorithm. The importance scores of output layer neurons are back-propagated layer-wise to every neuron in the network. The central server iteratively broadcasts the sparsified weights to all selected clients. Then, each participating client intermittently downloads the mask vector and reconstructs the weights in their original form. The locally updated model is pruned using the mask vector and shared with the server. After receiving model updates from each participating client, the server reconstructs and aggregates the weights. Experiments on MNIST and CIFAR10 datasets demonstrate that the proposed approach achieves accuracy close to Federated Averaging (FedAvg) algorithm with less communication cost.
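
The mask-based exchange described above can be illustrated with a small sketch. It is not the authors' implementation: the NISP score propagation is not reproduced, each neuron is reduced to a single scalar weight for brevity, and the helper names (`top_k_mask`, `sparsify`, `reconstruct`) are hypothetical.

```python
from typing import List


def top_k_mask(importance: List[float], keep_ratio: float) -> List[int]:
    """Binary mask keeping the neurons with the highest importance scores.

    The scores are assumed to be given; how NISP computes them is out of scope here.
    """
    k = max(1, round(len(importance) * keep_ratio))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(importance))]


def sparsify(weights: List[float], mask: List[int]) -> List[float]:
    """Drop pruned entries so that only the kept weights are transmitted."""
    return [w for w, m in zip(weights, mask) if m]


def reconstruct(sparse: List[float], mask: List[int]) -> List[float]:
    """Rebuild the full-length weight vector, filling pruned positions with zero."""
    values = iter(sparse)
    return [next(values) if m else 0.0 for m in mask]


importance = [0.9, 0.1, 0.5, 0.05]   # illustrative scores, one per neuron
weights = [1.5, -0.2, 0.7, 0.01]     # illustrative weights, one per neuron
mask = top_k_mask(importance, keep_ratio=0.5)
payload = sparsify(weights, mask)    # what would travel over the network
restored = reconstruct(payload, mask)
```

Only `payload` and, intermittently, `mask` need to be communicated in this sketch, which is where the saving over sending the dense weight vector would come from.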

Efficient Data Augmentation Policy for Electrocardiograms

We present the taxonomy of data augmentation for electrocardiogram (ECG) after reviewing various ECG augmentation methods. On the basis of the taxonomy, we demonstrate the effect of augmentation methods on the ECG classification via extensive experiments. Initially, we examine the performance trend as the magnitude of distortion increases and identify the optimal distortion magnitude. Secondly, we investigate the synergistic combinations of the transformations and identify the pairs of transformations with the greatest positive effect. Finally, based on our experimental findings, we propose an efficient augmentation policy and demonstrate that it outperforms previous augmentation policies.

A Multi-grained Dataset for News Event Triggered Knowledge Update

Keeping knowledge facts up to date is laborious and costly as the world rapidly changes and new information emerges every second. In this work, we introduce a novel task, news event triggered knowledge update. Given an existing article about a topic and a news event about the topic, the aim of our task is to generate an updated article according to the information from the news event. We create a multi-grained dataset for the investigation of our task. The articles from Wikipedia are collected and aligned with news events at multiple language units, including the citation text, the first paragraph, and the full content of the news article. Baseline models are also explored at three levels of knowledge update, including the first paragraph, the summary, and the full content of the knowledge facts.

A Hierarchical User Behavior Modeling Framework for Cross-Domain Click-Through Rate Prediction

Click-through rate (CTR) prediction is a long-standing problem in advertising systems. Existing single-domain CTR prediction methods suffer from the data sparsity problem since few users can click advertisements on many items. Recently, cross-domain CTR prediction has leveraged the relatively richer information from a source domain to improve the performance on a target domain with sparser information, but it cannot explicitly capture users' diverse interests in different domains. In this paper, we propose a novel hierarchical user behavior modeling framework for cross-domain CTR prediction, named HBMNet. HBMNet contains two main components: an element-wise behavior transfer (EWBT) layer and a user representation layer. The EWBT layer transfers the information collected from one domain through element-level masks to dynamically highlight the informative elements in another domain. The user representation layer performs behavior-level attention between these behavior representations and the ranking item representation. Extensive experimental results on two cross-domain datasets show that the proposed HBMNet outperforms state-of-the-art models.

Do Simpler Statistical Methods Perform Better in Multivariate Long Sequence Time-Series Forecasting?

Long sequence time-series forecasting has become a central problem in multivariate time-series analysis due to its difficulty of consistently maintaining low prediction errors. Recent research has concentrated on developing large deep learning frameworks such as Informer and SCINet with remarkable results. However, these complex approaches were not benchmarked with simpler statistical methods and hence this part of the puzzle is missing for multivariate long sequence time-series forecasting (MLSTF). We investigate two simple statistical methods for MLSTF and provide analysis to indicate that linear regression owns a lower upper bound of error than deep learning methods and SNaive can act as an effective nonparametric method with unpredictable trends. Evaluations across six real-world datasets demonstrate that linear regression and SNaive are able to achieve state-of-the-art performance for MLSTF.
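
For readers unfamiliar with SNaive, the seasonal naive baseline simply repeats the most recent full season. The sketch below shows the univariate case; the function name and the toy series are illustrative, and in a multivariate setting the same rule would be applied to each series independently (an assumption, since the abstract does not spell this out).

```python
from typing import List


def seasonal_naive_forecast(history: List[float], season_length: int, horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead by cycling through the last observed season."""
    if season_length <= 0:
        raise ValueError("season_length must be positive")
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    # Step h (0-based) reuses the value observed at the same position in the last season.
    return [last_season[h % season_length] for h in range(horizon)]


history = [10, 20, 30, 12, 22, 32]   # two seasons of length 3
forecast = seasonal_naive_forecast(history, season_length=3, horizon=5)
print(forecast)  # [12, 22, 32, 12, 22]
```

The method has no trainable parameters, which is why it serves as a useful sanity check before a deep model is credited with an improvement.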

Cooperative Max-Pressure Enhanced Traffic Signal Control

Adaptive traffic signal control is an important and challenging real-world problem that fits well with the task framework of deep reinforcement learning. As one of the critical design elements, the environmental state plays a crucial role in traffic signal control decisions. The state definitions of most existing works contain lane-level queue length, intersection phase, and other features. However, these state representations are heuristically designed, which results in highly sensitive and unstable performance of subsequent actions. To cope with this problem, this paper proposes Cooperative Max-Pressure enhanced State Learning for traffic signal control (CMP-SL), which is inspired by the advanced pressure definition for an intersection in the transportation field. First, CMP-SL explicitly extends the cooperative max-pressure to the state definition of a target intersection, aiming to obtain accurate environment information by including the traffic pressures of surrounding intersections. Then, a graph attention mechanism (GAT) is used to learn the state representation of the target intersection in our spatial-temporal state module. Second, since the state is coupled with the reward in reinforcement learning, our method incorporates the cooperative max-pressure of the target intersection into the reward definition. Furthermore, a temporal convolutional network (TCN) based sequence model is used to capture the historical state of traffic flow. The historical spatial-temporal features and the current spatial state features are concatenated and fed into a DQN network to predict the Q value and generate each phase action. Finally, experiments with two real-world traffic datasets demonstrate that our method achieves shorter vehicle average times and higher network throughput than the state-of-the-art models.
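
As background for the pressure notion mentioned above, the sketch below shows a simplified form of classic max-pressure control for a single intersection. It is not the CMP-SL model: the cooperative extension, GAT, TCN, and DQN components are not reproduced, and the lane names, the queue counts, and the definition of movement pressure as incoming queue minus outgoing queue are simplifying assumptions.

```python
from typing import Dict, List, Tuple

Movement = Tuple[str, str]  # (incoming lane, outgoing lane)


def phase_pressure(movements: List[Movement], queue: Dict[str, int]) -> int:
    """Sum of (incoming queue - outgoing queue) over the movements a phase serves."""
    return sum(queue[incoming] - queue[outgoing] for incoming, outgoing in movements)


def max_pressure_phase(phases: Dict[str, List[Movement]], queue: Dict[str, int]) -> str:
    """Pick the phase whose served movements carry the largest total pressure."""
    return max(phases, key=lambda name: phase_pressure(phases[name], queue))


# Illustrative vehicle counts per lane (hypothetical data).
queue = {
    "N_in": 8, "S_in": 6, "E_in": 2, "W_in": 3,
    "N_out": 1, "S_out": 2, "E_out": 4, "W_out": 0,
}
phases = {
    "NS": [("N_in", "S_out"), ("S_in", "N_out")],
    "EW": [("E_in", "W_out"), ("W_in", "E_out")],
}
chosen = max_pressure_phase(phases, queue)
print(chosen, phase_pressure(phases["NS"], queue), phase_pressure(phases["EW"], queue))
```

Here the north-south phase is selected because its movements drain long queues into nearly empty downstream lanes.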

An Exploratory Study of Information Cocoon on Short-form Video Platform

In recent years, short-form video platforms have emerged rapidly and attracted a large and wide variety of users, with the help of advanced recommendation algorithms. Despite the great success, the algorithms have caused some negative effects, such as information cocoons and algorithmic unfairness. In this work, we focus on the information cocoon, which measures the overwhelming homogeneity of users' video consumption. Specifically, we conduct an exploratory study of this phenomenon on a top short-form video platform, with one-year behavioral records of new users. First, we evaluate the evolution of users' information cocoons and find that the diversity of video content that users consume is limited. In addition, we further explore user cocoons via correlation analysis from three aspects: user demographics, video content, and user-recommender interactions driven by algorithms and user preferences. Correspondingly, we observe that video content plays a more significant role in affecting user cocoons than demographics does. In terms of user-recommender interactions, more accurate personalization does not necessarily contribute to more severe information cocoons, while users with narrow preferences are more likely to be trapped. In summary, our study illuminates the current concern that information cocoons may hurt user experience on short-form video platforms, and offers potential directions for mitigation implied by the correlation analysis.

Prototypical Contrastive Learning and Adaptive Interest Selection for Candidate Generation in Recommendations

Deep Candidate Generation plays an important role in large-scale recommender systems. It takes user history behaviors as inputs and learns user and item latent embeddings for candidate generation. In the literature, conventional methods suffer from two problems. First, a user has multiple embeddings to reflect various interests, and such number is fixed. However, taking into account different levels of user activeness, a fixed number of interest embeddings is sub-optimal. For example, for less active users, they may need fewer embeddings to represent their interests compared to active users. Second, the negative samples are often generated by strategies with unobserved supervision, and similar items could have different labels. Such a problem is termed as class collision. In this paper, we aim to advance the typical two-tower DNN candidate generation model. Specifically, an Adaptive Interest Selection Layer is designed to learn the number of user embeddings adaptively in an end-to-end way, according to the level of their activeness. Furthermore, we propose a Prototypical Contrastive Learning Module to tackle the class collision problem introduced by negative sampling. Extensive experimental evaluations show that the proposed scheme remarkably outperforms competitive baselines on multiple benchmarks.

Dual-Augment Graph Neural Network for Fraud Detection

Graph Neural Networks (GNNs) have drawn attention due to their excellent performance in fraud detection tasks, which reveal fraudsters by aggregating the features of their neighbors. However, some fraudsters typically tend to alleviate their suspiciousness by connecting with many benign ones. Besides, label-imbalanced neighborhood also deteriorates fraud detection accuracy. Such behaviors violate the homophily assumption and worsen the performance of GNN-based fraud detectors. In this paper, we propose a Dual-Augment Graph Neural Network (DAGNN) for fraud detection tasks. In DAGNN, we design a two-pathway framework including disparity augment (DA) pathway and similarity augment (SA) pathway. Accordingly, we devise two novel information aggregation strategies. One is to augment the disparity between target node and its heterogenous neighbors in original topology. The other is to augment its similarity to homogenous neighbors in a relatively label-balanced neighborhood. The experimental results compared with the state-of-the-art models on two real-world datasets demonstrate the superiority of the proposed DAGNN.

CNewsTS - A Large-scale Chinese News Dataset with Hierarchical Topic Category and Summary

In this paper, we present a large Chinese news article dataset with 4.4 million articles. These articles are obtained from different news channels and sources. They are labeled with multi-level topic categories, and some of them also have summaries. This is the first Chinese news dataset that has both hierarchical topic labels and article full texts. And it is also the largest Chinese news topic dataset. We describe the data collection, annotation and quality evaluation process. The basic statistics of the dataset, comparison with other datasets and benchmark experiments are also presented.

SmartQuery: An Active Learning Framework for Graph Neural Networks through Hybrid Uncertainty Reduction

Graph neural networks have achieved significant success in representation learning. However, the performance gains come at a cost: acquiring comprehensive labeled data for training can be prohibitively expensive. Active learning mitigates this issue by searching the unexplored data space and prioritizing the selection of data to maximize the model's performance gain. In this paper, we propose SMARTQUERY, a novel framework to learn a graph neural network with very few labeled nodes using a hybrid uncertainty reduction function. This is achieved using two key steps: (a) designing a multi-stage active graph learning framework by exploiting diverse explicit graph information and (b) introducing label propagation to efficiently exploit known labels to assess the implicit embedding information. Using a comprehensive set of experiments on three network datasets, we demonstrate the competitive performance of our method against state-of-the-art methods on very few labeled data (up to 5 labeled nodes per class).

An Extreme Semi-supervised Framework Based on Transformer for Network Intrusion Detection

Network intrusion detection (NID) aims to detect various network attacks and is an important task for guaranteeing network security. However, existing NID methods usually require a large amount of labeled data for training, which is impractical in many real application scenarios due to the high cost. To address this issue, we propose an extreme semi-supervised framework based on transformer (ESeT) for NID. ESeT first develops a multi-level feature extraction module to learn both packet-level byte-encoded features and flow-level frequency-domain features to enrich the information for detection. Then, during semi-supervised learning, ESeT uses a dual-encoding transformer to fuse the extracted features for intrusion detection and introduces a credibility selector to reduce the negative impact of incorrect pseudo-labeling of unlabeled data. The experimental results show that our model achieves excellent performance (F1-score: 97.60%) with only a small proportion of labeled data (1%) on the CIC-IDS2017 and CSE-CIC-IDS2018 datasets.

Heterogeneous Hypergraph Neural Network for Friend Recommendation with Human Mobility

Friend recommendation from human mobility is a vital real-world application of location-based social networks (LBSN). It is necessary to recognize patterns from human mobility to assist friend recommendation because previous works have shown complex relations between them. However, most previous works either modelled social networks and user trajectories separately, or only used classical simple graph-based methods with an edge linking two nodes, which cannot fully model the complex data structure of LBSN. Inspired by the fact that hyperedges can connect multiple nodes of different types, we model user trajectories and check-in records as hyperedges in a novel heterogeneous LBSN hypergraph to represent complex spatio-temporal information. We then design a type-specific attention mechanism for an end-to-end trainable heterogeneous hypergraph neural network (HHGNN) with supervised contrastive learning, which can learn hypergraph node embeddings for the subsequent friend recommendation task. Finally, our model HHGNN outperforms the state-of-the-art methods on four real-world city datasets, while ablation studies also confirm the effectiveness of each model part.

Relation-aware Blocking for Scalable Recommendation Systems

Recommender systems contain rich relation information. The multiple relations in a recommender system form a heterogeneous information network. How to efficiently find similar users and items based on hop-n relations in heterogeneous information networks is one significant challenge to develop scalable recommender systems in the era of big data. Hashing has been popularly used for dimensionality reduction and data size reduction. Current hashing techniques mainly focus on hashing for directly related (i.e. hop-1) features. This paper proposes to develop relation-aware hashing techniques to bridge this gap. The proposed approaches use locality sensitive hashing (LSH) and consider hop-n relations in an information network to construct user or item blocks. They help facilitate efficient neighborhood formation and recommendation making. The experiments conducted on a large-scale real-life dataset show that the proposed approaches are effective.
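
The idea of blocking on hop-n relations can be sketched as follows. The abstract does not say which LSH family is used, so MinHash (an LSH family for Jaccard similarity) is assumed here; the function names, the integer node ids, and the toy user, item, and category graph are all hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]
PRIME = 2_147_483_647  # modulus for the (a * x + b) % PRIME hash functions


def nodes_at_hop(graph: Graph, start: int, n: int) -> Set[int]:
    """Nodes reached from `start` by following exactly n edges."""
    frontier = {start}
    for _ in range(n):
        frontier = {nxt for node in frontier for nxt in graph.get(node, [])}
    return frontier


def minhash_signature(items: Set[int], params: List[Tuple[int, int]]) -> Tuple[int, ...]:
    """MinHash signature: sets with high Jaccard similarity tend to share it."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in params)


def build_blocks(graph: Graph, nodes: List[int], n: int, num_hashes: int = 4, seed: int = 0):
    """Group nodes whose hop-n neighbourhoods hash to the same signature."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]
    blocks = defaultdict(list)
    for node in nodes:
        neighbourhood = nodes_at_hop(graph, node, n)
        if neighbourhood:  # nodes with no hop-n relation are left unblocked
            blocks[minhash_signature(neighbourhood, params)].append(node)
    return dict(blocks)


# Users 1-3 -> items 10-12 -> categories 100-101 (directed edges, toy data).
graph = {1: [10], 2: [11], 3: [12], 10: [100], 11: [100], 12: [101]}
blocks = build_blocks(graph, nodes=[1, 2, 3], n=2)
block_members = sorted(sorted(members) for members in blocks.values())
print(block_members)  # [[1, 2], [3]]
```

Users 1 and 2 share no item, yet they fall into the same block because their hop-2 relations lead to the same category. Neighbour search can then be restricted to a block instead of the whole user set.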

Invariance Testing and Feature Selection Using Sparse Linear Layers

Machine learning testing and evaluation are largely overlooked by the community. In many cases, the only way to conduct testing is through formula-based scores, e.g., accuracy and F1. However, these simple statistical scores cannot fully represent the performance of an ML model. Therefore, new testing frameworks are attracting more attention. In this work, we propose a novel invariance testing approach that does not utilise traditional statistical scores. Instead, we train a series of sparse linear layers, which are easier to compare due to their sparsity. We then use different divergence functions to numerically compare them and fuse the difference scores into a visual matrix. Additionally, testing using sparse linear layers allows us to introduce a novel testing oracle, associativity, which compares merged weights with weights obtained by combined augmentation. We then assess whether a model is invariant by checking the visual matrix, the associativity, and its sparse layers. Finally, we also show that this testing approach can potentially provide an actionable item for feature selection.

JavaScript&Me, A Tool to Support Research into Code Transformation and Browser Security

Doing research into code variations and their applications to browser security is challenging. One of the most important aspects of this research is to choose a relevant dataset on which machine learning algorithms can be applied to yield useful results. Although JavaScript code is widely available on various sources, such as package managers, code hosting platforms, and websites, collecting a large corpus of JavaScript and curating it is not a simple task. We present a novel open-source tool that helps with this task by allowing the automatic and systematic collection, processing, and transformation of JavaScript code. These three steps are performed by independent modules, and each one can be extended to incorporate new features, such as additional code sources, or transformation tools, adding to the flexibility of our tool and expanding its usability. Additionally, we use our tool to create a corpus of around 270k JavaScript files, including regular, minified, and obfuscated code, on which we perform a brief analysis. The conclusions from this analysis show the importance of properly curating a dataset before using it in research tasks, such as machine learning classifiers, reinforcing the relevance of our tool.

Knowledge Distillation via Hypersphere Features Distribution Transfer

Knowledge distillation (KD) is a widely applicable DNN (Deep Neural Network) compression technology, which aims to transfer knowledge from a pretrained teacher neural network to a target student neural network. In practice, an enormous teacher is extracted through the compression of a neural network to train a relatively compact student. In general, current KD approaches mostly minimize divergence between the intermediate layers or logits of the teacher network and student network. However, these methods ignore important features distribution in the teacher network space, which leads to the defect of current KD approaches in the fine-grained categorization task, e.g., metric learning. For this, we propose a novel approach that transfers features distribution in the hyperspherical space from the teacher network to the student network. Specifically, our approach facilitates the student to learn the distribution among samples in the teacher and reduces the intra-class variance. Extensive experimental evaluations on three well-known metric learning datasets show that our method can distill higher-level knowledge from the teacher network and achieve state-of-the-art performance.

Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima

Learning rate is one of the most important hyper-parameters and has a significant influence on neural network training. Learning rate schedules are widely used in practice to adjust the learning rate according to pre-defined schedules for fast convergence and good generalization. However, existing learning rate schedules are all heuristic algorithms and lack theoretical support. Therefore, people usually choose learning rate schedules through multiple ad-hoc trials, and the obtained learning rate schedules are sub-optimal. To boost the performance of the obtained sub-optimal learning rate schedule, we propose a generic learning rate schedule plugin, called LEArning Rate Perturbation (LEAP), which can be applied to various learning rate schedules to improve model training by introducing a certain perturbation to the learning rate. We found that, with such a simple yet effective strategy, the training process exponentially favors flat minima over sharp minima with guaranteed convergence, which leads to better generalization ability. In addition, we conduct extensive experiments which show that training with LEAP can improve the performance of various deep learning models on diverse datasets using various learning rate schedules (including a constant learning rate).
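
To make the plugin idea concrete, the sketch below perturbs a constant learning rate during gradient descent on a toy quadratic. The abstract does not state the form of the perturbation, so the multiplicative Gaussian noise, the `sigma` value, and the function names are assumptions made purely for illustration.

```python
import random


def perturbed_lr(base_lr: float, rng: random.Random, sigma: float = 0.1,
                 min_lr: float = 1e-8) -> float:
    """Return the scheduled learning rate with multiplicative Gaussian noise.

    Assumed form: lr * (1 + eps), eps ~ N(0, sigma^2), clipped to stay positive.
    """
    noisy = base_lr * (1.0 + rng.gauss(0.0, sigma))
    return max(noisy, min_lr)


def train_quadratic(steps: int = 100, base_lr: float = 0.1, seed: int = 0) -> float:
    """Gradient descent on f(w) = w**2 using the perturbed learning rate."""
    rng = random.Random(seed)
    w = 1.0
    for _ in range(steps):
        grad = 2.0 * w                        # derivative of w**2
        w -= perturbed_lr(base_lr, rng) * grad
    return w


final_w = train_quadratic()
print(abs(final_w) < 1e-3)
```

Because the perturbation wraps whatever value the schedule produces, the same function could be applied to a decayed or warm-up learning rate without changing the schedule itself.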

Efficient Non-sampling Expert Finding

Expert finding aims at seeking potential users to answer new questions in Community Question Answering (CQA) websites. Most existing methods focus on designing matching frameworks between questions and experts, and rely on negative sampling technology for model training. However, sampling would lose lots of useful information about experts and questions, and make these sampling-based methods suffer the bias and non-robust issues, which may lead to an insufficient matching performance for expert findings. In this paper, we propose a novel Efficient Non-sampling Expert Finding model, named ENEF, which could learn accurate representations of questions and experts from whole training data. In our approach, we adopt a rather basic question encoder and a simple matching framework, then an efficient whole-data optimization method is elaborately designed to learn the model parameters without negative sampling with rather a low space and time complexity. Extensive experimental results on four real-world CQA datasets demonstrate that our model ENEF could achieve better performance and faster training efficiency than existing state-of-the-art expert finding methods.

ExpertBert: Pretraining Expert Finding

Expert finding is an important task in Community Question Answering (CQA) platforms, which could help route questions to users with the potential expertise to answer them. The key is to accurately model the question content and the experts based on their historical answered questions. Recently, Pretrained Language Models (PLMs, e.g., Bert) have shown superior text modeling ability and have been used preliminarily in expert finding. However, most PLM-based models focus on the corpus or document granularity during pretraining, which is inconsistent with the downstream expert modeling and finding task. In this paper, we propose an expert-level pretraining language model named ExpertBert, aiming to model questions, experts, and question-expert matching effectively in a pretraining manner. In our approach, we aggregate the historical answered questions of an expert as the expert-specific input. In addition, we integrate the target question into the input and design a label-augmented Masked Language Model (MLM) task to further capture the matching pattern between questions and experts, which makes the pretraining objectives more closely resemble the downstream expert finding task. Experimental results and detailed analysis on real-world CQA datasets demonstrate the effectiveness of ExpertBert.

Embedding Global and Local Influences for Dynamic Graphs

Graph embedding is becoming increasingly popular due to its ability to represent large-scale graph data by mapping nodes to a low-dimensional space. Current research usually focuses on transductive learning, which aims to generate fixed node embeddings by training on the whole graph. However, a dynamic graph changes constantly with new node additions and interactions. Unlike transductive learning, inductive learning attempts to dynamically generate node embeddings over time even for unseen nodes, which is more suitable for real-world applications. Therefore, we propose an inductive dynamic graph embedding method called AGLI, which aggregates global and local influences. We propose an aggregator function that integrates global influence with local influence to generate node embeddings at any time. We conduct extensive experiments on several real-world datasets and compare AGLI with several state-of-the-art baseline methods on various tasks. The experimental results show that AGLI achieves better performance than the state-of-the-art baseline methods.

Memory Augmented Graph Learning Networks for Multivariate Time Series Forecasting

Multivariate time series (MTS) forecasting is a challenging task. In MTS forecasting, we need to consider both intra-series temporal correlations and inter-series spatial correlations simultaneously. However, existing methods capture spatial correlations from the local data of the time series, without taking the global historical information of the time series into account. In addition, most methods based on graph neural networks tend to retain redundant information at adjacent time points when mining temporal correlations, which introduces noise. In this paper, we propose a memory augmented graph learning network (MAGL), which captures the spatial correlations in terms of the global historical features of MTS. Specifically, we use a memory unit to learn from the local data of MTS. The memory unit records the global historical features of the time series, which are used to mine the spatial correlations. We also design a temporal feature distiller to reduce the noise in extracting temporal features. We extensively evaluate our model on four real-world datasets, comparing it with several state-of-the-art methods. The experimental results show that MAGL outperforms the state-of-the-art baseline methods on several datasets.

MomNet: Gender Prediction using Mechanism of Working Memory

In social media analysis, gender prediction is one of the most important tasks of user profiling. Web users often post messages in a timeline manner to record their living moments. These messages, containing texts and images, constitute long multi-modal data that potentially represents the living style, preferences, or opinions of users. Therefore, it is feasible to predict the gender of a user by utilizing such living moments. However, the rich modalities (time, length, text, and image) of living moments pose difficult challenges and have not been fully exploited by the research communities for practical applications. To this end, we propose a novel gender prediction framework based on user-posted living Moments (MomNet). MomNet mainly consists of a moment memory module and a central executive module inspired by two characteristics of working memory theory. One is that humans can associate related information to facilitate memory. Our moment memory module aggregates similar uni-modal moments of a user to form different chunks and encodes the chunks into moment memory representations. The other is that humans coordinate information from different modalities to make judgments. Our central executive module is designed to coordinate comprehensive attentions of moment memory representations from texts, images, and their combinations. Finally, a softmax classifier is used to predict gender. Extensive experiments conducted on a real-world public dataset show that our framework achieves 86.63% accuracy and outperforms all state-of-the-art methods in terms of accuracy.

Meta-Reinforcement Learning for Multiple Traffic Signals Control

Despite the recent success of reinforcement learning (RL) in traffic signal control, where it has been shown to outperform conventional control methods, current RL-based methods require large amounts of samples to learn and lack the ability to generalize to a new environment. To solve these problems, we propose a new context-based meta-RL model that disentangles task inference and control, which improves the meta-training efficiency and accelerates the learning process in a new environment. Moreover, the Graph Attention Network is employed to achieve effective cooperation between intersections. The experiments show that our method not only improves traffic control efficiency but also converges faster and performs more stably, compared with traditional, RL-based, and meta-RL-based traffic control methods.

Sampling Enclosing Subgraphs for Link Prediction

Link prediction is a fundamental problem for graph-structured data (e.g., social networks, drug side-effect networks, etc.). Graph neural networks have offered robust solutions for this problem, specifically by learning the representation of the subgraph enclosing the target link (i.e., pair of nodes). However, these solutions do not scale well to large graphs as extraction and operation on enclosing subgraphs are computationally expensive. This paper presents a scalable link prediction solution, that we call ScaLed, which utilizes sparse enclosing subgraphs to make predictions. To extract sparse enclosing subgraphs, ScaLed takes multiple random walks from a target pair of nodes, then operates on the sampled enclosing subgraph induced by all visited nodes. By leveraging the smaller sampled enclosing subgraph, ScaLed can scale to larger graphs with much less overhead while maintaining high accuracy. Through comprehensive experiments, we have shown that ScaLed can produce comparable accuracy to those reported by the existing subgraph representation learning frameworks while being less computationally demanding.
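
The sampling step described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the ScaLed implementation: the graph is a plain adjacency dictionary, the function name and parameters are hypothetical, and details such as node labeling or removing the target link before learning are omitted.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists


def sample_enclosing_subgraph(graph: Graph, u: int, v: int, num_walks: int,
                              walk_length: int, rng: random.Random
                              ) -> Tuple[Set[int], Set[Tuple[int, int]]]:
    """Run random walks from both endpoints and induce a subgraph on visited nodes."""
    visited = {u, v}
    for start in (u, v):
        for _ in range(num_walks):
            node = start
            for _ in range(walk_length):
                neighbours = graph.get(node, [])
                if not neighbours:
                    break  # dead end: stop this walk early
                node = rng.choice(neighbours)
                visited.add(node)
    # Keep every original edge whose two endpoints were both visited.
    edges = {(a, b) for a in visited for b in graph.get(a, []) if b in visited and a < b}
    return visited, edges


# A path graph 0 - 1 - 2 - 3 - 4 (toy data).
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nodes, edges = sample_enclosing_subgraph(graph, u=1, v=2, num_walks=2,
                                         walk_length=2, rng=random.Random(0))
```

The subgraph can contain at most `2 + 2 * num_walks * walk_length` nodes, so its size is controlled by the walk parameters rather than by the density of the k-hop neighbourhood.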

PyKale: Knowledge-Aware Machine Learning from Multiple Sources in Python

PyKale is a Python library for knowledge-aware machine learning from multiple sources of data to enable and accelerate interdisciplinary research. It embodies green machine learning principles to reduce repetitions/redundancy, reuse existing resources, and recycle learning models across areas. We propose a pipeline-based application programming interface (API) so all machine learning workflows follow a standardized six-step pipeline. PyKale focuses on leveraging knowledge from multiple sources for accurate and interpretable prediction, particularly multimodal learning and transfer learning. To be more accessible, it separates code and configurations to enable non-programmers to configure systems without coding. PyKale is officially part of the PyTorch ecosystem and includes interdisciplinary examples in bioinformatics, knowledge graph, image/video recognition, and medical imaging.

Scalable Multiple Kernel k-means Clustering

With its simplicity and effectiveness, k-means is immensely popular, but it cannot perform well on complex nonlinear datasets. Multiple kernel k-means (MKKM) demonstrates the ability to describe highly complex, nonlinearly separable data structures. However, its running time does not scale well once the data size grows beyond tens of thousands of samples. The explosion of digital data calls for more scalable clustering methods that are easy to use in machine learning tasks. To address the issue, we propose to employ the Nyström scheme for MKKM clustering, termed scalable multiple kernel k-means clustering. It significantly reduces the computational complexity by replacing the original kernel matrix with a low-rank approximation. Analytically and empirically, we demonstrate that our method performs as well as existing state-of-the-art methods, but at a significantly lower compute cost, allowing us to scale the method more effectively for clustering tasks.
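For reference, the standard Nyström low-rank approximation of an n-by-n kernel matrix K can be written as follows. This is the textbook form, given as a sketch; the abstract does not state which variant the paper uses. Here C holds m sampled columns of K, W is the m-by-m block of K at the sampled indices, and W^{+} is its pseudo-inverse.

```latex
K \approx \tilde{K} = C\, W^{+} C^{\top},
\qquad C \in \mathbb{R}^{n \times m},\quad W \in \mathbb{R}^{m \times m},\quad m \ll n
```

Only C and W need to be stored, so memory drops from O(n^2) to O(nm).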

Self-Paced and Discrete Multiple Kernel k-Means

Multiple Kernel K-means (MKKM) uses various kernels from different sources to improve clustering performance. However, most of the existing models are non-convex and are therefore prone to getting stuck in bad local optima, especially in the presence of noise and outliers. To address the issue, we propose a novel Self-Paced and Discrete Multiple Kernel K-Means (SPD-MKKM). It learns the MKKM model in a meaningful order by progressing both samples and kernels from easy to complex, which helps to avoid bad local optima. In addition, whereas existing methods optimize in two stages, first learning the relaxed matrix and then finding the discrete one through an extra discretization step, our work directly obtains the discrete cluster indicator matrix without any extra process. Moreover, a well-designed alternating optimization based on coordinate descent is employed to reduce the overall computational complexity. Finally, thorough experiments on real-world datasets illustrate the effectiveness of our method.

Personalized Federated Recommendation via Joint Representation Learning, User Clustering, and Model Adaptation

Federated recommendation applies federated learning techniques in recommendation systems to help protect user privacy by exchanging models instead of raw user data between user devices and the central server. Due to the heterogeneity in users' attributes and local data, attaining personalized models is critical to improving federated recommendation performance. In this paper, we propose a Graph Neural Network based Personalized Federated Recommendation (PerFedRec) framework via joint representation learning, user clustering, and model adaptation. Specifically, we construct a collaborative graph and incorporate attribute information to jointly learn the representation through a federated GNN. Based on these learned representations, we cluster users into different user groups and learn personalized models for each cluster. Then each user learns a personalized model by combining the global federated model, the cluster-level federated model, and the user's fine-tuned local model. To alleviate the heavy communication burden, we intelligently select a few representative users (instead of randomly picked users) from each cluster to participate in training. Experiments on real-world datasets show that our proposed method achieves superior performance over existing methods.
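One simple way to picture the model-adaptation step is a convex combination of the three parameter vectors. The abstract does not specify how the models are combined, so the weighted sum, the function name `combine_models`, and the weights below are assumptions made purely for illustration.

```python
def combine_models(global_w, cluster_w, local_w, alphas=(0.5, 0.25, 0.25)):
    """Blend global, cluster-level, and local parameters (hypothetical weighting)."""
    a_g, a_c, a_l = alphas
    if abs(a_g + a_c + a_l - 1.0) > 1e-9:
        raise ValueError("combination weights must sum to 1")
    return [a_g * g + a_c * c + a_l * l
            for g, c, l in zip(global_w, cluster_w, local_w)]

# Toy two-parameter models (hypothetical values).
personalized = combine_models([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
```

Setting the local weight to zero recovers a purely cluster-level model, and setting both the cluster and local weights to zero recovers the shared global model.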

Urban Region Profiling via Multi-Graph Representation Learning

Profiling urban regions is essential for urban analytics and planning. Although existing studies have made great efforts to learn urban region representations from multi-source urban data, there are still limitations in modelling local-level signals, developing an effective yet integrated fusion framework, and performing well in regions with high-variance socioeconomic attributes. Thus, we propose a multi-graph representation learning framework, called Region2Vec, for urban region profiling. Specifically, in addition to encoding human mobility for inter-region relations, geographic neighborhood is introduced for capturing geographical contextual information while POI side information is adopted for representing intra-region information. Then, graphs are used to capture accessibility, vicinity, and functionality correlations among regions. An encoder-decoder multi-graph fusion module is further proposed to jointly learn comprehensive representations. Experiments on real-world datasets show that Region2Vec can be employed in three applications and outperforms all state-of-the-art baselines. In particular, Region2Vec performs better than previous studies in regions with high-variance socioeconomic attributes.

See Clicks Differently: Modeling User Clicking Alternatively with Multi Classifiers for CTR Prediction

Many recommender systems optimize click-through rates (CTRs) as one of their core goals, which further breaks down into predicting each item's click probability for a user (user-item click probability) and recommending the top ones to this particular user. User-item click probability is then estimated as a single term, and the basic assumption is that the user has different preferences over items. This is presumably true, but from real-world data, we observe that some people are naturally more active in clicking on items while some are not. This intrinsic tendency contributes to their user-item click probabilities. Besides this, when a user sees a particular item she likes, the click probability for this item increases due to this user-item preference.

Therefore, instead of estimating the user-item click probability directly, we break it down into two finer attributes: user's intrinsic tendency of clicking and user-item preference. Inspired by studies that emphasize item features for overall enhancements and research progress in multi-task learning, we for the first time design a Multi Classifier Click Rate prediction model (MultiCR) to better exploit item-level information by building a separate classifier for each item. Furthermore, in addition to utilizing static user features, we learn implicit connections between user's item preferences and the often-overlooked indirect user behaviors (e.g., click histories from other services within the app). In a common new-campaign/new-service scenario, MultiCR outperforms various baselines in large-scale offline and online experiments and demonstrates good resilience when the amount of training data decreases.

A Prerequisite Attention Model for Knowledge Proficiency Diagnosis of Students

With the rapid development of intelligent education platforms, how to enhance the performance of diagnosing students' knowledge proficiency has become an important issue, e.g., by incorporating the prerequisite relation of knowledge concepts. Unfortunately, the differentiated influence of different predecessor concepts on successor concepts is still underexplored in existing approaches. To this end, we propose a Prerequisite Attention model for Knowledge Proficiency diagnosis of students (PAKP) to learn the attentive weights of precursor concepts on successor concepts and use them to infer knowledge proficiency. Specifically, given the student response records and knowledge prerequisite graph, we design an embedding layer to output the representations of students, exercises, and concepts. Influence coefficients among concepts are calculated via an efficient attention mechanism in a fusion layer. Finally, the performance of each student is predicted based on the mined student and exercise factors. Extensive experiments on real-world datasets demonstrate that PAKP exhibits great efficiency and interpretability advantages without accuracy loss.

Curriculum Contrastive Learning for Fake News Detection

Due to the rapid spread of fake news on social media, society and the economy have been negatively affected in many ways. How to effectively identify fake news is a challenging problem that has received great attention from academia and industry. Existing deep learning methods for fake news detection require a large amount of labeled data to train the model, but obtaining labeled data is a time-consuming and labor-intensive process. To extract useful information from a large amount of unlabeled data, some contrastive learning methods for fake news detection have been proposed. However, existing contrastive learning methods only randomly sample negative samples at different training stages, so the negative samples are not fully exploited. Intuitively, gradually increasing the contrastive difficulty of negative samples, in a way similar to human learning, will help improve the performance of the model. Inspired by the idea of curriculum learning, we propose a curriculum contrastive model (CCFD) for fake news detection which automatically selects and trains on negative samples of different difficulty at different training stages. Furthermore, we also propose three new augmentation methods which consider the importance of edges and node attributes in the propagation structure to obtain more effective positive samples. The experimental results on three public datasets show that our model CCFD outperforms the existing state-of-the-art models for fake news detection.

A Contrastive Pre-training Approach to Discriminative Autoencoder for Dense Retrieval

Dense retrieval (DR) has shown promising results in information retrieval. In essence, DR requires high-quality text representations to support effective search in the representation space. Recent studies have shown that pre-trained autoencoder-based language models with a weak decoder can provide high-quality text representations, boosting the effectiveness and few-shot ability of DR models. However, even a weak autoregressive decoder has a bypass effect on the encoder. More importantly, the discriminative ability of learned representations may be limited since each token is treated as equally important in decoding the input texts. To address the above problems, in this paper, we propose a contrastive pre-training approach to learn a discriminative autoencoder with a lightweight multi-layer perceptron (MLP) decoder. The basic idea is to generate word distributions of input text in a non-autoregressive fashion and pull the word distributions of two masked versions of one text close while pushing away from others. We theoretically show that our contrastive strategy can suppress the common words and highlight the representative words in decoding, leading to discriminative representations. Empirical results show that our method can significantly outperform the state-of-the-art autoencoder-based language models and other pre-trained models for dense retrieval.

Robustness of Sketched Linear Classifiers to Adversarial Attacks

Linear classifiers are well known to be vulnerable to adversarial attacks: they may predict incorrect labels for input data that are adversarially modified with small perturbations. However, this phenomenon has not been properly understood in the context of sketch-based linear classifiers, typically used in memory-constrained paradigms, which rely on random projections of the features for model compression. In this paper, we propose novel Fast Gradient Sign Method (FGSM) attacks for sketched classifiers in full, partial, and black-box information settings with regard to their internal parameters. We perform extensive experiments on the MNIST dataset to characterize their robustness as a function of perturbation budget. Our results suggest that, in the full-information setting, these classifiers are less accurate on unaltered input than their uncompressed counterparts but just as susceptible to adversarial attacks. However, in more realistic partial and black-box information settings, sketching improves robustness while having a lower memory footprint.
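As background, the basic FGSM step against an ordinary (uncompressed) linear classifier can be sketched in a few lines. This is the generic full-information attack under a logistic loss with labels in {-1, +1}, not the paper's attacks on sketched classifiers; the weights, input, and perturbation budget `eps` are hypothetical.

```python
def sign(z):
    """Return -1, 0, or +1 according to the sign of z."""
    return (z > 0) - (z < 0)

def predict(x, w):
    """Label in {-1, +1} from the sign of the linear score w.x."""
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) >= 0 else -1

def fgsm_linear(x, y, w, eps):
    """One FGSM step: move each feature by eps in the direction that increases the loss."""
    # For loss = log(1 + exp(-y * w.x)), d(loss)/d(x_i) = -y * sigmoid(-y * w.x) * w_i.
    # The sigmoid factor is positive, so the gradient sign is sign(-y * w_i).
    return [xi + eps * sign(-y * wi) for xi, wi in zip(x, w)]

w = [1.0, -2.0, 0.5]      # hypothetical weights
x = [0.2, -0.1, 0.4]      # clean input with score w.x = 0.6, predicted +1
x_adv = fgsm_linear(x, 1, w, eps=0.25)
```

Each feature moves by at most `eps`, yet the score drops by `eps` times the L1 norm of `w`, which is enough here to flip the predicted label.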

Locality Aware Temporal FMs for Crime Prediction

Crime forecasting techniques can play a leading role in hindering crime occurrences, especially in areas under possible threat. In this paper, we propose Locality Aware Temporal Factorization Machines (LTFMs) for crime prediction. Its locality representation module deploys a spatial encoder to estimate the regional dependencies using Graph Convolutional Networks (GCNs). Then, the Point of Interest (POI) encoder computes the weighted attentive aggregation of location, crime, and POI latent representations. The dynamic crime representation module utilizes transformer-based positional encodings to capture the dependencies among space, time, and crime categories. The encodings learnt from the locality representation and crime category encoders are projected into a factorization machine-based architecture via a shared feed-forward network. An extensive comparison with state-of-the-art techniques, using Chicago and New York's criminal records, shows the significance of LTFMs.

Contextualized Formula Search Using Math Abstract Meaning Representation

In math formula search, relevance is determined not only by the similarity of formulas in isolation, but also by their surrounding context. We introduce MathAMR, a new unified representation for sentences containing math. MathAMR generalizes Abstract Meaning Representation (AMR) graphs to include math formula operations and arguments. We then use Sentence-BERT to embed linearized MathAMR graphs for use in formula retrieval. In our first experiment, we compare MathAMR against raw text using the same formula representation (Operator Trees), and find that MathAMR produces more effective rankings. We then apply our MathAMR embeddings to reranking runs from the ARQMath-2 formula retrieval task, where in most cases effectiveness measures are improved. The strongest reranked run matches the best P'@10 for an original run, and exceeds the original runs in nDCG'@10.

Not All Neighbors are Friendly: Learning to Choose Hop Features to Improve Node Classification

The fundamental operation of Graph Neural Networks (GNNs) is the feature aggregation step performed over neighbors of the node based on the structure of the graph. In addition to its own features, the node gets additional combined features from its neighbors for each hop. These aggregated features help define the similarity or dissimilarity of the nodes with respect to the labels and are useful for tasks like node classification. However, in real-world data, features of neighbors at different hops may not correlate with the node's features. Thus, any indiscriminate feature aggregation by a GNN might add noisy features, leading to degradation in the model's performance. In this work, we show that selective aggregation leads to better performance than default aggregation on the node classification task. Furthermore, we propose a Dual-Net GNN architecture with a classifier model and a selector model. The classifier model trains over a subset of input node features to predict node labels while the selector model learns to provide the optimal input subset to the classifier for best performance. These two models are trained jointly to learn the best subset of features that gives higher accuracy in node label predictions. With extensive experiments, we show that our proposed model outperforms the state-of-the-art GNN models with remarkable improvements of up to 27.8%.

Music4All-Onion -- A Large-Scale Multi-faceted Content-Centric Music Recommendation Dataset

When we appreciate a piece of music, it is most naturally because of its content, including rhythmic, tonal, and timbral elements as well as its lyrics and semantics. This suggests that the human affinity for music is inherently content-driven. This kind of information is, however, still frequently neglected by mainstream recommendation models based on collaborative filtering that rely solely on user-item interactions to recommend items to users. A major reason for this neglect is the lack of standardized datasets that provide both collaborative and content information. The work at hand addresses this shortcoming by introducing Music4All-Onion, a large-scale, multi-modal music dataset. The dataset expands the Music4All dataset by including 26 additional audio, video, and metadata characteristics for 109,269 music pieces. In addition, it provides a set of 252,984,396 listening records of 119,140 users, extracted from the online music platform, which allows leveraging user-item interactions as well. We organize distinct item content features in an onion model according to their semantics, and perform a comprehensive examination of the impact of different layers of this model (e.g., audio features, user-generated content, and derivative content) on content-driven music recommendation, demonstrating how various content features influence accuracy, novelty, and fairness of music recommendation systems. In summary, with Music4All-Onion, we seek to bridge the gap between collaborative filtering music recommender systems and content-centric music recommendation requirements.

Towards Confidence-aware Calibrated Recommendation

Recommender systems utilize users' historical data to learn and predict their future interests, providing them with suggestions tailored to their tastes. Calibration ensures that the distribution of recommended item categories is consistent with the user's historical data. Mitigating miscalibration brings various benefits to a recommender system. For example, it becomes less likely that a system overlooks categories with less interaction on a user's profile by only recommending popular categories. Despite their notable success, calibration methods have several drawbacks, such as limiting the diversity of the recommended items and not considering the calibration confidence. This work presents a set of properties that address various aspects of a desired calibrated recommender system. Considering these properties, we propose a confidence-aware optimization-based re-ranking algorithm to find the balance between calibration, relevance, and item diversity, while simultaneously accounting for calibration confidence based on user profile size. Our model outperforms state-of-the-art methods in terms of various accuracy and beyond-accuracy metrics for different user groups.
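To make the notion of calibration concrete, the sketch below compares the category distribution of a user's history with that of a recommendation list using KL divergence. KL divergence is a common choice in the calibrated-recommendation literature, but the abstract does not name the measure the paper uses, so the metric, the `smoothing` parameter, and the toy data are assumptions.

```python
import math
from collections import Counter

def category_distribution(categories):
    """Relative frequency of each category in a list of category labels."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def kl_miscalibration(history, recommended, smoothing=0.01):
    """KL(p || q~): p from the history, q~ a smoothed recommendation distribution."""
    p = category_distribution(history)
    q = category_distribution(recommended)
    kl = 0.0
    for c, p_c in p.items():
        # Smoothing keeps q~ positive when a category is missing from the list.
        q_c = (1 - smoothing) * q.get(c, 0.0) + smoothing * p_c
        kl += p_c * math.log(p_c / q_c)
    return kl

history = ["rock", "rock", "jazz", "pop"]            # hypothetical user profile
matched = kl_miscalibration(history, ["rock", "jazz", "pop", "rock"])
skewed = kl_miscalibration(history, ["rock"] * 4)
```

A list that mirrors the history scores close to zero, while a list drawn from a single category scores much higher. A confidence-aware method could, for instance, trust this score less for users with very small profiles.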

Expressions Causing Differences in Emotion Recognition in Social Networking Service Documents

It is often difficult to correctly infer a writer's emotion from text exchanged online, and differences in recognition between writers and readers can be problematic. In this paper, we propose a new framework for detecting sentences that create differences in emotion recognition between the writer and the reader and for detecting the kinds of expressions that cause such differences. The proposed framework consists of a bidirectional encoder representations from transformers (BERT)-based detector that detects sentences causing differences in emotion recognition and an analysis that acquires expressions that characteristically appear in such sentences. The detector, based on a Japanese SNS-document dataset with emotion labels annotated by both the writer and three readers of the social networking service (SNS) documents, detected "hidden-anger sentences" with AUC = 0.772; these sentences gave rise to differences in the recognition of anger. Because SNS documents contain many sentences whose meaning is extremely difficult to interpret, by analyzing the sentences detected by this detector, we obtained several expressions that appear characteristically in hidden-anger sentences. The detected sentences and expressions do not convey anger explicitly, and it is difficult to infer the writer's anger, but if the implicit anger is pointed out, it becomes possible to guess why the writer is angry. Put into practical use, this framework would likely have the ability to mitigate problems based on misunderstandings.

Locality Sensitive Hashing with Temporal and Spatial Constraints for Efficient Population Record Linkage

Record linkage is the process of identifying which records within or across databases refer to the same entity. Min-hash based Locality Sensitive Hashing (LSH) is commonly used in record linkage as a blocking technique to reduce the number of records to be compared. However, when applied on large databases, min-hash LSH can yield highly skewed block size distributions and many redundant record pair comparisons, where only few of those correspond to true matches (records that refer to the same entity). Furthermore, min-hash LSH is highly parameter sensitive and requires trial and error to determine the optimal trade-off between blocking quality and efficiency of the record pair comparison step. In this paper, we present a novel method to improve the scalability and robustness of min-hash LSH for linking large population databases by exploiting temporal and spatial information available in personal data, and by filtering record pairs based on block sizes and min-hash similarity. Our evaluation on three real-world data sets shows that our method can improve the efficiency of record pair comparison by 75% to 99%, whereas the final average linkage precision can be improved by 28% at the cost of a reduction in the average recall by 4%.
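The blocking idea can be sketched with a small, self-contained min-hash LSH example that also applies a block-size filter. The character bigram tokenization, the MD5-based hash family, the banding parameters, and the toy records are illustrative assumptions, and the temporal and spatial constraints described in the abstract are not modelled here.

```python
import hashlib
from collections import defaultdict

def shingles(text, q=2):
    """Set of character q-grams of a lower-cased string (needs len(text) >= q)."""
    text = text.lower()
    return {text[i:i + q] for i in range(len(text) - q + 1)}

def minhash_signature(tokens, num_hashes=8):
    """For each seeded hash function, keep the minimum hash value over the tokens."""
    return [
        min(int.from_bytes(hashlib.md5(f"{seed}:{t}".encode()).digest()[:8], "big")
            for t in tokens)
        for seed in range(num_hashes)
    ]

def lsh_blocks(records, num_hashes=8, band_size=2):
    """Group record ids whose signatures agree on at least one band."""
    blocks = defaultdict(set)
    for rec_id, text in records.items():
        sig = minhash_signature(shingles(text), num_hashes)
        for b in range(0, num_hashes, band_size):
            blocks[(b, tuple(sig[b:b + band_size]))].add(rec_id)
    return blocks

def candidate_pairs(blocks, max_block_size=None):
    """Record pairs that share a block, optionally skipping oversized blocks."""
    pairs = set()
    for members in blocks.values():
        if max_block_size is not None and len(members) > max_block_size:
            continue  # block-size filter: drop blocks likely to yield redundant pairs
        ordered = sorted(members)
        pairs.update((a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:])
    return pairs

records = {"a": "john smith", "b": "john smith", "c": "mary jones"}  # toy data
pairs = candidate_pairs(lsh_blocks(records))
```

Only records that collide in at least one band are compared, so the number of comparisons depends on the block size distribution, which is why filtering skewed blocks matters.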

ReFine: Re-randomization before Fine-tuning for Cross-domain Few-shot Learning

Cross-domain few-shot learning (CD-FSL), where there are few target samples under extreme differences between source and target domains, has recently attracted huge attention. Recent studies on CD-FSL generally focus on transfer learning based approaches, where a neural network is pre-trained on popular labeled source domain datasets and then transferred to target domain data. Although the labeled datasets may provide suitable initial parameters for the target data, the domain difference between the source and target might hinder fine-tuning on the target domain. This paper proposes a simple yet powerful method that re-randomizes the parameters fitted on the source domain before adapting to the target data. The re-randomization resets source-specific parameters of the source pre-trained model and thus facilitates fine-tuning on the target domain, improving few-shot performance.

Implicit Session Contexts for Next-Item Recommendations

Session-based recommender systems capture the short-term interest of a user within a session. Session contexts (i.e., a user's high-level interests or intents within a session) are not explicitly given in most datasets, and implicitly inferring session context as an aggregation of item-level attributes is crude. In this paper, we propose a method that implicitly contextualizes sessions. The method first generates implicit contexts for sessions by creating a session-item graph, learning graph embeddings, and clustering to assign sessions to contexts. It then trains a session context predictor and uses the predicted contexts' embeddings to enhance the next-item prediction accuracy. Experiments on four datasets show that the method achieves higher next-item prediction accuracy than state-of-the-art models. A case study on the Reddit dataset confirms that assigned session contexts are unique and meaningful.

Cross-domain Prototype Learning from Contaminated Faces via Disentangling Latent Factors

This paper focuses on an emerging and challenging problem called heterogeneous prototype learning (HPL) across face domains. It aims to learn the variation-free target domain prototype for a contaminated input image from the source domain while preserving the personal identity. HPL involves two coupled subproblems, i.e., domain transfer and prototype learning. To address the two subproblems in a unified manner, we advocate disentangling the prototype and domain factors in their respective latent feature spaces, and replacing the latent source domain features with the target domain ones to generate the heterogeneous prototype. To this end, we propose a disentangled heterogeneous prototype learning framework, dubbed DisHPL, which consists of one encoder-decoder generator and two discriminators. The generator and discriminators play adversarial games such that the generator learns to embed the contaminated image into a prototype feature space capturing only identity information and a domain-specific feature space, as well as to generate a realistic-looking heterogeneous prototype. The two discriminators aim to predict personal identities and to distinguish real prototypes from fake generated prototypes in the source/target domain. Experiments on various heterogeneous face datasets validate the effectiveness of DisHPL.

Grad-Align+: Empowering Gradual Network Alignment Using Attribute Augmentation

Network alignment (NA) is the task of discovering node correspondences across different networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their satisfactory performance depends on prior anchor link information and/or node attributes, which may not always be available. In this paper, we propose Grad-Align+, a novel NA method using node attribute augmentation that is quite robust to the absence of such additional information. Grad-Align+ is built upon a recent state-of-the-art NA method, the so-called Grad-Align, that gradually discovers only a part of node pairs at a time until all node pairs are found. Specifically, Grad-Align+ is composed of the following key components: 1) augmenting node attributes based on nodes' centrality measures, 2) calculating an embedding similarity matrix extracted from a graph neural network into which the augmented node attributes are fed, and 3) gradually discovering node pairs by calculating similarities between cross-network nodes with respect to the aligned cross-network neighbor-pair. Experimental results demonstrate (a) the superiority of Grad-Align+ over benchmark NA methods, (b) empirical validation of our theoretical findings, and (c) the effectiveness of our attribute augmentation module.

Improving Graph-based Document-Level Relation Extraction Model with Novel Graph Structure

Document-level relation extraction is a natural language processing task for extracting relations among entities in a document. Compared with sentence-level relation extraction, document-level relation extraction poses more challenges. To acquire mutual information among entities in a document, recent studies have designed mention-level graphs or improved pretrained language models based on co-occurrence or coreference information. However, these methods cannot utilize the anaphoric information of pronouns, which play an important role in document-level relation extraction. In addition, there is a possibility of losing lexical information about the relations among entities that are directly expressed in a sentence. To address these issues, we propose two novel graph structures: an anaphoric graph and a local-context graph. The proposed method outperforms the existing graph-based relation extraction method when applied to the document-level relation extraction dataset DocRED.

An Improved Dataset for Visualization Recommendation

Visualization recommendation is a novel and challenging field of study, whose aim is to provide non-expert users with automatic tools for insight discovery from data. Advances in this research area are hindered by the absence of reliable datasets on which to train the recommender systems. To the best of our knowledge, the Plotly corpus is the only publicly available dataset, but as noted by many authors and discussed in this article, it contains many labeling errors, which greatly limits its usefulness. We release an improved version of the original dataset, which we obtained through an automated procedure with minimal post-editing. In addition to a manual validation by a group of data science students, we demonstrate that when training two state-of-the-art abstract image classifiers on the improved dataset, systems' performance improves more than twice as much as when the original dataset is used, showing that the improved dataset facilitates the discovery of significant perceptual patterns.

GRETEL: Graph Counterfactual Explanation Evaluation Framework

Machine Learning (ML) systems are a building block of the modern tools that impact our daily life in several application domains. Due to their black-box nature, those systems are rarely adopted in application domains (e.g., health, finance) where understanding the decision process is of paramount importance. Explanation methods were developed to explain how the ML model has taken a specific decision for a given case/instance. Graph Counterfactual Explanations (GCE) is one of the explanation techniques adopted in the Graph Learning domain. The existing works on Graph Counterfactual Explanations diverge mostly in the problem definition, application domain, test data, and evaluation metrics, and most existing works do not compare exhaustively against other counterfactual explanation techniques present in the literature. We present GRETEL, a unified framework to develop and test GCE methods in several settings. GRETEL is a highly extensible evaluation framework which promotes Open Science and the reproducibility of the evaluation by providing a set of well-defined mechanisms to easily integrate and manage both real and synthetic datasets, ML models, state-of-the-art explanation techniques, and evaluation measures. Lastly, we also show the experiments conducted to integrate and test several existing scenarios (datasets, measures, explainers).

CLNews: The First Dataset of the Chilean Social Outbreak for Disinformation Analysis

Disinformation is one of the main threats that loom over social networks. Detecting disinformation is not trivial and requires training and maintaining fact-checking teams, which is labor-intensive. Recent studies show that the propagation structure of claims and user messages allows a better understanding of rumor dynamics. Despite these findings, the availability of verified claims and structural propagation data is low. This paper presents a new dataset with Twitter claims verified by fact-checkers along with the propagation structure of retweets and replies. The dataset contains verified claims checked during the Chilean social outbreak, which allows for studying the phenomenon of disinformation during this crisis. We study propagation patterns of verified content in CLNews, showing differences between false rumors and other types of content. Our results show that false rumors are more persistent than other verified content, reaching more people than truthful news and presenting low barriers of readability to users. The dataset is fully available and, as one of the first of its kind to be released, helps in understanding the phenomenon of disinformation during social crises.

Do Graph Neural Networks Build Fair User Models? Assessing Disparate Impact and Mistreatment in Behavioural User Profiling

Recent approaches to behavioural user profiling employ Graph Neural Networks (GNNs) to turn users' interactions with a platform into actionable knowledge. The effectiveness of an approach is usually assessed with accuracy-based perspectives, where the capability to predict user features (such as gender or age) is evaluated. In this work, we perform a beyond-accuracy analysis of the state-of-the-art approaches to assess the presence of disparate impact and disparate mistreatment, meaning that users characterised by a given sensitive feature are unintentionally, but systematically, classified worse than their counterparts. Our analysis on two real-world datasets shows that different user profiling paradigms can impact fairness results. The source code and the preprocessed datasets are available at:

FwSeqBlock: A Field-wise Approach for Modeling Behavior Representation in Sequential Recommendation

Modeling users' historical behaviors is an essential task in many industrial recommender systems. The user interest representation, in previous works, is obtained through the following paradigm: concrete behaviors are first embedded as low-dimensional behavior representations, which are then aggregated conditioning on the target item for final user interest representation. Most existing research focuses on the aggregation process that explores the intrinsic structure of the behavior sequences. However, the quality of behavior representation is largely ignored. In this paper, we present a pluggable module, FwSeqBlock, to enhance the expressiveness of behavior representations. Specifically, FwSeqBlock introduces the multiplicative operation among users' historical behaviors and the target item, where a field memory unit is designed to dynamically identify the dominant features from the behavior sequence and filter out the noise. Extensive experiments validate that FwSeqBlock consistently generates higher-quality user representations compared with competitive methods. Besides, online A/B testing reports a 4.46% improvement in Click-Through Rate (CTR), confirming the effectiveness of the proposed method.

Robust Semi-supervised Domain Adaptation against Noisy Labels

Semi-supervised domain adaptation (SSDA) is a well-explored task, but it is built upon clean/correct labels, which may not be easily obtained. This paper considers a challenging but practical scenario, i.e., the noisy SSDA with polluted labels. Specifically, it is observed that abnormal samples appear to have more randomness and inconsistency among the various views. To this end, we have devised an anomaly score function to detect noisy samples based on the similarity of differently augmented instances. The noisy labeled target samples are re-weighted according to such anomaly scores, so that the abnormal data contribute less to model training. Moreover, pseudo labeling usually suffers from confirmation bias. To remedy it, we have introduced the adversarial disturbance to raise the divergence across differently augmented views. The experimental results on the contaminated SSDA benchmarks demonstrate the effectiveness of our method over the baselines in both robustness and accuracy.

Explainable Graph-based Fraud Detection via Neural Meta-graph Search

Though graph neural networks (GNNs)-based fraud detectors have received remarkable success in identifying fraudulent activities, few of them pay equal attention to models' performance and explainability. In this paper, we attempt to achieve high performance for graph-based fraud detection while considering model explainability. We propose NGS (Neural meta-Graph Search), in which the message passing process of a GNN is formalized as a meta-graph, and a differentiable neural architecture search is devised to determine the optimized message passing graph structure. We further enhance the model by aggregating multiple searched meta-graphs to make the final prediction. Experimental results on two real-world datasets demonstrate that NGS outperforms state-of-the-art baselines. In addition, the searched meta-graphs concisely describe the information used for prediction and produce reasonable explanations.

Probabilistic Model Incorporating Auxiliary Covariates to Control FDR

Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level covariates. This strategy may not be optimal for complex large-scale problems, where indirect relations often exist among test-level covariates and auxiliary metrics or covariates. We incorporate auxiliary covariates among test-level covariates in a deep Black-Box framework (named as NeurT-FDR) which boosts statistical power and controls FDR for multiple hypothesis testing. Our method parametrizes the test-level covariates as a neural network and adjusts the auxiliary covariates through a regression framework, which enables flexible handling of high-dimensional features as well as efficient end-to-end optimization. We show that NeurT-FDR makes substantially more discoveries in three real datasets compared to competitive baselines.
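
For readers unfamiliar with FDR control, the sketch below shows the classical Benjamini-Hochberg step-up procedure, which uses p-values only and no covariates. It is not NeurT-FDR and is not taken from the paper; it only illustrates the kind of covariate-free baseline that covariate-aware methods aim to improve on. The function name and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical Benjamini-Hochberg step-up procedure (illustrative baseline).

    Returns a list of booleans, one per hypothesis, marking rejections.
    """
    m = len(p_values)
    # Indices of the hypotheses, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_rejections = benjamini_hochberg(example_p, alpha=0.05)
```

With these example values only the two smallest p-values fall under their rank-dependent thresholds, so two hypotheses are rejected.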

Pre-training Tasks for User Intent Detection and Embedding Retrieval in E-commerce Search

BERT-style models pre-trained on a general corpus (e.g., Wikipedia) and fine-tuned on a task-specific corpus have recently emerged as breakthrough techniques in many NLP tasks: question answering, text classification, sequence labeling and so on. However, this technique may not always work, especially for two scenarios: a corpus that contains very different text from the general corpus Wikipedia, or a task that learns embedding spatial distribution for a specific purpose (e.g., approximate nearest neighbor search). In this paper, to tackle the above two scenarios that we have encountered in an industrial e-commerce search system, we propose customized and novel pre-training tasks for two critical modules: user intent detection and semantic embedding retrieval. The customized pre-trained models after fine-tuning, being less than 10% of BERT-base's size in order to be feasible for cost-efficient CPU serving, significantly improve over the other baseline models: 1) no pre-training model and 2) fine-tuned model from the official pre-trained BERT using general corpus, on both offline datasets and online system. We have open sourced our datasets for the sake of reproducibility and future work.

SCC - A Test Collection for Search in Chat Conversations

We present SCC, a test collection for evaluating search in chat conversations. Chat applications such as Slack, WhatsApp and WeChat have become popular communication methods. Typical search requirements in these applications revolve around the task of known item retrieval, i.e., finding information that the user has previously experienced in their chats. However, the search capabilities of these chat applications are often very basic. Our collection aims to support new research into building effective methods for chat conversation search. We do so by building a collection with 114 known item retrieval topics for searching over 437,893 Slack chat messages. An important aspect when searching through conversations is the unit of indexing (indexing granularity), e.g., a single message vs. an entire conversation. To help researchers investigate this aspect and its influence on retrieval effectiveness, the collection has been processed with conversation disentanglement methods: these mark cohesive segments in which each conversation consists of messages whose senders interact with each other regarding a specific event or topic. This results in a total of 38,955 multi-participant conversations being contained in the collection. Finally, we also provide a set of baselines with related empirical evaluation, including traditional bag-of-words methods and zero-shot neural methods, at both indexing granularity levels.

A Model-Centric Explainer for Graph Neural Network based Node Classification

Graph Neural Networks (GNNs) learn node representations by aggregating a node's feature vector with its neighbors. They perform well across a variety of graph tasks. However, to enhance the reliability and trustworthiness of these models during use in critical scenarios, it is of essence to look into the decision making mechanisms of these models rather than treating them as black boxes. Our model-centric method gives insight into the kind of information learnt by GNNs about node neighborhoods during the task of node classification. We propose a neighborhood generator as an explainer that generates optimal neighborhoods to maximize a particular class prediction of the trained GNN model. We formulate neighborhood generation as a reinforcement learning problem and use a policy gradient method to train our generator using feedback from the trained GNN-based node classifier. Our method provides intelligible explanations of learning mechanisms of GNN models on synthetic as well as real-world datasets and even highlights certain shortcomings of these models.

Cost-constrained Minimal Steiner Tree Enumeration

The Steiner tree enumeration problem is a well-known problem that asks for enumerating Steiner trees. Although numerous theoretical works proposed algorithms for the problem and analyzed their complexity, there are no practical algorithms and empirical studies. In this paper, we are the first to study the Steiner tree enumeration problem from a practical perspective. First, we define a practical problem, the cost-constrained minimal Steiner tree enumeration problem, which enumerates minimal Steiner trees with costs not larger than a given threshold. Second, to address the problem, we propose a binary decision diagram (BDD)-based algorithm. The BDD-based algorithm constructs a BDD that compactly represents the set of minimal Steiner trees and then traverses the BDD for enumeration. We develop a novel frontier-based algorithm to construct such BDDs efficiently. Furthermore, we extend our algorithm to be scalable for large-scale graphs by preprocessing the given graph and controlling the number of generated Steiner trees to reduce memory and computation costs. We validate that our algorithm can efficiently enumerate minimal Steiner trees in real-world graphs.

Twin Papers: A Simple Framework of Causal Inference for Citations via Coupling

The research process includes many decisions, e.g., how to title the paper and where to publish it. In this paper, we introduce a general framework for investigating the effects of such decisions. The main difficulty in investigating the effects is that we need to know counterfactual results, which are not available in reality. The key insight of our framework is inspired by the existing counterfactual analysis using twins, where the researchers regard twins as counterfactual units. The proposed framework regards a pair of papers that cite each other as twins. Such papers tend to be parallel works, on similar topics, and in similar communities. We investigate twin papers that adopted different decisions, observe the progress of the research impact brought by these studies, and estimate the effect of decisions by the difference in the impacts of these studies. We release our code and data, which we believe are highly beneficial owing to the scarcity of datasets on counterfactual studies.

Measuring and Comparing the Consistency of IR Models for Query Pairs with Similar and Different Information Needs

The widespread use of supervised ranking models has necessitated an investigation into how consistently their outputs align with user expectations. While a match between the user expectations and system outputs can be sought at different levels of granularity, we study this alignment for search intent transformation across a pair of queries. Specifically, we propose a consistency metric which, for a given pair of queries (one reformulated from the other with at least one term in common), measures whether the change in the set of the top-retrieved documents induced by this reformulation matches a user's expectation. Our experiments led to a number of observations, such as that DRMM (an early interaction-based IR model) exhibits better alignment with set-level user expectations, whereas transformer-based neural models (e.g., MonoBERT) agree more consistently with the content and rank-based expectations of overlap.

Spatial-Temporal Identity: A Simple yet Effective Baseline for Multivariate Time Series Forecasting

Multivariate Time Series (MTS) forecasting plays a vital role in a wide range of applications. Recently, Spatial-Temporal Graph Neural Networks (STGNNs) have become increasingly popular MTS forecasting methods due to their state-of-the-art performance. However, recent works are becoming more sophisticated with limited performance improvements. This phenomenon motivates us to explore the critical factors of MTS forecasting and design a model that is as powerful as STGNNs, but more concise and efficient. In this paper, we identify the indistinguishability of samples in both spatial and temporal dimensions as a key bottleneck, and propose a simple yet effective baseline for MTS forecasting by attaching Spatial and Temporal IDentity information (STID), which achieves the best performance and efficiency simultaneously based on simple Multi-Layer Perceptrons (MLPs). These results suggest that we can design efficient and effective models as long as they solve the indistinguishability of samples, without being limited to STGNNs.

A Graph-based Spatiotemporal Model for Energy Markets

Energy markets enable matching supply and demand through inter- and intra-region electricity trading. Due to the interconnected nature of the energy markets, the supply-demand constraints in one region can impact prices in another connected region. To incorporate these spatiotemporal relationships, we propose a novel graph neural network architecture incorporating multidimensional time-series features to forecast price (node attribute) and energy flow (edge attribute) between regions simultaneously. To the best of our knowledge, this paper is the first attempt to combine node and edge level forecasting in energy markets. We show that our proposed approach has a mean absolute prediction percentage error of 12.8%, which significantly beats the state-of-the-art baseline techniques.
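
As a reading aid for the reported error figure, here is a minimal sketch of the standard mean absolute percentage error (MAPE). It assumes the abstract's "mean absolute prediction percentage error" corresponds to this common definition; the function name and the sample prices are hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return the standard MAPE in percent.

    Assumes every actual value is non-zero, since each error is divided
    by the corresponding actual value.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical prices and forecasts for three time steps.
observed_prices = [50.0, 40.0, 80.0]
forecast_prices = [45.0, 44.0, 80.0]
mape = mean_absolute_percentage_error(observed_prices, forecast_prices)
```

In this example the per-step relative errors are 10%, 10%, and 0%, so the MAPE is their mean, roughly 6.67%.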

Early Stage Sparse Retrieval with Entity Linking

Despite the advantages of their low-resource settings, traditional sparse retrievers depend on exact matching approaches between high-dimensional bag-of-words (BoW) representations of both the queries and the collection. As a result, retrieval performance is restricted by semantic discrepancies and vocabulary gaps. On the other hand, transformer-based dense retrievers introduce significant improvements in information retrieval tasks by exploiting low-dimensional contextualized representations of the corpus. While dense retrievers are known for their relative effectiveness, they suffer from lower efficiency and lack of generalization issues, when compared to sparse retrievers. For a lightweight retrieval task, high computational resources and time consumption are major barriers encouraging the renunciation of dense models despite potential gains. In this work, we propose boosting the performance of sparse retrievers by expanding both the queries and the documents with linked entities in two formats for the entity names: 1) explicit and 2) hashed. We employ a zero-shot end-to-end dense entity linking system for entity recognition and disambiguation to augment the corpus. By leveraging the advanced entity linking methods, we believe that the effectiveness gap between sparse and dense retrievers can be narrowed. We conduct our experiments on the MS MARCO passage dataset. Since we are concerned with the early stage retrieval in cascaded ranking architectures of large information retrieval systems, we evaluate our results using recall@1000. Our approach is also capable of retrieving documents for query subsets judged to be particularly difficult in prior work. We further demonstrate that the non-expanded and the expanded runs with both explicit and hashed entities retrieve complementary results. Consequently, we adopt a run fusion approach to maximize the benefits of entity linking.
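
The evaluation measure named above, recall at a cutoff, can be sketched as follows. This is a generic illustration of recall@k rather than the authors' evaluation code; the function name, document identifiers, and relevance judgments are hypothetical.

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k=1000):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        raise ValueError("at least one relevant document is required")
    retrieved = set(ranked_doc_ids[:k])
    return len(retrieved & relevant) / len(relevant)


# Hypothetical ranking for one query, and its judged-relevant documents.
ranking = ["d3", "d1", "d7", "d2"]
judged_relevant = {"d1", "d2", "d9"}
recall_top3 = recall_at_k(ranking, judged_relevant, k=3)
recall_top4 = recall_at_k(ranking, judged_relevant, k=4)
```

Recall at a deep cutoff suits early-stage retrieval because later stages in a cascade can only re-rank documents that the first stage has already returned.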

PubMed Author-assigned Keyword Extraction (PubMedAKE) Benchmark

With the ever-increasing abundance of biomedical articles, improving the accuracy of keyword search results becomes crucial for ensuring reproducible research. However, keyword extraction for biomedical articles is hard due to the existence of obscure keywords and the lack of a comprehensive benchmark. PubMedAKE is an author-assigned keyword extraction dataset that contains the title, abstract, and keywords of over 843,269 articles from the PubMed open access subset database. This dataset, publicly available on Zenodo, is the largest keyword extraction benchmark with sufficient samples to train neural networks. Experimental results using state-of-the-art baseline methods illustrate the need for developing automatic keyword extraction methods for biomedical literature.

CStory: A Chinese Large-scale News Storyline Dataset

In today's massive news streams, storylines can help us discover related event pairs and understand the evolution of hot events. Hence many efforts have been devoted to automatically constructing news storylines. However, the development of these methods is strongly limited by the size and quality of existing storyline datasets, since news storylines are expensive to annotate: they contain a myriad of unlabeled relationships growing quadratically with the number of news events. Working around these difficulties, we propose a sophisticated pre-processing method to filter candidate news pairs by entity co-occurrence and semantic similarity. With the filter reducing annotation overhead, we construct CStory, a large-scale Chinese news storyline dataset, which contains 11,978 news articles, 112,549 manually labeled storyline relation pairs, and 49,832 evidence sentences for annotation judgment. We conduct extensive experiments on CStory using various algorithms and find that constructing news storylines is challenging even for pre-trained language models. Empirical analysis shows that the sample imbalance issue significantly influences model performance, which should be the focus of future work. Our dataset is now publicly available at

Multi-task Generative Adversarial Network for Missing Mobility Data Imputation

Mobility data collected from location-based social networks are imperative for user movement behaviour analysis and marketing strategy customization. However, due to personal privacy and temporary failure of GPS devices, mobility data suffer from missing data issues. The missing mobility data hide beneficial information that can lead to distorted data analysis. To this end, we propose a multi-task generative adversarial network, termed as MDI-MG, to mitigate the negative impact of missing mobility data by imputing possible missing records. Specifically, in MDI-MG, we first introduce region-awareness modelling to fully capture sequential dependencies. Then, the generator is designed as a multi-task network, which unifies two highly pertinent tasks, including the primary task and the auxiliary missing POI region imputation task. The joint training on the two tasks enhances presentation capabilities and brings additional benefits. Besides, we adopt a discriminator to evaluate the generated sequences. The generator and the discriminator are optimized with a minimax two-player game. Experiments on two real-world datasets show that, MDI-MG achieves better performance in terms of both imputation accuracy and effectiveness, compared with state-of-the-art methods.

On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering

Interacting with a speech interface to query a Question Answering (QA) system is becoming increasingly popular. Typically, QA systems rely on passage retrieval to select candidate contexts and reading comprehension to extract the final answer. While there has been some attention to improving the reading comprehension part of QA systems against errors that automatic speech recognition (ASR) models introduce, the passage retrieval part remains unexplored. However, such errors can affect the performance of passage retrieval, leading to inferior end-to-end performance. To address this gap, we augment two existing large-scale passage ranking and open domain QA datasets with synthetic ASR noise and study the robustness of lexical and dense retrievers against questions with ASR noise. Furthermore, we study the generalizability of data augmentation techniques across different domains; with each domain being a different language dialect or accent. Finally, we create a new dataset with questions voiced by human users and use their transcriptions to show that the retrieval performance can further degrade when dealing with natural ASR noise instead of synthetic ASR noise.

Data Oversampling with Structure Preserving Variational Learning

Traditional oversampling methods are well explored for binary and multi-class imbalanced datasets. In most cases, the data space is adapted for oversampling the imbalanced classes. It leads to various issues like poor modelling of the structure of the data, resulting in data overlapping between minority and majority classes that lead to poor classification performance of minority class(es). To overcome these limitations, we propose a novel data oversampling architecture called Structure Preserving Variational Learning (SPVL). This technique captures an uncorrelated distribution among classes in the latent space using an encoder-decoder framework. Hence, minority samples are generated in the latent space, preserving the structure of the data distribution. The improved latent space distribution (oversampled training data) is evaluated by training an MLP classifier and testing with unseen test dataset. The proposed SPVL method is applied to various benchmark datasets with i) binary and multi-class imbalance data, ii) high-dimensional data and, iii) large or small-scale data. Extensive experimental results demonstrated that the proposed SPVL technique outperforms the state-of-the-art counterparts.

Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models

Industrial systems such as recommender systems and online advertising have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. Although pre-ranking is a critical bridge between matching and ranking, existing pre-ranking approaches suffer from the sample selection bias (SSB) problem because they ignore the entire-chain data dependence, resulting in sub-optimal performance. In this paper, we rethink the pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from all cascaded stages to effectively alleviate the SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict each stage's result, and introduce a sub-network routing strategy with L0 regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, achieving a better trade-off between efficiency and effectiveness.

ST-GAT: A Spatio-Temporal Graph Attention Network for Accurate Traffic Speed Prediction

Spatio-temporal models, which combine GNNs (Graph Neural Networks) and RNNs (Recurrent Neural Networks), have shown state-of-the-art accuracy in traffic speed prediction. However, we find that they consider the spatial and temporal dependencies between speeds separately in the two (i.e., space and time) dimensions, thereby unable to exploit the joint-dependencies of speeds in space and time. In this paper, with the evidence via preliminary analysis, we point out the importance of considering individual dependencies between two speeds from all possible points in space and time for accurate traffic speed prediction. Then, we propose an Individual Spatio-Temporal graph (IST-graph) that represents the Individual Spatio-Temporal dependencies (IST-dependencies) very effectively and a Spatio-Temporal Graph ATtention network (ST-GAT), a novel model to predict the future traffic speeds based on the IST-graph and the attention mechanism. The results from our extensive evaluation with five real-world datasets demonstrate (1) the effectiveness of the IST-graph in modeling traffic speed data, (2) the superiority of ST-GAT over 5 state-of-the-art models (i.e., 2-33% gains) in prediction accuracy, and (3) the robustness of our ST-GAT even in abnormal traffic situations.

A Preliminary Exploration of Extractive Multi-Document Summarization in Hyperbolic Space

Summary matching is a recently proposed paradigm for extractive summarization. It aims to calculate similarities between candidate summaries and their corresponding document and extract summaries by ranking similarities. Because natural languages often exhibit inherent hierarchical structures ingrained with complex syntax and semantics, the latent hierarchical structures between candidate summaries and their corresponding document should be considered when calculating the summary-document similarities. However, this structural property is hard to model in the Euclidean space. Motivated by these issues, we explore extractive summarization in the hyperbolic space and propose a new Hyperbolic Siamese Network for matching-based extractive summarization (HyperSiameseNet). Specifically, HyperSiameseNet projects candidate summaries and their corresponding document representations from the Euclidean space to the hyperbolic space and then models the summary-document similarities via the squared Poincaré distance. Finally, the summary-document similarities are optimized by the margin-based triplet loss for extracting the final summary. The results on the Multi-News dataset show the superiority of our model HyperSiameseNet over the state-of-the-art baselines.
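
The two ingredients named above, the squared Poincaré distance and a margin-based triplet loss, can be sketched as follows. This uses the standard Poincaré-ball distance formula and a generic triplet loss; it is not the HyperSiameseNet implementation, and the function names, the margin value, and the toy 2-D points are assumptions for illustration.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the unit Poincare ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss over squared Poincare distances."""
    d_pos = poincare_distance(anchor, positive) ** 2
    d_neg = poincare_distance(anchor, negative) ** 2
    return max(0.0, margin + d_pos - d_neg)


# Toy embeddings: a document, a close candidate summary, and a distant one.
document = [0.0, 0.0]
good_summary = [0.1, 0.0]
bad_summary = [0.5, 0.0]
loss = triplet_loss(document, good_summary, bad_summary, margin=1.0)
```

Distances grow rapidly as points approach the boundary of the ball, which is why hyperbolic space is often considered a natural fit for tree-like, hierarchical structure.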

Robust Time Series Dissimilarity Measure for Outlier Detection and Periodicity Detection

Dynamic time warping (DTW) is an effective dissimilarity measure in many time series applications. Despite its popularity, it is sensitive to noise and outliers, which leads to the singularity problem and bias in the measurement. The time complexity of DTW is quadratic in the length of the time series, making it inapplicable in real-time applications. In this paper, we propose a novel time series dissimilarity measure named RobustDTW to reduce the effects of noise and outliers. Specifically, RobustDTW estimates the trend and optimizes the time warp in an alternating manner by utilizing our designed temporal graph trend filtering. To improve efficiency, we propose a multi-level framework that estimates the trend and the warp function at a lower resolution, and then repeatedly refines them at a higher resolution. We further extend RobustDTW to periodicity detection and outlier time series detection. Experiments on real-world datasets demonstrate the superior performance of RobustDTW compared to DTW variants in both outlier time series detection and periodicity detection.
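
For context, the classical DTW recurrence that RobustDTW builds on can be sketched as below. This is the textbook dynamic program with an absolute-difference local cost, not RobustDTW itself; the function name and the sample series are illustrative assumptions.

```python
def dtw_distance(a, b):
    """Classical DTW between two numeric sequences.

    Runs in O(len(a) * len(b)) time, which is the quadratic cost
    mentioned in the abstract.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# A time-stretched copy aligns perfectly; a single outlier dominates the cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
with_outlier = dtw_distance([1, 2, 3], [1, 2, 100, 3])
```

The second call illustrates the sensitivity discussed above: every warping path must match the outlier value to some point of the other series, so one bad sample contributes almost the entire distance.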

Targeted Influence with Community and Gender-Aware Seeding

When spreading information over social networks, seeding algorithms selecting users to start the dissemination play a crucial role. The majority of existing seeding algorithms focus solely on maximizing the total number of reached nodes, overlooking the issue of group fairness, in particular, gender imbalance. To tackle the challenge of maximizing information spread on certain target groups, e.g., females, we introduce the concept of the community and gender-aware potential of users. We first show that the network's community structure is closely related to the gender distribution. Then, we propose an algorithm that leverages the information about community structure and its gender potential to iteratively modify a seed set such that the information spread on the target group meets the target ratio. Finally, we validate the algorithm by performing experiments on synthetic and real-world datasets. Our results show that the proposed seeding algorithm achieves not only the target ratio but also the highest information spread, compared to the state-of-the-art gender-aware seeding algorithm.

Multi-Aspect Embedding of Dynamic Graphs

Graph embedding is regarded as one of the most advanced techniques for graph data analyses due to its significant performance. However, the majority of existing works only focus on static graphs while ignoring the ubiquitous dynamic graphs. In fact, the temporal evolution of edges in a dynamic graph sets a harsh challenge for the traditional embedding algorithms. To solve the problem, in this paper we propose a Dynamic Graph Multi-Aspect Embedding (DGMAE) to automatically learn the proper number of aspects and their distributions in each temporal duration based on a distance dependent Chinese Restaurant Process. The proposed method can encode the inherent property of varying interactions among nodes along the time and present different aspect-influences to nodes embedding. Our extensive experiments on several public datasets show the performance improvement over state-of-the-art works.

Confidence-Guided Learning Process for Continuous Classification of Time Series

In the real world, the class of a time series is usually labeled at the final time, but many applications require classifying time series at every time point. For example, the outcome of a critical patient is only determined at the end, but the patient should be diagnosed at all times for timely treatment. Thus, we propose a new concept: Continuous Classification of Time Series (CCTS). It requires the model to learn data in different time stages. But the time series evolves dynamically, leading to different data distributions. When a model learns multiple distributions, it always forgets or overfits. We suggest that meaningful learning scheduling is possible due to an interesting observation: measured by confidence, the process of a model learning multiple distributions is similar to the process of a human learning multiple pieces of knowledge. Thus, we propose a novel Confidence-guided method for CCTS (C3TS). It can imitate the alternating human confidence described by the Dunning-Kruger Effect. We define the objective-confidence to arrange data, and the self-confidence to control the learning duration. Experiments on four real-world datasets show that C3TS is more accurate than all baselines for CCTS.

Global and Local Feature Interaction with Vision Transformer for Few-shot Image Classification

Image classification is a classical machine learning task and has been widely used. Due to the high costs of annotation and data collection in real scenarios, few-shot learning has become a vital technique to improve image classification performance. However, most existing few-shot image classification methods only focus on modeling the global image feature or local image patches, ignoring the global-local interactions. In this study, we propose a new method, named GL-ViT, to integrate both global and local features to fully exploit the few-shot samples for image classification. First, we design a feature extractor module to calculate the interactions between the global representation and local patch embeddings, where ViT is also adopted to achieve efficient and effective image representation. Then, Earth Mover's Distance is adopted to measure the similarity between two images. Extensive experimental results on several widely used open datasets show that GL-ViT outperforms state-of-the-art algorithms significantly, and our ablation studies also verify the effectiveness of both global and local features.

Improving Downstream Task Performance by Treating Numbers as Entities

Numbers are essential components of text, like any other word tokens, from which natural language processing (NLP) models are built and deployed. Though numbers are typically not accounted for distinctly in most NLP tasks, there is still an underlying amount of numeracy already exhibited by NLP models. For instance, in named entity recognition (NER), numbers are not treated as an entity with distinct tags. In this work, we attempt to tap the potential of state-of-the-art language models and transfer their ability to boost performance in related downstream tasks dealing with numbers. Our proposed classification of numbers into entities helps NLP models perform well on several tasks, including a handcrafted Fill-In-The-Blank (FITB) task and on question answering, using joint embeddings, outperforming the BERT and RoBERTa baseline classification.

ML-1M++: MovieLens-Compatible Additional Preferences for More Robust Offline Evaluation of Sequential Recommenders

Sequential recommendation is the task of predicting the next interacted item of a target user, given his/her past interaction sequence. Conventionally, sequential recommenders are evaluated offline with the last item in each sequence as the sole correct (relevant) label for the testing example of the corresponding user. However, little is known about how this sparsity of preference data affects the robustness of the offline evaluation's outcomes. To help researchers address this, we collect additional preference data via crowdsourcing. Specifically, we propose an assessment interface tailored to the sequential recommendation task and ask crowd workers to assess the (potential) relevance of each candidate item in MovieLens 1M, a commonly used dataset. Toward establishing a more robust evaluation methodology, we release the collected preference data, which we call ML-1M++, as well as the code of the assessment interface.

Leveraging the Graph Structure of Neural Network Training Dynamics

Understanding the training dynamics of deep neural networks (DNNs) is important as it can lead to improved training efficiency and task performance. Recent works have demonstrated that representing the wirings of neurons in feedforward DNNs as graphs is an effective strategy for understanding how architectural choices can affect performance. However, these approaches fail to model training dynamics since a single, static graph cannot capture how DNNs change over the course of training. Thus, in this work, we propose a compact, expressive temporal graph framework that effectively captures the dynamics of many workhorse architectures in computer vision. Specifically, our framework extracts an informative summary of graph properties (e.g., degree, eigenvector centrality) over a sequence of DNN graphs obtained during training. We demonstrate that the proposed framework captures useful dynamics by accurately predicting trained task performance when using a summary over early training epochs (<5) across four different architectures and two image datasets. Moreover, by using a novel, highly scalable DNN graph representation, we further demonstrate that the proposed framework captures generalizable dynamics, as summaries extracted from smaller-width networks are effective when evaluated on larger widths.

Towards a Learned Cost Model for Distributed Spatial Join: Data, Code & Models

Geospatial data comprise around 60% of all the publicly available data. One of the essential and most complex operations that brings together multiple geospatial datasets is the spatial join operation. Due to its complexity, there are many partitioning techniques and parallel algorithms for the spatial join problem. This leads to a complex query optimization problem: which algorithm to use for a given pair of input datasets that we want to join? With the rise of machine learning, there is promise in addressing this problem with the use of various learned models. However, one of the concerns is the lack of standard, publicly available data to train and test on, as well as the lack of accessible baseline models. This resource paper helps the research community to solve this problem by providing synthetic and real datasets for spatial join, source code for constructing more datasets, and several baseline solutions that researchers can further extend and compare to.

Self-supervision Meets Adversarial Perturbation: A Novel Framework for Anomaly Detection

Anomaly detection is a fundamental yet challenging problem in machine learning due to the lack of label information. In this work, we propose a novel and powerful framework, dubbed SLA2P, for unsupervised anomaly detection. After extracting representative embeddings from raw data, we apply random projections to the features and regard features transformed by different projections as belonging to distinct pseudo-classes. We then train a classifier network on these transformed features to perform self-supervised learning. Next, we add adversarial perturbation to the transformed features to decrease their softmax scores of the predicted labels and design anomaly scores based on the predictive uncertainties of the classifier on these perturbed features. Our motivation is that because of the relatively small number and the decentralized modes of anomalies, 1) the pseudo label classifier's training concentrates more on learning the semantic information of normal data rather than anomalous data; 2) the transformed features of the normal data are more robust to the perturbations than those of the anomalies. Consequently, the perturbed transformed features of anomalies fail to be classified well and accordingly have lower anomaly scores than those of the normal samples. Extensive experiments on image, text, and inherently tabular benchmark datasets back up our findings and indicate that SLA2P achieves state-of-the-art anomaly detection performance consistently. Our code is made publicly available at

Hybrid Transfer in Deep Reinforcement Learning for Ads Allocation

Ads allocation, which involves allocating ads and organic items to limited slots in a feed with the purpose of maximizing platform revenue, has become a research hotspot. Note that platforms (e.g., e-commerce platforms, video platforms, and food delivery platforms) usually have multiple entrances for different categories, and some entrances have few visits. Data from these entrances has low coverage, which makes it difficult for the agent to learn. To address this challenge, we propose Similarity-based Hybrid Transfer for Ads Allocation (SHTAA), which effectively transfers samples as well as knowledge from data-rich entrances to data-poor entrances. Specifically, we define an uncertainty-aware similarity for MDPs to estimate the similarity of the MDPs of different entrances. Based on this similarity, we design a hybrid transfer method, including instance transfer and strategy transfer, to efficiently transfer samples and knowledge from one entrance to another. Both offline and online experiments on the Meituan food delivery platform demonstrate that the proposed method achieves better performance for data-poor entrances and increases the revenue of the platform.

MNCM: Multi-level Network Cascades Model for Multi-Task Learning

Recently, multi-task learning based on the deep neural network has been successfully applied in many recommender system scenarios. The prediction quality of current mainstream multi-task models often relies on the extent to which the relationships among tasks are extracted. Much of the prior research work has focused on two important tasks in recommender systems: predicting click-through rate (CTR) and post-click conversion rate (CVR), which rely on the sequential user action pattern of impression → click → conversion. Therefore, there exists sequential dependence between the CTR and CVR tasks. However, there is no satisfactory solution to explicitly model the sequential dependence among tasks without sacrificing the first task in terms of the design of the model network structure. In this paper, inspired by the Multi-task Network Cascades (MNC) and Adaptive Information Transfer Multi-task (AITM) frameworks, we propose a Multi-level Network Cascades Model (MNCM) based on the pattern of specific and shared experts separation. In MNCM, we introduce two types of information transfer modules: Task-Level Information Transfer Module (TITM) and Expert-Level Information Transfer Module (EITM), which can learn transferred information adaptively from task level and task-specific experts level, respectively, thereby fully capturing sequential dependence among tasks. Compared with AITM, MNCM effectively avoids the problem of the first task in a task sequence becoming the sacrificial side of the seesaw phenomenon and contributes to mitigating potential conflicts among tasks. We conduct considerable experiments based on open-source large-scale recommendation datasets. The experimental results demonstrate that MNCM outperforms AITM and the mainstream baseline models in the mixture-experts-bottom pattern and probability-transfer pattern. In addition, we conduct an ablation study on the necessity of introducing two kinds of information transfer modules and verify the effectiveness of this pattern.

Disentangled Contrastive Learning for Social Recommendation

Social recommendations utilize social relations to enhance the representation learning for recommendations. Most social recommendation models unify user representations for the user-item interactions (collaborative domain) and social relations (social domain). However, such an approach may fail to model the users' heterogeneous behavior patterns in the two domains, impairing the expressiveness of user representations. In this work, to address this limitation, we propose a novel Disentangled contrastive learning framework for social Recommendations (DcRec). More specifically, we propose to learn disentangled user representations from the item and social domains. Moreover, disentangled contrastive learning is designed to perform knowledge transfer between the disentangled user representations for social recommendations. Comprehensive experiments on various real-world datasets demonstrate the superiority of our proposed model.

Nonlinear Causal Discovery in Time Series

Recent years have witnessed the proliferation of the Functional Causal Model (FCM) for causal learning due to its intuitive representation and accurate learning results. However, existing FCM-based algorithms struggle with the ubiquitous nonlinear relations in time-series data, mainly because these algorithms either assume linear relationships, or assume nonlinear relationships with additive noise, or introduce no additional assumptions but can only identify nonlinear causality between two variables. This paper contributes a practical FCM-based causal learning approach, which can maintain effectiveness for real-world nonstationary data with general nonlinear relationships and unlimited variable scale. Specifically, the non-stationarity of time series data is first exploited with nonlinear independent component analysis to discover the underlying components or latent disturbances. Then, the conditional independence between variables and these components is studied to obtain a relation matrix, which guides the algorithm to recover the underlying causal graph. The correctness of the proposal is theoretically proved, and extensive experiments further verify its effectiveness. To the best of our knowledge, the proposal is the first so far that can fully identify causal relationships under general nonlinear conditions.

HQANN: Efficient and Robust Similarity Search for Hybrid Queries with Structured and Unstructured Constraints

In-memory approximate nearest neighbor search (ANNS) algorithms have achieved great success for fast high-recall query processing, but are extremely inefficient when handling hybrid queries with unstructured (i.e., feature vectors) and structured (i.e., related attributes) constraints. In this paper, we present HQANN, a simple yet highly efficient hybrid query processing framework which can be easily embedded into existing proximity graph-based ANNS algorithms. We guarantee both low latency and high recall by leveraging navigation sense among attributes and fusing vector similarity search with attribute filtering. Experimental results on both public and in-house datasets demonstrate that HQANN is 10x faster than the state-of-the-art hybrid ANNS solutions at the same recall quality, and its performance is hardly affected by the complexity of attributes. It can reach 99% recall@10 in just around 50 microseconds on GLOVE-1.2M with thousands of attribute constraints.

Efficiently Answering Minimum Reachable Label Set Queries in Edge-Labeled Graphs

The reachability query is a fundamental problem in graph analysis. Recently, many studies have focused on label-constrained reachability queries, which try to verify whether two vertices are reachable under a given label set. However, in many real-life applications, it is more practical to find the minimum label set required to ensure the reachability of two vertices, which is neglected by previous research. To fill the gap, in this paper, we propose and investigate the minimum reachable label set (MRLS) problem in edge-labeled graphs. Specifically, given an edge-labeled graph and two vertices s, t, the MRLS problem aims to find a label set L with the minimum size such that s can reach t through L. We prove the hardness of our problem, and develop different optimization strategies to improve the scalability of the algorithms. Extensive experiments on 6 datasets demonstrate the advantages of the proposed algorithms.
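
To make the MRLS problem statement concrete, the following is a minimal brute-force sketch, not the algorithms proposed in the paper. It assumes a directed graph given as (source, target, label) triples; the function names `reachable` and `minimum_reachable_label_set` are hypothetical. It enumerates label subsets in increasing size and returns the first one under which s reaches t, so its cost is exponential in the number of labels.

```python
from collections import deque
from itertools import combinations


def reachable(edges, s, t, allowed):
    """Breadth-first search from s to t using only edges whose label is in `allowed`."""
    adjacency = {}
    for u, v, label in edges:
        if label in allowed:
            adjacency.setdefault(u, []).append(v)
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adjacency.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def minimum_reachable_label_set(edges, s, t):
    """Return a smallest label set through which s reaches t, or None if t is unreachable.

    Assumption: edges are directed (u, v, label) triples. This exhaustive
    search is only an illustration of the problem definition.
    """
    labels = sorted({label for _, _, label in edges})
    for size in range(len(labels) + 1):
        for subset in combinations(labels, size):
            if reachable(edges, s, t, set(subset)):
                return set(subset)
    return None


# Toy graph: the path s -> a -> t needs labels {x, y}; s -> b -> t needs only {z}.
toy_edges = [("s", "a", "x"), ("a", "t", "y"), ("s", "b", "z"), ("b", "t", "z")]
print(minimum_reachable_label_set(toy_edges, "s", "t"))
```

On the toy graph, the single label z suffices, so the search stops at subsets of size one.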

Balancing Utility and Exposure Fairness for Integrated Ranking with Reinforcement Learning

Integrated ranking is critical in industrial recommendation systems and has attracted increasing attention. In an integrated ranking system, items from multiple channels are merged together and form an integrated list. During this process, apart from optimizing the system's utility like the total number of clicks, a fair allocation of the exposure opportunities over different channels also needs to be satisfied. To address this problem, we propose an integrated ranking model called Integrated Deep-Q Network (iDQN), which jointly considers user preferences, the platform's utility, and the exposure fairness. Extensive offline experiments validate the effectiveness of iDQN in managing the tradeoff between utility and fairness. Moreover, iDQN has also been deployed onto the online AppStore platform in Huawei, where the online A/B test shows iDQN outperforms the baseline by 1.87% and 2.21% in terms of utility and fairness, respectively.

Multi-granularity Fatigue in Recommendation

Personalized recommendation aims to provide appropriate items according to user preferences mainly from their behaviors. Excessive homogeneous user behaviors on similar items will lead to fatigue, which may decrease user activeness and degrade user experience. However, existing models seldom consider user fatigue in recommender systems. In this work, we propose a novel multi-granularity fatigue, modeling user fatigue from coarse to fine. Specifically, we focus on the recommendation feed scenario, where the underexplored global session fatigue and coarse-grained taxonomy fatigue have large impacts. We conduct extensive analyses to demonstrate the characteristics and influence of different types of fatigues in real-world recommender systems. In experiments, we verify the effectiveness of multi-granularity fatigue in both offline and online evaluations. Currently, the fatigue-enhanced model has also been deployed on a widely-used recommendation system of WeChat.

BidH: A Bidirectional Hierarchical Model for Nested Named Entity Recognition

Nested Named Entity Recognition aims to identify entities with nested relationships in sentences, and has various applications ranging from relation extraction to semantic understanding. However, existing methods have two drawbacks: 1) error propagation when identifying entities at different nesting levels, and 2) the inability to uncover and utilize the complex correlations between the inner and outer entities. To address these two defects, we propose a bidirectional hierarchical (BidH) model for nested named entity recognition. BidH consists of a forward module and a backward module, where the former first extracts the inner entities and then extracts the outer ones, while the latter extracts the entities in the opposite direction. Furthermore, we design an entity masked self-attention mechanism to combine the two modules by fusing their predictions and hidden states layer by layer. BidH can effectively deal with error propagation and exploit the correlations between entities at different nesting levels to improve the recognition accuracy. Experiments on the GENIA dataset show that BidH outperforms the state-of-the-art nested named entity recognition models in terms of F1 score.

Modeling Latent Autocorrelation for Session-based Recommendation

Session-based Recommendation (SBR) aims to predict the next item for the current session, which consists of several items clicked by an anonymous user in a short period. Most sequential modeling approaches to SBR focus on adopting advanced Deep Neural Networks (DNNs), and these methods require increasingly longer training times. Existing studies have shown that some traditional SBR methods can outperform some DNN-based sequential models; however, few studies have attempted to investigate the effectiveness of traditional methods in recent years. In this paper, we propose a novel and concise SBR model inspired by the basic concept of autocorrelation in stochastic processes. Autocorrelation measures the correlation of a process at different moments. Therefore, it is natural to use it to model the correlation of clicked item sequences at different time shifts. Specifically, we use Fast Fourier Transforms (FFT) to compute the autocorrelation and combine it with several linear transformations to enhance the session representation. By this means, our proposed method can learn better session preferences and is more efficient than most DNN-based models. Extensive experiments on two public datasets show that the proposed method outperforms state-of-the-art models in both effectiveness and efficiency.
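
As an illustration of computing autocorrelation with FFTs, the sketch below applies the Wiener-Khinchin relation (the inverse transform of the power spectrum gives the circular autocorrelation) to a scalar sequence. It is not the paper's model: the scalar input, the power-of-two length restriction, the hand-written radix-2 FFT, and all function names are assumptions made to keep the example self-contained.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); length must be a power of two."""
    n = len(values)
    if n & (n - 1) != 0 or n == 0:
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(values)
    sign = 1 if inverse else -1
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def circular_autocorrelation(seq):
    """r[tau] = sum_t seq[t] * seq[(t + tau) % n], computed in O(n log n) via FFT."""
    n = len(seq)
    spectrum = fft([complex(v) for v in seq])
    power = [s * s.conjugate() for s in spectrum]
    # The inverse FFT above is unnormalized, so divide by n here.
    return [v.real / n for v in fft(power, inverse=True)]


def direct_autocorrelation(seq):
    """Same quantity computed directly in O(n^2), used as a reference."""
    n = len(seq)
    return [sum(seq[t] * seq[(t + tau) % n] for t in range(n)) for tau in range(n)]


signal = [1.0, 2.0, 3.0, 4.0]
print(circular_autocorrelation(signal))
print(direct_autocorrelation(signal))
```

The two functions agree up to floating-point error; the FFT route matters when the sequence is long, since it replaces the quadratic sum over all time shifts with two transforms.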

Texture BERT for Cross-modal Texture Image Retrieval

We propose Texture BERT, a model describing visual attributes of texture using natural language. To capture the rich details in texture images, we propose a group-wise compact bilinear pooling method, which represents the texture image by a set of visual patterns. The similarity between the texture image and the corresponding language description is determined by the cross-matching between the set of visual patterns from the texture image and the set of word features from the language description. We also exploit the self-attention transformer layers to provide the cross-modal context and enhance the effectiveness of matching. Our efforts achieve state-of-the-art accuracy on both text retrieval and image retrieval tasks, demonstrating the effectiveness of the proposed Texture BERT model in describing texture through natural language.

Visual Encoding and Debiasing for CTR Prediction

Extracting expressive visual features is crucial for accurate Click-Through-Rate (CTR) prediction in visual search advertising systems. Current commercial systems use off-the-shelf visual encoders to facilitate fast online service. However, the extracted visual features are coarse-grained and/or biased. In this paper, we present a visual encoding framework for CTR prediction to overcome these problems. The framework is based on contrastive learning which pulls positive pairs closer and pushes negative pairs apart in the visual feature space. To obtain fine-grained visual features, we present contrastive learning supervised by click-through data to fine-tune the visual encoder. To reduce sample selection bias, firstly we train the visual encoder offline by leveraging both unbiased self-supervision and click supervision signals. Secondly, we incorporate a debiasing network in the online CTR predictor to adjust the visual features by contrasting high impression items with selected, low impression items. We deploy the framework in a mobile E-commerce app. Offline experiments on billion-scale datasets and online experiments demonstrate that the proposed framework can make accurate and unbiased predictions.

Lightweight Unbiased Multi-teacher Ensemble for Review-based Recommendation

Review-based recommender systems (RRS) have received increasing interest since reviews greatly enhance recommendation quality and interpretability. However, existing RRS suffer from high computational complexity, biased recommendation, and poor generalization. These three problems make them inadequate for real recommendation scenarios. Previous studies address each issue separately, and none of them considers solving the three problems together under a unified framework. This paper presents LUME (a Lightweight Unbiased Multi-teacher Ensemble) for RRS. LUME is a novel framework that addresses the three problems simultaneously. LUME uses multi-teacher ensemble and debiased knowledge distillation to aggregate knowledge from multiple pretrained RRS, and generates a small, unbiased student recommender which generalizes better. Extensive experiments on various real-world benchmarks demonstrate that LUME successfully tackles the three problems and outperforms state-of-the-art RRS and knowledge distillation based RS.

A Multi-granularity Network for Emotion-Cause Pair Extraction via Matrix Capsule

The task of Emotion-Cause Pair Extraction (ECPE) aims at extracting the clause pairs with the corresponding causality from the text. Existing approaches emphasize their multi-task settings. We argue that the clause-level encoders are ill-suited to the ECPE task, where text information has features at many granularities. In this paper, we design a Matrix Capsule-based multi-granularity framework (MaCa) for this task. Specifically, we first introduce a word-level encoder to obtain the token-aware representations. Then, two sentence-level extractors are used to generate emotion prediction and cause prediction. Finally, to obtain more fine-grained features of clause pairs, the matrix capsule is introduced, which can cluster the relationship of each clause pair. The empirical results on the widely used ECPE dataset show that our framework significantly outperforms most current methods in the Emotion-Cause Extraction (ECE) and the challenging ECPE task.

Task Similarity Aware Meta Learning for Cold-Start Recommendation

In recommender systems, content-based methods and meta-learning-based methods have usually been adopted to alleviate the item cold-start problem. The former consider utilizing item attributes at the feature level, and the latter aim at learning a globally shared initialization for all tasks to achieve fast adaptation with limited data at the task level. However, content-based methods only focus on the similarity of item attributes, ignoring the relationships established by user interactions. Moreover, for tasks with different distributions, most meta-learning-based methods struggle to achieve better performance under a single initialization. To address the limitations mentioned above and combine the strengths of both methods, we propose a Task Similarity Aware Meta-Learning (TSAML) framework from two aspects. Specifically, at the feature level, we simultaneously introduce content information and user-item relationships to exploit task similarity. At the task level, we design an automatic soft clustering module to cluster similar tasks and generate the same initialization for similar tasks. Extensive offline experiments demonstrate that the TSAML framework has superior performance and recommends cold items t